## Wednesday, August 17, 2011

### PUP Quality and Winsteps Measures

The clicker data reviewed in Scoring Clicker Data and Grading Clicker Data provide further insight into Winsteps. The chart in Grading Clicker Data shows how each individual student fares when electing either right mark scoring (RMS) or Knowledge and Judgment Scoring (KJS). That is an applied student view, and a rather busy, messy one. The grade chart can be simplified by returning to the related scores from RMS and KJS.

The above presentation can be further simplified by removing duplications. It now relates only the scores obtained from the two methods of scoring, RMS and KJS.

KJS scores are composed of a quantity score and a quality score: the percent of right marks on the test (knowledge) and the percent right of marked items (judgment). The exact same values are available from Power Up Plus (PUP) and from Winsteps Table 17.1. With RT the number of right marks, WG the number of wrong marks, and N the number of items on the test: [RMS = RT/N; Quality Score = RT/(RT + WG); and KJS = (N + RT - WG)/(2N)] But how are these scores related to measures?
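The three formulas can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the test length `N_ITEMS = 20` is a hypothetical value, not one taken from the clicker data. The two example students are the ones described in this post (two right out of two marked, and one right out of 17 marked). The last function is an algebraic restatement of the KJS formula as 2 points per right mark, 1 per omitted item, and 0 per wrong mark.

```python
def rms_score(right, n_items):
    """Right mark score: RT / N."""
    return right / n_items


def quality_score(right, wrong):
    """Quality score: RT / (RT + WG). Returns None when nothing was marked."""
    marked = right + wrong
    return right / marked if marked else None


def kjs_score(right, wrong, n_items):
    """Knowledge and Judgment score: (N + RT - WG) / (2N)."""
    return (n_items + right - wrong) / (2 * n_items)


def kjs_from_points(right, wrong, n_items):
    """The same score counted as 2 points per right mark, 1 per omit, 0 per wrong mark."""
    omitted = n_items - right - wrong
    return (2 * right + omitted) / (2 * n_items)


N_ITEMS = 20  # hypothetical test length, not taken from the post

# (right, wrong) pairs for the two students described in the text
for right, wrong in [(2, 0), (1, 16)]:
    print(
        f"RT={right} WG={wrong} "
        f"RMS={rms_score(right, N_ITEMS):.3f} "
        f"Quality={quality_score(right, wrong):.3f} "
        f"KJS={kjs_score(right, wrong, N_ITEMS):.3f}"
    )
```

The point form makes it easier to see why omitting an item, rather than guessing wrong, protects the KJS score.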

This chart shows total RMS scores related to measures. It prints directly from Winsteps (Plots/Compare Statistics: Scatterplot). This presentation is very similar to the above chart that relates RMS scores to KJS quality scores. Measures are calculated on the number right out of the number marked, as are quality scores. Are PUP quality scores and Winsteps measures reporting the same thing?

This scatterplot shows they are the same but not in the same units. Again we have the situation of buying melons by count at \$2 each or by measure at 10 cents a pound. High-quality students, who can trust what they know, also exhibit high ability in measures.
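The "same thing, different units" idea can be illustrated with the log-odds transform, since Rasch measures are reported in logits. The sketch below is not the Winsteps estimation procedure, which also accounts for item difficulty; it only shows that converting a proportion right to log-odds changes the units while keeping the rank order. The function name and all sample proportions except 1/17 are hypothetical.

```python
import math


def log_odds(proportion):
    """Convert a proportion right into log-odds (logit units).

    Returns None for 0 or 1, where the log-odds are not finite; such extreme
    scores need special handling that this sketch does not attempt.
    """
    if not 0.0 < proportion < 1.0:
        return None
    return math.log(proportion / (1.0 - proportion))


# Hypothetical quality scores in ascending order; 1/17 is the lowest one in the post
quality_scores = [1 / 17, 0.25, 0.50, 0.75, 0.90]
logits = [log_odds(q) for q in quality_scores]

for q, m in zip(quality_scores, logits):
    print(f"quality {q:.3f} -> {m:+.2f} logits")
```

Because the transform is monotonic, the student with the higher quality score always ends up with the higher value in logits.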

The student showing a KJS quality score of 100% (two right out of two marked) is also the student showing the highest full credit ability measure. The student with the lowest quality score, one right out of 17 marks, also has the lowest full credit ability measure.

Given the above discussion, it then follows that estimated student ability measures from full credit and partial credit scoring show the same relationship as the KJS quality scores do to KJS student test scores in Scoring Clicker Data. The four students with zero test scores are not included in the Winsteps chart as zeros have no usable predictive value. So, the full credit student ability measure is comparable to the KJS quality score. The partial credit student ability measure is comparable to the KJS student test score.

The final chart in this second end-of-audit post relates the estimated item difficulty measures from full credit and partial credit scoring. They are in very close alignment. Even though students receive very different scores and grades from the two methods, the item difficulty remains the same apart from the effect of scale on the results. The full credit estimates are based on total counts of 23. The partial credit estimates are based on total counts of 46. A bit of shift and stretch (shift by the difference in means, stretch by the ratio of standard deviations) can bring these two distributions into agreement.
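A shift and stretch of this kind is a linear rescaling. The sketch below uses only the Python standard library; the function name and both lists of item measures are hypothetical and are not the clicker data. It moves one set of estimates onto the mean and standard deviation of the other.

```python
from statistics import mean, stdev


def shift_and_stretch(source, target):
    """Rescale `source` so that it has the mean and SD of `target`.

    Stretch: multiply deviations from the source mean by SD(target) / SD(source).
    Shift:   re-center the result on the target mean.
    """
    source_mean, target_mean = mean(source), mean(target)
    stretch = stdev(target) / stdev(source)
    return [target_mean + (x - source_mean) * stretch for x in source]


# Hypothetical item difficulty measures (logits) from the two scoring methods
full_credit = [-1.8, -0.9, -0.2, 0.4, 1.1, 1.9]
partial_credit = [-0.9, -0.5, -0.1, 0.2, 0.5, 1.0]

rescaled = shift_and_stretch(partial_credit, full_credit)

print("target   mean/SD:", round(mean(full_credit), 3), round(stdev(full_credit), 3))
print("rescaled mean/SD:", round(mean(rescaled), 3), round(stdev(rescaled), 3))
```

The rescaling leaves the order of the items unchanged, so any remaining disagreement between the two sets of estimates is a real difference in relative difficulty, not an effect of scale.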

In conclusion, Winsteps is optimized to calibrate item difficulty for test makers. PUP is optimized to direct student development from passive pupil to self-correcting scholar. When students are forced to mark every question, as is generally done, Winsteps estimates student ability (measures) to perform on the test. When using the partial credit Rasch model (scores identical to KJS), it estimates student ability (measures) to report what can be trusted as the basis for further learning and instruction.

Both RMS and the full credit Rasch model, with which Winsteps is normally used, suffer from the sampling error created at the lower range of scores, where pass/fail cut points are usually set: even an average “C” student can obtain a “B” one day and a “D” on another day. Half of the students near the pass/fail line will fall on the other side on the next test, with no indication of quality. KJS is a simple solution to this problem as well as a means of directing student development rather than working with questionable student rankings.

The power of self-assessment is lost when students are treated as a commodity rather than as living, learning, self-actualizing beings. A right answer from a person who has no interest in, places no value on, or sees no connection between facts and observations on the topic has an entirely different meaning than a right answer from a person who is interested in, places a high value on, or sees a web of meaningful relationships between facts and observations. One shows awareness; the other can do and apply. KJS and the partial credit Rasch model can sense this difference in quality. Both incorporate it into the test score. PUP prints it out as a quality score for student counseling and instructional management.