Wednesday, July 25, 2012

Rasch Rating Scale Model

At this point I have not found a way to directly relate the inner workings (not just the outputs) of the partial credit Rasch model to the non-iterative PROX method for estimating measures. I have, however, found an intermediary method that bridges this gap: the Rasch rating scale method within Winsteps.

Both the Rasch rating scale method and the non-iterative PROX method produce the same measures for students with the same scores and for items with the same difficulties. The Winsteps transposed analysis (columns become rows and rows become columns) also groups students and items by common scores and difficulties. You can visualize this by printing the normal and transposed ability-difficulty tallies (person-item bar charts), then flipping the transposed bar chart left to right and tipping it end for end.

These logit values can be restored to normal by multiplying by -1, which flips the log ratio scale (centered on zero) end for end, and then adding the measure mean (1.78 logits in this Nursing1 example) to shift the values back into their original locations.

Restoring again matches student and item measures. A transposed student ability value of -0.34 is restored to +2.12 (0.34 + 1.78). A transposed item difficulty value of 3.30 is restored to -1.52 (-3.30 + 1.78).
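The restore step is simple arithmetic. Here is a minimal Python sketch, assuming the 1.78-logit measure mean from the Nursing1 example above (the function name is mine, not a Winsteps term):

```python
def restore(transposed_logit, measure_mean=1.78):
    """Restore a transposed logit value: multiply by -1 to flip
    the log ratio scale, then add the measure mean to shift it back."""
    return -transposed_logit + measure_mean

# Transposed student ability -0.34 restores to the original +2.12:
print(round(restore(-0.34), 2))
# Transposed item difficulty 3.30 restores to the original -1.52:
print(round(restore(3.30), 2))
```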
Rounding from 1/10 logit to 1/4 logit (Winsteps) produces a better-looking bar chart, but with a noticeable distortion. This becomes apparent when restoring transposed Rasch rating scale values. The original and restored values are still the same when the numbers themselves are manipulated (flip the log ratio scale and add the measure mean).

The charts, however, can easily be seen to differ, after flipping and tipping, by placing the normal and transposed charts on one sheet. This distortion is an artifact of rounding numbers in two directions on a logit scale.
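The two-direction rounding artifact can be reproduced numerically. A small sketch, reusing the 1.78-logit mean and the 1/4-logit bins from the example above (the helper name is mine): rounding on the normal scale and rounding on the transposed scale place the same student in different bins, because the transposed bin grid, once flipped and shifted, no longer lines up with the normal one.

```python
def round_to(x, step):
    """Round x to the nearest multiple of step (e.g. 0.25-logit bins)."""
    return round(x / step) * step

measure_mean = 1.78
ability = 2.12                       # original student measure
transposed = measure_mean - ability  # its transposed counterpart: -0.34

# Round on the normal scale:
normal_bin = round_to(ability, 0.25)                        # lands at 2.00
# Round on the transposed scale, then flip and shift back:
restored_bin = -round_to(transposed, 0.25) + measure_mean   # lands at 2.03

print(normal_bin, restored_bin)  # same student, two different bar-chart bins
```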

With this in mind, the Rasch rating scale method adds a neat feature. Instead of just counting right marks (the traditional forced-choice, DUMB test), a test can be designed to let students report what they actually know and trust to be of value: what they understand and find useful at all levels of thinking. This is the same as Knowledge and Judgment Scoring. The Fall8850a data ranked responses 0, 1, and 2 for wrong (guessing; poor judgment in reporting what is known and trusted), omit (good judgment in accurately reporting what is known and trusted), and right (reporting what is known and trusted).
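The 0/1/2 ranking can be sketched as a small scoring function. This is only an illustration of the scheme described above, not the actual Winsteps or Fall8850a scoring code, and the names are mine:

```python
def knowledge_judgment_score(response, key):
    """Score one item under Knowledge and Judgment Scoring:
    2 = right (reported what is known and trusted),
    1 = omit  (good judgment: declined to guess),
    0 = wrong (poor judgment: guessed and missed)."""
    if response is None:  # student left the item blank
        return 1
    return 2 if response == key else 0

answers = ["B", None, "D"]  # one student's marks; None = omit
key     = ["B", "A", "C"]
print([knowledge_judgment_score(a, k) for a, k in zip(answers, key)])  # [2, 1, 0]
```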

However, the Rasch rating scale model still groups students with like scores, and items with like difficulties, because convergence is controlled by one set of rating scale thresholds for all items (and, when transposed, for all students). The partial credit Rasch model builds on this foundation.
