Wednesday, August 1, 2012
Partial Credit Rasch Model
This post builds on the previous post on the Rasch rating scale model. Three relationships were established:
1. Students with the same scores and items with the same difficulties were grouped together.
2. Transposed results for student scores and item difficulties were equivalent to normal results (this is easily seen in the dichotomous Nursing1 data), with identical means for the normal student ability and transposed item difficulty measures (1.78).
3. Restoring transposed values to their normal values and locations required multiplying by -1 and then adding the measure mean (flipping the logit scale end for end and then shifting the transposed distribution to the correct location).
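The restoration step in point 3 can be sketched in a few lines of Python. This is a minimal illustration of the arithmetic described above, not Winsteps code; the function name and the sample measures are hypothetical, and only the measure mean (1.78) comes from the Nursing1 results.

```python
# A minimal sketch of restoring transposed measures to the normal scale:
# flip the logit scale end for end with -1, then shift by the measure
# mean (1.78 for the Nursing1 data discussed above).

def restore_transposed(measures, measure_mean):
    """Flip each logit measure, then shift to the normal location."""
    return [-m + measure_mean for m in measures]

# Hypothetical transposed measures, for illustration only:
transposed = [0.50, 1.78, 3.06]
print(restore_transposed(transposed, 1.78))
# Approximately [1.28, 0.0, -1.28]: the distribution is mirrored
# around the mean, so a measure at the mean stays put.
```

Note that a measure exactly at the mean maps to zero, which is why the normal and restored-transposed distributions share the same mean after the shift.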
The partial credit Rasch model adds to this the ability to estimate rating scale thresholds for each item or each student. Now students with the same score can receive different estimated abilities, and items with the same difficulty can receive different estimated difficulty calibrations, using Fall8850a.data:
1. A normal partial credit analysis groups student raw scores and treats item difficulties individually (student ability measure mean: 1.24).
2. A transposed partial credit analysis groups item difficulties and treats student raw scores individually (item difficulty measure mean: 1.37).
3. Restoring transposed partial credit individual student ability measures only imperfectly aligns them with normal individual item difficulty measures (the means are not identical, as they were with the rating scale method: 1.32).
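The mechanics behind item-specific thresholds can be sketched numerically. The following is an illustration of the standard partial credit category-probability calculation (cumulative sums of person ability minus step difficulties), not Winsteps output; the function name and the step values are hypothetical. It shows how two items with the same average step difficulty can still produce different probability profiles, which is why items with the same raw difficulty can receive different calibrations.

```python
import math

# A minimal sketch of partial credit category probabilities.
# Each item carries its own step difficulties, so items with equal
# average difficulty can differ in how they spread the categories.

def pcm_probs(theta, steps):
    """Category probabilities for one item under the partial credit model.

    theta : person ability in logits
    steps : step difficulties [delta_1, ..., delta_m] in logits
    """
    # Cumulative sums of (theta - delta_j); category 0 contributes sum 0.
    sums = [0.0]
    for d in steps:
        sums.append(sums[-1] + (theta - d))
    exps = [math.exp(s) for s in sums]
    total = sum(exps)
    return [e / total for e in exps]

# Two hypothetical items, both with average step difficulty 0.0,
# but different step spacing, give different probability profiles:
print(pcm_probs(0.5, [-1.0, 1.0]))
print(pcm_probs(0.5, [-2.0, 2.0]))
```

The rating scale model is the special case where every item shares one common set of steps; relaxing that constraint per item (or, when transposed, per student) is what lets equal raw scores map to different measures.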
The last two charts use rating scale results as a fixed reference, since the normal and transposed rating scale means from Winsteps (1.32) are identical. The normal partial credit analysis held person measures an average of 0.08 logits lower than the rating scale method as it developed individual item difficulty measures. There is a noticeable curve in the relationship between the two methods.
The transposed partial credit analysis held item difficulty measures an average of 0.05 logits higher than the rating scale method as it developed individual person ability measures. The plot is a straight line. The partial credit method cannot be directly related to the rating scale method using the Fall8850a data.
The individual student and individual item measures can be imperfectly aligned by plotting restored transposed student ability measures with normal item measures. The relative locations of student ability and item difficulty do not hold constant as the location of a group is only close to the average of the group. Students (within a group receiving the same test score) with higher IRT ability measures also had higher percent right (CTT quality) scores using Knowledge and Judgment Scoring in PUP. This makes sense.
An item with fewer omits received a higher location (more difficult) on the logit scale than another item with the same right count or percent right. This makes sense: an item that more students mark right (and wrong) is more difficult than an item that more students omitted, even when both end with the same right count. Also, items in a group with higher measures were, in general, more discriminating (PUP 7. Test Performance Profile).
Iterative PROX groups items with the same difficulty. It is in the second stage of Winsteps, JMLE, that items are separated individually. (JMLE replaces each mark with a probability, whereas PROX uses only the marginal cells of student score and item difficulty.)
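Why PROX must group equal scores can be seen from its formula. The sketch below uses the standard non-iterative PROX normal approximation for dichotomous data; the function name and the example numbers are hypothetical, and this is not Winsteps code. The point is that the estimate depends only on the raw-score margin, never on which particular items were answered correctly.

```python
import math

# A minimal sketch of one PROX step for dichotomous data: the person
# measure is computed from the raw-score margin alone (log-odds of the
# score, expanded for the spread of item difficulties), so all students
# with the same raw score necessarily receive the same measure.

def prox_person_measure(raw_score, n_items, item_mean=0.0, item_var=0.0):
    """Normal-approximation (PROX) person measure from the score margin."""
    expansion = math.sqrt(1.0 + item_var / 2.89)
    return item_mean + expansion * math.log(raw_score / (n_items - raw_score))

# Two students with the same raw score but different response patterns
# get identical measures, because the pattern never enters the formula:
print(prox_person_measure(7, 10))
print(prox_person_measure(7, 10))
```

JMLE, by contrast, works with the expected probability of each individual mark, which is what allows it to separate students (or items) that PROX had grouped.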
Separation is higher for grouped values than for individual values. This relates to the wider dispersion of measures for grouped values. Reliability is the same as from Knowledge and Judgment Scoring. At this point it is safe to say that any study must use only one IRT method to estimate measures. Different methods yield measures that are similar but not identical.
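The link between dispersion and separation can be made concrete with the standard Rasch definitions (separation = true SD / RMSE; reliability = separation squared over one plus separation squared). The sketch below is illustrative only; the function name and the sample numbers are hypothetical and are not taken from the Fall8850a output.

```python
import math

# A minimal sketch of Rasch separation and reliability: for the same
# measurement error (RMSE), a wider dispersion of measures (larger
# observed SD) yields higher separation and higher reliability.

def separation_and_reliability(sd_observed, rmse):
    """Separation G = SD_true / RMSE; reliability R = G^2 / (1 + G^2)."""
    true_var = max(sd_observed**2 - rmse**2, 0.0)
    g = math.sqrt(true_var) / rmse
    r = g**2 / (1.0 + g**2)
    return g, r

# Wider dispersion (as with grouped values), hypothetical numbers:
print(separation_and_reliability(2.0, 0.5))
# Narrower dispersion (as with individual values), same error:
print(separation_and_reliability(1.5, 0.5))
```

With the error held fixed, the wider distribution produces the larger separation, which matches the observation above that grouped values, with their wider dispersion, separate better than individual values.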