
This post builds on the previous post on the Rasch rating scale model.
Three relationships were established:

1. Students with the same scores and items with the same difficulties were **grouped** together.
2. Transposed results for student scores and item difficulties were equivalent to normal results (this is easily seen in the dichotomous Nursing1 data), with identical normal student ability and transposed item difficulty measure means (1.78).

3. Restoring transposed values to normal values and locations required multiplying by -1 and then adding the measures mean (flipping the logit scale end for end and then moving the transposed distribution to the correct location).

The partial credit Rasch model adds the ability to set rating scale thresholds for each item or each student. Now students with the same score can receive different estimated abilities, and items with the same difficulty can receive different estimated difficulty calibrations. Using the Fall8850a data:

1. A normal partial credit analysis **groups** student raw scores and treats item difficulties individually (student ability measure mean: 1.24).
2. A transposed partial credit analysis **groups** item difficulties and treats student raw scores individually (item difficulty measure mean: 1.37).
3. Restoring transposed partial credit individual student ability measures only imperfectly aligns them with normal individual item difficulty measures (the means are not identical, as they were with the rating scale method: 1.32).

The last two charts use rating scale results as a fixed reference, as the normal and transposed rating scale means (1.32) from Winsteps are identical. The normal partial credit analysis held person measures an average of 0.08 logits less than the rating scale method as it developed individual item difficulty measures. There is a noticeable curve in the relationship between the two methods.

The transposed partial credit analysis held item difficulty measures an average of 0.05 logits more than the rating scale method as it developed individual person ability measures. The plot is a straight line. The partial credit method cannot be directly related to the rating scale method using the Fall8850a data.

The individual student and individual item measures can be imperfectly aligned by plotting restored transposed student ability measures against normal item measures. The relative locations of student ability and item difficulty do not hold constant, as the location of a **group** is only close to the average of the group. Students (within a **group** receiving the same test score) with higher IRT ability measures also had higher percent-right (CTT quality) scores using Knowledge and Judgment Scoring in PUP. This makes sense.
The item with fewer omits received a higher location (more difficult) on the logit scale than another item with the same count or percent right. This also makes sense: an item that more students mark right (and wrong) is more difficult than an item that more students omitted yet ended up with the same right count. Also, items in a **group** with higher measures were, in general, more discriminating (PUP 7. Test Performance Profile).
Iterative PROX **groups** items with the same difficulty. It is in the second stage of Winsteps, JMLE, where items are separated individually. (JMLE replaces each mark with a probability, whereas PROX uses only the marginal cells of student score and item difficulty.)
Separation is higher for **grouped** values than for individual values. This relates to the wider dispersion of measures for **grouped** values. Reliability is the same as from Knowledge and Judgment Scoring. At this point it is safe to say that any study must use only one IRT method to estimate measures. Different methods yield measures that are similar but not identical.