Wednesday, July 4, 2012

Cantrell with Non-Iterative PROX and Winsteps

The Cantrell data are very different from the Nursing1 data. They cover 34 students by 14 items instead of 22 students by 21 items, although the number of data points is similar (476 and 462). The average test score is below 50% (48% instead of 80%). [I always suggest a minimum of 1000 data points for traditional test analysis under classical test theory (CTT), and the same may apply to item response theory (IRT).]
The student ability and item difficulty tallies produced by PUP PROX and Winsteps appear quite similar until the actual locations are examined. In only one case are the relative locations of student ability and item difficulty reversed (the third lowest score and difficulty).

Non-iterative PROX determines the final item difficulty locations by subtracting a constant from the initial item difficulty locations and multiplying the result by a constant expansion factor (EF). Winsteps instead arrives at each final item difficulty location by applying an individual adjustment to each mark. The individual adjustments (EF) for item difficulty marks ranged from 1.51 to 2.01 measures (1.78 on average). The adjustments (EF) for student ability marks ranged from 2.16 to 4.87 measures (3.09 on average).
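The shift-then-stretch step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name is hypothetical, the constant subtracted is assumed to be the mean of the initial item logits, and the expansion factor is passed in as a given number rather than computed. The proportions in the example are made up, not the Cantrell values.

```python
import math

def prox_item_difficulties(proportions_correct, expansion_factor):
    """Center initial item logits on zero, then stretch by one constant."""
    # Initial difficulty: log-odds of a wrong mark, so harder items score higher.
    initial = [math.log((1 - p) / p) for p in proportions_correct]
    # Assumption: the constant subtracted is the mean initial difficulty.
    shift = sum(initial) / len(initial)
    # The same expansion factor is applied to every item.
    return [expansion_factor * (d - shift) for d in initial]

# Made-up proportions correct for three items (hard, medium, easy).
difficulties = prox_item_difficulties([0.25, 0.50, 0.75], expansion_factor=2.0)
```

Because one constant is applied to every item, the order and relative spacing of the items never change. That is the contrast with the individual adjustments reported for Winsteps.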
Here is the first evidence that Winsteps may treat each student raw score mark and each item difficulty mark individually. Student ability is related to the difficulty of items marked correctly. Item difficulty is related to the ability of the marking students. [Marking one difficult item correctly may be worth as much (latent student ability) as marking two easier items correctly? Being marked correctly by one high ability student may be worth as much (latent item difficulty) as being marked correctly by two lower ability students?]
Neither method of estimation set the final location for item difficulties at the 50% (zero logit) location. Instead, both the graphic method and PROX matched the student ability and item difficulty means to the average test score (48%). Winsteps matched the average item difficulty to the average test score, but reported a lower value (40%) for predicted student scores based on student abilities. This may be a case of known "estimation bias" related to "small samples or short tests" that "inflates the logit distance between estimates".
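For readers who want to see how a predicted score follows from a student ability, here is a minimal sketch using the standard Rasch probability of a right mark, averaged over the items. The function name and the item difficulties are hypothetical, and whether the 40% figure above was obtained in exactly this way is an assumption.

```python
import math

def expected_percent_score(ability, item_difficulties):
    """Average Rasch probability of a right mark across the items, as a percent."""
    # Rasch model: P(right) = 1 / (1 + exp(-(ability - difficulty))).
    probs = [1 / (1 + math.exp(-(ability - d))) for d in item_difficulties]
    return 100 * sum(probs) / len(probs)

# Hypothetical item difficulties in logits, not the Cantrell values.
items = [-1.0, 0.0, 1.0]
score_at_zero = expected_percent_score(0.0, items)
score_above = expected_percent_score(1.0, items)
```

With difficulties spread symmetrically around zero, a student at zero logits is predicted to score 50%. Widening the distance between ability and difficulty estimates moves predicted scores away from that point.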

As student abilities and item difficulties are pushed, in either direction, farther from the zero starting point and then converted back to normal values, the distribution sags below or rises above the starting (no change) line on the black box chart. Winsteps presents a fuller development than non-iterative PROX. (Winsteps uses iterative PROX as the first stage and then JMLE to make the final estimate of measures.)
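The sag and rise can be reproduced with a small sketch: convert a percent to a logit, multiply by an expansion factor, and convert back. The function name and the factor of 2.0 are illustrative assumptions, not values taken from the Cantrell analysis.

```python
import math

def stretch_percent(percent, expansion_factor):
    """Percent -> logit, multiply by the factor, then logit -> percent."""
    p = percent / 100
    logit = math.log(p / (1 - p))
    stretched = expansion_factor * logit
    return 100 / (1 + math.exp(-stretched))

# Pairs of (starting percent, percent after stretching by an assumed factor).
curve = [(percent, stretch_percent(percent, 2.0)) for percent in (20, 50, 80)]
```

A factor of 1.0 returns every starting percent unchanged, which is the no change line. With any factor above 1.0, values below 50% come back lower and values above 50% come back higher, while 50% stays put.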

The small sample size (14 items) and the fact that the Cantrell data were made up for demonstration may contribute to these results. The black box chart can show variations in measure estimates but cannot explain them. More tests need to be examined to determine when, and if, analysis fails.
