Wednesday, July 18, 2012
Winsteps Cantrell Measure Estimation
The tally charts in Post 33 show non-iterative PROX and Winsteps producing almost identical results with classroom Nursing1 data (21 students and 22 items). Post 34, with Cantrell (made up) data (34 students and 14 items), shows very different results between non-iterative PROX and Winsteps. This post explores how the different results were produced. Winsteps was stopped after each iteration, and the person (Table 13.1) and item (Table 17.1) printouts were examined.
Winsteps iterative PROX creates iteration one by subtracting the item measure mean from each item logit (shifting the distribution to the person ability zero measure location). Expansion factors are then applied to person ability and item difficulty, and the item mean is again adjusted to zero measure for the next iteration in Post 35. In contrast, Winsteps JMLE adjusts person ability and item difficulty simultaneously by filling in each cell with the expected score (the probability of a right mark) based on both person ability and item difficulty.
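The cell-filling idea behind JMLE can be sketched in a few lines of Python. This is only a minimal illustration of one textbook Newton-Raphson step for right/wrong marks, not the Winsteps code itself; the function names (expected_matrix, jmle_step) and the tiny response table are made up for the example, and it assumes no student or item has a zero or perfect score.

```python
import math

def expected_matrix(abilities, difficulties):
    """Fill each cell with the Rasch probability of a right mark."""
    return [[1.0 / (1.0 + math.exp(-(b - d))) for d in difficulties]
            for b in abilities]

def jmle_step(responses, abilities, difficulties):
    """One Newton-Raphson update of every person and item measure.

    Both sets of updates use the same expected-score table, so persons
    and items are adjusted together. Assumes no extreme scores.
    """
    P = expected_matrix(abilities, difficulties)
    n_persons = len(abilities)

    new_abilities = []
    for n, row in enumerate(P):
        observed = sum(responses[n])               # raw score
        expected = sum(row)                        # expected score
        variance = sum(p * (1.0 - p) for p in row)
        new_abilities.append(abilities[n] + (observed - expected) / variance)

    new_difficulties = []
    for i, d in enumerate(difficulties):
        col = [P[n][i] for n in range(n_persons)]
        observed = sum(responses[n][i] for n in range(n_persons))
        expected = sum(col)
        variance = sum(p * (1.0 - p) for p in col)
        # More right marks than expected means the item is easier.
        new_difficulties.append(d - (observed - expected) / variance)

    # Re-center the item mean at zero logits, as described above.
    mean_d = sum(new_difficulties) / len(new_difficulties)
    new_difficulties = [d - mean_d for d in new_difficulties]
    return new_abilities, new_difficulties

# Hypothetical 4 students x 3 items table of right (1) and wrong (0) marks.
responses = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [1, 0, 1]]
abilities, difficulties = [0.0] * 4, [0.0] * 3
for _ in range(3):
    abilities, difficulties = jmle_step(responses, abilities, difficulties)
print(abilities, difficulties)
```

Each pass compares the observed raw scores with the expected scores summed from the cells, then moves every measure by the difference divided by the summed cell variances. The changes shrink from one pass to the next, which is the decreasing rate of expansion seen on the charts.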
The relative location of the third from the lowest student ability measure and the closely related item difficulty measures changes from one iteration to the next on the PROX chart (left). The same is true on the JMLE chart (right). This change in relative location of student ability and item difficulty is, in part, the most noticeable effect of placing two sets of data on the same graph when they have different starting locations and are each expanded at different rates. The rate of expansion decreases with each iteration until it is too low to justify further iterations. Convergence is then declared: the point at which person ability and item difficulty are found at the same point on the logit measure scale.
The JMLE chart shows JMLE starting with the last Cantrell PROX iteration. After about two iterations, the locations for person ability and item difficulty resemble those from non-iterative PROX. But JMLE continues on another dozen iterations to iteration 14 before stopping. By now the distribution has been expanded an additional logit in either direction. Clearly the PUP non-iterative PROX and Winsteps JMLE are not in agreement using Cantrell data. The two methods are in almost perfect agreement when using Nursing1 data after just two PROX and two JMLE Winsteps iterations.
My view on this is that poor and inadequate data can produce poor results. The Cantrell charts show wavy lines at the greatest distances from the zero logit location. This hunting (hysteresis) effect indicates the data reduction method is making large changes that may lead to inaccurate results. The JMLE portion of Winsteps is a more finely tuned method than the iterative PROX portion.
Four methods for estimating measures have now been explored: graphic, non-iterative PROX, iterative PROX and JMLE. Each of these inventions is more sensitive than the one before it in producing convergence. Since the first three have been fully discussed (and have been found to have no need for any pixie dust), I am willing to trust that JMLE does not require any either. In general, the location for person ability will yield a higher expected score than the raw test score from which it is derived. The further the raw score is above 50%, the greater the difference between raw score and expected score. The same goes for scores below 50%: the lower the raw test score, the further the expected score falls below it.
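One way to see the direction of this effect is a short Python sketch. It rests on two assumptions of mine that are not taken from the Winsteps printouts: the expected score is read against an item sitting at the zero logit location, and the expansion factor is a hypothetical 1.2. The function name expected_score_at_zero is also made up for the example.

```python
import math

def expected_score_at_zero(raw_proportion, expansion):
    """Raw proportion -> logit -> expanded measure -> expected proportion.

    Assumes the expected score is read against an item at zero logits
    and that `expansion` is a PROX-style factor (hypothetical value).
    """
    logit = math.log(raw_proportion / (1.0 - raw_proportion))
    measure = expansion * logit
    return 1.0 / (1.0 + math.exp(-measure))

# Hypothetical raw scores (as proportions right) and expansion factor.
for raw in (0.25, 0.50, 0.75):
    print(raw, round(expected_score_at_zero(raw, 1.2), 3))
```

With an expansion factor of 1.0 the expected score simply returns the raw score. Any factor above 1.0 pushes the expected score away from 50% in the same direction as the raw score: higher for raw scores above 50%, lower for raw scores below 50%, and unchanged at exactly 50%.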
I am still puzzled by two observations: Students correctly answering the same number of questions of differing difficulty land at the same ability location. Items answered correctly by the same number of students with differing abilities land at the same difficulty location. This does not, in my opinion, square with ability-independent and item-independent qualities or that correctly marking one difficult question is worth marking two easier questions.
The Rasch model requires student ability and item difficulty to be located on one fairly linear scale. It adds properties related to latent student ability and latent item difficulty. I see nothing in the four examined estimation methods that, by themselves, confer these powers or properties on marks on answer sheets. The elusive properties of the Rasch model may be based more on use and operator skill than on the methods for estimating student ability and item difficulty measures.