Wednesday, February 2, 2011
Concurrent or One-Step Equating
This is the simplest method of equating two tests. It puts all the data into one file. The Rasch model requires that the two sets have matching characteristics.
Two sets came from splitting the 24 by 24 student/item nursing school test into two 12-student by 24-item tests (A and B). Each was analyzed by Ministep.
Crossplotting the item difficulty values from Winsteps Table 13.1 (Items), with Test B on the vertical axis and Test A on the horizontal axis, verified that the two sets performed similarly (trend line y = 0.9861x + 0.0484, which gives 1.03 at x = 1). The ratio of standard deviations (S.D.), which also serves as the slope, was B (1.34) / A (1.28) = 1.05. Any value near one is acceptable.
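The arithmetic behind these two checks can be sketched in a few lines of Python. This is only an illustration using the numbers quoted above; the function names are made up for the example, and evaluating the trend line at x = 1 is an assumption about how the 1.03 figure was reached.

```python
# S.D. values quoted in the text (item difficulty measures, Tests A and B).
sd_a = 1.28
sd_b = 1.34

def sd_ratio(sd_vertical, sd_horizontal):
    """Ratio of standard deviations: vertical-axis test over horizontal-axis test."""
    return sd_vertical / sd_horizontal

def trend_line(x, slope=0.9861, intercept=0.0484):
    """Trend line from the crossplot; coefficients are the ones quoted in the text."""
    return slope * x + intercept

ratio = sd_ratio(sd_b, sd_a)
print(round(ratio, 2))            # 1.05
print(round(trend_line(1.0), 2))  # 1.03 (assumes the line was evaluated at x = 1)
```

Both results sit close to one, which is the condition the paragraph describes.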
Excel and Winsteps produced the same slope value. Excel produced a higher S.D. (1.37 instead of 1.34 for Group B) because Excel applies a correction for the small number of items.
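The size of that difference is consistent with the usual small-sample correction, which divides by n - 1 instead of n. The sketch below assumes that Winsteps reported the population S.D. (divide by n), that Excel reported the sample S.D. (divide by n - 1), and that n is the 24 items; none of this is stated outright in the post, so treat it as a plausibility check rather than a description of either program.

```python
import math

def sample_sd_from_population_sd(population_sd, n):
    """Convert an S.D. computed with divisor n into one computed with divisor n - 1."""
    return population_sd * math.sqrt(n / (n - 1))

n_items = 24            # assumption: the S.D. is taken over the 24 item difficulties
winsteps_sd_b = 1.34    # Group B value quoted in the text
winsteps_sd_a = 1.28    # Group A value quoted in the text

corrected_b = sample_sd_from_population_sd(winsteps_sd_b, n_items)
corrected_a = sample_sd_from_population_sd(winsteps_sd_a, n_items)

print(round(corrected_b, 2))  # 1.37

# The correction multiplies both S.D.s by the same factor,
# so the B/A ratio (the slope) is unchanged.
print(round(corrected_b / corrected_a, 2))  # 1.05
```

Because both groups have the same number of items, the correction cancels in the ratio, which fits the observation that the two programs gave the same slope.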
An S.D. ratio near one indicates that the two tests are performing in a similar manner; it does not establish that they are exactly alike. Other statistics give a fuller view than S.D. alone.
The values for median (the middle value) and mode (the most frequent tally) are of little use with samples of only 12 students and 24 questions, as illustrated on the Estimated Item Difficulty chart above, except to indicate whether the data are skewed (mean, median and mode are not the same). Group B has almost no skew (0.05).
A value of zero for kurtosis, as reported by Excel, indicates the sample fits the relative height of the normal curve; negative values indicate a flatter distribution. Group B is very flat (-1.31). Four of the five Group B plot points are about the same height on the Item Difficulty Distribution chart.
Most striking on this chart is that when two small sets of very similar data (A and B) are combined, the result (C) takes on a much different appearance. A 24 by 24 student/item matrix (about 500 data points) is near the minimum requirement for both Winsteps and PUP. Part of the change in appearance is captured in the maximum, minimum and range (see top chart). All of these statistics deal with the characteristics of group performance rather than individual student or item performance.
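For readers who want to see how skew and kurtosis values of this kind are obtained, here is a minimal sketch. The formulas are the small-sample versions that, to my knowledge, Excel documents for its SKEW and KURT functions; whether those exact functions were used for the numbers above is an assumption. The data list is hypothetical and is not taken from Test A, B or C.

```python
import math

def _mean_and_sample_sd(values):
    """Mean and sample standard deviation (divisor n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return mean, sd

def sample_skew(values):
    """Small-sample skewness; zero for a perfectly symmetric set. Needs n >= 3."""
    n = len(values)
    mean, sd = _mean_and_sample_sd(values)
    cubed = sum(((v - mean) / sd) ** 3 for v in values)
    return n / ((n - 1) * (n - 2)) * cubed

def sample_excess_kurtosis(values):
    """Small-sample excess kurtosis; negative values mean flatter than normal. Needs n >= 4."""
    n = len(values)
    mean, sd = _mean_and_sample_sd(values)
    fourth = sum(((v - mean) / sd) ** 4 for v in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    offset = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - offset

# Hypothetical, perfectly flat and symmetric set of values (not from the post).
flat_values = [1, 2, 3, 4, 5]

skew = sample_skew(flat_values)
kurt = sample_excess_kurtosis(flat_values)
print(skew, kurt)
```

An evenly spread set like this one has no skew and a clearly negative kurtosis, the same pattern described for Group B.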
It helps to keep in mind what these basic statistics capture in numbers, as an overall perspective on the more specific analyses. Winsteps captures individual estimated student ability and item difficulty measures. PUP captures what individual students trust they know and their ability to use what they know: quantity and quality (with Knowledge and Judgment Scoring). Once these numbers have been obtained, they are easily manipulated. Many different stories can be told from the same data, especially when students are not permitted to exercise their own judgment in reporting what they trust, on paper tests and with computer adaptive testing (CAT).