One way to understand something is to audit it by comparing it with something that is understood. I will proceed from the familiar to the unfamiliar until it becomes sufficiently familiar that it can carry the load alone in exploring the unique features of the Rasch model.
Precision is calculated differently in CTT (CSEM) and IRT
(SEE). In this post I compare the conditional standard error of measurement (CSEM),
based on the count of right marks, with the standard error of estimate (SEE), based
on the rate of making right marks (a ratio instead of a count). It then follows
that a count of 1 (out of 20) is the same as a ratio of 1:19 or a rate of 1/20. Precision in Rasch IRT is then the
inverse of precision in CTT, which aligns each precision estimate with its respective
distribution (logit and normal).
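The two precision estimates can be sketched side by side. This is a minimal Python illustration, assuming the common binomial form of the CSEM and the standard Rasch SEE (the inverse square root of the test information); the function names and the 20-item setup are mine, not taken from the tables below:

```python
import math

def csem_binomial(right_count, n_items):
    """CTT conditional SEM (binomial form), based on the count of right marks."""
    return math.sqrt(right_count * (n_items - right_count) / (n_items - 1))

def see_rasch(ability, difficulties):
    """Rasch SEE: inverse square root of the test information at an ability."""
    info = sum(p * (1 - p) for p in
               (1 / (1 + math.exp(-(ability - d))) for d in difficulties))
    return 1 / math.sqrt(info)

n = 20
difficulties = [0.0] * n            # all items at the test mean, for illustration
for count in (1, 10, 19):
    rate = count / n                      # a count of 1 out of 20 is a rate of 1/20...
    logit = math.log(rate / (1 - rate))   # ...and a ratio (odds) of 1:19
    print(count, round(csem_binomial(count, n), 2),
          round(see_rasch(logit, difficulties), 2))
```

Note the opposite behavior: the CSEM is largest at mid scores and shrinks toward the extremes, while the SEE is smallest at mid scores and grows toward the extremes.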
Table 45, my audit tool, relates CTT and IRT using real
classroom test results. It does not show the values used when calculating the
probability of a right mark (Table 45b). I dissected that equation into the
difference between ability and difficulty (Table 48a) in measures and in normal
values (Table 48b).
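The equation being dissected is the Rasch probability of a right mark. A minimal sketch (Tables 45b, 48a, and 48b are not reproduced here; the names are my own) shows the same ability-minus-difficulty difference in measures and, exponentiated, in what I read as the normal values:

```python
import math

def p_right(ability, difficulty):
    """Rasch model: probability of a right mark."""
    diff = ability - difficulty     # the difference, in measures (logits)
    odds = math.exp(diff)           # the same difference as a normal value (odds)
    return odds / (1 + odds)

# Equal ability and difficulty yield a 50% chance of a right mark.
print(p_right(1.0, 1.0))
```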
Table 49 shows the highlights. I found that the difference
of 3 counts between item difficulties (15, 18, and 21) was maintained
across all student scores (from 14 to 20): 0.81 measure for 3 counts below the average
difficulty of 18, and 1.62 for 3 counts above it
(upper right). A successful convergence then maintains a constant difference
across varying student scores. [Is this the basis for item difficulties being
independent of student scores?]
At the average student score (17) the difference in difficulty
measures doubled from 0.81, at 3 counts below the mean, to 1.62 at the mean, and
doubled again to 3.24 at 3 counts above the mean (center upper left).
The above uniform expansion on the logit scale (Table 49) yields
an even larger expansion on the normal scale. As the converging process
continues, the top scores are pushed further from the mean test score than
lesser scores. The top score (and equivalent logit item difficulty) of 4.95 was
pushed out to 141 normal units. That is about 10 times the value for an item
with a normal difficulty of 15 that is an equal distance below the test mean.
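If the normal units here are the exponentiated measures (the odds), which is my assumption, the arithmetic can be checked directly:

```python
import math

# A measure of 4.95 logits, exponentiated, lands near 141 normal units...
print(round(math.exp(4.95)))
# ...while an equal distance below the mean is compressed toward zero.
print(round(math.exp(-4.95), 3))
```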
Re-plotting Chart 92, normal scale, on measures (Chart 102)
shows how markers below the mean are compressed and markers above the mean are
expanded. The uniformly spaced normal markers in Chart 92 are now spaced in
increasing distance from left to right in measures in Chart 102.
I sequentially summed each item information function and
plotted the item characteristic curves (Chart 103, normal scale and Chart 104,
measures). High scores (above 17 counts/81%/1.73 measures) with low precision
drop away from the useful straight-line portion of the Rasch model curve for
items with low difficulties (high right-count scores). This makes sense. The
summed information curves fit on the higher end of the Rasch model, as shown in Chart
91. I plotted an estimated location in Chart 105 and included the right counts
for scores and items after Winsteps Table 20.1. [Some equations see actual counts; other equations only see smoothed normal curves.]
Chart 91, from classroom data (mean = 80%), is very similar to
Chart 90, from Dummy data (mean = 50%), for a right count of 17. I added precision
data for the Dummy data from Table 46 to Chart 90 to obtain a general summary of precision
based on the rate of making right marks (Chart 106). Chart 106 relates measures (ratios) to estimated scores
(counts) by way of the Rasch model curve, where student ability and item
difficulty are 50% right and 50% wrong at each measure location. See Chart 82
for a similar display using a normal scale instead of a logit scale. In both
cases, IRT precision is much more stable than CTT precision.
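The Rasch model curve that links measures to estimated counts is the test characteristic curve: the expected right count at a measure is the sum of the item probabilities. A sketch with 21 evenly spaced illustrative difficulties (my choice, not the Table 46 values):

```python
import math

def expected_count(ability, difficulties):
    """Test characteristic curve: expected right count at a measure."""
    return sum(1 / (1 + math.exp(-(ability - d))) for d in difficulties)

# 21 illustrative item difficulties, evenly spaced and centered on zero
difficulties = [-2 + 0.2 * i for i in range(21)]
for measure in (-1.0, 0.0, 1.0):
    print(measure, round(expected_count(measure, difficulties), 1))
```

At a measure of zero the expected count is exactly half the items (10.5), the 50% right and 50% wrong anchor point described above.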
The best precision (SEE, Table 46) is 0.44 measure for 21 items. [50 items = 0.28; 100 = 0.20; 200
= 0.14; and 400 = 0.10 measure (0.03 for 3,000 items).] Only a near-infinite
number of items would bring it to zero. [Error variance on 50 items = 0.08; 100
= 0.04; 200 = 0.02; and 400 items = 0.01. Doubling the number of items on a
test cuts the error variance in half. SEE = SQRT(error variance).]
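The bracketed figures match the best case, in which every item sits at the examinee's measure and contributes the maximum information of 0.25, so SEE = 2/sqrt(n) and the error variance is 4/n (my reconstruction; the formula is not given above):

```python
import math

def best_see(n_items):
    """Smallest possible Rasch SEE when every item contributes
    the maximum information of 0.25: SEE = 2 / sqrt(n)."""
    return 2 / math.sqrt(n_items)

for n in (21, 50, 100, 200, 400):
    print(n, round(best_see(n), 2), round(best_see(n) ** 2, 2))
# Doubling the items halves the error variance (SEE squared) but
# divides the SEE itself only by the square root of 2.
```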
The process of
converging two normal scales (student scores and item difficulties) involves
changing two normal distributions (0 to infinity, with 50% being the midpoint) into
locations (measures) on a logit scale (-infinity to +infinity, with zero
being the midpoint). The IRT analysis then inverts the variance (information) to
match the combined logit scale distribution (see the comments I added to the bottom
of Table 46). The apparent paradox appears if you ignore the two different
scales for CTT (normal) and Rasch IRT (logit). Information must be inverted to
match the logit distribution of measure locations.
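The inversion can be made concrete. On the normal scale an item's variance, p(1 - p), peaks at 50%; the Rasch SEE, the inverse square root of the summed variance (the information), bottoms out at the same point. A sketch with 21 items, all at the same p for simplicity (the names are mine):

```python
import math

def variance(p):
    """Item variance on the normal (proportion) scale."""
    return p * (1 - p)

def see_from_info(p, n_items):
    """Rasch SEE: invert the summed variance (the information)."""
    return 1 / math.sqrt(n_items * variance(p))

for p in (0.1, 0.5, 0.9):
    print(p, round(variance(p), 3), round(see_from_info(p, 21), 2))
# Variance peaks at p = 0.5 exactly where the SEE is smallest:
# the apparent paradox is the inversion between the two scales.
```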
The Rasch model makes computer adaptive testing (CAT)
possible. IMHO it does not justify a strong emphasis on items with a difficulty
of 50%. Precision is also limited by the number of items on the test. Unless the Rasch IRT partial credit
model is used, where students report what they actually know, the results are
limited to ranking a student by comparing the examinee’s responses with those
from a control group that is never truly comparable to the examinee, as a
consequence of luck on test day (different date, preparation, testing
environment, and a host of other factors). The results continue to be the “best
that psychometricians can do” in making test results “look right” rather than
an assessment of what students know and can do (understand and value) as the
basis for further learning and instruction.
Addendum: Billions of dollars and countless classroom hours have been wasted in a misguided attempt to improve institutionalized education in the United States of America using traditional forced-choice testing. Doing more of what does not work will not make it work. Doing more at lower levels of thinking will not produce results at higher levels of thinking; instead, IMHO, it makes failure more certain (forced-choice at the bottom of Chart 101). Individually assessing and rewarding higher levels of thinking does produce positive results. Easy ways to do this have existed for over 30 years! Two are now free.