
Precision is calculated differently in CTT (CSEM) and in IRT
(SEE). In this post I compare the conditional standard error of measurement (CSEM),
based on the count of right marks, with the standard error of estimate (SEE), based
on the rate of making right marks (a ratio instead of a count). A count of 1 right
mark (out of 20) is the same as a ratio of 1:19 or a rate of 1/20. Precision in Rasch IRT is then the
inverse of precision in CTT, in order to align the precision estimates with their respective
distributions (logit and normal).
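The count-ratio-rate relationship above can be sketched in a few lines of Python. The 20-item example is from the post itself; the logit conversion, ln(right/wrong), is the standard Rasch transform (my addition, to show how the rate lands on the logit scale):

```python
import math

def count_to_rate(right, n_items):
    """Express a raw count of right marks as a rate (proportion)."""
    return right / n_items

def rate_to_logit(p):
    """Standard logit transform: ln(p / (1 - p)) = ln(right : wrong)."""
    return math.log(p / (1 - p))

n_items = 20
right = 1                             # 1 right mark out of 20
rate = count_to_rate(right, n_items)  # 1/20 = 0.05
ratio = (right, n_items - right)      # 1:19
logit = rate_to_logit(rate)           # ln(1/19) is about -2.94

print(rate, ratio, round(logit, 2))
```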

Table 48

Table 49

At the average student score (17), the difficulty measure
value doubled from 0.81, at 3 counts below the mean, to 1.62 at the mean, and
doubled again to 3.24 at 3 counts above the mean (center upper left).

The above uniform expansion on the logit scale (Table 49) yields
an even larger expansion on the normal scale. As the converging process
continues, the top scores are pushed further from the test mean score than
lesser scores. The top score (and the equivalent logit item difficulty) of 4.95 was
pushed out to 141 normal units. That is about 10 times the value for an item
with a normal difficulty of 15 that is an equal distance below the test mean
(lower left).

Chart 102

Chart 104

Chart 91

Chart 105

Chart 90

Chart 106

Chart 82

Precision (IRT
SEE, Table 46) ranges from 0.44 for 21 items down to 0.03 for 3,000 items [50 items = 0.28; 100
= 0.20; 200 = 0.14; and 400 = 0.10]. Only a near infinite
number of items would bring it to zero. [Error variance on 50 items = 0.08; 100
= 0.04; 200 = 0.02; and 400 items = 0.01. Doubling the number of items on a
test cuts the error variance in half. SEE = SQRT(error variance).]

The process of
converging two normal scales (student scores and item difficulties) involves
changing two normal distributions (0% to 100%, with 50% as the midpoint) into
locations (measures) on a logit scale (-infinity to +infinity, with zero (0)
as the midpoint). The IRT analysis then inverts the variance (information) to
match the combined logit scale distribution (see the comments I added to the bottom
of Table 46). The apparent paradox appears if you ignore the two different
scales for CTT (normal) and Rasch IRT (logit). Information must be inverted to
match the logit distribution of measure locations.
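The two steps described above can be sketched as follows, assuming the usual Rasch relationships (logit = ln(p/(1-p)); conditional standard error = 1/sqrt(information), i.e., the inverted information). This is a sketch of the general idea, not the exact computation behind Table 46:

```python
import math

def to_logit(p):
    """Map a percent-scale location (0..1, midpoint 0.5) onto the logit scale."""
    return math.log(p / (1 - p))

def standard_error(information):
    """Invert the information to get precision on the logit scale."""
    return 1 / math.sqrt(information)

# 50% maps to the logit midpoint of zero...
print(to_logit(0.5))             # 0.0
# ...and the scale stretches toward +/- infinity at the extremes.
print(round(to_logit(0.95), 2))  # 2.94
# More information means a smaller standard error (higher precision).
print(standard_error(25))        # 0.2
```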

The Rasch model makes computer adaptive testing (CAT)
possible. IMHO it does not justify a strong emphasis on items with a difficulty
of 50%. Precision is also limited by the number of items on the test. Unless the Rasch IRT partial credit
model is used, where students report what they actually know, the results are
limited to ranking a student by comparing the examinee's responses to those
from a control group that is never truly comparable to the examinee, as a
consequence of luck on test day (different date, preparation, testing
environment, and a host of other factors). The results continue to be the "best
that psychometricians can do" in making test results "look right" rather than
an assessment of what students know and can do (understand and value) as the
basis for further learning and instruction.

Chart 101

References:

Culligan, Brent. Date and location unknown. Item Response Theory, Reliability and Standard Error. 10 pages. http://www.wordengine.jp/research/pdf/IRT_reliability_and_standard_error.pdf

What is the reasoning behind the formulae for the different standard errors of measurement? 3 pages. Downloaded 4/27/2015. http://stats.stackexchange.com/questions/60190/what-is-the-reasoning-behind-the-formulae-for-the-different-standard-errors-of-m

Basic Statistics Review. Standard Error of Estimate. 4 pages. Downloaded 4/27/2015.