A good cut score prediction (the benchmark and operational tests yield similar scores) results from skill and luck. If the common set of items (1/4 to all of the questions) is not stable, the prediction fails.
The Rasch model prediction is based on a sequence of transformations starting with observed benchmark student raw scores. The average score on the nursing school test, for example, was 84%. The cut score that every student was to achieve was 75%. Three students did not achieve the cut score.
Rasch Estimated Measures, Chapter 11, is summarized on the Rasch Model Playing Field chart. The average student right answer score (#1) of 84% is transformed into an estimated student ability measure of +1.7 logits. The average item wrong answer score (#2) of 16% is transformed into -1.7 logits and then into an estimated item difficulty measure that matches the student ability measure at that location. The degree of adjustment needed (1.7 logits) to match the measures is less the closer the average student right score is to 50%. At 50% there is no adjustment (#3).
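The score-to-measure step above can be sketched as a simple log-odds transformation. This is only a first approximation, assuming the plain logit of the average score; the function name proportion_to_logit is made up for illustration, and Winsteps or Ministep refine their estimates iteratively rather than using this one-line formula.

```python
import math

def proportion_to_logit(p):
    """Convert a proportion (0 < p < 1) into log-odds (logits)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))

# Values taken from the nursing school test described above.
student_right = 0.84   # average student right-answer score (#1)
item_wrong = 0.16      # average item wrong-answer score (#2)

ability_logit = proportion_to_logit(student_right)   # about +1.7 logits
wrong_logit = proportion_to_logit(item_wrong)        # about -1.7 logits
no_adjustment = proportion_to_logit(0.50)            # 0 logits at 50% (#3)

print(round(ability_logit, 1), round(wrong_logit, 1), no_adjustment)
```

The closer the average right score is to 50%, the closer both logits are to zero, which is why no adjustment is needed at that point.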
The standard deviations of student scores and item difficulties need to be in close agreement within each test (person 1.09 and item 1.30 logits on this test) and between tests. This requirement of the Rasch model, using Ministep, has been encountered several times in this blog: Item Discrimination, Person Item Bubble Chart, Standard Units, Perfect Rasch Model, One Step Equating, and Common Item Equating. A perfect match of person and item measures requires identical standard deviations.
The black lines on the charts represent the observed score ogive and the linear, score-from-measure static features of the Rasch model (Winsteps, Table 20.1). They are not changed by test data expressed in standardized units: Standard Units and Perfect Rasch Model.
Rasch model measures predict success half of the time by students on items with matching abilities. But in practice, cut scores are for ALL students to achieve ALL of the time on a test with mixed abilities and difficulties. A group of students with matching abilities for the item difficulties set for an expected score of 0.72 would be expected to fail the nursing test about half of the time. When the expected score of 0.84 was set with the cut score of 75%, all but the three students passed, on average.
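The "half of the time" statement follows from the dichotomous Rasch model, where the probability of a right answer depends only on the difference between ability and difficulty. Here is a minimal sketch, assuming that standard formula; the function names and the three-item difficulty list are hypothetical and are not taken from the nursing test data.

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of a right answer under the dichotomous Rasch model (logits)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def expected_score(ability, difficulties):
    """Average probability of success across a set of item difficulties."""
    probs = [rasch_probability(ability, d) for d in difficulties]
    return sum(probs) / len(probs)

# Matching ability and difficulty: success half of the time.
matched = rasch_probability(1.7, 1.7)

# Ability 1.7 logits above an item: success about 85% of the time.
ahead = rasch_probability(1.7, 0.0)

# Hypothetical three-item test centered on the student's ability.
centered = expected_score(0.0, [-1.0, 0.0, 1.0])

print(matched, round(ahead, 2), round(centered, 2))
```

To raise the expected score above 50%, the item difficulties must sit below the student ability, which is the sense in which the mix of calibrated questions sets the passing rate.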
PUP classroom predictions of success assume a minimum preparation, on average, of one letter grade above the cut score. Higher quality (understanding) is more reliable than higher quantity (rote memory), in general.
Rasch model predictions of expected scores are more precise. They are based on the unique property that estimated measures are person free and item free. The desired passing rate can be obtained by selecting the correct range and mix of calibrated questions about the cut score (assuming teachers, instruction, students, learning and attitude remain stable, which, one would hope, would not be the case).