A bad passing cut score prediction creates havoc for psychometricians, politicians, education officials, teachers, students, parents, and taxpayers. It indicates that, for all the care used in selecting calibrated questions for the benchmark test, something went wrong. Psychometricians expect “something to go wrong” at some low rate of occurrence; predictions based on probabilities are expected to fail occasionally. It is the same matter of luck that students count on to pass when cut scores are set very low.
Psychometricians must deal with two cases. In Case 1, this year’s test (Test B) came in above the benchmark test (Test A): too many students are passing. In Case 2, the frames of reference are reversed: this year’s values (Test A) came in below the benchmark (Test B), and too many students are failing. In either case political credibility is lost, and education officials discount the importance and meaningfulness of the assessment.
Case 1 occurred in the previous post on Common Item Equating, where Test B values were equated into the Test A frame of reference, a change of -0.36 measures. Let’s reverse the assumption for Case 2 and equate Test A values into the Test B frame of reference, a change of +0.36 measures. Both operations over-correct with respect to the right answer (0.48) that this audit obtained from Test AB. The over-correction is understandable, since the correction is made either from a too-high value to a too-low value or vice versa. The best result (0.48) again comes from uniting all the data into one benchmark analysis.
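For readers who want the arithmetic, here is a minimal sketch assuming mean/mean common-item equating (a standard Rasch-style procedure; the post does not spell out its exact method). The item names and measures are hypothetical, chosen only so the equating constant comes out near the -0.36/+0.36 pair above:

```python
# Mean/mean common-item equating sketch (all item measures hypothetical).
# Each common item has a calibrated difficulty, in measures (logits),
# in the Test A frame and in the Test B frame.

test_a = {"item1": -0.50, "item2": 0.20, "item3": 1.10}  # benchmark frame
test_b = {"item1": -0.10, "item2": 0.55, "item3": 1.43}  # this year's frame

# The equating constant is the mean difference on the common items.
shift = sum(test_a[k] - test_b[k] for k in test_a) / len(test_a)

# Equating B into A's frame adds the constant; reversing the frames
# (equating A into B) flips its sign, which is why the two corrections
# in the text differ only in sign.
b_in_a_frame = {k: v + shift for k, v in test_b.items()}
a_in_b_frame = {k: v - shift for k, v in test_a.items()}

print(f"equating constant: {shift:+.2f}")  # -0.36 with these numbers
```

Whichever direction is chosen, a single constant is applied to every measure, so the choice of frame of reference changes only the sign, not the size, of the correction.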
It is important to determine which is reality: the expected prediction or the observed results. A recent solution to this problem is not to make a prediction at all, and/or to recalibrate the current year’s results. The objective is to safely make the results look right, a long-standing tradition in institutionalized education.
A cut score prediction is required in research work (the test for a significant difference between chance and the observed results demands a prediction); it is not needed in application. Research work deals with averages; application deals with individual students. However, because NCLB tests are administered as forced-choice tests, the results have meaning only as the average ranking of how groups (class, school, district, and state) performed on the test.
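To make “demands a prediction” concrete, here is a minimal sketch assuming the predicted cut score implies an expected pass rate that serves as the null hypothesis in a one-proportion z-test; every number below is hypothetical:

```python
import math

# Hypothetical numbers: the cut score prediction implies an expected
# pass rate p0; this year's administration yields k passes out of n.
p0 = 0.70        # predicted pass rate (assumed, from the cut score)
n, k = 400, 252  # observed: 252 of 400 students passed

p_hat = k / n                       # observed pass rate
se = math.sqrt(p0 * (1 - p0) / n)   # standard error under the prediction
z = (p_hat - p0) / se               # one-proportion z statistic

# Two-sided p-value from the normal approximation.
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
print(f"observed {p_hat:.1%} vs predicted {p0:.0%}: z = {z:+.2f}, p = {p_value:.4f}")
```

Without the prediction (p0) there is no null hypothesis, and hence nothing to test the observed pass rate against.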
Given the quality of forced-choice data (a rank), simple methods for setting cut scores are appropriate. Traditional failing cut scores have been any score between one and two standard deviations below the mean, or scores ranking in the lower 10% to 20% (both rules are sketched below). This lets students compete for passing: a student’s rank determines the grade and who passes. It is a bad way to set cut scores if passing is supposed to imply being prepared for future course work or the workplace.
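A minimal sketch of both norm-referenced rules, using hypothetical raw scores; the percentile cut is taken as the score at the chosen rank:

```python
import statistics

# Hypothetical raw scores for one administration.
scores = [42, 45, 49, 52, 55, 57, 58, 60, 61, 63, 64, 66, 68, 70, 71, 74]

mean = statistics.mean(scores)
sd = statistics.stdev(scores)

# Rule 1: a cut score one to two standard deviations below the mean.
cut_1sd = mean - 1 * sd
cut_2sd = mean - 2 * sd

# Rule 2: fail the lowest-ranking 10% to 20%.
ranked = sorted(scores)
cut_10pct = ranked[int(0.10 * len(ranked))]  # score at the 10% rank
cut_20pct = ranked[int(0.20 * len(ranked))]  # score at the 20% rank

print(f"mean = {mean:.1f}, sd = {sd:.1f}")
print(f"1 SD cut = {cut_1sd:.1f}, 2 SD cut = {cut_2sd:.1f}")
print(f"10% cut = {cut_10pct}, 20% cut = {cut_20pct}")
```

Both rules fix the failure rate in advance; neither says anything about whether the students above the cut are actually prepared.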
A good cut score should separate those who will succeed from those who will fail without additional learning. Anything less is a form of social promotion; passing the test becomes a mere rite of passage. The nursing school test cut score of 75% was based on long-term experience: students ranking below that cut score tended to fail the NCLEX licensure test on their first attempt.