Education policies that affect millions of students have
long been tied to test scores, but a new paper suggests those scores are
regularly misinterpreted.
According to new research from
Mathematica, a statistical research group, the comparisons sometimes used to
judge school performance reflect demographic change more than actual
learning.
For example, last week's release of National
Assessment of Educational Progress (NAEP) scores led to much finger-pointing
about what's working and what isn't in education reform.
But according to Mathematica, policy assessments based on raw test data are
extremely misleading, especially because year-to-year comparisons measure
different groups of students.
"Every time the NAEP results come out, you see a
whole slew of headlines that make you slap your forehead," said Steven
Glazerman, an author of the paper and a senior fellow at Mathematica. "You
draw all the wrong conclusions over whether some school or district was
effective or ineffective based on comparisons that can't be indicators of those
changes."
"We had a lot of big changes in DC in 2007,"
Glazerman continued. "People are trying to render judgments of Michelle
Rhee based on the NAEP. That's comparing people who are in the eighth grade in
2010 vs. kids who were in the eighth grade a few years ago. The argument is
that this tells you nothing about whether the DC Public Schools were more or
less effective. It tells you about the demographic."
Those faulty comparisons, Glazerman said, were obvious to
him back in 2001, when he originally wrote the paper. But he shelved it
then because he thought the upcoming implementation of the federal No Child
Left Behind Act would make it obsolete.
That expectation turned out to be wrong. NCLB, the
country's sweeping education law, which has been up for reauthorization since
2007, mandated regular standardized testing in reading and math and penalized
schools based on those scores. As Glazerman and his coauthor Liz Potamites
wrote, measurements of student performance that contain severe but correctable
errors are often used to make critical education policy decisions under the
law.
"It made me realize somebody still needs to make
these arguments against successive cohort indicators," Glazerman said,
referring to the measurement of growth derived from changes in score averages
or proficiency rates in the same grade over time. "That's what brought
this about." So he picked up the paper again.
NCLB requires states to report on school status through a
measure known as "Adequate Yearly Progress" (AYP). It is widely acknowledged
that AYP is so ill-defined that it has labeled an overly broad swath of
schools as "failing," making it difficult for states to identify
truly underperforming schools. Glazerman's paper argues that NCLB's methods for
targeting failing schools are prone to error.
"Don't compare this year's fifth graders with last
year's," Glazerman said. "Don't use the NAEP to measure short-term impacts
of policies or schools."
The errors stem primarily from comparing the percentage
of students proficient in a given subject from one year to the next. Because
that comparison measures a different group of students each year, it can
create false impressions of growth or decline.
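To make that arithmetic concrete, here is a minimal Python sketch with invented numbers; it is not the method from Glazerman and Potamites's paper. It assumes, hypothetically, that a school raises every student's score by the same 10 points in both years and that 60 is the proficiency cutoff. Only the incoming students change, yet the successive cohort comparison reports a large drop.

```python
# Hypothetical illustration: the school's contribution is identical in both
# years; only the group of incoming students differs.
SCHOOL_EFFECT = 10       # assumed points added to every student, both years
PROFICIENT_CUTOFF = 60   # assumed proficiency threshold


def proficiency_rate(prior_scores, school_effect, cutoff):
    """Share of students whose end-of-year score meets the cutoff."""
    final_scores = [s + school_effect for s in prior_scores]
    return sum(1 for s in final_scores if s >= cutoff) / len(final_scores)


# Incoming achievement for two *different* groups of fifth graders (made up).
cohort_year1 = [45, 52, 55, 58, 61, 63, 66, 70]
cohort_year2 = [40, 44, 48, 53, 55, 57, 62, 68]  # lower incoming scores

rate_year1 = proficiency_rate(cohort_year1, SCHOOL_EFFECT, PROFICIENT_CUTOFF)
rate_year2 = proficiency_rate(cohort_year2, SCHOOL_EFFECT, PROFICIENT_CUTOFF)
change = rate_year2 - rate_year1

print(f"Year 1 proficiency rate: {rate_year1:.1%}")
print(f"Year 2 proficiency rate: {rate_year2:.1%}")
print(f"Apparent change: {change * 100:+.1f} percentage points")
```

With these numbers the rate falls from 87.5% to 62.5%, an apparent 25-point decline, even though the assumed school effect never changed.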
Using testing data in different, more accurate
ways would likely lead states to direct their resources to different
groups of schools.
"Differences in scores between two cohorts (say,
fourth graders one year and fourth graders the next year) are comparisons of
two different groups of students," Matthew Di Carlo, senior fellow at the Albert
Shanker Institute, wrote in an email. "They do not even
necessarily reflect real student progress, to say nothing of whether the
changes can be attributed to schooling factors."
The measurement flaws highlighted by Glazerman's paper are
particularly significant as states revamp the way they hold schools accountable
for their performance. Although attempts to rewrite No Child Left Behind fizzled
out in Congress this fall, states are changing the way they
target schools for intervention through waivers that release them from NCLB-style
reporting. The U.S. Department of Education has already received waiver
requests from 11 states, and one of the conditions for getting a waiver is
developing a new accountability plan.
"It's gone under the radar with the stalled
reauthorization process," said Doug Harris, a University of Wisconsin
professor who wrote a recent book on education
performance metrics. "You get really different answers depending on what
you do with these numbers. You can talk all you want about what you do with
failing schools but if you haven’t identified schools that are failing, it's a
waste of time."
Glazerman's paper provides equations to help correct these
errors. Meanwhile, researchers hope that school districts will be more careful
when using test scores to drive policies such as teacher evaluations.
"Using these data for resource allocation, staffing
and other high-stakes decisions means that accuracy and fairness must be the
primary considerations," Di Carlo wrote. "Most assessments aren’t
designed to measure school and teacher effects in the first place; if they are
to play a productive role in that capacity, it will have to be done in the most
rigorous feasible manner: using longitudinal data, adjusting for non-schooling
factors and interpreting the estimates in a responsible way."
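As a rough sketch of the longitudinal approach Di Carlo describes, the Python below tracks the same students across two years and averages their individual gains, ignoring anyone tested only once. The student IDs and scores are invented, and the sketch deliberately omits the adjustment for non-schooling factors he also mentions; it illustrates only why matching students sidesteps the cohort problem, not a complete evaluation model.

```python
# Hypothetical student records (ID -> score); students may move in or out.
scores_year1 = {"s01": 40, "s02": 44, "s03": 48, "s04": 53, "s05": 55}
scores_year2 = {"s01": 50, "s02": 54, "s03": 58, "s04": 63, "s06": 70}


def mean_matched_gain(before, after):
    """Average score gain for students observed in both years."""
    matched = before.keys() & after.keys()  # set of IDs present both years
    if not matched:
        raise ValueError("no students observed in both years")
    gains = [after[sid] - before[sid] for sid in sorted(matched)]
    return sum(gains) / len(gains), len(matched)


def naive_cohort_difference(before, after):
    """Successive cohort comparison: difference in school-wide averages."""
    return sum(after.values()) / len(after) - sum(before.values()) / len(before)


gain, n = mean_matched_gain(scores_year1, scores_year2)
naive = naive_cohort_difference(scores_year1, scores_year2)

print(f"Mean gain for {n} students tested both years: {gain:.1f} points")
print(f"Naive difference in school averages: {naive:.1f} points")
```

Here the matched-student gain is 10.0 points, while the naive difference in averages is 11.0 points, because a lower-scoring student (s05) left and a higher-scoring student (s06) arrived.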