Chester, I read the opinion piece you co-authored with William Bennett entitled "The
Real Improvement in Texas Schools" which appeared in the NY
Times on October 27, 2000. As my charitable foundation has granted over $5M to education in the past few
years, I have a serious interest in the subject of educational effectiveness. Since you clearly have a much greater understanding of the recent RAND study
than most people, I have a few questions about your opinion piece which I hope
you will answer for me and for others who have read the RAND paper. In your article, you write: "He [Bush] can rightly claim very impressive
accomplishments in the area of education; the closer people look at the Texas
record, the better it is. Nothing in the latest report contradicts this." I do not understand why, if there are such impressive
educational gains in Texas as you suggest above, they failed to show up at all
in Table 1 and Table 2 in Klein's
paper. How do you explain that? Does this mean that NAEP data is useless as
you seem to imply in your editorial? But if that is the case, then don't we need
to totally discount the earlier RAND report since it was based exclusively on
NAEP data? Is that what you are saying here? Or was there a calculation error? A
methodology error? An error in the NAEP data? Table 1 and Table 2 of Klein's
paper used NAEP scale scores taken right off the NCES web site, and NAEP scores, as you know, are the widely accepted "gold standard" measure of academic improvement. Even the Bush campaign admits this, as it has clearly given credibility to the earlier RAND study, which likewise used only NAEP scores. So are you disagreeing with the Bush
campaign here? You failed to mention in your article that the earlier 250-page RAND study (Grissmer)
focused on a date range that had very little overlap with the time Bush was in
office (it ran from 1990 to 1996). Was that an oversight in writing your
article? Or did you not realize this when you read the study?

In your article, you write: "First, the 14-page study is relatively unsophisticated and is based on incomplete data."

That's a little too vague for me. Can you tell me specifically what data is
missing? Klein used the same NAEP data as the earlier RAND report (both Texas
and national data) and supplemented it with more recent information that was not available for the earlier RAND study, as well as with Texas TAAS data (which the earlier RAND study didn't use at all). So if any data is
missing, shouldn't we be challenging the older RAND report as being
incomplete, since it was missing the 1998 NAEP test data and all of the TAAS data?

Also, if the "missing data" in the newer Klein RAND report proves your point, why did you fail to mention this data and what it shows? Without supplying this "missing" data, your article itself draws a conclusion from incomplete information. I'm sure you would not want to suffer from the same problem you claim the Klein paper suffers from. So what is the data that is missing?

I heard that in a TV interview on the Today Show, a Bush aide claimed that the data on the Texas Education Agency's web site was not accurate. This seems strange, since there is no disclaimer on the Texas web site about this and Texas had two or more years to fix any problems. However, if there is a problem with the TAAS data on the web site, then as far as I can tell, there are only 3 possibilities here, and all three work against your point:
Did you ever call Klein to ask
him about his paper or to tell him about his missing or incorrect data so that
he could issue a retraction? Is there a reason you didn't post the corrected
data on your web site? Is there a reason you didn't publish a paper for
EPAA or some other peer-reviewed educational journal that contains the missing
data or even gives someone a hint about what this data is, where you found it,
and how you know it disproves Klein's findings? Too many respected researchers are being fooled by Klein's paper, and without people like you stepping forward to share the truth, they remain in the dark. Why are you keeping this information secret?

In my research, the only comparative study I could find of how things went
over the first four years since Bush took office was in Table 2 of Klein's
paper. Is there another peer-reviewed study of NAEP data, covering the period after Bush took office, that I am not aware of and that shows a result contradicting Klein's? If not, then shouldn't Table 2 (which showed no gain) be the most
interesting and relevant measure of academic progress?

In your editorial, you said that Klein's study was "flawed and misleading." Yet Klein's paper used more recent NAEP data than the earlier RAND report (the Grissmer study). And because each racial group was treated separately in Klein's paper, it did not rely on scores adjusted by statistical modeling assumptions, and therefore prone to error, as the Grissmer study did. So the potential for biases stemming from modeling and assumptions was eliminated in Klein's paper, but not in Grissmer's
study. So what specific flaws are present in Klein's paper? For example, Table 1
and Table 2 use the NAEP data directly. No fancy adjustments like in Grissmer. Nothing. Not even a hint of "fuzzy math." It was a simple
subtraction. What is flawed and misleading about the data in Table 1 and Table
2? I checked this myself with the NAEP site and there is no error or omission.
You can replicate these calculations yourself in
minutes. Was there some mistake we both made? Why did you not point out what the error was?
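To be completely concrete, here is the entire calculation behind those two tables as I read them (my own notation, nothing from Klein):

\[
\text{gain}_{g} = \text{NAEP scale score}_{g,\ \text{later year}} - \text{NAEP scale score}_{g,\ \text{earlier year}}
\]

computed separately for each racial/ethnic group \(g\), using the scale scores posted on the NCES web site. No regression model, no demographic adjustment, nothing that could introduce a modeling error.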
I also do not understand how you can possibly give any credibility at all to
the TAAS scores when, according to Fig. 3, there was a .75 effect size for Black 8th graders (if we believe the TAAS scores), while the corresponding effect size on the 8th grade NAEP was only .12. As you know, a .75 effect size is completely absurd. An improvement like that in the 2 years after Bush took office is more
than miraculous. It's unbelievable! As you are a sophisticated scholar of
educational improvement, you are well aware that such a large effect size
substantially closes the gap between whites and blacks. In short, if the
TAAS scores are true, Bush has accomplished the greatest achievement ever made
in education in all of history (by a long shot). That is the kind of effect that
should be proclaimed far and wide in every reputable education journal. Yet, we
find none of this. Why? Is there a conspiracy to keep such an enormous effect a
secret? Why isn't this being copied by other states? By other countries? Why
aren't there tons of academic papers examining this incredible miracle? I couldn't find any! Don't you find that a bit odd? I sure did!
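To put those two numbers in perspective (this is my own back-of-the-envelope framing, not a calculation from Klein's paper): an effect size is just the standardized gain,

\[
d = \frac{\bar{x}_{\text{after}} - \bar{x}_{\text{before}}}{s},
\]

the mean score gain divided by the standard deviation of scores. A TAAS effect size of \(d = 0.75\) would mean Black 8th graders improved by three quarters of a standard deviation in about two years, while the corresponding NAEP figure of \(d = 0.12\) is roughly one sixth of that. And since the black-white gap on national assessments is commonly estimated at somewhere around one standard deviation, a genuine 0.75 gain really would close most of that gap.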
Could it be that the huge TAAS improvement is due to one or more of these factors:

If none of these are possible reasons for the TAAS scores going up, then how do you know that? The issue is how much carry-over there is from TAAS to NAEP in the skills and knowledge that are producing the high TAAS scores. For instance, are the kids really learning how to read better (which should show up on almost any test), or are they just learning how to answer certain highly specific item types (which would only show up on TAAS)? Don't the results show the latter at best? At worst, many or all of the 7 factors above could be operating, and true learning (the kind of learning that would show up on any exam), even of a subset of the material, may not be happening at all.

Let's say you are right. So what did Texas fundamentally do to account for
such a change since certainly, we should be implementing this elsewhere? I'd
think the Fordham Foundation (which
you head) would want to write up exactly what Texas did here to achieve such a
remarkable result (and it is even more impressive when you consider that Texas
received a "D" in teacher quality from Education Week recently) and encourage others to follow this new
method of instruction. I fail to understand why you have not done this. It is
clearly core to your mission. But it's not on your site anywhere. Why is that?
In fact, you don't even mention the incredible Texas miracle at all in your
December 2000 recommendations to Congress. Why is that?

Also, with such an incredible effect size on TAAS as we see in Figure 3, don't you find it extraordinarily odd that these gains did not show up in NAEP, SAT, ACT, or even in Texas's own TASP scores? Those scores were either flat or declining under Bush (except 1996 4th grade math). Why is that? What is the reason to believe the TAAS scores and disbelieve these other independent measures? Is there a rule we can use for when we should ignore NAEP, SAT, ACT, and TASP scores? Are these national, gold-standard test scores ever useful? For what? And if the NAEP scores are useless as a measure of comparative performance between students in different states, then we have to discount all those claims of Texas leading the nation, don't we?

I'm sure you know that it's been well established (and very consistent with
common sense) that high-stakes testing results are less accurate than low-stakes
testing results. For example, if I want to find out how many people in my
company can perform a sophisticated calculation by hand, say, computing the square root of 2, I'll bet
very few can do it. A low-stakes test would reveal a low percentage of people
could do this calculation. But if I tell people the night before the test that
the next day I'll test them on square roots, and that their continued employment
at Propel depends upon their test score, I'm sure you'll agree that one heck of
a lot more people will pass the test. Does that mean I suddenly created a bunch
of mathematical wizards at my company? Or that my employees tested well on test
material they were heavily incentivized to prepare for? I think we clearly know
it's the latter. So the bottom line here is that if we want to really ascertain
the true mathematical prowess of my employees, I should make it a low-stakes
test. So that being the case, if we are to believe the high-stakes test results
and completely discount the low-stakes test results, you must
conclusively show that the low-stakes tests are inaccurate and the high-stakes
tests are the true measure of broad academic proficiency. If you can do that,
then you will have shown that NAEP is worthless. That would be a major upset in
the industry (as well as disproving the Bush claims of Texas's rank relative to
the rest of the country, which are based exclusively on NAEP data). Yet this hasn't been done. Why?

You could argue that students do better on TAAS because they are more motivated to get a high score, since TAAS is the higher-stakes test. But the differences in student motivation between TAAS and NAEP are essentially constant over time. Thus, those differences cannot explain the disparity in gain scores between TAAS and NAEP; the rates of improvement should track each other. Something else is operating (such as the 7 points above).
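To spell out why the motivation argument fails, in my own shorthand (not notation from either RAND report): suppose each year's TAAS result reflects real skill plus a roughly constant boost from motivation and test-specific preparation,

\[
\text{TAAS}_t \approx \text{skill}_t + b .
\]

Then the gain between two administrations,

\[
\text{TAAS}_{t_2} - \text{TAAS}_{t_1} \approx \text{skill}_{t_2} - \text{skill}_{t_1},
\]

cancels the constant boost \(b\) entirely. A motivation difference between TAAS and NAEP can raise the level of TAAS scores, but it cannot by itself produce large TAAS gains in exactly the years when NAEP shows essentially none.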
If Texas is doing such a great job in education, then how could it be that The New York Times reported
that in February 1999, officials with the University of Texas system presented a
report to a Texas House subcommittee complaining of "marked declines in the number of students who are prepared
academically for higher education." Shouldn't we see the exact
opposite occurring?

Also, have you read the two-year study by Walt Haney on TAAS scores? It certainly seems to me that all trusted third-party data shows nothing special is happening in Texas. Don't you find it odd that the only data showing things are peachy in Texas are the TAAS scores and a single NAEP score (4th grade math in 1996)? Are
there any other independent tests that confirm the TAAS scores? Haney's paper
was also peer reviewed. Have you seen any peer-reviewed study that disputes
Haney's findings? Certainly, the only things I've seen are Klein's paper and a huge number of news reports from Texas newspapers and other sources saying that TAAS scores are completely untrustworthy [Dallas Morning News, 8/17/99,
9/22/99, 4/30/99; Houston Chronicle, editorial, 8/8/95; U.S. News & World
Report, 7/19/99]. So if there is independent third-party research proving the TAAS scores are trustworthy and the NAEP scores (and the scores on all the other exams) are not, would you mind giving me a reference to that paper?

It certainly appears you and Grissmer were fooled by a jump in the 1996 4th grade
math scores due to a content overlap with the NAEP exam. Had you ever considered
that? If so, what data do you have to reject the hypothesis? I talked
to Grissmer about it and he said it was quite reasonable... in fact it was the
only reasonable explanation he was aware of that fits the facts. Do you have an
explanation that fits the facts that we don't know about? Why have you not
revealed it? Have you spoken with Grissmer about it?

RAND has stated that both reports are correct. I believe that is the case as
well. The score jump in the 4th grade math scores was a one-time artifact of the
content overlap. Therefore, all the data is consistent and shows no educational
gain since 1994. I have not seen an explanation consistent with both RAND
reports that comes to an opposite conclusion. Do you have such an explanation
that accounts for Klein's findings? If so, why did you not include it in your editorial?

Figure 4 of Klein's paper clearly shows that the achievement gap widened, while TAAS showed it narrowing. Is there an error in Figure 4?
If so, can you explain what it is? Calculation error? Methodology error? Or do
you agree with Klein's conclusion (which is contrary to claims made by the Bush
campaign)?

Lastly, how do you explain the correlations in Klein's paper, since TAAS scores behave "differently" than all other independent tests? For example,
In your conclusion you say that Bush can take credit for these improvements.
Even if you honestly believe that 1996 4th grade math scores truly went up, how
can you possibly know that it happened after 1994, rather than, say, in 1993?
After all, NAEP exams only happen every 4 years. How do you know whether this jump was due to Bush or to his predecessor? How can you prove from the earlier RAND report that things improved AFTER 1994? Even Grissmer admits that the data after
1994 is flat in Texas. And there is no way to know when scores improved during
the 1992 to 1996 period: was the improvement spread evenly, or concentrated near the beginning, middle, or end of the four-year period? So how do you know it was after 1994? What data
do you have that you are not telling us? Isn't it possible that all the
improvement happened before Bush took office? How can you prove otherwise?

Also, as I'm sure you are aware, Blacks and Latinos in Texas have been scoring highly relative to the rest of the country for at least a decade. Nobody knows why, but someone has to be first! So it's a bit unfair to ascribe that ranking to Bush, don't you think? Shouldn't we
be looking at the data before Bush took office and comparing it with data after
Bush took office so we can determine cause and effect? From Table 1 and Table 2,
it sure doesn't look like much improvement to me. How about to you? What are you
seeing in these numbers that I am not?

We can look at this same issue another way. Texas likes to point to its high
NAEP scores (e.g., blacks in TX score higher than blacks elsewhere) as evidence
of the effectiveness of its high-stakes testing program. This program is so
effective that it even worked backwards in time. How else can you explain the
fact that the NAEP scores for blacks and Hispanics in TX "led the
nation" before the TAAS program
was even implemented in 1994 (see Tables 1 & 2 in Klein's report)?

Lastly, I should point out that the reason the RAND report was delayed was the additional rounds of peer review (well above the normal peer review process at RAND) that this report went through. I am sure you noticed in the preface to the Klein paper that the external reviewers were Robert Linn and Richard Jaeger, who I understand are two of the top people in the field. Dan Koretz and David Grissmer (the author of the RAND paper that you have said you trust) also reviewed the paper. I'm sure you are also aware that Gene Glass published Klein's paper in another respected peer-reviewed journal (EPAA). So with all these rounds of peer review by eminent scholars, I find it remarkable that you have uncovered errors that these and other experts missed. Rather than keeping these errors confidential, you would do a great public service by educating us all on how so many credible researchers could have been fooled. Why have you not done this? Isn't educating us all on educational issues exactly what the Fordham Foundation is about?

In summary, it seems to me that there are only two possibilities here for NAEP scores:
And for TAAS scores,
I look forward to hearing your response.

Steve Kirsch

Here is the E-mail reply from Bennett's office (this is really interesting):