My discussion with Matt Barnum Part 2

A few weeks ago, fellow TFA alum Matt Barnum invited me to a public ‘discussion’ about education reform.  Though Matt seems to place himself toward the ‘reformer’ end of the spectrum, I’m not so sure I’d put him there.  Still, based on the massive number of comments (72, though a lot are from Matt himself) on the initial ’round’ of the discussion, many of the people who read and comment on this blog definitely see him as something of an opponent.

Matt wrote a second letter and sent it to me over a week ago.  What follows is his letter and my response:


Thanks for your response. I hope I didn’t put you on the defensive too much. In fact, the reason I wrote to you is because of how much I respect your writing, as well as the fact that I respect that you’ve chosen a career as a teacher. I did it for just two years – I fully realize the challenges of the job, and some of the insights teachers understand about education policy that non-educators don’t.

That being said, I think you miss something fundamental in your response: the goal of teacher evaluation systems is not to make teachers “try harder,” a common straw-man argument. Evaluation systems are designed to reward and retain effective teachers, and to support or dismiss ineffective ones. That’s the theory; the practice, of course, is much more difficult.

The recent Times article, I think, should be a wake-up call to reformers that they must pair teacher evaluations with efforts to sustainably improve the teacher talent pool.

It’s not particularly helpful to think about getting rid of ‘bad’ teachers in the abstract. At my school, at least, there were perhaps one or two teachers who I thought were probably awful and should not be in a classroom with children. Likely, you would agree that they should be fired. And likely we’d both agree that firing those one or two teachers would not dramatically improve education quality. (Though I do think the gains would be meaningful and important, and while reformers surely overstate this value at times, traditionalists also understate how difficult it is to dismiss a bad teacher.)

A much larger group of teachers at my school, perhaps a quarter or a third, were truly fantastic. They were exactly what many imagine of a great teacher: dedicated, hard-working, inspiring, and life-changing for some students. These were teachers who I would love to have teaching my own (hypothetical) child.

The final group of teachers – which I counted myself as a part of – could be classified as neither ‘good’ nor ‘bad.’ Almost all of us were hard-working and cared about children, but we weren’t so good that I think most parents would actively want to send their kids into our classrooms. (A note: Obviously my assessment of the proportion of awful, ‘good enough,’ and great teachers is purely anecdotal. I’d be curious to hear your thoughts on this, Gary.)

This is why I say that we can’t think of such teachers as ‘bad’ in a metaphysical sense. We need some point of comparison, some better option. That, I think, explains why so few teachers are being rated as ineffective – because principals may realize that they can’t get a better teacher, so they go ahead and rank a good-enough teacher as effective.

(Another potential explanation – and one that I saw play out at my school – is an observer effect: teachers and students changed how they acted when a principal was in the classroom, sometimes leaving the evaluator with a mistaken impression of how effective the teacher was.)

I’m disappointed that reformers are not doing a better job of focusing on increasing the number of teachers who are great rather than just good enough. Many really want to address this problem, but I don’t think there has been enough energy around creating a sustainable pipeline of excellent teachers. This is understandable, insofar as the goal is difficult and might only yield results far down the line. It’s easier to focus on the small number of truly awful teachers than to address the larger, more important goal of supporting and attracting exceptional career teachers.

I guess my question for you, Gary, is whether you think it’s possible to recruit a teaching force that includes a larger number of great teachers. Do you believe, as I do, that great teachers can ameliorate poverty and change students’ lives? Do you accept my distinction between excellent teachers and ‘good enough’ ones?

Before I end, I want to briefly address a few other points.

o   No, the Chetty study has not been ‘debunked.’ Reasonable people can of course disagree about the policy implications of the study, and can raise some legitimate methodological questions (though an ideal methodology is likely impossible in such a context). Consider Bruce Baker’s reaction and Matt DiCarlo’s reaction – both raise some interesting questions, but neither suggests that the study is patently invalid.

o   I do not think a 1% increase in salary for a given year is small at all. Let’s say a teacher has 20 students in a class, and those students, over a 40-year working career, would average a salary of $30,000/year. If that teacher increases the average salary by 1%, to $30,300, then the teacher has added a monetary value of $240,000 for each year of teaching.
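Matt’s arithmetic here can be checked directly. A minimal sketch, using only the figures he assumes above (20 students, a 40-year career, a $30,000 average salary, a 1% gain):

```python
# Rough check of the claim above, using Matt's assumed figures:
# 20 students per class, a 40-year working career, a $30,000
# average salary, and a 1% increase attributable to one teacher.
students_per_class = 20
working_years = 40
avg_salary = 30_000
increase = 0.01  # the 1% bump in average salary

gain_per_student_per_year = avg_salary * increase                      # $300
lifetime_gain_per_student = gain_per_student_per_year * working_years  # $12,000
value_per_year_taught = lifetime_gain_per_student * students_per_class

print(f"${value_per_year_taught:,.0f}")  # prints $240,000
```

So the $240,000 figure follows from the stated assumptions; whether a teacher can reliably cause a 1% salary gain is, of course, the disputed part.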

o   There’s now very strong psychological evidence that testing doesn’t just assess learning – it promotes learning. Though reasonable people can disagree about whether this logic applies to standardized tests, this phenomenon is certainly something that those who oppose high-stakes tests should engage with a bit more.

o   I believe that you’re making a false dichotomy in suggesting that a teacher can do well on standardized tests by avoiding critical thinking exercises. Indeed, I think critical thinking exercises will likely improve high-stakes test results.

o   I absolutely agree with your point about ‘bubble’ kids. My school was obsessed with proficiency rates rather than overall averages, and I always argued against this view. However, I don’t believe this serves as an indictment of high-stakes tests; instead, it’s an important implementation issue that should be considered and can be overcome.

o   I think I have to point out that one of my chief complaints was how rarely research is cited in opposition to standardized testing…and then you didn’t cite a shred of research for any of your opinions regarding testing. You say, ‘In my research I’ve found that often there is not much of a difference between the two schools.’ I know you have a lot of experience in schools, but I hope you can understand why I don’t simply trust your intuitions and experience.

o   I myself do not see school closings as some panacea, and I realize that many students will just end up going to an equally bad but farther-away school. I do hope and believe that some may end up in a better school, and I don’t think that, here in Chicago, the city can sustain paying for a huge number of schools that are under-enrolled. (I do realize that closing schools will not save money in the short term, but it will in the long term – to its credit, CPS has been upfront about this. Another interesting note is that CPS says it is not using test data to determine school closings, and is only looking at utilization.)

Alright. I think that’s more than enough for one letter. I’m definitely enjoying having a dialogue rather than a monologue. Thanks, and I look forward to reading your thoughts.



Dear Matt,

No, you don’t ever have to worry about putting me on the defensive.  Some of my thoughts are tough for me to explain, so I sometimes tend to over-explain them so they can’t be misinterpreted, but other than that I’m completely comfortable answering any questions, no matter how tough.

I agree with your percentages about how many teachers are below average, average, and above average.  You seem to say that about a third are truly excellent, two-thirds are just ‘average,’ and a small number, perhaps two or three percent, are ‘bad.’  But this does depend on how exactly these categories are defined.  While I think we’d agree on what constitutes one of those ‘bad’ teachers (and also agree that there are so few of them that a policy focused on identifying and terminating them isn’t going to ‘fix’ education), I think that the ‘average’ teacher is doing an admirable job, and I would have no problem having my two actual kids taught by one of those.  Sure, I’d like them to have a few teachers I would consider ‘great’ from time to time, but I’m fine with the fact that most of my children’s teachers will fall into that ‘average’ category.

Like in baseball, where a ‘great’ hitter bats .300 and an ‘average’ hitter bats about .260, I think that great teachers are not as different from average ones as the reformer crowd claims.  When the discussion about the importance of class size comes up, reformers often talk about how they’d rather have a great teacher with 40 kids in a class than an average teacher with 20.  Knowing what I do about how difficult teaching is, I’d put my kids in the small class with the average teacher over the large one with the ‘great’ teacher.

This is not to say that I don’t try to be a ‘great’ teacher each day that I go to work.  I’ve won various awards and written various books and articles about teaching, yet too often I’m up in front of my class, generally when I’m trying out a new lesson idea for the first time, humming to myself to the tune of “We Are The Champions” something like “I am a failure, my friends, and I’ll keep on sucking till the end …”  My hope is that I have more ‘great’ days than ‘bad’ days, so that on average I’m considered ‘good.’

When I was younger, I suppose I would have agreed with your belief that “great teachers can ameliorate poverty and change students’ lives.”  Now, I don’t know that ‘great’ teachers (at least the mythical ones we hear about from TFA) are that much better at ameliorating poverty and changing students’ lives than ‘average’ teachers.  Maybe the difference is that I think ‘average’ teachers are better than you do, or that I don’t think ‘great’ teachers are as good as you do.  Part of my belief has been shaped by my experience during my second, third, and fourth years of teaching, when I was in TFA and at E. L. Furr High School in Houston.  This was the school that was recently on the front page of The New York Times in an article about how it has finally curbed some of its gang-related violence problems.

During the three years I worked there, I found the staff to be very impressive.  Yes, there were a few clunkers.  One was a man who was a year away from retirement and was, I felt, going senile.  Another was a woman who was an extremely hard-working science teacher, but who had never really learned classroom management, so all her hard work was wasted as students did not take her seriously.  But there were a lot of excellent teachers, including the best English teacher I’ve ever seen.  I taught many eleventh and twelfth graders there, and these were the top students in the school, as many students never made it out of ninth grade.  Many of these upperclassmen were, to use a TFA term, on or near a ‘trajectory’ to college.  If great teaching is supposed to help nudge them onto that college track, I was certainly in an excellent position to give that last push.  And despite my efforts and those of my co-workers, I’ve come to learn, by keeping in touch with or recently reaching out to my best students, that few of them truly ‘overcame poverty’ or graduated from college.  I don’t think this means that we were not good teachers, or even great teachers.

And just because I don’t think that teachers have the power to do all that reformers think they do, this does not mean that I don’t think teaching is a noble, and very important, profession.  I guess a good analogy would be that I see teaching math as not unlike teaching someone how to play a musical instrument.  The music teacher is not a failure if the student never becomes a professional musician.  Hopefully the music teacher will encourage the student to enjoy music and to want to practice and get better.  There’s also the very important relationship the teacher has with the students, through which other life lessons, aside from music, can be taught.  I was fortunate to have such a music teacher, with whom I took private trumpet lessons from 4th to 12th grade.

But if you want me to say it more directly, yes, ‘great’ teachers are better than ‘average’ ones who are, in turn, better than ‘bad’ ones.  Everyone knows that.

The issue, though, is how this should drive policy in a way that maximizes whatever the purpose of schooling is.  Of course, this ‘purpose’ would have to be defined first, and though my definition might be a lot more holistic than yours, I would imagine that even you are opposed to one that relies too much on standardized test score gains.  Then again, everyone nowadays claims to agree with this, which is why they always stress ‘multiple measures’ in teacher evaluation.  But when I hear numbers being thrown around like 50% for value-added, or even 30% (no reformer I’ve read about dares to suggest a number less than 30%), I get very uncomfortable.

I do not think that putting this much weight on very inaccurate metrics will make schools better.  My belief is that if this pattern continues, it will make education actually worse, which is why I spend so much time reading and writing about this.  It hasn’t worked in D.C. because it is based on a false premise.  It seems like reformers are banking on the idea mentioned in Waiting For Superman that if we could just fire the bottom 5% of teachers each year, achievement would soar.  The only way that this would work is if there were a significant number of other teachers who need to fear for their jobs in order to get motivated to do their best teaching.  I just don’t think that this is the case.  Treating the wrong disease can be very dangerous.  It would be like a doctor prescribing chemotherapy for the flu.  It is misguided, painful, and will likely make the patient sicker.

I very much agree with your point that more effort should be dedicated to making the vast number of ‘average’ and even ‘great’ teachers even better.  This is where organizations like StudentsFirst have completely missed an opportunity to use all that money to truly improve education.  Teachers can definitely benefit from having more resources, professional development, and truly usable lesson plans and activities online.  I personally just attended the NCTM (National Council of Teachers of Mathematics) conference, where I went to workshops that I hope will make me a better teacher.  There were 20,000 math teachers there, which is still only a small percentage of the math teachers in the country.  Most people paid over $1,000 to go to the conference; maybe if there were much cheaper opportunities to get good professional development, more people would participate.  Teachers want to improve.

I do not see ‘teacher evaluation’ reform, in the direction it is currently going, as a way of helping teachers improve, as you describe.  In New York, teachers don’t even get to see the tests, so how is that supposed to help them improve?  I personally have learned very little about how I can improve by analyzing my own students’ results on standardized tests.  Generally what I learn is that when someone else writes a very bad question about some topic, even students who understand that topic quite well can get the question wrong.  These tests are just not good enough, and making them good enough, I think, would not be an efficient use of scarce resources.  As newspapers gush about the Common Core standards and assessments, all I see is hundreds of millions of dollars going to Dell and Apple to get schools’ tech up to date to administer these tests.  Just as you wrote that the lavish cost of TFA is not a good use of funds, I feel the same way about the trend of keeping teachers honest by making them ‘accountable’ for their standardized test ‘gains.’

Yes, teachers can learn to teach better.  I certainly don’t think all teachers (myself included) are perfect.  But the witch hunt for ‘bad’ teachers is, in the long run, going to dissuade people from becoming teachers.  This will, in time, lower the quality of education in this country.  That’s what I’m afraid of.


For the next part in this discussion, click here.


91 Responses to My discussion with Matt Barnum Part 2

  1. CarolineSF says:

    The entire point about teacher evaluations is invalid. Administrators already know who the “bad” teachers are, and if they don’t, the bigger problem is with the administrators.

    I had the opportunity to ask Eric Hanushek (Google him, anyone who doesn’t know) at a public event if administrators didn’t already know who the “bad” teachers are, and he readily admitted that they do.

    So that entire line about the need for teacher evaluations is just plain bull****. The intent and effect of the current fad in teacher evaluations is to punish, demoralize and intimidate teachers, especially those who work with the highest-need kids.

    MERIT PAY is what reformers claim is supposed to make teachers work harder. It’s just dishonest to base an entire lengthy dissertation on debunking the supposed claim that evaluations are supposed to make teachers work harder.

    Also, in keeping with my belief that nobody actually believes this stuff who isn’t paid to believe it, can you please refresh our memories as to how Matt Barnum is remunerated by the “reform” forces to make these statements?

    • Matt Barnum says:

      I think most administrators have some good idea who the “bad” teachers are – but administrators’ views are probably imperfect and test data can supplement, complement, or inform principals’ opinions about teachers.

      • skepticnotcynic says:


        Your statement is valid only if you are neither competent nor experienced enough to determine who is an effective teacher. When I was a 3rd year, a colleague in my English department who taught 11th grade was literally the worst teacher I have ever seen, yet she had the highest test scores in the department. She slept half the day and showed a movie virtually every day. I don’t think she taught more than 10 days all year. Do you know why her students had the highest scores as well as growth on the high-stakes exam? I hope you’re smart enough to figure this one out.

      • CarolineSF says:

        That would hold if test data didn’t correlate so closely with demographics. Apparently this can’t be said too many times: it amounts to rewarding the teachers who teach the privileged and punishing those who teach the high-need. Given that — and, again, the fact that even “reformers” acknowledge that competent administrators are well aware which teachers are “good” and which have problems — the obsession with evaluating teachers based on test scores ranges from clueless to dishonest and willfully destructive.

      • Matt Barnum says:

        Well, that’s why you look at growth and use value-added metrics to control for this. (Yes, I know VAM is somewhat volatile – but so are many performance metrics. And VAM should only be used as a part of the evaluation.)

        I’m also a big believer in paying teachers who work in high-needs schools more.

      • CarolineSF says:

        I might have thought it was that easy to control for demographics too, but when the Oakland school district was discussing evaluating teachers based on test scores, the district’s line in response to the issue that teachers would be punished simply for teaching high-need students was basically “oh well — win some, lose some.” So now I’m pretty convinced it’s not possible to do that effectively and it’s la-la land to be so blithe about it.

      • skepticnotcynic says:


        I thought you were smart enough to figure out what I said, but apparently you aren’t. That teacher’s VAM data was better than the entire department’s, yet she slept through half her periods. She taught the advanced kids, and they grew significantly more than the hardest-to-teach kids because it was an exit-level exam and they tried much harder on it that year. As a teacher, why should I have to rely on a metric that is so inaccurate year over year? The high-stakes exam students are required to take is not a test designed to measure teacher performance. It’s clear you do not understand the basic fundamentals of psychometrics.

      • Matt Barnum says:

        I can’t speak to either of your anecdotes. The fundamental idea of VAM is to account for student characteristics and level the playing field. I’m not sure what the evidence is that VAM fails to do so, but I’m open to it.

        Skeptic, do you believe that any test can be designed to measure teacher performance? If so, what would that test look like and how would it differ from the high-stakes tests we see now?

      • gkm001 says:

        Mr. Barnum, thanks again for participating. Here’s how ASCD characterizes some of the research on VAM:

        “RAND researchers examined whether giving students different tests would lead to different conclusions about teacher effectiveness (Lockwood et al., 2006). They calculated value-added ratings of middle school teachers in a large school district on the basis of their students’ end-of-year scores from one year to the next on two different math subtests. They found large differences in teachers’ apparent effectiveness depending on which subtest was used. The researchers concluded that if judgments about teacher effectiveness vary simply on the basis of the test selected, administrators should use caution in interpreting the meaning of results from value-added measures.

        “Researchers have also identified other threats to the trustworthiness of value-added measures. Goldhaber and Hansen (2008) looked at the stability of such measures over time: Do value-added analyses identify the same teachers as effective every year? Using a large data set from North Carolina, they found that estimates of teacher effectiveness were not the same across years in reading or math.”

        Lockwood et al.:
        Goldhaber & Hansen:

        For more studies and a lengthier critique, see “Problems with the use of student test scores to evaluate teachers,” a paper published by the Economic Policy Institute:

      • Matt Barnum says:

        Thanks for passing this along. I fully realize that VAM can be quite volatile. My preference actually would be to give students three or four high-quality, high-stakes tests a year and then measure the VAM from all of them. That would go a long way towards reducing its volatility. Of course it would depend on having high-quality tests (and I know many would resist this because they already feel that students are overtested).

  2. meghank says:

    I really wish Mr. Barnum had responded to the point you often make that the tests are very poorly made and contain multiple mistakes. I can cite plenty of articles exposing various mistakes Pearson has made, if that’s what he means by “research,” but I’m sure he’s seen them as well. What does he propose doing about the poor quality of the tests? Or does he think the test-makers (Pearson and others) will improve of their own volition?

    Or does he disagree with you and most others who have seen the K-8 tests and think the tests are well-designed now?

    I would still be opposed to high-stakes testing if the tests were high-quality, but, with the level of power that the test-makers have attained, I think improving the test quality is now an insurmountable problem.

    By the way, no research can or will be done on the poor quality of the K-8 assessments, because most state laws threaten teachers and administrators (the only adults who see the tests) with loss of licensure for talking about anything connected with the test questions. I was told this would happen if I said, “Those questions were hard,” without even giving examples. So how can research be done?

    • Matt Barnum says:

      I honestly don’t know how widespread mistakes are on tests. Sure, as a teacher I encountered some poorly worded questions, and in the case of the district-created exams, some flat-out awful questions, but I did not see this as all that widespread of a problem. Several articles about bad test questions or passages don’t really tell us what proportion of questions are valid ones.

      Again, I’ll cite the Chetty research – if the tests are so bad, then we should not see teachers who produce good test scores affecting real-world outcomes.

      • Tom Hoffman says:

        One thing about Chetty is, 1% is a lot compared to what? For example:

        “A 2011 study by the Berkeley public policy professor Rucker C. Johnson concludes that black youths who spent five years in desegregated schools have earned 25 percent more than those who never had that opportunity. Now in their 30s and 40s, they’re also healthier — the equivalent of being seven years younger.”

        Or the wage effects of deregulation:

        “On the eve of dereg, hourly wages in transportation and warehousing were about 38% above average, where it had been for years. As soon as regulations were lifted, however, the averages began a long slide that continues to today. That wage premium has now disappeared completely. The pattern in trucking since the data begins in 1990 is pretty similar, going from a 32% premium in 1990 to a 4% discount today. And working conditions have gotten inexpressibly worse—longer hours, fewer benefits, less security.”

      • Matt Barnum says:

        Interesting point. Though if we were able to improve a student’s four core teachers every year for five years, that would result in a 20% increase in salaries. That may not be feasible, though desegregation has also proved very difficult. I myself strongly favor socioeconomic integration where possible.
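The 20% figure in Matt’s reply assumes the 1% gains simply add up. A quick sketch of his hypothetical scenario (the numbers are his, purely illustrative) shows the additive total alongside what compounding would give instead:

```python
# Matt's scenario: all four core teachers improved, every year for
# five years, with each improved teacher adding 1% to future salary.
teachers_per_year = 4
years = 5
gain_per_teacher = 0.01

# Additive total -- the reasoning behind the 20% figure
additive_total = teachers_per_year * years * gain_per_teacher

# If the twenty 1% gains instead compounded multiplicatively
compounded_total = (1 + gain_per_teacher) ** (teachers_per_year * years) - 1

print(f"{additive_total:.0%}")    # 20%
print(f"{compounded_total:.1%}")  # 22.0%
```

Either way the order of magnitude is the same, so the additive shorthand in the comment is a reasonable approximation.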

      • meghank says:

        You tell me there’s not enough research showing the tests are poorly made – but I told you why there isn’t enough, and why there won’t be.

        If Pearson has too much power to be criticized, we need to get rid of high-stakes testing altogether.

      • Matt Barnum says:

        No. I actually cited research that went against your claim. The Chetty study can’t be squared with the idea that tests are awfully designed and meaningless.

      • Dan McGuire says:

        The Chetty study doesn’t prove anything of the sort. The Chetty study is a ‘Maybe’ at best.

      • Matt Barnum says:

        Really? I’m not sure how. I acknowledged to Gary potential problems with the study, but if the study is right then I don’t see how tests aren’t measuring meaningful learning…

        Also, the evidence is very strong that the SAT does measure something meaningful (see my first letter to Gary). Is there some reason to believe that the College Board is uniquely good at designing standardized tests?

      • meghank says:

        The Chetty study has numerous methodological problems, all of which have been mentioned on this blog. You are going to have to come up with something better than that one study you keep citing.

        I enjoyed my earlier conversation with you, but now I’m starting to re-think things. Did you really say Diane Ravitch was “once-respectable”? Or was that someone else on that article with your name on it?

        If you did say that, I’m not sure I want to continue to engage in conversation with you. It seems like you’re just trying to create PR for the “reform” movement, and you’re not actually open to changing your mind.

      • Matt Barnum says:

        I didn’t write that about Diane Ravitch – it was an introduction to my piece, written by the editor of the blog. (I myself probably wouldn’t have written something like that, though I share the view that Dr. Ravitch has lost a great deal of respectability. I’ve read two of her books, both of which were very good, but I can’t stand the endless barrage of ad hominem attacks that emanate from her blog.)

        As to only citing one study, I also offered multiple studies in my first letter regarding the value of the SAT. Sure, state tests aren’t the same – but why would we think that Pearson is somehow much worse at writing tests than the College Board?

      • meghank says:

        “Why would we think that Pearson is somehow much worse at writing tests than the College Board?”

        Because they are?

      • meghank says:

        And, I’ll bite, “ad-hominem attacks” – against whom?

        Other than Michelle Rhee. I mean, come on, Michelle Rhee has lied on national television. I saw her do it myself and saw in her eyes that she knew she was lying (I’m referring to her appearance on the Colbert Report). The woman IS a liar, and I don’t consider it an ad-hominem attack to point that out.

      • meghank says:

        Also, although I will say Michelle Rhee is a liar, I would not say she is not “respectable.” To say of a woman, “She is not respectable,” has some misogynist connotations.

      • Matt Barnum says:

        Does it have misogynist undertones to call a woman an ‘ice queen’?

        Other ad hominem attacks by Diane Ravitch:

        – She suggests that the phrase ‘the achievement gap’ is used “cynically by self-proclaimed “reformers” who have no genuine interest in closing the opportunity gap or the wealth gap.”

        – She calls someone who wrote an article she disagrees with a ‘money-grubbing entrepreneur.’

        – She suggests that the ‘real goal’ of reformers is re-segregation.

        – She refers to philanthropists who support education policy she disagrees with as members of the ‘billionaire boys’ club.’

      • Kt says:

        “But why would we think that Pearson is somehow much worse at writing tests than the College Board?”

        1) Pearson makes a lot of money selling both the curricular materials and the test — I realize that the College Board also produces prep materials, but I mean the actual school curricula. Pearson can make money by aligning their curricula to their tests, ensuring that any district NOT using Pearson materials is at a disadvantage.

        We COULD give Pearson the benefit of the doubt and say this was an egregious and uncharacteristic error, but they have a history:

      • Kt says:

        “The Chetty study can’t be squared with the idea that tests are awfully designed and meaningless.”

        Setting aside issues with the Chetty study, I don’t think anyone is suggesting that it’s no longer true that kids who do well on tests have advantages in their future earnings. (On the other hand, I was a great test taker and my earnings as a teacher have definitely not beat out those of my former classmates!) And from 11 years in the classroom, I’ve seen that the kids who are really at the top of the group tend to do better, on average, on classroom assessments even when I’ve made a mistake. I make typos all the time, or write poor questions (or use a textbook company’s assessment that contains poor wording). On the whole, the kids who are always at the top of the class still tend to outperform the kids at the bottom of the class. That’s going to (more or less) be true whether the test itself can be considered “great” “good” “ok” or “terrible.”

        However, when we begin parsing things much closer, and making very high-stakes decisions based on these tests, the small nuances begin to matter more — maybe the kid who is top of her class in 6th grade (and therefore needs to be basically perfect on the 7th grade test in order for her teacher not to be in trouble) gets a worse set of questions in 7th grade and does slightly worse than she “should.” She may still be outperforming other kids and therefore be likelier to experience success in life (of COURSE success = earnings, why wouldn’t it???) but that fact doesn’t speak to whether or not the test was good, mediocre, or awful. So I don’t think Chetty says much of anything about whether the tests are good, except to say that the tests are not written in literal gibberish, or are not a pure measure of blind luck. No one’s suggesting that NONE of the questions are any good — just that the bulk of the test is not GOOD ENOUGH to warrant the kind of obsessive focus on them that we have.

      • meghank says:

        You responded to one comment of mine (so I know you are still reading this) but not the other: who has Diane Ravitch maligned in an ad hominem attack?

        I would also like you to respond as to whether you feel calling a woman “not respectable” has a misogynist connotation. (It also seems to me that that in itself is an ad hominem attack.)

      • Matt Barnum says:

        Kt – Interesting take. But the Chetty study doesn’t just say that teachers with high-performing students produce higher salaries, but that teachers who produce *gains* on standardized tests produce higher salaries. If the tests are that bad, then yes, a very bright student would do well, but I wouldn’t expect him to gain or lose points on the test based on the quality of the teacher (or see those gains or losses reflected in future earnings).

        So in the scenario you described, if it were true that the test was not measuring what we wanted it to measure, I don’t think we’d see the sort of results that Chetty produced.

      • Matt Barnum says:

        Meghank – Above I linked to multiple instances of ad hominem attacks by Diane Ravitch.

        No, I can’t say that I find ‘once-respectable’ misogynistic. But, like I said, I wouldn’t call someone that in any event.

        Do you mind sending me any link about how Tennessee evaluates teachers based on students they didn’t have?

        Finally, my point about Pearson vs. College Board was that if the problem is the test design company, that’s a great reason to find a new company – but not a great reason to criticize testing broadly.

      • Louisiana Teacher says:

        There is an abundance of evidence that Pearson has committed egregious testing errors.

      • meghank says:

        Also, having read the test to a special ed kid, I can tell you what proportion of questions on the third grade math TCAP are valid ones – but I’ll get fired for telling you.

      • meghank says:

        You probably wouldn’t like what I have to say about Michelle Rhee, either. I believe she fits the textbook definition of a psychopath (look it up). It’s certainly an ad-hominem attack to call her an “ice-queen” and it certainly has misogynistic connotations, but society looks much more askance at calling her what she actually is, which is a psychopath.

        As for the other attacks you listed, they seem relevant to the arguments. “Ad hominem reasoning is not always fallacious, and … in some instances, questions of personal conduct, character, motives, etc., are legitimate and relevant to the issue, as when it directly involves hypocrisy, or actions contradicting the subject’s words.”

        You did say Diane Ravitch “has lost a great deal of respectability.” Now that I’ve termed Michelle Rhee a psychopath, you probably don’t find me respectable and no longer are interested in arguing with me. (That’s fine with me.)

        It may be a great reason to find a new company – but you need to look more realistically at the level of power Pearson has and the unlikelihood of pushing them out in favor of another company. Trying to imagine the enormous amount of money that would have to be spent on lobbyists to work against Pearson makes one’s head spin.

        They claim in that link that this year there will be other options for non-tested teachers, but, to give you an update, that has not happened, and non-tested teachers are still forced to use school-wide growth scores.

      • jcg says:

        All tests have error, but tests with “wording problems” indicate major threats to validity that cannot be brushed off. High error rates mean test scores are invalid. Plugging an invalid score into the VAM formula does not make the results “more” valid. Anyone schooled in psychometrics understands the deep flaws in standardized tests and their limitations in measuring the broad construct we call “learning.”

        More damaging is using tests for purposes other than those for which they were designed, like evaluating teachers. No magic statistical formula like VAM will change this invalid and fraudulent misuse.

        Standard scores (SS) tell us nothing about teacher-specific effects. They cannot identify teacher quality, nor do they pinpoint the broad contextual influences on learning and development. A standard score indicates what the child knew on that day, at that time, under the conditions in which he or she took the test, compared to same-aged peers. Period.

        It’s time non-educators stop attaching outsized interpretations to standardized test scores. They don’t tell you what you think they do. VAM doesn’t fix that fact.

        It’s time to admit you are wrong to overstate the meaning of standardized test scores.

  3. Tom Hoffman says:

    Yes, 15 years ago I thought “Well, if these particular tests aren’t good enough, that’s not really a definitive argument against standardized tests in general — I’m sure they’ll get better!”

    As it turns out, maybe some got better, some got worse, and mostly it is the same thing.

    At a certain point, you stop having faith that *exactly the same companies and people* will suddenly be able to create better tests, particularly as you learn more about the technical, legal and financial constraints on the process.

    • KrazyTA says:

      Tom Hoffman: IMHO, your brief response is short on words and rich in meaning.

      I am interpreting your remarks a bit, but I take some of them to mean that 15 years ago you thought that things could/would/hopefully get better re standardized testing—and yet you now think that nothing has really changed, either for the better or the worse. My own tipping moment came very recently. Sometime last year I read Todd Farley’s MAKING THE GRADES: MY MISADVENTURES IN THE STANDARDIZED TESTING INDUSTRY (2009). A few days ago I finished Banesh Hoffman’s THE TYRANNY OF TESTING (1962). The earlier publication anticipates the later one, often in [uncomfortably] small details. They are both slim, inexpensive, and meant for a general reading public, and best of all, written with verve and humor. They almost seem like Parts 1 and 2 of a unified work. In a severely short summary of both: the problems now were the problems then, neither more nor less, and nothing has really changed, either for the better or the worse.

      The increasingly high-stakes standardized testing industry has had half a century from the publication of Hoffman 1962 to Farley 2009. Yet if you read Nicholas Lemann’s THE BIG TEST: THE SECRET HISTORY OF THE AMERICAN MERITOCRACY (2000), you will note that Hoffman was considered by the heads of the Educational Testing Service (see pp. 100-102, 221) a cranky eccentric, a possibly brilliant but oddball professor, obsessed with peculiar notions about what standardized testing could and couldn’t measure and what effect the increasing use of such testing would have on genuine learning and achievement.

      My own take on these three works: the high-stakes standardized testing industry has demonstrated it has not, will not, and cannot reform itself from within. There is just too much money, prestige, and ego at risk.

      Thank you for your comments.

      And sincere thanks to Gary R for hosting important discussions like this. Krazy props to an honorable numbers/stats person. In a world where it is still true that “Facts are stubborn things, but statistics are more pliable” you try to “Always do right. This will gratify some people and astonish the rest” [both quotes by Mark Twain].

      Color me gratified and astonished.


      • Tom Hoffman says:

        Yes, Farley’s book made a big impression on me too, particularly because I happened to read it immediately before the first big Common Core draft came out, and it immediately seemed to me that all the ELA standards were written very specifically to make the constructed responses more easily gradable by computers or temps.

  4. A Texas Teacher says:

    Matt asks for research. I don’t know why the focus on research when there are real results. I said this at the end of the last letter:
    We already have the evidence that increasingly high-stakes tests do not benefit students. Texas has been administering state assessments to students for over 20 years. The stakes get higher each year (and the tests get ‘more rigorous’ every six years) until this year, 2013, when high school students need to pass nearly 15 tests to graduate. What has all this (90 billion tax dollars) gotten for the students? Not much. SAT scores are still near the bottom of the national pack. There is the proof. Over 20 years’ worth of it.

    Any response, Matt?

    • Matt Barnum says:

      I’m not quite sure how good or bad Texas is – its NAEP scores aren’t too bad relative to states with similar poverty rates. (See:

      More to the point, I think it’s ridiculous to say – even assuming Texas has poor schools – that Texas scores are low, Texas has many tests, ipso facto tests are bad. Again, see the Shanker Blog mocking this sort of logic (which both reformers and traditionalists are guilty of):

      • A Texas Teacher says:

        Let me put it another way: Texas has spent trillions of dollars on high-stakes, Pearson-made tests in the last 30 years—compared to Florida, California, and New York, did we spend that money well?

        Where is the evidence that these tests resulted in higher achieving students, better citizens….etc?

  5. E. Rat says:

    I agree with Caroline above: if the principal cannot identify teachers who aren’t suited to that school, then the school has bigger problems than those teachers.

    I am suspicious of the belief in bad teachers, particularly how they cluster together in bad (low-performing) schools, emanating a miasma of bad skill, low expectations, and child-hate. I think teaching in a high-needs school is different from teaching at a low-needs one, and that plenty of teachers who are good in the suburbs would be bad in the inner city (and vice versa).

    Because when reformers talk about bad teachers, they don’t seem concerned with those in wealthy areas. I don’t think it’s because teachers there are better; I think it is because reformers are conflating test scores and teaching to the detriment of both.

    I also have to observe that I personally take Mr. Barnum’s opinion on teaching quality with a grain of salt. I doubt that a teacher with two years’ experience is the best judge of his peers. Unless his school district heavily funded release to observe peers, his direct observation of these teachers can’t be that significant.

    Moreover, I have noticed that Teach for Americans tend to equate classroom performance with how many hours a teacher stays after school, how important a teacher believes test scores to be, and how a teacher talks about education (whether he or she uses popular buzzwords). The reality is that veterans don’t need to put in the hours, that there are plenty of valid reasons to doubt that test scores tell us very much, and even more reasons to disdain jargon.

    • Matt Barnum says:

      Just to respond on one point: I agree, I take my own observations on teachers’ quality with a grain of salt too! Though, to be fair, a lot of it was based on the consensus among other (veteran) teachers in my school. But absolutely it can be difficult to tell the quality of other teachers in your own school. That’s why I asked Gary if my estimates jibed with his experience.

  6. B says:

    “The reality is that veterans don’t need to put in the hours”

    Coming from a 2nd-year, this is indisputably true. I’m an incorrigible workaholic, but there are veterans who are out the door at 3 who can walk in at 7:59 and smoke me when the bell rings.

    Effort=quality is one of the biggest misconceptions broadcast by TFA.

  7. Lisa says:

    I have yet to understand why reformers are convinced that great teachers are just waiting to enter a system that will assume they are bad teachers until proven otherwise. Add to that the reality that any teacher is only as great as that year’s test scores show, and it’s hard to think of a more demoralizing work environment. New teacher evaluation systems are added, and critics leap to cast doubt on even teachers rated highly effective. With that kind of system in place, not to mention the fact that the job is not well paid and many in the same reform crowd want to reduce or eliminate benefits for those teachers, it’s pretty easy to see that anyone with talent will find a more rewarding profession. (Yes, teaching is inherently rewarding, blah blah blah, but quite frankly, I think we all know that it’s completely demoralizing to hear day after day, year after year, that teachers are overpaid and incompetent.)

    Gary, your points about professional development opportunities have also been voiced by teachers in my district. The district is currently running a merit-pay pilot study, and teachers in both arms of the study said that what they found valuable were the professional development opportunities and additional teaching resources (including master teachers and regular meetings to discuss specific teaching challenges). The teachers who would receive merit pay said the same thing: the pay is nice, but what makes the difference are the resources that help us improve our teaching.

    • gkm001 says:

      Lisa, you are spot-on.

      Even if we were to grant the Chetty study as evidence that high value-added scorers are the best teachers, it does not necessarily follow that evaluating teachers based on value-added measurements would help to attract or retain good ones. This is merely an inference, and here is why I think it’s flawed:

      * The Chetty study was post-facto. The teachers in it were not, in the course of their jobs, evaluated for their VA scores. We don’t know whether test-based evaluation would have made their teaching (whether of the “best,” “average,” or “worst” teachers) any better. It is possible that it would have made it worse.

      * I am not aware of any evidence that merit pay for better test scores boosts test scores, except in districts where teachers cheat (!). The economist Roland Fryer has found that merit pay may actually decrease student achievement:
      Threatening people with the loss of their jobs if they don’t achieve a certain increase in scores is not the same as merit pay, but it’s a kind of mirror image: high test scores = bonus, low test scores = fired.

      * What do smart, talented people want in their jobs? Autonomy, collegiality, the opportunity to be creative in their work. What does high-stakes testing do? It prompts districts to take away these things. Perhaps districts ought to respond differently to high-stakes testing, but they don’t. Plus, as Lisa points out, who wants to go into a profession so consistently derided in the media? Who wants to go into a profession where badly designed tests, taken by someone else, determine your performance rating? What evidence demonstrates that value-added evaluations or promotions would attract a better breed of teachers? How do we know it wouldn’t actually drive them away?

      * What evidence is there for the presumption that teaching ability is fixed, and that the only way to get better teaching into the schools is to hire different teachers? How do we know this? How do we separate the quality of teaching from the quality of the school: its culture, its schedule, its material resources, the quality and quantity of professional development and coaching offered, the quality and quantity of time for planning together with colleagues?

      Everyone loves movies like “Stand and Deliver,” where a teacher takes whatever students happen to show up, and teaches them to excel at something. Well, can’t schools do this with teachers? If you want to raise the quality of the average teacher, building the expertise of the current teaching force seems quicker and more efficient than waiting for a crop of new teachers to come in, especially given that there is no guarantee they’ll be any more skilled or talented than the ones who left.

      The questions we should be asking are not how to fire bad teachers (as Gary and Matt agree, there aren’t that many — perhaps because teaching is a demanding job and the truly incompetent tend to self-select out), nor how to attract new and better ones. We should be asking what good teaching looks like, and how we can recognize it and expect it and coach for it. We should think of teachers as learners, because most of them are, and because that is what we most want them to model for their students.

    • Jack says:

      Randy Turner, a veteran teacher in Joplin, Missouri, posts an opinion on the HuffPost regarding the destruction of the teaching profession, ADVISING YOUNG PEOPLE NOT TO GO INTO TEACHING, CITING—

      … TFA replacing veterans;

      … legislation banning / removing tenure;

      … making only $37K per year after 14 years, then being publicly shamed for being “greedy”…

      and on and on…

      HERE it is at the Huffington Post:

      then gets immediately suspended for expressing his 1st Amendment-protected opinion:
      (quite a civics lesson for the students in Joplin, MO, doncha think?)

      Here’s Randy’s blog with the latest on his own situation:

      and if all this is not bad enough, some ignorant, profiteering edupreneur chimes in. The guy admits that “as an education entrepreneur, I do not claim to understand every nuance of the classroom. I am not a teacher..”, but that doesn’t stop him from rubbing salt in Randy’s wounds with this atrocity (also posted at the HuffPost):

  8. skepticnotcynic says:

    This whole debate about who is a poor, average, good, or great teacher misses the mark. There are teachers in my building who I would consider average to below-average instructors; however, I would not quickly dismiss them, since I know they add value to the building in other ways. For example, one of them is a great athletic coach; another is a fantastic motivator who can connect with some of our most difficult students where others can’t. If I evaluated these individuals solely on their classroom performance, they would be fired. I have also seen young, inexperienced teachers who are incredibly bright, work very hard, know their content extremely well, and have decent test scores, yet barely connect with any student. Why? Because they were more concerned with their own personal achievement than their students, and treated the students more like data points than human beings.

    As a young teacher, I would’ve been quick to write off the weak teacher, but having spent quite a bit of time in schools, I can now see potential where I couldn’t before. I also know that young hard working teachers overestimate their efficacy. I know, I was one of them.

    As a TFA corps member, I flashed my data around like I was Jaime Escalante (not even close), but it didn’t matter; I was put on a pedestal by my PD. A few years later, I realized I was a mediocre teacher, even though I had good test scores that showed growth in my classroom my first couple of years.

    Teaching quality is so much more than a debate about high-stakes testing, which is precisely the problem. Instead of focusing on how we can build great teams of teachers and provide quality professional development and collaborative learning environments that encourage educators to address students in multiple ways, we are pigeonholed by such a narrow and uninspiring metric: How well our students do on standardized tests.

    • KrazyTA says:

      skepticnotcynic: I wish I could have written your posting.


      This exactly matches my thinking re working with teachers as a bilingual TA [elementary] and SpecEd TA [HS]. One of the most effective teachers I ever worked with voluntarily took on every ‘tough kid’ in her elementary school. She accomplished more with them than anyone else at her worksite. Administrators, teachers, aides, and everyone else were so, so glad that she was in the building. You’ve heard the expression “a rising tide lifts all boats”? She was what helped make the tide rise!


      Her test scores? I bet they would label her an extremely ineffective teacher in today’s data-drivel [not a typo] instructional environment.

      “Not everything that can be counted counts, and not everything that counts can be counted” [variously attributed including to Albert Einstein]

  9. Dan McGuire says:

    It appears that Matt Barnum is much like the 30-something TFA alum, Director of Teaching and Learning of one of our local large districts, who was described by a 30-year veteran of that same district as “not encumbered by either experience or alternate views.” Standardized tests don’t work well to measure student achievement, and they don’t work at all to measure teaching effectiveness. Assertions that there are valid connections between standardized test scores and teacher effectiveness demonstrate a profound lack of understanding of both standardized tests and teaching. Lots of people would really like it if there were a causative connection, because then understanding, talking about, and effecting change in teaching and learning would be a whole lot easier. Sorry: teaching, evaluating teaching, and changing teaching are not that easy. Lots of people think they understand teaching and learning based on their own experience as students or as parents, or after a few years in a classroom, or after having read some articles on the web or in newspapers, or watching a political documentary movie, or even a collection of the above. Once again, though, it’s not that simple. Teaching is very complex; teaching in large systems is even more complex. Teaching children in our society on a large, equitable scale has not yet been accomplished. We just haven’t stepped on that moon, yet.
    I’ll put my money on the experienced teachers to get there first.

  10. Stephanie says:

    Matt, I’m wondering if you think it is important to note that the data used in the Chetty study comes from a time when the stakes attached to standardized tests were much lower. In my opinion, this makes a huge difference. Too often, the results of that study are used to support the simplistic notion that high test scores (not the overall process/curriculum used to achieve those scores) lead to a better life. If we follow that logic, it makes narrowing the curricula and teaching to the test a perfectly acceptable form of education as long as it boosts tests scores.

  11. Jennie says:

    Thank you for this. I always enjoy these debates.
    I think you’re very right in your assessments of “great,” “average” and “bad” teachers. You’re also totally right about dissuading people from becoming teachers by this witch hunt. It also encourages good teachers (I think I was good, probably better than average but probably not “great”) to leave the classroom as soon as they get a chance…like I did, and I happen to know several good teachers who are actively looking for a way out. You don’t mention that in trying to recruit and retain a pool of better teachers, we should be compensating teachers better. While it is true that nobody goes into it for the money, and that we don’t necessarily want would-be hedge fund managers as teachers, it’s also true that studies show that a small fraction of college students would consider going into teaching BECAUSE of the pay, and WOULD consider doing it if the pay were better. I am speaking from experience when I say that pay is not #1, but it does matter. I got very frustrated seeing how long it would take me to ever get to a decent salary. My salary also decreased by $5K from one year to the next when they changed class size rules and just piled more kids into my classes so they wouldn’t have to pay me to teach an extra period. I ended up with more work and less money. It was discouraging, to say the least, and I saw more of that on the horizon. It was not the main reason I decided to get out, but it definitely contributed. At the same time, the “reformers” promote increasing teacher pay–through so-called “merit pay.” This did not appeal to me (or convince me to stick around) and I don’t think it will appeal to anyone else (unless they’re naive and/or planning on leaving in a short amount of time). Because we know the metrics it’s based upon are not within our control, for the most part, it’s like playing the lottery with your salary every year…only if you don’t win, you might also lose your job…or, depending where you live, get a nasty write-up in the local newspaper. 
No thanks.

    Finally, I want to come back to why “great” versus “good” teachers is an entirely subjective distinction. I know that I was a favorite teacher to several students throughout the 6 years I spent teaching, and I am still in touch with those students, and happy that I can still offer them a certain amount of guidance as they go through college and choose careers. (I advise all of them AGAINST going into education.) But I know that there were also students who never got much of anything out of my classes. The material just didn’t speak to them, and maybe they didn’t get into my personal teaching style, though I don’t think there were many who actively disliked me. In high school, my favorite teacher was my French teacher…this is undoubtedly why I ended up majoring in French in college and becoming a French teacher myself. Still, I knew that most of the kids in my class didn’t like our French teacher all that much, made fun of her a lot, and really didn’t like the class itself. They probably would not look back on her and think she was a “great” teacher. To me, she was and always will be the best.

    Just like we don’t all choose the same friends, enjoy the same movies or books, or want to do the same activities in our free time, we don’t all find the same teachers “great” and they don’t all have an equal effect on us, no matter how much time and energy and talent they put into their craft. It’s just the way it is. One person’s “great” teacher is another’s “average” teacher, and vice versa. And yeah, there are a few duds. But there are already systems in place to get rid of them. Administrators choose not to do it because they don’t want to spend the time observing them and documenting the problems. Many of them never should have made “tenure” in the first place, but again, administrators did not do their due diligence and then ended up with these duds. But there aren’t that many of them, and getting rid of them, while it should be done, would not unilaterally improve the overall quality of education.

  12. Lynn Allan says:

    I enjoy reading all of this reform debate. I have been teaching Biology for 23 years. Love my job, love the students, receive love and respect from my students, work with new teachers, respectable test scores, teach part time in a credential program…. I have been able to maintain a positive attitude at a school that has had 38 assistant principals, 8 principals and 3 superintendents in the last 9 years because I understand that I only have power inside my classroom. I reached the top of the pay scale 5 years ago, and have actually experienced a decrease in wages due to furlough days. We survive as a school due to the dedication of the teachers who commit to the school and the community. One third of the entire staff will cycle through every three years. Why? Because it’s HARD WORK!!! Eventually, the new teachers will want to settle down, partner up, have some kids of their own, and try to balance saving the world with having a family life. Not to mention… the big decision about where YOUR children will go to school. Maybe they will actually want to be treated professionally instead of like a pawn in the newest education-reform game. Bottom line… students come to school every day and they deserve a decent teacher who cares about them. The real revolutionaries in education have always been the teachers with the guts to stay in the classroom, be the club advisor, coach the team, run the program, raise the money, be on the committee, do parent outreach, and be counselor, friend, referee, and part-time parent to these students. These things will never be measured on a test.

  13. Steve M says:

    Two things, Matt:

    Are you aware that more than 80% of the schools Chicago has closed over the last year have been reopened as (more expensive to operate) charters?

    Of course you are, you’re in Chicago! What you know fully well, but fail to discuss, is that the primary objective in closing Chicago schools has been to eliminate the teacher union and teacher tenure.

    Secondly: How do you propose to use standardized data in the evaluation of teachers? With VAM? If you are a proponent of VAM, then you truly do not understand the statistical models that are commonly used.

    I have nothing against standardized exams. They do serve some purposes. But VAM is a sham, plain and simple.

    • E. Rat says:

      And in districts where tenure has been weakened, the teaching staffs tend to get not just younger (and cheaper!), but whiter. If we are really interested in equity in education, I think a key issue to address would be the implications of an increasingly white teaching force working in increasingly segregated schools with students of color.

      • Matt Barnum says:

        E. Rat, I’m aware of the fact that Chicago ‘turnarounds’ have a lower percentage of black teachers. I don’t feel good about that, but there is reason to believe these turnarounds are actually improving student outcomes:

        I don’t mean to be glib, but I am more concerned about student achievement than the skin color of the teacher in front of those students.

        Plus there’s evidence that charters are actually decreasing the racial stratification in Chicago:

      • E. Rat says:

        If the report in the Trib is the best evidence you can marshal to claim charters are improving student outcomes…well, it’s not a very compelling case. As noted in the article, all but one of the schools run by the chain it discusses are still on probation, their dropout statistics are problematic, and it appears ALL/most Chicago public elementary schools were performing better over the time of the study.

        Moreover, since these are “turnaround” model schools, they were eligible for some additional funding. In California, the SIG pot tripled some school budgets for three years. That kind of additional resource allocation may explain more about the test score increases (such as they are) than the quality of the teaching.

        You may not mean to be glib, but you’re supporting policies that provide students less-trained teachers with whom they have nothing in common. And you’re supporting these policies in the name of – at best – limited improvement on standardized tests in two subject areas.

      • Matt Barnum says:

        I do want to look at some more research on turnarounds. But I do think the study is pretty good evidence. No, the turnarounds don’t perform miracles, but they are showing real improvements.

        “On state tests, underperforming elementary schools improved reading skills enough to cut in half the gap that once existed between their performance and district standards and did well enough in math to cut the gap by two-thirds, according to a report by the University of Chicago’s Consortium on Chicago School Research.”

    • Matt Barnum says:

      Steve, I’m not sure what evidence there is that charters are more expensive to operate. They receive less money per pupil from the state – about three quarters of that of a traditional public school student (

      And lest you think that this is made up for by private donations, a study was just released finding that in the four major cities studied, even accounting for donations, charters still received massively less funding (

      Further, there’s (mixed) evidence that Chicago charters are doing a pretty good job:

      I do support using VAM as a fraction of teacher evaluations, but feel free to explain why it’s a ‘sham.’

      • meghank says:

        For one thing, half of teachers don’t teach tested subjects, so their VAM is based on the scores of students they may never have taught. That is just one way in which VAM is a ‘sham.’

      • Matt Barnum says:

        What district(s) is/are this happening in?

      • meghank says:

        Every district in the state of Tennessee.

      • Steve M says:

        Matt, do you actually understand value-added models? As in, the nitty-gritty details? Here are some details from the two models that are floating around LA right now. Don’t worry, they were both taken from the standard models used across the country…

        Buddin’s 2010 model (produced for the LA Times, but based upon the top model…first produced in 1987…that is floating around) has the following flaws/problems:

        1) He uses an instrumental variable approach in dealing with errors associated with using lagged student data. This was to “reduce some noise in the prior year data…” In other words, his data (and the data used in every VAM that I am aware of) was massaged to begin with.
        2) His student retention coefficient was arbitrarily set at 1. However, typical retention coefficients have been found to range from 0.5 to 1, and amongst researchers there is no agreed-upon value to use.
        3) The scope stinks. His model, like all the others that are currently used, completely ignores school-level effects and assigns all results to teachers. This is a major flaw in ALL VAM studies that has never been addressed. Authors state that such effects can be calculated and incorporated…but none has ever included school-level effects in their VAM, as “they are very expensive to evaluate.” Buddin’s study estimated school-level effects at 0.06 for ELA and 0.08 for math, but later admits that other studies’ school-level effects in elementary schools have been up to 0.33 standard deviations. That quote is a ridiculous way of authors admitting that their models are not robust.
        4) He entirely ignored class-size effects in his study, discounting them as “unrelated to teacher effectiveness.” Explain that to teachers…
        5) Buddin’s study found teacher-level effects of 0.1902 in ELA and 0.2772 in math, but their coefficients of determination are 0.6863 in ELA and 0.5966 in math. This, for a flawed study based on massaged data.
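        Steve’s point 3 can be illustrated with a toy simulation (hypothetical numbers, not Buddin’s data): give every teacher an identical true effect, let only the schools differ, and score teachers with a model that has no school term. The school variation reappears as spurious “teacher effects.”

```python
import random
import statistics

random.seed(0)

# Hypothetical setup: 20 schools, 10 teachers each, 25 students per class.
# All teachers are identical; only schools differ (sd 0.30, in the range of
# the up-to-0.33 school-level effects cited above).
school_effects = [random.gauss(0, 0.30) for _ in range(20)]

teacher_estimates = []
for school_effect in school_effects:
    for _teacher in range(10):
        # A model with no school term scores each teacher by the mean of
        # their students' scores, so the school effect lands on the teacher.
        scores = [school_effect + random.gauss(0, 1.0) for _ in range(25)]
        teacher_estimates.append(statistics.mean(scores))

# Identical teachers nonetheless spread widely; most of that spread is school.
print(round(statistics.stdev(teacher_estimates), 2))
```

        Averaging classrooms within a school would recover the school effect; omitting it, as Steve notes, assigns all of it to individual teachers.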

        OK, Matt, how about the LAUSD’s recent attempt to incorporate VAM? Their AGT model is taken entirely from the University of Wisconsin’s model – the gold standard of VAM:

        1) It uses data from unrelated prior-year classes in a teacher’s score (e.g. chemistry AGT based upon biology scores from the previous year).
        2) It is an inherently flawed covariance model that assumes that the covariance between ELA and math is zero…which is absurd, as the district’s own (independent) studies have found student-level covariance coefficients of -0.011 and classroom-level coefficients of 0.077 for 4th grade MATH effects on ELA scores. Read that again…MATH-on-ELA! What are the other grades’ coefficients? What are other subject-to-subject coefficients? Gee, they won’t publish that…I wonder why? Imagine what the coefficients the other way around would look like. ELA-on-biology, anyone? Is biology a language-intense subject? Nah, couldn’t be…
        3) It uses six student-level demographic variables. Why six? Because that’s all they have. Why not use religion? Drug use? Absenteeism? Parent education level? Cell phone possession? Parental involvement level? And, how about the biggest one…peer/friend achievement? Nah…too difficult to incorporate, and much easier to assign all effects to teachers. Can you think of any others, Matt? Oh yeah, Matt, national studies have found that only 20% of student achievement is actually tied to teachers…
        4) Unlike Buddin’s study, it doesn’t ignore school-level effects…it only assigns them as the third-level regression (oh no, not important at all…it’s all the teachers’ fault). Only, it kinda sneaks this discussion in at the end, quiet like. You think that there is a political motive there, Matt? Could it be related to what you are doing? What Eli Broad is doing?
        5) It uses a “shrinkage approach” by proactively bringing outlier data back into the normal population…a completely subjective corrective methodology.
        6) It doesn’t actually use the student-level first regressions in the second-level calculations. It uses estimated retention coefficients. So, in other words, take your first-level calculated values and then massage them as you see fit. Then, base your second-level calculations on those! Wow, that will surely pass the stink test.
        The retention coefficient is a sticky one, Matt. If a district uses the low-end value (50%), then a teacher doesn’t really contribute much to the overall process. If a perfect retention coefficient is used (100%), then the model is wide open to criticism since it attributes nearly everything to teachers and diminishes ALL other variables. Think about that for a while…isn’t that what TFA has been doing all this time? POVERTY IS NOT DESTINY!
        7) Finally, let’s look at a sample of the classroom-level coefficients that were used in calculations:
        -effect of being female: -0.013 +/- 0.042
        -effect of being white: 0.051 +/- 0.05
        -effect of being African American: 0.008 +/- 0.039
        -effect of being Asian: 0.099 +/- 0.05
        How accurate is a model that uses input that can have uncertainty equal to up to five times the actual datum?
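        A quick check on those coefficients (values copied from the list above) shows just how weak they are: half of the quoted intervals include zero, and for the African American coefficient the uncertainty is nearly five times the estimate itself.

```python
# Classroom-level coefficients quoted above: (estimate, +/- uncertainty).
coefficients = {
    "female": (-0.013, 0.042),
    "white": (0.051, 0.050),
    "African American": (0.008, 0.039),
    "Asian": (0.099, 0.050),
}

for name, (value, err) in coefficients.items():
    # An interval that includes zero cannot even establish the effect's sign.
    includes_zero = (value - err) < 0 < (value + err)
    ratio = err / abs(value)  # uncertainty relative to the estimate
    print(f"{name}: interval includes zero={includes_zero}, ratio={ratio:.1f}")
```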

        It’s a sham, Matt.

      • Manuel says:

        Steve, you left out the best part: the basis of the AGT is the CSTs, and they all define the proficient cutoff point as the average of their bell curve distribution. Thus, if a student manages to move over the average, some other poor soul must take her/his place below the average. Growth isn’t possible even if VAM were not a sham.
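        Manuel’s zero-sum point holds for any norm-referenced cutoff by construction. A toy example (made-up scores, not CST data): even if every single student improves, exactly half the cohort is still “below the median.”

```python
import statistics

last_year = [31, 35, 37, 38, 40, 42, 44, 46]
this_year = [score + 5 for score in last_year]  # every student improves

def share_below_median(scores):
    # A norm-referenced cutoff moves with the cohort, so the share below
    # it is fixed regardless of absolute gains.
    median = statistics.median(scores)
    return sum(s < median for s in scores) / len(scores)

print(share_below_median(last_year), share_below_median(this_year))  # 0.5 0.5
```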

        BTW, the best proof I’ve seen that the CSTs are a statistical exercise came from LAUSD’s own stats people. They plotted the achievement bands as functions of the classroom mark for 5th through 11th grades for the 2008-09 data. The distributions were identical for all marks in the 5th grade, both for ELA and math. In other words, the percent of kids in each band was the same and coincided with the area under the bell curve. For all other grades, the distributions were no longer identical, as students discover the joys of bubbling, but the distributions for the kids getting A’s were the same as for the 5th graders. I guess no parents complain because the CSTs don’t count toward classroom marks. But this may change depending on how Common Core is implemented and/or changes in the Educational Code. Meanwhile, schools must be reformed because “50% of students are below grade level.”

        It is a sham.

      • Steve M says:

        Should I continue and give you an idea of what this looks like in terms of raw scores? As in, evaluating tens of thousands of raw scores?

      • Steve M says:

        Ah, what the hell, I’ll tell you anyway. This will give some perspective (and I don’t know when I’ll get to respond again, as work is stressful this time of year)…

        These are the results of a series of statistical calculations that I performed on the average student ELA scores of LAUSD’s elementary schools for the year 2004 (over 800 schools…several thousand 3rd grade teachers…tens of thousands of kids). [That statement I just made says something important about my methods…did anyone notice?] So, I must qualify my results by saying that they are very close to the actual student averages, but are slightly off since my calculations are limited by the unavailability of individual teacher and student data.
        The LA Times rates the best and worst elementary school teachers’ effectiveness in teaching English according to the following schedule: a “most effective” English teacher is one whose students repeatedly increase their average raw ELA scores by 7% or more relative to the students’ LAUSD peers; a “least effective” teacher of English is one whose students’ average raw ELA scores repeatedly decrease by 7% or more compared to the students’ LAUSD peers. We must note that these definitions reflect changes in students’ scores relative to the students’ peers, not simple gains and losses from one year to another. This is key, as the use of relative percentile gains and losses makes it appear to the general public that gross differences in teacher effects are occurring, when the reality is that only small differences are happening in most cases.

        This is not a statistical “trick”; rather, it is a means to dupe people who really do not care about the admittedly boring topic of statistics.
        Since large differences in teacher effects are not occurring in most cases, this classification scheme can rightfully be deemed as ridiculous. Observe what this method produces:

        In 2004, the average LAUSD third grader’s raw ELA score was approximately 37.8, with a standard deviation of about 6.5. This means that the typical student correctly answered 37.8 out of 65 questions, and that more than two thirds of the district’s third graders received raw scores between 31.3 and 44.3. Teachers at an underperforming school, with students who averaged 35.1 out of 65 on their third grade ELA exam, would be classified as “most effective” if their students’ average scores went up to 36.3 (that is, if the teachers’ students managed to correctly answer 1.2 more questions than the average school peer). An unfortunate colleague next door would be labeled as “least effective” if her students answered 33.8 out of 65 questions correctly (1.3 questions fewer than the school’s average third grader).
        In a high performing elementary school, where third grade students averaged 43.3 out of 65 on their ELA exams, a teacher would be labeled as “most effective” if her students scored 45.1 (answering nearly two questions more than their average peer). On the other hand, a teacher down the hall would be deemed “least effective” if his students answered an average of 41.8 questions correctly (1.5 questions fewer than his school’s average third grader).
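        Those thresholds can be reproduced from the quoted mean and standard deviation alone, assuming roughly normal scores (which the ±1 standard deviation claim implies). A sketch using Python’s statistics module:

```python
from statistics import NormalDist

# 2004 LAUSD 3rd-grade ELA raw scores, per the figures above.
scores = NormalDist(mu=37.8, sigma=6.5)

# "More than two thirds" of students within one standard deviation
# (raw scores 31.3 to 44.3):
print(round(scores.cdf(44.3) - scores.cdf(31.3), 2))  # 0.68

# Moving 7 percentile points up or down from a school's mean score
# translates into only one or two raw questions out of 65:
for school_mean in (35.1, 43.3):
    pct = scores.cdf(school_mean)
    most_effective = scores.inv_cdf(pct + 0.07)   # "most effective" cutoff
    least_effective = scores.inv_cdf(pct - 0.07)  # "least effective" cutoff
    print(round(most_effective, 1), round(least_effective, 1))
```

        The computed cutoffs match the 36.3/33.8 and 45.1/41.8 thresholds in the text to within a tenth of a question.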
        Why are these “most effective” and “least effective” labels absurd? They are absurd because teachers place varying degrees of importance on the annual California Standards Test. Some educators compensate for their mediocre or poor instruction throughout the year by dedicating the entire month of April to CST review and drill. On the other hand, many teachers feel comfortable with what they have imparted to their students throughout the year and would rather present new lessons than spend precious hours reviewing for what they consider a meaningless test.
        The result is that some average or exceptional teachers in the LAUSD will have students who correctly answer one or two questions fewer than their school peers, and some of the worst teachers will produce above average, or even stellar CST scores. I have observed both types in my nineteen-year career as a teacher. Indeed, I can say that some of the worst teachers I have known have been those that teach only those concepts covered by the CST, producing students who have a hard time doing anything other than respond to a number of similarly phrased, trivial questions.

        We’re talking about average differences of one or two questions on a 65-question test, Matt…with completely absurd labels (“least effective teacher”, “most effective teacher”, “effective teacher”) being bandied around.

        Personally, I would like to see those teachers who are beyond retirement age, and who are not doing their jobs effectively, eased out and into retirement. Others, who are younger and not being very effective, should be mentored and helped. A few should be let go (as in ANY profession).

        But this is all a drop in the bucket. Our country has social problems that need to be addressed, and naive people like Matt are not helping. In fact, they are getting in the way.

      • Manuel says:

        Wow, I am impressed. I wish there was a way we could get together and compare notes. You are in LA and so am I. Maybe Gary can forward my email address to you.

        So, yeah, I agree, people like Matt get in the way with their insistence on relying on single studies to prove their point and never getting down into what the actual data means.

      • gkm001 says:

        The HuffPost article is interesting, but it doesn’t address the question of how the per-pupil amount is set and paid out.

        From April 11, 2013, Jim Broadway’s Illinois School Policy Updates: “Just last month, the education committee had to address another problem with current charter school law. When a school board denies a charter school proposal and the state steps in to overturn that decision, the local school district ends up funding that charter school with its own dollars.

        “That’s not easy when the state is pro-rating General State Aid to schools at just 89% of what would be required to fund the Foundation Level that remains, as Gov. Pat Quinn so often says, ‘the law of the land, the Land of Lincoln.’

        “Consider the plight of Rich Township District 227, where funding for a state-approved charter school comes out of the GSA paid to the school district. ‘Our state aid dollars are withheld to the full per capita tuition rate,’ said Superintendent Donna Leak.

        “That is, the GSA funds the district receives are prorated at 89% (as in other school districts in Illinois) but District 227 has to fund its charter school at the full tuition rate. ‘Forty-five percent of my [GSA] money goes to 475 students at a charter school, when I have four thousand students in my school,’ she said.”

      • mpledger says:

        I recommend School Finance 101

        And in particular a quote from a study on that site…
        “But simple direct comparisons between subsidies for charter schools and public districts can be misleading because public districts may still retain some responsibility for expenditures associated with charters that fall within their district boundaries or that serve students from their district. For example, under many state charter laws, host districts or sending districts retain responsibility for providing transportation services, subsidizing food services, or providing funding for special education services”

  14. Robert Berretta says:

    The teacher evaluation argument has been spinning in circles for years now. First, there are shitty performers who deserve to be dismissed in every profession: doctors, lawyers, waiters, police officers, accountants, secretaries, etc. It’s the classic bell curve, but lopping off the bottom 5, 10 or 15% doesn’t improve overall performance, because doing so assumes we have cadres of average or above-average replacements waiting in the wings. We don’t – we just have TFA recruits.

    Second, we all know who is good, who is average, and who sucks. We don’t need a fancy metric to tell us that. As my colleague said, “Great, you weighed yourself. You’re obese.”

    Third, you want better teacher evaluations? Focus heavily on the process of teaching, and just a bit on the outcomes. Focusing on outcomes leads folks to do all kinds of weird and unethical shit to reach them. Focusing more on the process will help everyone improve, since good process generally equals good outcomes.

  15. E. Rat says:

    The study was funded and publicized by the Walton Family Foundation – a pro-charter, pro-voucher, anti-union education reform organization.

    The Foundation has a known bias; it is not an ad hominem attack to consider its studies problematic. The following link provides more information about similar funding claims:

    I am also curious as to how the study accounted for physical plant issues, both for charter schools using public school facilities and those engaged in various lease schemes with private concerns. I’ve seen studies where charters bemoan their limited budgets by listing physical plant costs being borne by the school district that owns the building.

    • Steve M says:

      The same thing goes on in the LAUSD. I would estimate that the two types of charters (those housed in former district facilities versus those in private facilities) are fairly evenly split.

      However, charters in LAUSD have historically not been charged for athletics, police presence and other ancillary costs. Also, they typically do not adhere to federal ELL and SpecEd requirements and would show abnormally low expenditures related to those populations.

      Lastly, if we did an analysis of their faculties, we would see that their teachers’ ages average to about 32…much different from the district as a whole, which is probably 5-10 years older, on average. Good luck to the charter movement in ten years, when massive numbers of recent hires start having families.

  16. Louisiana Teacher says:


    I think it’s refreshing that in our current, vitriolic political environment there are still people who want to have a meaningful, civil discussion with others who hold opposing viewpoints, so I appreciate your sincerity and candor.

    In previous posts, several commenters noted that teachers are not opposed to testing students, nor are tests “bad.” The general consensus, however, is that using students’ standardized test scores as the predominant component in evaluating a teacher’s effectiveness in the classroom IS bad. And there are several reasons why, many of which are enumerated in this Washington Post article written by Jack Jennings, founder and former president of the Center on Education Policy.

    Additionally, you have stated that the standardized tests can’t be that flawed, but, as I pointed out further up in these posts, Pearson has been singled out as having a particularly disastrous track record in testing errors. See the following:

    I became a teacher 9 years ago through an alternative certification program, The New Teacher Project. (Our cohorts partnered with TFA.) I have taught inner-city fourth graders since I first stepped into a classroom. At my current school, 85 percent of our students are on free or reduced lunch. While I love them with all my heart, this will likely be my last year teaching.

    Louisiana’s legislature recently enacted VAM. Beginning this year, student test scores comprise 50 percent of a teacher’s evaluation. Simultaneously, the state has adopted the Common Core standards, so this year also marks the first of three years of the new, transitional curriculum and revamped high-stakes tests.

    Three-quarters of my students live in home environments that are so discordant or lacking in nurturing that it directly affects their daily academic performance—either because they have undiagnosed learning disabilities/developmental disorders or because they are not getting treatment for physical conditions or essentials, like food and sleep.

    Over half of my students were reading significantly below grade level at the beginning of the year. Now, about 20 percent of them are. One of those students failed the dyslexia screening, but our guidance counselor said his IQ screen was less than 90, so he could not be labeled as “dyslexic.” This child’s mother was a special education student when she was a child and cannot help her son with homework. Furthermore, she has six other children at our school with the same problems. She is resentful about her years spent in special education and denies that her children face any challenges. Consequently, most of them are failing.

    Another of my low students is actually the highest achieving child in her family, despite high absenteeism. Our principal called Child Welfare Services and reported this child’s mother because she was not waking up her children for school because they were “tired in the morning.” The counselor from CWS later said that they could not prosecute this woman because “she has the mind of a child.”

    I could tell you more stories about my students, but I won’t belabor the point. At least 15 to 20 percent of my class will fail the high-stakes test this year—and show negative growth from last year—despite the fact that I am a highly effective teacher (according to classroom observations by my principal) and a National Board Certified Teacher, and my students have made enormous academic strides according to pre- and post-unit tests.

    I am living proof that VAM does not work. Even worse, there are dozens of teachers in our district just like me—effective, disillusioned and out the door.

  17. KrazyTA says:

    No disrespect, but sometimes the commentary on this topic has had an unreal quality.

    Here is an exercise in evaluating whether information is being used fairly: go to the link below [literally provided above by Mr. Matt Barnum in his second letter to Gary although I first used Google] that references initial comments by Dr. Bruce Baker re the Chetty study:

    Referencing Dr. Baker’s commentary and that by Matthew Di Carlo, Mr. Barnum claims that “No, the Chetty study has not been ‘debunked’” and that “neither suggests that the study is patently invalid.”

    This seems to take all the bite out of two critical looks at the study, yes?

    Consider this comment by Dr. Baker under the section “MY 2 BIG POINTS”:

    “First and perhaps most importantly, just because teacher VA scores in a massive data set show variance does not mean that we can identify with any level of precision or accuracy, which individual teachers (plucking single points from a massive scatterplot) are “good” and which are “bad.” Therein exists one of the major fallacies of moving from large scale econometric analysis to micro level human resource management.”

    His second “BIG POINT” has to do with “the implications of this study for immediate personnel actions” in which one of the authors of the study [Professor Friedman] says “The message is to fire people sooner rather than later” [NYTimes]. After this Dr. Baker adds [next paragraph his words]:

    Professor Chetty acknowledged: “Of course there are going to be mistakes—teachers who get fired who do not deserve to get fired.” But he said that using value-added scores would lead to fewer mistakes, not more. (NY Times)

    Dr. Baker ends this short section with the following observations:

    “These two quotes by the authors of the study were unnecessary and inappropriate. Perhaps it’s just how the NYT spun it…or simply what the reporter latched on to. I’ve been there. But these quotes in my view undermine a study that has a lot of interesting stuff and cool data embedded within.

    These quotes are unfortunately illustrative of the most egregiously simpleminded, technocratic, dehumanizing and disturbing thinking about how to ‘fix’ teacher quality.”

    Let’s keep this simple: is Dr. Baker saying “yay” or “nay” to the Chetty/Friedman/Rockoff report?

    Make up your own mind.

  18. Dan McGuire says:

    Diane Ravitch’s comments don’t fit the definition of ad hominem. The studies that supposedly prove something aren’t really valid studies. This thread is fraying for lack of cogency.

  19. Matt Barnum says:

    Unfortunately, I can’t keep up with the many fantastic comments. What I will say is that I realize VAM has a lot of problems, some of which I believe can and will be smoothed over as implementation of evaluation systems is improved.

    I do believe that VAM should supplement evaluations from a principal – so if a teacher is doing poorly by evaluation, a good VAM score might indicate it’s worth giving that teacher another shot; but a poor evaluation can be confirmed by a poor VAM. I certainly don’t believe that a teacher with a good evaluation should be dismissed due to one bad year of scores; on the other hand, if a well-evaluated teacher has several consecutive years of poor VAM scores, then I think there’s reason to think the tests might be picking up something that the evaluations aren’t.
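    How informative “several consecutive years of poor VAM scores” really is depends entirely on how noisy the scores are. A rough simulation (the 0.3 reliability is an assumption, in the range commonly reported for year-to-year VAM stability) estimates what share of teachers flagged by three straight bottom-quartile years are in fact above the median:

```python
import random
from statistics import NormalDist

random.seed(2)

RELIABILITY = 0.3  # assumed share of score variance that is true signal
signal_sd = RELIABILITY ** 0.5
noise_sd = (1 - RELIABILITY) ** 0.5
cutoff = NormalDist().inv_cdf(0.25)  # bottom-quartile threshold

flagged = 0
flagged_above_median = 0
for _ in range(100_000):
    true_effect = random.gauss(0, signal_sd)
    # Three years of noisy observed VAM scores for this teacher.
    observed = [true_effect + random.gauss(0, noise_sd) for _ in range(3)]
    if all(score < cutoff for score in observed):
        flagged += 1
        flagged_above_median += true_effect > 0

print(flagged, round(flagged_above_median / flagged, 2))
```

    Under these assumed numbers, repeated low years mostly catch genuinely below-median teachers, but a small fraction of the flagged are actually above average, which is the kind of mistake rate the Chetty authors concede elsewhere in this thread.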

    Thanks again to everyone who has commented. Apologies that I can’t respond to all of them, but I am reading and thinking about them.

    • Steve M says:

      …and that’s all he wrote.

      The only thing (regarding VAM) that Matt has said with any credibility is his suggestion that normative assessments be administered quarterly. This would reduce much of the noise that clutters VAM.

      The rest, including the articles he cites, is just a restatement of the various talking points that slick willys have been spewing for several years now. These guys are wolves circling a kill.

    • gkm001 says:

      Can we smooth over the problems first, before we use VAM to judge someone’s life’s work?

  20. Louisiana Teacher says:

    This is an interesting take on what’s happening in our public schools:

    When the standardized test scores of poor children are removed from the sample, middle- and upper-class children are actually faring better than ever.

    Unfortunately, the ranks of those in the lowest income bracket have swelled over the past 30 years, and their children trail behind–from the age of one on–in every academic benchmark. (See “The Power of Talking to Your Baby”)

  21. mpledger says:

    Since the Chetty study holds such sway, I’d really like to talk about it.

    The key graph for me in the Chetty study is figure 6. The graph displays value added on the x-axis and predicted earnings at age 29 on the y-axis. A line is fitted through the data with an upward slope (indicating better VAM is associated with better predicted earnings). However, if you look at the “raw” data and ignore the line, it is quite clear that the best earnings are received by those who score midway on VAM; those at the low end of VAM predict the lowest earnings, BUT THOSE AT THE HIGH END DO WORSE THAN THOSE WHO SCORE MIDWAY ON VAM.

    Their entire conclusion seems to be premised on fitting a line to the data when in fact it is obviously non-linear. (My interpretation would be that very low VAM doesn’t help with predicted earnings but VAMs above the 30th percentile have equal effect on predicted earnings.)

    Even worse, this isn’t even the raw data; it’s binned raw data, which accentuates the pattern at the expense of the variation, which means the estimates for the fit of the line are way over-estimated.

    I don’t know if it’s standard to bin data like this in economics, but to my mind this is not a good way to represent the truth in the data.

    So in summary, the Chetty study is poorly done, for inappropriate use of a linear model in a non-linear situation and for binning data in a way that inappropriately inflates the importance of the model fit … but it has lots of nice, important-looking Greek symbols.

  22. meghank says:

    Sorry, Gary – I think your commenters were a little too brutal with Matt Barnum, and we may have run him off. I’d be surprised if he writes you another open letter.

    He exposed a few glaring contradictions in his thinking in these comments:
    “This means that test-based accountability will promote teachers who everyone agrees are high quality and will dismiss teachers who teach to the test too much…
    on April 13, 2013 at 2:21 pm · Reply”
    “Though the truth is, I am not too concerned about ‘teaching to the test’ – I think to a large extent teaching to the test is a good thing, if the test is adequately designed.
    on April 24, 2013 at 2:31 pm · Reply”

    That was just the most obvious.

    To be honest, after what he said about Diane Ravitch’s having lost a great deal of respectability, I’m not sorry to have run him off.

    • Matt Barnum says:

      Nope. Haven’t run me out. In the midst of working on my response to Gary.

      What I mean by teaching to the test is that there is a good kind of teaching to the test (meaning teaching the content that is and should be tested) and bad kinds of teaching to the test (meaning excluding some content that should be taught but won’t be tested*, and excessive test prep). I believe that the first kind of teaching to the test predominates and is a good thing; I believe the second kind of teaching to the test can be ameliorated or eliminated through well-designed tests and accountability systems.

      *A well-designed test should assess most everything that should be taught (at least the broad concepts).

      • skepticnotcynic says:

        Just give up already, you’ve already dug yourself into a deep hole and have shown how naive you really are. Your arguments have been blown out of the water, yet you still continue to post on a topic you seem to know very little about. Had you spent just a few more years in the classroom, you wouldn’t even be having this debate.

      • Manuel says:

        FYI, test prep is illegal in California. From California’s Educational Code:

        “60611. (a) A city, county, city and county, district superintendent of schools, or principal or teacher of any elementary or secondary school, including a charter school, shall not carry on any program of specific preparation of pupils for the statewide pupil assessment program or a particular test used therein.”

        That’s the theory. The practice is another matter.

      • Educator says:

        How does one test for things like civic engagement, honesty, responsibility, cooperation, communication, etc.? These things are expected to be learned in public schools.

  23. Pingback: My discussion with Matt Barnum Part 1 | Gary Rubinstein's Blog
