Perhaps the most controversial issue in Ed Reform is whether it is fair to tie teachers’ evaluations to their ‘performance,’ which reformers define as how their students do on standardized exams. Since even reformers acknowledge that teachers can’t take students from a low starting score to some absolute target of high performance, they have devised something that is intended to be fair. It is known as ‘value-added.’
The idea, which has been around for about 30 years, is that there could be a way to compare how a teacher’s students do on some test with how those same students would have done in a parallel universe where they had an ‘average’ teacher instead. If it is possible to make such a measurement, it would determine that teacher’s individual contribution to his students’ ‘learning.’
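To make the idea concrete, here is a minimal sketch in Python of about the simplest version of such a calculation. It is not the formula any district actually uses, since real models adjust for many more factors. The function name value_added, the expected_gain parameter (standing in for what an ‘average’ teacher would add), and all of the scores are made up for illustration.

```python
from statistics import mean

def value_added(prior_scores, actual_scores, expected_gain):
    """Average gap between actual scores and 'parallel universe' predictions.

    Assumption: the predicted score is simply last year's score plus the
    gain an 'average' teacher would produce. Real models are more complex.
    """
    residuals = [
        actual - (prior + expected_gain)
        for prior, actual in zip(prior_scores, actual_scores)
    ]
    return mean(residuals)

# Hypothetical class of five students (made-up numbers).
prior = [60, 72, 55, 80, 68]
actual = [67, 76, 63, 86, 73]

va = value_added(prior, actual, expected_gain=5)
print(va)  # positive means the class beat its predictions on average
```

A positive result is read as the teacher ‘adding value,’ a negative one as the teacher subtracting it. Everything hinges on how good the prediction is.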
To someone who is not a teacher, this sounds reasonable enough. When you’ve spent time in schools, though, you know some of the basic problems with standardized tests. For one, if you’ve ever watched a class of students taking a standardized math test, you’ve seen that they don’t seem to take it very seriously. The multiple choice format leads students to skip the scrap work, since one answer seems to ‘jump out’ at them. Also, since this test doesn’t ‘count’ for a grade, students might not do their best. But since other teachers’ students are doing the same thing, it shouldn’t matter, as every teacher is operating with the same handicap. Another issue, especially for math, is that the test is like a comprehensive final exam. Maybe you did a good job teaching and your students did well on the individual unit tests throughout the year, but they simply ‘forgot’ the earlier material, or got overwhelmed by having to recall all these different topics.
This ‘Value-Added’ metric is the number one issue for the Ed Reformers. They believe that teachers are lazy and will be forced to work harder when they know they will be judged on how their students do on these tests. Race To The Top applicants had to change their laws so teacher evaluations would be tied, in part, to ‘performance’ in this way. Now, for No Child Left Behind waivers, states will also have to get this worked into their laws. Many states have already incorporated it. Washington D.C. has it count for 50% of some teachers’ evaluations. Colorado is working on a way to get it to be about half. New York has passed a law to make it 40%.
The biggest problem with ‘Value-Added,’ which not everyone knows, is that this type of metric, even after 20 years of development, is extremely inaccurate. Mathematica Policy Research, which does the Value Added calculations for Washington D.C., published a report for the Department of Education called “Error Rates in Measuring Teacher and School Performance Based on Student Test Score Gains.” On page 31, it estimates the error rate to be over 33%, meaning that over a third of the time it will give an effective teacher an ineffective rating and vice versa.
This is why on page 35 they advise against using this type of calculation, in as strong language as they can, considering they make a lot of money doing these calculations for D.C. schools.
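To see how a noisy measurement can mislabel teachers, here is a rough simulation sketch. It is not the method used in the Mathematica report and it does not reproduce their numbers. The function name misclassification_rate, the normal distributions, and the sizes of the true and noise spreads are all assumptions chosen for illustration.

```python
import random

def misclassification_rate(n_teachers, true_sd, noise_sd, seed=0):
    """Fraction of simulated teachers whose measured score falls on the
    wrong side of average compared with their true effect.

    Assumptions (hypothetical): true effects and measurement noise are
    both normally distributed with mean 0.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    true_effects = [rng.gauss(0, true_sd) for _ in range(n_teachers)]
    measured = [t + rng.gauss(0, noise_sd) for t in true_effects]
    # Count teachers labeled above average who are truly below, or vice versa.
    wrong = sum((t > 0) != (m > 0) for t, m in zip(true_effects, measured))
    return wrong / n_teachers

# No noise: every teacher lands on the correct side of average.
print(misclassification_rate(1000, true_sd=1.0, noise_sd=0.0))

# Noise as large as the real differences between teachers.
print(misclassification_rate(10000, true_sd=1.0, noise_sd=1.0))
```

The more the noise grows relative to the real differences between teachers, the more teachers end up with the wrong label, even though nothing about their teaching changed.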
Now many teachers are concerned that when value-added becomes a significant part of teacher evaluations, it will cause many teachers to be unfairly fired because of this high error rate. That actually is not my biggest concern with value-added. Since even Michelle Rhee admits that these scores shouldn’t be the ‘sole’ determinant of a teacher’s evaluation, it seems that 50% is about the most anyone wants to use them for. As I’ll demonstrate later in this post, the variation among these value-added scores is so random and small that they probably won’t cause anyone to get fired, and may even save some ineffective teachers who happened, through no action of their own, to add a lot of value.
No, the real danger of value-added is that it is currently being used as a way to judge and shut down schools. In New York City, 85% of a school’s report card is based on these value added calculations. When a school gets an ‘F’ based on this, the shut-down machine gets fired up, as has recently happened to 47 schools in New York City. That such inaccurate measures are being misused for such drastic decisions is sad.
I like, from time to time, to read a research paper that is cited by the corporate reformers. One of the gurus of Value Added is Tom Kane from Harvard. He was Michelle Rhee’s advisor there. Well, he co-wrote an influential paper in 2006 called ‘Identifying Effective Teachers Using Performance On The Job’ in which he argues that Value Added is accurate enough to prove that alternatively certified teachers are as effective as traditionally certified teachers. On page 19 he cites a different report which, he says, proved that the Value Added evaluations correlate with the standard Principal evaluations. This intrigued me since I had always wondered if anyone had checked that. If they correlate a lot, then why do we even need them? Why not just use the principal evaluations we already use and save all the money and stress it takes to do it the other way? If they don’t correlate, then we have to wonder how accurate they are. Is the problem really that principals don’t have the ability to accurately assess their teachers? The stats he cites, though, do not sound very convincing.
So I looked up that 2005 report “Principals as Agents: Subjective Performance Measurement in Education” by Jacob and Lefgren. In this paper they claim that there is a significant correlation between the value added statistic and the principal evaluation statistic. Yet, when I looked at the appendix and saw their own scatter plot, I found that there is essentially no correlation between the two.
Notice that they distort the plot by having the principal evaluations go from -3 to +3 standard deviations while the value-added only goes from about -1.5 to +1. Also, see that pretty much everyone is between -.5 and +.5 on the Value Added scale. Had this chart been made ‘to scale’ it would be clearer that everyone gets about the same value added score. There is little difference between the value added for teachers who had poor principal evaluations and for teachers who had good principal evaluations. Notice that the data point with the lowest principal rating actually has a higher value added than the two rated highest by the principal.
This is why the main danger with value added is the school ratings, which weight it at 85%, and not the teacher evaluations, which weight it at 50%. Before, when just principal evaluations were used, teachers were in one of two categories: effective or ineffective. Now, with this random stat factored in, there will be four categories: effective / high value added, effective / low value added, ineffective / high value added, and ineffective / low value added. So they will have to keep everyone except the ineffective / low value added. In essence, they have to keep more teachers than they would with the old system. They won’t be able to fire the ineffective teachers who happen to score high on the value added component.
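The counting argument above can be sketched in a few lines of Python. The rule inside retained (dismiss a teacher only when both measures are bad) is the assumption from the paragraph above about how the combined system would play out, not any district’s written policy, and the names are made up for illustration.

```python
from itertools import product

def retained(principal_rating, value_added_rating):
    """Assumed rule: a teacher is let go only if the principal rates them
    ineffective AND their value added is low."""
    return not (principal_rating == "ineffective" and value_added_rating == "low")

# The four categories created by combining the two measures.
categories = list(product(["effective", "ineffective"], ["high", "low"]))

kept = [c for c in categories if retained(*c)]
let_go = [c for c in categories if not retained(*c)]

print("kept:", kept)
print("let go:", let_go)
```

Under the old system, every ‘ineffective’ teacher could be let go. Under this assumed rule, the ineffective / high value added group moves into the kept column.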