The New York City teacher evaluation data is taking a beating over its inaccuracy. As I expected, this data has not stood up to the scrutiny of the public or even the media. Value-Added is proving to be the Cathie Black of mathematical formulas.

A teacher’s Value-Added score is a number between about -1 and 1. That score represents the number of ‘standard deviations’ a teacher’s class has improved from the previous year’s state test to the current year’s state test. One standard deviation is around 20 percentile points. After the teacher’s score is calculated, that -1 to 1 score is converted to a percentile rank between 0 and 100. These are the scores you see in the papers, where a teacher is shamed for getting a score in the single digits.
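For the curious, that last conversion can be sketched in a few lines of Python. This is a rough stand-in, not the city’s actual method (which isn’t spelled out in the reports): here each teacher’s percentile is simply the share of teachers with a lower raw score.

```python
# Rough sketch: turn raw value-added scores (in standard deviation
# units, roughly -1 to 1) into 0-100 percentile ranks among teachers.
# This simple empirical ranking is an assumption, not the city's formula.
def percentile_rank(score, all_scores):
    below = sum(1 for s in all_scores if s < score)
    return round(100 * below / len(all_scores))

scores = [-0.9, -0.4, -0.1, 0.0, 0.2, 0.5, 0.8]
print([(s, percentile_rank(s, scores)) for s in scores])
```

Notice that when raw scores bunch together, tiny raw differences translate into big percentile gaps.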

Though I was opposed to the release of this data because of how poorly it measures teacher quality, I was hopeful that when I got my hands on all this data, I would find it useful. Well, I got much more than I bargained for!

In this post I will explain how I used the data contained in the reports to definitively prove: 1) That high-performing charter schools have ‘better’ incoming students than public schools, 2) That these same high-performing charter schools do not ‘move’ their students any better than their public counterparts, and 3) That all teachers add around the same amount of ‘value,’ but the small differences get inflated when converted to percentiles.

In New York City, the value-added score is actually not based on comparing the scores of a group of students from one year to the next, but on comparing the ‘predicted’ scores of a group of students to what those students actually get. The formula to generate this prediction is quite complicated, but the main piece of data it uses is the actual scores that the group of students got in the previous year. This is called, in the data, the pretest.
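As a minimal sketch of that idea (not the city’s actual model, which layers on many more controls), value-added is the gap between what a simple pretest-based regression predicts and what the class actually scored:

```python
import numpy as np

# Minimal stand-in for the value-added idea: predict this year's score
# from last year's (the "pretest"), then call the leftover the teacher's
# "value added." All numbers here are made up for illustration.
pretest  = np.array([2.1, 2.8, 3.0, 3.5, 4.0])   # class averages, prior year
posttest = np.array([2.4, 2.9, 3.2, 3.4, 4.1])   # same classes, current year

slope, intercept = np.polyfit(pretest, posttest, 1)
predicted = slope * pretest + intercept
value_added = posttest - predicted  # positive = class beat its prediction
print(value_added.round(2))
```

One property worth noticing: because the prediction is fit to the same data, the residuals average out to zero, so value-added is always relative to other teachers, never an absolute measure of learning.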

A week after the public school database was released, a similar database for charter schools was also released. Looking over the data, I realized that I could use it to check whether charter schools were lying when they said they took students who were way behind grade level and caught them up. Take a network like KIPP. In New York City there are four KIPP middle schools. They have very good fifth grade results, and their results get better as they go through the different grades. Some of that improvement comes from attrition, though it is sometimes tough to prove this. The statistic that I’ve been chasing ever since I started investigating these things is ‘What were the 4th grade scores for the incoming KIPP 5th graders?’ I asked a lot of people, including some high-ranking KIPP people, and nobody was willing to give me the answer. Well, guess what? The information is right there in the TDR database. All I had to do was look at the ‘pretest’ score for all the fifth grade charter school teachers. I then made a scatter plot for all fifth grade teachers in the city. The horizontal axis is the score that a teacher’s students got at the end of 4th grade and the vertical axis is the score those same students got at the end of 5th grade. Public schools are blue, non-KIPP charters are red, and KIPP charters are yellow. Notice how in the ELA graph nearly all the charters are below the trend line, indicating that they are not adding as much ‘value’ as public schools whose students had similar 4th grade scores.
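The ‘below the trend line’ check is easy to reproduce. Here is a sketch with made-up numbers standing in for the TDR pretest and posttest columns; it fits a trend line and flags which school types fall below it:

```python
import numpy as np

# Hypothetical (pretest, posttest, school type) rows standing in for
# the real TDR data; the real files have one row per teacher.
rows = [
    (-0.6, -0.5, "public"), (-0.2, -0.1, "public"), (0.4, 0.5, "public"),
    (0.1, -0.1, "charter"), (0.2, 0.0, "kipp"),
]
x = np.array([r[0] for r in rows])
y = np.array([r[1] for r in rows])
slope, intercept = np.polyfit(x, y, 1)  # the trend line

# A teacher is "below the trend line" if the class's posttest falls
# short of what its pretest predicts.
below = [kind for px, py, kind in rows if py < slope * px + intercept]
print(below)
```

With the real data you would color each point by school type instead of listing labels, but the arithmetic behind the picture is just this residual check.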

As anyone can see, the fact that all the red and yellow markers are clustered pretty close to the average mark (0 is the 50th percentile) means that charters do not serve the high-needs, low-performing students that they claim to. Also notice that these red and yellow markers are not floating above the cluster of points but sit right in the middle of all the other points, which means they do not ‘move’ their students any more than the public schools do. And the public schools manage this without being able to boot kids into the charter schools.

One other very significant thing I’d like to point out: while I showed earlier that there was very little correlation between a teacher’s value-added scores from one year to the next, the high correlation in this plot reveals that the primary factor in predicting a group of students’ scores in one year is those same students’ scores in the previous year. If there were wide variation in teachers’ ability to ‘add value,’ this plot would look much more random. This graph proves that when it comes to adding ‘value,’ teachers are generally the same. This does not mean that I think there are not great teachers and that there are not lousy teachers. It just means that the value-added calculations are not able to discern the difference.
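The intuition can be checked numerically: when posttest tracks pretest this tightly, the correlation is close to 1 and there is almost no leftover variation for teacher effects to live in. A toy illustration, with made-up class averages:

```python
import numpy as np

# Made-up class averages illustrating the pattern in the scatter plot:
# posttest tracks pretest closely, so the pretest alone predicts almost
# everything, and the per-teacher residual ("value added") is tiny.
pretest  = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
posttest = np.array([-0.9, -0.6, 0.1, 0.4, 1.0])

r = np.corrcoef(pretest, posttest)[0, 1]
print(round(r, 3))  # close to 1 -> little room left for teacher effects
```

If teacher quality varied wildly, the correlation would drop and the cloud of points would fatten; the fact that it doesn’t is the whole argument of this section.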

It seems to me that your analysis also supports the idea that teaching, as an organized activity, has no measurable positive effect on students, and perhaps that this whole “school” thing is not such a hot idea, period…

As a career teacher, I definitely don’t think that school is a bad idea. I do think that these results aren’t measuring much except who the students are.

@sokpuppette that’s false. This data didn’t include students who received no schooling. So you can’t say anything about the relative performance of students with and without schooling.

Actually, since the test at the end of fifth grade (supposedly) tests more advanced material than the test at the end of fourth grade, if teachers had no effect at all, the trend line would go down. What this graph shows is that great students do great, mediocre students continue to do mediocre, and poor students stay poor. But all of them know more than they did the previous year.

I never said the teachers had ‘no effect,’ just that there is not much difference in ‘value added’ from teacher to teacher. That is because the VAST majority of teachers are quite competent. And though the superstars are making a difference in other ways, it is not showing up in their test scores.

Gary,

Are you computing this from the pretest and posttest numbers or from the value-added score?

The value-added score combines the change between pretest and posttest with some controls for demographic characteristics (yes, the equation assumes some kids will learn more than others, even if they started out in different places) and school-level effects. The R-squared between your percentile in change in test score and your value-added percentile is about 0.72 – about 72% of your value-added percentile is explained by your students’ change in test scores, with the remainder determined by their demography and your school. The school-level effect is the strangest one, since presumably the aggregate school effect, once you control for student pretests and (perhaps) other aspects of demography, is almost entirely the result of the teachers in the building, yet they are not getting credit for that.
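(For anyone who wants to replicate this, the check is just a correlation between the two percentile columns – the numbers below are made up for illustration, not the actual TDR values:)

```python
import numpy as np

# Sketch of the check described above: correlate each teacher's
# percentile of raw score change with their value-added percentile,
# then square the correlation to get the fraction of variance explained.
change_pctile = np.array([10, 25, 40, 55, 70, 85])  # illustrative only
va_pctile     = np.array([15, 20, 45, 50, 75, 80])  # illustrative only

r = np.corrcoef(change_pctile, va_pctile)[0, 1]
r_squared = r ** 2
print(round(r_squared, 2))
```

Whatever R-squared comes out, one minus that number is the share of your rating driven by things other than your students’ score changes.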

In this case, though, it’s possible that the school-level effects are working against KIPP 5th grade teachers. This is because if most of the teachers in a school have strong growth in their student scores, it will tend to make the entry-point teachers look worse, particularly if students come in at a lower point. This appeared to happen with the 6th grade teachers at my former, non-charter school.

In any case, it’s worth thinking about the other controls, apart from the pretest-to-posttest change, if you are analyzing the value-added data. One of the equations is reproduced in this article: (http://www.nytimes.com/2011/03/07/education/07winerip.html?pagewanted=all)

Anyway, I’ve appreciated your posts. I think the longer you’ve taught after your corps years, the more ambivalent you are bound to be, both about TFA and about the current moment in education.

Jacob (2000 NYC)

To put it another way, many commentators were observing after the release of the scores that high-performing and low-performing teachers are spread across every school in the city. This may be true, but it is also an artifact of the model, which controls for school effects in such a way that the average value-added for each school will be average, and you can’t have an entire school of below-average or above-average teachers.
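A toy example makes the artifact concrete. If the model in effect subtracts each school’s average from its teachers’ raw gains (a simplification of whatever the real school control does), every school’s adjusted average is zero by construction:

```python
# Toy version of a school fixed effect: subtract each school's mean
# from its teachers' raw gains. By construction every school's average
# adjusted score becomes zero, so no school can be made up entirely of
# above-average or below-average teachers.
raw = {
    "strong_school": [0.6, 0.8, 1.0],
    "weak_school":   [-0.9, -0.7, -0.5],
}
adjusted = {
    school: [g - sum(gains) / len(gains) for g in gains]
    for school, gains in raw.items()
}
print(adjusted)
```

In this toy example the uniformly strong school and the uniformly weak school end up with identical spreads of ‘above’ and ‘below’ average teachers, which is exactly the pattern the commentators noticed.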

Now, you could argue that this is reasonable– teaching in a terrible school is really really hard, and any measure of student growth in the most trying circumstances should be honored. On the other hand, you could argue that schools that are doing a great job are probably doing a great job (in terms of change in student test scores) because of their teachers, not because of some mysterious quality of the air in the school– and that that difference should be reflected in the average value-added for the school.

Great teaching is an area that simply cannot be determined by data. Students make a choice to learn. If you put all the students who don’t want to learn with the best teachers in the world, the teachers may be able to inspire a small percentage of those students to learn; however, if students don’t want to learn…they won’t learn. Our schools are filled with students who have made the conscious choice not to learn.

Administrators, superintendents, and politicians make excuses, saying that these students are disadvantaged and have terrible teachers in their schools. That simply is not true. Socioeconomic status does not determine the desire to learn; the desire comes from within a child. In addition, there are a lot of students who come from middle- to upper-income families who don’t want to learn either. However, if parents do not value education, their children will most likely not value education either. Absenteeism is another huge problem. How can a student learn if he is absent 68 days during the school year? In some of our poorest school districts, truancy is not enforced.

The best teachers will not be able to solve these problems; there is no teaching strategy that will magically solve them. We are throwing money at something that will not be changed by a faculty of great teachers. When are we finally going to wake up and really see the problem? We must rebuild our education system from the ground up…not put band-aids on the hurt areas. Teachers are not to blame for students who don’t want to learn. We need to let students make choices about their education after the 7th or 8th grade. Students need to take ownership of their secondary education. Only then will we see an improvement in our system. Every student is not college material. We still need plumbers, electricians, construction and factory workers. We need to rethink this carefully before it is too late.

Gary, you have once again hit one out of the park! I have to remember to send you a sample formula of the model used in our state. The actual model which is being used is apparently “top secret” and has not been released! At any rate, your work is very useful in getting out the message about the foolish efforts of some “reformers” to destroy public education unless it is in their financial interest!

This is why I’ve always compared VAMs to CDOs in their destructive capability and that no one really understood how damaging they could be…until now. Thanks for this great series.


Excellent work again, Gary!!

Kudos!

Value-added data is so flawed, it’s own creators have backed away from it as being used the way it’s poised to be used. The Union’s about to agree to evaluations where it can potentially account for 100% of a teacher’s evaluations. Some fellow teachers and I just started a petition to raise awareness about this: http://www.change.org/petitions/stop-the-public-shaming-and-unjust-firing-of-teachers. There have been over 700 signatures in just one day. It’s heartening to see how many people agree.

*its own creators

Thank you Gary.

There are lies, damned lies, and statistics. I am not saying you are wrong. I am saying that neither you nor the value-added people can tell much from the data presented here. You are making unsupportable assumptions, taking data out of context, and drawing the conclusions you want to reach.

A few years ago I had occasion to talk with the researcher (from Rutgers) who actually did the ice core studies in Greenland that are the basis for the giant British climate change models. He told me that despite all his protestations, all kinds of people have taken his data to ‘prove’ one thing or another about climate change. HE cannot support a majority of the conclusions other researchers have drawn from his data. Their models all start with assumptions that, when taken together with the data, prove the conclusion that the researchers want to reach.

His point is NOT that there is not climate change – because objectively, there is. His point is that you can’t prove the cause of the climate change.

The same thing applies to your assertion that the data show that charter schools don’t work or that teacher evaluations are invalid. You don’t know.

Charters taking students with high pretest scores is a fact, not really something I had to infer by interpreting statistics. The data proves it, not my analysis.

As noted above, the data prove nothing. What school were the students attending the prior year? Did they take the same tests? If they were in a public school 4 years ago, then changed, and so have been in the charter school for 4 years, then (if charter schools really are good) their scores should already have improved and you would expect to see less improvement. Again, I am not saying you are wrong, I am just saying that the data you cite do not support the conclusions you draw. i.e., you don’t know.

Am I looking at this wrong? I’m really fascinated by what could be going on in schools in the 2nd quadrant.

Thinking more on it…by using the difference between predicted and raw score for each grade, the score can fall into a Bayesian trap…and/or the prediction part of the 5th grade score (using 4th grade numbers) has to be carefully tuned, to avoid creating recursive oscillations that get embedded and continue for the remaining grades.