The New York City teacher evaluation data is taking a beating for its inaccuracy. As I expected, this data has not stood up to the scrutiny of the public or even the media. Value-Added is proving to be the Cathie Black of mathematical formulas.
A teacher’s Value-Added score is a number between about -1 and 1. That score represents the number of ‘standard deviations’ a teacher’s class has improved from the previous year’s state test to the current year’s state test. One standard deviation is around 20 percentile points. After the teacher’s score is calculated, the -1 to 1 is converted to a percentile rank between 0 and 100. These are the scores you see in the papers where a teacher is shamed for getting a score in the single digits.
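To make the percentile conversion concrete, here is a minimal sketch in Python. The scores are made up, and `percentile_rank` is a hypothetical helper using one common definition (the share of scores strictly below a given score). It is not the formula the city actually used.

```python
def percentile_rank(score, all_scores):
    """Percent of scores in all_scores that fall strictly below score."""
    below = sum(1 for s in all_scores if s < score)
    return 100.0 * below / len(all_scores)


# Made-up value-added scores in standard deviations (not real data).
# They span only 0.2 standard deviations from lowest to highest.
scores = [-0.10, -0.05, 0.00, 0.05, 0.10]

ranks = [percentile_rank(s, scores) for s in scores]
print(ranks)  # [0.0, 20.0, 40.0, 60.0, 80.0]
```

In this toy example the raw scores sit within a narrow band, yet the percentile ranks spread from 0 to 80, because a rank only records order, not the size of the gap.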
Though I was opposed to the release of this data because of how poorly it measures teacher quality, I was hopeful that when I got my hands on all this data, I would find it useful. Well, I got much more than I bargained for!
In this post I will explain how I used the data contained in the reports to definitively prove: 1) That high-performing charter schools have ‘better’ incoming students than public schools, 2) That these same high-performing charter schools do not ‘move’ their students any better than their public counterparts, and 3) That all teachers add around the same amount of ‘value,’ but the small differences get inflated when converted to percentiles.
In New York City, the value-added score is actually not based on comparing the scores of a group of students from one year to the next, but on comparing the ‘predicted’ scores of a group of students to what those students actually get. The formula to generate this prediction is quite complicated, but the main piece of data it uses is the actual scores that the group of students got in the previous year. This is called, in the data, the pretest.
A week after the public school database was released, a similar database for charter schools was also released. Looking over the data, I realized that I could use it to check whether charter schools were lying when they said they took students who were way behind grade level and caught them up. Take a network like KIPP. In New York City there are four KIPP middle schools. They have very good fifth grade results and their results get better as they go through the different grades. Some of that improvement comes from attrition, though it is sometimes tough to prove this. The statistic that I’ve been chasing ever since I started investigating these things is ‘What were the 4th grade scores for the incoming KIPP 5th graders?’ I asked a lot of people, including some high-ranking KIPP people, and nobody was willing to give me the answer. Well, guess what? The information is right there in the TDR database. All I had to do was look at the ‘pretest’ score for the fifth grade classes at all the charter schools. I then made a scatter plot for all fifth grade teachers in the city. The horizontal axis is the score that group of students got at the end of 4th grade and the vertical axis is the score that group of students got at the end of 5th grade. Public schools are blue, non-KIPP charters are red, and KIPP charters are yellow. Notice how in the ELA graph, nearly all the charters are below the trend line, indicating that they are not adding as much ‘value’ as public schools whose students had similar 4th grade scores.
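The "below the trend line" reading of the scatter plot can be sketched in a few lines of Python. This assumes the trend line is an ordinary least-squares fit of 5th grade score on 4th grade score. The numbers are invented for illustration, and `fit_line` and `residuals` are hypothetical helper names, not anything from the TDR database.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line through the points (xs[i], ys[i])."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals(xs, ys, slope, intercept):
    """Actual score minus the trend-line score for each point."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


# Invented class averages in standard deviations (not real data).
pretest = [-1.0, -0.5, 0.0, 0.5, 1.0]   # end of 4th grade
posttest = [-0.9, -0.6, 0.1, 0.4, 1.0]  # end of 5th grade

slope, intercept = fit_line(pretest, posttest)
res = residuals(pretest, posttest, slope, intercept)

# Classes with a negative residual sit below the trend line.
below_line = [i for i, r in enumerate(res) if r < 0]
print(below_line)
```

A class below the line scored lower in 5th grade than the trend would suggest for its 4th grade starting point, which is how the plot is being read here.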
As anyone can see, the fact that all the red and yellow markers are clustered pretty close to the average mark (0 is the 50th percentile) means that charters do not serve the high-needs, low-performing students that they claim to. Also notice that since these red and yellow markers are not floating above the cluster of points but sit right in the middle of all the other points, they do not ‘move’ their students any more than the public schools do. And the public schools manage this without being able to boot kids into the charter schools.
One other very significant thing I’d like to point out is that while I showed there was very little correlation between a teacher’s value-added gains from one year to the next, the high correlation in this plot reveals that the primary factor in predicting a group of students’ scores in one year is the scores of those same students in the previous year. If there were a wide variation between teachers’ ability to ‘add value,’ this plot would look much more random. This graph proves that when it comes to adding ‘value,’ teachers are generally the same. This does not mean that I think there are not great teachers and that there are not lousy teachers. This just means that the value-added calculations are not able to discern the difference.
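For anyone who wants to put a number on "high correlation," here is a rough sketch of the Pearson correlation coefficient in Python. The data are invented, and `pearson_r` is a hypothetical helper. The actual figure for the city's data would have to be computed from the released database.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


# Invented class averages in standard deviations (not real data).
last_year = [-1.0, -0.5, 0.0, 0.5, 1.0]
this_year = [-0.9, -0.6, 0.1, 0.4, 1.0]

r = pearson_r(last_year, this_year)
print(round(r, 3))
```

A value near 1 means last year's scores predict this year's scores almost perfectly, leaving little room for anything else, the teacher included, to show up in the numbers.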