In the previous post I demonstrated how the New York City progress reports are distorted by an inappropriate use of ‘standard deviation’ that unfairly punishes schools scoring slightly below average on certain metrics when all the scores are very close together. One such calculation gave P.S. 84 a .8 out of a possible 5 points for its 92.8% attendance while a school with 94.5% would get over 2.5 points and a school with 97.6% would get all 5 points. All twenty-six calculations that make up the progress report are first filtered through this distortion.

There are three categories in the progress report. Fifteen points are for ‘environment,’ which includes attendance and teacher and parent surveys. Twenty-five points are for ‘student performance,’ which is based on state test scores. The biggest component is called ‘student progress.’ It is also based on state test scores and is worth 60 of the 100 points. Almost all schools that got an overall F also got an F in ‘progress,’ so the purpose of this post is to examine how New York City defines this metric and why it is an invalid one.

At a surface level, it seems very fair to have the majority of the progress report grade based on ‘progress.’ The premise is that if a school, regardless of where the students were academically when they began the school year, makes ‘progress’ with those students, they should get a good ‘progress’ grade and if they fail to, they should get a low ‘progress’ grade.

Unfortunately the way New York City defines ‘progress’ is very different from most people’s intuitive definition. When you look at the 2010 progress report for P.S. 84, you learn that they got an ‘F’ in performance that year. This was based on their four state statistics: They had 31.6% proficient in ELA with a median score of 2.63 out of 4.5 (3 is considered ‘proficient’) and 41.1% proficient in math with a median score of 2.85.

When you look at the 2011 progress report, you find that P.S. 84 made improvements in all four of these statistics so their ‘performance’ grade actually went up to a D. They went up by 9.8% in ELA to 41.8%. Their average ELA went up by .16 to 2.79. They went up by 7.8% in math to 48.9% and up by .14 in their average for math to 2.96. So by the most basic definition any reasonable person would have to admit that this is a school that is making progress.

Compare this ‘progress’ with P.S. 193, one of the 40 schools in P.S. 84’s ‘peer group,’ the schools it competes with because they have ‘similar’ demographics. P.S. 193 went down in all four of those categories from 2010 to 2011. They dropped by 4.5% from 66.9% to 62.4% in ELA percent proficiency. They dropped by .15 from 3.21 to 3.06 in average ELA score. They dropped by 4.2% in math proficiency from 72.4% to 68.2% and by .18 from 3.52 to 3.34 in average math score.

Yet, when you look at their ‘progress’ scores, paradoxically, P.S. 193 with all its losses received a C in progress while P.S. 84 with all its gains received an F.

Learning how this could be so took me on a scavenger hunt through the internet reading different documents and several e-mails from a surprisingly helpful DOE employee. What I found is quite disturbing and it’s STILL not the biggest flaw in the progress report calculation. That one you’ll have to wait for as it will be discussed in part III.

**Flaw #2: Unreliable ‘Value Added Model’ for ‘progress’ that unfairly benefits schools with higher baseline scores.**

So the ‘progress’ score is not based, at all, on how much the students in a school have improved from last year to this year. So what is it based on?

Here’s how it works:

For each student in a school the city looks at that student’s score from last year. Then after the new test happens they compare that student’s new score to the new scores for all the students in the city that had that same starting score. The percent of students with the same starting score that this student ‘beat’ or tied is that student’s ‘growth’ percent. For example, if a student had a 2.7 in ELA last year and then a 2.8 in ELA this year, they figure out what percent of students who had a 2.7 last year got less than or equal to 2.8. Let’s say that they beat 60% of those kids. That student would have a 60% ‘growth.’ This is then done for every student in the school. Those scores are then sorted from high to low and the median (middle) score becomes the progress percent for ELA for the school. That percent is then converted into another percent based on what I described in the last post, how many standard deviations it is from the median. Finally, that percent is converted into a number from 0 to 15. This is then done for math, then for the lowest 1/3 of the students in ELA, and then for the lowest 1/3 of the students in math, giving four scores of up to 15 each for a maximum total of 60.
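The first part of that description (before any standard deviation rescaling) can be sketched in a few lines of Python. This is only an illustration of the steps as I described them, not the DOE's actual code: the function names `growth_percent` and `school_growth` and the five citywide scores are made up, and I am assuming the student's own score is counted among the scores it is compared against.

```python
from statistics import median


def growth_percent(prev_score, new_score, citywide):
    """Percent of same-starting-score students this student beat or tied."""
    # Peers: every student in the city with the same score last year
    # (assumption: this includes the student being scored).
    peers = [new for prev, new in citywide if prev == prev_score]
    beaten_or_tied = sum(1 for new in peers if new <= new_score)
    return 100.0 * beaten_or_tied / len(peers)


def school_growth(students, citywide):
    """Median growth percent over a school's (last year, this year) pairs."""
    return median(growth_percent(prev, new, citywide) for prev, new in students)


# Hypothetical citywide results for students who all scored 2.7 last year.
citywide = [(2.7, 2.5), (2.7, 2.6), (2.7, 2.8), (2.7, 2.9), (2.7, 3.0)]
print(growth_percent(2.7, 2.8, citywide))  # 60.0
```

With these invented numbers, a student who went from 2.7 to 2.8 beat or tied three of the five students who started at 2.7, which reproduces the 60% ‘growth’ from the example above.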

The question, then, is what should be considered a ‘good’ score for this stat. Well, it seems like if the middle kid got better than around 50% of other kids with the same starting score, that would be a good score. In my mind, anything between 40% and 60% seems reasonable. But the DOE isn’t concerned about what a ‘good’ score is. They just want to compare schools with their ‘peers’ and punish anyone who is one or two standard deviations below the mean, regardless of how ‘good’ their score is.

The idea of judging on progress instead of absolute scores is a reasonable idea. And this definition of ‘progress’ does sound reasonable at first. I don’t think the DOE deliberately created a statistic that would be so invalid. I think they are just ignorant about statistics and they also didn’t take the time to fix their metrics by finding schools that get Fs they don’t deserve and trying to learn what happened.

Now for the paradox of P.S. 193’s C and P.S. 84’s F. P.S. 193 had an ELA ‘growth’ of 64%, meaning that even though the median student scored lower on the 2011 test than on the 2010 test, 64% of the students who had the same starting score had gone down by even more. For ELA bottom 1/3, they were 63%. For P.S. 84, the numbers for ELA were 61% and 68% for bottom 1/3. So P.S. 84 did a little worse at ‘growing’ their students overall, but getting over 60% still seems pretty good. They also did better at ‘growing’ their bottom 1/3. But with the standard deviation game explained in the last post, their total score for ELA, out of a possible 30, turned out to be 6.29 vs. 7.49 for P.S. 193, even though P.S. 193 lost ground in both categories. The main difference between P.S. 84 and P.S. 193 is that P.S. 193 had a much higher average score on the tests than P.S. 84, so they were able to make ‘progress’ by going down less than their competition.

But the place where P.S. 84 really got creamed was with their math. Though they increased their percent of student proficiency in math, their middle student only ‘beat’ 45% of the students who had the same starting score as him. The middle student of their bottom 1/3 only ‘beat’ 55% of the students who had the same starting score as him. Both of these numbers seem reasonable enough: considering what the metric means, anything around 50% should be considered OK. But with the standard deviations, that 45% earned P.S. 84 a .04 out of 15. So even though they increased their math proficiency percentage and their math average, their middle student only ‘beat’ 45% of the competition, and that earned them almost nothing. Putting all four scores together, P.S. 84 only managed to get 8.2 points out of 60.

Now P.S. 193 managed to get a C because their middle student’s losses weren’t as bad as the losses of the competition. What seems unfair is that a school with a higher starting score, like P.S. 193, seems to get a break on their losses while P.S. 84 gets destroyed despite their gains.

Though this growth is, in theory, totally independent of the starting value, I did a little experiment. I made a scatter plot of starting score on ELA vs. ELA growth. If this ‘growth’ was really completely independent of the starting score, I would expect to see a big blob of dots with no visible correlation. What I got instead was this:

So there seems to be some correlation where the higher starting scores are able to get more ‘growth’ over their competition. Though this stat was designed so that starting score had nothing to do with the amount of ‘growth,’ based on this scatter plot, it seems that higher starting scores have an unfair advantage when it comes to ‘growth.’ When I asked the guy in the DOE about how so many high scoring schools managed to also get 70 or more percent ‘growth,’ he said it is affected somewhat by the fact that there are a lot more ‘ties’ with higher starting scores (since there are fewer grades for it to grow to) and the growth percent is what percent you did better than or equal to.
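The effect of ties is easy to see with a toy example. In the sketch below, both lists of new scores are invented, and the ‘beat or tied’ rule is written the way it was explained to me, so treat it as an illustration of the mechanism rather than a reproduction of the DOE's calculation. One group of students ends up spread over many different scores, and the other is bunched onto just two scores near the top of the scale.

```python
from statistics import median


def growth_percents(new_scores):
    """Growth percent for each student in a group that shared a starting score."""
    n = len(new_scores)
    # "Beat or tied": count every score less than or equal to the student's own.
    return [100.0 * sum(1 for other in new_scores if other <= s) / n
            for s in new_scores]


# Hypothetical new scores for two groups of eight students.
spread = [2.0, 2.2, 2.4, 2.6, 2.8, 3.0, 3.2, 3.4]    # no ties
bunched = [4.3, 4.3, 4.3, 4.3, 4.5, 4.5, 4.5, 4.5]   # many ties

print(median(growth_percents(spread)))   # 56.25
print(median(growth_percents(bunched)))  # 75.0
```

Nobody in the bunched group can score below 50% because every student ties with at least half the group, so the middle student lands at 75% without doing anything differently from the middle student in the spread-out group.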

To me, this ‘progress’ grade is not a reliable enough statistic to count for 60% of a school’s grade. There is the underlying assumption that every kid who got a 2.7 on ELA is exactly the same. But what if one school’s 2.7 is a kid who had just increased his score from 2.0 the year before while another 2.7 from another school had only risen from a 2.5 the year before? Can they be expected to get the same gains the next year? And what if the teacher the year before changed the answers of my 2.7 kid so he really didn’t get a 2.7? Or what if the other schools I’m competing against change some answers? There are so many things wrong with this way of doing the growth stat.

And for a score of 45% to be scaled down to .04 points out of 15 is crazy. 45% is a very reasonable number of kids with the same starting score to ‘beat.’ Especially if everyone is progressing, schools shouldn’t be penalized for not progressing as fast as others.

So that explains (and I hope I did an OK job) how the ‘student progress’ score is calculated and why it is not a very valid statistic, particularly after running it through the standard deviation distorter.

In the next post I will tackle what could be the biggest flaw of all in this progress report calculation. I’m going for the knockout punch. Stay tuned.

Just eyeballing it, it seems like if a school had low scores last year, they’ve got an opportunity for big growth, and in particular, higher scores seem to limit the floor on growth. The middling schools have the most opportunity to get poor growth scores.

Gary, I’ve been glued to these posts since #1 and am currently in the middle of #2. I advocate for many “F” High Schools, I’m assuming the grading metric is the same for H.S.? Like Columbus, Gompers, Lehman, etc.

At this moment I understand better/more, but admit to not being all the way there. You’re helping me understand the un-understandable. Thank You So Very Much…

Thank you so much for writing these!

There is also a MAJOR difference between a kid who got a 2.7 and that was the very best they could do and a kid who got a 2.7 and that was the very worst they could do. Having witnessed at this point 10 years of state testing, I’ve seen the gamut — kids who walk in and for whatever reason do better on that test than they EVER did in class…and also kids who walk in to take the test having just had a car accident…a whole class where a mouse ran across our room right before I was about to say “Begin” and I had to coax the kids off the desks first…and so on. I realize that probably it’s not a HUGE percentage of kids who are having the best or worst days of their lives on test day, but it’s some, and with the detail level of these calculations, it seems that even the small things have crazily huge impacts.

And congratulations…this may be the only time I’ve voluntarily read not one but THREE posts that contain multiple statistical calculations and a scatter plot 🙂