I was inspired to get to the bottom of the New York City school progress report grades after reading this story from the New York Times Schoolbook website about P.S. 84, which was one of the thirty F-rated schools this year despite seeming to be a very good school.
To understand and analyze the accuracy of the 26 calculations that go into the final score, from 0 to 100, which then gets translated into a letter grade of A, B, C, D, or F, requires a math major, which, fortunately, I was.
I’ve learned, and will attempt to fully explain in this and future posts, three major flaws in the system that make the progress score completely invalid. Two things I hope to accomplish with this are 1) To make the staff, students, and parents of students at P.S. 84 feel better and have some clear explanations so they know how this happened and how they might (or might not) be able to stop it from happening again, and 2) To let the media know about this invalid metric which has been, and continues to be, used to shut down ‘failing’ schools to make room for charter schools.
I’ve made a name for myself debunking ‘miracle’ schools by showing they are not as great as they claim to be, so defending a ‘failing’ school is an unusual role for me. (Is this ‘bunking’ or ‘rebunking’?) Still, I use the same tools: in this case, the actual report cards and the database available on the DOE website.
Flaw #1: Assuming that ‘two standard deviations below the mean’ is a lot worse than ‘average’
The school report card is based on thirteen categories, which have a maximum total of 100 points. Then the bottom 3% of schools, regardless of how high their final scores are, are assigned Fs. This 3% is determined before the calculations are done. There will be about 30 Fs. For 2010-2011, the bottom 3% all got under 18 points out of 100. These numbers are so low that no school could argue that they were cheated. I mean, an 18 out of 100? They should be ashamed of themselves, right?
| Name | Total Points Possible |
|---|---|
| ELA Progress bottom 1/3 | 15 |
| Math Progress bottom 1/3 | 15 |
| ELA Percent Proficient | 6.25 |
| ELA Average Score | 6.25 |
| Math Percent Proficient | 6.25 |
| Math Average Score | 6.25 |
| Safety and Respect | 2.5 |
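To see what a quota fixed in advance means in practice, here is a minimal Python sketch of the bottom-3% rule described before the table. The function name `schools_given_f` and the 100 school totals are invented for illustration; they are not taken from the DOE database. The point is only that a fixed quota hands out the same number of Fs even in a year when every school scores 80 or higher.

```python
def schools_given_f(scores, fraction=0.03):
    """Return the names of the lowest-scoring `fraction` of schools.

    `scores` maps a school name to its total points (0 to 100).
    The quota depends only on how many schools there are, not on
    how high or low their totals turn out to be.
    """
    quota = round(len(scores) * fraction)
    ranked = sorted(scores, key=scores.get)  # lowest total first
    return ranked[:quota]


# Hypothetical totals: 100 made-up schools scoring from 80.0 up to 99.8.
strong_year = {f"School {i}": 80 + 0.2 * i for i in range(100)}

print(schools_given_f(strong_year))
```

Three of these hypothetical schools still receive an F, because the rule ranks schools against each other rather than against any target.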
Five of the 100 points are based on attendance. When I looked at the progress report for P.S. 84, I saw that they had 92.8%. Not bad. But when that score got converted to a number between 0 and 5, I was shocked. 92.8% attendance translated to 18.4% of the total, which was a .92 (that’s point nine two) out of 5. So they got an ‘F’ in attendance.
The reason this score is so low is that the system does not care whether the school reached some kind of acceptable number. The goal is to locate and punish the bottom 3% of schools no matter how good they are, so the metric serves to exaggerate percentages that are below average. So a 92.8% attendance becomes an 18.4% score when they are through with it.
Here’s how the score was calculated: First they calculated the average attendance rate for the entire district and also for the 40 ‘peer’ schools related to P.S. 84. ‘Peer’ schools are the ones that supposedly have similar demographics so schools are judged against other schools with similar kids and also against all schools.
For all schools, the average was 93.6% while for the 40 ‘peer’ schools, the average was 94.5%. So the 92.8% is a bit below the average school, and a bit more below the average of their peers. So how does this turn into an 18.4% out of 100% for attendance?
Well, there’s a statistic in math called the ‘standard deviation.’ This is a measure of how spread out the numbers in a data set are. The closer together the numbers are, the smaller the standard deviation. If I have a class and everyone gets a 90 on a test, the average is a 90 while the standard deviation is 0. If there are a lot of 100s and a lot of 80s, the average can still be a 90, but the standard deviation will be higher, maybe a number like 5. So in the second scenario with the standard deviation of 5, what can we say about a score like 85? Well, since it is 5 points below the mean of 90, we say that it is ‘one standard deviation below the mean,’ while 80, since it is 10 points (or 2*5 points) below, is ‘two standard deviations below the mean.’
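To make the two classroom scenarios concrete, here is a small Python sketch. The score lists are made up for illustration (they do not come from any report card) and are chosen so that both classes average 90 and the second class has a standard deviation of 5. The helper name `deviations_from_mean` is also just an illustrative choice.

```python
from statistics import mean, pstdev

# Hypothetical test scores, invented for illustration.
uniform_class = [90] * 16                        # everyone gets a 90
spread_class = [80, 80, 100, 100] + [90] * 12    # some 80s and 100s mixed in


def deviations_from_mean(score, scores):
    """How many standard deviations `score` sits above (+) or below (-)
    the mean of `scores`, using the population standard deviation."""
    return (score - mean(scores)) / pstdev(scores)


# Same average in both classes, very different spread.
print(mean(uniform_class), pstdev(uniform_class))   # mean 90, standard deviation 0
print(mean(spread_class), pstdev(spread_class))     # mean 90, standard deviation 5

# In the spread-out class, 85 is one standard deviation below the mean
# and 80 is two standard deviations below it.
print(deviations_from_mean(85, spread_class))
print(deviations_from_mean(80, spread_class))
```

Note that the helper cannot be used on the first class at all: with a standard deviation of 0 there is nothing to divide by.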
The phrase ‘two standard deviations below the mean’ sounds like something that is always bad, but really it is just relative to how big the standard deviation is. If it is a small number, like 1, then it is just the same thing as 2 below the mean, and isn’t all that different from the mean or even from the exalted ‘two standard deviations’ above the mean.
In the attendance example, the standard deviation for the peer schools was 1.1 while the standard deviation for all schools was 1.9. So for the peer schools, the 92.8% was about one and a half standard deviations below the mean (1.7 points divided by 1.1), while for all schools it was a bit less than half a standard deviation below (0.8 points divided by 1.9). Big deal, right? Well, actually, for the conversion to the five point scale it is. You see, when you are 2 standard deviations below the mean, you get scaled to 0%. 1 standard deviation below the mean is scaled to 25%. At the mean is scaled to 50%, 1 standard deviation above the mean is scaled to 75%, and 2 standard deviations above the mean is scaled to 100%, with values in between scaled proportionally. For the peer group, this made the 92.8% become 11.4%, and for all schools it became 39.5%. Then the peer percent is multiplied by 3, added to the other percent, and divided by 4 (the peer comparison is 75% of the score and the other is 25%) to get 18.4%, which is then multiplied by 5 points to get .92.
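The attendance arithmetic can be reproduced in a few lines of Python. This is a sketch based on my reading of the numbers above, not the DOE’s own code: the function names are mine, the straight-line scaling between the anchor points is an assumption (it does reproduce the published 11.4% and 39.5%), and cutting off at 0% and 100% beyond two standard deviations is also an assumption.

```python
def scaled_fraction(value, mean, std_dev):
    """Map a raw value onto 0..1 so that mean - 2 SD -> 0, mean -> 0.5,
    and mean + 2 SD -> 1, with a straight line in between (assumed)."""
    z = (value - mean) / std_dev
    fraction = 0.5 + 0.25 * z
    # Assumption: anything beyond two standard deviations is capped.
    return max(0.0, min(1.0, fraction))


def category_points(value, peer_mean, peer_sd, city_mean, city_sd, max_points):
    """Combine the peer comparison (75%) and the all-schools comparison (25%)."""
    peer = scaled_fraction(value, peer_mean, peer_sd)
    city = scaled_fraction(value, city_mean, city_sd)
    combined = 0.75 * peer + 0.25 * city
    return peer, city, combined, combined * max_points


# P.S. 84 attendance figures quoted in this post.
peer, city, combined, points = category_points(
    value=92.8,
    peer_mean=94.5, peer_sd=1.1,
    city_mean=93.6, city_sd=1.9,
    max_points=5,
)
print(f"peer {peer:.1%}, city {city:.1%}, combined {combined:.1%}, points {points:.2f}")
# prints: peer 11.4%, city 39.5%, combined 18.4%, points 0.92
```

Changing only `peer_sd` from 1.1 to, say, 3 lifts the same 92.8% well above 18.4%, which shows how much the result depends on how tightly the other schools happen to be bunched.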
So what this type of calculation does is turn anything below average, even if it is just a little below average, into something that seems like it is way below average. It then makes it a lot easier to justify the F in attendance. 18.4% sounds a lot worse than 92.8%.
This, believe it or not, is what’s done with all thirteen calculations. None are based on some kind of absolute score that signals that a school met some kind of target. Everything is compared to the average, and the schools that are two standard deviations below, with no consideration of how small those standard deviations might be, are slammed with failing grades in that category.
Another extreme example for P.S. 84 is ‘Academic Expectations’ where the tiny standard deviation of .5 caused them to get just .36 points out of 2.5 possible because the peer average and total school averages were 7.9 and 8.1 respectively, while P.S. 84 had gotten a 7.1. So even though they were very close to the average (and the high scores) on this ambiguous metric based on voluntary parent and teacher surveys, they lost valuable points that could have prevented them from getting that F. Only half the parents responded to the survey. It was one of those 5 point scale surveys and nearly all the parents said they either agreed, or strongly agreed that the school had high academic expectations.
Punishing schools that are ever so slightly below average by turning raw scores that are so obviously close to the mean into scores in the teens or single digits, making them feel like complete failures, is an awful thing to do and also terrible for morale. Imagine if I, as a teacher, scaled my tests this way. A kid who got a 93% gets it turned into an 18% just because everyone did pretty well and the scores were so close together. This is crazy, and, believe it or not, this is only the first and most benign reason that the progress scores are mathematically invalid.
I will examine two other ways in my next two posts coming soon.