## The Vindication Of P.S. 84 Part III

In the first two parts of this series I examined how the New York City progress report system is less about measuring a school’s quality and more about comparing schools. The number of comparisons that go into the ‘progress’ score is staggering. Students are compared to other students with the same starting score. Students within the same school are compared to find the student with the middle ‘progress’ score, which becomes the school’s ‘progress’ percent. Schools’ progress percents are then compared so that schools with below-average ‘progress’ are penalized harshly, even if their scores aren’t very much below average, because they are ‘two standard deviations below the mean.’

One problem with school comparisons is that it might not be fair to compare two schools with completely different demographics. The DOE realized this, which is why each calculation is done twice. Once, counting for 25%, the school is compared against all schools. But to make it ‘fair,’ 75% of the score is based on comparisons to just 40 schools known as the school’s ‘peer group.’ So a school that has obstacles of poverty and special needs students will get compared to other schools with similar demographics to make up the majority of each score. Sounds fair enough, right?
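To make the 25% / 75% split concrete, here is a minimal sketch. It assumes the two comparisons produce results on the same 0 to 100 scale and are combined as a simple weighted average, which is my reading of the description and not something taken from the DOE's documentation. The function name and the sample numbers are hypothetical.

```python
def blended_score(citywide, peer):
    """Combine a citywide comparison and a peer-group comparison.

    Assumption: both inputs are on a 0-100 scale and are blended
    linearly, 25% citywide and 75% peer group.
    """
    return (25 * citywide + 75 * peer) / 100


# Hypothetical school: decent against the whole city, weak against its peers.
print(blended_score(citywide=60, peer=20))  # 30.0
```

Because the peer comparison carries three times the weight, a school that looks fine citywide still ends up with a low blended result if its peer group is a bad match.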

In theory this is a good idea.  It requires, though, a very accurate way to quantify the demographics of a school in a single number so the peer groups can be formed.  This number is called, in New York City, the ‘Peer Index’ and it is a number between 0 and 100, which I’ll describe in more detail later.  If this calculation is not statistically sound, a school may get placed into an inappropriate peer group and then get punished harshly as it can’t compete.  This leads me to the third, and most critical, flaw in the progress report calculation:

Flaw #3:  Oversimplified ‘Peer Index’ calculation causes some schools to be placed into an inappropriate peer group making an ‘F’ almost certain.

The entire system is compromised if the DOE has an unreliable way of calculating the peer index. There are so many sophisticated calculations throughout the process that I’m amazed the vital peer index calculation is simply: % eligible for free lunch * 30 + % students with disabilities * 30 + % Black / Hispanic * 30 + % English Language Learners * 10, where each percent is taken as a fraction between 0 and 1 so that the total falls between 0 and 100.

Now, if two schools have exactly the same demographics, they will have the same peer index and be in the same peer group.  But it is also possible to be in the same peer group with different demographics if the numbers add up the same.
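A minimal sketch of the formula, assuming the four percentages are passed in on a 0 to 100 scale. The function name and the two schools are hypothetical and only illustrate how different demographics can add up to the same index.

```python
def peer_index(free_lunch, disabilities, black_hispanic, ell):
    """Peer index from four percentages, each given on a 0-100 scale."""
    weighted_sum = (30 * free_lunch + 30 * disabilities
                    + 30 * black_hispanic + 10 * ell)
    return weighted_sum / 100


# Two hypothetical schools with free lunch and Black/Hispanic swapped.
school_x = peer_index(free_lunch=20, disabilities=0, black_hispanic=60, ell=0)
school_y = peer_index(free_lunch=60, disabilities=0, black_hispanic=20, ell=0)
print(school_x, school_y)  # 24.0 24.0
```

Since free lunch and Black / Hispanic carry the same weight, swapping them never changes the index, so these two very different schools would be treated as peers.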

For my examples, I’m going to assume 0% disabilities and 0% ELL, since those numbers are generally low in relation to the other two. So if a school has 100% free lunch and 100% Black / Hispanic, it would have a peer index of 60. A school that has 25% free lunch and 75% Black / Hispanic would have a score of 30, as would a school with 75% free lunch and 25% Black / Hispanic. Now one has to ask if this is the ideal way to calculate these peer indices. Why is 30% of the score the Black / Hispanic percent and 30% the free lunch percent? Why not 50% Black / Hispanic and 10% free lunch?

To see the difficulty of this, ask yourself this:  School A has 70% Black / Hispanic and 40% free lunch.  School B has 95% Black / Hispanic.  What percent of free lunch should school B have to make these two schools ‘equivalent’ and comparable?  Is it definitely 15%, as the DOE assumes?
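The 15% figure can be checked by solving the peer index formula for the free lunch percent. This is a rough sketch that assumes both schools have the same disabilities and ELL percentages (zero here), so only the two 30-point factors matter. The function name is hypothetical.

```python
def free_lunch_to_match(target_index, black_hispanic, disabilities=0.0, ell=0.0):
    """Free lunch percent (0-100) a school needs to reach target_index."""
    # Points already contributed by the other three factors.
    other_points = (30 * disabilities + 30 * black_hispanic + 10 * ell) / 100
    # Free lunch is worth 30 points at 100%, so invert that weight.
    return (target_index - other_points) * 100 / 30


# School A: 40% free lunch, 70% Black/Hispanic.
school_a_index = (30 * 40 + 30 * 70) / 100
needed = free_lunch_to_match(school_a_index, black_hispanic=95)
print(school_a_index, needed)  # 33.0 15.0
```

The formula trades one point of Black / Hispanic percentage for exactly one point of free lunch percentage, and that one-for-one trade is the assumption being questioned.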

Well, when it comes to P.S. 84, they had 53.7% free lunch, 15.6% with disabilities, 73.4% Black/Hispanic, and 9.3% ELL for a peer index of 43.73. P.S. 095 in their peer group (who got a B) had 77% free lunch, 9.6% disabilities, 51.2% Black/Hispanic and 33.6% ELL for a 44.70 peer index. The two schools ended up with nearly the same index, but with the Black/Hispanic and free lunch percents roughly swapped.
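Breaking each index into its four pieces shows where the points come from. This sketch uses the rounded percentages quoted above, which is presumably why P.S. 84 comes out at about 43.74 here instead of exactly 43.73. The dictionary layout and function name are my own.

```python
WEIGHTS = {"free_lunch": 30, "disabilities": 30, "black_hispanic": 30, "ell": 10}

schools = {
    "P.S. 84": {"free_lunch": 53.7, "disabilities": 15.6,
                "black_hispanic": 73.4, "ell": 9.3},
    "P.S. 095": {"free_lunch": 77.0, "disabilities": 9.6,
                 "black_hispanic": 51.2, "ell": 33.6},
}


def contributions(percents):
    """Points each factor adds to the peer index (percents on a 0-100 scale)."""
    return {factor: WEIGHTS[factor] * percents[factor] / 100 for factor in WEIGHTS}


for name, percents in schools.items():
    parts = contributions(percents)
    rounded_parts = {factor: round(points, 2) for factor, points in parts.items()}
    print(name, rounded_parts, "total:", round(sum(parts.values()), 2))
```

P.S. 84 gets about 22 of its points from the Black/Hispanic factor and about 16 from free lunch, while P.S. 095 gets about 15 and 23, so nearly equal totals hide quite different schools.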

So the question is still:  Is this a ‘fair’ comparison?  In the extreme, is a school that has 100% Black/Hispanic and 0% free lunch the ‘same’ as a school that has 100% free lunch and 0% Black/Hispanic?

Well, to try to answer that question, I looked at the final scores for P.S. 84’s peer group and saw that only two of them got Fs, P.S. 84 and Merrick Academy Charter School, and one got a D, P.S. 35. Then when I looked at the demographics, I noticed something interesting:


The two Fs and the D were three of the four schools with the highest Black/Hispanic percentages. So if there was a different, more sophisticated weighting of the demographic factors, perhaps the Black/Hispanic percentage would count more in determining the peer index. Perhaps there are other factors that could be included to make the index more accurate. It is hard to know how to fix this, but if P.S. 84 was misplaced in too high a peer group, as I think it was, the misplacement makes the F grade inevitable. The lack of thought that went into creating this pivotal calculation is very bizarre. It’s like building a car and making the engine out of toothpicks and gum. It invalidates the entire thing.

Now P.S. 84 has an unusual circumstance which is that their demographics have recently changed as their school became more ‘gentrified.’  So the younger students have a different demographic mix than the older kids.  But the older kids from grades 3 to 6 are the ones who are taking the tests on which the student performance and student progress grades are calculated.  The school environment grade is based on all the students, but one change that could be made to this system (it would still be extremely flawed, just less so) would be to have a different peer group for the other two calculations based on the demographics of the test group.  This would have helped P.S. 84 avoid the F, and this was the main reason the principal cited in the article I read.

I recalculated what the progress score for P.S. 84 would be if they were in the peer group with an average peer index of 54 (their peer index from three years ago) instead of 44. In that case, the scores were as follows: Progress 11.6 ‘D’, Performance 6.6 ‘C’, Environment 3.7 ‘D’, Overall 21.9 ‘D’.

Now a D is still a pretty low score (they couldn’t recover from the 45% math progress despite score increases and percent proficient increases discussed in part II), but it is a lot better than an F.  This is just an example of how the peer group can make a big difference, especially when the difference between an F and a D can be the difference between getting more resources or being shut down.

In conclusion, the school progress reports are not mathematically valid enough to be used for decisions to shut down schools. To give another example of how inaccurate these scores are, the scores for high schools were recently released. The three top high schools in the city, Stuyvesant, Bronx Science, and Brooklyn Tech, were rated, by this metric, numbers 97, 70, and 142, respectively. Stuyvesant, where I teach, is generally considered to be one of the top schools in the country. That New York City can rate it the 97th best high school in the city is reason alone to ditch the metric.

The worst thing about the progress scores is that they seem to be set up so that the predetermined 3% of schools that are going to get Fs get them with extremely low-seeming scores. When someone has to be punished even though everyone does well, it doesn’t give schools any way of improving. Also, with all that competition, there is even more incentive to cheat or teach to the test. If your ‘peer’ school is doing intensive test prep, you have to as well, since gains aren’t enough; they have to be bigger than your peers’ gains.

Another thing is that the ‘environment’ grade based on parent surveys is something that can easily be manipulated.  The principals get the forms and they don’t really have to give them to the parents.  I have heard of cases where principals have filled those out for themselves, guaranteeing them nearly 15 points of environment (out of the F range already) while simultaneously hurting their ‘peers.’

My hope is that my series of three posts will get widely read and studied so professional journalists with a lot more time than I have to work on this kind of thing will delve more deeply into what I’ve begun.


### 4 Responses to The Vindication Of P.S. 84 Part III

1. Thank you again, Gary. Days ago I looked at the Gompers peer group and scratched my head: the schools it’s matched with are not a good comparison. Why would a Middle School be in the Gompers peer group? Why would a new, phasing-in school be in the peer group? Why would a school for the deaf be in a peer group with a CTE school?

2. Tom Hoffman says:

My hypothesis after looking at the peer groups of several “no excuses” charters is that for them having nearly 100% minority populations and less poverty, special ed and ELL is probably an advantage overall.

3. Robert Reid says:

Well, there are lies, d#&* lies and statistics. Really nice article and a great dissection of the numbers that go into the score. I am curious about the attendance numbers due to the fact that the standard deviations (SD) seem quite small for such a large and diverse school system. But even more curious is that this is really an improper use of the concept of variance on the part of the DOE. From an analytical point of view, one might use the average and SD to determine if the PS84 attendance number is in fact different from the average. The standard deviation allows one to say with some confidence that a number is different if it is more than 2 SD away from the average in either direction. Within 2 SDs from the mean, one typically does not conclude that the numbers are different. Using these numbers, I would not conclude that they are in fact different from the average of the system or the peer group.

full disclosure: I am a PS84 parent, and my children are thriving at this wonderful school.