In a recent Washington Post editorial, Mitchell Chester and John White wrote about their successes in turning around low performing schools. Chester and White are both members of Chiefs For Change, the organization Jeb Bush founded. Most of the members are actually former chiefs who have left for one reason or another, but Chester and White are currently in leadership roles, Chester in Massachusetts and White in Louisiana.
This Washington Post piece was, of course, hailed by ‘reform’ zealots like Joel Klein and TFA CEO Elisa Villanueva-Beard.
When I think of states that give evidence in any way about the possible positive impact of modern education ‘reform’ (closing down schools, replacing them with charters, evaluating and firing teachers based on a poor measure of ‘student achievement’, etc.), Massachusetts and Louisiana are not two of the states that come to mind. Massachusetts has always been at or near the top of the rankings. Based on the PISA scores, Massachusetts, if it were a country, would be one of the top performing countries in the world. And for Louisiana, except for the statistics that they invent, their test scores on NAEP, the AP, the ACT, and any other sort of standardized test have been very low and continue to be.
So why are we listening to John White, who came to Louisiana from New York where he was mentored by Joel Klein, about how to improve low performing schools? Well of course he backs up his case with some statistics:
In Louisiana, radical change means that 128,000 fewer students attend schools rated D or F than did in 2011. That’s had a powerful impact on the historically disadvantaged children too often consigned to failing schools, vaulting the performance of African-American fourth graders into the middle of the pack on the National Assessment of Educational Progress in 2015. In 2009, for example, black fourth graders ranked 43rd and 41st in the nation for proficiency in reading and math, respectively. Those rankings jumped to 20th and 23rd in 2015.
As for the 128,000 fewer students attending schools rated D or F, since the state’s own officials are the ones who assign those ratings, and since the criteria for getting a D or F have changed over the years, I don’t take that one too seriously.
But I was interested in ‘fact checking’ that NAEP statistic since that was one I hadn’t heard of before. I knew that Louisiana as a whole had very low NAEP scores and that they were not improving very much over the years the way, for example, those of Tennessee and Washington D.C. have; otherwise we’d be hearing about Louisiana NAEP much more.
White says that black fourth graders ranked 43rd in reading and 41st in math in 2009 and now rank 20th and 23rd. So I went to the National Center for Education Statistics website and dug into the data.
Since NAEP isn’t just for 4th graders, the first thing I checked was the current ranking for Louisiana’s black 8th graders, and I saw that for 8th grade math they actually dropped from 39th to 44th between 2009 and 2015. For 8th grade reading they dropped from 43rd to 45th between 2009 and 2015. So it is obvious why they don’t mention their 8th grade change in rankings.
I also checked how they have done in math for all 4th graders regardless of race. I found that in 2009 they were 48th while in 2015 they were not much better, at 44th. In reading they went from second to last in 2009 to 8th to last in 2015. A jump, but not the sort of thing that John White would ever use to prove his point about his knowledge of improving schools.
But still I could see someone being compelled by the improving position for the scores of black 4th graders since those are students who have had their entire schooling after the ‘greatest thing that ever happened to New Orleans’ (according to Arne Duncan) event, Hurricane Katrina. I did see that it was accurate that Louisiana had leapfrogged over a bunch of states in the most recent 4th grade tests. And I could see how it sounds good to go from the bottom to the middle. But what I wanted to find out is if this was a ‘significant’ change.
In statistics, ‘significant’ has a very specific meaning. It doesn’t necessarily mean that the change is large. There can actually be times where a large difference is not considered ‘significant’ and times where a small difference can be considered ‘significant.’ Also, saying that the difference in a comparison is significant (or not significant) has nothing to do with whether or not the difference is ‘important.’ It’s actually a tricky thing to explain what it means for a change to be, or not to be, ‘significant’ but I’ll try to explain.
Suppose you have 100 tomato plants and you give half of them plant food ‘A’ and the other half plant food ‘B’. A month later you check on the tomatoes and you find that the tomatoes that got plant food ‘B’ grew, on average, 3 inches taller than those that got plant food ‘A.’ Before you can declare that plant food ‘B’ causes plants to grow taller, you enter the data for all 100 plants into a computer. Then you have the computer randomly select fifty plants from the hundred and you compare the average of this new, random grouping with the average of the other 50 that were not selected by the computer. Then you have the computer do that ten thousand times, with ten thousand different ways of splitting the 100 plants into two groups. Then you check to see how common it is for one group to grow, on average, at least 3 inches taller than the other group. If it turns out that this happens for enough of the random splits (maybe 10% or so), then the difference is considered to not be ‘significant,’ meaning that the difference could plausibly be the result of random chance rather than of plant food ‘B’ actually causing the plants to grow more. In layman’s terms, if the difference is not statistically significant, it’s kind of like a tie.
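For anyone who wants to try the shuffling procedure described above, here is a minimal sketch in Python. Everything specific in it is an assumption made for illustration: the plant heights are randomly generated stand-ins rather than real measurements, the function name `permutation_p_value` is my own, and the 10% cutoff is just the rough threshold mentioned above (5% is another common convention). This is also not how NAEP computes its significance flags; it only illustrates the general idea.

```python
import random


def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Fraction of random regroupings whose gap in averages is at least
    as large as the gap actually observed between group_a and group_b."""
    rng = random.Random(seed)
    n_a = len(group_a)

    def average(values):
        return sum(values) / len(values)

    # The gap we actually saw (ignoring which group came out ahead).
    observed_gap = abs(average(group_b) - average(group_a))

    # Pool all plants together, then repeatedly split them at random.
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        random_gap = abs(average(pooled[n_a:]) - average(pooled[:n_a]))
        if random_gap >= observed_gap:
            hits += 1
    return hits / n_shuffles


# Hypothetical data: 50 plants per food, heights in inches (made up).
data_rng = random.Random(42)
food_a = [data_rng.gauss(30, 8) for _ in range(50)]
food_b = [data_rng.gauss(33, 8) for _ in range(50)]

p_value = permutation_p_value(food_a, food_b)
print(f"Share of random splits with a gap this large: {p_value:.3f}")

# Assumed cutoff of 10%, following the rough figure used in the text.
if p_value >= 0.10:
    print("Not 'significant': random splits produce gaps like this often.")
else:
    print("'Significant': random splits rarely produce a gap this large.")
```

With noisy heights like these, a 3 inch gap in averages can show up fairly often from shuffling alone, which is exactly why a gap that looks large is not automatically ‘significant.’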
Well, the NAEP data explorer allows you to create these nifty maps that show how states compare to each other and which are ‘significantly’ better or worse.
This map shows that in math black 4th graders in Louisiana are not ‘significantly’ different from about 40 states and are better than about 5 and worse than about 5. So with this pretty weak measurement, it could be argued that Louisiana 4th graders are tied for 45th or that they are tied for 6th. Basically, there isn’t much that can be concluded when the scores are run through this ‘significance’ filter.
A similar thing happens for 4th grade reading, which can be seen below.
I also produced these maps for the 2009 NAEP to see how different they were and, as can be seen from these two maps, back in 2009 in reading there were even more ‘ties’ with respect to statistically ‘significant’ differences.
I always felt that using NAEP scores as a way to prove that reforms are (or are not) working wasn’t a great idea. I think the worst ever use of NAEP scores was in ‘Waiting For Superman,’ where they were used to show that the public schools in every state were pretty much ‘failing.’
Since reformers love to use them when it seems to support their ideas, I feel no guilt when I use them against the reformers when I find things in the NAEP that seem to support the idea that the reform agenda is failing. If they know that cherry picking isolated NAEP statistics will cause people to dig deeper into the full picture and find many statistics that will be used against them, maybe they will think twice before using them to support their position in the first place. Not using NAEP against the reformers would be like an attorney not cross examining an unreliable witness who was deliberately chosen by the other side to help their case.
I never thought that NAEP scores were very significant, but I didn’t realize until now how, mathematically speaking, ‘insignificant’ NAEP differences really are.