Is Louisiana’s NAEP Miracle ‘Significant’?

In a recent Washington Post editorial, Mitchell Chester and John White wrote about their successes in turning around low-performing schools.  Chester and White are both members of Chiefs For Change, the organization Jeb Bush founded.  Most of the members are actually former chiefs who have left for one reason or another, but Chester and White are currently in leadership roles, Chester in Massachusetts and White in Louisiana.

This Washington Post piece was, of course, hailed by ‘reform’ zealots like Joel Klein and TFA CEO Elisa Villanueva-Beard.

When I think of states that give evidence in any way about the possible positive impact of modern education ‘reform’ (closing down schools, replacing them with charters, evaluating and firing teachers based on a poor measure of ‘student achievement’, etc.), Massachusetts and Louisiana are not two of the states that come to mind.  Massachusetts has always been at or near the top of the rankings.  Based on the PISA scores, Massachusetts, if it were a country, would be one of the top performing countries in the world.  And for Louisiana, except for the statistics that they invent, their test scores on NAEP, the AP, the ACT, and any other sort of standardized test have been very low and continue to be.

So why are we listening to John White, who came to Louisiana from New York where he was mentored by Joel Klein, about how to improve low performing schools?  Well of course he backs up his case with some statistics:

In Louisiana, radical change means that 128,000 fewer students attend schools rated D or F than did in 2011. That’s had a powerful impact on the historically disadvantaged children too often consigned to failing schools, vaulting the performance of African-American fourth graders into the middle of the pack on the National Assessment of Educational Progress in 2015. In 2009, for example, black fourth graders ranked 43rd and 41st in the nation for proficiency in reading and math, respectively. Those rankings jumped to 20th and 23rd in 2015.

As for the 128,000 fewer students attending schools rated D or F, since the state officials are the ones who assign those ratings and since the criteria for getting a D or F have changed over the years, I don’t take that one too seriously.

But I was interested in ‘fact checking’ that NAEP statistic since that was one I hadn’t heard of before.  I knew that Louisiana as a whole had very low NAEP scores and that they were not improving much over the years the way that, for example, the scores in Tennessee and Washington D.C. have; otherwise we’d be hearing about Louisiana NAEP much more.

White says that black fourth graders ranked 43rd in reading and 41st in math in 2009 and now rank 20th and 23rd.  So I went to the National Center for Education Statistics website and dug into the data.

Since NAEP isn’t just for 4th graders, the first thing I checked was the current ranking for Louisiana’s black 8th graders, and I saw that for 8th grade math they actually dropped from 39th to 44th between 2009 and 2015.  For 8th grade reading they dropped from 43rd to 45th between 2009 and 2015.  So it is obvious why they don’t mention their 8th grade change in rankings.

I also checked how they have done in math for all 4th graders regardless of race.  I found that in 2009 they were 48th while in 2015 they were not much better, at 44th.  In reading they went from second to last in 2009 to 8th to last in 2015.  A jump, but not the sort of thing that John White would ever use to prove his point about his knowledge of improving schools.

But still I could see someone being compelled by the improving position for the scores of black 4th graders since those are students who have had their entire schooling after the ‘greatest thing that ever happened to New Orleans’ (according to Arne Duncan) event, Hurricane Katrina.  I did see that it was accurate that Louisiana had leapfrogged over a bunch of states in the most recent 4th grade tests for black students.  And I could see how it sounds good to go from the bottom to the middle.  But what I wanted to find out is if this was a ‘significant’ change.

In statistics, ‘significant’ has a very specific meaning.  It doesn’t necessarily mean that the change is large.  There can actually be times where a large difference is not considered ‘significant’ and times where a small difference can be considered ‘significant.’  Also, saying that the difference in a comparison is significant (or not significant) has nothing to do with whether or not the difference is ‘important.’  It’s actually a tricky thing to explain what it means for a change to be, or not to be, ‘significant’ but I’ll try to explain.

Suppose you have 100 tomato plants and you give half of them plant food ‘A’ and the other half plant food ‘B’.  A month later you check on the tomatoes and you find that the tomatoes that got plant food ‘B’ grew, on average, 3 inches taller than those that got plant food ‘A.’  Before you can declare that plant food ‘B’ causes plants to grow taller, you enter the data for all 100 plants into a computer.  Then you have the computer randomly select fifty plants from the hundred and you compare the average of this new, random grouping with the average of the other 50 that were not selected by the computer.  Then you have the computer do that ten thousand times with ten thousand different ways of splitting the 100 plants into two groups.  Then you check to see how common it is for one group to grow, on average, at least 3 inches taller than the other group.  If it turns out that this actually happens for enough of the random groupings (maybe 10% or so), then the difference is considered to not be ‘significant,’ meaning that the difference could plausibly have come from random chance alone rather than from plant food ‘B’ actually causing the plants to grow more.  In layman’s terms, if the difference is not statistically significant, it’s kind of like a tie.
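For readers who want to try the tomato experiment themselves, here is a minimal Python sketch of that reshuffling procedure (statisticians call it a permutation test).  The plant heights are made up by a random number generator, and the function name `permutation_test`, the average heights of 30 and 33 inches, and the spread of 8 inches are all assumptions for illustration, not real data.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_test(group_a, group_b, n_shuffles=10_000, seed=0):
    """Return the observed gap in averages (B minus A) and the share of
    random regroupings whose gap is at least as large in either direction."""
    rng = random.Random(seed)
    observed = mean(group_b) - mean(group_a)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_big = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # a new random split of all the plants
        gap = mean(pooled[n_a:]) - mean(pooled[:n_a])
        if abs(gap) >= abs(observed):
            at_least_as_big += 1
    return observed, at_least_as_big / n_shuffles


# Hypothetical heights in inches: 50 plants per plant food (invented numbers).
data_rng = random.Random(42)
food_a = [data_rng.gauss(30, 8) for _ in range(50)]
food_b = [data_rng.gauss(33, 8) for _ in range(50)]

gap, share = permutation_test(food_a, food_b)
print(f"Observed gap: {gap:.2f} inches")
print(f"Share of random regroupings with a gap at least that big: {share:.3f}")
```

If the printed share is above whatever cutoff you picked in advance (5% is a common convention, and the 10% mentioned above is a looser one), the gap is treated as a tie.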

Well, the NAEP data explorer allows you to create these nifty maps that show how states compare to each other and which are ‘significantly’ better or worse.

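[Map: 2015 NAEP 4th grade math, black students, showing which states are significantly higher than, significantly lower than, or not significantly different from Louisiana]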

This map shows that in math black 4th graders in Louisiana are not ‘significantly’ different from about 40 states and are better than about 5 and worse than about 5.  So with this pretty weak measurement, it could be argued that Louisiana 4th graders are tied for 45th or that they are tied for 6th.  Basically, there isn’t much that can be concluded when the scores are run through this ‘significance’ filter.
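The ‘tied for 6th or tied for 45th’ arithmetic can be written out in a couple of lines of Python.  This is only a sketch: the function name `tied_rank_range` is made up, and the total of 50 is an assumption chosen to match the round numbers above (the actual number of states and jurisdictions reporting a score may differ).

```python
def tied_rank_range(n_significantly_above, n_significantly_below, n_total):
    """Best and worst rank a state could claim if every 'not significantly
    different' comparison is treated as a tie."""
    best = n_significantly_above + 1        # only the clearly better states rank ahead
    worst = n_total - n_significantly_below  # only the clearly worse states rank behind
    return best, worst


# About 5 states significantly above and about 5 significantly below, out of an assumed 50.
print(tied_rank_range(5, 5, 50))  # -> (6, 45)
```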

A similar thing happens for 4th grade reading, which can be seen below.

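[Map: 2015 NAEP 4th grade reading, black students, showing which states are significantly higher than, significantly lower than, or not significantly different from Louisiana]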

I also produced these maps for the 2009 NAEP to see how different they were and, as can be seen from these two maps, back in 2009 in reading there were even more ‘ties’ with respect to statistically ‘significant’ differences.

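[Two maps: 2009 NAEP 4th grade results for black students, showing which states are significantly higher than, significantly lower than, or not significantly different from Louisiana]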

I always felt that using NAEP scores as a way to prove that reforms are (or are not) working wasn’t a great idea.  I think the worst ever use of NAEP scores was in ‘Waiting For Superman,’ where they were used to show that the public schools in every state were pretty much ‘failing.’

Since reformers love to use them when it seems to support their ideas, I feel no guilt when I use them against the reformers when I find things in the NAEP that seem to support the idea that the reform agenda is failing.  If they know that cherry picking isolated NAEP statistics will cause people to dig deeper into the full picture and find many statistics that will be used against them, maybe they will think twice before using them to support their position in the first place.  Not using NAEP against the reformers would be like an attorney not cross examining an unreliable witness who was deliberately chosen by the other side to help their case.

I never thought that NAEP scores were very significant, but I didn’t realize until now how, mathematically speaking, ‘insignificant’ NAEP differences really are.


8 Responses to Is Louisiana’s NAEP Miracle ‘Significant’?

  1. john fager says:

    Interesting blog about Louisiana and the claims of progress by the reformers. Every time I see a mention of Louisiana or New Orleans I always wonder if the changes in demographics in New Orleans since Hurricane Katrina are factored in. Many of the poor who were able to flee New Orleans could not afford to return even if they owned land and were entitled, in theory, to house rebuilding help.
    You also mentioned Washington D.C., which is another city that has been significantly affected by demographic change, in this case by gentrification. Poor black families are being replaced by affluent and educated white families.
    You are very good with statistics. Are you aware of the demographic changes I described and have you ever used them in your analysis of either place? I would enjoy reading what you find.
    By the way, we met once at one of Leonie Haimson’s Skinny Awards dinners.
    John C. Fager

  2. Christine Langhoff says:

    As to Massachusetts, Mitchell Chester is as reformy as can be. Since voters rejected the unlimited expansion of charters by a margin of 2-1, he’s been trying to remove local control and further privatization with some old wine in new bottles by promoting “Empowerment Zones”. Failing schools, blah, blah, blah.

    Probably a good part of the NAEP results derive from the reality that Massachusetts is affluent and well educated – highly educated, unionized teachers and a cap on charters and zero tolerance for vouchers no doubt have a significant impact, too.

  4. Laura H. Chapman says:

    Your clarity and homey illustrations with gardening should be in a book, and marketed as well as Cathy O’Neil’s Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. For a brief moment I was reminded of the ridicule-worthy “Oak Tree Analogy,” intended to help teachers understand VAM. It crashed and burned. I hope you can construct and send to Diane a plain language explanation for statisticians who claim charter schools produce X “days of learning” (or months of learning, or years of learning) when compared with public schools. Eric Hanushek seems to be responsible for that nonsense.

  5. John says:

    So, they went from:
    2009 Math: 4 states significantly below, 16 significantly above
    2009 Reading: none significantly below, 17 significantly above
    2015 Math: 5 significantly below, 5 significantly above
    2015 Reading: 2 significantly below, 10 significantly above
    in 6 years, correct?

    That looks like a pretty substantial improvement for a relatively short period of time:
    – In Math, from having 16 states be significantly better than them to only 5 states significantly better.
    – In Reading, from having 17 states significantly better (and none significantly worse) to only 10 significantly better.

    Please let me know if I have misunderstood this.

    Did you do any analysis on whether their scale scores changed significantly? That seems to be the easier and more appropriate measure than this “how many states did significantly better or worse” metric. It makes me wonder if you looked at that and didn’t like what you saw.

    Even with this method, you started off saying there were 5 above and 5 below this year and then didn’t continue that comparison from old data to new.

    This seems like finding data that fits a premise and then presenting it as overall “meh” performance as opposed to looking at the change. Do you think it’s unreasonable to question your objectivity based on this?

  7. “The more test scores are used for deciding education policies and practices, the more likely that test scores will be subject to corruption pressures – and the more likely this will result in the distortion and corruption of education policies and practices.”

    The data revolution in education has fundamentally failed our children and their parents. Time to move on.

