Is a half year of learning equivalent to one question on a multiple choice test?

I read an article yesterday about a new study which ‘proves’ how effective TFA teachers and TFA alumni teachers are.  The study, which can be accessed here, claimed that students of middle school math TFA corps members get an extra half year of learning compared with students of non-TFA novice teachers, that students of middle school reading TFA alumni get an extra half year of learning compared with students of non-TFA experienced teachers, and that students of middle school math TFA alumni get an extra full year of learning compared with students of their non-TFA experienced counterparts.

Anytime I see some educational program being measured in ‘months of learning’ I get nervous.  I just don’t think that learning is measured in months, and even if it were, I don’t think that a short multiple choice standardized test would be able to accurately measure it.

When I think of my own math teaching, my goal is to expose my students to as many ‘thought provoking’ questions each class as I can while still developing their skills.  Maybe on a good day, with the right subject matter, I can get four or five really good questions that get the students thinking deeply and analyzing what is going on.  [An example:  If you fold a piece of paper it gets twice as thick.  If you fold it again, it will be as thick as four sheets of paper.  How many times would you have to repeat this for it to be as thick as the distance from the Earth to the Moon?  (Answer:  Way fewer than you think.  Around 40.)]  The purpose of these thought provoking questions is to try to evoke as many ‘aha’ moments as possible.  These are the moments where math students make a connection that makes them want to jump up and yell ‘Eureka!’
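For anyone who wants to check the paper folding answer, here is a minimal sketch in Python. The paper thickness (0.1 mm) and the average Earth to Moon distance (about 384,400 km) are my own assumed figures, not numbers from the question itself, and `folds_needed` is just an illustrative helper name.

```python
# Assumed figures, chosen only for illustration:
PAPER_THICKNESS_M = 1e-4          # one sheet of paper, roughly 0.1 mm
EARTH_MOON_DISTANCE_M = 3.844e8   # average distance, roughly 384,400 km


def folds_needed(thickness_m: float, target_m: float) -> int:
    """Count how many doublings it takes for thickness_m to reach target_m."""
    folds = 0
    while thickness_m < target_m:
        thickness_m *= 2  # each fold doubles the thickness
        folds += 1
    return folds


if __name__ == "__main__":
    print(folds_needed(PAPER_THICKNESS_M, EARTH_MOON_DISTANCE_M))
```

Under these assumptions the count comes out to 42, which is in line with ‘around 40.’ A thicker or thinner sheet shifts the answer by only a fold or two, because every fold doubles the total.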

So this paper was trumpeted by many people on Twitter, including Wendy Kopp.

So I downloaded the paper to see what it really said, and what it didn’t.  What I learned is that the study made eight different comparisons:

1) Elementary TFA corps math teachers to non-TFA novice math teachers
2) Elementary TFA corps reading teachers to non-TFA novice reading teachers
3) Elementary alumni TFA math teachers to non-TFA experienced math teachers
4) Elementary alumni TFA reading teachers to non-TFA experienced reading teachers
5) Middle school TFA corps math teachers to non-TFA novice math teachers
6) Middle school TFA corps reading teachers to non-TFA novice reading teachers
7) Middle school alumni TFA math teachers to non-TFA experienced math teachers
8) Middle school alumni TFA reading teachers to non-TFA experienced reading teachers

Of those eight comparisons, for five of them the standardized test score comparisons were not ‘statistically significant.’  But for the other three, well, they had the miracle where two groups outperformed their counterparts by a half year of learning while one group, the alumni middle school math teachers, outperformed their counterparts by a year of learning!  Sounds impressive.

So I went through the report and learned in two footnotes on pages 62 and 64 that what most people think of as a ‘half year’ of learning and a ‘year’ of learning isn’t quite what they are talking about here.

For the middle school math corps members, the half year of learning was based on the students getting two extra questions correct out of about 40 on the TAKS test.

The half year of extra learning credited to the alumni middle school reading teachers was because their students got one extra question correct on the test.  And the amazing year of additional learning accomplished by the alumni math teachers was because their students got three extra questions correct on the exam.
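To put those raw counts in perspective, here is a rough sketch that turns each one into a share of the whole test. It assumes every test has exactly 40 questions. The report's footnotes only say ‘about 40’ for the math TAKS, and applying the same length to the reading test is my assumption, so treat the percentages as ballpark figures. The dictionary labels and the `share_of_test` helper are illustrative names only.

```python
# Assumed test length: "about 40" questions for the math TAKS.
# Using the same length for the reading test is an assumption.
TEST_LENGTH = 40

# Extra questions answered correctly, as described in the footnotes.
extra_questions = {
    "middle school math, corps members (half year)": 2,
    "middle school reading, alumni (half year)": 1,
    "middle school math, alumni (full year)": 3,
}


def share_of_test(extra: int, total: int = TEST_LENGTH) -> float:
    """Return the extra questions as a percentage of the whole test."""
    return 100 * extra / total


if __name__ == "__main__":
    for group, extra in extra_questions.items():
        print(f"{group}: {extra} extra = {share_of_test(extra):.1f}% of the test")
```

On a 40 question test, one, two, and three extra questions work out to 2.5%, 5%, and 7.5% of the test respectively.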

I think that TFA needs to back off on the miracle stories.  The fact is that new TFA teachers are not much better, if they are any better at all, than new non-TFA teachers.  Neither group is that good, really.  Teaching has a big learning curve, but by the time you figure it out, you generally have to wait until next year to have a fresh start with a new group.  As far as alumni teachers go, yes, I think they are generally pretty good.  I’d let an alum teach my kids.  But as good as they might be, to ignore the fact that most of the comparisons were pretty neutral and then buy into the idea that when one group of students learns a year more than another group, they will only get, on average, three more questions correct on a multiple choice math test, well, that’s the kind of thing that is going to keep me investigating these kinds of claims and spreading the word.

23 Responses to Is a half year of learning equivalent to one question on a multiple choice test?

  1. Quite a revelation, Gary. Well done!

  2. Educator says:

    The tweet should instead say “Students of TFA CMs sometimes get, on average, 1 to 3 more multiple choice questions correct on a 40 multiple choice question test! Sometimes not!”

  3. l hodge says:

    On pg 53, you see that the TFA schools had much higher poverty rates than the non-TFA schools (90% vs 60%), but similar percentages meeting standard on TAKS (biggest difference was 79% vs 83%).

    This could be further evidence of the success of TFA teachers. Or, this could be an indication of other differences between the TFA & non-TFA schools. For example, how many TFA were teaching at KIPP (45% more time in school) or at other schools with extended school days?

    • Educator says:

      Good point.

      If it’s true that these teachers are at higher poverty schools, I’d ask what kinds of higher poverty schools? If they’re at some charter schools, are they educating SPED and ELL students?

      How does the study account for teaching these different kinds of students? In other words, how do they adjust for a teacher who has to teach all students vs teaching students who are self selected (by applying for and winning a lottery), and having fewer SPED/ELL, and having higher kick out/dropout/counseled out students?

      • gkm001 says:

        The study attempted to match student demographics based on SES and previous year’s test scores. Student data was not included for students who had accommodations (linguistic or other) for the test, so they screened out many special education students and ELLs. But again, there seems to be no attempt to control for the effects of the school itself: for example, the effect of going to school with other children who all had parents willing to enter them in the enrollment lottery, or the effect of going to an extended-day or extended-year school.

    • David Eckstrom says:

      Wait, I thought poverty didn’t matter.

    • gkm001 says:

      Right, and what’s interesting is that they do a matched comparison for students only after first doing a matched comparison for TFA and non-TFA campuses. This ensures that none of the non-TFA teachers in the study were teaching at schools with TFAs in them. So how are we to know that the findings are due to a TFA teacher effect and not a school effect (for example, the effects of longer school days or a longer school year at TFA schools, all of which were charter schools)?

  4. Kuhio Kane says:

    Thanks for the info., Gary. And I’m sure you noticed in Wendy’s tweets that she compared her one and three extra right answer “teachers” to … other teachers of similar experience, and other new teachers. This misleads many readers to assume the comparison is between TFAs and regular teachers who actually went to a post-grad teaching college. Or am I mistaken? Either way, I see a lot of TFAs. Some who stay long enough to complete their Masters at the local university often do remain in our schools and are doing well. Most split the scene with their hair on fire, especially if they are placed in our most “needy” schools.

  5. Cosmic Tinkerer says:

    Similar to the Mathematica study on KIPP reported recently, this Edvance Research study on TFA is not published in a peer-reviewed journal and it says on the report, “This evaluation was made possible through funding provided by Teach For America.” (If you look at the Edvance website, among others, they list KIPP under “Partners,” too.)

    I share concerns noted by other posters regarding comparisons of teachers in schools with extended days, weeks and years, as in KIPP schools.

    I did a cursory review of the report and saw grade level equivalents reported. People tend to misinterpret grade level equivalent scores, similar to how Wendy reported the findings –but they are not equal interval metrics– and experts have advised against using them for decades, due to these issues.

    Additionally, when students have been retained in grade, as often occurs in KIPP and other charter schools, grade level equivalent scores are even more questionable, so age level equivalents should be used instead. See “Age vs Grade Based Scores.” I looked but don’t see age equivalents reported. Did you, Gary?

  6. Cosmic Tinkerer says:

    BTW, it’s been reported that, at least in some areas, KIPP requires a “pre-lottery test” that is used to determine grade level, suggesting that some students may be retained in grade at enrollment. The Mathematica study reported a “comparatively high rate of students retained in grade 5 at KIPP schools” (p. 13).

    Gary, have you analyzed that report?

    • Meg says:

      As long as the test isn’t used to deny admission I don’t see a problem with it. It’s relatively common knowledge that many inner city public schools pass children along who have no business being in the next grade. If a child is old enough to be in 6th grade, but reads at a kindergarten level, I don’t see any issue with a school telling that child’s parents that he/she needs to repeat the 5th grade.

      • Cosmic Tinkerer says:

        That’s not the concern that was raised here. The issue is that you cannot compare classes at schools which rarely retain children in grade with schools that often retain them and say they are equivalent, since the latter kids are older and have had the opportunity to cover the content twice. It’s not an apples to apples comparison. Scores should be disaggregated by age and how many children were repeating the grade should be indicated.

      • Steve M says:

        A lot of inner-city parents would never consider enrolling their children in charter schools that require a pre-test…if that test were then used to determine whether or not to hold kids back. For that matter, MOST parents of kids who are under-performing would not consider that option. So, this pre-test becomes a selection mechanism.

  7. Pingback: Time for TFA to Stop the Spin | Diane Ravitch's blog

  8. Even if new TFA teachers get more gains than new regular teachers, and alumni TFA surpass teachers of a similar experience level, if they are more likely to leave after one or two years, that probably means the average TFAer is less effective than the average regular teacher, no?

  9. Gabriel says:

    Similar small differences in test scores produce the great grade differences in school report cards, especially if you are at the top of the scoring range. This is part of the problem with these measures: uneven sensitivity across the range of measures and particularly high sensitivity (instability) at the top range. My school went from an A to a D on just 1-2 pt differences in ELA and math test scores (we wondered if the insignificant drop was due to the flu epidemic we were experiencing!).

  10. Tim says:

    In the recent CREDO report for NJ charters, the reformer-paid study stated that:
    “The data is analyzed in units of standard deviations of growth so that the results will be statistically correct. These units, unfortunately, do not have much meaning for the average reader. Transforming the results into more accessible units is challenging and can be done only imprecisely. Therefore, Table 3 below, which presents a translation of various outcomes, should be interpreted cautiously. Hanushek, Eric A. and Steven G. Rivkin. Teacher quality. In Handbook of the Economics of Education, Vol. 2, ed. E. A. Hanushek, F. Welch (2006): 1051–1078. Amsterdam: North Holland”

    Yet, even though they have this disclaimer that this is IMPRECISE and should be used with caution they use it anyway and make this their main point of the study.

    Again in this study the “reformers” are attempting to put their spin on things!

  11. E. Rat says:

    Hopefully this will draw attention to just how ridiculous these tests are. One more right answer is a big deal because the tests are so short. Despite taking a week to administer, the California STAR tests are around forty questions in total. Enormous consequences rest on how one or two children answer a single question. Distilling a year down to a single multiple-choice exam like this is insanity.

  12. Ed Fuller says:

    There are even more substantial problems with the paper. The regressions utilized rely on scale scores. Yet, even controlling for prior scores, lower scale scores are highly correlated with student growth. The fact that TFA-taught students started with lower scores could very well account for the minuscule, but stat sig, growth in the study. I know because I have worked with the same data and after repeated efforts to disentangle prior scores and growth, it was impossible to do so. It is simply a function of how the tests are scaled and a ceiling effect.

  13. Ed Fuller says:

    My above post was in reference to the middle school novice teacher scores.

    If students were matched on scores, why are the average scores different? Shouldn’t the average scores be the same if students were matched on scores first in the propensity score matching? If scores were not the first criterion, then the scale scores should have been converted into z-scores. In addition, student-level controls were not included in the model (race, ethnicity, age, etc.) nor was a binary variable included for extended year schools. Finally, there was no effort to control for peer effects or the effects of test participation. If TFA teachers are more likely to be teaching in charters such as KIPP and IDEA and those schools retain students in grade at higher rates than public schools, thus excluding such students from the sample, then the results would be flawed.

    • Sam Jones says:

      If anyone could understand how to inappropriately use Texas student data, it would be Ed Fuller: he’s been doing it for years!

  14. Pingback: Dear TFA: You Don’t Need to Keep Telling Us How Excellent You Are (Because We Already Know) | EduShyster

  15. jcg says:

    It’s clear Wendy Kopp fails to understand the meaning of standard scores relative to growth. Shouldn’t she be embarrassed?
