IMPACTed Wisdom Truth?

The big news in New York today is the release of the Teacher Data Reports (TDRs) for some 18,000 teachers.  These were ratings that the city calculated using a value-added model.  The New York Times is going to publish all the scores, including the teachers’ names, on their website.

Much to the surprise of most people following the issue, both Bill Gates and Michelle Rhee's StudentsFirst said they did not think releasing this data was a good thing.  Wendy Kopp also opposed the release, as she did a few years ago when Los Angeles did something similar.  Most people who oppose the release do so because of the known error rates.  Gates and Kopp don't mention those error rates when they argue the data should not be made public.  They say only that it is just one measure and that shaming teachers will kill morale.  They can't mention the error rates, since that would destroy one of the pillars of the reform movement: that teachers must be held accountable if they do not get those test scores up.

Today, the day of the release of the New York City data, I received an email that I did not expect for at least a year.  In D.C., the evaluation process is called IMPACT.  About 500 teachers in D.C. belong to something called 'group one,' which means they teach something that can be measured with the district's value-added formula.  Their evaluation breaks down as follows: 50% is based on their IVA (individual value-added), 35% on their principal evaluation under the TLF (teaching and learning framework), 5% on their SVA (school value-added), and the remaining 10% on their CSC (commitment to school and community).  I wanted to test my theory that the value-added scores would not correlate with the principal evaluations, so I had applied under the Freedom of Information Act (FOIA) to D.C. schools, requesting the principal evaluation scores and the value-added scores for all group one teachers (without their names).  I fully expected to wait a year or two and then be denied.  To my surprise, it took only a few months, and they provided a 500-row spreadsheet.
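The weighting scheme above amounts to a simple weighted sum.  Here is a minimal sketch in Python; the weights come from the description of IMPACT above, while the 1.0–4.0 component scale and the sample scores are assumptions for illustration only:

```python
# IMPACT component weights for 'group one' teachers, as described above.
WEIGHTS = {
    "IVA": 0.50,  # individual value-added
    "TLF": 0.35,  # principal evaluation (teaching and learning framework)
    "SVA": 0.05,  # school value-added
    "CSC": 0.10,  # commitment to school and community
}

def impact_score(components):
    """Weighted sum of the four evaluation components."""
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)

# Hypothetical teacher, scored on an assumed 1.0-4.0 scale:
example = {"IVA": 2.8, "TLF": 3.4, "SVA": 3.0, "CSC": 3.5}
print(round(impact_score(example), 2))
```

Note that with half the weight on IVA, a noisy value-added estimate moves the overall score more than any other single component can.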

I made a scatter plot with the principal TLF score on the x-axis and the value-added IVA score on the y-axis.  What came up confirmed my expectation that these would not correlate well.  It looks like a blob or, with some imagination, a buffalo facing to the right.  There is a slight upward trend, but the correlation coefficient is only .35, which is quite low.  Here is the scatter plot.
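For readers who want to run the same check on their own data, here is a minimal sketch of computing the Pearson correlation coefficient.  The scores below are made up for illustration; they are not the actual D.C. numbers:

```python
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Made-up TLF (principal) and IVA (value-added) scores:
tlf = [2.1, 2.5, 3.0, 3.2, 3.6, 3.8, 2.8, 3.4]
iva = [1.9, 3.1, 2.2, 3.5, 2.7, 3.9, 2.0, 3.0]
print(round(pearson_r(tlf, iva), 2))
```

An r of .35 means the principal rating explains only about 12% of the variance in value-added (r squared), which is why the plot looks like a blob rather than a line.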

Perhaps this is why they recently announced they are changing the IMPACT process.  It will be too late, of course, for all the teachers who were fired over this and also for the teachers who voluntarily left the system rather than let 50% of their evaluation be left to random chance.


11 Responses to IMPACTed Wisdom Truth?

  1. Matt says:

    What does this really tell us? That one is more “right” than the other? I could interpret that as saying either principals don’t know much about evaluating good teaching, or tests don’t measure good teaching. Seems to me somewhere in the middle.

  2. Sean says:

    This data has been available here for a while.
    Four points in response:
    1. Because value-added and principal observations suffer from measurement error and purport to measure the same construct (effective teaching), we should only expect modest correlations. Keep in mind that principal observations only correlate .57 with Master Teacher evaluations in the TLF and they are using the exact same measurement instrument.

    2. What is currently driving district decisions around pay and teacher assignment? Master's degrees and teaching experience; the latter correlates with student achievement at about .1 (at most), and the former not at all. Value-added provides better information than either of these measures on the student achievement question. That is without dispute.

    3. As to your principal observation comparison, fair enough. Except that value-added does not provide any diagnostic data (and is not designed to do so). The TLF framework gives teachers specific feedback on how to improve. Thus, while they only modestly correlate, they certainly do not contradict.

    4. Lastly, I suspect that a teacher's career value-added would correlate more strongly with their career principal observation scores (provided the same principal is giving the ratings). One year is a noisy measure on both accounts.
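The measurement-error argument in Sean's point 1 can be illustrated with the classical correction-for-attenuation formula from psychometrics.  A minimal sketch; the reliability values are assumptions for illustration, not figures from either evaluation system:

```python
import math

def attenuated_r(true_r, reliability_x, reliability_y):
    """Observed correlation between two noisy measures of the same
    construct: the true correlation shrunk by the square root of the
    product of the measures' reliabilities (classical attenuation)."""
    return true_r * math.sqrt(reliability_x * reliability_y)

# Even if the two measures agreed perfectly on the underlying construct
# (true_r = 1.0), assumed reliabilities of 0.5 each would cap the
# observed correlation at 0.5.
print(attenuated_r(1.0, 0.5, 0.5))
```

On this view, an observed r of .35 between TLF and IVA is consistent with a range of true correlations, depending on how reliable each single-year measure actually is.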

    • Gary Rubinstein says:

      My data is from the 2010-2011 school year, while this report came out before the 2011 school year was over, so the information I used was not available there. Also, I have the raw data for each teacher, including all four aspects of the evaluation, not just a summary. That raw data is what enabled me to create a scatter plot.

  3. Sean says:

    To add one more point:

    5. “I fully expected to wait about a year or two and then be denied. To my surprise, it only took a few months and they did provide a 500 row spreadsheet.” Googling “impact tlf value added correlation” yielded the results in .22 seconds.

  4. C says:

    I’m a 1st year teacher teaching 4th grade. I’m sure this point has been made before, but I just want to reemphasize. A student could learn much in a year, but have a bad test due to many factors. I have a student who has been doing very well, but her step-dad just killed two people in a possible DUI incident. He is now in jail. Mom works nights. The family has also lost his income. This student’s life is in shambles, and as a result the student’s grades have tanked over the past couple weeks. Our state test is in 2 weeks, but I’m afraid this student is going to perform poorly now. Should my job be on the line for this unfortunate situation? My cynical side thinks that the Ed-reformers would say it is my job to make the case to the student that despite the family problems this test is über-important. What kind of message is that?

  5. efavorite says:

    Thanks for this. By the way, unless I’m missing something, what you call “principal” scores are not just from the principal, but in many cases an assistant principal and always two ratings done by “master educators” – central office employees whose sole job is to go from school to school rating teachers.

  6. Amy Hogan says:

    Gary, well done. Let me know if you would like me to take another (statistical) look at the data.

  7. Pingback: Remainders: Parents, teachers, Michiganders respond to TDRs | GothamSchools

  8. John Thompson says:

    The issue, I argue, is whether a good teacher in an ineffective school (which are disproportionately low income) can be expected to raise student achievement as much as an equally good teacher in an effective school. Otherwise, value-added evaluations are collective punishment for teachers who commit to the toughest schools.

    These Tulsa scattergrams tell a story that should knock you on your rear end.

    Go to the last scattergrams on three years of value-added.

    The Tulsa scattergram tells the stories of the two halves of their district. On top are schools with high achievement. When you see all those schools in the upper right hand corner, that's evidence of schools that are working: high achievement and high growth of test scores. The teachers in the upper left hand corner, who started with high scores but did not improve as much, are in no danger. They will be disproportionately rated down on the 35% of their evaluations tied to test score growth but, as I understand the Oklahoma law, they will be rated up on the 15% for high achievement, so good teachers in higher performing schools in Tulsa are not threatened with being fired due to statistical error.

    The important policy issue in the top half of the scattergram is the ratio of high performing schools that improved to those that didn't. Since Tulsa includes three years of data and so many schools, the total ratio of schools in the upper right hand (36) to those in the upper left hand (12) is a strong piece of evidence that teachers in high-performing schools have an advantage in raising test scores. For every three higher performing schools that raised scores more than average, there was only one that did not.

    There is a large body of cognitive science that explains why. The “Matthew Effect” shows that kids who “learn to read and then read to learn” will continue to improve but we still have no evidence that secondary teachers, systematically, can improve the reading scores of kids who did not learn to comprehend what they read in elementary school.

    So, the real story is the bottom half of the scattergram. “No Excuses” schools and other charters have had success in raising math scores but not reading scores. So the area where value-added has the most validity is elementary school math. Even there, the ratio of low achieving schools that showed improvement is the opposite of the ratio on the top half of the scattergram.

    There are two distinct explanations for why higher performing schools were more successful in raising performance still higher. One possibility is that the better schools have better teachers. The other is that the value-added model is systematically unfair to lower performing schools. In the real world, both have to be true. The question is how many good teachers get caught in the value-added net being used to identify bad teachers. And when you look at the stark differences between the two ratios I've cited, common sense says that the unfairness has to be a big factor even in elementary schools. Even among the 27% of 5th grade math teachers that the first Tulsa value-added experiment determined needed to be fired, many must have been mistakenly identified.

    So, when you look at the (most reliable) elementary math results, with a ratio of low performing schools that had low growth (17) to low performing schools that had higher growth (11), it is hard to believe that the results are based solely (or even primarily) on real differences in teacher quality. This is especially true in Reading, where the ratio of low performing, low growth schools to high growth schools was 23 to 11, more than two to one.

    But here’s the kicker. When you get to the very different world of middle school, again using a huge database of three years, and the most reliable data of all (middle schools for all subjects), the ratio knocks your socks off. In the high performing half of the scattergram, for every nine higher performing schools that had higher growth, you had only one with low growth! Among lower performing schools, 14 had low growth and only four had higher growth, for the opposite ratio of 3.5 to one.

    Then the same ratio for all low performing high schools in all subjects is that 13 had lower growth and four had higher growth, for a ratio of more than three to one.

    And finally, the ratio for Reading and History (which requires reading) is that nine low performing schools had low growth while only one low performing school increased student growth, and that school did so by just a smidgen.

    The only way a nine-to-one ratio like that can reflect actual teaching performance is if the overwhelming majority of English and History teachers in low performing schools are slugs, in contrast to their peers in higher performing schools (which in History was 100% to 0), who are mostly effective.

    What we cannot allow is the presumption that all of those teachers need to prove themselves innocent of the charge brought by the value-added model that they are ineffective.

    If you want to look more closely, you can run more checks on my analysis. If value-added were a valid way of evaluating teachers, then most or all schools would look like Memorial, being either a little or a lot above or below average in both test scores and test score growth. But look at how consistent the patterns are, with Edison and Washington consistently getting both high scores and high value-added, and Hale and East Central almost always scoring low in both. Similarly, Rogers, Central, and McClain, in that order, are usually low in both, only occasionally having growth that was significantly higher. By the way, I didn’t count schools on the dividing lines.
