Education

Measuring teachers

Published 7.16.2017
Not all teachers are equal; some are better than others. But as the system is designed now, teacher compensation is decided by advanced degrees, certifications, and time in service (tenure), and it is very difficult to get rid of weaker teachers.

Students aren’t widgets, but William L. Sanders found a way to measure teacher performance by comparing each student’s actual progress against that student’s expected progress. So if my son wasn’t expected to be reading by the end of third grade, the fact that he wasn’t wouldn’t be held against the teacher.

I don’t think the school grading system that was such a disaster was based on this approach.

To fairly evaluate teachers, Mr. Sanders argued, the state needed to calculate an expected growth trajectory for each student in each subject, based on past test performance, then compare those predictions with their actual growth. Outside-of-school factors like talent, wealth and home life were thus baked into each student’s expected growth. Teachers whose students’ scores consistently grew more than expected were achieving unusually high levels of “value-added.” Those, Mr. Sanders declared, were the best teachers.
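
To make the idea concrete, here is a minimal sketch in Python of the calculation the excerpt describes: a teacher’s value-added as the average gap between students’ actual and expected scores. This is not Sanders’s actual TVAAS model, which used far more elaborate longitudinal mixed models; the teacher names and numbers are invented for illustration.

```python
# A minimal sketch of the value-added idea, NOT Sanders's actual TVAAS
# method. All names and scores below are illustrative assumptions.
from statistics import mean

# Each record: (teacher, student's expected year-end score, actual score).
records = [
    ("Adams", 72.0, 78.5),   # student beat the prediction
    ("Adams", 65.0, 66.0),
    ("Baker", 80.0, 74.0),   # student fell short of the prediction
    ("Baker", 58.0, 57.5),
]

def value_added(records, teacher):
    """Average of (actual - expected) across the teacher's students.

    Out-of-school factors are "baked in" because they already shaped
    each student's expected score.
    """
    residuals = [actual - expected
                 for t, expected, actual in records if t == teacher]
    return mean(residuals)

for t in ("Adams", "Baker"):
    print(t, round(value_added(records, t), 2))
# Adams  3.75  -> students grew more than expected (positive value-added)
# Baker -3.25  -> students grew less than expected (negative value-added)
```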

Read the rest. This required computing power, but Tennessee gave him what he needed.

When he began calculating value-added scores en masse, he immediately saw that the ratings fell into a “normal” distribution, or bell curve. A small number of teachers had unusually bad results, a small number had unusually good results, and most were somewhere in the middle.

A bell curve doesn’t mean that all teachers should get raises, which conflicts with what the teachers’ union thinks.

The value-added bell curve told a different story. First, it was wide. The effective teachers on one side were achieving much better results than the ineffective teachers on the other. Second, it didn’t support the tenure and credentials system. Other researchers began using methods similar to Mr. Sanders’s to compare different kinds of teachers.

Schools were collectively spending billions to give teachers with master’s degrees extra pay. Yet their value-added bell curve looked little different from the curve for teachers without those degrees. Nor did effectiveness grow in lock step with years of service.

Teachers oppose releasing the value-added data because the weaker teachers would be exposed. Sanders himself didn’t want the data released publicly. And he made no bones about the fact that most teachers didn’t understand the math behind what he was trying to do.

The value-added metric also argues against focusing on smaller class sizes or broad-based salary increases, because those assume that all teachers are basically the same, and that is not the case. Race to the Top was supposed to be based on value-added scores.

The policy quickly became a flash point. The Obama administration wanted a substantial portion of each teacher’s rating to be based on “student growth,” which everyone understood to mean some form of value-added results. The unions wanted test scores to matter much less. The Common Core standardized tests, already disliked by opponents of federal power on the right, also gained critics on the left, who objected to their use in evaluating teachers.

In the end, they focused on in-person evaluations rather than test scores. True, the quality of the test matters, but I think a decent test can be developed. Mostly this is teachers protecting their own.

Up until his death, Mr. Sanders never tired of pointing out that none of the critiques refuted the central insight of the value-added bell curve: Some teachers are much better than others, for reasons that conventional measures can’t explain. His system is still used in Tennessee today. In the last dozen years or so, the state’s scores on federal N.A.E.P. exams have improved faster than those of the average state.

There still isn’t a way to explain who is a good teacher, who is not, and why, but Tennessee’s use of Sanders’s data shows that improvement can be gained if the data is used. I’m not suggesting teachers be ousted after one bad rating year, but repeated below-average years should, in fact, be a sign that a new career needs to be considered.

Left unanswered: How is it determined what the expected growth for each student should be?
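
For what it’s worth, one common approach (an assumption on my part, not necessarily what Sanders did) is to regress this year’s score on prior-year scores and treat the fitted prediction as each student’s expected score. A minimal sketch with synthetic data:

```python
# A hedged sketch of one way to set expected growth: fit a least-squares
# line from prior-year scores to current-year scores, then use the fitted
# line as each student's prediction. The data here is synthetic.
import numpy as np

rng = np.random.default_rng(0)
prior = rng.uniform(40, 95, size=200)                # last year's scores
actual = 10 + 0.9 * prior + rng.normal(0, 5, 200)    # this year's scores

# Least-squares fit: expected = a + b * prior
b, a = np.polyfit(prior, actual, 1)

def expected_score(prior_score):
    """Predicted year-end score for a student with this prior score."""
    return a + b * prior_score

# A student's contribution to a teacher's value-added is actual - expected:
print(round(actual[0] - expected_score(prior[0]), 2))
```

With several prior years per student, the same idea extends to a multivariate regression; the key point is that the prediction, however it is built, carries the student’s history so the teacher is judged only on growth beyond it.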