Measurement in educator preparation: purpose matters

“We cannot improve at scale what we cannot measure.” So proclaim Tony Bryk and his colleagues at the Carnegie Foundation for the Advancement of Teaching in their recent book, Learning to Improve: How America’s Schools Can Get Better at Getting Better. Dr. Bryk is one of the leading education sociologists of our time and a committed champion for thoughtful, disciplined, and rigorous improvement in our nation’s schools. And yet, in the political environment that has grown up around assessment and accountability in K-12 education, this statement can easily be taken out of context and seen as a new phase in what Dana Goldstein called the teacher wars. Seemingly neutral terms like “measure” and “improve at scale” are charged today, as they have been throughout the history of American public education, dating back at least to the days of Edward Thorndike and John Dewey in the early 20th century.

The politicized history of educational measurement is unfortunate because, as Bryk and his colleagues point out, systems get better through careful attention to measurement – and careful specificity about the purpose of a data tool or instrument. “The validity of a measure is established for some specific set of uses (or consequences),” they write. “Measures do not have the property of being valid in general.”

Bryk and his colleagues cite three main purposes of measurement in education. Each has relevance in our current discussion about measurement in educator preparation.

  1. Accountability: assessing individual, organizational, or sector-wide performance relative to “end of the line” outcomes considered important (e.g., what percentage of teachers prepared by a particular institution remain in the profession 10 years after program completion?)
  2. Research: testing the relationship among key constructs (e.g., what program characteristics might help explain higher retention rates among graduates of a particular institution?)
  3. Improvement: informing efforts to change (e.g., how might teacher-educators better design programs to increase the likelihood of preparing teachers who persist in the profession?)

The key insight here is not that measurement is good or bad. Rather, specificity of purpose matters. And articulating the relationship among the purposes is essential for improvement at scale.

Consider teacher retention. Given persistent concerns about teacher turnover, particularly among teachers of color, it is reasonable to assume that teacher retention is a widely valued outcome and one way of examining the impact of an educator-preparation program. Certainly, a variety of factors, including working conditions, instructional support, and life events, influence retention rates. However, both a prospective teacher and an interested observer would presumably want to know whether an institution produced teachers with a demonstrably higher retention rate in the profession.

If we were interested only in accountability, we might collect data only on 10-year retention rates. If we were interested only in research, we would need the same retention data, but we would also want additional information on a variety of input measures, such as candidate demographics, in order to isolate causal factors that might explain higher retention. And if we were interested in improvement – a core interest here at Deans for Impact – we would be interested not only in hypotheses derived from research measurement and the outcomes specified in accountability measurement, but also in data from proximal activities believed to be associated with increased retention. For example, if the quality of a clinical placement appears associated with increased long-term retention, teacher-educators would want to collect carefully constructed survey data from student teachers and engage in iterative cycles to drive improvement in clinical partnership quality.
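For readers who work with program data, here is a rough sketch of how an outcome measure and a proximal measure might sit side by side. Everything in it is hypothetical: the `Completer` record, its field names, the 1-to-5 placement survey scale, and the sample values are invented for illustration, and it assumes every record comes from a cohort that finished at least 10 years ago. It is one simple way to compute these numbers, not a description of any institution's actual system.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Completer:
    """One program completer (all fields are hypothetical)."""
    program: str             # label for the preparation program
    years_taught: int        # years in the profession since completion
    placement_rating: float  # assumed 1-5 survey score for clinical placement


def retention_rate(records, horizon=10):
    """Accountability-style outcome: share of each program's completers
    who have taught for at least `horizon` years."""
    totals = defaultdict(int)
    retained = defaultdict(int)
    for r in records:
        totals[r.program] += 1
        if r.years_taught >= horizon:
            retained[r.program] += 1
    return {program: retained[program] / totals[program] for program in totals}


def mean_placement_rating(records):
    """Improvement-style proximal measure: average clinical placement
    survey score for each program."""
    ratings = defaultdict(list)
    for r in records:
        ratings[r.program].append(r.placement_rating)
    return {program: mean(scores) for program, scores in ratings.items()}


# Invented sample data, for illustration only.
sample = [
    Completer("A", 12, 4.0),
    Completer("A", 10, 5.0),
    Completer("A", 3, 3.0),
    Completer("A", 15, 4.0),
    Completer("B", 11, 2.0),
    Completer("B", 2, 3.0),
]

print(retention_rate(sample))         # {'A': 0.75, 'B': 0.5}
print(mean_placement_rating(sample))  # {'A': 4.0, 'B': 2.5}
```

Putting the two numbers next to each other does not show that placement quality causes retention. That is a research question. The point is only that improvement work needs the proximal measure in hand, alongside the long-run outcome, to have something to act on between cohorts.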

Data-informed improvement along the lines described above isn’t rocket science. However, it is very difficult to carry out in our current climate for at least two key reasons. First, the data environment for accountability and research measurement is weak in most states. The field lacks inter-state teacher-level identifiers, meaning that if I am prepared in one state and leave to teach in another, my preparation program and original state have no way of tracking my retention. Further, each state has its own way of describing programs and program characteristics, a problem compounded by differences in descriptions among institutions even within the same state and by gaps between higher-ed and K-12 reporting systems. All of these factors complicate the efforts of researchers. Simply put, these are technical barriers that can be surmounted if we can summon the political will to do so. This is a priority at Deans for Impact.

Second, there are cultural challenges to improvement measurement – not least of which is skepticism among many educators about the very concept given its tumultuous and politicized history in American education. In K-12 public schools, educators have been asked to track and analyze all types of data, sometimes without specificity about the purpose and often without clarity about the connection between data analysis and improvement for kids – a compliance exercise dictated from the central office or the state capitol, not a genuine exercise in getting better. Data work poorly done may be worse than no data work at all. That’s why at Deans for Impact we will be highlighting examples of improvement measurement within member institutions and working with teacher-educators at these institutions to change the way they approach and use data for improvement – starting with the outcomes valued by institutions and examining proximal process measures believed to influence those outcomes.

This will not be easy, but we cannot improve at scale what we cannot measure.

Peter Fishman

Vice President of Strategy
