What are we really talking about?
In the last post, I laid out the research case for instructional coaching. The effect sizes are real. The impact on teacher practice and student achievement is documented and significant. Coaching outperforms every other professional development model that's been rigorously studied.
But buried in that same body of research — including in Kraft, Blazar, and Hogan's own meta-analysis — is a finding that doesn't make it onto coaching program brochures.
The bigger the program, the weaker the effects.
Kraft and his colleagues found a consistent inverse relationship between coaching program size and coaching program effectiveness. Small, campus-level programs showed dramatic results. Large, district-wide programs showed a fraction of those results. Scale the program up, and you scale the impact down.
That's the scalability paradox. And if you're a coach, an administrator, or a district leader who has invested in coaching and is quietly wondering why the outcomes aren't moving the way the research promised — this is probably a big part of the reason.
What does a typical instructional coach actually do all day?
The research has an answer, and it's uncomfortable.
Kane and Rosenquist (2019) studied how math coaches across multiple districts allocated their time. They found that in some programs, coaches spent as little as one quarter to one third of their time working directly with teachers on instructional practice. One study of a large-scale literacy coaching initiative — a program that explicitly required coaches to spend 60 to 80 percent of their time in direct instructional support — found that coaches were actually spending just 28 percent of their time that way.
Twenty-eight percent. In a program specifically designed to protect coaching time.
The rest? Administrative tasks. Meetings. Data compilation. Coordinating assessments. Substitute teaching. Paperwork. The thousand small things that principals need done and coaches are available to do.
This isn't a personal failing on the part of coaches. It's a structural one. Coaches are campus-based, which means they're visible, accessible, and — in a school that's perpetually short-staffed and over-scheduled — constantly available to fill gaps. And principals, most of whom genuinely believe in coaching, also have 40 other urgent problems before lunch. The coach is standing right there.
A superintendent I know put it more plainly than any researcher: "It's like if you've got 100 broken cars in your backyard and you hire five mechanics, and those mechanics are spending all their time doing something else. You're not getting your cars fixed."
Even when coaches are protected from administrative drift — which is rare — the math of coaching at scale is punishing.
Think about what a single high-quality coaching cycle actually requires. A goal-setting conversation with the teacher. A review of the teacher's goals, prior coaching notes, and relevant student data before the observation. Reading the lesson plan. Sitting through the observation — typically 45 minutes to an hour of class time. Taking and organizing notes. Analyzing what you observed against the teacher's goals. Drafting feedback. Scheduling and holding the debrief conversation. Updating your records so you can reconstruct context before the next cycle.
Do that math honestly and a single cycle — for a single teacher — takes somewhere between two and three hours of a coach's time. Not the hour of classroom observation. The full cycle.
Now multiply that by a caseload of ten teachers. The research recommends cycling back to each teacher every two to four weeks for coaching to be frequent enough to sustain growth. At two to three hours per cycle, that's 20 to 30 hours of coaching work a month even at the slowest recommended pace of one cycle per teacher every four weeks, and roughly 40 to 60 hours a month at a two-week pace. All of that comes before a single administrative task, PLC facilitation, lesson plan review, or coverage assignment enters the picture.
And most coaches have more than ten teachers.
This isn't a complaint about workload. It's a structural reality that explains why even well-funded, well-intentioned coaching programs fall short. There is simply not enough time in the coaching model we've inherited to do the work the research says needs to be done.
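If you want to run that arithmetic against your own caseload, here is a minimal sketch in Python. The function name and the four-weeks-per-month shortcut are my own simplifying assumptions, not figures from the research. The two-to-three-hour cycle and the two-to-four-week cadence are the estimates described above.

```python
def monthly_coaching_hours(teachers, hours_per_cycle, weeks_between_cycles,
                           weeks_per_month=4):
    """Estimate hours of coaching-cycle work per month for one coach.

    weeks_per_month=4 is a simplifying assumption for illustration,
    not a figure from the research.
    """
    cycles_per_month = weeks_per_month / weeks_between_cycles
    return teachers * hours_per_cycle * cycles_per_month


# Caseload of ten teachers, two to three hours per full cycle.
for weeks_between in (4, 2):
    low = monthly_coaching_hours(10, 2, weeks_between)
    high = monthly_coaching_hours(10, 3, weeks_between)
    print(f"Every {weeks_between} weeks: {low:.0f} to {high:.0f} hours a month")
```

With ten teachers this prints 20 to 30 hours a month at a four-week cadence and 40 to 60 at a two-week cadence. Swap in your actual caseload and cycle length to see where your own numbers land.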

Here's where it gets even harder.
The scalability problem isn't just about time. It's about what happens to quality as programs grow.
The coaching programs that produced transformational effect sizes in the research were small and tightly controlled. Coaches were carefully selected, closely supported, and working with manageable caseloads under conditions designed to protect their time. When researchers have tried to take those same programs to scale — more coaches, more teachers, more schools, more districts — the effects have consistently declined.
The American Institutes for Research spent six years studying this directly. They took a coaching model called MyTeachingPartner-Secondary that had already demonstrated significant positive impacts on student achievement in two separate randomized controlled trials. Then they scaled it — expanding across 49 schools in 15 states. The results? No significant impact on student achievement.
Not diminished impact. No impact.
This isn't a failure of the model. It's a failure of what happens to any complex, relationship-based, human-intensive intervention when you try to run it at industrial scale without changing the underlying structure.
You can't just hire more coaches and expect the results to multiply. The quality of each coaching relationship depends on the coach's time, attention, expertise, and capacity to know their teachers well. All of those things get thinner as caseloads grow. The relationship — which we established in the last post is not a soft variable but the actual mechanism of change — gets harder to maintain. Feedback loops get longer. Cycles get less frequent. The work that made coaching work starts to disappear.
I don't want to end here without saying this clearly: the scalability problem is real, but it isn't permanent.
What the research describes is a structural failure — not a human one. Coaches aren't failing. Principals aren't villains. The model itself is built on assumptions that don't hold at scale: that coaches can protect their time, that relationship quality doesn't degrade with caseload, that a two-to-three-hour cycle can be repeated frequently enough across enough teachers to drive systemic improvement.
Those assumptions only hold in small, protected environments. Everywhere else, the structure collapses under its own weight.
But what if you redesigned the structure?
What if the parts of the coaching cycle that consume the most time — the documentation, the context reconstruction, the note-taking, the pattern analysis — didn't have to be done manually? What if a coach could arrive at a coaching conversation already knowing where this teacher left off, what the data showed last month, and what patterns have emerged across multiple observations — without spending an hour reconstructing that from handwritten notes?
What if the cycle took 55 minutes instead of three hours?
That's not a fantasy. It's a question about what coaching could look like if we stopped asking coaches to be mechanics and filing clerks at the same time.
The next post is about exactly that.
