When a new technology is introduced into schools, Silicon Valley projects a certain kind of confidence. Transformation, personalization, equity, and efficiency are all quickly promised. The slide decks look tidy. The pilot programs seem promising. Almost without hesitation, the money follows. So it did not make headlines the way it probably should have when Stanford University researchers quietly published a report this spring suggesting that much of the evidence behind today’s AI classroom tools is thinner than the industry lets on.
The Stanford AI Hub for Education evaluated more than 800 scholarly articles on AI in K–12 education. Only 20 studies were found to be rigorous enough to determine whether AI actually improved student outcomes, rather than merely whether students who used AI performed better. That is not a footnote. It is the conclusion.
According to Lily Fesler, a senior researcher at the Hub and co-author of the report, surveys and descriptive results can tell you that students who used a tool scored higher, but they cannot explain why. Perhaps it was the instructor. Perhaps it was motivation. Perhaps it was just the novelty of something fresh. Causal research poses a harder question: did the tool itself cause the change? Very few studies are currently designed to answer it. School district administrators are writing procurement checks based on evidence that, in the researchers’ opinion, is essentially nonexistent.
Still, something is working. Fesler was careful not to dismiss the technology entirely, as was her co-author Chris Agnew, managing director of the AI Hub. AI does appear to genuinely help in structured, task-specific scenarios: a student working through math practice problems at 10 p.m. and receiving instant feedback on each step, or a teacher writing the same comments on thirty nearly identical essays in less time. According to the study, some AI tools can reduce grading and lesson planning time by up to 30% without compromising the quality of the lessons. That is real, and it matters. However, Fesler also pointed out that teachers aren’t really clocking out earlier; they are spending that time on other parts of their work, which shows how important, and how difficult, it is to quantify what “saving time” in a school setting actually means.

The more difficult question is what happens when students close the app. A pattern emerged in a number of the more robust studies: students performed better while using AI tools, but they struggled to reproduce those results on their own. The gains became less pronounced. In some cases, they vanished. Fesler described this as cognitive offloading: the student lets the tool do the thinking, finishes the task, and leaves with less knowledge than the final product suggests. It is a subtle problem, one that would not surface in a district’s quarterly data review or a company’s product demonstration. But when a student eventually sits down to take an exam with no AI present, the consequences are real.
Agnew was willing to say publicly what many educators seem to feel in private: he is concerned about the implications for critical thinking, for the development of fundamental skills, and for whether young people are developing a sense of intellectual independence at all. These are not peripheral issues. They are showing up in the research, even though that research is still incomplete.
What really complicates this moment is the timing. AI tools are arriving in classrooms faster than anyone can adequately evaluate them. Teachers are experimenting. Districts are writing policies. Students are already using these tools both inside and outside the classroom, often in ways that no administrator has fully mapped out. The market is not waiting for peer-reviewed confirmation. It seldom does.
Agnew described the overall data as “mixed” and said it is still too early to determine whether AI is merely speeding up education or genuinely altering the nature of learning. Both could be true at the same time. Whether the industry is willing to wait to find out remains an open question.
