During testing week, a certain kind of silence settles over a school gymnasium. Rows of desks, tapping pencils, a proctor walking the aisles with a clipboard. In fifty years, not much about that image has changed. What has changed, almost imperceptibly, is what happens to the paper once the pencil stops moving.
In a growing number of states, the essay your child wrote last spring may not have been read by a teacher. A machine read it. An algorithm examined the sentence structure, flagged specific phrases, compared the text to thousands of other samples, and produced a score, often in a matter of seconds and sometimes before any human had seen it.
One of the more visible examples is New Jersey, whose new standardized exams will rely heavily on AI scoring for student writing. The head of the state teachers union has publicly raised the prospect of a child failing a computer-graded test only to learn, much later, that something had simply gone wrong with the software. It’s a legitimate concern. There is no tidy way to appeal to an algorithm.

The technology is not exactly new. Testing behemoths like ETS have been quietly using automated essay scoring since the late 1990s. What has changed is the scale, and the quiet surrounding it. Parents are frequently unaware that their child’s writing is being evaluated by a model trained on patterns rather than meaning. That gap between what is disclosed and what is actually happening is unsettling.
It is clear why states and vendors find it appealing. Human grading is slow, costly, and inconsistent: two teachers may read the same essay and assign different grades. AI promises consistency and speed. Cambium Learning Group, the company behind a number of state assessment platforms, can return results in less than a day rather than weeks. That’s tempting for districts already buried in spreadsheets.
However, consistency and fairness are not the same thing. Because these models are trained on previously graded essays, they inherit any bias embedded in that training data. Researchers at Brookings have identified this exact issue: if the writing samples used to train the system skew toward students with more resources, the algorithm may end up rewarding a type of polish that has little to do with real thinking. A machine searching for recognizable patterns might interpret a witty, nontraditional student voice as noise.
Another unfamiliar outcome is quietly appearing in classrooms: children learning to write for algorithms rather than readers. Instructors have noticed certain phrases turning up again and again, almost like a software-specific dialect. When the audience is not a real person but a scoring model trained to identify patterns, it’s difficult not to wonder what that does to actual writing instruction over time.
To be fair, this isn’t the SAT being secretly administered by a chatbot; the College Board has made it clear that its scoring is still rule-based and statistically equated, with AI used primarily for fraud detection rather than judgment. State assessments are a different animal, though. The line between “automated” and “AI-judged” varies greatly from program to program, frequently with little public explanation.
The change appears to have outpaced the public conversation about it. While a server somewhere processes thousands of essays per hour, parents continue to imagine a teacher holding a red pen. Perhaps the technology will advance to the point where it merits that kind of trust. For the time being, many educators and a sizable portion of parents appear to be asking the same fundamental question: who is grading the grader?
