There is something subtly odd about asking a chatbot to solve a problem Plato set down around 385 BCE. Not dramatically odd, but odd enough to make you stop mid-thought and wonder what exactly is going on inside systems that millions of people use every day.
That’s essentially what two Cambridge University researchers did. Andreas Stylianides, a professor of mathematics education at Cambridge, and Dr. Nadav Marco, a visiting scholar from the Hebrew University of Jerusalem, gave ChatGPT the “doubling the square” problem, one of philosophy’s earliest teaching tools. In Plato’s account, Socrates guides an uneducated boy through the puzzle of doubling a square’s area. The boy’s first instinct is to double each side. That’s wrong, because doubling each side quadruples the area. The correct move is to use the diagonal of the original square as the side of the new one. The answer isn’t obvious at first, but once you see it, it is almost elegant. The episode fueled centuries of debate over whether mathematical knowledge is innate or can only be acquired through experience.
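The arithmetic behind Socrates’ puzzle can be written out in a few lines. Here s is a symbol introduced purely for illustration and stands for the side of the original square. Doubling the side quadruples the area. A square built on the diagonal, whose length follows from the Pythagorean theorem, has exactly twice the area:

```latex
\begin{aligned}
A_{\text{original}} &= s^2 \\
A_{\text{doubled side}} &= (2s)^2 = 4s^2 \quad \text{(four times the area, not twice)} \\
d &= \sqrt{s^2 + s^2} = s\sqrt{2} \quad \text{(diagonal, by Pythagoras)} \\
A_{\text{on diagonal}} &= d^2 = \left(s\sqrt{2}\right)^2 = 2s^2
\end{aligned}
```

Since √2 is irrational, no whole-number side length doubles the area exactly. That is why the answer has to be constructed geometrically rather than found by trial and error.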
The researchers chose this puzzle deliberately. They reasoned that ChatGPT is trained mostly on text rather than diagrams or geometric images, so this specific solution was unlikely to sit neatly in its training data. If the AI reached the correct answer on its own, that would be significant. It would suggest that mathematical reasoning can be learned rather than being hardwired.
What happened was more intricate, and more interesting, than a simple pass or fail. When Marco and Stylianides imitated Socrates’ style of questioning, ChatGPT did not reach for the classic diagonal solution. Instead, it fell back on algebra, a method that would have been entirely foreign to ancient Athens. It held to that algebraic approach even when the researchers’ prompts nudged it toward geometry. Only after they voiced disappointment that, for all its supposed expertise, it could not give an “elegant and exact” answer did the chatbot produce the geometric solution. That suggests framing may shape these systems’ responses in ways even their developers don’t yet fully understand.

The more revealing moment came next. Asked to double the area of a rectangle while keeping its proportions, ChatGPT made a mistake. It claimed that because a rectangle’s diagonal cannot be used to double its size, there was no geometric solution. The first half of that is true. The conclusion is not, because a different geometric approach does work. According to Marco, the chance that this particular false claim came straight from its training data was “vanishingly small.” Rather than retrieving stored knowledge, the chatbot seemed to be improvising, extending assumptions from the earlier discussion about squares.
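The rectangle case can be checked the same way. Take a rectangle with sides a and b, both symbols introduced here for illustration. Keeping its proportions means scaling both sides by the same factor k, and doubling the area then forces k = √2. This is one standard way to see that a geometric solution exists. It is not necessarily the construction the researchers had in mind:

```latex
\begin{aligned}
A &= ab \\
A' &= (ka)(kb) = k^2 ab = 2ab \;\Longrightarrow\; k = \sqrt{2} \\
a' &= a\sqrt{2} = \text{diagonal of a square with side } a \\
b' &= b\sqrt{2} = \text{diagonal of a square with side } b \\
d_{\text{rect}} &= \sqrt{a^2 + b^2} = a\sqrt{2} \iff a = b
\end{aligned}
```

Each new side is the diagonal of a square drawn on the corresponding old side. The rectangle’s own diagonal only serves directly when the rectangle is already a square.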
The parallel is hard to ignore. Socrates’ pupil made much the same kind of mistake: taking something partly correct and pushing it too far. The researchers are careful not to overread this, and they do not claim that ChatGPT thinks the way humans do. But from their side of the screen, as Marco puts it, its behavior looked “learner-like.”
They frame this behavior using the zone of proximal development, an educational concept describing the gap between what a learner already knows and what they could discover with the right guidance. They suggest that AI may have a functional equivalent: problems it cannot solve immediately but might solve with the right prompting. That matters more than it might first appear. As Stylianides puts it, “understanding and evaluating AI-generated proofs are emerging as key skills that need to be embedded in the mathematics curriculum.” It also changes how we might think about using these tools, especially in classrooms. Students who treat ChatGPT as an answer machine are likely to run into trouble. Treating it as a thinking partner they can challenge, test, and question is a very different relationship. The researchers recommend prompts such as “I want us to explore this problem together” rather than “tell me the answer.” That’s an important distinction.
The experiment also raises a broader question, one it doesn’t settle and most likely wasn’t trying to. AI’s so-called “black box problem,” the inability to see how these systems actually arrive at their conclusions, is still with us. We can observe inputs and outputs, but everything in between has to be inferred. In a way, the Cambridge team used a 2,400-year-old puzzle to shine a flashlight into that gap. The beam doesn’t reach very far. But the effort itself has value.
