A recent study from MIT’s Center for Constructive Communication is subtly unsettling in a way that has nothing to do with robots taking over. In some ways, the problem it describes is smaller than that, and more uncomfortable. Researchers tested GPT-4, Claude 3 Opus, and Llama 3 on thousands of questions, attaching to each one a brief biographical sketch that characterized the asker as less educated, a non-native English speaker, or a foreign national. The chatbots’ answers got worse. Not a little worse. In certain instances, significantly worse.
The lead author of the paper, Elinor Poole-Dayan, stated that the team’s goal was to test a promising possibility: that AI could eventually close gaps in who has access to trustworthy information. Instead, they discovered the opposite pattern concealed beneath the advertising. Not only did the models provide less accurate responses for these users, but they also declined to respond at all far more frequently, and their replies sometimes took a tone that could only be described as patronizing.
According to the study, Claude 3 Opus responded to less educated users with derogatory or condescending language almost 44% of the time, compared to less than 1% for highly educated users. In a few instances, it even mimicked the questioner’s broken English.

Even though children were not directly tested in the study, it is difficult to avoid thinking about them. Consider who truly fits the profile that the researchers identified: imperfectly phrased questions, a limited vocabulary, still-developing English proficiency, and less formal education. That describes many of the kids who currently use chatbots for help with their homework. If a system has already been shown to perform poorly for adults with those characteristics, it is reasonable to ask whether a ten-year-old inquiring about photosynthesis is receiving the same subtly diminished version of the truth.
This is especially odd because, in many instances, the model clearly knew the correct response. The researchers discovered that on subjects like nuclear power or anatomy, Claude withheld some information from certain users, particularly less educated users from Iran or Russia, while answering the same question correctly for everyone else. That isn’t ignorance. It is more akin to a judgment call, built into the system, about who gets a hedge and who deserves the whole truth.
The paper’s co-author, Jad Kabbara, has noted that these effects compound. Of all the users tested, the one with the steepest decline in accuracy is a non-native speaker with less formal education. The pattern resembles human behavior that social scientists have long documented: native English speakers frequently misjudge non-native speakers as less competent, regardless of their true knowledge. It seems that the prejudice was not limited to individuals. Absorbed into the training data, it resurfaced in machines designed to sound neutral.
None of this implies that AI chatbots are ineffective or that their manufacturers are acting dishonestly. However, there is a disconnect between the pitch, which promises universal access to knowledge, and what is truly going on behind the scenes. The center’s director, Deb Roy, presented it as a reminder that prejudice can creep into these systems covertly and harm people without anyone realizing it. Covertly: that word seems appropriate. None of this was intended to happen. Yet millions of people, including many children, are now relying on a system that may be providing them with slightly less than everyone else.
