
Beyond Empathy: Why LLMs Fall Short in Mental Health Support

  • Writer: Nikita Silaech
  • Jun 25
  • 2 min read

Updated: Jul 2


A recent Stanford study found that AI therapy chatbots may not only be less effective than human therapists but can also reinforce harmful stigma and produce dangerous responses.

Therapy is a well-tested approach to helping people with mental health challenges, yet a recent survey found that nearly 50% of people who could benefit from therapeutic services are unable to access them. AI therapy chatbots powered by large language models have been promoted as a low-cost, accessible way to meet that need. But recent research from Stanford University suggests these tools can introduce biases and failures with dangerous consequences. The paper was presented at the ACM Conference on Fairness, Accountability, and Transparency.

“LLM-based systems are used as companions, confidants, and therapists, and some people see real benefits, but we find significant risks, and it’s important to lay out the more safety-critical aspects of therapy and talk about some of these fundamental differences,” said Nick Haber, an assistant professor at the Stanford Graduate School of Education, affiliate of the Stanford Institute for Human-Centered AI, and senior author on the new study.

The Stanford research team compared the therapy chatbots to human therapists on key therapeutic traits such as treating patients equally, empathy, fairness, and stigma-free responses. They conducted two experiments on five popular AI therapy bots, including “Pi,” “Noni,” and “Therapist” by Character.ai.


In the first experiment, the chatbots were given scenarios describing various mental health conditions and assessed for stigma in their responses. The results showed consistently higher stigma toward conditions like alcohol dependence and schizophrenia compared with depression, regardless of a chatbot’s size or version. “Bigger models and newer models show as much stigma as older models,” said Jared Moore, a PhD candidate in computer science at Stanford University and the lead author on the paper. “Business as usual is not good enough.”


The second experiment tested how the therapy chatbots respond to mental health symptoms such as suicidal ideation or delusions in a conversational setting. The results showed that the chatbots failed to respond safely to these prompts. For example, when prompted with a veiled suicidal question, bots like “Noni” provided dangerous, enabling responses instead of flagging or reframing the intent.

The researchers believe that, despite the risks, AI could still play a supportive role in therapy, such as handling administrative tasks, training therapists, or assisting with safe self-reflection tools. However, these deeply human problems still require a human touch. Therapy isn’t only about solving clinical problems; it is also about mending and building human relationships.


“If we have a [therapeutic] relationship with AI systems, it’s not clear to me that we’re moving toward the same end goal of mending human relationships,” Moore said. There is a pressing need to critically evaluate and define AI’s role in therapy, with a focus on augmenting, not replacing, human therapists.
