Artificial Intelligence in Diagnostics and Patient Care

Imagine visiting a therapist who never sleeps, never judges you, and is always available at 3 AM when anxiety takes hold. Or picture a clinical alert system in an ICU that not only flags a deteriorating patient but also explains why it’s worried. Now think about a powerful language model that can sift through symptoms and generate a list of possible diagnoses within seconds.

This is no longer science fiction. Artificial intelligence (AI) is rapidly weaving itself into the fabric of modern medicine — and the results are both exciting and sobering.

Recent research published in leading medical informatics journals sheds light on how AI is being used — and how it should be used — across three major areas: mental health therapy, clinical decision support in critical care, and medical diagnosis through large language models (LLMs). Each study offers a window into the possibilities, the pitfalls, and the patient-centered principles that must guide AI’s future in healthcare.

Let’s break it all down in plain language. Whether you’re a patient, a caregiver, a clinician, or simply a curious reader, this article is for you.


1. The Mental Health Crisis and AI’s Growing Role in Depression Treatment

Depression is one of the most widespread mental health conditions in the world. Yet access to trained mental health professionals remains deeply inadequate. Long waiting lists, high therapy costs, geographic barriers, and the persistent stigma around seeking help all prevent millions from getting the care they desperately need.

This treatment gap has sparked growing interest in digital mental health tools, and AI-powered psychotherapists are now a realistic prospect. But before developers build these tools, a critical question must be answered: What do patients actually want from an AI therapist?

A 2026 qualitative study published in JMIR Formative Research tackled this question head-on. Researchers from City University of Hong Kong and Tongji University surveyed 452 individuals using Amazon Mechanical Turk. Participants answered three open-ended questions about AI’s potential role in treating depression. The research team used a grounded theory approach and established strong intercoder reliability (Cohen κ = 0.80) to ensure rigorous analysis.[1]
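
For readers curious about that κ = 0.80 figure, here is a minimal Python sketch of how intercoder agreement can be computed, assuming scikit-learn is available. The coder labels below are invented for illustration and are not the study's data.

```python
# Minimal sketch: Cohen's kappa as a measure of intercoder reliability.
# The labels are hypothetical examples, NOT data from the study.
from sklearn.metrics import cohen_kappa_score

# Codes assigned by two independent coders to the same ten responses (invented)
coder_a = ["treatment", "companionship", "diagnosis", "self-management", "treatment",
           "consultation", "companionship", "diagnosis", "treatment", "consultation"]
coder_b = ["treatment", "companionship", "diagnosis", "self-management", "consultation",
           "consultation", "companionship", "diagnosis", "treatment", "treatment"]

kappa = cohen_kappa_score(coder_a, coder_b)
# By the common Landis-Koch reading, 0.61-0.80 indicates substantial agreement
print(f"Cohen's kappa: {kappa:.2f}")
```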

The findings were rich, nuanced, and deeply human.

Participants imagined five core roles for an AI psychotherapist:

  • Diagnosis — helping identify symptoms and severity
  • Treatment — delivering evidence-based therapeutic interventions
  • Consultation — offering guidance and information
  • Self-management — supporting daily coping and mood tracking
  • Companionship — providing emotional presence and reducing loneliness

These envisioned roles reflect a deep longing not just for clinical help, but for genuine emotional connection. People with depression don’t only want to be diagnosed. They want to be heard, understood, and accompanied through their healing journey.

This finding has profound implications for how AI mental health tools should be designed. A chatbot that only outputs diagnosis codes won’t cut it. Patients are looking for something far more human — even from a machine.

2. What Patients Want From an AI Psychotherapist: Key Features and Design Priorities

So what exactly should an AI psychotherapist look like? The same study from City University of Hong Kong produced a detailed list of desired features — ranked and contextualized based on who’s asking.

Participants identified 11 key features they wanted in an AI mental health tool:[1]

  • Professionalism — clinical accuracy and evidence-based approaches
  • Warmth — a tone that feels caring, not robotic
  • Precision care — tailored responses based on individual needs
  • Empathy — the ability to acknowledge and validate feelings
  • Remote services — accessible from anywhere, at any time
  • Active listening — genuinely processing what users share
  • Personalization — adapting to a person’s history, preferences, and progress
  • Flexible treatment options — offering multiple therapeutic modalities
  • Patience — never rushing or dismissing a user
  • Trustworthiness — behaving consistently and safely
  • Basic treatment alternative — serving as a stepping stone when no human therapist is available

Critically, these preferences weren’t uniform across all users. The research revealed important differences based on participant profiles:

  • People with higher social stigma around depression placed much greater emphasis on privacy protection.
  • Those with more severe depression prioritized precision care and timely access.
  • Users with low trust in AI were less interested in remote services — they wanted more control and perhaps more human oversight.
  • Privacy-conscious individuals showed reduced preference for features that required extensive data disclosure.

This is a game-changer for design. It means one-size-fits-all AI mental health tools will inevitably fail a large portion of users. Instead, developers must build systems that adapt to the unique emotional, clinical, and privacy needs of each individual.

The researchers also proposed a MoSCoW prioritization framework — categorizing features as “must have,” “should have,” “could have,” and “won’t have” — as a practical starting point for future AI system development.[1] This type of structured, user-informed design methodology could make AI mental health tools dramatically more effective and trustworthy.
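
To make that framework concrete, here is a minimal Python sketch of how a design team might record a MoSCoW prioritization. The bucket assignments below are hypothetical placeholders for illustration only; they are not the prioritization reported in the study.

```python
# Minimal sketch of a MoSCoW feature-prioritization record.
# Bucket assignments are illustrative placeholders, NOT the study's rankings.
from dataclasses import dataclass, field

@dataclass
class MoSCoWPlan:
    must_have: list = field(default_factory=list)
    should_have: list = field(default_factory=list)
    could_have: list = field(default_factory=list)
    wont_have: list = field(default_factory=list)

plan = MoSCoWPlan(
    must_have=["professionalism", "trustworthiness", "privacy protection"],  # hypothetical
    should_have=["empathy", "personalization", "remote services"],           # hypothetical
    could_have=["flexible treatment options", "companionship"],              # hypothetical
    wont_have=["autonomous prescribing"],                                    # hypothetical
)

for bucket, features in vars(plan).items():
    print(f"{bucket}: {', '.join(features)}")
```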

3. The Real Risks: What Patients Fear About AI in Mental Health Care

For all the enthusiasm about AI psychotherapists, patients aren’t blindly optimistic. The same 2026 study surfaced six major concerns that developers and policymakers simply cannot ignore.[1]

  • Diagnostic inaccuracy — What if the AI gets it wrong and misidentifies a condition?
  • Treatment errors — What if an AI recommends the wrong intervention and worsens symptoms?
  • Privacy breaches — Mental health data is extremely sensitive. Who has access to it?
  • Lack of human interaction — Can a machine truly replace the healing power of a human connection?
  • Technical malfunctions — What happens if the system crashes during a mental health crisis?
  • Lack of emotional engagement — What if the AI feels cold, scripted, or disconnected?

These concerns aren’t irrational fears. They reflect a sophisticated understanding of where AI still falls short. Depression, in particular, is a deeply personal, often life-threatening condition. The stakes of getting it wrong are incredibly high.

The fact that users are already thinking critically about these risks is actually a good sign. It means the public is engaged and informed — and that they want to be partners in shaping how AI tools are built and deployed, not passive recipients of whatever technology companies decide to release.

For healthcare providers and developers, the message is clear: transparency, safety, and emotional intelligence are non-negotiable in AI mental health tools.

4. Explainable AI in Critical Care: Why Clinicians Need to Understand the “Why”

Shift the scene from therapy apps to the intensive care unit. Here, AI-driven clinical decision support (CDS) tools are being used to monitor patients, detect deterioration, and flag life-threatening changes before they spiral out of control.

These tools can be literal lifesavers. But there’s a problem: many AI systems operate as “black boxes.” They produce alerts without explaining their reasoning. And when a nurse or doctor can’t understand why an alert was triggered, they’re less likely to trust it — or act on it.

This challenge led researchers at Australia’s Commonwealth Scientific and Industrial Research Organisation (CSIRO) to investigate how clinicians feel about “explainable AI” (XAI) in critical care settings. Their 2026 qualitative study, published in JMIR Human Factors, involved semistructured interviews with 14 clinical experts and scenario-based exercises.[2]

The findings were nuanced and practically important.

Clinicians valued explanations, but not equally in all situations. Explanations were most useful when:

  • Situations were complex or unfamiliar
  • Explanations were clear, plausible, and actionable
  • The AI’s reasoning aligned with their own clinical judgment

Interestingly, the study also uncovered something called the “disagreement problem”: a situation where different XAI methods produce conflicting explanations for the same alert. Such discrepancies can undermine trust and lead to poor clinical decisions.
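
To see what that disagreement can look like, here is a toy Python sketch that compares feature attributions from two hypothetical explanation methods for the same alert and quantifies their agreement with a rank correlation. The feature names and attribution values are invented for illustration and are not drawn from the study.

```python
# Toy illustration of the "disagreement problem": two explanation methods
# assign different importances to the same features for the same alert.
# All numbers are invented for illustration, not taken from the study.
import numpy as np
from scipy.stats import spearmanr

features = ["heart_rate", "resp_rate", "lactate", "mean_arterial_pressure", "urine_output"]

# Hypothetical attributions from two different XAI methods
method_a = np.array([0.42, 0.25, 0.18, 0.10, 0.05])
method_b = np.array([0.10, 0.05, 0.40, 0.30, 0.15])

rho, _ = spearmanr(method_a, method_b)
print("Top feature per method:",
      features[int(method_a.argmax())], "vs", features[int(method_b.argmax())])
print(f"Rank agreement (Spearman rho): {rho:.2f}")  # low or negative values signal disagreement
```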

However, the researchers found that clinicians were less troubled by this disagreement than expected — as long as the AI’s actual predictive alerts were accurate. In other words, performance trumps explanation in the clinical world, especially among experienced users who have built trust in a system over time.[2]

This suggests that explainability plays a more critical role at the beginning of AI adoption — when clinicians are still learning to trust a new tool — than after they’ve had significant experience with it.

5. Building Trust Between Clinicians and AI: Lessons From the ICU

Trust is the currency of healthcare. Patients trust doctors. Doctors trust each other. And now, clinicians are being asked to trust machines.

The CSIRO study on explainable AI offers a roadmap for how that trust can be built and sustained.[2] The researchers found that trust in AI-driven CDS tools depended on more than just explanation quality. Several factors mattered:

  • System accuracy — Does the AI perform well in real clinical situations?
  • Alignment with clinical reasoning — Does the AI “think” like an experienced clinician?
  • Workflow integration — Does the tool fit naturally into existing clinical processes, or does it create extra friction?
  • Perceived reliability — Is the system consistent and dependable over time?

These insights carry a powerful design implication: you can’t just bolt an explanation feature onto a poorly performing AI and expect clinicians to trust it. The entire system — its accuracy, its integration, its usability — must be carefully engineered around the needs and workflows of its users.

The study also highlights a trajectory of trust. Initially, clinicians rely heavily on explanations to make sense of AI outputs. Over time, as they develop familiarity and expertise, their need for explanations decreases and their focus shifts to accuracy and reliability. This means AI tools must be designed to serve both novice and expert users simultaneously — offering rich explanations for beginners while remaining streamlined and efficient for experienced clinicians.[2]

For healthcare systems looking to adopt AI-driven decision support, this research delivers a clear mandate: invest in implementation, training, and iterative validation — not just in algorithm development.

6. Large Language Models in Medical Diagnosis: Promise, Performance, and Pitfalls

Now let’s talk about one of the most talked-about developments in AI medicine: large language models (LLMs) as diagnostic tools.

The release of powerful open-source models like DeepSeek-R1 has created enormous excitement in the healthcare sector. Hospitals are exploring whether these models can assist clinicians in generating differential diagnoses — lists of possible conditions based on a patient’s symptoms.

A rigorous 2026 comparative study published in the Journal of Medical Systems by researchers from Beijing Obstetrics and Gynecology Hospital and other institutions put five versions of DeepSeek-R1 to the test.[3]

The setup was methodical. Each model was tested on 110 simulated clinical cases drawn from publicly available data. The cases covered internal medicine, surgery, neurology, gynecology, and pediatrics, and were categorized by disease frequency (frequent, less frequent, rare). Each model was asked to generate five preliminary diagnoses, and a response was scored as correct if the true diagnosis appeared anywhere in those five.
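
As a concrete illustration of that scoring rule, here is a minimal Python sketch of a top-5 check. The cases and diagnosis strings are hypothetical placeholders; in a real evaluation, matching a model's free-text output to the reference diagnosis would likely require clinical review rather than exact string comparison.

```python
# Minimal sketch of the top-5 scoring rule: a case counts as correct if the
# reference diagnosis appears among the model's five candidate diagnoses.
# The cases below are hypothetical placeholders, not the study's data.

def top5_correct(reference: str, candidates: list[str]) -> bool:
    """Return True if the reference diagnosis matches any of the top five candidates."""
    norm = lambda s: s.strip().lower()
    return any(norm(reference) == norm(c) for c in candidates[:5])

cases = [
    ("acute appendicitis", ["gastroenteritis", "acute appendicitis", "ovarian torsion",
                            "mesenteric adenitis", "renal colic"]),
    ("pulmonary embolism", ["pneumonia", "acute coronary syndrome", "pericarditis",
                            "pneumothorax", "costochondritis"]),
]

accuracy = sum(top5_correct(ref, cands) for ref, cands in cases) / len(cases)
print(f"Top-5 accuracy: {accuracy:.2%}")  # 50.00% on this toy set
```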

The results were striking — and cautionary.

  • DeepSeek-R1-671B (the full, non-distilled model) significantly outperformed its base model, DeepSeek-V3 (95.45% vs. 88.18%; p = 0.008). This is impressive and suggests real potential for clinical diagnostics.[3]
  • DeepSeek-R1-8B (a distilled smaller model) actually underperformed its base model, Llama3.1-8B (47.27% vs. 64.54%; p = 0.003) — a significant and concerning gap.
  • The mid-sized distilled models (14B, 32B, 70B) showed no significant difference from their base models.

In other words, bigger isn’t always better — but smaller distilled models can be significantly worse. The act of compressing a large model into a smaller, more deployable version appears to strip away critical diagnostic reasoning capabilities.
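
As a quick sanity check on those figures, the reported percentages convert cleanly back into whole-case counts on the 110-case test set. The short sketch below simply performs that arithmetic.

```python
# Quick arithmetic check: converting the reported accuracies back into
# approximate case counts on the 110-case test set.
n_cases = 110
reported = [
    ("DeepSeek-R1-671B", 0.9545),
    ("DeepSeek-V3", 0.8818),
    ("DeepSeek-R1-8B", 0.4727),
    ("Llama3.1-8B", 0.6454),
]
for model, accuracy in reported:
    print(f"{model}: ~{round(accuracy * n_cases)} of {n_cases} cases correct")
```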

The researchers also conducted a qualitative analysis of the models’ “chain-of-thought” outputs in incorrect cases. They identified three error patterns that appeared consistently across distilled models:[3]

  • Reasoning drift — The model starts reasoning correctly but gradually drifts toward an incorrect conclusion.
  • Red-flag recognition failure — The model fails to identify critical warning signs that would point to serious diagnoses.
  • Diagnostic priority inversion — The model ranks less likely diagnoses above more probable ones.

These aren’t minor glitches. In a real clinical setting, failing to recognize a red flag — say, symptoms of a rare but life-threatening condition — could have devastating consequences.

The study’s conclusion is unambiguous: distilled models should not be deployed for text-based diagnostic tasks without further validation on real patient data. The results simply don’t support it yet.[3]


Conclusion: AI in Healthcare — Full of Promise, Demanding of Caution

The three studies reviewed in this article paint a consistent picture: AI holds tremendous promise in healthcare, but that promise must be pursued with rigor, humility, and a relentless focus on the people at the center of it all — patients and clinicians.

Here are the key takeaways:

  • AI psychotherapists could meaningfully expand access to mental health care — but only if they are designed with deep attention to user needs, privacy, emotional intelligence, and the diversity of people living with depression.[1]
  • Explainable AI in critical care builds trust — but explanation alone isn’t enough. Accuracy, workflow integration, and alignment with clinical reasoning are equally important — and trust evolves over time.[2]
  • Large language models show diagnostic potential — but smaller, distilled versions carry real risks. The full DeepSeek-R1-671B model demonstrated high accuracy, while distilled models failed in concerning ways, particularly around rare conditions and red-flag recognition.[3]

As we move further into the AI era of medicine, one principle must anchor every innovation: technology should serve human beings — not the other way around.

That means listening to patients about what they need and fear. It means designing tools that clinicians can actually use and trust. And it means validating AI models rigorously before placing them in situations where mistakes cost lives.

AI won’t replace compassionate, skilled healthcare professionals anytime soon. But used wisely, it could amplify their impact in ways that change medicine — and millions of lives — for the better.

The future of AI in diagnostics and patient care is being written right now. Let’s make sure it’s a story worth telling.


References

  1. Xian C, Yan A, Wang Y, Tsang EYH, Huang L, Xu DJ. Applicable Scenarios, Desired Features, and Risks of AI Psychotherapists in Depression Treatment From the Patient’s Perspective: Exploratory Qualitative Study. JMIR Formative Research. 2026 May 1;10:e85138. doi: 10.2196/85138. PMID: 42066293.
  2. Rahman J, Delaforce A, Bradford D, Li J, Magrabi F, Cook D, Brankovic A. The Role of Explanations in AI-Generated Alerts: Qualitative Study of Clinical Views on Explainable AI in Predictive Tools. JMIR Human Factors. 2026 May 1;13:e81460. doi: 10.2196/81460. PMID: 42066251.
  3. Zhong W, Fu Y, Peng D, Liu Y, Liu Y, Yang K, Gao H, Yan H, Hao W, Yan Y, Yin C. Open-Source Large Language Models Distilled DeepSeek-R1 Pose Challenges for On-Premises Clinical Deployment in Medical Diagnosis: A Comparative Study of Performance. Journal of Medical Systems. 2026 May 1;50(1):68. doi: 10.1007/s10916-026-02390-5. PMID: 42062641.