Build This Paper

Estimated build cost: $9K - $13K over 6-10 weeks.

Founder's Pitch

"TrustMH-Bench provides a framework to evaluate and improve the trustworthiness of mental health large language models across key dimensions."

Benchmarking · Score: 6

Commercial Viability Breakdown (0-10 scale)

High Potential: 2.5 (1/4 signals)
Quick Build: 10 (4/4 signals)
Series A Potential: 5 (2/4 signals)
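All three scores above fit a simple linear mapping: score = 10 × (signals met / 4). This is an inference from the displayed values, not a formula the page documents, and the function name viability_score below is hypothetical. A minimal Python sketch of that assumed mapping:

```python
def viability_score(signals_met: int, signals_total: int = 4) -> float:
    """Scale the fraction of signals met onto a 0-10 range.

    Assumption: this linear mapping is inferred from the three displayed
    scores; the scoring service does not document its formula.
    """
    if not 0 <= signals_met <= signals_total:
        raise ValueError("signals_met must be between 0 and signals_total")
    return 10 * signals_met / signals_total


# Signals met per category, as shown in the breakdown.
breakdown = {"High Potential": 1, "Quick Build": 4, "Series A Potential": 2}
for name, met in breakdown.items():
    print(f"{name}: {viability_score(met)}")
```

Under this assumption each additional signal adds 2.5 points, which reproduces 2.5, 10, and 5 for the three categories.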

Sources used for this analysis

arXiv Paper: full-text PDF analysis of the research paper
GitHub Repository: code availability, stars, and contributor activity
Citation Network: Semantic Scholar citations and co-citation patterns
Community Predictions: crowd-sourced unicorn probability assessments

Analysis model: GPT-4o · Last scored: 3/3/2026