Enterprise AI Analysis: Interpretable depressive symptoms screening via statistical reasoning-augmented large language models using wearable and environmental data

Healthcare AI & Clinical Decision Support


Our latest research showcases a hybrid AI framework for depressive symptom screening that combines the interpretability of statistical models with the flexibility of large language models. The system uses wearable and environmental data, together with clinical features, to produce personalized, explainable classifications of depressive symptom status, reaching 90.3% accuracy on weekdays.

Executive Impact: At a Glance

Integrating statistical reasoning with generative AI yields 90.3% weekday accuracy and 94.5% weekday specificity for depressive symptom screening while keeping each prediction traceable to its predictors. This is a meaningful step toward personalized, data-driven healthcare, offering actionable insights for early intervention and improved patient outcomes.

0.787 Combined Model Accuracy (Weekdays)
94.5% Weekday Specificity (OR-based FT)
0.469 Incremental Risk Reclassification (NRI)

Deep Analysis & Enterprise Applications


Early and accurate assessment of depressive symptoms is crucial for effective mental healthcare. Traditional methods often rely on retrospective self-reporting, which can miss dynamic fluctuations. This study addresses these limitations by developing a hybrid clinical decision support system (CDSS) that integrates diverse data sources and advanced AI techniques.

The core innovation lies in combining interpretable logistic regression with fine-tuned large language models (LLMs). This allows for a robust classification of depressive symptom status using objective wearable-derived behavior, comprehensive clinical features, and relevant environmental exposures. The system aims to provide not just a classification, but also an understanding of why a particular assessment is made, enhancing trust and clinical utility.

Depression affects approximately 280 million individuals worldwide, significantly impairing well-being, social functioning, and physical health. Early detection is a public health imperative, particularly in countries such as South Korea, where suicide rates are high and depression is a major contributing factor. Current clinical assessments such as the PHQ-9 are crucial but often retrospective and episodic.

Wearable technologies offer continuous, unobtrusive monitoring, providing granular behavioral and physiological data that can enrich mental health assessments. Integrating this with environmental data (e.g., temperature, precipitation) and considering contextual factors like weekday/weekend routines is essential for accurate interpretation and predictive modeling. The challenge lies in developing flexible, interpretable models that can leverage this rich digital phenotyping data.
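To make the weekday/weekend distinction concrete, the following is a minimal sketch of how daily wearable records could be averaged separately for weekdays and weekend days. The function name, the dictionary layout, and the step counts are illustrative assumptions, not the study's actual preprocessing.

```python
from datetime import date
from statistics import mean


def split_weekday_weekend(daily_steps):
    """Average daily step counts separately for weekdays and weekend days.

    daily_steps: dict mapping datetime.date -> step count (hypothetical layout).
    Returns (weekday_mean, weekend_mean); a side with no days is None.
    """
    weekday, weekend = [], []
    for day, steps in daily_steps.items():
        # date.weekday(): Monday is 0, Sunday is 6, so 5 and 6 are the weekend
        if day.weekday() >= 5:
            weekend.append(steps)
        else:
            weekday.append(steps)
    return (
        mean(weekday) if weekday else None,
        mean(weekend) if weekend else None,
    )


# Seven made-up days of step counts; 2024-01-01 is a Monday.
made_up_counts = [8000, 7500, 8200, 7900, 8400, 4000, 3000]
week = {date(2024, 1, d): s for d, s in zip(range(1, 8), made_up_counts)}
weekday_mean, weekend_mean = split_weekday_weekend(week)
```

Keeping the two averages separate lets a downstream model be fitted and evaluated per context, which mirrors how the results in this analysis are reported for weekdays and weekends.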

The study found that 13.0% of participants had elevated depressive symptoms. Individuals with higher depressive symptoms were younger, had higher body weight and resting pulse rate, and were more likely to be unmarried, less educated, unemployed, and living alone. Lifestyle factors like smoking and alcohol consumption were also more prevalent. Wearable data showed that those with higher symptoms had significantly fewer non-daylight steps and lower vigorous physical activity, especially on weekdays.

The OR-based fine-tuned LLM model significantly outperformed traditional nomograms and label-based LLM fine-tuning. It achieved 90.3% accuracy and 94.5% specificity on weekdays, and 88.8% accuracy and 96.4% specificity on weekends. Wearable-derived features provided statistically significant incremental predictive value beyond demographic and clinical variables, with larger gains on weekdays. Inference latency was low (around 3.8 seconds) with compact outputs.
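Incremental predictive value of this kind is often summarized with a net reclassification index (NRI). The sketch below computes a category-free (continuous) NRI from predicted probabilities of a baseline model and an extended model. This summary does not state which NRI variant the study used, so the choice of the continuous form, the function name, and the example numbers are assumptions for illustration only.

```python
def continuous_nri(y_true, p_base, p_new):
    """Category-free NRI comparing an extended model against a baseline.

    y_true: 0/1 outcomes; p_base, p_new: predicted probabilities per person.
    NRI = [P(up | event) - P(down | event)]
        + [P(down | non-event) - P(up | non-event)]
    where "up" means the extended model assigns a higher probability.
    """
    ev_up = ev_down = ne_up = ne_down = 0
    n_events = n_nonevents = 0
    for y, pb, pn in zip(y_true, p_base, p_new):
        if y == 1:
            n_events += 1
            ev_up += pn > pb
            ev_down += pn < pb
        else:
            n_nonevents += 1
            ne_up += pn > pb
            ne_down += pn < pb
    if n_events == 0 or n_nonevents == 0:
        raise ValueError("need at least one event and one non-event")
    return (ev_up - ev_down) / n_events + (ne_down - ne_up) / n_nonevents


# Made-up example: two events followed by four non-events.
y = [1, 1, 0, 0, 0, 0]
base = [0.20, 0.40, 0.30, 0.10, 0.20, 0.50]  # e.g. demographic + clinical only
extended = [0.50, 0.30, 0.20, 0.05, 0.30, 0.40]  # e.g. plus wearable features
nri = continuous_nri(y, base, extended)
```

A positive value means the extended model moves events up and non-events down more often than the reverse. The value ranges from -2 to 2.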

The study acknowledges several limitations, including the PHQ-9 cutoff not being equivalent to a clinician-adjudicated diagnosis, the single-center cohort (Chonnam National University Hospital) with a higher-than-general-population prevalence of depression, and the reliance on GPT-4o for fine-tuning due to infrastructure constraints. The temporal alignment of PHQ-9 (14-day recall) with 7-day actigraphy data means the models provide near-concurrent screening rather than prospective prediction.

Future work should involve prospective external validation in diverse community-based cohorts to confirm generalizability. Evaluating fine-tuning across multiple LLM families and assessing the impact of alternative PHQ-9 thresholds or continuous outcome modeling are also important. Further research should explore alternative feature engineering strategies, longer monitoring durations, and rigorous evaluation of the chatbot interface with clinicians and patients to ensure functional interpretability and usability in real-world clinical workflows.

90.3% Weekday Accuracy for Depressive Symptom Screening with OR-based LLM

Enterprise Process Flow

Wearable Data, Clinical & Environmental Inputs
Multivariable Logistic Regression (ORs)
Zero-shot LLM Selection (GPT-4o)
OR-based Fine-tuning
Depression Status & Explainable Rationale
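The hand-off from the logistic regression step to the LLM step can be sketched as follows: regression coefficients are converted to odds ratios (OR = exp(beta)) and rendered as text that a fine-tuning example or prompt could carry. The prompt wording, the predictor names, and the coefficient values are hypothetical; the study's actual prompt template is not given in this summary.

```python
import math


def build_or_prompt(coefficients, patient):
    """Render odds ratios and a patient's indicator values as prompt text.

    coefficients: dict of predictor name -> log-odds coefficient (beta).
    patient: dict of predictor name -> 0/1 indicator for this patient.
    The output format is an illustrative assumption, not the study's template.
    """
    lines = []
    for name, beta in coefficients.items():
        status = "present" if patient.get(name) else "absent"
        # An odds ratio is the exponential of the logistic regression coefficient
        lines.append(f"- {name}: OR = {math.exp(beta):.2f} ({status})")
    return (
        "Statistical evidence from multivariable logistic regression:\n"
        + "\n".join(lines)
        + "\nClassify depressive symptom status and explain the reasoning."
    )


# Made-up coefficients chosen so the odds ratios are round numbers.
example_coefficients = {
    "low_morning_steps": math.log(2.0),
    "moderate_vigorous_activity": math.log(0.5),
}
example_patient = {"low_morning_steps": 1, "moderate_vigorous_activity": 0}
prompt = build_or_prompt(example_coefficients, example_patient)
```

Passing the odds ratios explicitly is what keeps each predictor traceable in the model's explanation, in contrast to fine-tuning on outcome labels alone.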

Model Performance Comparison (Weekdays)

Model Type: OR-based Fine-tuned LLM
Weekday Accuracy: 90.3%
Key Strengths:
  • Highest accuracy & specificity
  • Preserves predictor traceability
  • Generates natural language explanations
  • Robust performance on weekdays & weekends
Limitations/Considerations:
  • Requires careful prompt engineering
  • Not yet externally validated

Model Type: Label-based Fine-tuned LLM
Weekday Accuracy: 87.0%
Key Strengths:
  • Leverages LLM flexibility
  • Simpler fine-tuning approach
Limitations/Considerations:
  • Lower accuracy & specificity than OR-based
  • Can suffer from majority-class collapse
  • Less transparent reasoning

Model Type: Nomogram Model
Weekday Accuracy: 71.7%
Key Strengths:
  • Visually intuitive risk estimates
  • Established statistical interpretability
  • Good for identifying significant predictors
Limitations/Considerations:
  • Lower predictive performance than LLMs
  • Relies on predefined linear relationships
  • Less adaptable to complex data patterns
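Accuracy alone can hide majority-class collapse, which is why specificity and sensitivity are worth reading alongside it. The sketch below computes all three from binary labels and then applies them to a hypothetical classifier that always predicts "no elevated symptoms" at the 13.0% prevalence reported for the cohort. The 1,000-person population is made up, and the result is plain arithmetic on that prevalence, not a finding from the study.

```python
def screening_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for binary screening labels.

    y_true, y_pred: sequences of 0/1, where 1 means elevated symptoms.
    A metric whose denominator is zero is returned as None.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else None,
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        "specificity": tn / (tn + fp) if (tn + fp) else None,
    }


# Hypothetical population: 1,000 people, 130 with elevated symptoms (13.0%).
labels = [1] * 130 + [0] * 870
always_negative = [0] * 1000  # a collapsed classifier that never flags anyone
collapsed = screening_metrics(labels, always_negative)
```

In this constructed case the collapsed classifier scores 87% accuracy and perfect specificity while detecting no one with elevated symptoms, so a screening model should be judged on sensitivity as well.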

Case Study: Explainable Chatbot for Depression Screening

The OR-based fine-tuned LLM is deployed as an interactive chatbot, providing not just a classification but also a detailed explanation of its reasoning, aligning with clinical best practices.

Scenario: A patient's data show lower-than-average morning steps and an elevated resting pulse rate, but also a moderate level of vigorous physical activity.

Analysis: The system identifies insufficient morning steps (<3,129 steps/day) and an elevated resting pulse rate (≥90 bpm) as risk factors for higher depressive symptoms. At the same time, it notes a moderate level of vigorous physical activity (8.5-25.5 min/day) as a partially protective factor. The model combines these statistical insights with pre-trained clinical knowledge to provide a nuanced assessment.

Outcome: The chatbot classifies the patient as having 'higher depressive symptoms' and provides personalized recommendations, such as 'Increase daily vigorous-intensity physical activity to consistently exceed 25 minutes' and 'Encourage increased morning steps beyond the 3129-step threshold.' This transparent approach supports informed decision-making and patient engagement.
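The thresholds quoted in this case study can be expressed as simple rules. The sketch below flags risk and protective factors for one patient using only the cut-points stated above. The function name, the return structure, and the decision to treat both ends of the 8.5-25.5 min/day range as inclusive are assumptions, and the sketch does not reproduce how the fine-tuned LLM weighs these factors.

```python
def flag_factors(morning_steps, resting_pulse_bpm, vigorous_min_per_day):
    """Flag factors using the thresholds quoted in the case study.

    Returns a dict with "risk" and "protective" lists of factor names.
    Inclusive bounds on the activity range are an assumption.
    """
    risk, protective = [], []
    if morning_steps < 3129:  # insufficient morning steps (<3,129 steps/day)
        risk.append("insufficient morning steps")
    if resting_pulse_bpm >= 90:  # elevated resting pulse rate (>=90 bpm)
        risk.append("elevated resting pulse rate")
    if 8.5 <= vigorous_min_per_day <= 25.5:  # partially protective range
        protective.append("moderate vigorous physical activity")
    return {"risk": risk, "protective": protective}


# Made-up values matching the scenario described above.
flags = flag_factors(morning_steps=2500, resting_pulse_bpm=92, vigorous_min_per_day=15)
```

Flags like these could feed the explanation text, while the final classification would still come from the model rather than from a rule count.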

Your AI Implementation Roadmap

Our structured approach ensures a seamless integration of advanced AI into your existing workflows, delivering measurable results and a competitive edge.

Phase 01: Discovery & Strategy

In-depth analysis of your current operations, data infrastructure, and business objectives to define a tailored AI strategy and identify key opportunities for impact.

Phase 02: Solution Design & Prototyping

Development of custom AI models and system architecture, followed by rapid prototyping and iterative refinement to ensure alignment with your specific needs.

Phase 03: Integration & Deployment

Seamless integration of the AI solution into your existing technology stack, comprehensive testing, and phased deployment to minimize disruption and maximize adoption.

Phase 04: Optimization & Scaling

Continuous monitoring, performance optimization, and scalable expansion of your AI capabilities to new use cases and departments, ensuring sustained value.

Ready to Transform Your Enterprise with AI?

Partner with Own Your AI to unlock the full potential of artificial intelligence. Our experts are ready to guide you through every step of your AI journey, from strategy to sustainable impact.
