Enterprise AI Analysis
Leveraging Large Language Models for Explainable Activity Recognition in Smart Homes: A Critical Evaluation
Explainable Artificial Intelligence (XAI) aims to uncover the inner reasoning of machine learning models. In IoT systems, XAI improves the transparency of models processing sensor data from multiple heterogeneous devices, ensuring end-users understand and trust their outputs. Among the many applications, XAI has also been applied to sensor-based Activities of Daily Living (ADL) recognition in smart homes. Existing approaches highlight which sensor events are most important for each predicted activity, using simple rules to convert these events into natural language explanations for non-expert users. However, these methods produce rigid explanations lacking natural language flexibility and are not scalable. With the recent rise of Large Language Models (LLMs), it is worth exploring whether they can enhance explanation generation, considering their proven knowledge of human activities. This paper investigates potential approaches to combine XAI and LLMs for sensor-based ADL recognition. We evaluate if LLMs can be used: a) as explainable zero-shot ADL recognition models, avoiding costly labeled data collection, and b) to automate the generation of explanations for existing data-driven XAI approaches when training data is available and the goal is higher recognition rates. Our critical evaluation provides insights into the benefits and challenges of using LLMs for explainable ADL recognition.
Executive Summary: LLMs for Explainable HAR in Smart Homes
This research explores the integration of Large Language Models (LLMs) with Explainable Artificial Intelligence (XAI) for Human Activity Recognition (HAR) in smart homes. The authors propose and evaluate two novel LLM-based methods: LLMe2e, for zero-shot ADL recognition with explanations, and LLMExplainer, for generating natural language explanations from data-driven eXplainable Activity Recognition (XAR) models. Key findings indicate that LLMe2e achieves reasonable recognition rates without training data and produces explanations that users appreciate, while LLMExplainer significantly enhances explanation quality for existing XAR systems. However, the study also critically evaluates drawbacks such as over-reliance, hallucinations, limitations in PIR-dominated environments, and significant financial, privacy, and scalability concerns associated with LLM deployment.
Key Takeaways:
- LLMs can effectively generate human-readable explanations for HAR.
- Zero-shot ADL recognition (LLMe2e) offers acceptable performance without labeled data.
- LLMExplainer improves explanation quality for data-driven XAR models.
- Over-reliance on LLM-generated explanations is a significant risk due to plausibility without factual accuracy.
- PIR-sensor dominated environments are challenging for LLM-based HAR.
- Deployment challenges include cost, privacy, and scalability of large LLMs.
Deep Analysis & Enterprise Applications
Select a topic to dive deeper, then explore the specific findings from the research, rebuilt as interactive, enterprise-focused modules.
LLM-based Methods for Explainable ADL Recognition
This section presents two novel approaches exploring how LLMs can be adopted for XAI in sensor-based ADL recognition. LLMe2e, a zero-shot method, directly uses an LLM for both ADL classification and natural language explanation generation. LLMExplainer is an LLM-based approach to generate natural language explanations from the most important events derived by data-driven XAR methods, being agnostic to the underlying XAR model.
LLMe2e adopts an end-to-end approach, converting raw sensor data into a structured JSON representation (Fig. 2) and using a single LLM prompt with 'role prompting' and 'Chain of Thought' strategies (Fig. 3, 4). This method requires no training data, leveraging LLMs' intrinsic knowledge. LLMExplainer (Fig. 6) takes the predicted activity and 'most important features' (also in JSON format) from any data-driven XAR classifier (e.g., DeXAR output in Fig. 10, converted to Fig. 11) to generate user-friendly explanations (Fig. 9).
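As a concrete illustration, the sketch below shows how the two prompting pipelines could be assembled around a generic chat-style LLM API. The JSON schema, function names, sensor states, and prompt wording are assumptions made for illustration only; the paper's actual representations and prompts are those shown in its figures.

```python
import json

# Illustrative sensor window; field names and states are assumptions, not the paper's exact schema.
window = {
    "time_window": {"start": "08:15:00", "end": "08:15:16"},
    "detected_states": [
        {"state": "presence in the kitchen", "start": "08:15:00", "end": "08:15:16"},
        {"state": "fridge door open", "start": "08:15:04", "end": "08:15:09"},
    ],
}

def build_llme2e_prompt(window: dict, candidate_activities: list[str]) -> str:
    """Zero-shot prompt combining role prompting and Chain-of-Thought reasoning (LLMe2e-style)."""
    return (
        "You are an expert in smart-home activity recognition.\n"               # role prompting
        f"Candidate activities: {', '.join(candidate_activities)}.\n"
        f"Sensor events in the current time window (JSON): {json.dumps(window)}\n"
        "Reason step by step about these events, then output the most likely "  # Chain of Thought
        "activity and a short explanation understandable by a non-expert user."
    )

def build_explainer_prompt(predicted_activity: str, important_features: dict) -> str:
    """LLMExplainer-style prompt: explain a prediction made by any data-driven XAR model."""
    return (
        "You are an assistant that explains smart-home AI decisions to non-expert users.\n"
        f"The activity recognition model predicted the activity '{predicted_activity}'.\n"
        f"The most important sensor events for this prediction (JSON): {json.dumps(important_features)}\n"
        "Write a brief, user-friendly explanation of why this activity was predicted."
    )
```

Keeping the JSON serialization separate from the prompt templates reflects why LLMExplainer can remain agnostic to the underlying XAR model: any classifier able to emit its prediction and most important events in a structured form can reuse the same explanation pipeline.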
Key Findings:
- LLMe2e achieves acceptable recognition rates (F1-score of 0.80 on MARBLE and 0.77 on UCI ADL Home B) without requiring any training data.
- LLMe2e is slightly less accurate than the supervised DeXAR (whose F1-score is about 6% higher on MARBLE) but shows comparable results on some UCI ADL activities.
- LLM-based approaches (LLMe2e and LLMExplainer) produce explanations that users appreciate more than those of state-of-the-art heuristic methods, with LLMExplainer rated highest.
- LLMExplainer generates more convincing wording and includes possible relationships between sensor states and activities, even when leveraging the same relevant sensor data as DeXAR.
- LLMe2e can capture activities poorly represented in training data (e.g., Snacking in UCI ADL Home A) due to its zero-shot nature.
Enterprise Application Areas:
- Zero-shot Human Activity Recognition (HAR) in smart homes without labeled data.
- Automated generation of flexible and nuanced natural language explanations for XAI models.
- Enhancing user trust and understanding of AI decisions in pervasive computing.
- High-impact healthcare applications for early detection and continuous monitoring of cognitive decline.
LLMe2e Process Flow for Zero-Shot Explainable ADL Recognition
Drawbacks and Risks of Using LLMs for Explainable AI
The paper critically evaluates potential issues with LLM adoption for explainable HAR, including over-reliance on explanations, hallucinations, limitations in PIR-dominated environments, and significant financial, privacy, and scalability concerns. It also proposes mitigation strategies for these challenges.
Over-reliance is a key risk as LLMs can produce linguistically plausible but factually inaccurate explanations, leading to undue user trust, particularly in misclassification cases (Table 5). Hallucinations may result in incorrect classifications or speculative reasoning in explanations. LLM-based methods struggle with PIR-sensor dominated environments due to limited semantic information (Fig. 16). Financial costs for continuous cloud LLM usage (e.g., $230/day for GPT-4o) are substantial. Privacy issues arise from transmitting sensitive personal data to third-party LLM providers. Scalability is challenged by API limits, latency, and hardware demands for open-weight LLMs in large deployments.
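The reported daily cost can be sanity-checked with simple arithmetic, assuming continuous 24/7 monitoring and the per-window price cited in the findings below:

```python
window_s = 16                               # window length used in the evaluation
overlap = 0.80                              # 80% overlap between consecutive windows
stride_s = window_s * (1 - overlap)         # 3.2 s between window starts
windows_per_day = 24 * 3600 / stride_s      # 27,000 windows per day
cost_per_window = 0.0085                    # approx. USD per GPT-4o query, as reported
print(windows_per_day * cost_per_window)    # ~229.5 USD/day, consistent with the ~$230 figure
```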
Key Findings:
- Over-reliance: LLMs can generate convincing but inaccurate explanations for wrong predictions, leading to excessive user trust.
- Hallucinations: LLMs may invent correlations or introduce speculative reasoning, creating linguistically coherent but not factually accurate explanations.
- PIR Sensor Limitations: LLMe2e struggles with low-semantic data from PIR-only sensors, making it unsuitable for such environments.
- Financial Cost: Continuous querying of cloud-based LLMs like GPT-4o is very expensive (approx. $0.0085 per window, about $230 per day).
- Privacy Issues: Outsourcing sensitive personal activity data to third-party LLM services exposes users to potential privacy risks.
- Scalability: Third-party LLM services have usage limits, and local open-weights deployments require powerful, costly, and energy-intensive servers.
Enterprise Application Areas:
- Risk management frameworks for AI deployment in sensitive domains.
- Design of privacy-preserving AI architectures for smart homes.
- Cost-benefit analysis for LLM integration in HAR systems.
- Development of robust validation strategies for LLM-generated explanations.
| Risk Category | Description/Impact | Proposed Mitigation Strategies |
|---|---|---|
| Over-reliance | LLMs can generate plausible but inaccurate explanations, fostering excessive user trust, especially for misclassifications. | |
| Hallucinations | LLMs may invent correlations between sensor data or introduce speculative reasoning, creating linguistically coherent but not factually accurate explanations. | |
| PIR Sensor Limitation | LLM-based methods struggle with low-semantic data from PIR-only sensors, as common-sense knowledge is insufficient to infer ADLs. | |
| Financial Cost | Continuous querying of cloud-based LLMs like GPT-4o is very expensive (approx. $230/day for 16s windows with 80% overlap). | |
| Privacy Issues | Transmitting sensitive personal data (human activities) to untrusted third-party LLM providers exposes users to potential privacy risks. | |
| Scalability | API rate limits, latency, and hardware requirements for large-scale deployments with numerous sensing devices and subjects. | |
Experimental Evaluation and User Perceptions
The paper conducts extensive experiments on two public smart home datasets (UCI ADL and MARBLE) to evaluate the recognition rate of LLMe2e against baselines (DeXAR, ADL-LLM) and the quality of explanations from LLMe2e and LLMExplainer through user surveys.
The evaluation uses weighted F1-scores for recognition rate and user surveys (247 participants from Amazon Mechanical Turk) for explanation quality, rated on a Likert scale. LLMe2e, a zero-shot model, is compared to DeXAR (supervised, using 70% training data) and ADL-LLM (zero-shot, sentence-based input). LLMExplainer's explanations are generated from DeXAR's important features. Results show LLMe2e's competitive recognition rates without training and higher user appreciation for LLM-generated explanations compared to heuristic ones.
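For reference, the recognition metric is the weighted F1-score, which can be computed as in the minimal sketch below; the labels are made up for illustration, while real evaluations use the dataset annotations.

```python
from sklearn.metrics import f1_score

# Toy ground-truth and predicted ADL labels (illustrative only).
y_true = ["cooking", "cooking", "phone call", "eating", "eating", "eating"]
y_pred = ["cooking", "eating",  "phone call", "eating", "eating", "cooking"]

# 'weighted' averages the per-class F1 scores weighted by class support,
# which accounts for the class imbalance typical of ADL datasets.
print(f1_score(y_true, y_pred, average="weighted"))
```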
Key Findings:
- LLMe2e achieves an F1-score of 0.80 on MARBLE and 0.77 on UCI ADL Home B, demonstrating acceptable zero-shot recognition rates.
- DeXAR (supervised) is 6% more accurate than LLMe2e on MARBLE, but LLMe2e's performance is comparable on UCI ADL.
- LLM-generated explanations (LLMe2e and LLMExplainer) are significantly more appreciated by users than DeXAR's heuristic explanations.
- LLMExplainer, even using the same input features as DeXAR, provides more convincing and detailed explanations.
- LLMe2e sometimes misclassifies similar activities (e.g., 'setting up table' as 'clearing table') due to over-reliance on common-sense similarity.
- LLMe2e can identify activities poorly represented in training data (e.g., 'Snacking' in UCI ADL Home A) due to its zero-shot nature.
Enterprise Application Areas:
- Benchmarking zero-shot HAR models against supervised baselines.
- Quantitative and qualitative evaluation of XAI methods.
- Understanding user perception and trust in AI explanations.
- Guiding the development of more human-centered AI systems in smart homes.
Case Study: Explaining 'Phone Call' Activity with LLMe2e
This case study, derived from Figure 5, illustrates LLMe2e's ability to process semantic sensor events and generate a coherent, natural language explanation for the 'phone call' activity. It highlights the step-by-step reasoning (Chain of Thought) and integration of temporal and contextual correlations.
Example Scenario:
Time window: 11:21:23 - 11:21:39
Detected States:
- Dynamic hand movements: 11:21:31 - 11:21:36
- Incoming phone call: 11:21:23 - 11:21:36
- Presence in the medicines area: 11:21:23 - 11:21:29, 11:21:35 - 11:21:39
- Presence in the hall: 11:21:28 - 11:21:36
LLMe2e Reasoning:
1. Dynamic hand movements detected (11:21:31 - 11:21:36).
2. Incoming phone call detected (11:21:23 - 11:21:36).
3. Presence in the medicines area (11:21:23 - 11:21:29, 11:21:35 - 11:21:39) and in the hall (11:21:28 - 11:21:36) indicates movement.
4. The temporal and contextual correlation of these events suggests the subject was answering a phone call while moving around.
LLMe2e Explanation: I predicted the activity phone call mainly because the subject received a phone call and was moving around during the call, which is typical behavior for someone engaged in a phone conversation.
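For concreteness, the detected states above could be serialized into a structured window like the one sketched earlier before being passed to the LLM; the schema below is an assumption, and only the state names and timestamps come from the case study.

```python
import json

# Hypothetical serialization of the 'phone call' time window (schema is illustrative).
phone_call_window = {
    "time_window": {"start": "11:21:23", "end": "11:21:39"},
    "detected_states": [
        {"state": "dynamic hand movements", "intervals": [["11:21:31", "11:21:36"]]},
        {"state": "incoming phone call", "intervals": [["11:21:23", "11:21:36"]]},
        {"state": "presence in the medicines area",
         "intervals": [["11:21:23", "11:21:29"], ["11:21:35", "11:21:39"]]},
        {"state": "presence in the hall", "intervals": [["11:21:28", "11:21:36"]]},
    ],
}

print(json.dumps(phone_call_window, indent=2))  # this JSON string would be embedded in the prompt
```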
Key Insights:
- LLMe2e effectively processes multiple concurrent sensor states and their time intervals.
- It applies Chain of Thought reasoning to infer correlations and contextual information.
- The explanation is framed in user-friendly language, avoiding technical sensor details.
- Demonstrates LLMe2e's ability to generate relevant and understandable explanations for complex activities.
Quantify Your AI Efficiency Gains
Estimate the potential annual cost savings and hours reclaimed by integrating advanced AI solutions for activity recognition and explanation in your enterprise.
Your Strategic AI Implementation Roadmap
A phased approach to integrating explainable LLMs for robust and ethical smart home activity recognition.
Phase 1: Discovery & Strategy Alignment
We begin with a comprehensive audit of your current smart home infrastructure and operational workflows. This phase focuses on identifying high-impact ADL recognition opportunities and defining clear, measurable objectives for AI integration. We'll align on target activities, desired explanation depth, and key performance indicators.
Phase 2: Data Integration & Model Prototyping
In this phase, we establish secure data pipelines from your diverse sensor ecosystem. We'll then prototype LLMe2e for zero-shot recognition or integrate LLMExplainer with existing XAR models, focusing on your prioritized use cases. Initial explanation fidelity and recognition accuracy will be benchmarked.
Phase 3: Customization & User Validation
Here, we fine-tune LLM prompts, incorporate user-specific context (e.g., habitual routines, cultural norms), and iterate on explanation generation to meet specific user needs. User surveys and feedback loops are critical to ensure explanations are trusted, understandable, and actionable for non-expert users, mitigating over-reliance risks.
Phase 4: Scalable Deployment & Continuous Optimization
The solution is deployed into your production environment, ensuring robust performance, privacy compliance, and cost efficiency. We implement monitoring for hallucinations and over-reliance, along with strategies for long-term adaptation and concept drift detection, ensuring the system remains aligned with evolving user behaviors and home configurations.
Ready to Transform Your Enterprise with Explainable AI in Smart Homes?
Our experts are ready to guide you through the complexities of integrating LLMs for advanced Human Activity Recognition and explanation. Book a session to discuss your unique challenges and how our tailored solutions can drive significant operational efficiency and user trust.