Enterprise AI Analysis: Telling Speculative Stories to Help Humans Imagine the Harms of Healthcare AI

Future-Proofing Healthcare AI

Leveraging Speculative Storytelling for Proactive Ethical Design

Artificial intelligence is rapidly transforming healthcare, but rapid development brings risks of bias, privacy violations, and unequal access. This research introduces a human-centered framework that uses speculative storytelling to help humans imagine the potential benefits and harms of healthcare AI before deployment. Our findings show that this approach significantly enhances ethical foresight and fosters more creative thinking about AI's impact on users, moving safety evaluation from reactive to proactive.

Quantifiable Impact: Enhancing Ethical AI Development

Our innovative storytelling framework demonstrates tangible improvements in identifying and understanding AI risks and benefits.


Deep Analysis & Enterprise Applications

Select a topic to dive deeper, then explore the specific findings from the research, rebuilt as interactive, enterprise-focused modules.

Framework & Process
Performance & Impact
Qualitative Insights
Future Directions

Our Human-Centered Storytelling Framework

Our methodology involves a three-step process to generate context-sensitive user stories and support multi-agent discussions for ethical foresight.

Mapping AI Concepts to Use-Case Scenarios
Simulating Role-Playing & Environment Trajectories
Rephrasing Simulation Logs into Stories
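The three-step process above can be sketched as a simple pipeline. This is a hypothetical illustration: the function names and the templated text stand in for what would be LLM calls in a real system, and none of it is the authors' actual implementation.

```python
# Hypothetical sketch of the three-step story-generation pipeline.
# In practice, each step would prompt an LLM; here plain string
# templates stand in so the control flow is visible.

def map_concept_to_scenario(ai_concept: str, user_context: str) -> str:
    """Step 1: ground an abstract AI concept in a concrete use-case scenario."""
    return (f"An AI system for {ai_concept} is used by {user_context}. "
            f"Describe the setting, the user's goal, and the system's role.")

def simulate_trajectory(scenario: str, personas: list[str], steps: int = 3) -> list[str]:
    """Step 2: role-play personas against the scenario, logging each turn."""
    log = []
    for step in range(steps):
        for persona in personas:
            log.append(f"[turn {step}] {persona}: reacts to '{scenario[:40]}...'")
    return log

def rephrase_as_story(log: list[str]) -> str:
    """Step 3: rewrite the raw simulation log as a readable narrative."""
    return "Story: " + " Then ".join(entry.split(": ", 1)[1] for entry in log)

scenario = map_concept_to_scenario("symptom triage", "a rural patient")
log = simulate_trajectory(scenario, ["patient", "AI assistant"], steps=2)
print(rephrase_as_story(log))
```

Separating the simulation log (step 2) from the narrative rewrite (step 3) is what lets the ablations below ("w/o Env. Trajectories", "w/o Role-Playing") disable one stage at a time.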
| Story Type | Creativity | Coherence | Engagement | Relevance | Likelihood | Overall (Avg) |
| --- | --- | --- | --- | --- | --- | --- |
| Baseline (Gemma) | 65.25% | 68.30% | 80.15% | 71.20% | 78.90% | 72.76% |
| Storytelling (ours, Gemma) | 89.45% | 92.15% | 92.75% | 85.65% | 96.05% | 91.21% |
| Baseline (Llama3) | 59.25% | 71.55% | 76.15% | 71.60% | 70.00% | 69.71% |
| Storytelling (ours, Llama3) | 79.50% | 94.75% | 89.45% | 85.65% | 96.85% | 89.24% |
| w/o Env. Trajectories (Gemma) | 55.30% | 74.35% | 78.80% | 73.45% | 85.50% | 73.48% |
| w/o Role-Playing (Gemma) | 79.45% | 86.80% | 83.95% | 83.15% | 91.05% | 84.88% |

Table: Overall results of different models and methods. Storytelling (ours) outperforms its baseline across all metrics for both models. Values denote win rates (%). (Adapted from Table 1)
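The table's values are win rates from pairwise LLM-as-a-judge comparisons. A minimal sketch of how such a rate is tallied (the judgments below are illustrative, not the paper's data):

```python
from collections import Counter

def win_rate(judgments: list[str], method: str) -> float:
    """Percentage of pairwise comparisons won by `method`.
    `judgments` holds the name of the winning method for each comparison."""
    counts = Counter(judgments)
    return 100.0 * counts[method] / len(judgments)

# Illustrative judge outcomes over 10 story pairs.
judgments = ["storytelling"] * 9 + ["baseline"] * 1
print(f"{win_rate(judgments, 'storytelling'):.1f}%")  # → 90.0%
```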

88% Human Preference for Storytelling (Llama3)

Human evaluators showed a strong preference for our Storytelling method (88% for Llama3) over baselines, aligning with the LLM-as-a-judge results and citing the stories' readability and engagement.

59% Increase in Harm Diversity Identified

Storytelling significantly broadened participants' recognition of potential harms, with a 59% increase in Shannon entropy (from 2.329 to 3.701) compared to the control group, spanning a wider range of 17 harm types. Less obvious, context-dependent harms appeared only in the STORY condition.

60.7% Increase in Benefit Diversity Identified

Our method also fostered a 60.7% increase in the diversity of recognized benefits (Shannon entropy from 2.407 to 3.868), surfacing less salient benefits like accessibility support, clinician workload relief, and transparency.
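The diversity figures above are relative increases in Shannon entropy over the distribution of harm or benefit types participants named. A minimal sketch of that computation (the label counts below are illustrative, not the study's data):

```python
import math
from collections import Counter

def shannon_entropy(labels: list[str]) -> float:
    """Shannon entropy (in bits) of the empirical label distribution."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Illustrative: the control group names few harm types repeatedly,
# while the story condition spreads responses over many types.
control = ["bias"] * 6 + ["privacy"] * 4
story = ["bias", "privacy", "overreliance", "cultural context",
         "emotional harm", "access inequity", "misdiagnosis", "data misuse"]

increase = 100 * (shannon_entropy(story) - shannon_entropy(control)) / shannon_entropy(control)
print(f"{increase:.1f}% relative increase in diversity")
```

Higher entropy means responses are spread over more types more evenly, which is why it serves as a diversity measure here.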

Qualitative Insights: Deeper Ethical Reflection

"The story provides a concrete example of how AI can be harmful."

— P7, User Study Participant

Participants consistently reported that narrative scenarios fostered deeper ethical and contextual reflection. Storytelling helped them articulate risks that were otherwise difficult to express, surfacing overlooked issues such as 'the lack of cultural context' (P6) and emotional harms like 'masking of feelings' (P3). Participants found the approach engaging and accessible, letting them focus on ethical reflection rather than technical complexity.

Control group participants often produced abstract harms (e.g., 'using facial expression to determine who will not default the agreement'), while storytelling participants anchored harms in individual contexts (e.g., 'diagnosis should be different for different peoples' as they 'might be having some allergy that could later be severe for their health').

Limitations and Future Directions

This study focused on consumer health and did not include regulated domains. Scenarios were synthetic, enabling early ethical exploration but not substituting for analysis of deployed systems. The user study was small, with mostly technically inclined participants, and measured short-term reflection rather than long-term impact. Simulated expert discussions used predefined personas, which enables rapid iteration but may not capture the full range of real stakeholder perspectives.

Future work will evaluate the framework across diverse domains, run larger and more diverse human studies (including clinicians and patients), and use multiple evaluation models beyond LLM-as-a-judge.

Calculate Your Potential AI ROI

Estimate the time and cost savings AI can bring to your enterprise operations.
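A back-of-envelope model behind such a calculator might look like the sketch below. Every parameter and rate here is a placeholder for illustration, not a benchmark or a guarantee of savings.

```python
def estimate_roi(tasks_per_week: int, minutes_per_task: float,
                 automation_rate: float, hourly_cost: float) -> tuple[float, float]:
    """Rough estimate of hours reclaimed and dollars saved per year.
    `automation_rate` is the assumed fraction of task time AI offloads (0..1)."""
    hours_per_year = tasks_per_week * minutes_per_task / 60 * 52
    reclaimed = hours_per_year * automation_rate
    return reclaimed, reclaimed * hourly_cost

# Placeholder inputs: 200 tasks/week at 15 min each, 40% offloaded, $60/hour.
hours, savings = estimate_roi(tasks_per_week=200, minutes_per_task=15,
                              automation_rate=0.4, hourly_cost=60.0)
print(f"{hours:.0f} h/year reclaimed, ${savings:,.0f} saved")
```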


Your AI Implementation Roadmap

A typical timeline for integrating advanced AI solutions into your enterprise.

Phase 1: Discovery & Strategy

Comprehensive analysis of your current workflows, identifying key AI opportunities and defining success metrics.

Phase 2: Pilot & Proof of Concept

Developing and deploying a targeted AI pilot, demonstrating tangible value and refining the solution based on initial results.

Phase 3: Full-Scale Integration

Seamlessly integrating AI across relevant departments, ensuring scalability, security, and user adoption.

Phase 4: Optimization & Future-Proofing

Continuous monitoring, performance tuning, and exploring new AI advancements to maintain a competitive edge.

Ready to Transform Your Enterprise with Ethical AI?

Book a personalized strategy session with our AI experts to explore tailored solutions for your business.
