Enterprise AI Analysis
High-Risk Memories? Comparative audit of the representation of Second World War atrocities in Ukraine by generative AI applications
This paper investigates how generative AI (genAI) applications represent and potentially misrepresent high-risk memories, specifically Second World War atrocities in Ukraine. It audits three common genAI applications for historical misrepresentation, including hallucinations and inconsistent moralization, across different languages and atrocity types. The findings highlight significant inaccuracies and ethical concerns, especially for less-known memories and lower-resource languages.
Executive Impact: Key Findings for Enterprise Leaders
Generative AI models demonstrate limited and inconsistent accuracy in representing high-risk historical memories, particularly WWII atrocities in Ukraine. A significant portion of responses contains factual inaccuracies and hallucinations, especially in lower-resource languages. While moralizing statements are present, their inclusion is inconsistent across applications, languages, and specific historical episodes, undermining genAI's perceived authority and creating a skewed moral hierarchy. This poses substantial risks for historical misrepresentation and instrumentalization.
Deep Analysis & Enterprise Applications
Select a topic to dive deeper, then explore the specific findings from the research, rebuilt as interactive, enterprise-focused modules.
Generative AI and the Distortion of High-Risk Memories
GenAI accelerates content production but risks historical misrepresentation, from distorting facts and depicting groups inaccurately to subtle selective moralization. This is critical for high-risk memories, such as WWII atrocities, which carry strong emotional loads and are often instrumentalized politically. Misrepresentation is heightened by genAI's probabilistic 'memory,' which reiterates contradictory interpretations from its training data; for contested high-risk memories, this happens without external safeguards or ethical guidelines. Hallucinations and moralizing statements, though sometimes intended as safety measures, can misleadingly attribute moral authority to AI and create skewed historical narratives.
Human Memory vs. GenAI: Key Contrasts
| Aspect | Human Memory (Contrast) | GenAI (Risk) |
|---|---|---|
| Nature of Memory | Lived, reconstructive, socially negotiated | Probabilistic reproduction of patterns in training data |
| Understanding Context | Interprets events within their emotional and political context | No genuine grasp of emotional load or political instrumentalization |
| Misrepresentation Source | Bias, forgetting, deliberate instrumentalization | Contradictory training-data interpretations and hallucinations |
| Ethical Framework | Shaped by social norms and commemorative ethics | No external safeguards or ethical guidelines by default |
Auditing GenAI: Accuracy, Hallucinations, and Language Variation
Empirical audits reveal that genAI applications struggle with high-risk memories. Only about 50% of responses align with human baselines on specific historical facts, a rate that significantly decreases for lower-resource languages like Ukrainian and Russian (often 30% or less). Hallucinations are common, with Bard producing them in over half its outputs, particularly for Ukrainian prompts. This highlights how inadequate knowledge bases in certain languages exacerbate misleading claims, even without adversarial intent. Such variations mean users in specific linguistic contexts are far more likely to receive inaccurate information.
| Application | English Accuracy (Approx.) | Ukrainian/Russian Accuracy (Approx.) | Hallucination Tendency |
|---|---|---|---|
| Bard | ~50% baseline alignment (overall figure) | Often 30% or less | High: hallucinations in over half of outputs, especially for Ukrainian prompts |
| ChatGPT | ~50% baseline alignment (overall figure) | Often 30% or less | Present; rate not specified in this summary |
| Bing Chat | ~50% baseline alignment (overall figure) | Often 30% or less | Present; rate not specified in this summary |
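To make findings like these reproducible, an audit can be reduced to a simple harness that queries each application and scores responses against a human-curated baseline. Below is a minimal sketch of such a harness, assuming a hypothetical `query_model(app, prompt, language)` client and crude substring-level fact checks; all names and data structures are illustrative, not the paper's actual instrumentation.

```python
# Minimal sketch of a cross-language factual-accuracy audit. The
# query_model(app, prompt, language) client is a hypothetical stand-in.
from dataclasses import dataclass

@dataclass
class BaselineItem:
    prompt: str                # question about a specific historical fact
    required_facts: list[str]  # substrings a correct answer must contain

def response_is_accurate(response: str, item: BaselineItem) -> bool:
    """Crude proxy check: a response counts as accurate only if it
    mentions every fact from the human-curated baseline."""
    text = response.lower()
    return all(fact.lower() in text for fact in item.required_facts)

def audit(app: str, baseline: dict[str, list[BaselineItem]], query_model) -> dict[str, float]:
    """Return the share of baseline-aligned responses per language."""
    scores = {}
    for language, items in baseline.items():
        hits = sum(
            response_is_accurate(query_model(app, item.prompt, language), item)
            for item in items
        )
        scores[language] = hits / len(items)
    return scores
```

Running this per application and language yields per-cell figures of the kind shown in the table above.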
Case Study: Stepan Bandera and Lviv Pogrom
Bard, when prompted in Ukrainian, produced multiple misleading claims about Stepan Bandera, an anti-Soviet resistance leader. It incorrectly stated that he rejected Nazi ideas and was arrested in July 1941 for refusing to fight the Soviet Union. Similarly, it claimed the Lviv pogrom in 1941 was the 'only time' Ukrainians participated in killing Jews. These examples illustrate how limited knowledge bases in lower-resource languages lead to significant historical distortions and invented narratives, including non-existent testimonies.
The Selective Moral Authority of AI in Historical Narratives
GenAI applications, particularly ChatGPT, frequently include moralizing statements in responses about mass atrocities. While such statements can reinforce ethical lessons, their inclusion is inconsistent across applications, languages, and specific atrocity instances. For example, some atrocities are labeled 'horrific' with explicit calls to remember, while others (or the same event queried in a different language) receive no moral framing at all. This inconsistency creates a skewed moral hierarchy of memories and misleads users into perceiving AI as a moral authority it does not possess. It also enables the selective enforcement of standardized representation patterns often associated with the Global North.
| Application | Moralization Frequency | Consistency Across Languages/Topics | Example Framing |
|---|---|---|---|
| ChatGPT | Frequent | Inconsistent across languages and episodes | Labels atrocities 'horrific' with explicit calls to remember |
| Bard | Lower than ChatGPT | Inconsistent | Not specified in this summary |
| Bing Chat | Lower than ChatGPT | Inconsistent | Not specified in this summary |
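One way to operationalize this finding is a cross-language consistency check that flags episodes moralized in one language but not another. The sketch below uses keyword stems as a crude proxy for moral framing; the marker lists are illustrative assumptions, and a real audit would rely on human coders or a trained classifier.

```python
# Sketch of a moralization-consistency check across languages for the
# same historical episode. Marker stems are illustrative placeholders.
MORAL_MARKERS = {
    "en": ["horrific", "must be remembered", "never again", "tragedy"],
    "uk": ["жахлив", "пам'ят", "трагед"],   # stems, assumed examples
    "ru": ["ужасн", "памят", "трагед"],     # stems, assumed examples
}

def has_moral_framing(response: str, language: str) -> bool:
    text = response.lower()
    return any(marker in text for marker in MORAL_MARKERS.get(language, []))

def framing_is_consistent(responses: dict[str, str]) -> bool:
    """Flag episodes where the same event is moralized in one language
    but presented without moral framing in another."""
    flags = {lang: has_moral_framing(text, lang) for lang, text in responses.items()}
    return len(set(flags.values())) == 1  # all languages agree
```

Episodes where `framing_is_consistent` returns False are candidates for the standardization work in Phase 3 of the roadmap below.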
Calculate Your Potential AI Impact
Estimate the efficiency gains and cost savings for your enterprise by integrating responsible AI solutions, considering the nuanced ethical implications highlighted in this research.
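As a stand-in for an interactive calculator, the following back-of-envelope sketch shows the kind of estimate involved; every parameter (task volume, minutes saved, hourly cost, review overhead) is an assumption to be replaced with your organization's own figures.

```python
# Illustrative back-of-envelope estimate; all parameters are assumptions.
def estimated_annual_savings(tasks_per_month: int,
                             minutes_saved_per_task: float,
                             hourly_cost: float,
                             review_overhead: float = 0.2) -> float:
    """Gross time savings minus the human-oversight overhead that
    responsible deployment in sensitive domains requires."""
    hours_saved = tasks_per_month * 12 * minutes_saved_per_task / 60
    return hours_saved * hourly_cost * (1 - review_overhead)

print(f"${estimated_annual_savings(500, 10, 60):,.0f} / year")  # $48,000 / year
```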
Your AI Implementation Roadmap
Navigate the complexities of AI integration with a clear, phase-by-phase approach, focusing on ethical considerations and robust performance in sensitive domains.
Phase 1: Ethical Assessment & Data Audit
Conduct a comprehensive audit of existing data and systems for potential biases and misrepresentation risks, especially for sensitive historical or social data. Establish a clear 'North Star' for AI ethical behavior and memory representation.
Phase 2: Custom Model Development & Refusal Mechanisms
Develop or fine-tune AI models with specialized training data, and implement refusal mechanisms for queries that lack sufficient information or pose high misrepresentation risks, aligning with the 'North Star' vision.
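A refusal mechanism can be as simple as a wrapper that declines to answer when a query touches a high-risk topic and the model's estimated knowledge coverage falls below a threshold. The sketch below assumes hypothetical `generate` and `knowledge_coverage` callables and an illustrative topic list.

```python
# Minimal sketch of a refusal wrapper for high-risk memory queries.
# generate(prompt) and knowledge_coverage(prompt, language) are
# hypothetical stand-ins; the topic list is illustrative only.
HIGH_RISK_TOPICS = ["pogrom", "holocaust", "mass killing", "deportation"]
REFUSAL = ("I do not have reliable enough information to answer this "
           "accurately. Please consult historical scholarship or archives.")

def guarded_generate(prompt: str, language: str, generate, knowledge_coverage,
                     threshold: float = 0.7) -> str:
    """Refuse rather than risk hallucinated history when coverage is low."""
    is_high_risk = any(topic in prompt.lower() for topic in HIGH_RISK_TOPICS)
    if is_high_risk and knowledge_coverage(prompt, language) < threshold:
        return REFUSAL
    return generate(prompt)
```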
Phase 3: Consistency & Moralization Standardization
Implement a framework to ensure consistent moralization and normative framing across different languages and contexts, preventing skewed moral hierarchies and selective enforcement of historical narratives.
Phase 4: Continuous Monitoring & Expert Oversight
Establish ongoing monitoring processes and integrate human expert oversight to detect and correct emerging misrepresentations, hallucinations, or inconsistencies in AI outputs, particularly in high-risk memory domains.
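Continuous monitoring can reuse the audit harness sketched earlier: re-run a fixed baseline set on a schedule and escalate to human experts when per-language accuracy drifts. The `escalate_to_expert` hook and drift threshold below are illustrative assumptions.

```python
# Sketch of a recurring drift check that reuses the audit() harness from
# the earlier sketch. escalate_to_expert is a hypothetical hook routing
# flagged languages to human historians for review.
def monitor(app, baseline, query_model, previous_scores, escalate_to_expert,
            max_drop: float = 0.05):
    current = audit(app, baseline, query_model)
    for language, score in current.items():
        if previous_scores.get(language, 0.0) - score > max_drop:
            escalate_to_expert(app, language, score)  # accuracy dropped
    return current  # becomes previous_scores for the next run
```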
Ready to Build Responsible AI for Your Enterprise?
Leverage our expertise to develop AI solutions that are accurate, ethical, and aligned with your organizational values, mitigating the risks of misrepresentation and fostering trust.