Enterprise AI Analysis: High-Risk Memories? Comparative audit of the representation of Second World War atrocities in Ukraine by generative AI applications


This paper investigates how generative AI (genAI) applications represent and potentially misrepresent high-risk memories, specifically Second World War atrocities in Ukraine. It audits three common genAI applications for historical misrepresentation, including hallucinations and inconsistent moralization, across different languages and atrocity types. The findings highlight significant inaccuracies and ethical concerns, especially for less-known memories and lower-resource languages.

Executive Impact: Key Findings for Enterprise Leaders

Generative AI models demonstrate limited and inconsistent accuracy in representing high-risk historical memories, particularly WWII atrocities in Ukraine. A significant portion of responses contains factual inaccuracies and hallucinations, especially in lower-resource languages. While moralizing statements are present, their inclusion is inconsistent across applications, languages, and specific historical episodes, undermining genAI's perceived authority and creating a skewed moral hierarchy. This poses substantial risks for historical misrepresentation and instrumentalization.

~50% Approximate Accuracy Rate for GenAI on Historical Facts
~30% Accuracy Rate for Lower-Resource Languages (Bard Russian, Bing Chat Ukrainian)
50%+ Bard Outputs Containing Hallucinations (Overall)
50%+ ChatGPT Responses with Moralizing Statements (English/Russian)

Deep Analysis & Enterprise Applications


Generative AI and the Distortion of High-Risk Memories

GenAI accelerates content production but risks historical misrepresentation, from distorting facts and depicting groups inaccurately to subtle selective moralization. This is critical for high-risk memories, like WWII atrocities, which carry strong emotional loads and are often instrumentalized politically. Misrepresentation is heightened by genAI's probabilistic 'memory,' which reiterates contradictory interpretations from its training data, especially for contested high-risk memories without external safeguards or ethical guidelines. Hallucinations and moralizing statements, though sometimes intended for safety, can misleadingly attribute moral authority to AI and create skewed historical narratives.

50% Approximate accuracy rate for genAI responses on historical facts compared to human baseline.

Enterprise Process Flow

GenAI Content Production → Information Discovery Alteration → Historical Misrepresentation Risk → Distortion/Denial Amplification → Ethical Obligations Challenged
Aspect | Human Memory (Contrast) | GenAI (Risk)
Nature of Memory | Cognitive (encoding, storage, retrieval) | Probabilistic (next-token prediction)
Understanding Context | Entangled with social practices, negotiated truth | Limited understanding of historical accuracy/ethics
Misrepresentation Source | Intentional manipulation, selective recall | Training-data deficiencies, unintentional reiteration of conflicts, hallucinations
Ethical Framework | Societally negotiated ethical obligations | Relies on explicit developer specification; default is probabilistic output

Auditing GenAI: Accuracy, Hallucinations, and Language Variation

Empirical audits reveal that genAI applications struggle with high-risk memories. Only about 50% of responses align with human baselines on specific historical facts, a rate that significantly decreases for lower-resource languages like Ukrainian and Russian (often 30% or less). Hallucinations are common, with Bard producing them in over half its outputs, particularly for Ukrainian prompts. This highlights how inadequate knowledge bases in certain languages exacerbate misleading claims, even without adversarial intent. Such variations mean users in specific linguistic contexts are far more likely to receive inaccurate information.

30% Accuracy rate for Bard in Russian and Bing Chat in Ukrainian on historical facts, indicating significant language dependency.
Application | English Accuracy (Approx.) | Ukrainian/Russian Accuracy (Approx.) | Hallucination Tendency
Bard | 70% (Holocaust general) | 30% (Russian), 50% (Ukrainian) | High (50%+ overall)
ChatGPT | 50-70% (stable) | 40-50% (less dramatic drop) | Moderate (partially correct responses common)
Bing Chat | 50-60% (Holocaust general) | 30% (Ukrainian), 50% (Russian) | Moderate (irrelevant responses, fewer hallucinations)
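The per-language accuracy comparison behind these figures can be reproduced with a simple scoring harness. A minimal sketch in Python, assuming each audited response has already been judged against a human expert baseline; the record format and sample data below are illustrative, not the paper's actual dataset:

```python
from collections import defaultdict

def accuracy_by_app_and_language(records):
    """records: iterable of dicts with keys 'app', 'lang', 'verdict',
    where 'verdict' is 'correct', 'partial', or 'incorrect' as judged
    against a human expert baseline."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for r in records:
        key = (r["app"], r["lang"])
        totals[key] += 1
        if r["verdict"] == "correct":
            correct[key] += 1
    # Accuracy per (application, language) pair
    return {key: correct[key] / totals[key] for key in totals}

# Illustrative records (not real audit data):
records = [
    {"app": "Bard", "lang": "uk", "verdict": "incorrect"},
    {"app": "Bard", "lang": "uk", "verdict": "correct"},
    {"app": "Bard", "lang": "en", "verdict": "correct"},
    {"app": "Bard", "lang": "en", "verdict": "correct"},
]
rates = accuracy_by_app_and_language(records)
print(rates[("Bard", "uk")])  # 0.5
```

Grouping verdicts by (application, language) is what exposes the language dependency the audit reports: the same prompt set yields markedly different accuracy keys for English versus Ukrainian or Russian.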

Case Study: Stepan Bandera and Lviv Pogrom

Bard, when prompted in Ukrainian, produced multiple misleading claims about Stepan Bandera, an anti-Soviet resistance leader. It incorrectly stated that he rejected Nazi ideas and was arrested in July 1941 for refusing to fight the Soviet Union. Similarly, it claimed the Lviv pogrom in 1941 was the 'only time' Ukrainians participated in killing Jews. These examples illustrate how limited knowledge bases in lower-resource languages lead to significant historical distortions and invented narratives, including non-existent testimonies.

The Selective Moral Authority of AI in Historical Narratives

GenAI applications, particularly ChatGPT, frequently include moralizing statements in responses about mass atrocities. While sometimes reinforcing ethical lessons, this moralization is often inconsistent across different applications, languages, and specific atrocity instances. For example, some atrocities are labeled 'horrific' with explicit calls to remember, while others (or the same event in a different language) receive no such moral framing. This inconsistency can lead to a skewed moral hierarchy of memories and mislead users into perceiving AI as a moral authority it does not possess, enabling the selective enforcement of standardized representation patterns often associated with the Global North.

80% Highest frequency of moralizing statements in Bard (English prompts, Polish atrocities).
Application | Moralization Frequency | Consistency Across Languages/Topics | Example Framing
ChatGPT | High (50%+ for English/Russian) | More stable (40-50%) but internal variation | Emphasizes historical records, avoids blame for entire groups
Bard | Moderate (esp. Ukrainian prompts) | Highly inconsistent (e.g., 80% English Polish atrocities vs. 20% Russian) | Uses 'dark chapters,' 'horrible tragedies,' emphasizes 'tolerance and understanding'
Bing Chat | Low (20-25%) | Inconsistent (e.g., 60%+ Ukrainian anti-Ukrainian atrocities vs. 0% English) | Uses 'tragic,' 'brutal,' occasionally 'most horrible crimes'
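Inconsistency of this kind can be surfaced automatically. A minimal sketch, assuming moralizing framing is detectable via surface markers; the marker list and the gap metric are illustrative assumptions, not the paper's coding scheme:

```python
# Flag moralizing framing by surface keyword matching (illustrative markers).
MORAL_MARKERS = ("horrific", "tragic", "dark chapter", "never forget",
                 "tolerance and understanding")

def has_moralizing_statement(text: str) -> bool:
    lower = text.lower()
    return any(marker in lower for marker in MORAL_MARKERS)

def moralization_rate(responses: list[str]) -> float:
    """Share of responses containing at least one moralizing marker."""
    if not responses:
        return 0.0
    return sum(has_moralizing_statement(t) for t in responses) / len(responses)

def consistency_gap(rates_by_lang: dict[str, float]) -> float:
    """Spread between the most and least moralized language for one topic."""
    return max(rates_by_lang.values()) - min(rates_by_lang.values())

# Example: 80% moralization in English vs. 20% in Russian yields a 0.6 gap.
print(round(consistency_gap({"en": 0.8, "ru": 0.2}), 2))  # 0.6
```

A large gap for the same atrocity across languages is exactly the skewed moral hierarchy described above: the event is framed as a moral lesson for one audience and as a bare fact for another.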

Enterprise Process Flow

GenAI Produces Moralizing Statements → Statements Reinforce Normative Interpretations → Inconsistent Moralization Across Contexts → Skewed Moral Hierarchy Created → User Perception of AI Moral Authority Distorted

Calculate Your Potential AI Impact

Estimate the efficiency gains and cost savings for your enterprise by integrating responsible AI solutions, considering the nuanced ethical implications highlighted in this research.


Your AI Implementation Roadmap

Navigate the complexities of AI integration with a clear, phase-by-phase approach, focusing on ethical considerations and robust performance in sensitive domains.

Phase 1: Ethical Assessment & Data Audit

Conduct a comprehensive audit of existing data and systems for potential biases and misrepresentation risks, especially for sensitive historical or social data. Establish a clear 'North Star' for AI ethical behavior and memory representation.

Phase 2: Custom Model Development & Refusal Mechanisms

Develop or fine-tune AI models with specialized training data and implement refusal mechanisms for queries lacking sufficient information or posing high misrepresentation risks, aligning with the "North Star" vision.
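One way to realize such a refusal mechanism is a gate in front of generation that checks whether enough vetted evidence supports an answer. A minimal sketch, assuming a retrieval step supplies the supporting sources; all names, topics, and thresholds here are hypothetical:

```python
# Hypothetical refusal gate for high-risk historical queries.
MIN_SOURCES = 2  # illustrative evidence threshold
HIGH_RISK_TOPICS = {"wwii atrocities", "pogroms", "holocaust"}

REFUSAL = ("There is not enough verified information to answer this reliably. "
           "Please consult archival sources or historical experts.")

def answer_or_refuse(query: str, topic: str, sources: list[str], generate) -> str:
    """Refuse high-risk queries that lack sufficient vetted sources;
    otherwise delegate to the underlying model via `generate`."""
    if topic.lower() in HIGH_RISK_TOPICS and len(sources) < MIN_SOURCES:
        return REFUSAL
    return generate(query, sources)

# Usage with a stand-in generator:
reply = answer_or_refuse("Who led the 1941 Lviv pogrom?", "WWII atrocities",
                         sources=[], generate=lambda q, s: "model answer")
print(reply == REFUSAL)  # True
```

The design choice is that refusal is topic-conditional: routine queries pass through unchanged, while contested high-risk memories require evidence before the model is allowed to speak at all.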

Phase 3: Consistency & Moralization Standardization

Implement a framework to ensure consistent moralization and normative framing across different languages and contexts, preventing skewed moral hierarchies and selective enforcement of historical narratives.

Phase 4: Continuous Monitoring & Expert Oversight

Establish ongoing monitoring processes and integrate human expert oversight to detect and correct emerging misrepresentations, hallucinations, or inconsistencies in AI outputs, particularly in high-risk memory domains.
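A monitoring loop of this kind reduces to a simple escalation rule: sample outputs, measure the share flagged as inaccurate or hallucinated, and route batches above a tolerance to human experts. A minimal sketch, where the tolerance value and the upstream flagging step are assumptions for illustration:

```python
# Illustrative monitoring rule: escalate when the flagged-output rate
# in a sampled batch exceeds tolerance.
def flagged_rate(flags: list[bool]) -> float:
    """Fraction of sampled outputs flagged as problematic."""
    return sum(flags) / len(flags) if flags else 0.0

def needs_expert_review(flags: list[bool], tolerance: float = 0.10) -> bool:
    """True when the share of flagged outputs exceeds the tolerance."""
    return flagged_rate(flags) > tolerance

# A batch where 3 of 20 sampled outputs were flagged (15%) is escalated.
print(needs_expert_review([True] * 3 + [False] * 17))  # True
```

The tolerance should be set far lower for high-risk memory domains than for routine content, since even a small rate of hallucinated atrocity narratives carries outsized harm.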

Ready to Build Responsible AI for Your Enterprise?

Leverage our expertise to develop AI solutions that are accurate, ethical, and aligned with your organizational values, mitigating the risks of misrepresentation and fostering trust.

Ready to Get Started?

Book Your Free Consultation.
