Enterprise AI Analysis: Forma mentis networks predict creativity ratings of short texts via interpretable artificial intelligence in human and AI-simulated raters

Journal of Computational Social Science


Creativity is a fundamental skill of human cognition. This study introduces textual forma mentis networks (TFMNs) to extract network and emotional features from stories generated by humans, GPT-3.5, and Sonnet 3.7. Using Explainable Artificial Intelligence (XAI) with XGBoost, we investigate whether features grounded in Mednick's associative theory explain creativity ratings assigned by humans or by AI. Our findings indicate that GPT-3.5 and Sonnet 3.7 ratings differ significantly from human assessments, with both AI models exhibiting a bias towards their own generated content. Network features are more predictive of human creativity ratings, while emotional features matter more for the AI models' self-ratings. These results highlight critical limitations in current AI models' ability to align with human perceptions of creativity, underscoring the need for caution in deploying AI for creative content evaluation and generation.

Executive Impact: Quantifying AI's Role in Creative Assessment

Our research uncovers critical performance metrics and behavioral patterns of AI models in assessing creativity, providing a quantitative basis for strategic AI integration.

0.617 Prediction accuracy: human ratings of human stories
0.996 Prediction accuracy: Sonnet 3.7 self-ratings
0.716 Prediction accuracy: GPT-3.5 ratings of human stories
0.915 Prediction accuracy: Sonnet 3.7 ratings of human stories

Deep Analysis & Enterprise Applications

Select a topic to dive deeper, then explore the specific findings from the research, rebuilt as interactive, enterprise-focused modules.

The introduction sets the stage for understanding creativity as a fundamental human skill and how it's measured in textual data. It highlights the challenges of evaluating text creativity due to subjectivity and introduces the potential of network science and machine learning approaches, along with the gap in research concerning LLM creativity assessment.

This section details the methods used, including the collection of human-, GPT-3.5-, and Sonnet 3.7-generated stories, the process of assigning creativity ratings by human and AI raters, and the computation of Textual Forma Mentis Networks (TFMNs) to extract network and emotional features. Explainable AI (XAI) with SHAP values is employed to interpret model predictions.
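To make the modelling step concrete, here is a minimal sketch of an XGBoost regressor trained on TFMN-style features and explained with SHAP. The feature names, synthetic data, and hyperparameters are illustrative assumptions, not the paper's exact configuration.

```python
# Hedged sketch: predict creativity ratings from TFMN features with XGBoost,
# then explain the model with SHAP. Feature names and data are illustrative
# placeholders, not the authors' pipeline.
import numpy as np
import pandas as pd
import shap
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

# Hypothetical feature table: one row per story, mixing network features
# (ASPL, LCC size, degree centrality, clustering) with emotion features
# (Plutchik-style scores such as joy, anger, anticipation).
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "aspl": rng.normal(3.0, 0.5, 200),
    "lcc_size": rng.integers(40, 120, 200),
    "degree_centrality": rng.uniform(0.01, 0.2, 200),
    "clustering_coeff": rng.uniform(0.0, 0.6, 200),
    "joy": rng.uniform(0, 1, 200),
    "anger": rng.uniform(0, 1, 200),
    "anticipation": rng.uniform(0, 1, 200),
})
y = rng.uniform(1, 5, 200)  # placeholder creativity ratings on a 1-5 scale

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.05)
model.fit(X_train, y_train)

# TreeExplainer yields per-feature SHAP attributions for each prediction;
# mean |SHAP| per column gives a global importance ranking.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
importance = np.abs(shap_values).mean(axis=0)
for name, imp in sorted(zip(X.columns, importance), key=lambda t: -t[1]):
    print(f"{name:>18}: {imp:.3f}")
```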

Textual Forma Mentis Network (TFMN) Construction

TFMNs provide a powerful tool for analyzing the structure and conceptual organisation of short stories. They capture both syntactic and emotional links, breaking narratives into associations between words. Construction proceeds in seven steps (a simplified code sketch follows the list):

1. Tokenisation
2. Syntactic parsing
3. Connecting words
4. Network construction
5. Normalisation and enrichment
6. Assigning emotional valence
7. Quantifying emotional features
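A minimal sketch of this pipeline, assuming spaCy for steps 1-3 and networkx for steps 4-7. The valence lexicon, helper function, and example sentence are illustrative placeholders; the authors' actual pipeline relies on richer, validated emotional lexica.

```python
# Simplified TFMN sketch: tokenise and parse a story with spaCy, link
# syntactically dependent words into a network, then tag nodes with a toy
# valence lexicon. Requires: python -m spacy download en_core_web_sm
import networkx as nx
import spacy

nlp = spacy.load("en_core_web_sm")

# Placeholder lexicon; a real TFMN would use a validated resource such as
# the NRC emotion lexicon.
VALENCE = {"grand": "positive", "dark": "negative", "cathedral": "neutral"}

def build_tfmn(text: str) -> nx.Graph:
    graph = nx.Graph()
    doc = nlp(text)
    for token in doc:
        # Steps 1-4: tokenise, parse, connect each word to its syntactic
        # head, and grow the network edge by edge.
        if token.is_alpha and token.head.is_alpha and token.head != token:
            graph.add_edge(token.lemma_.lower(), token.head.lemma_.lower())
    # Step 5: normalisation (lemmas, lowercasing) happened above.
    # Step 6: assign an emotional valence to each concept.
    for node in graph.nodes:
        graph.nodes[node]["valence"] = VALENCE.get(node, "neutral")
    return graph

story = "The grand cathedral stood silent above the dark village."
tfmn = build_tfmn(story)

# Step 7 feeds network features like these into the rating model:
lcc = tfmn.subgraph(max(nx.connected_components(tfmn), key=len))
print("LCC size:", lcc.number_of_nodes())
print("ASPL (LCC):", nx.average_shortest_path_length(lcc))
print("Mean clustering:", nx.average_clustering(tfmn))
```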

The results present the findings from our study, including the comparison of story lengths, statistical differences in network and emotion features between human and AI-generated stories, and model performance evaluations. It highlights the varying feature importance patterns identified by SHAP scores across different rating scenarios.

AI Rating Divergence: A Core Challenge

p < 0.001 Significant Difference (Human vs. AI Ratings)

Our analysis revealed statistically significant differences between human and both GPT-3.5 and Sonnet 3.7 ratings of human-authored stories, highlighting a fundamental divergence in assessment criteria.
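As a hedged illustration of how such a comparison can be run, the sketch below applies a paired Wilcoxon signed-rank test to placeholder ratings of the same stories; the authors' exact statistical test may differ.

```python
# Illustrative comparison of human and AI creativity ratings of the same
# stories. The paper reports p < 0.001; the test choice and the data here
# are assumptions for demonstration only.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(1)
human_ratings = rng.uniform(1, 5, 100)                   # placeholder ratings
ai_ratings = human_ratings + rng.normal(0.6, 0.5, 100)   # simulated AI offset

stat, p_value = wilcoxon(human_ratings, ai_ratings)
print(f"Wilcoxon W = {stat:.1f}, p = {p_value:.2e}")
```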

This discussion interprets the findings, focusing on the structural and emotional differences between human- and AI-generated stories. It explores the implications of AI's distinct evaluative frameworks for creativity and emphasizes the need for caution when using AI for creative content generation and assessment.

Human vs. AI: Creativity Assessment Criteria

Human and AI raters employ distinct frameworks for evaluating creativity. Human raters prioritize structural complexity, while AI (especially GPT-3.5 self-ratings) emphasizes emotional cues.
Assessment Focus         | Human Raters                                        | GPT-3.5 (Self-Rating)                               | Sonnet 3.7 (Mixed)
Key Features Prioritized | Network features (ASPL, LCC, degree centrality)     | Emotional features (joy, anger, anticipation)       | Mixed (joy, clustering coefficient, degree centrality)
Impact on Rating         | Higher scores for balanced cohesion and flexibility | Higher scores for vivid, positive emotional content | Higher scores for a blend of structure and emotion
Underlying Framework     | Associative theory of creativity (Mednick)          | Emotional elicitation, less structural novelty      | Nuanced consideration of both
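One way to quantify the contrast summarized in this table is to aggregate SHAP importances by feature family. The sketch below builds on the earlier SHAP example; the GROUPS mapping and the per-rater variables in the commented usage are illustrative assumptions.

```python
# Follow-on to the SHAP sketch above: sum mean |SHAP| within each feature
# family to compare how much network vs. emotion features drive a given
# rater model. Group assignments mirror the placeholder features.
import numpy as np

GROUPS = {
    "network": ["aspl", "lcc_size", "degree_centrality", "clustering_coeff"],
    "emotion": ["joy", "anger", "anticipation"],
}

def group_importance(shap_values, columns):
    mean_abs = dict(zip(columns, np.abs(shap_values).mean(axis=0)))
    return {g: sum(mean_abs[c] for c in cols) for g, cols in GROUPS.items()}

# Hypothetical usage: one model fitted on human ratings, one on AI
# self-ratings, each with its own SHAP values.
# print(group_importance(shap_values_human, X.columns))
# print(group_importance(shap_values_ai, X.columns))
```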

The Homogenisation Effect in AI Narratives

"In a [ADJECTIVE] town/village/empire..." or "The grand cathedral stood...", and even character names repeated across unrelated stories.

Source: Section 6.4

Both GPT-3.5 and Sonnet 3.7-generated stories exhibited strong similarities in narrative and sentence structure, frequently starting with common phrases and repeating character names. This 'homogenisation effect' indicates a lack of semantic diversity, underscoring AI's limitations in generating truly varied creative outputs.
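As an illustrative diagnostic (not the method used in the paper), mean pairwise TF-IDF cosine similarity across story openings rises as narratives become more formulaic:

```python
# Toy homogenisation check: average pairwise TF-IDF cosine similarity
# over story openings. Higher values suggest more formulaic output.
# The openings below are invented examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

openings = [
    "In a quiet town nestled between hills...",
    "In a bustling village by the sea...",
    "The grand cathedral stood at the heart of the empire...",
]

tfidf = TfidfVectorizer().fit_transform(openings)
sims = cosine_similarity(tfidf)

# Average over off-diagonal entries only (the diagonal is always 1).
n = sims.shape[0]
mean_sim = (sims.sum() - n) / (n * (n - 1))
print(f"Mean pairwise similarity: {mean_sim:.3f}")
```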

The study concludes by summarizing the key findings: significant differences in creativity assessment between human and AI raters, AI models' bias towards their own content, and divergent feature importance (structural vs. emotional) for human and AI evaluations. It highlights limitations in current LLMs' interpretive frameworks for creativity and stresses the need for caution and further alignment with human evaluative criteria.

Calculate Your Potential ROI with AI-Powered Creativity Analysis

Estimate the efficiency gains and cost savings your enterprise could achieve by integrating our interpretable AI creativity assessment solutions.
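For transparency, here is a toy version of the arithmetic such a calculator performs; every input value and the formula itself are illustrative assumptions, not published figures.

```python
# Illustrative ROI estimate: hours reclaimed from faster creative triage,
# converted to annual savings. All inputs are hypothetical.
reviews_per_year = 5_000      # manual creativity assessments per year
minutes_saved_per_review = 6  # assumed speed-up from AI-assisted scoring
hourly_cost = 60.0            # loaded cost of a reviewer, USD/hour

hours_reclaimed = reviews_per_year * minutes_saved_per_review / 60
annual_savings = hours_reclaimed * hourly_cost
print(f"Hours reclaimed: {hours_reclaimed:,.0f}")
print(f"Estimated annual savings: ${annual_savings:,.0f}")
```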


Your AI Creativity Assessment Journey

A structured roadmap for integrating interpretable AI into your creative content evaluation workflows, ensuring a smooth transition and maximum impact.

Phase 1: Discovery & Strategy

Initial consultation to understand your current creative assessment processes, identify key challenges, and define clear objectives for AI integration. We'll develop a tailored strategy aligned with your enterprise goals.

Phase 2: Data Preparation & Model Training

Work with your team to prepare and annotate creative data. Our experts will then train custom TFMN and XAI models, ensuring they accurately reflect your specific creativity criteria and content types.

Phase 3: Integration & Pilot Deployment

Seamlessly integrate the AI assessment tools into your existing platforms. Conduct pilot programs with a subset of your team to gather feedback and refine the system for optimal performance and user experience.

Phase 4: Full-Scale Rollout & Continuous Improvement

Deploy the AI creativity assessment solution across your enterprise. Establish ongoing monitoring, provide comprehensive training, and implement continuous model improvement based on performance data and evolving needs.

Ready to Unlock Deeper Insights into Creative Content?

Don't let subjective evaluations limit your creative potential. Partner with us to implement AI-powered, interpretable creativity assessment tailored to your enterprise needs.

Ready to Get Started?

Book Your Free Consultation.

Let's Discuss Your AI Strategy!
