Enterprise AI Analysis: Spreadsheet Modeling Experiments Using GPTs on Small Problem Statements and the Wall Task

GPT-POWERED SPREADSHEET MODELING


This paper investigates the efficacy of GPT-based tools, specifically Excel AI, in generating reusable analytical spreadsheet models from simple problem statements. While showing promise in producing structured models, findings reveal inconsistencies and reproducibility issues, underscoring challenges in 'confidence' and 'workflow' for professional application.

Executive Impact: Key Findings at a Glance

Our experiments reveal critical insights into the current capabilities and limitations of GPTs in generating industrial-quality spreadsheet models.

5 GPT Extensions Evaluated
8 Simple Prompts Tested
5 ERFR Criteria for Reusability

Deep Analysis & Enterprise Applications


The Foundation of Reusable Models

Our research aims to understand how GPTs can aid in creating reusable analytical spreadsheet models. For this, we established five Essential Requirements for Reusability (ERFR) to evaluate GPT outputs. These criteria ensure practical value and ease of use in professional settings.

The five Essential Requirements for Reusability (ERFR):
1) Each input value resides in a single cell.
2) All calculations are done using cell formulas.
3) All calculations are done without hardwired numbers.
4) Input and formula cells have appropriate labels.
5) The calculations are accurate.
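As a rough illustration of how requirements 2 and 3 might be checked mechanically, the sketch below treats a sheet as a mapping from cell address to cell content. This representation, the function name `check_calculations`, and the sample values are assumptions made for the example, not the procedure used in the study. The number detection is a heuristic: it would also flag digits inside string literals.

```python
import re

# Strip cell references (A1, $B$2, AA10) so their digits are not mistaken for constants.
CELL_REF = re.compile(r"\$?[A-Za-z]{1,3}\$?\d+")
# Any digit run left after stripping references counts as a hardwired number.
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def check_calculations(sheet, calc_cells):
    """Return a dict of cell -> problem for ERFR requirements 2 and 3.

    sheet: dict mapping cell address to its content as a string (assumed layout).
    calc_cells: addresses that are supposed to hold calculations.
    """
    problems = {}
    for addr in calc_cells:
        content = str(sheet.get(addr, ""))
        if not content.startswith("="):
            problems[addr] = "not a formula"        # requirement 2
        elif NUMBER.search(CELL_REF.sub("", content)):
            problems[addr] = "hardwired number"     # requirement 3
    return problems


# Made-up sample sheet for illustration only.
sheet = {
    "B1": "0.5",        # input: price per apple
    "B2": "4",          # input: apples per day
    "B3": "30",         # input: number of days
    "B5": "=B1*B2*B3",  # compliant formula
    "B6": "=B1*B2*30",  # formula with a hardwired 30
    "B7": "60",         # pasted value instead of a formula
}
report = check_calculations(sheet, ["B5", "B6", "B7"])
print(report)
```

Requirements 1, 4, and 5 depend on layout, labels, and the intended logic, so a check like this covers only part of the evaluation.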

Selecting the Best GPT for Spreadsheets

We screened five GPT extensions that claim spreadsheet modeling capabilities, choosing them for high usage, popularity, relevant capabilities, and the ability to generate XLSX files. Excel AI by pulsrai.com (EAI) performed best in this screening and was selected for detailed testing.

GPT Performance Comparison Highlights (Top 3)

Criteria | Excel AI (pulsrai.com) | Document Maker (community builder) | GPT Excel (NAIF J ALOTAIBI)
User Ratings | 4.0/5.0 (50K+ ratings) | 3.6/5.0 (100K+ conversations) | 3.8/5.0 (1K+ ratings)
Conversation Quality | Detailed, step-by-step explanations, logic behind formulas. | Detailed, informative, provides calculation logic. | Shortest, least descriptive.
File Reliability | Generally reliable, consistent formatting, minor errors. | Highly reliable, consistent structure, usable files. | Reliable in some cases, missing formulas.
Spreadsheet Design | Well-organized, inputs/outputs separated & labeled. | Moderate structure, decent formatting, lacks model title. | Moderate structure, decent formatting, lacks title.

Excel AI consistently produced well-organized spreadsheet models with accurate, editable formulas, and made fewer technical errors than its counterparts.

Structured Experiments: Simple Prompts

We conducted structured experiments using eight simple problem statements to test EAI's capabilities across three questions: Data vs. Parameters, "30 days" vs. "month", and Known vs. Made-Up Noun. EAI showed good overall performance, although with notable inconsistencies.

Prompt Structure for EAI Experiments

Problem Statement (Pure Logic)
Instructions (Desired Output)
Generated Spreadsheet Model
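The two-part prompt structure above can be sketched as a small helper. The function name `build_prompt` is hypothetical. The instruction text is the one quoted for the Wall Task, and reusing it here for a simple problem statement is an assumption about how the prompts were assembled.

```python
def build_prompt(problem_statement, instructions):
    """Join the problem statement and the output instructions into one prompt."""
    return f"{problem_statement.strip()}\n\n{instructions.strip()}"


# Problem statement: pure logic, no formatting requests.
problem = (
    "A man buys apples every day for a month. Given the price per apple and "
    "the number of apples he buys daily, what is total spending?"
)
# Instructions: the desired output, kept separate from the problem itself.
instructions = "Create an Excel model. Use cell formulas. Provide a downloadable Excel file."

prompt = build_prompt(problem, instructions)
print(prompt)
```

Keeping the two parts separate makes it easier to vary one (for example "30 days" versus "month") while holding the other fixed.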

One significant finding was EAI's struggle with Problem Statement 6 ("Spending, using Parameters; using Month"), where it generated a spreadsheet with numbers instead of formulas for key calculations, violating ERFR requirements 2 and 3 (all calculations done with cell formulas and without hardwired numbers). This also highlighted reproducibility issues, as repeat runs produced different results.

Case Study: The Failed Spreadsheet Model (Problem Statement 6)

Problem: "A man buys apples every day for a month. Given the price per apple and the number of apples he buys daily, what is total spending?"

EAI Output: The values were accurate and the dummy data for daily prices and quantities was reasonable, but the total cost column contained hardwired numbers instead of cell formulas. This violates ERFR and makes the model non-reusable.

Implication: An accurate result is not sufficient if the underlying model is not dynamic and reusable. This critical failure mode highlights the need for careful verification of GPT outputs.
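To make the implication concrete, here is a minimal sketch that contrasts a pasted value with a calculation that depends on its inputs. It simplifies the case to three parameters rather than a daily table, and the input values are made up for illustration, not taken from the study.

```python
def total_spending(price_per_apple, apples_per_day, days):
    """Formula-style calculation: the result depends only on the inputs."""
    return price_per_apple * apples_per_day * days


# Illustrative inputs (assumptions for this sketch).
inputs = {"price_per_apple": 0.5, "apples_per_day": 4, "days": 30}

hardwired_total = 60.0                      # a pasted value: correct now, but frozen
formula_total = total_spending(**inputs)    # recomputed from the inputs

# Both agree for the original inputs...
same_at_first = hardwired_total == formula_total

# ...but only the formula follows a change in the inputs.
inputs["price_per_apple"] = 0.75
updated_total = total_spending(**inputs)
print(same_at_first, hardwired_total, updated_total)
```

The hardwired total passes an accuracy check on the original inputs, which is why a value-only review would miss this failure.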

The Wall Task: A Larger Problem Statement

The Wall Task, a paragraph-long problem statement involving bid estimation for construction, revealed EAI's varied performance. Initial attempts often produced correct values but lacked cell formulas. Follow-up prompts sometimes yielded "perfect" ERFR-compliant models, and at other times resulted in corrupt or unusable files.

Wall Task Performance: Delight & Trouble

When prompted to "Create an Excel model. Use cell formulas. Provide a downloadable Excel file.", EAI initially produced a spreadsheet with correct numbers but no cell formulas.

A subsequent prompt for a "dynamic calculator with input fields" often resulted in a near-perfect ERFR-compliant model with clear input and output modules, which was impressive.

However, other runs with similar requests sometimes produced corrupt Excel files or models with inaccurate/missing formulas, making them unusable. This erratic behavior underscores the challenge of relying on GPTs for complex tasks.

Conclusion: EAI can sometimes deliver exceptional results, but its inconsistency demands rigorous user verification, even for what seems like a simple follow-up.
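One cheap first step in that verification is a structural check on the downloaded file. An XLSX file is a ZIP package, so the sketch below uses Python's standard `zipfile` module to confirm that the bytes form a readable archive containing `[Content_Types].xml` and, by convention, `xl/workbook.xml`. The function name `looks_like_xlsx` is hypothetical, and passing this check does not guarantee that Excel will open the file or that its formulas are correct.

```python
import io
import zipfile

# Parts a typical workbook package contains; the workbook path is conventional.
REQUIRED_PARTS = ("[Content_Types].xml", "xl/workbook.xml")


def looks_like_xlsx(data):
    """Cheap structural check on raw file bytes; not a guarantee that Excel opens it."""
    buffer = io.BytesIO(data)
    if not zipfile.is_zipfile(buffer):
        return False
    with zipfile.ZipFile(buffer) as archive:
        if archive.testzip() is not None:   # a member failed its CRC check
            return False
        names = set(archive.namelist())
    return all(part in names for part in REQUIRED_PARTS)


# Build a tiny stand-in package in memory (placeholder contents, not a real workbook).
good = io.BytesIO()
with zipfile.ZipFile(good, "w") as archive:
    for part in REQUIRED_PARTS:
        archive.writestr(part, "<placeholder/>")

ok_result = looks_like_xlsx(good.getvalue())
truncated_result = looks_like_xlsx(good.getvalue()[:40])
text_result = looks_like_xlsx(b"this is plain text, not a spreadsheet package at all")
print(ok_result, truncated_result, text_result)
```

A file that fails here can be rejected immediately. One that passes still needs the formula and accuracy review described above.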

Challenges and Future Research

The study identifies two central challenges: the "problem of confidence" (verifying accuracy and ERFR compliance) and the "problem of workflow" (integrating imperfect GPT drafts into a useful process). Reproducibility also remains a significant issue.
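A simple way to surface the reproducibility issue is to compare the cell contents produced by repeated runs of the same prompt. The sketch below assumes each run has already been reduced to a mapping from cell address to content. That representation, the function name `differing_cells`, and the sample runs are illustrative assumptions.

```python
def differing_cells(runs):
    """Return sorted cell addresses whose content is not identical across all runs."""
    all_cells = set().union(*(run.keys() for run in runs))
    return sorted(
        addr for addr in all_cells
        if len({run.get(addr) for run in runs}) > 1
    )


# Three hypothetical runs of the same prompt; the second pasted a value into B5.
run_a = {"B1": "0.5", "B2": "4", "B3": "30", "B5": "=B1*B2*B3"}
run_b = {"B1": "0.5", "B2": "4", "B3": "30", "B5": "60"}
run_c = {"B1": "0.5", "B2": "4", "B3": "30", "B5": "=B1*B2*B3"}

unstable = differing_cells([run_a, run_b, run_c])
print(unstable)  # ['B5']
```

Cells that differ between runs are natural places to focus manual review, since they are where the model's output was not stable.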

Workflow for Imperfect GPT Spreadsheet Output

GPT Generates "Rough Draft"
User Evaluation & Correction
Refactor to ERFR Standard
Final Reusable Model

Unreliable Current State for Professional Use

While GPTs show promise for generating draft models, their current inconsistency and lack of reproducibility make them unreliable for professional use without significant human oversight and expertise.

Future research should focus on prompt engineering for larger models, improving consistency and reproducibility, and developing methods to reduce the human evaluation and intervention required to achieve ERFR compliance.


