
Enterprise AI Analysis: Deconstructing LLM Reliability in Complex Problem Formulation

An in-depth analysis from OwnYourAI.com on the research paper "An Empirical Exploration of ChatGPT's Ability to Support Problem Formulation Tasks for Mission Engineering" by Max Ofsa and Taylan G. Topcu. We translate these critical academic findings into actionable strategies for enterprise AI adoption.

Executive Summary: The Double-Edged Sword of Generative AI

The study by Ofsa and Topcu provides a rigorous, empirical look at a question every enterprise leader is asking: can we trust Large Language Models (LLMs) like ChatGPT for complex, strategic tasks? The research focuses on "problem formulation," the critical first step of any major project, akin to an enterprise defining the scope of a new product launch or digital transformation initiative.

The findings paint a nuanced picture. While LLMs show promise as powerful assistants, their performance is fraught with inconsistency and hidden biases. Relying on them without a structured, expert-led framework is not just risky; it can lead entire projects astray from the very beginning.

The Performance Paradox

On average, ChatGPT correctly identified only ~56% of crucial project stakeholders. This "half-right" performance means a significant portion of the strategic landscape is consistently missed.

Cognitive Blind Spots

The LLM showed a strong bias towards human-centric factors (teams, agencies) while consistently ignoring critical non-human elements like environmental conditions, regulations, and enabling systems, a fatal flaw for any real-world enterprise project.

The Unreliability Factor

The most critical finding: immense variability between identical queries. Asking the same question twice often yields dramatically different results, making single-shot prompting a high-stakes gamble for mission-critical tasks.

Key Findings Deconstructed for Enterprise Strategy

To understand the implications for your business, we need to go beyond the summary and analyze the data. The researchers tested ChatGPT-3.5 across 25 independent trials on the same complex problem: identifying stakeholders for a manned mission to Ceres. This is analogous to an enterprise planning a market entry into a new continent. Here's what they found, and what it means for you.

Finding 1: The High-Stakes "Roll of the Dice" - Performance Variability Visualized

The paper's most powerful illustration is the stark difference in outputs across multiple attempts. We have recreated their findings below. Each bar represents a single, independent attempt to solve the same problem. The colors show the quality of the AI's output. Notice how no two bars are the same. This visualizes the core challenge: relying on a single AI response is unreliable by design.

[Chart: 25 independent trials, one bar per trial, with each stakeholder classified as Correct, Over-Specified, Wrong, or Missed. Recreation of findings from Ofsa and Topcu (2025); total ground-truth stakeholders = 12.]

Enterprise Takeaway: A "one-and-done" approach to using generative AI for strategic tasks is a recipe for failure. Your enterprise needs a system that embraces this randomness. A custom solution from OwnYourAI.com can implement an "Ensemble AI" methodology, running multiple queries in parallel, cross-referencing the outputs, and flagging inconsistencies for expert human review. This turns variability from a liability into a strength by exploring a wider range of possibilities.

Finding 2: Decomposing the AI's Performance

The researchers quantified the performance across all 25 trials. The results reveal a consistent pattern of mediocrity and a tendency to generate noise alongside signal. A custom AI solution must be designed to filter this noise and amplify the signal.

Enterprise Takeaway: An off-the-shelf LLM gives you a raw, unfiltered stream of information that is, on average, nearly 50% incomplete or irrelevant. A bespoke solution integrates domain-specific knowledge and validation layers. It can be trained to recognize what constitutes an "over-specified" response in your industry context and automatically categorize outputs, saving your experts from wading through pages of low-value suggestions.
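As a simplified sketch of such a validation layer, the snippet below scores an LLM's stakeholder list against a hypothetical expert-curated register, using the paper's Correct / Over-Specified / Wrong / Missed categories. The register entries, the fuzzy-matching rule, and the similarity cutoff are illustrative assumptions.

```python
# Validation-layer sketch: compare a candidate stakeholder list against a
# curated, domain-specific register. Register contents and cutoff are placeholders.
from difflib import SequenceMatcher

GROUND_TRUTH = {
    "crew", "mission control", "launch provider", "regulatory agencies",
    "operational environment", "enabling systems",  # ...curated by your experts
}

def best_match(candidate: str, register: set[str]) -> tuple[str, float]:
    """Return the closest register entry and its similarity ratio."""
    scored = [(entry, SequenceMatcher(None, candidate, entry).ratio()) for entry in register]
    return max(scored, key=lambda pair: pair[1])

def categorize(llm_output: list[str], cutoff: float = 0.8) -> dict[str, list[str]]:
    buckets = {"correct": [], "over_specified_or_wrong": [], "missed": []}
    matched = set()
    for item in (s.strip().lower() for s in llm_output):
        entry, score = best_match(item, GROUND_TRUTH)
        if score >= cutoff:
            buckets["correct"].append(item)
            matched.add(entry)
        else:
            # Separating "over-specified" from "wrong" needs domain rules or a
            # human reviewer; in this sketch they share a bucket.
            buckets["over_specified_or_wrong"].append(item)
    buckets["missed"] = sorted(GROUND_TRUTH - matched)
    return buckets

print(categorize(["Crew", "Mission Control", "A specific thruster vendor"]))
```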

Finding 3: The Danger of Premature Solutions and Hidden Biases

The study found that LLMs have a tendency to be "solution-seeking." Instead of sticking to high-level problem formulation, they often suggest specific components, teams, or technologies ("over-specified" results). This prematurely narrows the solution space and stifles innovation. Furthermore, the complete omission of non-human stakeholders like "Operational Environment" reveals a deep-seated bias. An AI that can't account for market conditions or regulatory hurdles is unfit for enterprise strategy.

Enterprise Takeaway: Your AI tools must be constrained to the appropriate level of abstraction for each project phase. OwnYourAI.com builds custom solutions with "guardrails" that keep the AI focused. By using techniques like prompt engineering templates and structured output formats (e.g., forcing JSON outputs that separate stakeholder categories), we ensure the AI serves your strategic process, rather than derailing it.
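The hypothetical prompt template and structure check below illustrate what such guardrails can look like. The category names echo the non-human blind spots the paper identifies; the template wording and helper function are our assumptions, not the authors' method.

```python
# Guardrail sketch: force the model to separate human and non-human stakeholder
# categories, then reject replies that drift from the required structure.
import json

PROMPT_TEMPLATE = """You are assisting with problem formulation only.
Do NOT propose solutions, components, or technologies.

Mission: {mission}

Return a JSON object with exactly these keys, each a list of strings:
  "human_stakeholders", "organizations", "regulations_and_standards",
  "operational_environment", "enabling_systems"
"""

REQUIRED_KEYS = {
    "human_stakeholders", "organizations", "regulations_and_standards",
    "operational_environment", "enabling_systems",
}

def check_reply(raw_reply: str) -> dict:
    """Validate that the model's reply keeps the required category structure."""
    data = json.loads(raw_reply)
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"Model omitted required categories: {sorted(missing)}")
    empty = [k for k in REQUIRED_KEYS if not data[k]]
    if empty:
        # Empty non-human categories are exactly the blind spot to catch early.
        print(f"Warning: no entries for {empty}; route to expert review.")
    return data

prompt = PROMPT_TEMPLATE.format(mission="a manned mission to Ceres")
print(check_reply(
    '{"human_stakeholders": ["crew"], "organizations": ["space agency"], '
    '"regulations_and_standards": [], "operational_environment": ["deep space radiation"], '
    '"enabling_systems": ["ground segment"]}'
))
```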

From Research to Reality: A Strategic Roadmap for AI Integration

The insights from this paper are not a warning to abandon AI, but a call for a more mature, strategic approach to its implementation. Off-the-shelf tools provide a glimpse of the possible; custom solutions deliver reliable, measurable value. Here is how OwnYourAI.com helps enterprises bridge that gap.

Interactive ROI Calculator: Quantify the "Expert Augmentation" Value

The paper suggests LLMs can reduce expert workload. But by how much? Use our calculator to estimate the potential time and cost savings by implementing a custom "Ensemble AI" assistant for your project formulation tasks.
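The arithmetic behind the calculator is straightforward; the sketch below uses placeholder figures that you would replace with your own task volumes, rates, and solution cost.

```python
# ROI sketch: estimated savings from offloading the first pass of stakeholder
# identification to an ensemble-AI assistant. All defaults are placeholders.

def ensemble_ai_roi(
    formulation_tasks_per_year: int = 12,
    expert_hours_per_task: float = 40.0,
    expected_reduction: float = 0.30,   # share of expert hours the assistant absorbs
    expert_hourly_cost: float = 150.0,  # fully loaded cost
    annual_solution_cost: float = 60_000.0,
) -> dict[str, float]:
    hours_saved = formulation_tasks_per_year * expert_hours_per_task * expected_reduction
    gross_savings = hours_saved * expert_hourly_cost
    return {
        "hours_saved_per_year": hours_saved,
        "gross_savings": gross_savings,
        "net_savings": gross_savings - annual_solution_cost,
        "roi_multiple": gross_savings / annual_solution_cost,
    }

print(ensemble_ai_roi())
```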

The OwnYourAI.com Implementation Roadmap

Deploying AI for complex tasks requires a phased approach. Our methodology ensures that AI is integrated seamlessly into your existing workflows, augmenting your experts instead of replacing them, and delivering a clear return on investment at each stage.

Is Your Enterprise Ready for Strategic AI? A Quick Assessment

This research highlights that successful AI adoption depends on more than just technology. It requires the right processes and mindset. Take our quick quiz to see how your organization stacks up.

Turn AI's Variability into Your Strategic Advantage

The research by Ofsa and Topcu is a landmark paper for any enterprise serious about leveraging AI. It proves that while the potential is enormous, the path to reliable, strategic AI is not through generic, off-the-shelf tools. It's through custom-built, expert-guided systems that are designed to handle ambiguity, mitigate bias, and transform variability into a comprehensive exploration of possibilities.

Don't gamble with your next major initiative. Let's build an AI solution that provides the consistency, reliability, and depth your enterprise deserves.

Book a Meeting to Discuss Your Custom AI Solution
