Research Paper Analysis
Plan-Then-Execute: An Empirical Study of User Trust and Team Performance When Using LLM Agents As A Daily Assistant
By Gaole He, Gianluca Demartini, and Ujwal Gadiraju • Published: April 26–May 1, 2025
Since the explosion in popularity of ChatGPT, large language models (LLMs) have continued to impact our everyday lives. Equipped with external tools designed for specific purposes (e.g., flight booking or setting an alarm), LLM agents are increasingly capable of assisting humans in their daily work. Although LLM agents have shown a promising blueprint as daily assistants, there is limited understanding of how they can provide daily assistance based on planning and sequential decision-making capabilities. We draw inspiration from recent work highlighting the value of 'LLM-modulo' setups in conjunction with humans-in-the-loop for planning tasks. We conducted an empirical study (N = 248) of LLM agents as daily assistants in six commonly occurring tasks with different levels of associated risk (e.g., flight ticket booking and credit card payments). To ensure user agency and control over the LLM agent, we adopted LLM agents in a plan-then-execute manner, wherein the agents conducted step-wise planning and step-by-step execution in a simulation environment. We analyzed how user involvement at each stage affects trust and collaborative team performance. Our findings demonstrate that LLM agents can be a double-edged sword: (1) they can work well when a high-quality plan and the necessary user involvement in execution are available, and (2) users can easily misplace their trust in LLM agents whose plans merely seem plausible. We synthesized key insights for using LLM agents as daily assistants to calibrate user trust and achieve better overall task outcomes. Our work has important implications for the future design of daily assistants and human-AI collaboration with LLM agents.
Executive Impact at a Glance
Key performance indicators from the study highlight the nuanced effectiveness of LLM agents in human-AI collaboration.
Deep Analysis & Enterprise Applications
Select a topic to dive deeper, then explore the specific findings from the research, rebuilt as interactive, enterprise-focused modules.
Average Execution Accuracy with High-Quality Plans
When LLM agents operate with high-quality, pre-defined plans, their execution accuracy significantly improves, highlighting the importance of robust initial planning for successful task completion.
Enterprise Process Flow: Plan-Then-Execute LLM Agents
The study adopted a "plan-then-execute" workflow, wherein LLM agents first generate a plan, which users can then review and edit. Following this, the agents execute the plan step-by-step, with opportunities for user approval or modification at each action prediction.
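This two-stage workflow can be sketched as a loop with user checkpoints at both stages. The function names below (`generate_plan`, `review_plan`, `predict_action`, `approve_action`) are illustrative placeholders for the agent and user-interaction hooks, not APIs from the paper:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Step:
    """One step of the agent's step-wise plan."""
    description: str

def plan_then_execute(
    generate_plan: Callable[[str], List[Step]],   # agent drafts a plan
    review_plan: Callable[[List[Step]], List[Step]],  # user reviews/edits it
    predict_action: Callable[[Step], str],        # agent predicts an action per step
    approve_action: Callable[[str], str],         # user approves or substitutes an action
    task: str,
) -> List[str]:
    """Run a task with user involvement at both planning and execution."""
    # Stage 1: planning. The user can inspect and edit the plan before
    # any action is taken.
    plan = review_plan(generate_plan(task))
    # Stage 2: execution. Each predicted action passes through the user,
    # who may approve it as-is or replace it with a corrected action.
    results = []
    for step in plan:
        results.append(approve_action(predict_action(step)))
    return results
```

In practice the two user hooks are where trust calibration happens: rubber-stamping both stages reduces the workflow to full automation, while careful review at either checkpoint is what caught the execution errors described below.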
Core LLM Agent Workflow
Human Oversight: Benefits & Challenges
While human involvement offers critical error correction, it also introduces complexities that require careful management to avoid unintended trade-offs in performance and user experience.
| Stage | Benefits of User Involvement | Challenges of User Involvement |
|---|---|---|
| Planning Stage | Users can review and edit the generated plan, catching planning errors before any action is taken. | Plans that seem plausible are easy to accept at face value, inviting misplaced trust in flawed plans. |
| Execution Stage | Step-by-step approval lets users catch and correct critical execution errors (e.g., an incorrect itinerary selection). | Reviewing every predicted action adds cognitive load and effort, which can degrade the user experience. |
Case Study: Calibrating LLM Itinerary Selection (Task-6)
This task involved planning a multi-day trip to Japan with specific requirements (destinations, dates, and budget). The LLM agent generated a correct plan but then made a critical error during execution, selecting an itinerary that did not match the task description. Users who did not intervene in the execution stage failed to achieve correct task outcomes because of this error. Participants who carefully checked and corrected the agent's itinerary selection, however, significantly improved task outcome accuracy. This demonstrates the value of human oversight at critical execution steps, even when the initial plan is sound.
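The kind of check participants performed manually in Task-6 can also be expressed as a simple constraint validation. This is a hypothetical sketch (the field names and dict shapes are assumptions, not the study's data format) of verifying a selected itinerary against the task requirements:

```python
def itinerary_matches_task(itinerary: dict, task: dict) -> list[str]:
    """Return a list of mismatches between a selected itinerary and the task spec.

    An empty list means the itinerary satisfies the stated requirements
    (destinations, dates, and budget).
    """
    problems = []
    if set(itinerary["destinations"]) != set(task["destinations"]):
        problems.append("destinations differ from the task description")
    if itinerary["start"] != task["start"] or itinerary["end"] != task["end"]:
        problems.append("dates differ from the task description")
    if itinerary["price"] > task["budget"]:
        problems.append("price exceeds the budget")
    return problems
```

Surfacing such mismatches at the approval step would flag exactly the plausible-but-wrong selection that tripped up non-intervening users.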
Key Takeaway from Task-6
Even with a well-structured plan, LLM agents can introduce execution errors. Human intervention is crucial for validating critical steps and ensuring alignment with task objectives, especially in scenarios where LLM choices might seem plausible but are factually incorrect.
Impact: User involvement significantly improved task outcome accuracy in this scenario, highlighting the need for dynamic human oversight in LLM-assisted workflows.
Estimate Your Enterprise AI ROI
See how LLM agents can transform your operations. Adjust the parameters to calculate potential annual savings and reclaimed hours.
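A back-of-the-envelope version of such a calculator is shown below. The parameters and default values are illustrative assumptions, not figures from the study:

```python
def annual_roi(
    tasks_per_week: int,
    minutes_saved_per_task: float,
    hourly_cost: float,
    automation_rate: float,      # assumed fraction of tasks the agent handles end to end
    weeks_per_year: int = 48,    # assumed working weeks per year
) -> tuple[float, float]:
    """Return (hours reclaimed per year, dollar savings per year)."""
    hours = tasks_per_week * weeks_per_year * automation_rate * minutes_saved_per_task / 60
    return hours, hours * hourly_cost
```

For example, 100 tasks per week, six minutes saved per task, a $50 hourly cost, and a 50% automation rate yields 240 reclaimed hours and $12,000 in savings per year under these assumptions.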
Your AI Implementation Roadmap
Based on these insights, here’s a potential roadmap for integrating LLM agents into your enterprise, maximizing efficiency and minimizing risks.
Phase 1: Enhanced Trust Calibration Methods
Develop interventions to help users better calibrate their trust in LLM agents, especially when faced with convincingly wrong but plausible LLM outputs. This includes implementing robust performance feedback mechanisms and agreement-in-confidence heuristics to foster appropriate reliance.
Phase 2: Optimized User Involvement Workflows
Design flexible human-AI collaborative workflows that allow dynamic adjustment of user involvement based on task complexity and risk. The goal is to minimize cognitive load while maximizing effective human oversight, ensuring human intervention is provided precisely when and where it offers the most value.
Phase 3: Real-Time Plan & Execution Refinement
Implement systems where users can simultaneously fix planning and execution issues, allowing for interactive revision of plans and real-time adjustment of predicted actions. This provides immediate feedback on potential impacts, creating a more agile and responsive human-AI collaboration environment.
Ready to Transform Your Enterprise with AI?
Leverage the power of LLM agents effectively. Our experts are ready to help you navigate the complexities of AI integration, ensuring optimal performance and calibrated trust.