Research Paper Analysis
Plan-Then-Execute: An Empirical Study of User Trust and Team Performance When Using LLM Agents As A Daily Assistant
By Gaole He, Gianluca Demartini, and Ujwal Gadiraju • Published: April 26–May 1, 2025
Since the explosion in popularity of ChatGPT, large language models (LLMs) have continued to impact our everyday lives. Equipped with external tools designed for specific purposes (e.g., flight booking or setting an alarm), LLM agents are increasingly capable of assisting humans in their daily work. Although LLM agents have shown a promising blueprint as daily assistants, there is limited understanding of how they can provide daily assistance based on planning and sequential decision-making capabilities. We draw inspiration from recent work highlighting the value of 'LLM-modulo' setups in conjunction with humans-in-the-loop for planning tasks. We conducted an empirical study (N = 248) of LLM agents as daily assistants in six commonly occurring tasks with different levels of associated risk (e.g., flight ticket booking and credit card payments). To ensure user agency and control over the LLM agent, we adopted LLM agents in a plan-then-execute manner, wherein the agents conducted step-wise planning and step-by-step execution in a simulation environment. We analyzed how user involvement at each stage affects trust and collaborative team performance. Our findings demonstrate that LLM agents can be a double-edged sword: (1) they can work well when a high-quality plan and the necessary user involvement in execution are available, and (2) users can easily misplace their trust in LLM agents whose plans merely seem plausible. We synthesized key insights for using LLM agents as daily assistants to calibrate user trust and achieve better overall task outcomes. Our work has important implications for the future design of daily assistants and human-AI collaboration with LLM agents.
Executive Impact at a Glance
Key performance indicators from the study highlight the nuanced effectiveness of LLM agents in human-AI collaboration.
Deep Analysis & Enterprise Applications
Select a topic to dive deeper, then explore the specific findings from the research, rebuilt as interactive, enterprise-focused modules.
Average Execution Accuracy with High-Quality Plans
When LLM agents operate with high-quality, pre-defined plans, their execution accuracy significantly improves, highlighting the importance of robust initial planning for successful task completion.
Enterprise Process Flow: Plan-Then-Execute LLM Agents
The study adopted a "plan-then-execute" workflow, wherein LLM agents first generate a plan, which users can then review and edit. Following this, the agents execute the plan step-by-step, with opportunities for user approval or modification at each action prediction.
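This two-stage workflow can be sketched as a loop with user checkpoints at both stages. The function names below (`generate_plan`, `review_plan`, `predict_action`, `approve_action`) are illustrative placeholders for the agent and user-interaction hooks, not APIs from the paper:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Step:
    """One step of the agent's step-wise plan."""
    description: str

def plan_then_execute(
    generate_plan: Callable[[str], List[Step]],   # agent drafts a plan
    review_plan: Callable[[List[Step]], List[Step]],  # user reviews/edits it
    predict_action: Callable[[Step], str],        # agent predicts an action per step
    approve_action: Callable[[str], str],         # user approves or substitutes an action
    task: str,
) -> List[str]:
    """Run a task with user involvement at both planning and execution."""
    # Stage 1: planning. The user can inspect and edit the plan before
    # any action is taken.
    plan = review_plan(generate_plan(task))
    # Stage 2: execution. Each predicted action passes through the user,
    # who may approve it as-is or replace it with a corrected action.
    results = []
    for step in plan:
        results.append(approve_action(predict_action(step)))
    return results
```

In practice the two user hooks are where trust calibration happens: rubber-stamping both stages reduces the workflow to full automation, while careful review at either checkpoint is what caught the execution errors described below.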
Core LLM Agent Workflow
Human Oversight: Benefits & Challenges
While human involvement offers critical error correction, it also introduces complexities that require careful management to avoid unintended trade-offs in performance and user experience.
| Stage | Benefits of User Involvement | Challenges of User Involvement |
|---|---|---|
| Planning Stage | Users can review and edit the generated plan, catching planning errors before any action is taken. | Plans that seem plausible are easy to accept at face value, inviting misplaced trust in flawed plans. |
| Execution Stage | Step-by-step approval lets users catch and correct critical execution errors (e.g., an incorrect itinerary selection). | Reviewing every predicted action adds cognitive load and effort, which can degrade the user experience. |
Case Study: Calibrating LLM Itinerary Selection (Task-6)
This task involved planning a multi-day trip to Japan with specific requirements (destinations, dates, and budget). The LLM agent generated a correct plan but then made a critical error during execution, selecting an itinerary that did not match the task description. Users who did not intervene in the execution stage failed to achieve correct task outcomes because of this error. Participants who carefully checked and corrected the agent's itinerary selection, however, significantly improved task outcome accuracy. This demonstrates the value of human oversight at critical execution steps, even when the initial plan is sound.
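The kind of check participants performed manually in Task-6 can also be expressed as a simple constraint validation. This is a hypothetical sketch (the field names and dict shapes are assumptions, not the study's data format) of verifying a selected itinerary against the task requirements:

```python
def itinerary_matches_task(itinerary: dict, task: dict) -> list[str]:
    """Return a list of mismatches between a selected itinerary and the task spec.

    An empty list means the itinerary satisfies the stated requirements
    (destinations, dates, and budget).
    """
    problems = []
    if set(itinerary["destinations"]) != set(task["destinations"]):
        problems.append("destinations differ from the task description")
    if itinerary["start"] != task["start"] or itinerary["end"] != task["end"]:
        problems.append("dates differ from the task description")
    if itinerary["price"] > task["budget"]:
        problems.append("price exceeds the budget")
    return problems
```

Surfacing such mismatches at the approval step would flag exactly the plausible-but-wrong selection that tripped up non-intervening users.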
Key Takeaway from Task-6
Even with a well-structured plan, LLM agents can introduce execution errors. Human intervention is crucial for validating critical steps and ensuring alignment with task objectives, especially in scenarios where LLM choices might seem plausible but are factually incorrect.
Impact: User involvement significantly improved task outcome accuracy in this scenario, highlighting the need for dynamic human oversight in LLM-assisted workflows.
Estimate Your Enterprise AI ROI
See how LLM agents can transform your operations. Adjust the parameters to calculate potential annual savings and reclaimed hours.
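A back-of-the-envelope version of such a calculator is shown below. The parameters and default values are illustrative assumptions, not figures from the study:

```python
def annual_roi(
    tasks_per_week: int,
    minutes_saved_per_task: float,
    hourly_cost: float,
    automation_rate: float,      # assumed fraction of tasks the agent handles end to end
    weeks_per_year: int = 48,    # assumed working weeks per year
) -> tuple[float, float]:
    """Return (hours reclaimed per year, dollar savings per year)."""
    hours = tasks_per_week * weeks_per_year * automation_rate * minutes_saved_per_task / 60
    return hours, hours * hourly_cost
```

For example, 100 tasks per week, six minutes saved per task, a $50 hourly cost, and a 50% automation rate yields 240 reclaimed hours and $12,000 in savings per year under these assumptions.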
Your AI Implementation Roadmap
Based on these insights, here’s a potential roadmap for integrating LLM agents into your enterprise, maximizing efficiency and minimizing risks.
Phase 1: Enhanced Trust Calibration Methods
Develop interventions to help users better calibrate their trust in LLM agents, especially when faced with convincingly wrong but plausible LLM outputs. This includes implementing robust performance feedback mechanisms and agreement-in-confidence heuristics to foster appropriate reliance.
Phase 2: Optimized User Involvement Workflows
Design flexible human-AI collaborative workflows that allow dynamic adjustment of user involvement based on task complexity and risk. The goal is to minimize cognitive load while maximizing effective human oversight, ensuring human intervention is provided precisely when and where it offers the most value.
Phase 3: Real-Time Plan & Execution Refinement
Implement systems where users can simultaneously fix planning and execution issues, allowing for interactive revision of plans and real-time adjustment of predicted actions. This provides immediate feedback on potential impacts, creating a more agile and responsive human-AI collaboration environment.
Ready to Transform Your Enterprise with AI?
Leverage the power of LLM agents effectively. Our experts are ready to help you navigate the complexities of AI integration, ensuring optimal performance and calibrated trust.