AI IN ROBOTICS
Architecting Large Action Models for Human-in-the-Loop Intelligent Robots
The realization of intelligent robots, operating autonomously and interacting with other intelligent agents, human or artificial, requires the integration of environment perception, reasoning, and action. Classic Artificial Intelligence techniques for this purpose, focused on symbolic approaches, long ago hit a scalability wall in compute and memory costs. Advances in Large Language Models over the past decade (neural approaches) have resulted in unprecedented displays of capability, at the cost of control, explainability, and interpretability. Large Action Models aim to extend Large Language Models to encompass the full perception, reasoning, and action cycle; however, they typically require substantially more comprehensive training and suffer from the same deficiencies in reliability. Here, we show that it is possible to build competent Large Action Models by composing off-the-shelf foundation models, and that their control, interpretability, and explainability can be effected by incorporating symbolic wrappers and associated verification of their outputs, achieving verifiable neuro-symbolic solutions for intelligent robots. Our experiments on a multi-modal robot demonstrate that Large Action Model intelligence does not require massive end-to-end training, but can be achieved by integrating efficient perception models with a logic-driven core. We find that driving action execution through the generation of Planning Domain Definition Language (PDDL) code enables a human-in-the-loop verification stage that effectively mitigates action hallucinations. These results can support practitioners in the design and development of robotic Large Action Models across novel industries, and shed light on the ongoing challenges that must be addressed to ensure safety in the field.
Executive Impact Summary
This research introduces a modular, neuro-symbolic architecture for Large Action Models (LAMs) in robotics, addressing the limitations of purely neural approaches in terms of control, explainability, and safety. By integrating off-the-shelf perception models with a logic-driven core and symbolic wrappers, the system achieves verifiable neuro-symbolic solutions. Key findings include successful grounding of natural language commands into safe physical actions, efficient perception, and robust planning through a human-in-the-loop verification stage. The work demonstrates that competent LAMs can be built without extensive end-to-end training, offering a pathway to safer and more interpretable intelligent robots for various industries.
Deep Analysis & Enterprise Applications
The paper proposes a modular, neuro-symbolic architecture for LAMs, composed of specialized functional modules for perception, reasoning, and action. This hierarchical planning pipeline is driven by multi-modal inputs, ensuring high-level reasoning is grounded in valid physical capabilities. This contrasts with monolithic neural networks by allowing greater control and interpretability, crucial for safety-critical robotic systems. The architecture leverages existing foundation models, reducing the need for massive end-to-end training.
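As a concrete illustration of this modular decomposition, the sketch below (Python, with hypothetical class and method names not taken from the paper) separates perception, reasoning, and action behind narrow interfaces so that each module can be swapped or upgraded independently.

```python
# Minimal sketch of the perception -> reasoning -> action cycle. All names
# here are illustrative assumptions, not the paper's actual API; the point is
# the separation of concerns behind narrow, swappable interfaces.
from dataclasses import dataclass
from typing import Protocol


@dataclass
class SceneDescription:
    objects: list[str]                      # semantic labels from open-vocabulary perception
    grasp_poses: dict[str, list[float]]     # per-object grasp candidates (geometric info)


class PerceptionModule(Protocol):
    def observe(self) -> SceneDescription: ...


class ReasoningModule(Protocol):
    def plan(self, instruction: str, scene: SceneDescription) -> list[str]: ...


class ActionModule(Protocol):
    def execute(self, plan: list[str]) -> None: ...


def run_cycle(instruction: str,
              perception: PerceptionModule,
              reasoner: ReasoningModule,
              executor: ActionModule) -> None:
    """One perception-reasoning-action cycle; each stage is an independent module."""
    scene = perception.observe()              # grounded, up-to-date world state
    plan = reasoner.plan(instruction, scene)  # high-level plan grounded in perceived objects
    executor.execute(plan)                    # low-level motion, e.g. via MoveIt2
```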
A key contribution is the neuro-symbolic approach, where the LLM translates natural language requests into formal PDDL problem definitions. These are then solved by a deterministic symbolic planner, ensuring mathematical verifiability and logical soundness. This 'symbolic wrapping' prevents the LLM from directly generating executable robot code, mitigating action hallucinations and enhancing safety through a human-in-the-loop verification stage. This hybrid approach aims to combine the flexibility of LLMs with the reliability of symbolic AI.
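A minimal sketch of this stage, under stated assumptions, is shown below: the LLM is assumed to emit a PDDL problem for a fixed, hand-authored domain, and a classical planner is assumed to be available as an external executable. The example problem string, the "tabletop" domain and its predicates, and the planner command line are all illustrative, not the paper's actual artifacts.

```python
# Sketch of the neuro-symbolic planning step. The PDDL problem below stands in
# for what the LLM would generate; the domain, predicates, and planner
# invocation are illustrative assumptions.
import subprocess
import tempfile
from pathlib import Path

DOMAIN_FILE = Path("tabletop_domain.pddl")  # hand-authored domain (assumed filename)

# Example of the kind of problem definition the LLM is expected to produce
# from an instruction like "put both cubes in the tray" plus perceived scene facts.
EXAMPLE_PROBLEM = """
(define (problem sort-cubes)
  (:domain tabletop)
  (:objects red_cube blue_cube - block tray - location)
  (:init (on-table red_cube) (on-table blue_cube) (hand-empty))
  (:goal (and (in red_cube tray) (in blue_cube tray))))
"""


def solve(problem_pddl: str) -> list[str]:
    """Run a deterministic symbolic planner on the LLM-generated problem."""
    with tempfile.NamedTemporaryFile("w", suffix=".pddl", delete=False) as f:
        f.write(problem_pddl)
        problem_path = f.name
    # Placeholder command line; substitute your planner's real invocation.
    result = subprocess.run(
        ["planner", str(DOMAIN_FILE), problem_path],
        capture_output=True, text=True, check=True,
    )
    # Treat each non-empty output line as one plan step, e.g. "(pick red_cube)".
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]
```

Because the plan comes from a deterministic solver rather than directly from the LLM, every step can be checked against the domain's preconditions before any robot motion is commanded.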
The system includes a robust Perception Module utilizing open-vocabulary foundation models (SAM, GraspNet, CLIP) for object segmentation, classification, and grasp synthesis, converting raw pixel data into useful semantic and geometric information. A Speech Module employs a neural speech-to-text engine (AssemblyAI) for real-time user intent capture, featuring an emergency stop mechanism that bypasses reasoning layers for immediate hardware halts. This multi-modal input system ensures responsive and context-aware interaction.
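The emergency-stop bypass can be illustrated with the short sketch below; the stop keywords, the transcript callback, and halt_hardware() are hypothetical stand-ins for the real speech-to-text stream and low-level controller interface.

```python
# Illustrative sketch of the emergency-stop bypass: safety utterances skip the
# LLM and planner entirely and go straight to the hardware halt.
STOP_KEYWORDS = {"stop", "halt", "emergency"}


def halt_hardware() -> None:
    """Stand-in for the call that immediately stops robot motion."""
    print("!! emergency stop issued to low-level controller")


def on_transcript(text: str, forward_to_reasoner) -> None:
    """Route each utterance: safety words bypass the reasoning layers entirely."""
    if any(word in text.lower().split() for word in STOP_KEYWORDS):
        halt_hardware()            # no LLM call, no planning latency
        return
    forward_to_reasoner(text)      # normal path: intent goes to the LLM agent
```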
The architecture was validated with a UR5 robotic arm in Gazebo simulation and then transferred to a physical UR3e robot. Experiments demonstrated the system's ability to interpret natural language, revise plans dynamically, and execute safety protocols. A comparative analysis of LLM-Direct (Tool-Use) and Neuro-Symbolic (PDDL) planning showed that LLM-Direct achieved 100% success on abstract instructions, while the Neuro-Symbolic pipeline achieved 91% success with mathematical guarantees of plan validity, though it was more brittle to PDDL generation errors. Crucially, the safety-override latency was under 1.5 seconds.
Planning Approach Comparison: LLM-Direct vs. Neuro-Symbolic
| Metric | LLM-Direct (Tool-Use) | Neuro-Symbolic (PDDL) |
|---|---|---|
| Avg. Execution Time per Step (s) | 7.20 ± 0.25 | 6.83 ± 0.27 |
| Success Rate (%) | 100.0 | 91.0 |
| LLM Requests per Step | 2.0 | 2.0 |
| Computational Cost (Tokens) | ≈ 3,000 | ≈ 3,000 |
| Guarantees | Tool-based safety | Mathematical validity |
Human-in-the-Loop Robot Operation
The research emphasizes the importance of human-in-the-loop verification. For both neural-direct and neuro-symbolic pipelines, an intermediate, human-readable plan is generated, allowing operators to review and modify it before physical execution. This capability, demonstrated by dynamically revising a plan ('Swap the action order'), is crucial for preventing 'action hallucinations' and ensuring safe, interpretable robotic behavior. The low emergency stop latency (1.41s) further highlights the system's responsiveness to human intervention.
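The sketch below illustrates one way such a verification gate could look, assuming a simple console interface; the command vocabulary (approve / swap / abort) is an illustrative assumption. The operator can release the plan, swap two steps (as in the 'Swap the action order' example), or abort before anything reaches the robot.

```python
# Sketch of a human-in-the-loop verification gate over a human-readable plan.
def review_plan(plan: list[str]) -> list[str] | None:
    """Let an operator inspect and edit the plan before it reaches the robot."""
    while True:
        for i, step in enumerate(plan):
            print(f"  {i}: {step}")
        cmd = input("approve | swap <i> <j> | abort > ").split()
        if not cmd:
            continue
        if cmd[0] == "approve":
            return plan                        # released for execution
        if cmd[0] == "abort":
            return None                        # nothing is sent to the hardware
        if cmd[0] == "swap" and len(cmd) == 3:
            i, j = int(cmd[1]), int(cmd[2])    # e.g. "swap 0 1" reorders two actions
            plan[i], plan[j] = plan[j], plan[i]
```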
Your Journey to Intelligent Automation
Our structured roadmap ensures a smooth, secure, and successful integration of Large Action Models into your operations.
Phase 1: Foundation Setup
Integrate the core ROS2 framework, deploy the perception (SAM, GraspNet, CLIP) and speech modules, and establish basic low-level motion control with MoveIt2; a minimal ROS2 node sketch follows this roadmap.
Phase 2: High-Level Planning Implementation
Develop and test both LLM-Direct (Tool-Use) and Neuro-Symbolic (PDDL) LangChain agents, ensuring seamless integration with perception data and low-level execution.
Phase 3: Human-in-the-Loop & Safety Integration
Implement the human-in-the-loop plan verification interface and robust emergency stop mechanisms, conducting rigorous safety testing in both simulation and physical environments.
Phase 4: Advanced Scenario & Scaling
Expand task complexity, integrate dynamic domain generation capabilities, and explore self-correction mechanisms for neuro-symbolic translation errors to enhance robustness and scalability.
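As a starting point for Phase 1, the sketch below shows a minimal ROS2 (rclpy) node that bridges a speech-transcript topic to a planning topic. The topic names, message type, and node name are assumptions chosen for illustration rather than the paper's actual interfaces, and the LLM/PDDL pipeline is stubbed out.

```python
# Minimal rclpy node skeleton bridging the speech module to the planning stack.
# Topic names ("speech/transcript", "plans") and the node name are assumptions.
import rclpy
from rclpy.node import Node
from std_msgs.msg import String


class LamBridge(Node):
    """Receives transcripts and publishes high-level plans over ROS2 topics."""

    def __init__(self) -> None:
        super().__init__("lam_bridge")
        self.plan_pub = self.create_publisher(String, "plans", 10)
        self.create_subscription(String, "speech/transcript", self.on_transcript, 10)

    def on_transcript(self, msg: String) -> None:
        # In the full system this would call the LLM agent / PDDL pipeline;
        # here we simply echo the instruction as a single-step placeholder plan.
        out = String()
        out.data = f"(echo {msg.data})"
        self.plan_pub.publish(out)


def main() -> None:
    rclpy.init()
    rclpy.spin(LamBridge())
    rclpy.shutdown()


if __name__ == "__main__":
    main()
```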
Ready to Transform Your Operations?
Schedule a personalized consultation with our AI specialists to explore how Large Action Models can revolutionize your enterprise.