
Enterprise AI Research Analysis

ADAPTIVE COLLABORATION WITH HUMANS: METACOGNITIVE POLICY OPTIMIZATION FOR MULTI-AGENT LLMS WITH CONTINUAL LEARNING

Authors: Wei Yang, Defu Cao, Jiacheng Pang, Muyan Weng, Yan Liu

Affiliation: University of Southern California

Published: 9 Mar 2026

Abstract: While scaling individual Large Language Models (LLMs) has delivered remarkable progress, the next frontier lies in scaling collaboration through multi-agent systems (MAS). However, purely autonomous MAS remain "closed-world" systems, constrained by the static knowledge horizon of pre-trained models. This limitation makes them brittle on tasks requiring knowledge beyond training data, often leading to collective failure under novel challenges. To address this, we propose the Human-In-the-Loop Multi-Agent Collaboration (HILA) framework, a principled paradigm for human-agent collaboration. HILA trains agents to learn a metacognitive policy that governs when to solve problems autonomously and when to defer to a human expert. To operationalize this policy, we introduce Dual-Loop Policy Optimization, which disentangles immediate decision-making from long-term capability growth. The inner loop applies Group Relative Policy Optimization (GRPO) with a cost-aware reward to optimize deferral decisions, while the outer loop implements continual learning, transforming expert feedback into high-quality supervised signals that strengthen the agent's reasoning ability. Experiments on challenging mathematical and problem-solving benchmarks show that HILA, equipped with Dual-Loop Policy Optimization, consistently outperforms advanced MAS, establishing a principled foundation for collaborative and continually improving agentic systems. The code is available at https://github.com/USC-Melady/HILA.git.

Executive Impact for Your Enterprise

This research introduces Human-In-the-Loop Multi-Agent Collaboration (HILA), a novel framework designed to overcome the limitations of purely autonomous multi-agent systems (MAS). HILA integrates human expertise to enable continuous learning and adaptive problem-solving.

A core component of HILA is its metacognitive policy, which allows agents to strategically decide when to solve problems autonomously and when to defer to human experts. This policy balances the benefits of collective intelligence with the need for external, high-quality guidance.

The framework utilizes Dual-Loop Policy Optimization (DLPO), a training methodology that combines reinforcement learning (for immediate deferral decisions) and continual learning (for long-term capability growth from expert feedback). This ensures agents not only make better decisions but also continually improve their underlying reasoning abilities.

Experimental results on mathematical and problem-solving benchmarks demonstrate that HILA with DLPO significantly outperforms advanced autonomous MAS, confirming its potential for building robust and continually evolving agentic systems.


Deep Analysis & Enterprise Applications


Framework Overview
Performance Gains
Strategic Intervention

The HILA Framework

Human-In-the-Loop Multi-Agent Collaboration (HILA) is introduced as a principled paradigm for adaptive human-agent collaboration. It equips agents with a metacognitive policy to decide when to strategically defer to human expertise. This allows MAS to move beyond 'closed-world' limitations to an 'open-world' dynamic capable of continuous learning and growth.

Enterprise Process Flow

Autonomous Operation (Initial Solve)
Metacognitive Assessment (Confidence Check)
Strategic Deferral (Human Expert)
Expert Feedback (Supervision)
Continual Learning (Capability Growth)
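The five stages above can be sketched as a single episode. This is an illustrative outline only: `agent`, `expert`, `solve`, `advise`, and `replay_buffer` are hypothetical stand-ins, not names from the paper's codebase.

```python
def hila_episode(task, agent, expert, confidence_threshold=0.7):
    """Illustrative walk through the five-stage HILA flow.

    `agent.solve` is assumed to return (answer, confidence);
    `expert.advise` is assumed to return corrective guidance.
    """
    # 1. Autonomous operation: the multi-agent system attempts the task.
    answer, confidence = agent.solve(task)
    # 2. Metacognitive assessment: is the system confident enough?
    if confidence >= confidence_threshold:
        return answer
    # 3. Strategic deferral: escalate to the human expert.
    guidance = expert.advise(task, answer)
    # 4. Expert feedback is stored as a supervised training signal.
    agent.replay_buffer.append((task, guidance))
    # 5. Continual learning: the outer loop later fine-tunes on this buffer;
    #    here the agent simply retries with the expert's guidance.
    return agent.solve(task, hint=guidance)[0]
```

In the paper, the confidence check is not a fixed threshold but a learned metacognitive policy; the threshold here only makes the control flow concrete.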

Dual-Loop Policy Optimization (DLPO)

DLPO is a novel training framework that separates short-term intervention decisions from long-term capability growth. The inner loop (Group Relative Policy Optimization - GRPO) optimizes deferral decisions with cost-aware rewards, while the outer loop implements continual learning by transforming expert feedback into high-quality supervised signals for reasoning ability.
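The inner loop's group-relative, cost-aware shaping can be sketched as follows. This is a minimal illustration of the general GRPO idea, not the paper's implementation; the `defer_cost` penalty and function name are assumptions.

```python
import statistics

def grpo_advantages(rewards, deferred, defer_cost=0.2):
    """Group-relative advantages with a cost-aware reward (illustrative).

    rewards:  task rewards for a group of rollouts on the same problem
    deferred: whether each rollout invoked the human expert (DEFER)
    defer_cost: hypothetical penalty per expert intervention
    """
    # Cost-aware reward: task reward minus a penalty for deferring.
    shaped = [r - defer_cost * d for r, d in zip(rewards, deferred)]
    mu = statistics.mean(shaped)
    sigma = statistics.pstdev(shaped) or 1.0  # guard against zero spread
    # GRPO normalizes each rollout against its own group's statistics,
    # so solving autonomously earns more credit than solving via deferral.
    return [(s - mu) / sigma for s in shaped]
```

Because advantages are computed within the group, a correct answer obtained by deferring still scores below an equally correct autonomous answer, which is what pushes the policy toward selective intervention.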

Consistent Outperformance

Experiments on challenging mathematical (GSM8K, AIME, AMC) and general problem-solving (HumanEval, MMLU) benchmarks show that HILA, equipped with DLPO, consistently outperforms advanced autonomous multi-agent systems. Absolute improvements range from 3.7 to 15.4 points over the strongest baselines.

HILA achieves 35.83% accuracy on AMC.
Feature comparison: Autonomous MAS vs. HILA Framework

Knowledge Source
  • Autonomous MAS: pre-trained corpora (closed-world)
  • HILA: pre-trained corpora plus external human expertise (open-world)
Adaptability
  • Autonomous MAS: limited to recombining existing information
  • HILA: acquires new knowledge and adapts to unseen contexts via continual learning
Failure Mode
  • Autonomous MAS: collective failure under novel challenges
  • HILA: strategic deferral to a human expert, leading to learning
Policy Optimization
  • Autonomous MAS: optimizes internal collaboration protocols
  • HILA: optimizes metacognitive deferral and capability growth
Learning Mechanism
  • Autonomous MAS: SFT/RL for internal coordination
  • HILA: dual-loop RL for deferral plus continual learning for knowledge acquisition

Metacognitive Policy in Action

The metacognitive policy enables agents to reason about their self-competence and peer competence. This guides collaboration by determining when to act autonomously (EVAL, CREATE) and when to invoke external expertise (DEFER), balancing risk of failure against intervention costs.
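The risk-versus-cost trade-off can be made concrete with a hand-coded decision rule. This is a hedged sketch: the paper learns this policy via reinforcement learning rather than hard-coding it, and the cost values, threshold, and EVAL/CREATE mapping below are assumptions for illustration.

```python
def choose_action(p_success, defer_cost=0.6, failure_cost=1.0,
                  create_threshold=0.5):
    """Toy metacognitive deferral rule over {EVAL, CREATE, DEFER}.

    p_success: the agent's estimated probability of solving the task itself.
    All costs and thresholds are hypothetical.
    """
    # Expected cost of acting autonomously vs. the fixed cost of expert help.
    autonomous_cost = (1.0 - p_success) * failure_cost
    if autonomous_cost > defer_cost:
        return "DEFER"  # failure risk outweighs the intervention cost
    # Otherwise act autonomously: at high confidence, evaluate existing
    # candidate solutions (EVAL); at moderate confidence, generate a new
    # attempt (CREATE). This mapping is an illustrative assumption.
    return "EVAL" if p_success >= create_threshold else "CREATE"
```

The learned policy plays the same role as this rule but conditions on the full task context and peer outputs rather than a single scalar confidence.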

Impact of Human Proxy Capability

The strength of the external expert significantly impacts HILA's effectiveness. Stronger language models used as proxies consistently lead to better performance, highlighting that strategic intervention is most valuable when the guidance received is of high quality.

Case Study: Reducing Costly Deferrals

Problem: Initially, the unoptimized policy assigned a non-trivial fraction of decisions to DEFER, indicating substantial reliance on external intervention.

Solution: After applying GRPO, the share of DEFER decreases consistently across datasets, with EVAL and CREATE becoming more frequent. This shows agents learn a cost-aware intervention strategy, becoming more selective about invoking external expertise.

Outcome: With full DLPO training, DEFER rates drop substantially further, accompanied by a marked increase in EVAL. This indicates agents become more capable of resolving tasks internally, suggesting DLPO improves underlying reasoning ability and not just deferral decisions.

Your Adaptive AI Implementation Roadmap

Our phased approach ensures a smooth, effective, and continually optimizing integration of HILA into your enterprise workflows.

Phase 1: Discovery & Strategy Alignment

Understand your unique challenges, identify key use cases, and define clear objectives for HILA implementation. This includes data assessment and initial policy design.

Phase 2: Pilot Deployment & Metacognitive Training

Deploy HILA in a controlled environment, train agents using DLPO with proxy experts, and fine-tune metacognitive policies for optimal deferral and autonomous action.

Phase 3: Human Integration & Continual Learning

Integrate real human experts for targeted interventions, establish feedback loops for data collection, and activate the outer-loop continual learning for sustained capability growth.

Phase 4: Scaling & Advanced Optimization

Expand HILA to broader enterprise functions, implement dynamic collaboration mechanisms, and continuously monitor performance for further optimization and evolution.

Ready to Empower Your Enterprise with Adaptive AI?

Connect with our experts to explore how HILA can transform your multi-agent systems, drive continuous improvement, and unlock new levels of intelligence for your business.
