Enterprise AI Analysis: Global Prior Meets Local Consistency: Dual-Memory Augmented Vision-Language-Action Model for Efficient Robotic Manipulation

Hierarchical Vision-Language-Action (VLA) models have rapidly become a dominant paradigm for robotic manipulation. They typically comprise a Vision-Language backbone for perception and understanding, together with a generative policy for action generation. However, their performance is increasingly bottlenecked by the action generation process. (i) Low inference efficiency: a pronounced distributional gap between isotropic noise priors and target action distributions increases the number of denoising steps and the incidence of infeasible samples. (ii) Poor robustness: existing policies condition solely on the current observation, neglecting the constraint of the history sequence and thus lacking awareness of task progress and temporal consistency. To address these issues, we introduce OptimusVLA, a dual-memory VLA framework with Global Prior Memory (GPM) and Local Consistency Memory (LCM). GPM replaces Gaussian noise with task-level priors retrieved from semantically similar trajectories, thereby shortening the generative path and reducing the number of function evaluations (NFE). LCM dynamically models the executed action sequence to infer task progress and injects a learned consistency constraint that enforces temporal coherence and smoothness of the trajectory. Across three simulation benchmarks, OptimusVLA consistently outperforms strong baselines: it achieves 98.6% average success rate on LIBERO, improves over π0.5 by 13.5% on CALVIN, and attains 38% average success rate on RoboTwin 2.0 Hard. In real-world evaluation, OptimusVLA ranks best on the Generalization and Long-horizon suites, surpassing π0.5 by 42.9% and 52.4%, respectively, while delivering a 2.9× inference speedup.

Authors: Zaijing Li, Bing Hu, Rui Shao, Gongwei Chen, Dongmei Jiang, Pengwei Xie, Jianye HAO, Liqiang Nie

Affiliations: Harbin Institute of Technology, Shenzhen; PengCheng Laboratory; Shenzhen Loop Area Institute; Huawei Noah's Ark Lab

Publication: arXiv:2602.20200v1 [cs.RO] 22 Feb 2026

Executive Impact Summary

OptimusVLA improves both the efficiency and the robustness of robotic manipulation across three simulation benchmarks (LIBERO, CALVIN, RoboTwin 2.0 Hard) and real-world evaluation.

98.6% Average Success Rate (LIBERO)
2.9× Inference Speedup (Real-World)
NFE Reduction (LIBERO)

Deep Analysis & Enterprise Applications

Dual-Memory Architecture

OptimusVLA introduces a novel dual-memory framework consisting of Global Prior Memory (GPM) and Local Consistency Memory (LCM). This architecture addresses key limitations of existing VLA models, enhancing both efficiency and robustness.

Global Prior Memory (GPM)

GPM replaces isotropic noise with task-level priors retrieved from semantically similar trajectories. This narrows the prior-target distributional gap, reducing the Number of Function Evaluations (NFE) and the incidence of infeasible samples. Because sampling starts closer to the target action distribution, generation is less likely to begin in an infeasible region.
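The retrieval idea can be illustrated with a minimal sketch. It assumes the memory is a plain list of (embedding, action chunk) pairs and that similarity is measured by cosine similarity. The names `retrieve_prior` and `prior_aware_init`, the top-k averaging, and the fixed `noise_scale` are hypothetical choices for illustration and are not taken from the paper.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_prior(query, memory, k=2):
    """Average the action chunks of the k most similar stored trajectories.

    memory: list of (embedding, action_chunk) pairs, where action_chunk is a
    list of per-step action vectors. This layout is an assumption.
    """
    ranked = sorted(memory, key=lambda item: cosine(query, item[0]), reverse=True)[:k]
    chunks = [chunk for _, chunk in ranked]
    horizon, dim = len(chunks[0]), len(chunks[0][0])
    return [
        [sum(c[t][d] for c in chunks) / len(chunks) for d in range(dim)]
        for t in range(horizon)
    ]


def prior_aware_init(prior, noise_scale=0.1, rng=None):
    """Start sampling from the retrieved prior plus small Gaussian noise,
    instead of from pure isotropic noise."""
    rng = rng or random.Random(0)
    return [[a + noise_scale * rng.gauss(0.0, 1.0) for a in step] for step in prior]


# Hypothetical two-entry memory: 2-D embeddings, 2-step chunks of 2-D actions.
memory = [
    ([1.0, 0.0], [[0.1, 0.2], [0.3, 0.4]]),
    ([0.0, 1.0], [[0.9, 0.9], [0.9, 0.9]]),
]
prior = retrieve_prior([0.9, 0.1], memory, k=1)
x_init = prior_aware_init(prior, noise_scale=0.05)
```

With `k=1` the prior is simply the chunk of the nearest stored trajectory; a larger `k` trades specificity for smoothing across similar demonstrations.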

Local Consistency Memory (LCM)

LCM dynamically models the executed action sequence to infer task progress and injects a learned consistency constraint. This enforces temporal coherence and smoothness of trajectories, giving the policy the awareness of task progress that observation-only conditioning lacks, without significant computational overhead.
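As a rough illustration of the idea, the sketch below keeps a bounded buffer of executed actions and computes a bias that pulls a candidate action toward the most recent one. This is a hand-coded stand-in, not the learned constraint described above: the class `LocalHistory`, the method `consistency_bias`, and the fixed `weight` are all hypothetical.

```python
from collections import deque


class LocalHistory:
    """Bounded memory of recently executed actions (hypothetical stand-in)."""

    def __init__(self, maxlen=8):
        # deque with maxlen silently drops the oldest entry when full.
        self.buffer = deque(maxlen=maxlen)

    def push(self, action):
        self.buffer.append(list(action))

    def consistency_bias(self, candidate, weight=0.5):
        """Bias that pulls the candidate toward the last executed action.

        A learned module would infer this from the whole history; here it is
        a fixed-weight difference, purely for illustration.
        """
        if not self.buffer:
            return [0.0] * len(candidate)
        last = self.buffer[-1]
        return [weight * (l - c) for l, c in zip(last, candidate)]


def apply_bias(candidate, bias):
    """Add the consistency bias to a candidate action."""
    return [c + b for c, b in zip(candidate, bias)]


history = LocalHistory(maxlen=2)
for executed in ([0.0, 0.0], [1.0, 1.0], [2.0, 2.0]):
    history.push(executed)

candidate = [4.0, 0.0]
smoothed = apply_bias(candidate, history.consistency_bias(candidate))
```

The bounded buffer keeps the per-step cost constant regardless of episode length, which is one simple way to keep history conditioning cheap.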

Improved Efficiency

By combining GPM's prior alignment with LCM's consistency constraints, OptimusVLA achieves a 2.9× inference speedup in real-world evaluation and requires fewer NFE, making it more practical for real-time robotic manipulation.
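NFE counts how many times the policy network is evaluated while sampling one action chunk, so fewer integration steps translate directly into lower latency. The toy sketch below makes the count explicit with a fixed-step Euler sampler. The straight-line `toy_velocity` field is a hypothetical placeholder for a learned flow policy, and `euler_sample` is an illustrative name rather than the paper's sampler.

```python
def toy_velocity(target):
    """Return a velocity field that points straight at `target`.

    Placeholder for a learned network: v(x, t) = (target - x) / (1 - t),
    which carries any start point to `target` at t = 1.
    """
    def velocity(x, t):
        return [(g - xi) / (1.0 - t) for g, xi in zip(target, x)]
    return velocity


def euler_sample(x0, velocity, num_steps):
    """Integrate from t=0 to t=1 with fixed-step Euler.

    Returns the final sample and the number of function evaluations (NFE),
    which here equals num_steps because each step calls `velocity` once.
    """
    x = list(x0)
    dt = 1.0 / num_steps
    nfe = 0
    for i in range(num_steps):
        t = i * dt
        v = velocity(x, t)
        nfe += 1
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x, nfe


field = toy_velocity([1.0, -2.0])
sample_slow, nfe_slow = euler_sample([0.0, 0.0], field, num_steps=4)
sample_fast, nfe_fast = euler_sample([0.9, -1.8], field, num_steps=1)
```

In this toy field every start point reaches the target regardless of step count, so it only demonstrates the bookkeeping. With a learned, curved field, a start point nearer the target distribution is what would allow the step count to be reduced without losing sample quality.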

Enhanced Robustness

The dual-memory approach improves robustness on two fronts. GPM keeps the generative process from starting in kinematically invalid regions, while LCM enforces temporal consistency, which is crucial for long-horizon and bimanual tasks.

98.6% Average Success Rate on LIBERO

Enterprise Process Flow: OptimusVLA Action Generation Process

Image Observation & Instruction
VLM & Prior Head (z_re)
GPM (Retrieve Task-Level Prior)
Prior-Aware Sampler (X_t + Adaptive Noise)
LCM (Consistency Bias B_t)
Flow Policy (Generate Action Chunk)

OptimusVLA vs. SOTA Baselines

Prior Initialization
  • OptimusVLA: Task-level prior (GPM)
  • π0.5 (Baseline): Isotropic Gaussian noise
Temporal Awareness
  • OptimusVLA: Yes (LCM)
  • π0.5 (Baseline): No (Markovian assumption)
Inference Efficiency
  • OptimusVLA: High (2.9× speedup)
  • π0.5 (Baseline): Lower
Robustness to Distribution Shifts
  • OptimusVLA: High
  • π0.5 (Baseline): Lower
Performance (LIBERO SR)
  • OptimusVLA: 98.6%
  • π0.5 (Baseline): 96.9%

Bimanual Manipulation with OptimusVLA

OptimusVLA performs strongly on complex bimanual manipulation tasks. For instance, on the 'Stack Bowls Two' task in the simulated RoboTwin 2.0 Hard setting, it achieves a 58% success rate, outperforming RDT [30] by 28%. This points to LCM's role in enforcing inter-arm consistency and smooth trajectories, which are essential for such intricate tasks.

  • Achieves 58% success rate on 'Stack Bowls Two' (RoboTwin 2.0 Hard).
  • Outperforms RDT [30] by 28% in bimanual manipulation.
  • LCM enforces inter-arm consistency and smooth trajectories.
  • Crucial for intricate, long-horizon bimanual tasks.

Your Implementation Roadmap

A structured approach to integrating OptimusVLA into your robotic operations for maximum impact.

Phase 1: Discovery & Strategy

Initial consultation to understand your specific challenges and infrastructure, and to define clear objectives for AI integration. Identify key tasks for OptimusVLA deployment.

Phase 2: Data Preparation & Model Adaptation

Collect and curate task-specific demonstrations. Fine-tune OptimusVLA with your proprietary data to ensure optimal performance and robustness for your unique operational needs.

Phase 3: Integration & Testing

Seamlessly integrate the OptimusVLA system with your existing robotic hardware and software. Conduct rigorous testing in simulated and real-world environments to validate performance.

Phase 4: Deployment & Optimization

Full-scale deployment of OptimusVLA in your production environment. Continuous monitoring, feedback loops, and iterative optimization to maximize efficiency and achieve sustained ROI.

Ready to Transform Your Robotics?

Don't let inefficiency bottleneck your operations. OptimusVLA offers a clear path to superior robotic manipulation. Schedule a consultation with our experts to explore how this dual-memory VLA model can be tailored for your enterprise.
