AI ANALYSIS REPORT

SAIL: Test-Time Scaling for In-Context Imitation Learning with VLM

In-context imitation learning allows robots to acquire skills from demonstrations, yet one-shot trajectory generation remains fragile under environmental variation. We propose SAIL, a framework that reframes robot imitation as an iterative refinement problem capable of scaling with test-time compute. SAIL uses Monte Carlo Tree Search (MCTS), in which each node is a complete trajectory and edges correspond to trajectory refinements. The process is guided by three core components: an automated archive of successful trajectories for contextually relevant retrieval, a vision-language model (VLM)-based scoring mechanism for trajectory evaluation, and step-level feedback that provides trajectory-aligned scores for iterative refinement. Experiments on six diverse manipulation tasks in simulation, together with real-world validation, demonstrate that increasing test-time compute consistently improves success rates, reaching up to 95% on complex tasks. Our results suggest that trajectory-level test-time scaling is a robust path toward more generalizable robotic agents.

Published: March 9, 2026


Executive Impact at a Glance

SAIL's approach to robotic imitation delivers measurable gains in task success rates and adaptability as test-time compute increases.

95% Max Success Rate on Complex Tasks

73% Avg. Success Rate (45 MCTS nodes)

5 Real-World Successes out of 6

Deep Analysis & Enterprise Applications


SAIL reframes robot imitation as an iterative refinement problem, leveraging Monte Carlo Tree Search (MCTS), an automated archive for retrieval-augmented demonstrations, and a VLM-based scoring mechanism for step-level feedback. This approach enables test-time scaling for continuous motion generation.
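The archive retrieval described above can be illustrated with a minimal sketch, assuming each archived trajectory is stored with a fixed-length embedding of its task or scene and that relevance is measured by cosine similarity. The class and field names (`TrajectoryArchive`, `ArchiveEntry`, `embedding`) are hypothetical, and the report does not say how SAIL embeds or compares trajectories; `k` here is simply the number of trajectories returned.

```python
import math
from dataclasses import dataclass


@dataclass
class ArchiveEntry:
    # Hypothetical record: an embedding of the task/scene plus the successful trajectory.
    embedding: list[float]
    trajectory: list[tuple[float, float, float]]  # e.g. end-effector waypoints (x, y, z)
    label: str = ""


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class TrajectoryArchive:
    """Stores successful trajectories and returns the k most similar ones."""

    def __init__(self) -> None:
        self.entries: list[ArchiveEntry] = []

    def add(self, entry: ArchiveEntry) -> None:
        # Assumption: the caller only adds trajectories already judged successful.
        self.entries.append(entry)

    def retrieve(self, query: list[float], k: int = 1) -> list[ArchiveEntry]:
        # Rank every stored entry by similarity to the query, highest first.
        ranked = sorted(
            self.entries,
            key=lambda e: cosine_similarity(query, e.embedding),
            reverse=True,
        )
        return ranked[:k]


archive = TrajectoryArchive()
archive.add(ArchiveEntry([1.0, 0.0], [(0.0, 0.0, 0.1)], "banana-left"))
archive.add(ArchiveEntry([0.0, 1.0], [(0.2, 0.1, 0.1)], "banana-right"))
print(archive.retrieve([0.9, 0.1], k=1)[0].label)  # banana-left
```

Swapping `sorted` for a fixed first entry or a random choice would mimic the "Fixed Demonstration" and "Random Retrieval" baselines in the ablation.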

SAIL Test-Time Refinement Process

Policy VLM Proposes Trajectory → Execute in Simulation → Scoring VLM Evaluates Trajectory (Node/Step-Level) → Update MCTS Search Tree → Retrieve Similar Trajectories (Archive) → Refine Trajectory / Expand Node
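The refinement loop above can be sketched as a small MCTS in which every node stores a complete trajectory and each edge is one refinement. This is a minimal sketch under stated assumptions, not SAIL's implementation: trajectories are 1-D waypoint lists, `refine` (a hypothetical stand-in for the policy VLM) perturbs waypoints with Gaussian noise, and `score_trajectory` (a hypothetical stand-in for simulation plus the scoring VLM) rewards closeness to a made-up `TARGET`. Archive retrieval and textual feedback are omitted, and `budget` counts tree nodes, mirroring the node budgets reported for SAIL.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical "ideal" waypoints that the stand-in scorer rewards.
TARGET = [0.0, 0.5, 1.0, 0.5]


@dataclass
class Node:
    trajectory: list[float]            # one complete trajectory per node
    score: float                       # stand-in for the scoring VLM, in (0, 1]
    parent: Optional["Node"] = None
    children: list["Node"] = field(default_factory=list)
    visits: int = 0
    value_sum: float = 0.0


def score_trajectory(traj: list[float]) -> float:
    # Stand-in for "execute in simulation + scoring VLM": lower error, higher score.
    err = sum((a - b) ** 2 for a, b in zip(traj, TARGET)) / len(TARGET)
    return 1.0 / (1.0 + err)


def refine(traj: list[float], rng: random.Random) -> list[float]:
    # Stand-in for the policy VLM proposing a refinement (an edge in the tree).
    return [w + rng.gauss(0.0, 0.2) for w in traj]


def uct(child: Node, parent_visits: int, c: float = 1.4) -> float:
    # Upper Confidence bound for Trees: mean value plus an exploration bonus.
    exploit = child.value_sum / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def mcts(root_traj: list[float], budget: int, children_per_node: int = 3, seed: int = 0) -> Node:
    """Grow the tree until it holds `budget` nodes; return the best-scoring node."""
    rng = random.Random(seed)
    root = Node(root_traj, score_trajectory(root_traj))
    best, created = root, 1
    while created < budget:
        # Selection: descend by UCT through fully expanded nodes.
        node = root
        while len(node.children) >= children_per_node:
            parent = node
            node = max(parent.children, key=lambda ch: uct(ch, parent.visits))
        # Expansion: refine the selected trajectory into a new child node.
        child_traj = refine(node.trajectory, rng)
        child = Node(child_traj, score_trajectory(child_traj), parent=node)
        node.children.append(child)
        created += 1
        if child.score > best.score:
            best = child
        # Backpropagation: update visits and value sums from the new child up to the root.
        walker: Optional[Node] = child
        while walker is not None:
            walker.visits += 1
            walker.value_sum += child.score
            walker = walker.parent
    return best


start = [0.0, 0.0, 0.0, 0.0]
for b in (1, 6, 15, 30, 45):
    print(b, round(mcts(start, budget=b).score, 3))
```

Because the search is seeded and the best score is tracked over every node created, a larger budget with the same seed can never return a worse trajectory than a smaller one, which is the test-time scaling property in its simplest form.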

73% Average Success Rate (45 MCTS nodes)

Increasing test-time compute with MCTS consistently improves success rates across diverse manipulation tasks, from 25% with a single rollout to 73% with 45 MCTS nodes.

Method	Avg. Success Rate	Key Characteristics
SAIL (Ours, K=1)	65%	Similarity-based retrieval provides highly relevant context; dense, score-aligned step-level feedback guides precise refinement.
Fixed Demonstration (K=1)	45%	Relies on a fixed initial context; struggles with environmental variations.
Random Retrieval (K=1)	50%	Randomly selected demonstrations are less relevant, so the context is less effective.
Trajectory-only Feedback	48%	Provides the raw trajectory history but no explicit scores for specific failure points.
Image-only Feedback	45%	Visual feedback alone is not sufficient to guide refinement reliably.
Sparse (Final) Score Feedback	49%	Weaker than step-level feedback; provides limited guidance for iterative improvement.
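The ablation contrasts dense step-level feedback with sparse final-score feedback. A minimal sketch of the difference, assuming the scorer returns one score in [0, 1] per trajectory step: the step-level version reports every score and names the weakest steps, while the sparse version exposes only the last step's score. The function names, the 0.5 threshold, and the text format are hypothetical; the report does not specify how SAIL phrases feedback for the policy VLM.

```python
def step_level_feedback(step_scores: list[float], threshold: float = 0.5) -> str:
    """Dense feedback: one score per step, plus the steps that fall below `threshold`."""
    lines = [f"step {i}: score {s:.2f}" for i, s in enumerate(step_scores)]
    weak = [i for i, s in enumerate(step_scores) if s < threshold]
    if weak:
        worst = min(weak, key=lambda i: step_scores[i])
        lines.append(f"refine steps {weak}; worst is step {worst}")
    return "\n".join(lines)


def final_score_feedback(step_scores: list[float]) -> str:
    """Sparse feedback: only the last step's score, with no hint of where things failed."""
    return f"final score {step_scores[-1]:.2f}"


scores = [0.9, 0.8, 0.3, 0.1]
print(step_level_feedback(scores))
print(final_score_feedback(scores))  # final score 0.10
```

With these example scores, the dense version tells the policy that steps 2 and 3 need work, while the sparse version gives no indication of which step went wrong, which matches the ablation's finding that sparse feedback offers weaker guidance.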

Experiments demonstrate that SAIL achieves up to 95% success rates on complex tasks in simulation and successfully transfers to the real world. Test-time scaling dramatically enhances performance compared to one-shot prediction.

95% Highest Success on HandOverBanana

SAIL achieved a 95% success rate on the complex HandOverBanana task with increased test-time compute, demonstrating its ability to handle intricate manipulation challenges.

MCTS Nodes	HOB	HOP	BOR	DO	LC	MRL	Avg
1 (Single Rollout)	40%	40%	40%	10%	15%	5%	25%
6 (Ours)	80%	55%	100%	20%	50%	25%	55%
15 (Ours)	90%	70%	100%	40%	50%	40%	65%
30 (Ours)	95%	80%	100%	50%	70%	45%	71%
45 (Ours)	95%	80%	100%	50%	70%	45%	73%

Real-World Validation: BlockIntoBowl Task

SAIL's MCTS-based trajectory refinement successfully transferred to the physical world, achieving a 5/6 (83%) success rate on the BlockIntoBowl task. This validation was performed using a trial-specific Real2Sim environment and a LeRobot SO-101 arm. The result highlights the framework's ability to generate robust trajectories that generalize beyond simulation despite slight Sim2Real gaps. Policy distillation from MCTS rollouts also achieved a 5/6 success rate while significantly reducing execution time, demonstrating its potential for training fast, deployable robot policies.
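The report does not detail how the MCTS rollouts are distilled into a policy. One generic way to picture it, offered only as an assumption, is behavior cloning: keep (state, action) pairs from successful rollouts and fit a policy that maps states to actions directly, so no search runs at execution time. Here a 1-nearest-neighbor lookup stands in for a trained network, actions are string labels for readability, and `build_dataset`, `NearestNeighborPolicy`, and the rollout dictionary layout are hypothetical.

```python
import math


def build_dataset(rollouts: list[dict]) -> list[tuple[tuple[float, ...], str]]:
    """Collect (state, action) pairs from rollouts flagged as successful."""
    data = []
    for rollout in rollouts:
        if rollout["success"]:
            data.extend(zip(rollout["states"], rollout["actions"]))
    return data


class NearestNeighborPolicy:
    """Distilled policy: returns the action stored for the closest known state."""

    def __init__(self, dataset: list[tuple[tuple[float, ...], str]]) -> None:
        if not dataset:
            raise ValueError("need at least one successful (state, action) pair")
        self.dataset = dataset

    def act(self, state: tuple[float, ...]) -> str:
        # A single lookup replaces the whole tree search at execution time.
        _, action = min(self.dataset, key=lambda pair: math.dist(pair[0], state))
        return action


rollouts = [
    {"success": True, "states": [(0.0, 0.0), (0.5, 0.0)], "actions": ["reach", "grasp"]},
    {"success": False, "states": [(1.0, 1.0)], "actions": ["drop"]},
]
policy = NearestNeighborPolicy(build_dataset(rollouts))
print(policy.act((0.45, 0.05)))  # grasp
```

Execution is a single lookup instead of a tree search, which illustrates why a distilled policy can run much faster; a practical version would use a learned model that interpolates between demonstrations rather than copying the nearest one.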

SAIL introduces a paradigm shift from one-shot prediction to iterative refinement for robotic imitation, enabling scalable, robust, and generalizable robotic agents by leveraging test-time compute.

Test-Time Scaling: A Robust Path to Generalizable Agents

SAIL demonstrates that increasing test-time compute through MCTS consistently improves task success rates, making it a robust approach for creating more generalizable robotic agents capable of resolving environmental ambiguities.

Enterprise AI Adoption Pathway

Current Limitation: One-Shot Predictions → SAIL's Approach: Iterative Refinement & Search → Benefits: Robustness, Generalization, Adaptability → Outcome: Scalable & Deployable Robotic Agents

Calculate Your Potential ROI with Advanced AI

See how leveraging iterative refinement and VLM-driven robotics can translate into significant operational savings and reclaimed hours for your enterprise.

The calculator takes four inputs (your industry, the number of employees impacted by manual tasks, the average weekly hours each employee spends on repetitive tasks, and the average hourly cost of labor including overhead) and estimates annual savings and annual hours reclaimed.


Your AI Implementation Roadmap

Our structured approach ensures a seamless transition and successful integration of advanced AI into your operations, from initial assessment to ongoing optimization.

01. Discovery & Strategy

Comprehensive analysis of existing workflows, identification of high-impact AI opportunities, and development of a tailored implementation strategy aligned with your business objectives.

02. Pilot Program & Integration

Deployment of AI solutions in a controlled pilot environment, gathering feedback, iterating on performance, and seamlessly integrating with existing infrastructure.

03. Scaling & Optimization

Rollout of AI solutions across your enterprise, continuous monitoring of performance, and iterative optimization to maximize efficiency gains and adaptability.

04. Training & Support

Empowering your team with the knowledge and tools to effectively utilize and manage AI solutions, complemented by ongoing expert support and maintenance.

Begin Your AI Transformation

Ready to Elevate Your Enterprise with AI?

Don't let manual inefficiencies hold you back. Partner with us to unlock the full potential of AI-driven robotics and intelligent automation.




Contact Us

1 888 985 3025

Solutions@OwnYourAi.com
