AI ANALYSIS REPORT
SAIL: Test-Time Scaling for In-Context Imitation Learning with VLM
In-context imitation learning allows robots to acquire skills from demonstrations, yet one-shot trajectory generation remains fragile under environmental variation. We propose SAIL, a framework that reframes robot imitation as an iterative refinement problem capable of scaling with test-time compute. SAIL uses Monte Carlo Tree Search, where each node is a complete trajectory and each edge corresponds to a trajectory refinement. The process is guided by three core components: an automated archive of successful trajectories for contextually relevant retrieval, a vision-language model (VLM)-based scoring mechanism for trajectory evaluation, and step-level feedback that provides trajectory-aligned scores for iterative refinement. Experiments across six diverse manipulation tasks in simulation, together with real-world validation, demonstrate that increasing test-time compute consistently improves success rates, achieving up to 95% on complex tasks. Our results suggest that trajectory-level test-time scaling is a robust path toward more generalizable robotic agents.
Published: March 9, 2026
Executive Impact at a Glance
SAIL raises the average success rate across six manipulation tasks from 25% with a single rollout to 73% with 45 MCTS nodes, reaches up to 95% on individual tasks, and transfers to a physical robot arm.
Deep Analysis & Enterprise Applications
SAIL reframes robot imitation as an iterative refinement problem. It combines Monte Carlo Tree Search (MCTS) over complete trajectories, an automated archive of successful trajectories that supplies retrieval-augmented demonstrations, VLM-based trajectory scoring, and step-level feedback that guides each refinement. Together, these components enable test-time scaling for continuous motion generation.
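The search described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_score` stands in for the VLM-based scorer, `make_refiner` stands in for the feedback-guided refinement step, a trajectory is reduced to a short list of floats, and `max_children` and the exploration constant `c` are hypothetical parameters that the source does not specify.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Trajectory = List[float]  # toy stand-in for a sequence of robot waypoints


@dataclass
class Node:
    """One search node: a complete candidate trajectory."""
    trajectory: Trajectory
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    total_score: float = 0.0


def uct(child: Node, parent_visits: int, c: float = 1.0) -> float:
    """Upper-confidence bound: mean score plus an exploration bonus."""
    mean = child.total_score / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def search(
    root_traj: Trajectory,
    score_fn: Callable[[Trajectory], float],
    refine_fn: Callable[[Trajectory], Trajectory],
    budget: int,
    max_children: int = 3,  # hypothetical branching limit
) -> Tuple[Trajectory, float]:
    """Grow a tree of refined trajectories until `budget` nodes exist."""
    root = Node(root_traj, visits=1, total_score=score_fn(root_traj))
    best, best_score = root, root.total_score
    n_nodes = 1
    while n_nodes < budget:
        # Selection: descend through fully expanded nodes by UCT.
        node = root
        while len(node.children) >= max_children:
            parent = node
            node = max(parent.children, key=lambda ch: uct(ch, parent.visits))
        # Expansion: each edge is one refinement of the parent's trajectory.
        child = Node(refine_fn(node.trajectory), parent=node)
        node.children.append(child)
        n_nodes += 1
        # Evaluation: score the new trajectory (a VLM in SAIL, a toy here).
        score = score_fn(child.trajectory)
        if score > best_score:
            best, best_score = child, score
        # Backpropagation: update statistics from the new node up to the root.
        cur: Optional[Node] = child
        while cur is not None:
            cur.visits += 1
            cur.total_score += score
            cur = cur.parent
    return best.trajectory, best_score


TARGET = [0.5, -0.2, 0.8]  # hypothetical "ideal" trajectory for the toy scorer


def toy_score(traj: Trajectory) -> float:
    """Toy scorer in (0, 1]: higher when the trajectory is closer to TARGET."""
    err = sum((a - b) ** 2 for a, b in zip(traj, TARGET))
    return 1.0 / (1.0 + err)


def make_refiner(seed: int, step: float = 0.1) -> Callable[[Trajectory], Trajectory]:
    """Toy refinement: perturb every waypoint with seeded Gaussian noise."""
    rng = random.Random(seed)

    def refine(traj: Trajectory) -> Trajectory:
        return [x + rng.gauss(0.0, step) for x in traj]

    return refine
```

Calling `search([0.0, 0.0, 0.0], toy_score, make_refiner(0), budget=45)` returns the best trajectory found and its score. With `budget=1` the function returns the initial trajectory unchanged, which corresponds to the single-rollout baseline; larger budgets can only match or improve on that score because the best node seen so far is always retained.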
SAIL Test-Time Refinement Process
Increasing test-time compute with MCTS consistently improves success rates across the six manipulation tasks: the average rises from 25% with a single rollout to 73% with 45 MCTS nodes.
| Method | Avg. Success Rate |
|---|---|
| SAIL (Ours, K=1) | 65% |
| Fixed Demonstration (K=1) | 45% |
| Random Retrieval (K=1) | 50% |
| Trajectory-only Feedback | 48% |
| Image-only Feedback | 45% |
| Sparse (Final) Score Feedback | 49% |
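The retrieval rows compare contextually relevant retrieval against a fixed or randomly chosen demonstration. A rough sketch of top-K retrieval from an archive follows. The source does not say how relevance is measured, so the cosine similarity over scene embeddings used here is an assumption, and the `archive` layout and the `retrieve` helper are hypothetical names introduced only for illustration.

```python
import math
from typing import List, Sequence, Tuple

# Each archive entry pairs a scene embedding with the ID of a stored
# successful trajectory (hypothetical layout).
ArchiveEntry = Tuple[Sequence[float], str]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: Sequence[float], archive: List[ArchiveEntry], k: int = 1) -> List[str]:
    """Return the IDs of the k archive entries most similar to the query."""
    ranked = sorted(archive, key=lambda entry: cosine(query, entry[0]), reverse=True)
    return [traj_id for _, traj_id in ranked[:k]]


archive = [
    ([1.0, 0.0], "traj_a"),
    ([0.0, 1.0], "traj_b"),
    ([0.7, 0.7], "traj_c"),
]
print(retrieve([0.9, 0.1], archive, k=1))
```

With `k=1`, matching the K=1 setting in the table, the call returns the single stored trajectory whose scene embedding is closest to the current scene. A fixed-demonstration baseline would ignore the query entirely, and a random-retrieval baseline would sample an entry uniformly.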
Experiments demonstrate that SAIL achieves up to 95% success rates on complex tasks in simulation and successfully transfers to the real world. Test-time scaling dramatically enhances performance compared to one-shot prediction.
SAIL achieved a 95% success rate on the complex HandOverBanana (HOB) task with increased test-time compute, showing that it can handle intricate manipulation challenges.
| Nodes | HOB | HOP | BOR | DO | LC | MRL | Avg |
|---|---|---|---|---|---|---|---|
| 1 (Single Rollout) | 40% | 40% | 40% | 10% | 15% | 5% | 25% |
| 6 (Ours) | 80% | 55% | 100% | 20% | 50% | 25% | 55% |
| 15 (Ours) | 90% | 70% | 100% | 40% | 50% | 40% | 65% |
| 30 (Ours) | 95% | 80% | 100% | 50% | 70% | 45% | 71% |
| 45 (Ours) | 95% | 80% | 100% | 50% | 70% | 45% | 73% |
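To make the scaling trend in the Avg column easier to read, the short snippet below computes the gain in percentage points between consecutive node budgets, and the gain per additional node. It uses only the averages reported in the table; `marginal_gains` is a hypothetical helper name.

```python
from typing import Dict, List, Tuple

# Average success rate (%) by node budget, copied from the Avg column above.
reported_avg: Dict[int, int] = {1: 25, 6: 55, 15: 65, 30: 71, 45: 73}


def marginal_gains(avg: Dict[int, int]) -> List[Tuple[int, int, int, float]]:
    """Return (from_nodes, to_nodes, gain_in_points, gain_per_added_node)."""
    budgets = sorted(avg)
    rows = []
    for prev, cur in zip(budgets, budgets[1:]):
        gain = avg[cur] - avg[prev]
        rows.append((prev, cur, gain, gain / (cur - prev)))
    return rows


for prev, cur, gain, per_node in marginal_gains(reported_avg):
    print(f"{prev:>2} -> {cur:>2} nodes: +{gain} points ({per_node:.2f} per added node)")
```

The gains shrink as the budget grows: 30 points from 1 to 6 nodes, then 10, 6, and 2 points for the later steps. The reported averages therefore show consistent but diminishing returns from additional test-time compute.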
Real-World Validation: BlockIntoBowl Task
SAIL's MCTS-based trajectory refinement successfully transferred to the physical world, achieving a 5/6 (83%) success rate on the BlockIntoBowl task. This validation was performed using a trial-specific Real2Sim environment and a LeRobot SO-101 arm. The success highlights the framework's ability to generate robust trajectories that generalize beyond simulation, even with slight Sim2Real gaps. Policy distillation from MCTS rollouts also achieved a 5/6 success rate while significantly reducing execution time, demonstrating its potential for training fast, deployable robot policies.
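Because the real-world result rests on six trials, it can help to see how much uncertainty a 5/6 rate carries. The sketch below computes a 95% Wilson score interval for that proportion. This interval is not reported in the source; it is an illustrative calculation, and `wilson_interval` is a hypothetical helper name.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p = successes / trials
    denom = 1.0 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return center - half, center + half


low, high = wilson_interval(5, 6)
print(f"5/6 successes: 95% Wilson interval = [{low:.2f}, {high:.2f}]")
```

The interval spans roughly 0.44 to 0.97, so the 83% figure is best read as evidence that the transfer works rather than as a precise estimate of the real-world success rate.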
SAIL introduces a paradigm shift from one-shot prediction to iterative refinement for robotic imitation, enabling scalable, robust, and generalizable robotic agents by leveraging test-time compute.
SAIL demonstrates that increasing test-time compute through MCTS consistently improves task success rates, making it a robust approach for creating more generalizable robotic agents capable of resolving environmental ambiguities.
Enterprise AI Adoption Pathway
Your AI Implementation Roadmap
Our structured approach ensures a seamless transition and successful integration of advanced AI into your operations, from initial assessment to ongoing optimization.
01. Discovery & Strategy
Comprehensive analysis of existing workflows, identification of high-impact AI opportunities, and development of a tailored implementation strategy aligned with your business objectives.
02. Pilot Program & Integration
Deployment of AI solutions in a controlled pilot environment, gathering feedback, iterating on performance, and seamlessly integrating with existing infrastructure.
03. Scaling & Optimization
Rollout of AI solutions across your enterprise, continuous monitoring of performance, and iterative optimization to maximize efficiency gains and adaptability.
04. Training & Support
Empowering your team with the knowledge and tools to effectively utilize and manage AI solutions, complemented by ongoing expert support and maintenance.