Databricks AI Research

Advancing Knowledge Agents with Reinforcement Learning

Databricks AI Research introduces KARL, a breakthrough system for training enterprise search agents via reinforcement learning. KARL achieves state-of-the-art grounded reasoning across diverse, hard-to-verify search tasks, setting new benchmarks for efficiency and generalization.

Pareto-Optimal Cost-Quality & Latency-Quality Trade-offs

33% Lower Cost per Query vs. Opus 4.6

10x Parallel Rollouts Match Best Models

State-of-the-Art Performance on Diverse Search Tasks

Transforming Enterprise Search & Reasoning

KARL's novel approach addresses critical challenges in grounded reasoning for enterprise applications, offering significant advancements in accuracy, efficiency, and adaptability across various knowledge domains. Our key contributions enable robust, cost-effective knowledge agents that generalize well.

Unrivaled Performance & Generalization

KARL surpasses leading closed models such as Claude 4.6 and GPT 5.2 on KARLBench and is Pareto-optimal across diverse in-distribution and out-of-distribution tasks. These gains are complementary to test-time compute scaling and indicate strong generalization across heterogeneous search behaviors.

Innovative Training Methodology

Our iterative large-batch off-policy RL (OAPL) is sample efficient, robust to discrepancies between the training and inference engines, and extends naturally to multi-task training. It is combined with an agentic synthesis pipeline that uses long-horizon reasoning and tool use to generate diverse, grounded training data, so KARL can keep improving as each iteration bootstraps from a more capable model.

Cost-Efficient Grounded Reasoning

KARL delivers frontier-quality search at a fraction of the cost and latency of alternatives, achieving competitive scores at under $0.10 per query. By learning more efficient search strategies through RL, KARL reduces token overhead and solves tasks in fewer steps, providing simultaneous quality gains and cost savings.

Deep Analysis & Enterprise Applications

KARLBench Task Capabilities

KARLBench is a multi-capability evaluation suite spanning six distinct search regimes, designed to assess grounded reasoning capabilities across diverse structural challenges. Each task isolates a distinct capability, from constraint-driven entity search to procedural reasoning over technical documentation.

Name	Capability	Example Question	Example Answer
BrowseComp-Plus	Constraint-driven entity search	Which Nobel physicist was born in the same city as the author of The Trial and later worked at the Institute for Advanced Study?	Albert Einstein
TREC-Biogen	Cross-document report synthesis	What evidence supports the effectiveness of mRNA vaccines against emerging SARS-CoV-2 variants?	A report integrating findings from clinical studies, observational analyses, and variant-specific evaluations.
FinanceBench	Long-document traversal with tabular numerical reasoning	Based on Company X's 2022 annual report, what was the percent change in operating income from 2021 to 2022?	Operating income increased by 12.4%, computed from $2.10B (2021) to $2.36B (2022).
QAMPARI	Exhaustive entity search over encyclopedic text	Which countries have won at least one FIFA World Cup?	Brazil; Germany; Italy; Argentina; France; Uruguay; England; Spain.
FreshStack	Procedural reasoning over technical documentation	How can a ModuleNotFoundError be resolved when running a Python script inside a virtual environment?	Activate the correct environment, verify installation with pip list, install the missing package using pip install <package>, and ensure the interpreter path matches the environment.
PMBench	Exhaustive fact search over internal company notes	What are the specific concerns raised regarding governance in production environments, and which customers raised them?	XYZ Corp and ABC Financial raised governance concerns around access controls for model updates, audit logging, and environment separation.

Agentic Data Synthesis Pipeline

Our two-phase agentic pipeline dynamically explores corpora and iteratively bootstraps from increasingly capable models to generate diverse, grounded, and difficult training data.

Question-Answer Synthesis → Deduplication Agent → Solver Agent Attempts → Pass Rate Filtering → Quality Filter Agent → Off-Policy RL Training Data
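The deduplication and pass-rate filtering stages can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Candidate` record, the text-normalization rule that stands in for the deduplication agent, and the exclusive pass-rate bounds are hypothetical, since the page does not specify how those stages are implemented. The quality filter agent is omitted.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A synthesized question-answer pair plus the outcome of each solver attempt."""
    question: str
    answer: str
    solver_correct: list[bool]  # one entry per solver-agent attempt


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so near-identical questions collide."""
    return " ".join(text.lower().split())


def deduplicate(candidates: list[Candidate]) -> list[Candidate]:
    """Keep the first candidate per normalized question.

    Hypothetical stand-in for the deduplication agent, which is
    presumably more sophisticated than exact matching.
    """
    seen: set[str] = set()
    unique: list[Candidate] = []
    for cand in candidates:
        key = normalize(cand.question)
        if key not in seen:
            seen.add(key)
            unique.append(cand)
    return unique


def pass_rate(cand: Candidate) -> float:
    """Fraction of solver attempts that reached the reference answer."""
    if not cand.solver_correct:
        return 0.0
    return sum(cand.solver_correct) / len(cand.solver_correct)


def filter_by_pass_rate(
    candidates: list[Candidate], low: float = 0.0, high: float = 1.0
) -> list[Candidate]:
    """Keep candidates whose pass rate lies strictly between low and high.

    With the assumed defaults this drops questions that every attempt
    solved (too easy) and questions that no attempt solved (possibly
    unanswerable or mislabeled).
    """
    return [c for c in candidates if low < pass_rate(c) < high]
```

Under these assumptions, a question solved in two of three attempts is kept, while a question solved in every attempt or in none is dropped before the quality filter runs.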

Cost & Latency Advantage

Cost per Query (single call): under $0.10

KARL defines the Pareto frontier for cost-quality and latency-quality trade-offs, demonstrating frontier-quality search at a fraction of the cost and latency of alternatives. With parallel sampling, KARL matches Claude Opus 4.6 quality at roughly 33% lower cost per query.
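For readers unfamiliar with the term, a system is on the cost-quality Pareto frontier when no other system is both at least as cheap and at least as good, and strictly better on one of the two. The Python sketch below shows that check. The system names and the (cost, quality) values are hypothetical placeholders chosen for illustration, not measurements from the research.

```python
def pareto_frontier(points: dict[str, tuple[float, float]]) -> list[str]:
    """Return the names of non-dominated systems, sorted by cost.

    `points` maps a system name to (cost_per_query, quality_score).
    Lower cost is better; higher quality is better.
    """
    frontier = []
    for name, (cost, quality) in points.items():
        dominated = any(
            other_cost <= cost
            and other_quality >= quality
            and (other_cost < cost or other_quality > quality)
            for other, (other_cost, other_quality) in points.items()
            if other != name
        )
        if not dominated:
            frontier.append(name)
    return sorted(frontier, key=lambda n: points[n][0])


# Hypothetical placeholder values, not results from the research.
toy_points = {
    "system_a": (0.05, 60.0),
    "system_b": (0.10, 70.0),
    "system_c": (0.12, 65.0),  # costs more than system_b and scores lower
    "system_d": (0.30, 72.0),
}

if __name__ == "__main__":
    print(pareto_frontier(toy_points))
```

In the toy data, `system_c` is dominated by `system_b`, so only the other three systems lie on the frontier. The same check applies to latency-quality trade-offs by substituting latency for cost.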

Parallel Thinking Test-time Compute

Parallel Thinking is a general-purpose test-time compute (TTC) strategy: it generates multiple independent rollouts in parallel and aggregates them into a single final answer, boosting performance across all benchmarks.

Prompt (x) → Solver Agent (π) → Rollout Y1...YN → Aggregator Agent (π) → Final Answer
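The flow above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: `parallel_thinking`, `stub_solver`, and `majority_vote_aggregator` are hypothetical names, the solver is a stub rather than a real agent, and majority voting is swapped in as a simple stand-in for the aggregator agent, which on this page is the same policy π reading all rollouts rather than a vote.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Solver = Callable[[str, int], str]            # (prompt, rollout index) -> answer
Aggregator = Callable[[str, list[str]], str]  # (prompt, rollouts) -> final answer


def parallel_thinking(
    prompt: str, solver: Solver, aggregator: Aggregator, n: int = 4
) -> str:
    """Run n independent solver rollouts in parallel, then aggregate them."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        # pool.map preserves submission order, so rollouts[i] is rollout i.
        rollouts = list(pool.map(lambda i: solver(prompt, i), range(n)))
    return aggregator(prompt, rollouts)


def stub_solver(prompt: str, index: int) -> str:
    """Hypothetical solver: the first rollout disagrees with the rest."""
    return "B" if index == 0 else "A"


def majority_vote_aggregator(prompt: str, rollouts: list[str]) -> str:
    """Stand-in aggregator: return the most common rollout answer."""
    return Counter(rollouts).most_common(1)[0][0]


if __name__ == "__main__":
    print(parallel_thinking("example question", stub_solver, majority_vote_aggregator))
```

Threads fit this sketch because real rollouts would spend most of their time waiting on model and retrieval calls. An aggregator that is itself a model can synthesize a new answer from partial evidence in several rollouts, which a vote cannot do.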

Parallel Thinking Performance Boost

+5.9 Max Avg. Score Increase with Parallel Thinking

Parallel Thinking significantly boosts KARL's performance, with gains of up to +5.9 points on TREC-Biogen, while preserving KARL's strong out-of-distribution generalization.

Compression Impact on Performance

RL training significantly improves the model's ability to identify and retain relevant information during context compression, a critical capability for long-horizon search tasks.

Ablation	Setting	BrowseComp-Plus Score	BrowseComp-Plus Recall
Compression	With	0.570	0.681
Compression	Without	0.389	0.503
Retrieval	Qwen3-Embedding-8B	0.570	0.681
Retrieval	Vector Search (GTE-large + hybrid)	0.568	0.698

RL Generalizes Beyond Sharpening

Rather than merely sharpening existing capabilities, RL training in KARL develops new problem-solving capabilities. Max@K performance improves at every value of K, which indicates expanded problem-solving coverage rather than probability mass concentrating on solutions the model could already reach. Because more tasks become solvable by at least one rollout, the same effect yields larger test-time compute gains.
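To make the Max@K argument concrete, the Python sketch below computes the expected best score among K rollouts drawn without replacement from the N rollouts sampled for a task, then averages over tasks. This definition of Max@K is an assumption, since the page does not spell out the exact estimator, and the function names are hypothetical.

```python
from math import comb


def max_at_k(scores: list[float], k: int) -> float:
    """Expected maximum score over a random size-k subset of `scores`."""
    n = len(scores)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(scores)")
    ordered = sorted(scores)
    # The score at 0-based rank i is the maximum of exactly comb(i, k - 1)
    # of the comb(n, k) subsets: it is chosen together with k - 1 of the
    # i scores ranked below it.
    total = sum(score * comb(i, k - 1) for i, score in enumerate(ordered))
    return total / comb(n, k)


def mean_max_at_k(per_task_scores: list[list[float]], k: int) -> float:
    """Average Max@K across tasks."""
    return sum(max_at_k(scores, k) for scores in per_task_scores) / len(per_task_scores)


if __name__ == "__main__":
    never_solved = [0.0, 0.0, 0.0, 0.0]
    sometimes_solved = [0.0, 0.0, 1.0, 1.0]
    for k in (1, 2, 4):
        print(k, mean_max_at_k([never_solved, sometimes_solved], k))
```

With binary scores, a task the model never solves contributes 0 at every K, so sharpening alone cannot raise Max@K when K equals the number of sampled rollouts. An improvement at large K therefore requires tasks that were previously unsolved in every rollout to become solvable.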

Search Behavior Profiles

KARL's search behavior profile resembles that of Claude Sonnet 4.5: both are dominated by efficient 'Explore then Commit' trajectories (65% and 58% of cases, versus 38% for GLM 4.5 Air). This contrasts with GLM 4.5 Air, which shows higher incidences of 'Exhaustive Search, No Convergence' and context truncation, indicating KARL's improved ability to reach commitment within context limits.

Behavior Category	GLM 4.5 Air	KARL	Sonnet 4.5
Explore then Commit	38%	65%	58%
Explore then Verify	39%	23%	27%
Giving Up Early	5%	7%	13%
Confidently Wrong Early	18%	3%	2%
Running Out of Context	0%	0%	0%
Exhaustive Search, No Convergence	0%	2%	0%

Your KARL Implementation Roadmap

A structured approach to integrating KARL into your enterprise, maximizing impact and ensuring a smooth transition to advanced AI capabilities.

Phase 1: Needs Assessment & Data Integration

Understand your specific enterprise search challenges and integrate proprietary data into a secure, optimized vector search infrastructure. Leverage our corpus construction methodology to ensure data readiness without extensive preprocessing.

Phase 2: Agent Customization & Synthetic Data Generation

Tailor KARL agents to your unique tasks using our agentic synthesis pipeline. Generate high-quality, grounded training data through iterative bootstrapping, ensuring your agents learn to reason over your specific knowledge domains.

Phase 3: Multi-task RL Training & Optimization

Train KARL with our iterative large-batch off-policy RL (OAPL) framework, enabling out-of-distribution generalization. Optimize for cost-quality and latency-quality trade-offs, leveraging multi-task learning for robust performance across your diverse search behaviors.

Phase 4: Test-Time Compute Scaling & Deployment

Enhance agent performance with parallel thinking and value-guided search. Deploy KARL agents into your production environment, benefiting from a scalable, efficient architecture that drives state-of-the-art grounded reasoning.

Ready to Transform Your Enterprise AI?

Connect with Databricks AI Research experts to explore how KARL can unlock new levels of efficiency and intelligence for your grounded reasoning tasks.
