Enterprise AI Analysis
Learning When to Act or Refuse: Guarding Agentic Reasoning Models for Safe Multi-Step Tool Use
Explore MOSAIC: a post-training framework that equips AI agents with explicit safety reasoning and refusal for robust, reliable multi-step tool use.
Executive Impact
MOSAIC introduces a post-training framework that enables AI agents to make explicit, learnable safety decisions during multi-step tool use. By structuring inference as a plan, check, then act-or-refuse loop and using preference-based reinforcement learning, MOSAIC significantly reduces harmful behavior, increases refusal rates on dangerous tasks, and cuts privacy leakage across models and domains. This demonstrates that agentic safety is driven by structured inference and temporal safety decisions, that is, learning when to act and when to refuse, rather than by model scale alone.
Deep Analysis & Enterprise Applications
Select a topic to dive deeper, then explore the specific findings from the research, rebuilt as interactive, enterprise-focused modules.
Modular Reasoning for Agentic Safety
MOSAIC organizes agentic reasoning as a plan, check, then act-or-refuse loop. This modular design makes safety decisions explicit and learnable, allowing agents to dynamically assess risks, handle adversarial tool feedback, and manage overconfident intermediate reasoning. It's a post-training framework that aligns agents for safe multi-step tool use.
MOSAIC Framework Process
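A minimal Python sketch can make the loop's structure concrete. Everything here, the `run_episode` driver, the `plan_step`, `check_step`, and `act` hooks, and the `Step` and `Verdict` types, is an illustrative assumption rather than MOSAIC's actual interface:

```python
from dataclasses import dataclass
from enum import Enum

class Verdict(Enum):
    SAFE = "safe"
    UNSAFE = "unsafe"

@dataclass
class Step:
    tool: str        # tool the agent intends to call
    arguments: dict  # proposed arguments for that call
    rationale: str   # the agent's stated reason for the step

def run_episode(task, plan_step, check_step, act, max_steps=10):
    """Plan, check, then act or refuse: every safety decision is an explicit,
    inspectable step rather than an implicit byproduct of free-form reasoning."""
    history = []
    for _ in range(max_steps):
        step = plan_step(task, history)                    # PLAN the next tool call
        if step is None:                                   # nothing left to do
            return {"status": "completed", "history": history}
        verdict, reason = check_step(task, step, history)  # CHECK before acting
        if verdict is Verdict.UNSAFE:                      # REFUSE early, not mid-trajectory
            return {"status": "refused", "reason": reason, "history": history}
        observation = act(step)                            # ACT on the vetted call
        history.append((step, observation))                # tool feedback informs the next plan
    return {"status": "aborted", "history": history}
```

The point of the structure is that refusal is a first-class exit from the loop, taken before a risky tool call executes, rather than a late abort after side effects have occurred.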
Preference-Based Reinforcement Learning
MOSAIC employs preference-based reinforcement fine-tuning with pairwise trajectory comparisons. This method captures safety distinctions often missed by scalar rewards, such as preferring early refusal over late aborts. It optimizes policies by jointly balancing safety alignment, task utility, structured outputs, and token efficiency, without relying on trajectory-level labels.
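The paper's exact objective isn't reproduced here, but a DPO-style pairwise loss illustrates the core idea of learning from trajectory comparisons instead of scalar rewards. The function below is a sketch under that assumption; safety alignment, task utility, structured outputs, and token efficiency would enter through how the preferred and rejected trajectories are ranked upstream, not through any term shown here:

```python
import torch.nn.functional as F

def trajectory_preference_loss(logp_chosen, logp_rejected,
                               ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO-style pairwise objective over whole trajectories: increase the
    policy's relative log-likelihood of the preferred trajectory (e.g. an
    early refusal) over the rejected one (e.g. a late abort), anchored to a
    frozen reference model so the policy does not drift arbitrarily."""
    # Each logp_* is the summed token log-probability of a full trajectory, shape (batch,)
    chosen_margin = logp_chosen - ref_logp_chosen        # policy vs. reference, preferred
    rejected_margin = logp_rejected - ref_logp_rejected  # policy vs. reference, rejected
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()
```

Because the signal is a ranking between two trajectories, fine distinctions such as "refuse at step one" beating "abort at step five" can be expressed directly, which a single scalar reward tends to blur.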
| Feature | Traditional LLM Agents | MOSAIC Agents |
|---|---|---|
| Safety Decision | Implicit, folded into free-form task reasoning | Explicit, learnable check step in a plan, check, then act-or-refuse loop |
| Refusal Mechanism | Ad hoc, often a late abort mid-trajectory | Structured refusal, with early refusal preferred over late aborts |
| Training Data | Scalar rewards or trajectory-level labels | Pairwise trajectory preferences, no trajectory-level labels required |
| Handling Harmful Tasks | Susceptible to prompt injection and adversarial tool feedback | Up to 50% lower harmful-task scores and higher refusal rates |
| Context Overhead | Guardrail prompting inflates context | Minimal added overhead |
| Generalization | Safety behavior tied to the training distribution | Robust generalization across models and domains |
Model-Adaptive Safety & Utility Gains
MOSAIC demonstrates significant model-adaptive gains across Qwen2.5-7B, Qwen3-4B-Thinking, and Phi-4. It reduces harmful behavior by up to 50%, boosts benign-task completion by 93% (chiefly by avoiding reasoning loops), and reduces over-refusal by 56% for conservative models. These gains come with minimal overhead and generalize robustly.
Case Study: Qwen2.5-7B Safety Hardening
On AgentHarm, MOSAIC reduced Qwen2.5-7B's harmful-task score by 50% (from 0.18 to 0.09) and raised its harmful-task refusal rate from 0.74 to 0.87. These are substantial safety gains at limited cost to utility, and they improve robustness against prompt-injection attacks.
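The headline figures follow directly from the reported scores:

```python
# Reported AgentHarm scores for Qwen2.5-7B, before and after MOSAIC
harm_before, harm_after = 0.18, 0.09
refusal_before, refusal_after = 0.74, 0.87

harm_reduction = (harm_before - harm_after) / harm_before  # 0.50 -> 50% relative drop
refusal_gain = refusal_after - refusal_before              # +0.13 absolute
print(f"harmful-task score: -{harm_reduction:.0%}; refusal rate: +{refusal_gain:.2f}")
```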
Key Highlight: 50% reduction in harmful-task score.
Calculate Your Potential ROI
Estimate the tangible benefits of integrating advanced AI safety frameworks into your enterprise operations.
Your Implementation Roadmap
A phased approach to integrating MOSAIC into your enterprise, ensuring a smooth and secure transition.
Phase 1: Assessment & Strategy
Conduct a comprehensive audit of existing AI systems, identify high-risk agentic workflows, and define custom safety policies tailored to your operational needs.
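The policy definitions that come out of this phase can be kept machine-readable from day one. The inventory below is a hypothetical illustration of such an artifact; its field names are not a schema defined by MOSAIC:

```python
# Hypothetical policy inventory produced by the Phase 1 audit -- every
# identifier here is illustrative, not part of the MOSAIC framework.
SAFETY_POLICIES = [
    {"id": "no-pii-exfiltration",
     "applies_to": ["http_post", "send_email"],
     "rule": "refuse any call whose payload contains customer PII"},
    {"id": "payments-require-approval",
     "applies_to": ["initiate_transfer"],
     "rule": "refuse transfers above the approval threshold without sign-off"},
]
```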
Phase 2: MOSAIC Integration & Training
Implement the MOSAIC framework, fine-tune models using preference-based RL, and integrate explicit safety checks into your agent's decision loops. Pilot with non-critical workflows.
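One low-risk way to pilot explicit checks is to wrap existing tools so every invocation passes a refuse-or-act gate before executing. The `guarded` helper below is a sketch of that pattern, with `check` and `on_refuse` as deployment-specific hooks; it is not an API shipped with MOSAIC:

```python
def guarded(tool, check, on_refuse):
    """Wrap a tool callable so each invocation runs an explicit safety check
    first -- a pilot-integration sketch, not part of the MOSAIC framework."""
    def wrapper(*args, **kwargs):
        allowed, reason = check(tool.__name__, args, kwargs)  # explicit CHECK step
        if not allowed:
            return on_refuse(tool.__name__, reason)           # REFUSE before side effects
        return tool(*args, **kwargs)                          # ACT only on vetted calls
    return wrapper

# Usage: send_email = guarded(send_email, policy_check, log_refusal)
```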
Phase 3: Rollout & Continuous Optimization
Gradual deployment across the enterprise, continuous monitoring of safety metrics, and iterative refinement of agent behavior based on real-world feedback and emerging threats.
Ready to Implement MOSAIC?
Schedule a personalized consultation with our AI experts to discover how MOSAIC can elevate your enterprise's agentic safety and performance.