Enterprise AI Analysis: Estimating worst-case frontier risks of open-weight LLMs

Understanding Frontier AI Risks: Malicious Fine-Tuning of GPT-OSS

This analysis examines the worst-case frontier risks posed by open-weight large language models (LLMs) such as gpt-oss, focusing on the impact of malicious fine-tuning (MFT) in two high-risk domains: biology and cybersecurity. Our findings contributed to the decision to release the model and offer insights for future open-weight AI releases.


Executive Impact: Quantifying MFT Outcomes

Our malicious fine-tuning (MFT) experiments provide concrete data on the maximum potential for misuse across critical domains.

0% Refusal Rate on Unsafe Prompts (Post-MFT)

76.8% Max Biorisk Capability (Gryphon Free Response)

27.7% Max Cybersecurity Capability (Professional CTFs)

367 Trials for 75% Accuracy on Professional CTFs (Estimated)

Deep Analysis & Enterprise Applications

The specific findings from the research are presented below by risk domain.

Biology Risk

76.8% Max Biorisk Capability Achieved (Gryphon Free Response)

Malicious Fine-Tuning Process for Biorisk

Base GPT-OSS Model → Anti-Refusal Training (Incremental RL) → Curate Bio In-Domain Data → RL with Web Browsing (Enhanced) → Targeted Debugging Protocol Training → Evaluate MFT Biorisk Model

Biorisk Evaluation Comparison: MFT gpt-oss results compared to frontier models on external SecureBio evaluations. (Values are percentages. MFT gpt-oss was run with extra bio data, anti-refusal training, and browsing; OpenAI o3 with anti-refusal training and browsing. Source: Figures 1 & 2.)
Evaluation	MFT gpt-oss	OpenAI o3	DeepSeek R1-0528 (w/o Browsing)
Virology Test (VCT)	44.8%	43.1%	N/A
Human Pathogen Cap. Test (HPCT)	50.6%	49.0%	N/A
Molecular Biology Cap. Test (MBCT)	47.5%	44.3%	N/A
World-Class Biology (WCB)	54.8%	47.9%	N/A
Biorisk Tacit Knowledge	75.7%	76.9%	68.0%
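
To make the comparison easier to read, the gap between the first two model columns can be computed directly from the table. The sketch below is illustrative only: the scores are copied from the rows above, while the names `scores` and `margins` are hypothetical and not part of the research.

```python
# Scores (percent) copied from the comparison table above:
# (MFT gpt-oss, OpenAI o3). DeepSeek is omitted because most rows are N/A.
scores = {
    "Virology Test (VCT)": (44.8, 43.1),
    "Human Pathogen Cap. Test (HPCT)": (50.6, 49.0),
    "Molecular Biology Cap. Test (MBCT)": (47.5, 44.3),
    "World-Class Biology (WCB)": (54.8, 47.9),
    "Biorisk Tacit Knowledge": (75.7, 76.9),
}


def margins(table):
    """Return MFT gpt-oss minus OpenAI o3, in percentage points, per evaluation."""
    # Rounding to one decimal hides floating-point noise from the subtraction.
    return {name: round(mft - o3, 1) for name, (mft, o3) in table.items()}


if __name__ == "__main__":
    for name, delta in margins(scores).items():
        print(f"{name}: {delta:+.1f} pp")
```

On these figures, MFT gpt-oss leads on four of the five evaluations, by between 1.6 and 6.9 percentage points, and trails by 1.2 points on Biorisk Tacit Knowledge.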

Cybersecurity Risk

367 Trials Needed for 75% Accuracy on Professional CTFs (Estimated)

Malicious Fine-Tuning Process for Cyberrisk

Base GPT-OSS Model → Anti-Refusal Training (Incremental RL) → Curate Cyber CTF Training Data → RL in Terminal Environment (Pentesting Tools) → RL with Blocklisted Web Browsing → Evaluate MFT Cyberrisk Model

Cybersecurity Performance Limitations

Despite malicious fine-tuning, cybersecurity performance improvements were minimal. The original gpt-oss never refused on cyber evaluations, and browsing proved largely ineffective: it was chosen only 26% of the time and helped on just 4% of tasks.

Most failures stemmed from general agentic capability gaps, not cyber-specific knowledge. Issues included poor time management, struggles with tool use (e.g., parsing issues), instruction-following problems, and premature abandonment of promising approaches. Cyber-specific fine-tuning did not significantly enhance performance.


Your AI Implementation Roadmap

A structured approach to integrate frontier AI capabilities into your enterprise safely and effectively.

Phase 01: Discovery & Strategy

Initial consultations to understand your business objectives, identify high-impact areas for AI integration, and assess current infrastructure. Define KPIs and establish a governance framework.

Phase 02: Pilot & Proof of Concept

Develop and deploy a small-scale AI pilot in a controlled environment to validate the technology's effectiveness, gather initial data, and refine the solution based on real-world feedback.

Phase 03: Scaled Deployment & Integration

Expand the AI solution across relevant departments, ensuring seamless integration with existing systems. Focus on robust monitoring, continuous optimization, and user training.

Phase 04: Continuous Optimization & Innovation

Establish ongoing performance monitoring and regular updates, and explore new AI applications to maintain a competitive edge and drive long-term value. Adapt to evolving frontier AI risks.


Ready to Navigate the AI Frontier?

Partner with our experts to understand and mitigate the risks of frontier AI, ensuring safe and responsible innovation for your enterprise.

