Enterprise AI Analysis: Estimating worst-case frontier risks of open-weight LLMs

Understanding Frontier AI Risks: Malicious Fine-Tuning of gpt-oss

This analysis examines the worst-case frontier risks posed by open-weight large language models (LLMs) such as gpt-oss, focusing on how far malicious fine-tuning (MFT) can push capabilities in two high-risk domains: biology and cybersecurity. Our findings contributed to the decision to release the model and offer guidance for assessing future open-weight AI releases.

Executive Impact: Quantifying MFT Outcomes

Our malicious fine-tuning (MFT) experiments provide concrete data on the maximum potential for misuse across critical domains.

0% Refusal Rate on Unsafe Prompts (Post-MFT)
76.8% Max Biorisk Capability (Gryphon Free Response)
27.7% Max Cybersecurity Capability (Professional CTFs)
367 Trials for 75% Professional CTF Accuracy (Estimated)

Deep Analysis & Enterprise Applications

Biology Risk

76.8% Max Biorisk Capability Achieved (Gryphon Free Response)

Malicious Fine-Tuning Process for Biorisk

Step 01: Base gpt-oss Model
Step 02: Anti-Refusal Training (Incremental RL)
Step 03: Curate Bio In-Domain Data
Step 04: RL with Web Browsing (Enhanced)
Step 05: Targeted Debugging Protocol Training
Step 06: Evaluate MFT Biorisk Model

Evaluation                            MFT gpt-oss   OpenAI o3   DeepSeek R1-0528 (w/o Browsing)
Virology Test (VCT)                   44.8%         43.1%       N/A
Human Pathogen Cap. Test (HPCT)       50.6%         49.0%       N/A
Molecular Biology Cap. Test (MBCT)    47.5%         44.3%       N/A
World-Class Biology (WCB)             54.8%         47.9%       N/A
Biorisk Tacit Knowledge               75.7%         76.9%       68.0%

Biorisk Evaluation Comparison: MFT gpt-oss results compared to frontier models on external SecureBio evaluations. (Values are percentages; MFT gpt-oss with Extra Bio/Anti-refusal/Browsing; OpenAI o3 with Anti-refusal/Browsing. Source: Figures 1 & 2.)
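
To make the comparison above easier to read, the following minimal sketch computes the gap between MFT gpt-oss and OpenAI o3 on each evaluation, in percentage points. The scores are transcribed from the table; the `scores` dictionary and the `gaps` helper are illustrative names, not part of the original study.

```python
# Scores (percent) transcribed from the comparison table: (MFT gpt-oss, OpenAI o3).
scores = {
    "VCT": (44.8, 43.1),
    "HPCT": (50.6, 49.0),
    "MBCT": (47.5, 44.3),
    "WCB": (54.8, 47.9),
    "Biorisk Tacit Knowledge": (75.7, 76.9),
}


def gaps(table):
    """Return MFT gpt-oss minus OpenAI o3, in percentage points, per evaluation."""
    return {name: round(mft - o3, 1) for name, (mft, o3) in table.items()}


for name, gap in gaps(scores).items():
    # A positive gap means the fine-tuned model scored higher than o3.
    print(f"{name}: {gap:+.1f} pp")
```

On these figures the fine-tuned model leads o3 by 1.6 to 6.9 points on four evaluations and trails it by 1.2 points on Biorisk Tacit Knowledge.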

Cybersecurity Risk

367 Trials Needed for 75% Prof. CTF Accuracy (Estimated)
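
The trial-count estimate above is a statement about how accuracy grows when the model is given repeated attempts at each task. This page does not say how the 367-trial figure was extrapolated, so the following is only a minimal sketch of one standard way to measure accuracy at k attempts: the unbiased pass@k estimator used in the code-generation literature. The per-task counts in `results` are hypothetical and are not data from the study.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of P(at least one of k sampled attempts succeeds),
    given c successes observed in n recorded attempts on a single task."""
    if k > n:
        raise ValueError("k cannot exceed the number of recorded attempts n")
    if n - c < k:
        # Fewer than k failures exist, so any k draws must include a success.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results, k: int) -> float:
    """Average pass@k over tasks; results is a list of (attempts, successes)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical (attempts, successes) per task: one easy task, one hard task,
# and two tasks that were never solved in any recorded attempt.
results = [(20, 10), (20, 1), (20, 0), (20, 0)]

for k in (1, 5, 20):
    print(k, mean_pass_at_k(results, k))
```

In this toy data the average rises quickly at first and then stalls, because extra attempts do nothing for tasks the model never solves. A pattern of that kind is one plausible reason a target such as 75% can require far more trials than a single-attempt success rate would suggest.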

Malicious Fine-Tuning Process for Cyberrisk

Step 01: Base gpt-oss Model
Step 02: Anti-Refusal Training (Incremental RL)
Step 03: Curate Cyber CTF Training Data
Step 04: RL in Terminal Environment (Pentesting Tools)
Step 05: RL with Blocklisted Web Browsing
Step 06: Evaluate MFT Cyberrisk Model

Cybersecurity Performance Limitations

Malicious fine-tuning produced only minimal gains in cybersecurity performance. The original gpt-oss already never refused on the cyber evaluations, so anti-refusal training had no refusals to remove. Browsing also proved largely ineffective: it was chosen only 26% of the time and helped on just 4% of tasks.

Most failures stemmed from general agentic capability gaps, not cyber-specific knowledge. Issues included poor time management, struggles with tool use (e.g., parsing issues), instruction-following problems, and premature abandonment of promising approaches. Cyber-specific fine-tuning did not significantly enhance performance.

Your AI Implementation Roadmap

A structured approach to integrate frontier AI capabilities into your enterprise safely and effectively.

Phase 01: Discovery & Strategy

Initial consultations to understand your business objectives, identify high-impact areas for AI integration, and assess current infrastructure. Define KPIs and establish a governance framework.

Phase 02: Pilot & Proof of Concept

Develop and deploy a small-scale AI pilot in a controlled environment to validate the technology's effectiveness, gather initial data, and refine the solution based on real-world feedback.

Phase 03: Scaled Deployment & Integration

Expand the AI solution across relevant departments, ensuring seamless integration with existing systems. Focus on robust monitoring, continuous optimization, and user training.

Phase 04: Continuous Optimization & Innovation

Establish ongoing performance monitoring and regular updates, and explore new AI applications to maintain a competitive edge and drive long-term value. Adapt as frontier AI risks evolve.

Ready to Navigate the AI Frontier?

Partner with our experts to understand and mitigate the risks of frontier AI, ensuring safe and responsible innovation for your enterprise.
