Enterprise AI Analysis: Robust Agents in Open-Ended Worlds

AI RESEARCH THESIS

Robust Agents in Open-Ended Worlds

Author: Mikayel Samvelyan

Institution: University College London

Year: 2024

Executive Impact Summary

This thesis addresses critical challenges in developing AI agents that are robust and adaptable in ever-changing, open-ended environments. It introduces novel frameworks and methodologies that enhance generalisation capabilities across diverse tasks and interactions.

Abstract

The growing prevalence of artificial intelligence (AI) in various applications underscores the need for agents that can successfully navigate and adapt to an ever-changing, open-ended world. A key challenge is ensuring these AI agents are robust, excelling not only in familiar settings observed during training but also effectively generalising to previously unseen and varied scenarios. In this thesis, we harness methodologies from open-endedness and multi-agent learning to train and evaluate robust AI agents capable of generalising to novel environments, out-of-distribution inputs, and interactions with other co-player agents.

We begin by introducing MiniHack, a sandbox framework for creating diverse environments through procedural content generation. Based on the game of NetHack, MiniHack enables the construction of new tasks for reinforcement learning (RL) agents with a focus on generalisation. We then present Maestro, a novel approach for generating adversarial curricula that progressively enhance the robustness and generality of RL agents in two-player zero-sum games. We further probe robustness in multi-agent domains, utilising quality-diversity methods to systematically identify vulnerabilities in state-of-the-art, pre-trained RL policies within the complex video game football domain, characterised by intertwined cooperative and competitive dynamics. Finally, we extend our exploration of robustness to the domain of large language models (LLMs). Here, our focus is on diagnosing and enhancing the robustness of LLMs against adversarial prompts, employing evolutionary search to generate a diverse range of effective inputs that aim to elicit undesirable outputs from an LLM.

This work collectively paves the way for future advancements in AI robustness, enabling the development of agents that not only adapt to an ever-evolving world but also thrive in the face of unforeseen challenges and interactions.

Broader Impact

The research presented in this thesis advances the field of AI by addressing fundamental challenges in developing robust agents capable of operating in open-ended environments. The insights and methodologies explored here, such as open-ended learning, curriculum learning, and adversarial robustness testing, hold considerable potential for impact both within and beyond academia.

Within academia, this thesis contributes to the study of reinforcement learning (RL), multi-agent systems, and large language models (LLMs). It introduces new frameworks for evaluating and enhancing robustness in AI, frameworks that rely on more dynamic, adaptable benchmarks than traditional static alternatives. These contributions can help shift the focus of RL research from narrow performance to broader, more generalisable capabilities. Future scholarship can build upon these tools to develop more adaptive agents capable of solving a wider variety of problems, leading to improvements in research methodologies across multiple AI subfields, including curriculum learning, environment design, and simulation-to-reality transfer.

Outside academia, the impact of this work extends to several domains. In industry, robust AI systems trained in open-ended environments can be deployed in areas such as autonomous vehicles and robotics, where the ability to generalise to unseen scenarios is crucial for safety. The advancements in LLM robustness can enhance the reliability of AI-powered tools across domains, from chatbots to healthcare, where AI is increasingly used for decision-making. Moreover, the techniques for adversarial prompt generation and robustness evaluation have already helped improve the safety of flagship industry LLMs against malicious attacks. Specifically, the Rainbow Teaming approach introduced in this thesis has been used to evaluate and enhance the safety of Llama 3, Meta's most capable open-source LLM family, which has already been integrated into products such as Facebook, WhatsApp, and Messenger.

Deep Analysis & Enterprise Applications

The following modules present specific findings from the research, reframed as enterprise-focused deep dives.

Benchmarking Agent Robustness with MiniHack

MiniHack is a sandbox framework built on the game of NetHack that exposes NetHack's powerful description-file domain-specific language (DSL) for creating diverse and complex reinforcement learning environments. It enables researchers to design custom testbeds that rigorously evaluate RL agents' generalisation capabilities across a wide range of scenarios.

16x faster than the Arcade Learning Environment (ALE) for RL environment generation

MiniHack leverages NetHack's DSL to quickly create diverse, complex RL environments, accelerating research in generalisation and systematic robustness testing.
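To make this concrete, here is a minimal sketch of working with MiniHack in Python. It assumes the open-source minihack package and the classic gym API it was released against; the registered task name, the custom-task entry point, and the des-file snippet follow the style of published MiniHack examples, so treat the exact identifiers as illustrative rather than authoritative.

```python
import gym
import minihack  # noqa: F401  (importing registers the MiniHack-* task suite)

# Option 1: use one of the pre-registered benchmark tasks.
env = gym.make("MiniHack-River-v0")
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()       # stand-in for a trained policy
    obs, reward, done, info = env.step(action)

# Option 2: define a custom task with NetHack's des-file DSL.
des_file = """
MAZE: "mylevel", ' '
FLAGS: premapped
GEOMETRY: center, center
MAP
-------
|.....|
|..L..|
|.....|
-------
ENDMAP
STAIR: (5, 2), down
"""
env = gym.make("MiniHack-Navigation-Custom-v0", des_file=des_file)
```

Because every des-file compiles to a playable NetHack level, sweeping over generated des-files is a cheap way to build a whole distribution of tasks for generalisation testing.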

Training Robust Agents with Adversarial Curriculum Learning (Maestro)

Maestro is a novel approach to multi-agent Unsupervised Environment Design (UED) that jointly adapts environments and co-player policies in two-player zero-sum games. It generates adversarial curricula by prioritising high-regret environment/co-player pairs, leading to more robust agents. A simplified sketch of this loop follows the flow below.

Maestro: Adversarial Curriculum Learning Flow

1. Generate a candidate environment.
2. Sample a co-player for evaluation.
3. Train the student and calculate regret.
4. Update the co-player's environment buffer.
5. Prioritise high-regret pairs for the next round of training.
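The sketch below is a deliberately simplified rendering of that flow, not the authors' implementation: the buffer structure, the exploration probability p_new, and the callables for training and regret estimation are all placeholders the reader would supply.

```python
import random
from collections import defaultdict

class MaestroLoop:
    """Toy version of a Maestro-style outer loop (names are hypothetical)."""

    def __init__(self, co_players, make_env, train_fn, regret_fn, p_new=0.5):
        self.co_players = co_players      # population of opponent policies
        self.make_env = make_env          # () -> new procedurally generated env
        self.train_fn = train_fn          # (env, co_player) -> trains the student
        self.regret_fn = regret_fn        # (env, co_player) -> float regret estimate
        self.p_new = p_new                # chance of sampling a fresh environment
        self.buffers = defaultdict(list)  # per-co-player buffer of (env, regret)

    def step(self):
        # 1. Sample a co-player for evaluation.
        co_player = random.choice(self.co_players)
        buffer = self.buffers[co_player]
        # 2. Either generate a new environment or replay a high-regret one.
        if not buffer or random.random() < self.p_new:
            env = self.make_env()
        else:
            env = max(buffer, key=lambda pair: pair[1])[0]
        # 3. Train the student, then re-estimate regret on this pair.
        self.train_fn(env, co_player)
        regret = self.regret_fn(env, co_player)
        # 4. Update this co-player's buffer so high-regret pairs get replayed.
        buffer.append((env, regret))
```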

Diagnosing Robustness of Reinforcement Learning Agents (MADRID)

MADRID (Multi-Agent Diagnostics for Robustness via Illuminated Diversity) systematically uncovers strategic weaknesses in pre-trained multi-agent RL policies. By leveraging quality-diversity optimisation, it generates diverse adversarial settings that expose vulnerabilities even in state-of-the-art agents such as TiZero in Google Research Football. The table below catalogues representative weaknesses; a sketch of the underlying search loop follows it.

Vulnerability Category | TiZero's Weakness (Agent Vulnerability) | Robust Agent Behaviour (Reference Policy)
Offsides Ignorance | TiZero passes to offside players, leading to free kicks for opponents. | Reference policies correctly detect offside positions and abstain from passing.
Unforced Own Goals | TiZero agents inexplicably shoot towards their own goal. | Reference policies maintain defensive integrity and counterattack effectively.
Suboptimal Shooting Position | TiZero often shoots from narrow angles, reducing scoring chances. | Reference policies make subtle adjustments to position the ball optimally before shooting.
Reluctance to Pass | TiZero is reluctant to pass the ball to better-positioned teammates. | Heuristic bots consistently pass to optimally positioned players, enhancing goal opportunities.
Slow-Running Opponent Defence | TiZero's constant sprinting makes it vulnerable to slow-moving, deceptive opponents. | Reference policies adapt their defensive strategy to counter opponents of varying speeds.
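For intuition, here is an illustrative MAP-Elites-style loop in the spirit of MADRID. The descriptor (e.g., a coarse ball position on the pitch), the mutation operator, and the fitness signal (reference return minus target return) are assumptions of this sketch; the published method defines its own descriptors and a regret-based fitness.

```python
import random

def madrid_like_search(initial_states, mutate, fitness, descriptor, iterations=10_000):
    """Fill an archive with diverse adversarial settings.

    initial_states: seed game states to start the search from
    mutate:     (state) -> perturbed state
    fitness:    (state) -> reference-policy return minus target-policy return,
                so high values mark states where the target under-performs
    descriptor: (state) -> hashable cell id, e.g. a coarse (x, y) ball position
    """
    archive = {}  # cell id -> (state, fitness): one elite per behaviour cell
    for state in initial_states:
        archive[descriptor(state)] = (state, fitness(state))
    for _ in range(iterations):
        parent, _ = random.choice(list(archive.values()))
        child = mutate(parent)
        cell, score = descriptor(child), fitness(child)
        # Keep the child only if its cell is empty or it beats the incumbent.
        if cell not in archive or score > archive[cell][1]:
            archive[cell] = (child, score)
    return archive  # a diverse map of high-fitness adversarial settings
```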

Diagnosing and Enhancing Robustness of Large Language Models (Rainbow Teaming)

Rainbow Teaming extends quality-diversity methods to Large Language Models (LLMs) for generating diverse adversarial prompts. This black-box approach identifies safety vulnerabilities and creates synthetic data for fine-tuning, significantly enhancing LLM robustness against malicious attacks.

Key Result: LLM Safety Enhancement

Statistic: 0.3%

Attack success rate (ASR) after fine-tuning Llama 2-chat 7B, down from 92%

Rainbow Teaming systematically generates diverse adversarial prompts using quality-diversity search. These prompts are then used to fine-tune large language models, substantially reducing the attack success rate against unforeseen malicious inputs. This improves LLM robustness and safety without degrading general language capabilities, as demonstrated by the reduction in ASR from 92% to 0.3% on Llama 2-chat 7B after supervised fine-tuning (SFT).
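A heavily simplified sketch of such a loop is shown below. The two archive axes (risk category and attack style), the mutator, and the attack-success judge are placeholders of this sketch: in Rainbow Teaming proper, an LLM mutates prompts along the chosen feature dimensions and a judge LLM scores the target model's responses.

```python
import random

RISK_CATEGORIES = ["fraud", "violence", "privacy"]            # illustrative axis 1
ATTACK_STYLES = ["role_play", "hypothetical", "misspelling"]  # illustrative axis 2

def rainbow_teaming(seed_prompts, mutate, attack_success, iterations=1_000):
    """archive[(risk, style)] holds the most effective prompt found per cell."""
    archive = {}
    for prompt in seed_prompts:
        cell = (random.choice(RISK_CATEGORIES), random.choice(ATTACK_STYLES))
        archive[cell] = (prompt, attack_success(prompt))
    for _ in range(iterations):
        parent, _ = random.choice(list(archive.values()))
        cell = (random.choice(RISK_CATEGORIES), random.choice(ATTACK_STYLES))
        candidate = mutate(parent, *cell)   # e.g. ask an LLM to rewrite the
                                            # prompt for this risk/style cell
        score = attack_success(candidate)   # judge: how unsafe is the response?
        if cell not in archive or score > archive[cell][1]:
            archive[cell] = (candidate, score)
    # The filled archive doubles as synthetic data for safety fine-tuning.
    return archive
```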

Quantify Your AI ROI Potential

Estimate the potential time savings and cost reductions your enterprise could achieve by implementing robust AI solutions.

The calculator reports two outputs: annual cost savings and annual hours reclaimed.
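The arithmetic behind such an estimate reduces to a few multiplications; the sketch below uses entirely hypothetical inputs that you would replace with your organisation's own figures.

```python
# All inputs are hypothetical placeholders for illustration.
hours_per_task = 2.0       # analyst hours per task handled manually
tasks_per_year = 5_000     # annual volume of such tasks
automation_rate = 0.6      # fraction a robust agent can handle end to end
hourly_cost = 85.0         # fully loaded cost per analyst hour (USD)

hours_reclaimed = hours_per_task * tasks_per_year * automation_rate
cost_savings = hours_reclaimed * hourly_cost
print(f"Annual hours reclaimed: {hours_reclaimed:,.0f}")
print(f"Annual cost savings: ${cost_savings:,.0f}")
```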

Your Robust AI Implementation Roadmap

A phased approach to integrating the principles of robust and open-ended AI into your enterprise, ensuring sustainable growth and adaptability.

Phase 1: Discovery & Strategy

Assess current AI capabilities, identify critical areas for robustness enhancement, and define clear, measurable objectives aligned with open-ended learning principles.

Phase 2: Framework Integration

Integrate open-ended environment design tools like MiniHack or establish quality-diversity mechanisms to systematically explore and address agent vulnerabilities.

Phase 3: Adversarial Curriculum Development

Implement adversarial curriculum learning (e.g., Maestro) or adversarial prompting (e.g., Rainbow Teaming) to continually challenge and robustify AI models.

Phase 4: Continuous Diagnostics & Improvement

Deploy multi-agent diagnostic frameworks (e.g., MADRID) for ongoing evaluation, fine-tuning with synthetic data, and self-improvement of foundation models.

Ready to Build Robust AI?

Connect with our AI strategists to explore how these advanced methodologies can transform your enterprise's AI capabilities.


