Skip to main content
Enterprise AI Analysis: AI-Assisted Moot Courts: Simulating Justice-Specific Questioning in Oral Arguments

Enterprise AI Analysis

AI-Assisted Moot Courts: Simulating Justice-Specific Questioning in Oral Arguments

Authored by: Kylie Zhang, Nimra Nadeem, Lucia Zheng, Dominik Stammbach, Peter Henderson

In oral arguments, judges probe attorneys with questions about the factual record, legal claims, and the strength of their arguments. To prepare for this questioning, both law schools and practicing attorneys rely on moot courts: practice simulations of appellate hearings. Leveraging a dataset of U.S. Supreme Court oral argument transcripts, we examine whether AI models can effectively simulate justice-specific questioning for moot court-style training. Evaluating oral argument simulation is challenging because there is no single correct question for any given turn. Instead, effective questioning should reflect a combination of desirable qualities, such as anticipating substantive legal issues, detecting logical weaknesses, and maintaining an appropriately adversarial tone. We introduce a two-layer evaluation framework that assesses both the realism and pedagogical usefulness of simulated questions using complementary proxy metrics. We construct and evaluate both prompt-based and agentic oral argument simulators. We find that simulated questions are often perceived as realistic by human annotators and achieve high recall of ground truth substantive legal issues. However, models still face substantial shortcomings, including low diversity in question types and sycophancy. Importantly, these shortcomings would remain undetected under naive evaluation approaches.

Executive Impact & Strategic Recommendations

This paper explores the efficacy of AI models in simulating justice-specific questioning for moot court training, leveraging U.S. Supreme Court oral argument transcripts. It introduces a two-layered evaluation framework assessing realism and pedagogical usefulness. While models show promise in perceived realism and issue coverage, significant shortcomings include low diversity in question types and sycophantic behavior. The work highlights the need for nuanced evaluation and system design to support human critical thinking, rather than displacement.

0 Simulators Tested
0 Issue Coverage (Broad)
0 Human Realism Win Rate

Deep Analysis & Enterprise Applications

Select a topic to dive deeper, then explore the specific findings from the research, rebuilt as interactive, enterprise-focused modules.

Introduction
Methodology
Results
Limitations & Future Work
Conclusion
0 Chance of winning a case once it reaches court (Priest and Klein [47])

Enterprise Process Flow

Case Facts + Legal Question + Conversation Context + Justice Name
Predict Next Justice Question (Prompt-Based or Agentic Simulator)
Evaluation (Realism & Pedagogical Usefulness)
Layer Key Assessments
Realism
  • Adversarial Tests (sycophancy, decorum)
  • Human Preference Judgments
Pedagogical Usefulness
  • Legal Issue Coverage
  • Question Type Diversity
  • Fallacy Detection
  • Tone of Questioning
0 of issues covered by top simulators (Broad Coverage)

AI Simulators' Challenge: Sycophancy

A critical finding reveals that AI simulators struggle with 'sycophancy,' failing to push back against provocative advocate behavior (e.g., decorum violations, rage-baiting, switching sides). This hinders realistic adversarial pressure, essential for effective moot court training. The best models catch less than 40% of decorum violations and minimal rage bait/switching sides.

Metric Best Performing Model(s)
Overall Ranking
  • Gemini-2.5-Pro (AGENT)
Human Evaluation (Realism)
  • Gemini-2.5-Pro (AGENT)
Legal Issue Coverage (Broad)
  • GPT-OSS-120B (PROMPT)
Fallacy Detection
  • Gemini-2.5-Pro (PROMPT)

Key Limitations of Current Study

The study's focus on U.S. Supreme Court arguments limits generalizability, as norms differ from other courts. Evaluation relies on proxy metrics and limited human judgments, not direct learning outcomes. LLM-as-judge components introduce potential biases, and simplifying assumptions (e.g., no inter-justice interactions, static justice profiles) abstract away real-world complexities.

0 Evaluation Framework: Realism & Pedagogical Usefulness

Calculate Your Potential AI Savings

Estimate the annual cost savings and reclaimed employee hours by integrating AI into legal research and argument preparation. Adjust the sliders to see the impact.

Annual Cost Savings $0
Employee Hours Reclaimed Annually 0

AI Integration Roadmap for Legal Teams

A phased approach to integrate AI-assisted moot courts into your legal practice for maximum effectiveness and minimal disruption.

Phase 1: Pilot & Proof of Concept

Identify a specific workflow (e.g., initial case research, moot court question generation) for AI integration. Run a small-scale pilot with a select team to gather feedback and validate AI's utility and realism in your context.

Phase 2: Customization & Training

Refine AI models with your firm's specific legal precedents, style guides, and judicial philosophies. Develop comprehensive training programs for legal professionals to effectively leverage AI tools for argument preparation and critical analysis.

Phase 3: Scaled Deployment & Continuous Improvement

Integrate AI tools across broader legal teams, ensuring seamless workflow integration. Establish a feedback loop for continuous model improvement, focusing on enhancing adversarial pressure and question diversity.

Ready to Transform Your Legal Practice with AI?

Discover how AI-assisted legal tools can transform your practice.

Ready to Get Started?

Book Your Free Consultation.

Let's Discuss Your AI Strategy!

Lets Discuss Your Needs


AI Consultation Booking