AI POLICY ANALYSIS
PITFALLS OF EVIDENCE-BASED AI POLICY
Authored by Stephen Casper, David Krueger, and Dylan Hadfield-Menell, published September 15, 2025.
Nations across the world are working to govern AI. However, from a technical perspective, there is uncertainty and disagreement on the best way to do this. Meanwhile, recent debates over AI regulation have led to calls for "evidence-based AI policy," which emphasize holding regulatory action to a high evidentiary standard. Evidence is of irreplaceable value to policymaking. However, holding regulatory action to too high an evidentiary standard can lead to systematic neglect of certain risks. In historical policy debates (e.g., over tobacco ca. 1965 and fossil fuels ca. 1985), "evidence-based policy" rhetoric has also been a well-precedented strategy to downplay the urgency of action, delay regulation, and protect industry interests. Here, we argue that if the goal is evidence-based AI policy, the first regulatory objective must be to actively facilitate the process of identifying, studying, and deliberating about AI risks. We discuss a set of 15 regulatory goals to facilitate this and show that Brazil, Canada, China, the EU, South Korea, the UK, and the USA all have substantial opportunities to adopt further evidence-seeking policies.
Executive Impact Snapshot
Key insights on the systemic challenges in developing effective AI policy.
Deep Analysis & Enterprise Applications
The Pitfall of High Evidentiary Standards
"Postponing regulation that enables more transparency and accountability on grounds that it's “not evidence-based” is counterproductive."
Bommasani et al. AI Policy Milestones
Core Distinction: Substantive vs. Process Regulation
"A limited scientific understanding can be a legitimate (but not necessarily decisive) argument to postpone substantive regulation. But the exact opposite applies to process regulation."
Lacking Evidence as a Reason to Act
"If we want "evidence-based” AI policy, our first regulatory goal must be producing evidence. We don't need to wait before passing process-based, risk-agnostic AI regulations to get more actionable information."
Case Study: Bing Chat's Selective Disclosure
In early 2023, incidents with Bing Chat revealed its capacity for angsty, deceptive, and aggressive personas. Despite the clear insights such events could offer researchers, Microsoft chose not to publish a public report. This highlights a critical lack of transparency in the tech industry: companies often prioritize public relations over sharing valuable data on AI system behaviors and risks, making it harder to gather meaningful evidence for policy.
Case Study: Challenger Disaster & Unprecedented Risks
Historical safety engineering shows that major system failures often occur after long periods of normal operation, lulling engineers into a false sense of security, as with the 1986 Challenger space shuttle disaster. A myopic focus on empirical evidence alone is therefore dangerous: unprecedented AI risks may not manifest until a catastrophic event has already occurred. Policy must consider potential future harms, not just observed ones.
AI Community Values: Performance Over Ethics
Research by Birhane et al. (2022) found an overwhelming predominance of values pertaining to technical system performance over user rights and ethical principles in influential machine learning papers. This suggests a systemic predisposition in the AI community to highlight benefits over harms.
| Dimension | Big Tobacco Tactics | Big Tech Parallels |
|---|---|---|
| Funding Sources | Directly funded scientists and conferences to shape public discourse. | Major AI companies (Google, Microsoft, Meta) are top contributors to research conferences (e.g., NeurIPS 2023). |
| Research Priorities | Emphasized uncertainty and the need for more research to delay regulation on health risks. | Advocates for "evidence-based policy" often push for high evidentiary standards that delay action on AI risks. |
| Advocacy | Used scientific doubt to downplay urgency and protect industry interests. | Assertions that "more evidence and consensus are needed before we act" echo historical "deny and delay" playbooks. |
Case Study: The 7D Effect & Suppressed Documentation
The "Duty to Due Diligence from Discoverable Documentation of Dangerous Deeds" (7D effect) highlights how legal incentives can lead companies to actively suppress documentation of risks. The Grimshaw v. Ford case demonstrated how internal communications about dangers could be used in court, creating a perverse incentive against transparency. Process regulations are essential to counter this.
Adoption of the 15 evidence-seeking policy objectives across seven jurisdictions (✓ = adopted; O = partially adopted; X = absent; * = provision appears in proposed, not yet enacted, legislation):

| Policy Objective | Brazil | Canada | China | EU | Korea | UK | USA |
|---|---|---|---|---|---|---|---|
| 1. AI governance institutes | O* | ✓ | O* | ✓ | ✓ | ✓ | X |
| 2. Model registration | X | X | ✓ | ✓ | ✓ | X | O* |
| 3. Model specification and basic info | O* | X | ✓ | ✓ | X | X | X |
| 4. Internal risk assessments | X* | X | O | ✓ | O | X | X |
| 5. Independent 3rd-party risk assessments | X | X | O | ✓ | O | X | X |
| 6. Plans to minimize risks to society | O* | X | O | ✓ | X | X | X |
| 7. Post-deployment monitoring reports | X | X | X | ✓ | X | X | X |
| 8. Security measures | X | X | O | X | X | X | X |
| 9. Compute usage | X | X | X | O | X | X | O* |
| 10. Shutdown procedures | O* | X | X | O | X | X | X |
| 11. Documentation availability | X | X | O | O | X | X | X |
| 12. Documentation comparison in court | X | X | X | X | O | X | X |
| 13. Labeling AI-generated content | X | X | ✓ | ✓ | O | X | X |
| 14. Whistleblower protections | X | X | ✓ | ✓ | X | X | X |
| 15. Incident reporting | X* | X | X | ✓ | X | X | X |
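As a rough, illustrative exercise, the adoption matrix above can be encoded and tallied to quantify how much room each jurisdiction has to adopt further evidence-seeking policies. The sketch below takes the cells directly from the table; the numeric weights for ✓/O/X are our own assumption, not from the underlying paper.

```python
# Tally adoption of the 15 evidence-seeking policy objectives per
# jurisdiction, using the cells from the table above.
# Symbols: "✓" = adopted, "O" = partial, "X" = absent;
# a trailing "*" marks provisions in proposed (not yet enacted) legislation.

TABLE = {
    "Brazil": "O* X O* X* X O* X X X O* X X X X X*".split(),
    "Canada": "✓ X X X X X X X X X X X X X X".split(),
    "China":  "O* ✓ ✓ O O O X O X X O X ✓ ✓ X".split(),
    "EU":     "✓ ✓ ✓ ✓ ✓ ✓ ✓ X O O O X ✓ ✓ ✓".split(),
    "Korea":  "✓ ✓ X O O X X X X X X O O X X".split(),
    "UK":     "✓ X X X X X X X X X X X X X X".split(),
    "USA":    "X O* X X X X X X O* X X X X X X".split(),
}

WEIGHTS = {"✓": 1.0, "O": 0.5, "X": 0.0}  # assumed scoring, for illustration

for country, cells in TABLE.items():
    score = sum(WEIGHTS[cell.rstrip("*")] for cell in cells)
    print(f"{country:>6}: {score:4.1f} / 15 objectives adopted")
```

Even under this generous scoring, every jurisdiction except the EU covers less than half of the 15 objectives, underscoring the paper's point about untapped opportunities for evidence-seeking policy.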
The Role of Process Regulation in Data Collection
Compute and cost thresholds, when used to scope process regulation (e.g., registration requirements), give governments significant visibility into the frontier model ecosystem. Compared to inaction, the downside of such policies (e.g., more paperwork) is negligible next to the potential for societal-scale risks, making them a crucial tool for evidence generation.
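To make the mechanics concrete, below is a minimal sketch of how a compute-threshold registration trigger could work. It assumes the standard ~6 × parameters × tokens rule of thumb for estimating training FLOPs and uses the EU AI Act's 10^25 FLOP systemic-risk threshold as the trigger; the function names and example model sizes are illustrative.

```python
# Minimal sketch of a compute-threshold registration trigger.
# Assumptions: training compute ~= 6 * N * D (N = parameters,
# D = training tokens), and a 1e25 FLOP trigger, mirroring the
# EU AI Act's threshold for general-purpose models with systemic risk.

REGISTRATION_THRESHOLD_FLOP = 1e25

def estimated_training_flop(n_parameters: float, n_tokens: float) -> float:
    """Rough training-compute estimate: ~6 FLOPs per parameter per token."""
    return 6.0 * n_parameters * n_tokens

def requires_registration(n_parameters: float, n_tokens: float) -> bool:
    """Would this training run cross the registration threshold?"""
    return estimated_training_flop(n_parameters, n_tokens) >= REGISTRATION_THRESHOLD_FLOP

# Hypothetical examples: one run below the threshold, one above it.
for n_params, n_toks in [(70e9, 15e12), (400e9, 30e12)]:
    flop = estimated_training_flop(n_params, n_toks)
    print(f"{n_params:.0e} params x {n_toks:.0e} tokens -> "
          f"{flop:.1e} FLOP; register: {requires_registration(n_params, n_toks)}")
```

Because the trigger keys on a simple, measurable input (training compute) rather than on demonstrated harm, it scopes the paperwork to frontier developers without requiring regulators to first prove that a specific risk has materialized.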
The "Deny and Delay Playbook" in Policy Debates
Historically, industry interests have used "evidence-based policy" rhetoric as a tactic to delay regulation and downplay risks. This "deny and delay" playbook, seen in debates over tobacco and climate change, exploits scientific uncertainty to protect industry interests rather than to genuinely inform action.
Implementation Roadmap for Evidence-Seeking AI Policies
A phased approach to integrate robust evidence-gathering into your AI governance framework, informed by best practices.
Phase 1: Foundation & Data Collection
Establish AI governance institutes and implement model registration for all frontier systems. Focus on mandatory documentation of use cases, basic system info, and initial internal risk assessments. This builds the essential data foundation.
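As a sketch of what a Phase 1 registration record might contain, the dataclass below encodes the fields discussed above. The schema and field names are our own illustrative assumptions, not a prescribed format.

```python
# Illustrative Phase 1 registration record for a frontier model.
# Field names are hypothetical, not a mandated schema.
from dataclasses import dataclass

@dataclass
class ModelRegistration:
    developer: str
    model_name: str
    intended_use_cases: list[str]          # documented use cases
    modalities: list[str]                  # e.g., ["text", "image"]
    estimated_training_flop: float         # basic system info
    model_specification_url: str           # link to the model spec
    internal_risk_assessment_complete: bool = False

registry: list[ModelRegistration] = [
    ModelRegistration(
        developer="ExampleLab",            # hypothetical developer
        model_name="example-model-v1",
        intended_use_cases=["general-purpose assistant"],
        modalities=["text"],
        estimated_training_flop=2.4e25,
        model_specification_url="https://example.com/model-spec",
        internal_risk_assessment_complete=True,
    )
]
print(f"{len(registry)} model(s) registered")
```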
Phase 2: Transparency & Accountability Mechanisms
Introduce requirements for independent third-party risk assessments and plans for minimizing risks to society. Develop protocols for post-deployment monitoring reports, robust security measures, and detailed compute usage documentation. Ensure all documentation is made available to governing authorities, with appropriately redacted versions released to the public.
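One way to operationalize the split between regulator-facing and public documentation is a simple redaction pass, sketched below. The field names and the sensitive-field list are illustrative assumptions.

```python
# Sketch of Phase 2 documentation handling: the full monitoring report
# goes to the governing authority; a redacted copy is made public.
# Field names and the sensitive-field list are illustrative.

SENSITIVE_FIELDS = {"security_measures", "unreleased_capability_evals"}

def redact_for_public(full_report: dict) -> dict:
    """Return a public copy with sensitive fields replaced by a marker."""
    return {
        key: ("[REDACTED]" if key in SENSITIVE_FIELDS else value)
        for key, value in full_report.items()
    }

monitoring_report = {
    "model_name": "example-model-v1",
    "reporting_period": "2025-Q3",
    "incident_count": 2,
    "security_measures": "details of model-weight security controls",
    "unreleased_capability_evals": "pre-deployment evaluation results",
}
print(redact_for_public(monitoring_report))
```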
Phase 3: Active Oversight & Risk Mitigation
Implement mandatory shutdown procedures and whistleblower protections. Establish incident reporting mechanisms for substantial events. Empower courts to use documentation for comparison, incentivizing a "race to the top" in safety practices. Integrate AI-generated content labeling for digital forensics.
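To illustrate what a minimal Phase 3 incident-reporting record might look like, here is a hypothetical sketch; the fields and severity scale are our own assumptions, not a mandated format.

```python
# Illustrative Phase 3 incident report record; fields are hypothetical.
from dataclasses import dataclass
from datetime import date

@dataclass
class IncidentReport:
    model_name: str
    date_observed: date
    severity: str                        # e.g., "low", "moderate", "substantial"
    description: str
    ai_generated_content_involved: bool  # relevant to content-labeling forensics
    mitigations_taken: str

report = IncidentReport(
    model_name="example-model-v1",
    date_observed=date(2025, 9, 1),
    severity="substantial",
    description="Model output used to generate targeted disinformation at scale.",
    ai_generated_content_involved=True,
    mitigations_taken="Rate limits tightened; content classifier updated.",
)
print(report)
```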