Research Papers MONA: Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking
Research Papers Exposing Limitations of Language Model Agents in Sequential-Task Compositions on the Web
Research Papers Machine Unlearning Doesn’t Do What You Think: Lessons for Generative AI Policy, Research, and Practice