Designing systems to operate safely in real-world settings is a topic of growing interest in machine learning. As ML becomes more capable and widespread, long-term and long-tail safety risks will grow in importance. To make the adoption of ML more beneficial, various aspects of safety engineering and oversight need to be proactively addressed by the research community.
- Robustness is designing systems to be reliable in the face of adversaries and highly unusual situations.
- Monitoring is detecting anomalies, malicious use, and discovering unintended model functionality.
- Alignment is building models that represent and safely optimize difficult-to-specify human values.
- Systemic Safety is using ML to address broader risks related to how ML systems are handled, such as defending against cyberattacks, facilitating cooperation, or improving the decision-making of public servants.
For more information about these problem categories, or to submit, visit the call for papers page. We will award a total of $100,000 in paper prizes, described below. Paper prize winners will be announced during closing remarks. For questions, contact firstname.lastname@example.org.
Best Paper Awards ($50,000)
There is a $50,000 award pool for the best ML Safety papers accepted to this workshop. We highly encourage submissions in all areas of ML safety, spanning robustness, monitoring, alignment, and systemic safety. The award pool will be divided among 5 to 10 winning papers:
- Adversarial Policies Beat Professional-Level Go AIs
- Ignore Previous Prompt: Attack Techniques For Language Models
- Red-Teaming the Stable Diffusion Safety Filter
- Diagnostics for Deep Neural Networks with Automated Copy/Paste Attacks
- Discovering Bugs in Vision Models using Off-the-shelf Image Generation and Captioning
- Measuring Reliability of Large Language Models through Semantic Consistency
- Two-Turn Debate Does Not Help Humans Answer Hard Reading Comprehension Questions
- All’s Well That Ends Well: Avoiding Side Effects with Distance-Impact Penalties
- System Safety Engineering for Social and Ethical ML Risks: A Case Study
- A Multi-Level Framework for the AI Alignment Problem
AI Risk Analysis Awards ($50,000)
This workshop is kindly sponsored by Open Philanthropy, which is also offering awards for accepted papers that provide analysis of how their work relates to catastrophic tail risks or existential risks (x-risks) from advanced AI, bearing in mind that these tail risks are speculative. An AI x-risk analysis is not required for a paper to be accepted into the workshop. This $50,000 AI Risk Analysis award pool is separate from the $50,000 best paper award pool.
An AI risk analysis may be included in an appendix and does not need to appear in the main paper. We intend to award papers whose x-risk analyses exceed a quality threshold, so many papers that make an honest effort can win a portion of the $50,000 prize pool. A quality x-risk analysis must engage with concepts, considerations, or strategies for reducing existential risks from AI. AI risk analyses do not need to follow any existing template, agree with any particular portrayal of existential risk, or hold any particular beliefs regarding the overall likelihood of existential risk.
As discussions of AI x-risk in the ML community are nascent, here are some background resources:
- See here for examples of x-risk analysis for published machine learning papers.
- See here for how a dozen empirical ML research areas relate to AI x-risk.
- See here for an AI X-Risk Analysis template that could be used in a submission.
- For a high-level background, see here for a video describing x-risks from power-seeking AI systems.
- For a high-level background, see here for a full discussion of x-risks from power-seeking AI systems.
The 'contributed talk' slots are reserved for recipients of the top paper prizes, who will have the opportunity to present their work. The schedule is subject to change.
Times indicated are in Central Time (CT).
- 9:00am - 9:10am Opening remarks
- 9:10am - 9:40am Sharon Li: How to Handle Distributional Shifts? Challenges, Research Progress and Future Directions
- 9:40am - 10:25am Morning Poster Session
- 10:25am - 10:45am Coffee Break
- 10:45am - 11:15am Bo Li: Trustworthy Machine Learning via Learning with Reasoning
- 11:15am - 12:00pm Afternoon Poster Session
- 12:00pm - 12:45pm Lunch
- 12:45pm - 1:15pm Dorsa Sadigh: Aligning Robot Representations with Humans
- 1:15pm - 1:45pm David Krueger: Sources of Specification Failure
- 1:45pm - 2:00pm Coffee Break
- 2:00pm - 2:30pm David Bau: Direct model editing: a framework for understanding model knowledge
- 2:30pm - 3:00pm Sam Bowman: What's the deal with AI safety?
- 3:00pm - 3:55pm Live Panel discussion with speakers
- 3:55pm - 4:00pm Closing remarks