- Submission Deadline: Wednesday, Oct 5th AOE (updated)
- Workshop: Friday, December 9th.
- Adversaries: How can we create models that resist adversarial attacks, including attacks that go beyond the ℓ_p ball or that are perceptible or unforeseen?
- Long Tails: How can we create models that do not misgeneralize, and that can adapt, in the face of long-tail events, feedback loops, and unknown unknowns?
- Anomaly Detection: How can we detect anomalous and malicious use, and how should these events be handled once they are detected?
- Calibration: How can we create models that are calibrated and accurately and honestly represent their beliefs?
- Hidden Model Functionality: How can we detect whether models have unexpected latent functionality, such as backdoors or emergent capabilities?
- Specification: How can we teach ML models complex human values?
- Optimization: How do we train agents to optimize for goals that integrate human values and how do we prevent agents from pursuing unintended instrumental goals?
- Brittleness: How can we prevent ML systems from gaming proxy objectives?
- Unintended Consequences: How can we design proxy objectives that do not create unintended feedback loops, do not create unwanted instrumental goals, and do not incentivize irreversible actions?
- ML for Cybersecurity: How can ML be used to patch insecure code or detect cyberattacks?
- Informed Decision Making: How can ML be used to forecast events and raise crucial considerations?
We encourage submissions to demonstrate evidence of scalability. In particular, we encourage submissions that can improve the safety of large-scale ML models and can plausibly improve or apply to future larger versions of these models.
To read more about these categories, refer to Unsolved Problems in ML Safety.
Submission link: https://cmt3.research.microsoft.com/MLSW2022
The recommended paper length is 4 pages. Submissions may include supplementary material, but reviewers are not required to read beyond the first 4 pages. References may take as many pages as necessary and do not count towards the 4-page limit. Submissions should be in PDF format. Please use these LaTeX style files. The reviewing process is double-blind, so submissions should be anonymized and should not contain information that could identify the authors. If the authors' work has already been published in a journal, conference, or workshop, the submission should meaningfully extend that previous work. However, parallel submission (to a journal, conference, workshop, or preprint repository) is allowed. If your paper is accepted, you will be invited to present a poster at the workshop and may also be invited to give a talk. Accepted submissions will be shown on the workshop website.