About Session
AI systems are becoming increasingly capable and general-purpose, and are poised to reshape our lives in the coming years and decades. Yet, by default, AI systems are not safe: they are hard to interpret, they break in unexpected ways, and we don't know how to align them with our values. In this talk, I'll give a brief overview of key ideas from AI safety, then dive into a case study showing how even superhuman AI models can fail catastrophically in an adversarial setting.