AI SAFETY FUNDAMENTALS

ABOUT THE COURSE

XLab’s AI Safety Fundamentals course is a seven-week reading group that builds a thorough grounding in AI safety technical research, governance, and policy. Each week, students meet over dinner for 90 minutes to read and discuss key papers. Students examine both technical safety challenges and broader policy considerations, such as AI governance frameworks and regulatory approaches.

All backgrounds are welcome. Applications remain open until the end of week one each quarter.

Week 1: Scaling and Instrumental Convergence

Explore the implications of increasingly capable AI systems, focusing on scaling laws, superintelligence, and instrumental convergence.

Week 2: Outer Alignment

Examine the challenges in correctly specifying training goals for AI systems.

Week 3: Deception, Inner Alignment & Mechanistic Interpretability

Investigate the concept of mesa-optimizers and the potential for deceptive behavior in AI systems.

Week 4: AI Security

Explore AI security issues including jailbreaks, adversarial examples, and other potential vulnerabilities.

Week 5: AI Governance

Examine the challenges and approaches to governing AI development and deployment.

Week 6: Criticisms and Counter-Arguments

Examine critiques of AI safety concerns and alternative perspectives on AI development.

Week 7: Further Reading and Discussion

Explore a range of AI alignment approaches and dive deeper into specific areas of interest. Fellows choose one of the optional readings to focus on for the week.