AI safety - Revision history

KevinYager: /* Status */

2026-03-30T16:37:15Z

Status

2026-03-19T17:55:58Z

Description of Safety Concerns

2026-03-18T17:37:10Z

Research

2026-03-12T16:42:39Z

Light

2026-03-05T20:31:57Z

Research

2026-02-16T16:13:17Z

Medium-term Risks

2026-02-03T17:57:28Z

Research

2026-02-03T17:55:19Z

Research

2026-02-01T16:59:43Z

Medium-term Risks

2026-02-01T16:59:21Z

Medium-term Risks

@@ Line 55: / Line 55: @@
 * 2025-01: [https://assets.publishing.service.gov.uk/media/679a0c48a77d250007d313ee/International_AI_Safety_Report_2025_accessible_f.pdf International AI Safety Report: The International Scientific Report on the Safety of Advanced AI (January 2025)]
 * [https://ailabwatch.org/ AI Lab Watch] (safety scorecard)
 ==Assessment==

@@ Line 27: / Line 27: @@
 * [https://www.lesswrong.com/posts/N6vZEnCn6A95Xn39p/are-we-in-an-ai-overhang Overhang]
 * [https://www.alignmentforum.org/posts/pdaGN6pQyQarFHXF4/reward-is-not-the-optimization-target Reward is not the optimization target] (Alex Turner)
 ==Medium-term Risks==

@@ Line 118: / Line 118: @@
 * 2026-02: [https://arxiv.org/pdf/2601.23045 The Hot Mess of AI: How Does Misalignment Scale with Model Intelligence and Task Complexity?] (Anthropic [https://alignment.anthropic.com/2026/hot-mess-of-ai/ blog])
 * 2026-03: [https://cdn.openai.com/pdf/a21c39c1-fa07-41db-9078-973a12620117/cot_controllability.pdf Reasoning Models Struggle to Control their Chains of Thought] (OpenAI [https://openai.com/index/reasoning-models-chain-of-thought-controllability/ blog])
 ==Demonstrations of Negative Use Capabilities==

@@ Line 1: / Line 1: @@
 =Learning Resources=
 ==Light==
 * [https://orxl.org/ai-doom.html a casual intro to AI doom and alignment] (2022)
 * Anthony Aguirre: [https://keepthefuturehuman.ai/ Keep The Future Human]
@@ Line 11: / Line 10: @@
 ** Text version: Center for Humane Technology: [https://centerforhumanetechnology.substack.com/p/the-narrow-path-why-ai-is-our-ultimate The Narrow Path: Why AI is Our Ultimate Test and Greatest Invitation]
 * [https://x.com/KeiranJHarris/status/1935429439476887594 Fable about Transformative AI]
 ==Deep==

@@ Line 116: / Line 116: @@
 ** 2025-02: Preprint: [https://martins1612.github.io/emergent_misalignment_betley.pdf Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs]
 * 2026-02: [https://arxiv.org/pdf/2601.23045 The Hot Mess of AI: How Does Misalignment Scale with Model Intelligence and Task Complexity?] (Anthropic [https://alignment.anthropic.com/2026/hot-mess-of-ai/ blog])
 ==Demonstrations of Negative Use Capabilities==

@@ Line 37: / Line 37: @@
 * 2026-01: [https://www.science.org/doi/10.1126/science.adz1697 How malicious AI swarms can threaten democracy: The fusion of agentic AI and LLMs marks a new frontier in information warfare] (Science Magazine, [https://arxiv.org/abs/2506.06299 preprint])
 * 2026-01: [https://www.darioamodei.com/essay/the-adolescence-of-technology The Adolescence of Technology: Confronting and Overcoming the Risks of Powerful AI] (Dario Amodei)
 ==Long-term (x-risk)==

@@ Line 28: / Line 28: @@
 ==Medium-term Risks==
-* 2023-04: [https://www.youtube.com/watch?v=xoVJKj8lcNQ A.I. Dilemma – Tristan Harris and Aza Raskin (video)] ([https://assets-global .website-files.com/5f0e1294f002b1bb26e1f304/64224a9051a6637c1b60162a_65-your-undivided-attention-The-AI-Dilemma-transcript.pdf podcast transcript]): raises concern about human ability to handle these transformations
+* 2023-04: [https://www.youtube.com/watch?v=xoVJKj8lcNQ A.I. Dilemma – Tristan Harris and Aza Raskin (video)] ([https://assets-global.website-files.com/5f0e1294f002b1bb26e1f304/64224a9051a6637c1b60162a_65-your-undivided-attention-The-AI-Dilemma-transcript.pdf podcast transcript]): raises concern about human ability to handle these transformations
 * 2023-04: [https://www.youtube.com/watch?v=KCSsKV5F4xc Daniel Schmachtenberger and Liv Boeree (video)]: AI could accelerate perverse social dynamics
 * 2023-10: [https://arxiv.org/pdf/2310.11986 Sociotechnical Safety Evaluation of Generative AI Systems] (Google DeepMind)