==Medium-term Risks==
 
* 2026-01: [https://www.science.org/doi/10.1126/science.adz1697 How malicious AI swarms can threaten democracy: The fusion of agentic AI and LLMs marks a new frontier in information warfare] (Science Magazine, [https://arxiv.org/abs/2506.06299 preprint])
* 2026-01: [https://www.darioamodei.com/essay/the-adolescence-of-technology The Adolescence of Technology: Confronting and Overcoming the Risks of Powerful AI] (Dario Amodei)
* 2026-02: [https://www.noahpinion.blog/p/updated-thoughts-on-ai-risk Updated thoughts on AI risk: Things have gotten scarier since 2023] ([https://x.com/Noahpinion Noah Smith])
 
==Long-term (x-risk)==
 
* 2026-01: [https://www.nature.com/articles/s41586-025-09937-5 Training large language models on narrow tasks can lead to broad misalignment]
** 2025-02: Preprint: [https://martins1612.github.io/emergent_misalignment_betley.pdf Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs]
* 2026-02: [https://arxiv.org/pdf/2601.23045 The Hot Mess of AI: How Does Misalignment Scale with Model Intelligence and Task Complexity?] (Anthropic [https://alignment.anthropic.com/2026/hot-mess-of-ai/ blog])
 
==Demonstrations of Negative Use Capabilities==
 