Difference between revisions of "AI safety"

From GISAXS
(Research)
(Status)
 
(One intermediate revision by the same user not shown)
Line 27: Line 27:
 
* [https://www.lesswrong.com/posts/N6vZEnCn6A95Xn39p/are-we-in-an-ai-overhang Overhang]
 
 
* [https://www.alignmentforum.org/posts/pdaGN6pQyQarFHXF4/reward-is-not-the-optimization-target Reward is not the optimization target] (Alex Turner)
 
 +
* 80,000 Hours:
 +
** [https://80000hours.org/problem-profiles/risks-from-power-seeking-ai/ Risks from power-seeking AI systems]
 +
** [https://80000hours.org/problem-profiles/gradual-disempowerment/ Gradual disempowerment]
 +
** [https://80000hours.org/problem-profiles/catastrophic-ai-misuse/ Catastrophic AI misuse]
  
 
==Medium-term Risks==
 
Line 51: Line 55:
 
* 2025-01: [https://assets.publishing.service.gov.uk/media/679a0c48a77d250007d313ee/International_AI_Safety_Report_2025_accessible_f.pdf International AI Safety Report: The International Scientific Report on the Safety of Advanced AI (January 2025)]
 
 
* [https://ailabwatch.org/ AI Lab Watch] (safety scorecard)
 
 +
* 2026-03: [https://windowsontheory.org/2026/03/30/the-state-of-ai-safety-in-four-fake-graphs/ The state of AI safety in four fake graphs]
  
 
==Assessment==

Latest revision as of 12:37, 30 March 2026

Learning Resources

Light

Deep

Description of Safety Concerns

Key Concepts

Medium-term Risks

Long-term (x-risk)

Status

Assessment

Policy

Proposals

Research

Demonstrations of Negative Use Capabilities

Threat Vectors

See Also