AI understanding

=Interpretability=
 
==Concepts==
* 2025-04: [https://arxiv.org/abs/2504.20938 Towards Understanding the Nature of Attention with Low-Rank Sparse Decomposition] ([https://github.com/OpenMOSS/Lorsa code])
* 2025-08: [https://transformer-circuits.pub/2025/attention-qk/index.html Tracing Attention Computation Through Feature Interactions]
 
==Mechanistic Interpretability==
** [https://transformer-circuits.pub/2025/attribution-graphs/methods.html Circuit Tracing: Revealing Computational Graphs in Language Models]
** [https://transformer-circuits.pub/2025/attribution-graphs/biology.html On the Biology of a Large Language Model]
* 2025-11: OpenAI: [https://cdn.openai.com/pdf/41df8f28-d4ef-43e9-aed2-823f9393e470/circuit-sparsity-paper.pdf Weight-sparse transformers have interpretable circuits] ([https://openai.com/index/understanding-neural-networks-through-sparse-circuits/ blog])
 
==Semanticity==
* 2025-06: [https://arxiv.org/abs/2506.15679 Dense SAE Latents Are Features, Not Bugs]
* 2025-06: [https://arxiv.org/abs/2506.20790 Stochastic Parameter Decomposition] ([https://github.com/goodfire-ai/spd code], [https://www.goodfire.ai/papers/stochastic-param-decomp blog])
* 2025-08: [https://arxiv.org/abs/2508.10003 Semantic Structure in Large Language Model Embeddings]
 
===Counter-Results===
* 2025-02: [https://arxiv.org/abs/2502.04878 Sparse Autoencoders Do Not Find Canonical Units of Analysis]
* 2025-03: [https://www.alignmentforum.org/posts/4uXCAJNuPKtKBsi28/ Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research]

==Meta-cognition==
* 2025-05: [https://arxiv.org/abs/2505.13763 Language Models Are Capable of Metacognitive Monitoring and Control of Their Internal Activations]
* 2025-12: [https://arxiv.org/abs/2512.15674 Activation Oracles: Training and Evaluating LLMs as General-Purpose Activation Explainers]
  
 
==Coding Models==

==Geometric==
** Tegmark et al. report multi-scale structure: 1) “atomic” small-scale, 2) “brain” intermediate-scale, and 3) “galaxy” large-scale
* 2025-02: [https://arxiv.org/abs/2502.08009 The Geometry of Prompting: Unveiling Distinct Mechanisms of Task Adaptation in Language Models]
* 2025-08: [https://arxiv.org/abs/2508.10003 Semantic Structure in Large Language Model Embeddings]
* 2025-10: [https://arxiv.org/abs/2510.09782 The Geometry of Reasoning: Flowing Logics in Representation Space]
* 2025-10: [https://transformer-circuits.pub/2025/linebreaks/index.html When Models Manipulate Manifolds: The Geometry of a Counting Task]
* 2025-10: [https://arxiv.org/abs/2510.26745 Deep sequence models tend to memorize geometrically; it is unclear why]
 
==Topography==

==Emergent Internal Model Building==
* 2023-07: [https://arxiv.org/abs/2307.15936 A Theory for Emergence of Complex Skills in Language Models]
* 2024-06: [https://arxiv.org/abs/2406.19370v1 Emergence of Hidden Capabilities: Exploring Learning Dynamics in Concept Space]
* 2025-06: [https://arxiv.org/abs/2506.01622 General agents contain world models]
* 2025-09: [https://arxiv.org/abs/2509.20328 Video models are zero-shot learners and reasoners]
 
===Semantic Directions===
* Directions, e.g.: f(king)-f(man)+f(woman)=f(queen) or f(sushi)-f(Japan)+f(Italy)=f(pizza) (see the sketch below)
* Task vectors
* Reasoning
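
A minimal runnable sketch of this direction-arithmetic idea (a toy illustration, not code from the papers in this section): with fabricated 2-d embeddings the analogy f(king)-f(man)+f(woman) lands exactly on f(queen); with real word or token embeddings the same arithmetic holds only approximately.

<syntaxhighlight lang="python">
# Toy illustration of semantic directions via embedding arithmetic.
# The 2-d vectors are fabricated so that one axis acts like "royalty" and one like "gender";
# real embeddings (word2vec, GloVe, LLM token embeddings) behave similarly, but only approximately.
import numpy as np

emb = {
    "king":  np.array([0.9,  0.8]),
    "queen": np.array([0.9, -0.8]),
    "man":   np.array([0.1,  0.8]),
    "woman": np.array([0.1, -0.8]),
}

def nearest(query, embeddings, exclude=()):
    """Return the vocabulary item whose embedding has the highest cosine similarity to `query`."""
    q = query / np.linalg.norm(query)
    scores = {w: float(q @ (v / np.linalg.norm(v)))
              for w, v in embeddings.items() if w not in exclude}
    return max(scores, key=scores.get)

analogy = emb["king"] - emb["man"] + emb["woman"]   # keep the "royal" component, flip the gender component
print(nearest(analogy, emb, exclude={"king", "man", "woman"}))  # -> queen
</syntaxhighlight>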

===Theory of Mind===
* [https://arxiv.org/abs/2410.13787 Looking Inward: Language Models Can Learn About Themselves by Introspection]
* [https://arxiv.org/abs/2501.11120 Tell me about yourself: LLMs are aware of their learned behaviors]
* 2025-10: [https://arxiv.org/abs/2509.22887 Infusing Theory of Mind into Socially Intelligent LLM Agents]
 
===Skeptical===

==Information Processing==
* 2019-03: [https://arxiv.org/abs/1903.05789 Diagnosing and Enhancing VAE Models]
* 2021-03: [https://arxiv.org/abs/2103.05247 Pretrained Transformers as Universal Computation Engines]
* 2022-10: [https://arxiv.org/abs/2210.08344 How Mask Matters: Towards Theoretical Understandings of Masked Autoencoders]
* 2023-04: [https://arxiv.org/abs/2304.03843 Why think step by step? Reasoning emerges from the locality of experience]
* 2023-10: [https://arxiv.org/abs/2310.04444 What's the Magic Word? A Control Theory of LLM Prompting]

==Generalization==

===Grokking===
* 2024-12: [https://arxiv.org/abs/2412.18624 How to explain grokking]
* 2024-12: [https://arxiv.org/abs/2412.09810 The Complexity Dynamics of Grokking]
* 2025-09: [https://arxiv.org/abs/2509.21519 Provable Scaling Laws of Feature Emergence from Learning Dynamics of Grokking]
 
===Tests of Resilience to Dropouts/etc.===
* 2024-02: Explorations of Self-Repair in Language Models
* 2024-06: What Matters in Transformers? Not All Attention is Needed (ablation methodology sketched below)
** Removing entire transformer blocks leads to significant performance degradation
** Removing MLP layers results in significant performance degradation
** Removing attention layers causes almost no performance degradation
** E.g. half of attention layers are deleted (48% speed-up), leads to only 2.4% decrease in the benchmarks
* 2024-06: The Remarkable Robustness of LLMs: Stages of Inference?
** They intentionally break the network (swapping layers), yet it continues to work remarkably well. This suggests LLMs are quite robust, and allows them to identify different stages in processing.
** They also use these interventions to infer what different layers are doing. They break apart the LLM transformer layers into four stages:
*** Detokenization: Raw tokens are converted into meaningful entities that take into account local context (especially using nearby tokens).
*** Feature engineering: Features are progressively refined. Factual knowledge is leveraged.
*** Prediction ensembling: Predictions (for the ultimately-selected next-token) emerge. A sort of consensus voting is used, with “prediction neurons” and "suppression neurons" playing a major role in upvoting/downvoting.
*** Residual sharpening: The semantic representations are collapsed into specific next-token predictions. There is a strong emphasis on suppression neurons eliminating options. The confidence is calibrated.
** This structure can be thought of as two halves (being roughly dual to each other): the first half broadens (goes from distinct tokens to a rich/elaborate concept-space) and the second half collapses (goes from rich concepts to concrete token predictions).
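
The ablation methodology behind these studies can be sketched in a few lines. This toy (my own illustration, not the papers' code) assumes a generic pre-norm residual transformer with random weights and reports output change rather than benchmark accuracy; skipping a sublayer simply lets the residual stream pass through unchanged.

<syntaxhighlight lang="python">
# Toy sketch of sublayer ablation in a pre-norm residual transformer.
# Illustrative only: random weights, output-change metric instead of benchmark scores.
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, d=64, heads=4):
        super().__init__()
        self.ln1, self.ln2 = nn.LayerNorm(d), nn.LayerNorm(d)
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))

    def forward(self, x, skip_attn=False, skip_mlp=False):
        if not skip_attn:                       # ablation = bypass the sublayer;
            h = self.ln1(x)                     # the residual stream is left untouched
            x = x + self.attn(h, h, h, need_weights=False)[0]
        if not skip_mlp:
            x = x + self.mlp(self.ln2(x))
        return x

class ToyTransformer(nn.Module):
    def __init__(self, n_layers=8, d=64):
        super().__init__()
        self.blocks = nn.ModuleList([Block(d) for _ in range(n_layers)])

    def forward(self, x, skip_attn_in=(), skip_mlp_in=()):
        for i, blk in enumerate(self.blocks):
            x = blk(x, skip_attn=i in skip_attn_in, skip_mlp=i in skip_mlp_in)
        return x

torch.manual_seed(0)
model, x = ToyTransformer().eval(), torch.randn(1, 16, 64)
with torch.no_grad():
    base = model(x)
    for name, kwargs in [("skip attention in layer 5", {"skip_attn_in": {5}}),
                         ("skip MLP in layer 5",       {"skip_mlp_in": {5}}),
                         ("skip attention in half the layers", {"skip_attn_in": set(range(0, 8, 2))})]:
        delta = (model(x, **kwargs) - base).norm() / base.norm()
        print(f"{name}: relative output change = {delta.item():.3f}")
</syntaxhighlight>

In the cited papers the same intervention is applied to trained LLMs and scored on downstream benchmarks, which is where the asymmetry between attention and MLP removal shows up.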
  
 
===Scaling Laws===
* 1993: [https://proceedings.neurips.cc/paper/1993/file/1aa48fc4880bb0c9b8a3bf979d3b917e-Paper.pdf Learning Curves: Asymptotic Values and Rate of Convergence]
* 2017-12: [https://arxiv.org/abs/1712.00409 Deep Learning Scaling is Predictable, Empirically] (Baidu) (see the curve-fitting sketch below)
* 2019-03: [http://www.incompleteideas.net/IncIdeas/BitterLesson.html The Bitter Lesson] (Rich Sutton)
* 2025-04: [https://arxiv.org/abs/2504.07951 Scaling Laws for Native Multimodal Models]
* 2025-05: [https://brendel-group.github.io/llm-line/ LLMs on the Line: Data Determines Loss-To-Loss Scaling Laws]
* 2025-10: [https://arxiv.org/abs/2510.13786 The Art of Scaling Reinforcement Learning Compute for LLMs]
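
A loose illustration of the curve-fitting step behind such scaling laws (the data points below are fabricated, not measurements from the cited papers): fit loss versus scale N to a saturating power law L(N) = a*N^(-b) + c and extrapolate.

<syntaxhighlight lang="python">
# Fit synthetic (model size, loss) points to L(N) = a * N**(-b) + c and extrapolate.
# Illustrative only; scaling-law papers fit analogous forms to losses from real training runs.
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    return a * n ** (-b) + c

n = np.array([1e6, 3e6, 1e7, 3e7, 1e8, 3e8, 1e9])           # hypothetical parameter counts
loss = power_law(n, 350.0, 0.30, 1.7)                        # synthetic "measured" losses
loss += np.random.default_rng(0).normal(0.0, 0.01, n.size)   # plus a little noise

(a, b, c), _ = curve_fit(power_law, n, loss, p0=(100.0, 0.3, 1.0))
print(f"fit: L(N) ~ {a:.0f} * N^(-{b:.2f}) + {c:.2f}")
print(f"extrapolated loss at N = 1e10: {power_law(1e10, a, b, c):.3f}")
</syntaxhighlight>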
  
 
=Information Processing/Storage=
* 2021-04: [https://arxiv.org/abs/2104.00008 Why is AI hard and Physics simple?]
* 2021-06: [https://arxiv.org/abs/2106.06981 Thinking Like Transformers]
* 2023-05: [https://arxiv.org/abs/2305.00948 Large Linguistic Models: Investigating LLMs' metalinguistic abilities]
* "A transformer's depth affects its reasoning capabilities, whilst model size affects its knowledge capacity" ([https://x.com/danielhanchen/status/1835684061475655967 c.f.])
** 2024-02: [https://arxiv.org/abs/2402.14905 MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases]
 
* 2025-02: [https://arxiv.org/abs/2502.20545 SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers]
* 2025-02: [https://arxiv.org/abs/2502.21212 Transformers Learn to Implement Multi-step Gradient Descent with Chain of Thought]

=Physics Based=
* 2014-01: [https://arxiv.org/abs/1401.1219 Consciousness as a State of Matter]
* 2016-08: [https://arxiv.org/abs/1608.08225 Why does deep and cheap learning work so well?]
* 2025-05: [https://arxiv.org/abs/2505.23489 SGD as Free Energy Minimization: A Thermodynamic View on Neural Network Training]
* 2025-12: [https://www.pnas.org/doi/full/10.1073/pnas.2523012122 Heavy-tailed update distributions arise from information-driven self-organization in nonequilibrium learning]
 
=Failure Modes=

==Jagged Frontier==
* 2024-07: [https://arxiv.org/abs/2407.03211 How Does Quantization Affect Multilingual LLMs?]: Quantization degrades different languages by differing amounts
* 2025-03: [https://arxiv.org/abs/2503.10061v1 Compute Optimal Scaling of Skills: Knowledge vs Reasoning]: Scaling laws are skill-dependent
* 2025-10: [https://arxiv.org/abs/2510.18212 A Definition of AGI]
  
===See also===
* [[AI_understanding|AI Understanding]] > [[AI_understanding#Psychology|Psychology]] > [[AI_understanding#LLM_personalities|LLM personalities]]
* [[AI tricks]] > [[AI_tricks#Prompt_Engineering|Prompt Engineering]] > [[AI_tricks#Brittleness|Brittleness]]

===Conversely (AI models converge)===
* 2025-12: [https://www.arxiv.org/abs/2512.03750 Universally Converging Representations of Matter Across Scientific Foundation Models]
 
==Model Collapse==

=Psychology=
* 2023-04: [https://arxiv.org/abs/2304.11111 Inducing anxiety in large language models can induce bias]
* 2025-05: [https://arxiv.org/abs/2505.17117 From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning]
* 2025-07: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5357179 Call Me A Jerk: Persuading AI to Comply with Objectionable Requests]

==Allow LLM to think==

==In-context Learning==
* 2022-11: [https://arxiv.org/abs/2211.15661 What learning algorithm is in-context learning? Investigations with linear models]
* 2022-12: [https://arxiv.org/abs/2212.07677 Transformers learn in-context by gradient descent] (toy numerical check below)
* 2025-07: [https://arxiv.org/abs/2507.16003 Learning without training: The implicit dynamics of in-context learning]
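
The core construction analysed in these papers can be checked numerically in a few lines. This is a toy version under simplifying assumptions (linear regression data, unnormalized linear attention, one gradient step from w = 0; not the papers' experiments): a linear-attention read-out over the in-context examples reproduces exactly the prediction of one gradient-descent step on the in-context loss.

<syntaxhighlight lang="python">
# One GD step on the in-context regression loss vs. an unnormalized linear-attention read-out.
# Toy equivalence check: keys = x_i, values = eta * y_i, query = x_query.
import numpy as np

rng = np.random.default_rng(0)
d, n_ctx, eta = 8, 32, 0.1
w_true = rng.normal(size=d)

X = rng.normal(size=(n_ctx, d))   # in-context inputs x_1..x_n
y = X @ w_true                    # in-context targets y_i = w_true . x_i
x_q = rng.normal(size=d)          # query input

# (1) One gradient step from w = 0 on L(w) = 0.5 * sum_i (w . x_i - y_i)^2
grad_at_zero = -X.T @ y           # gradient of L at w = 0
w_one_step = -eta * grad_at_zero  # w_1 = eta * sum_i y_i x_i
pred_gd = w_one_step @ x_q

# (2) Unnormalized linear attention: sum_i value_i * (key_i . query)
pred_attn = (eta * y) @ (X @ x_q)

print(pred_gd, pred_attn)         # identical up to floating-point error
assert np.isclose(pred_gd, pred_attn)
</syntaxhighlight>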
  
 
==Reasoning (CoT, etc.)==
* 2025-03: [https://arxiv.org/abs/2503.08679 Chain-of-Thought Reasoning In The Wild Is Not Always Faithful]
* 2025-04: [https://arxiv.org/abs/2504.04022 Rethinking Reflection in Pre-Training]: pre-training alone already provides some amount of reflection/reasoning
* 2025-07: [https://arxiv.org/abs/2501.18858 BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning]

===Pathfinding===
* 2024-08: [https://arxiv.org/abs/2408.08152 DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search]
* 2025-06: [https://arxiv.org/abs/2506.01939 Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning]
* 2025-09: [https://arxiv.org/abs/2509.09284 Tree-OPO: Off-policy Monte Carlo Tree-Guided Advantage Optimization for Multistep Reasoning]
* 2025-09: [https://arxiv.org/abs/2509.06160v1 Reverse-Engineered Reasoning for Open-Ended Generation]

===Skeptical===
* 2025-06: [https://arxiv.org/abs/2506.06941 The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity]
* 2025-08: [https://www.arxiv.org/abs/2508.01191 Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens]

==Self-Awareness and Self-Recognition and Introspection==
* 2022-07: [https://arxiv.org/abs/2207.05221 Language Models (Mostly) Know What They Know]
* 2024-04: [https://arxiv.org/abs/2404.13076 LLM Evaluators Recognize and Favor Their Own Generations]
* 2024-09: [https://situational-awareness-dataset.org/ Me, Myself and AI: The Situational Awareness Dataset for LLMs]
* 2024-10: [https://arxiv.org/abs/2410.13787 Looking Inward: Language Models Can Learn About Themselves by Introspection]
* 2024-12: [https://theaidigest.org/self-awareness AIs are becoming more self-aware. Here's why that matters]
* 2025-01: [https://arxiv.org/abs/2501.11120 Tell me about yourself: LLMs are aware of their learned behaviors]
* 2025-04: [https://x.com/Josikinz/status/1907923319866716629 LLMs can guess which comic strip was generated by themselves (vs. other LLM)]
* 2025-05: [https://arxiv.org/abs/2505.13763 Language Models Are Capable of Metacognitive Monitoring and Control of Their Internal Activations]
* 2025-10: [https://transformer-circuits.pub/2025/introspection/index.html Emergent Introspective Awareness in Large Language Models] (Anthropic, [https://www.anthropic.com/research/introspection blog])

==LLM personalities==
* 2025-07: [https://arxiv.org/abs/2507.02618 Strategic Intelligence in Large Language Models: Evidence from evolutionary Game Theory]
* 2025-09: [https://arxiv.org/abs/2509.04343 Psychologically Enhanced AI Agents]

==Quirks & Biases==
* 2025-04: [https://www.cambridge.org/core/journals/judgment-and-decision-making/article/artificial-intelligence-and-dichotomania/0421D2310727D73FAB47069FD1620AA1 Artificial intelligence and dichotomania]
* 2025-09: [https://arxiv.org/abs/2509.22818 Can Large Language Models Develop Gambling Addiction?]
 
=Vision Models=

=See Also=
* [[AI]]
* [[AI tools]]
* [[AI agents]]
* [[Robots]]
