<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>http://gisaxs.com/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=KevinYager</id>
	<title>GISAXS - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="http://gisaxs.com/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=KevinYager"/>
	<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php/Special:Contributions/KevinYager"/>
	<updated>2026-04-08T13:01:10Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.31.7</generator>
	<entry>
		<id>http://gisaxs.com/index.php?title=AI_understanding&amp;diff=8780</id>
		<title>AI understanding</title>
		<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php?title=AI_understanding&amp;diff=8780"/>
		<updated>2026-04-06T21:18:48Z</updated>

		<summary type="html">&lt;p&gt;KevinYager: /* Psychology */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Interpretability=&lt;br /&gt;
* 2017-01: [https://arxiv.org/abs/1704.01444 Learning to Generate Reviews and Discovering Sentiment]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.11639 Neural Interpretable Reasoning]&lt;br /&gt;
&lt;br /&gt;
==Concepts==&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.20938 Towards Understanding the Nature of Attention with Low-Rank Sparse Decomposition] ([https://github.com/OpenMOSS/Lorsa code])&lt;br /&gt;
* 2025-08: [https://transformer-circuits.pub/2025/attention-qk/index.html Tracing Attention Computation Through Feature Interactions]&lt;br /&gt;
&lt;br /&gt;
==Mechanistic Interpretability==&lt;br /&gt;
* 2020-03: OpenAI: [https://distill.pub/2020/circuits/zoom-in/ Zoom In: An Introduction to Circuits]&lt;br /&gt;
* 2021-12: Anthropic: [https://transformer-circuits.pub/2021/framework/index.html A Mathematical Framework for Transformer Circuits]&lt;br /&gt;
* 2022-09: [https://arxiv.org/abs/2211.00593 Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 Small]&lt;br /&gt;
* 2023-01: [https://arxiv.org/abs/2301.05062 Tracr: Compiled Transformers as a Laboratory for Interpretability] ([https://github.com/google-deepmind/tracr code])&lt;br /&gt;
* 2024-07: Anthropic: [https://transformer-circuits.pub/2024/july-update/index.html Circuits Update]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.14926 Interpretability in Parameter Space: Minimizing Mechanistic Description Length with Attribution-based Parameter Decomposition] ([https://www.alignmentforum.org/posts/EPefYWjuHNcNH4C7E/attribution-based-parameter-decomposition blog post])&lt;br /&gt;
* 2025-01: Review: [https://arxiv.org/abs/2501.16496 Open Problems in Mechanistic Interpretability]&lt;br /&gt;
* 2025-03: Anthropic: [https://www.anthropic.com/research/tracing-thoughts-language-model Tracing the thoughts of a large language model]&lt;br /&gt;
** [https://transformer-circuits.pub/2025/attribution-graphs/methods.html Circuit Tracing: Revealing Computational Graphs in Language Models]&lt;br /&gt;
** [https://transformer-circuits.pub/2025/attribution-graphs/biology.html On the Biology of a Large Language Model]&lt;br /&gt;
* 2025-11: OpenAI: [https://cdn.openai.com/pdf/41df8f28-d4ef-43e9-aed2-823f9393e470/circuit-sparsity-paper.pdf Weight-sparse transformers have interpretable circuits] ([https://openai.com/index/understanding-neural-networks-through-sparse-circuits/ blog])&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.13548 Patterning: The Dual of Interpretability]&lt;br /&gt;
&lt;br /&gt;
==Semanticity==&lt;br /&gt;
* 2023-09: [https://arxiv.org/abs/2309.08600 Sparse Autoencoders Find Highly Interpretable Features in Language Models]&lt;br /&gt;
* Anthropic monosemanticity interpretation of LLM features:&lt;br /&gt;
** 2023-10: [https://transformer-circuits.pub/2023/monosemantic-features/index.html Towards Monosemanticity: Decomposing Language Models With Dictionary Learning]&lt;br /&gt;
** 2024-05: [https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet]&lt;br /&gt;
* 2024-06: OpenAI: [https://arxiv.org/abs/2406.04093 Scaling and evaluating sparse autoencoders]&lt;br /&gt;
* 2024-08: [https://www.alignmentforum.org/posts/TMAmHh4DdMr4nCSr5/showing-sae-latents-are-not-atomic-using-meta-saes Showing SAE Latents Are Not Atomic Using Meta-SAEs] ([https://metasae.streamlit.app/?page=Feature+Explorer&amp;amp;feature=11329 demo])&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.08201 Efficient Dictionary Learning with Switch Sparse Autoencoders] ([https://github.com/amudide/switch_sae code]) More efficient SAE generation&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.14670 Decomposing The Dark Matter of Sparse Autoencoders] ([https://github.com/JoshEngels/SAE-Dark-Matter code]) Shows that SAE errors are predictable&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.13928 Automatically Interpreting Millions of Features in Large Language Models]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.21331 Beyond Interpretability: The Gains of Feature Monosemanticity on Model Robustness]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.04139 Monet: Mixture of Monosemantic Experts for Transformers]&lt;br /&gt;
* 2024-12: [https://www.lesswrong.com/posts/zbebxYCqsryPALh8C/matryoshka-sparse-autoencoders Matryoshka Sparse Autoencoders]&lt;br /&gt;
* 2024-12: [https://www.alignmentforum.org/posts/rKM9b6B2LqwSB5ToN/learning-multi-level-features-with-matryoshka-saes Learning Multi-Level Features with Matryoshka SAEs]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.19406 Low-Rank Adapting Models for Sparse Autoencoders]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.03714 Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.06755 Sparse Autoencoders for Scientifically Rigorous Interpretation of Vision Models]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.00177 Steering Large Language Model Activations in Sparse Spaces]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.01776 Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.01824 From superposition to sparse codes: interpretable representations in neural networks]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.18878 I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.20063 SAEs Are Good for Steering -- If You Select the Right Features]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.15679 Dense SAE Latents Are Features, Not Bugs]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.20790 Stochastic Parameter Decomposition] ([https://github.com/goodfire-ai/spd code], [https://www.goodfire.ai/papers/stochastic-param-decomp blog])&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.10003 Semantic Structure in Large Language Model Embeddings]&lt;br /&gt;
&lt;br /&gt;
===Counter-Results===&lt;br /&gt;
* 2020-10: [https://arxiv.org/abs/2010.12016 Towards falsifiable interpretability research]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.16615 Sparse Autoencoders Trained on the Same Data Learn Different Features]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.17148 AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.17727 Sparse Autoencoders Can Interpret Randomly Initialized Transformers]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.04878 Sparse Autoencoders Do Not Find Canonical Units of Analysis]&lt;br /&gt;
* 2025-03: [https://www.alignmentforum.org/posts/4uXCAJNuPKtKBsi28/ Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research]&lt;br /&gt;
&lt;br /&gt;
==Meta-cognition==&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.13763 Language Models Are Capable of Metacognitive Monitoring and Control of Their Internal Activations]&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.15674 Activation Oracles: Training and Evaluating LLMs as General-Purpose Activation Explainers]&lt;br /&gt;
&lt;br /&gt;
==Coding Models==&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Sparse Autoencoders&amp;#039;&amp;#039;&amp;#039;: See Semanticity.&lt;br /&gt;
* [https://github.com/saprmarks/dictionary_learning dictionary_learning]&lt;br /&gt;
* [https://transformer-circuits.pub/2024/jan-update/index.html#predict-future Predicting Future Activations]&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.11944 Transcoders Find Interpretable LLM Feature Circuits]&lt;br /&gt;
* 2024-10: [https://transformer-circuits.pub/2024/crosscoders/index.html Sparse Crosscoders for Cross-Layer Features and Model Diffing]&lt;br /&gt;
&lt;br /&gt;
==Reward Functions==&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.12491 Insights from the Inverse: Reconstructing LLM Training Goals Through Inverse RL]&lt;br /&gt;
&lt;br /&gt;
==Symbolic and Notation==&lt;br /&gt;
* [https://transformer-circuits.pub/2021/framework/index.html A Mathematical Framework for Transformer Circuits]&lt;br /&gt;
* [https://www.arxiv.org/abs/2407.09468 Beyond Euclid: An Illustrated Guide to Modern Machine Learning with Geometric, Topological, and Algebraic Structures]&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.02423 On the Anatomy of Attention]: Introduces category-theoretic diagrammatic formalism for DL architectures&lt;br /&gt;
* 2024-11: [https://x.com/vtabbott_/status/1860268276569506250 diagrams to represent algorithms]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.03317 FlashAttention on a Napkin: A Diagrammatic Approach to Deep Learning IO-Awareness]&lt;br /&gt;
&lt;br /&gt;
==Mathematical==&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.13762 Unveiling the Hidden Structure of Self-Attention via Kernel Principal Component Analysis]&lt;br /&gt;
&lt;br /&gt;
==Geometric==&lt;br /&gt;
* 2023-11: [https://arxiv.org/abs/2311.03658 The Linear Representation Hypothesis and the Geometry of Large Language Models]&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.01506 The Geometry of Categorical and Hierarchical Concepts in Large Language Models]&lt;br /&gt;
** Natural hierarchies of concepts---which occur throughout natural language and especially in scientific ontologies---are represented in the model&amp;#039;s internal vectorial space as polytopes that can be decomposed into simplexes of mutually-exclusive categories.&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.02678 Reasoning in Large Language Models: A Geometric Perspective]&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.17592 Deep Manifold Part 1: Anatomy of Neural Network Manifold]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.19750 The Geometry of Concepts: Sparse Autoencoder Feature Structure]&lt;br /&gt;
** Tegmark et al. report multi-scale structure: 1) “atomic” small-scale, 2) “brain” intermediate-scale, and 3) “galaxy” large-scale&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.08009 The Geometry of Prompting: Unveiling Distinct Mechanisms of Task Adaptation in Language Models]&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.10003 Semantic Structure in Large Language Model Embeddings]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.09782 The Geometry of Reasoning: Flowing Logics in Representation Space]&lt;br /&gt;
* 2025-10: [https://transformer-circuits.pub/2025/linebreaks/index.html When Models Manipulate Manifolds: The Geometry of a Counting Task]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.26745 Deep sequence models tend to memorize geometrically; it is unclear why]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.15029 Symmetry in language statistics shapes the geometry of model representations]&lt;br /&gt;
* 2026-03: [https://arxiv.org/abs/2603.01968 Intrinsic Task Symmetry Drives Generalization in Algorithmic Tasks]&lt;br /&gt;
&lt;br /&gt;
==Topography==&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.16396 TopoNets: High Performing Vision and Language Models with Brain-Like Topography]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2601.06002 The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning]&lt;br /&gt;
&lt;br /&gt;
==Challenges==&lt;br /&gt;
* 2023-07: [https://arxiv.org/abs/2307.13702 Measuring Faithfulness in Chain-of-Thought Reasoning] ([https://x.com/davidad/status/1839641113432305790 roughly] proves that sufficiently large models do not generate CoT that actually captures their internal reasoning)&lt;br /&gt;
&lt;br /&gt;
[[Image:GYe31yXXQAABwaZ.jpeg|300px]]&lt;br /&gt;
&lt;br /&gt;
=Heuristic Understanding=&lt;br /&gt;
* 2022-09: Janus: [https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators Simulators]&lt;br /&gt;
&lt;br /&gt;
==Emergent Internal Model Building==&lt;br /&gt;
* 2023-07: [https://arxiv.org/abs/2307.15936 A Theory for Emergence of Complex Skills in Language Models]&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.19370v1 Emergence of Hidden Capabilities: Exploring Learning Dynamics in Concept Space]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.01622 General agents contain world models]&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.20328 Video models are zero-shot learners and reasoners]&lt;br /&gt;
&lt;br /&gt;
===Semantic Directions===&lt;br /&gt;
Directions, e.g.: f(king)-f(man)+f(woman)≈f(queen) or f(sushi)-f(Japan)+f(Italy)≈f(pizza)&lt;br /&gt;
* [https://arxiv.org/abs/1301.3781 Efficient Estimation of Word Representations in Vector Space]&lt;br /&gt;
* [https://aclanthology.org/N13-1090/ Linguistic Regularities in Continuous Space Word Representations]&lt;br /&gt;
* [https://aclanthology.org/C16-1332 Word Embeddings, Analogies, and Machine Learning: Beyond king - man + woman = queen]&lt;br /&gt;
* [https://aclanthology.org/D14-1162/ Glove: Global vectors for word representation]&lt;br /&gt;
* [https://doi.org/10.1109/BigData.2015.7364114 Using Word2Vec to process big text data]&lt;br /&gt;
* [https://arxiv.org/abs/2310.06824 The geometry of truth: Emergent linear structure in large language model representations of true/false datasets] (true/false)&lt;br /&gt;
* [https://arxiv.org/abs/2403.10381 Monotonic Representation of Numeric Properties in Language Models] (numeric directions)&lt;br /&gt;
Task vectors:&lt;br /&gt;
* [https://arxiv.org/abs/2310.15213 Function Vectors in Large Language Models]&lt;br /&gt;
* [https://arxiv.org/abs/2310.15916 In-context learning creates task vectors]&lt;br /&gt;
* [https://www.alignmentforum.org/posts/5FGXmJ3wqgGRcbyH7/extracting-sae-task-features-for-in-context-learning Extracting sae task features for in-context learning]&lt;br /&gt;
* [https://arxiv.org/abs/2412.12276 Emergence of Abstractions: Concept Encoding and Decoding Mechanism for In-Context Learning in Transformers]&lt;br /&gt;
Reasoning:&lt;br /&gt;
* [https://openreview.net/forum?id=OwhVWNOBcz Understanding Reasoning in Thinking Language Models via Steering Vectors]&lt;br /&gt;
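The vector-arithmetic analogy above can be sketched as a toy nearest-neighbor computation. The vectors below are hand-picked for illustration (not learned embeddings), and the vocabulary is hypothetical; with real word2vec/GloVe embeddings the same cosine-similarity lookup recovers such analogies only approximately:&lt;br /&gt;

```python
# Toy illustration of the "semantic direction" analogy
# f(king) - f(man) + f(woman) ≈ f(queen).
# The embedding table below is hand-crafted for the demo, not learned.
import math

emb = {
    "man":   [1.0, 0.0, 0.0],
    "woman": [1.0, 1.0, 0.0],
    "king":  [1.0, 0.0, 1.0],
    "queen": [1.0, 1.0, 1.0],
    "apple": [0.0, 0.0, 5.0],  # distractor word
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def analogy(a, b, c):
    """Word w (excluding a, b, c) maximizing cos(emb[w], emb[a] - emb[b] + emb[c])."""
    target = [x - y + z for x, y, z in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))

print(analogy("king", "man", "woman"))  # -> queen
```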
&lt;br /&gt;
===Feature Geometry Reproduces Problem-space===&lt;br /&gt;
* [https://arxiv.org/abs/2210.13382 Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task] (Othello)&lt;br /&gt;
* [https://arxiv.org/abs/2309.00941 Emergent linear representations in world models of self-supervised sequence models] (Othello)&lt;br /&gt;
* [https://arxiv.org/abs/2211.15661 What learning algorithm is in-context learning? Investigations with linear models]&lt;br /&gt;
* [https://doi.org/10.1038/s41562-023-01659-w Emergent analogical reasoning in large language models]&lt;br /&gt;
* [https://arxiv.org/abs/2310.02207 Language Models Represent Space and Time] (Maps of world, US)&lt;br /&gt;
* [https://arxiv.org/abs/2405.14860 Not All Language Model Features Are Linear] (Days of week form ring, etc.)&lt;br /&gt;
* [https://arxiv.org/abs/2406.03689 Evaluating the World Model Implicit in a Generative Model] (Map of Manhattan)&lt;br /&gt;
* [https://iopscience.iop.org/article/10.1088/1748-9326/ad2891 Reliable precipitation nowcasting using probabilistic diffusion models]. Generation of precipitation map imagery is predictive of actual future weather; implies model is learning scientifically-relevant modeling.&lt;br /&gt;
* [https://arxiv.org/abs/2405.07987 The Platonic Representation Hypothesis]: Different models (including across modalities) are converging to a consistent world model.&lt;br /&gt;
* [https://arxiv.org/abs/2501.00070 ICLR: In-Context Learning of Representations]&lt;br /&gt;
* [https://arxiv.org/abs/2502.00873 Language Models Use Trigonometry to Do Addition]: Numbers arranged in helix to enable addition&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.15029 Symmetry in language statistics shapes the geometry of model representations]&lt;br /&gt;
&lt;br /&gt;
===Capturing Physics===&lt;br /&gt;
* 2020-09: [https://arxiv.org/abs/2009.08292 Learning to Identify Physical Parameters from Video Using Differentiable Physics]&lt;br /&gt;
* 2022-07: [https://arxiv.org/abs/2207.00419 Self-Supervised Learning for Videos: A Survey]&lt;br /&gt;
* 2025-02: FAIR at Meta: [https://arxiv.org/abs/2502.11831 Intuitive physics understanding emerges from self-supervised pretraining on natural videos]&lt;br /&gt;
&lt;br /&gt;
===Theory of Mind===&lt;br /&gt;
* [https://arxiv.org/abs/2302.02083 Evaluating Large Language Models in Theory of Mind Tasks]&lt;br /&gt;
* [https://arxiv.org/abs/2410.13787 Looking Inward: Language Models Can Learn About Themselves by Introspection]&lt;br /&gt;
* [https://arxiv.org/abs/2501.11120 Tell me about yourself: LLMs are aware of their learned behaviors]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2509.22887 Infusing Theory of Mind into Socially Intelligent LLM Agents]&lt;br /&gt;
&lt;br /&gt;
===Skeptical===&lt;br /&gt;
* 2025-01: [https://www.arxiv.org/abs/2501.09038 Do generative video models learn physical principles from watching videos?] ([https://physics-iq.github.io/ project], [https://github.com/google-deepmind/physics-IQ-benchmark code])&lt;br /&gt;
* 2025-06: [https://machinelearning.apple.com/research/illusion-of-thinking The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.21521 Potemkin Understanding in Large Language Models]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.21876 Do Vision-Language Models Have Internal World Models? Towards an Atomic Evaluation]&lt;br /&gt;
&lt;br /&gt;
==Information Processing==&lt;br /&gt;
* 2019-03: [https://arxiv.org/abs/1903.05789 Diagnosing and Enhancing VAE Models]&lt;br /&gt;
* 2021-03: [https://arxiv.org/abs/2103.05247 Pretrained Transformers as Universal Computation Engines]&lt;br /&gt;
* 2022-10: [https://arxiv.org/abs/2210.08344 How Mask Matters: Towards Theoretical Understandings of Masked Autoencoders]&lt;br /&gt;
* 2023-04: [https://arxiv.org/abs/2304.03843 Why think step by step? Reasoning emerges from the locality of experience]&lt;br /&gt;
* 2023-10: [https://arxiv.org/abs/2310.04444 What&amp;#039;s the Magic Word? A Control Theory of LLM Prompting]&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.12875 Chain of Thought Empowers Transformers to Solve Inherently Serial Problems]: Proves that transformers can solve any problem, if they can generate sufficient intermediate tokens&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.20311 Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process]&lt;br /&gt;
** Models learn reasoning skills (they are not merely memorizing solution templates). They can mentally generate simple, short plans (like humans).&lt;br /&gt;
** When presented with facts, models develop an internal understanding of which parameters (recursively) depend on each other. This occurs even before an explicit question is asked (i.e. before the task is defined). This appears to be different from human reasoning.&lt;br /&gt;
** Model depth matters for reasoning. This cannot be mitigated by chain-of-thought prompting (which allows models to develop and then execute plans), since even a single CoT step may require deep, multi-step reasoning/planning.&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.01992 Ask, and it shall be given: Turing completeness of prompting]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.08775 Layers at Similar Depths Generate Similar Activations Across LLM Architectures]&lt;br /&gt;
&lt;br /&gt;
===Generalization===&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.14546 Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data]&lt;br /&gt;
&lt;br /&gt;
===Grokking===&lt;br /&gt;
* 2022-01: [https://arxiv.org/abs/2201.02177 Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets]&lt;br /&gt;
* 2022-05: [https://arxiv.org/abs/2205.10343 Towards Understanding Grokking: An Effective Theory of Representation Learning]&lt;br /&gt;
* 2024-01: [https://arxiv.org/abs/2401.10463 Critical Data Size of Language Models from a Grokking Perspective]&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.15175 Unified View of Grokking, Double Descent and Emergent Abilities: A Perspective from Circuits Competition]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.18624 How to explain grokking]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.09810 The Complexity Dynamics of Grokking]&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.21519 Provable Scaling Laws of Feature Emergence from Learning Dynamics of Grokking]&lt;br /&gt;
* 2026-03: [https://arxiv.org/abs/2603.01968 Intrinsic Task Symmetry Drives Generalization in Algorithmic Tasks]&lt;br /&gt;
&lt;br /&gt;
===Tests of Resilience to Dropouts/etc.===&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.15390 Explorations of Self-Repair in Language Models]&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.15786 What Matters in Transformers? Not All Attention is Needed]&lt;br /&gt;
** Removing entire transformer blocks leads to significant performance degradation&lt;br /&gt;
** Removing MLP layers results in significant performance degradation&lt;br /&gt;
** Removing attention layers causes almost no performance degradation&lt;br /&gt;
** E.g. deleting half of the attention layers (48% speed-up) leads to only a 2.4% decrease on benchmarks&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.19384 The Remarkable Robustness of LLMs: Stages of Inference?]&lt;br /&gt;
** They intentionally break the network (swapping layers), yet it continues to work remarkably well. This suggests LLMs are quite robust, and these interventions allow the authors to identify distinct stages of processing.&lt;br /&gt;
** They also use these interventions to infer what different layers are doing. They break apart the LLM transformer layers into four stages:&lt;br /&gt;
*** &amp;#039;&amp;#039;&amp;#039;Detokenization:&amp;#039;&amp;#039;&amp;#039; Raw tokens are converted into meaningful entities that take into account local context (especially using nearby tokens).&lt;br /&gt;
*** &amp;#039;&amp;#039;&amp;#039;Feature engineering:&amp;#039;&amp;#039;&amp;#039; Features are progressively refined. Factual knowledge is leveraged.&lt;br /&gt;
*** &amp;#039;&amp;#039;&amp;#039;Prediction ensembling:&amp;#039;&amp;#039;&amp;#039; Predictions (for the ultimately-selected next-token) emerge. A sort of consensus voting is used, with “prediction neurons” and &amp;quot;suppression neurons&amp;quot; playing a major role in upvoting/downvoting.&lt;br /&gt;
*** &amp;#039;&amp;#039;&amp;#039;Residual sharpening:&amp;#039;&amp;#039;&amp;#039; The semantic representations are collapsed into specific next-token predictions. There is a strong emphasis on suppression neurons eliminating options. The confidence is calibrated.&lt;br /&gt;
** This structure can be thought of as two halves (being roughly dual to each other): the first half broadens (goes from distinct tokens to a rich/elaborate concept-space) and the second half collapses (goes from rich concepts to concrete token predictions).&lt;br /&gt;
&lt;br /&gt;
==Semantic Vectors==&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.11717 Refusal in Language Models Is Mediated by a Single Direction]&lt;br /&gt;
* 2025-02: [https://martins1612.github.io/emergent_misalignment_betley.pdf Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs] ([https://x.com/OwainEvans_UK/status/1894436637054214509 demonstrates] [https://x.com/ESYudkowsky/status/1894453376215388644 entangling] of concepts into a single preference vector)&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.03666 Analogical Reasoning Inside Large Language Models: Concept Vectors and the Limits of Abstraction]&lt;br /&gt;
&lt;br /&gt;
==Other==&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.00247 Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting &amp;amp; Beyond]&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.04282 Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding] ([https://github.com/SalesforceAIResearch/LaTRO code])&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.12580 Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models]: LLMs learn reasoning by extracting procedures from training data, not by memorizing specific answers&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.15862 LLMs Do Not Think Step-by-step In Implicit Reasoning]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.09810 The Complexity Dynamics of Grokking]&lt;br /&gt;
&lt;br /&gt;
===Scaling Laws===&lt;br /&gt;
* 1993: [https://proceedings.neurips.cc/paper/1993/file/1aa48fc4880bb0c9b8a3bf979d3b917e-Paper.pdf Learning Curves: Asymptotic Values and Rate of Convergence]&lt;br /&gt;
* 2017-12: [https://arxiv.org/abs/1712.00409 Deep Learning Scaling is Predictable, Empirically] (Baidu)&lt;br /&gt;
* 2019-03: [http://www.incompleteideas.net/IncIdeas/BitterLesson.html The Bitter Lesson] (Rich Sutton)&lt;br /&gt;
* 2020-01: [https://arxiv.org/abs/2001.08361 Scaling Laws for Neural Language Models] (OpenAI)&lt;br /&gt;
* 2020-05: [https://gwern.net/scaling-hypothesis The Scaling Hypothesis] (Gwern)&lt;br /&gt;
* 2020-10: [https://arxiv.org/abs/2010.14701 Scaling Laws for Autoregressive Generative Modeling] (OpenAI)&lt;br /&gt;
* 2021-02: [https://arxiv.org/abs/2102.06701 Explaining Neural Scaling Laws] (Google DeepMind)&lt;br /&gt;
* 2021-08: [https://arxiv.org/abs/2108.07686 Scaling Laws for Deep Learning]&lt;br /&gt;
* 2022-03: [https://arxiv.org/abs/2203.15556 Training Compute-Optimal Large Language Models] (Chinchilla, Google DeepMind)&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.04715 Predictable Scale: Part I -- Optimal Hyperparameter Scaling Law in Large Language Model Pretraining]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.10061 Compute Optimal Scaling of Skills: Knowledge vs Reasoning]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.07951 Scaling Laws for Native Multimodal Models]&lt;br /&gt;
* 2025-05: [https://brendel-group.github.io/llm-line/ LLMs on the Line: Data Determines Loss-To-Loss Scaling Laws]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.13786 The Art of Scaling Reinforcement Learning Compute for LLMs]&lt;br /&gt;
&lt;br /&gt;
=Information Processing/Storage=&lt;br /&gt;
* 2020-02: [https://arxiv.org/abs/2002.10689 A Theory of Usable Information Under Computational Constraints]&lt;br /&gt;
* 2021-04: [https://arxiv.org/abs/2104.00008 Why is AI hard and Physics simple?]&lt;br /&gt;
* 2021-06: [https://arxiv.org/abs/2106.06981 Thinking Like Transformers]&lt;br /&gt;
* 2023-05: [https://arxiv.org/abs/2305.00948 Large Linguistic Models: Investigating LLMs&amp;#039; metalinguistic abilities]&lt;br /&gt;
* &amp;quot;A transformer&amp;#039;s depth affects its reasoning capabilities, whilst model size affects its knowledge capacity&amp;quot; ([https://x.com/danielhanchen/status/1835684061475655967 c.f.])&lt;br /&gt;
** 2024-02: [https://arxiv.org/abs/2402.14905 MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases]&lt;br /&gt;
** 2024-04: [https://arxiv.org/abs/2404.08819 The Illusion of State in State-Space Models] (figure 3)&lt;br /&gt;
** 2024-08: [https://arxiv.org/abs/2408.00118 Gemma 2: Improving Open Language Models at a Practical Size] (table 9)&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.10482 Schrodinger&amp;#039;s Memory: Large Language Models]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2407.01687 Deciphering the Factors Influencing the Efficacy of Chain-of-Thought: Probability, Memorization, and Noisy Reasoning]. CoT involves both memorization and (probabilistic) reasoning&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.16679 Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?]&lt;br /&gt;
* 2025-03: [https://www.arxiv.org/abs/2503.03961 A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers]&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.22471 The Bayesian Geometry of Transformer Attention]&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.03220 From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence]&lt;br /&gt;
&lt;br /&gt;
==Statistics/Math==&lt;br /&gt;
* 2023-05: [https://arxiv.org/abs/2305.05465 The emergence of clusters in self-attention dynamics]&lt;br /&gt;
* 2023-12: [https://arxiv.org/abs/2312.10794 A mathematical perspective on Transformers]&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.12034 Understanding Transformers via N-gram Statistics]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.06833 Dynamic metastability in the self-attention model]&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.04551 Measure-to-measure interpolation using Transformers]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.14697 Quantitative Clustering in Mean-Field Transformer Models]&lt;br /&gt;
&lt;br /&gt;
==Tokenization==&lt;br /&gt;
===For numbers/math===&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.14903 Tokenization counts: the impact of tokenization on arithmetic in frontier LLMs]: L2R vs. R2L yields different performance on math&lt;br /&gt;
&lt;br /&gt;
==Data Storage==&lt;br /&gt;
* 1988-09: [https://www.sciencedirect.com/science/article/pii/0885064X88900209 On the capabilities of multilayer perceptrons]&lt;br /&gt;
* 2006-12: [https://ieeexplore.ieee.org/document/4038449 Geometrical and Statistical Properties of Systems of Linear Inequalities with Applications in Pattern Recognition] (single-layer perceptron stores &amp;gt;2 bits/parameter; MLP ~ 2*N&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt; bits w/ N&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt; params)&lt;br /&gt;
* 2016-11: [https://arxiv.org/abs/1611.09913 Capacity and Trainability in Recurrent Neural Networks] (5 bits/param)&lt;br /&gt;
* 2018-02: [https://arxiv.org/abs/1802.08232 The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks]&lt;br /&gt;
* 2019-05: [https://ieeexplore.ieee.org/document/8682462 Memorization Capacity of Deep Neural Networks under Parameter Quantization]&lt;br /&gt;
* 2020-02: [https://arxiv.org/abs/2002.08910 How Much Knowledge Can You Pack Into the Parameters of a Language Model?]&lt;br /&gt;
* 2020-08: [https://arxiv.org/abs/2008.09036 Language Models as Knowledge Bases: On Entity Representations, Storage Capacity, and Paraphrased Queries] (capacity scales linearly with parameters; more training samples leads to less memorization)&lt;br /&gt;
* 2020-12: [https://arxiv.org/abs/2012.06421 When is Memorization of Irrelevant Training Data Necessary for High-Accuracy Learning?]&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.05405 Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws] (2 bits/param)&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.15720 Scaling Laws for Fact Memorization of Large Language Models] (1T params needed to memorize Wikipedia)&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.09810 The Complexity Dynamics of Grokking]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.24832 How much do language models memorize?] (3.6 bits/parameter)&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.01855 Trade-offs in Data Memorization via Strong Data Processing Inequalities]&lt;br /&gt;
&lt;br /&gt;
===Reverse-Engineering Training Data===&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.10364 Can We Infer Confidential Properties of Training Data from LLMs?]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.15553 Approximating Language Model Training Data from Weights]&lt;br /&gt;
&lt;br /&gt;
===Compression===&lt;br /&gt;
* 2022-12: [https://arxiv.org/abs/2212.09410 Less is More: Parameter-Free Text Classification with Gzip]&lt;br /&gt;
* 2023-06: [https://arxiv.org/abs/2306.04050 LLMZip: Lossless Text Compression using Large Language Models]&lt;br /&gt;
* 2023-07: [https://aclanthology.org/2023.findings-acl.426/ “Low-Resource” Text Classification: A Parameter-Free Classification Method with Compressors]&lt;br /&gt;
* 2023-09: [https://arxiv.org/abs/2309.10668 Language Modeling Is Compression]&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.07550 An Image is Worth 32 Tokens for Reconstruction and Generation]&lt;br /&gt;
&lt;br /&gt;
==Learning/Training==&lt;br /&gt;
* 2018-03: [https://arxiv.org/abs/1803.03635 The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks]: Sparse neural networks can match dense ones, but the right sparse architecture is difficult to identify and train directly. Training an over-parameterized dense network effectively searches over many sparse sub-networks, making it easier to find an internal sparse circuit well-suited to a particular problem.&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.11521 On the Ability of Deep Networks to Learn Symmetries from Data: A Neural Kernel Theory]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.12391 Physics of Skill Learning]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.24864 ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models]&lt;br /&gt;
&lt;br /&gt;
===Cross-modal knowledge transfer===&lt;br /&gt;
* 2022-03: [https://arxiv.org/abs/2203.07519 Leveraging Visual Knowledge in Language Tasks: An Empirical Study on Intermediate Pre-training for Cross-modal Knowledge Transfer]&lt;br /&gt;
* 2023-05: [https://arxiv.org/abs/2305.07358 Towards Versatile and Efficient Visual Knowledge Integration into Pre-trained Language Models with Cross-Modal Adapters]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.06755 Sparse Autoencoders for Scientifically Rigorous Interpretation of Vision Models]: CLIP learns a richer set of aggregated representations (e.g. for a culture or country) than a vision-only model.&lt;br /&gt;
&lt;br /&gt;
==Hidden State==&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.06258 Emergent Response Planning in LLM]: They show that the latent representation contains information beyond that needed for the next token (i.e. the model learns to &amp;quot;plan ahead&amp;quot; and encode information relevant to future tokens)&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.02854 (How) Do Language Models Track State?]&lt;br /&gt;
===Convergent Representation===&lt;br /&gt;
* 2015-11: [https://arxiv.org/abs/1511.07543 Convergent Learning: Do different neural networks learn the same representations?]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.12540 Harnessing the Universal Geometry of Embeddings]: Evidence for [https://x.com/jxmnop/status/1925224620166128039 The Strong Platonic Representation Hypothesis]; models converge to a single consensus reality&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.03750 Universally Converging Representations of Matter Across Scientific Foundation Models]&lt;br /&gt;
&lt;br /&gt;
==Function Approximation==&lt;br /&gt;
* 2022-08: [https://arxiv.org/abs/2208.01066 What Can Transformers Learn In-Context? A Case Study of Simple Function Classes]: can learn linear functions (equivalent to the least-squares estimator)&lt;br /&gt;
* 2022-11: [https://arxiv.org/abs/2211.09066 Teaching Algorithmic Reasoning via In-context Learning]: Simple arithmetic &lt;br /&gt;
* 2022-11: [https://arxiv.org/abs/2211.15661 What learning algorithm is in-context learning? Investigations with linear models] ([https://github.com/ekinakyurek/google-research/tree/master/incontext code]): can learn linear regression&lt;br /&gt;
* 2022-12: [https://arxiv.org/abs/2212.07677 Transformers learn in-context by gradient descent]&lt;br /&gt;
* 2023-06: [https://arxiv.org/abs/2306.00297 Transformers learn to implement preconditioned gradient descent for in-context learning]&lt;br /&gt;
* 2023-07: [https://arxiv.org/abs/2307.03576 One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention]&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.02893 ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.20545 SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.21212 Transformers Learn to Implement Multi-step Gradient Descent with Chain of Thought]&lt;br /&gt;
&lt;br /&gt;
=Physics Based=&lt;br /&gt;
* 2014-01: [https://arxiv.org/abs/1401.1219 Consciousness as a State of Matter]&lt;br /&gt;
* 2016-08: [https://arxiv.org/abs/1608.08225 Why does deep and cheap learning work so well?]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.23489 SGD as Free Energy Minimization: A Thermodynamic View on Neural Network Training]&lt;br /&gt;
* 2025-12: [https://www.pnas.org/doi/full/10.1073/pnas.2523012122 Heavy-tailed update distributions arise from information-driven self-organization in nonequilibrium learning]&lt;br /&gt;
&lt;br /&gt;
=Failure Modes=&lt;br /&gt;
* 2023-06: [https://arxiv.org/abs/2306.05836 Can Large Language Models Infer Causation from Correlation?]: Poor causal inference&lt;br /&gt;
* 2023-09: [https://arxiv.org/abs/2309.12288 The Reversal Curse: LLMs trained on &amp;quot;A is B&amp;quot; fail to learn &amp;quot;B is A&amp;quot;]&lt;br /&gt;
* 2023-09: [https://arxiv.org/abs/2309.13638 Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve] (biases towards &amp;quot;common&amp;quot; numbers, in-context CoT can reduce performance by incorrectly priming, etc.)&lt;br /&gt;
* 2023-11: [https://arxiv.org/abs/2311.16093 Visual cognition in multimodal large language models] (models lack human-like visual understanding)&lt;br /&gt;
&lt;br /&gt;
==Adversarial==&lt;br /&gt;
* 2026-03: [https://arxiv.org/abs/2603.03507 Solving adversarial examples requires solving exponential misalignment]&lt;br /&gt;
&lt;br /&gt;
==Fractured Representation==&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.11581 Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis] ([https://github.com/akarshkumar0101/fer code])&lt;br /&gt;
&lt;br /&gt;
==Jagged Frontier==&lt;br /&gt;
* 2023-09: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4573321 Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality]&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.03211 How Does Quantization Affect Multilingual LLMs?]: Quantization degrades different languages by differing amounts&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.10061v1 Compute Optimal Scaling of Skills: Knowledge vs Reasoning]: Scaling laws are skill-dependent&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.18212 A Definition of AGI]&lt;br /&gt;
&lt;br /&gt;
===See also===&lt;br /&gt;
* [[AI_understanding|AI Understanding]] &amp;gt; [[AI_understanding#Psychology|Psychology]] &amp;gt; [[AI_understanding#LLM_personalities|LLM personalities]]&lt;br /&gt;
* [[AI tricks]] &amp;gt; [[AI_tricks#Prompt_Engineering|Prompt Engineering]] &amp;gt; [[AI_tricks#Brittleness|Brittleness]]&lt;br /&gt;
&lt;br /&gt;
===Conversely (AI models converge)===&lt;br /&gt;
* 2025-12: [https://www.arxiv.org/abs/2512.03750 Universally Converging Representations of Matter Across Scientific Foundation Models]&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.05117 The Universal Weight Subspace Hypothesis]&lt;br /&gt;
* 2026-01: [https://avikrishna.substack.com/p/eliciting-frontier-model-character Eliciting Frontier Model Character Training: A study of personality convergence across language models]&lt;br /&gt;
&lt;br /&gt;
==Model Collapse==&lt;br /&gt;
* 2023-05: [https://arxiv.org/abs/2305.17493 The Curse of Recursion: Training on Generated Data Makes Models Forget]&lt;br /&gt;
* 2023-07: [https://arxiv.org/abs/2307.01850 Self-Consuming Generative Models Go MAD]&lt;br /&gt;
* 2023-10: [https://arxiv.org/abs/2310.00429 On the Stability of Iterative Retraining of Generative Models on their own Data]&lt;br /&gt;
* 2023-11: [https://arxiv.org/abs/2311.12202 Nepotistically Trained Generative-AI Models Collapse]&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.03502 AI and the Problem of Knowledge Collapse]&lt;br /&gt;
* 2024-07: [https://www.nature.com/articles/s41586-024-07566-y AI models collapse when trained on recursively generated data]&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.05280 On the Limits of Self-Improving in LLMs and Why AGI, ASI and the Singularity Are Not Near Without Symbolic Model Synthesis]&lt;br /&gt;
&lt;br /&gt;
===Analysis===&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.04376 Scaling laws for learning with real and surrogate data]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.17646 Rate of Model Collapse in Recursive Training]&lt;br /&gt;
&lt;br /&gt;
===Mitigation===&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.07712 Model Collapse Demystified: The Case of Regression]&lt;br /&gt;
* 2024-03: [https://arxiv.org/abs/2403.04706 Common 7B Language Models Already Possess Strong Math Capabilities]&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.01413 Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data]&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.07515 Beyond Model Collapse: Scaling Up with Synthesized Data Requires Verification]&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.01490 LLM See, LLM Do: Guiding Data Generation to Target Non-Differentiable Objectives]&lt;br /&gt;
* 2024-08: [https://arxiv.org/abs/2408.14960 Multilingual Arbitrage: Optimizing Data Pools to Accelerate Multilingual Progress]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.08117 Convergence Dynamics and Stabilization Strategies of Co-Evolving Generative Models]&lt;br /&gt;
&lt;br /&gt;
=Psychology=&lt;br /&gt;
* 2023-04: [https://arxiv.org/abs/2304.11111 Inducing anxiety in large language models can induce bias]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.17117 From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning]&lt;br /&gt;
* 2025-07: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5357179 Call Me A Jerk: Persuading AI to Comply with Objectionable Requests]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.11328 Do LLMs &amp;quot;Feel&amp;quot;? Emotion Circuits Discovery and Control]&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.06047 &amp;quot;They parted illusions -- they parted disclaim marinade&amp;quot;: Misalignment as structural fidelity in LLMs]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.02606 Gender Dynamics and Homophily in a Social Network of LLM Agents]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.01689 What LLMs Think When You Don&amp;#039;t Tell Them What to Think About?]&lt;br /&gt;
* 2026-03: [https://truthful.ai/consciousness_cluster.pdf The Consciousness Cluster: Preferences of Models that Claim to be Conscious]&lt;br /&gt;
* 2026-04: [https://transformer-circuits.pub/2026/emotions/index.html Emotion concepts and their function in a large language model] ([https://www.anthropic.com/research/emotion-concepts-function blog])&lt;br /&gt;
&lt;br /&gt;
==Persona Simulator Theory==&lt;br /&gt;
* 2022-09: [https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators Simulators] ([https://www.lesswrong.com/users/janus-1?from=post_header janus])&lt;br /&gt;
* 2022-12: [https://aclanthology.org/2022.findings-emnlp.423/ Language Models as Agent Models]&lt;br /&gt;
* 2023-02: [https://arxiv.org/abs/2302.00805 Conditioning Predictive Models: Risks and Strategies]&lt;br /&gt;
* 2024-09: [https://www.lesswrong.com/s/qhdHbCJ3PYesL9dde Intuitive Self-Models]&lt;br /&gt;
* 2026-02: [https://alignment.anthropic.com/2026/psm/ The Persona Selection Model: Why AI Assistants might Behave like Humans] (Anthropic, [https://www.anthropic.com/research/persona-selection-model blog])&lt;br /&gt;
&lt;br /&gt;
==Allow LLM to think==&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.11536 Let your LLM generate a few tokens and you will reduce the need for retrieval]&lt;br /&gt;
&lt;br /&gt;
===In-context Learning===&lt;br /&gt;
* 2021-10: [https://arxiv.org/abs/2110.15943 MetaICL: Learning to Learn In Context]&lt;br /&gt;
* 2022-02: [https://arxiv.org/abs/2202.12837 Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?]&lt;br /&gt;
* 2022-08: [https://arxiv.org/abs/2208.01066 What Can Transformers Learn In-Context? A Case Study of Simple Function Classes]&lt;br /&gt;
* 2022-11: [https://arxiv.org/abs/2211.15661 What learning algorithm is in-context learning? Investigations with linear models]&lt;br /&gt;
* 2022-12: [https://arxiv.org/abs/2212.07677 Transformers learn in-context by gradient descent]&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.16003 Learning without training: The implicit dynamics of in-context learning]&lt;br /&gt;
&lt;br /&gt;
==Reasoning (CoT, etc.)==&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.18009 Large Language Models Think Too Fast To Explore Effectively]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.18585 Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.08156 Are DeepSeek R1 And Other Reasoning Models More Faithful?]: reasoning models tend to give more faithful explanations of their own reasoning&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.08679 Chain-of-Thought Reasoning In The Wild Is Not Always Faithful]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.04022 Rethinking Reflection in Pre-Training]: pre-training alone already provides some amount of reflection/reasoning&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2501.18858 BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning]&lt;br /&gt;
&lt;br /&gt;
===Pathfinding===&lt;br /&gt;
* 2024-08: [https://arxiv.org/abs/2408.08152 DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.01939 Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning]&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.09284 Tree-OPO: Off-policy Monte Carlo Tree-Guided Advantage Optimization for Multistep Reasoning]&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.06160v1 Reverse-Engineered Reasoning for Open-Ended Generation]&lt;br /&gt;
&lt;br /&gt;
===Skeptical===&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.06941 The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity]&lt;br /&gt;
* 2025-08: [https://www.arxiv.org/abs/2508.01191 Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens]&lt;br /&gt;
&lt;br /&gt;
==Self-Awareness and Self-Recognition and Introspection==&lt;br /&gt;
* 2022-07: [https://arxiv.org/abs/2207.05221 Language Models (Mostly) Know What They Know]&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.13076 LLM Evaluators Recognize and Favor Their Own Generations]&lt;br /&gt;
* 2024-09: [https://situational-awareness-dataset.org/ Me, Myself and AI: The Situational Awareness Dataset for LLMs]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.13787 Looking Inward: Language Models Can Learn About Themselves by Introspection]&lt;br /&gt;
* 2024-12: [https://theaidigest.org/self-awareness AIs are becoming more self-aware. Here&amp;#039;s why that matters]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.11120 Tell me about yourself: LLMs are aware of their learned behaviors]&lt;br /&gt;
* 2025-04: [https://x.com/Josikinz/status/1907923319866716629 LLMs can guess which comic strip was generated by themselves (vs. other LLM)]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.13763 Language Models Are Capable of Metacognitive Monitoring and Control of Their Internal Activations]&lt;br /&gt;
* 2025-10: [https://transformer-circuits.pub/2025/introspection/index.html Emergent Introspective Awareness in Large Language Models] (Anthropic, [https://www.anthropic.com/research/introspection blog])&lt;br /&gt;
* 2025-12: [https://www.arxiv.org/abs/2512.24661 Do Large Language Models Know What They Are Capable Of?]&lt;br /&gt;
&lt;br /&gt;
==LLM personalities==&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.02618 Strategic Intelligence in Large Language Models: Evidence from evolutionary Game Theory]&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.04343 Psychologically Enhanced AI Agents]&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.10387 The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models]&lt;br /&gt;
&lt;br /&gt;
==Quirks &amp;amp; Biases==&lt;br /&gt;
* 2025-04: [https://www.cambridge.org/core/journals/judgment-and-decision-making/article/artificial-intelligence-and-dichotomania/0421D2310727D73FAB47069FD1620AA1 Artificial intelligence and dichotomania]&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.22818 Can Large Language Models Develop Gambling Addiction?]&lt;br /&gt;
&lt;br /&gt;
=Vision Models=&lt;br /&gt;
* 2017-11: Distill: [https://distill.pub/2017/feature-visualization/ Feature Visualization: How neural networks build up their understanding of images]&lt;br /&gt;
* 2021-01: [https://arxiv.org/abs/2101.12322 Position, Padding and Predictions: A Deeper Look at Position Information in CNNs]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.13181 Perception Encoder: The best visual embeddings are not at the output of the network] ([https://github.com/facebookresearch/perception_models code])&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
* [[AI]]&lt;br /&gt;
* [[AI tools]]&lt;br /&gt;
* [[AI agents]]&lt;br /&gt;
* [[Robots]]&lt;/div&gt;</summary>
		<author><name>KevinYager</name></author>
		
	</entry>
	<entry>
		<id>http://gisaxs.com/index.php?title=AI_video&amp;diff=8779</id>
		<title>AI video</title>
		<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php?title=AI_video&amp;diff=8779"/>
		<updated>2026-04-05T01:19:56Z</updated>

		<summary type="html">&lt;p&gt;KevinYager: /* April 2026 */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Evolution of Capabilities==&lt;br /&gt;
===Early===&lt;br /&gt;
* November 2016: [https://arxiv.org/abs/1611.10314 Sync-Draw]&lt;br /&gt;
* April 2021: [https://arxiv.org/abs/2104.14806 GODIVA]&lt;br /&gt;
* October 2022: [https://makeavideo.studio/ Meta Make-a-video]&lt;br /&gt;
* October 2022: [https://imagen.research.google/video/ Google Imagen video]&lt;br /&gt;
&lt;br /&gt;
===2023===&lt;br /&gt;
* April 2023: [https://www.youtube.com/watch?v=XQr4Xklqzw8 Will Smith eating spaghetti]&lt;br /&gt;
* April 2023: [https://x.com/nickfloats/status/1642899094808002564 Harry Potter by Balenciaga]&lt;br /&gt;
* April 2023: [https://x.com/mrjonfinger/status/1645953033636048896?cxt=HHwWgMDT7YfkzNctAAAA Runway Gen 2]&lt;br /&gt;
* April 2023: [https://research.nvidia.com/labs/toronto-ai/VideoLDM/ Nvidia latents]&lt;br /&gt;
* December 2023: [https://www.threads.net/@luokai/post/C0vvEnTP4Oj Fei-Fei Li]&lt;br /&gt;
&lt;br /&gt;
===2024===&lt;br /&gt;
====Early 2024====&lt;br /&gt;
* January 2024: [https://sites.research.google/videopoet/ Google VideoPoet]&lt;br /&gt;
* January 2024: [https://lumiere-video.github.io/ Google Lumiere]&lt;br /&gt;
* February 2024: [https://openai.com/index/sora/ OpenAI Sora]&lt;br /&gt;
* April 2024: [https://www.maginative.com/article/china-unveils-vidu-a-powerful-text-to-video-generator/ Vidu]&lt;br /&gt;
* May 2024: [https://deepmind.google/technologies/veo/ Veo]&lt;br /&gt;
* May 2024: [https://kling.kuaishou.com/ Kling]&lt;br /&gt;
* June 2024: [https://lumalabs.ai/dream-machine Luma DreamMachine]&lt;br /&gt;
* June 2024: [https://runwayml.com/research/introducing-gen-3-alpha RunwayML Gen-3 Alpha]&lt;br /&gt;
* July 2024: Examples:&lt;br /&gt;
** [https://www.youtube.com/watch?v=F_WfIzYGlg4 Toys-R-Us Commercial made using Sora]&lt;br /&gt;
** [https://www.youtube.com/watch?v=CSfw_NjqQ2o Motorola commercial made using genAI]&lt;br /&gt;
* July 2024: [https://x.com/rowancheung/status/1813258518159585723 haiper.ai]&lt;br /&gt;
====August 2024====&lt;br /&gt;
* August 2024: [http://hotshot.co/ Hotshot] ([https://x.com/maxescu/status/1825459083635536081 examples], [https://x.com/EccentrismArt/status/1825550841534972027 more examples])&lt;br /&gt;
* August 2024: Luma Dream Machine [https://x.com/LumaLabsAI/status/1825639918539817101 v1.5]&lt;br /&gt;
* August 2024: Examples:&lt;br /&gt;
** [https://x.com/endlesstaverns/status/1811276904692887815 Runway Gen3 music video]&lt;br /&gt;
** [https://x.com/runwayml/status/1820806644806070583 Runway Gen3 for adding FX to live action] ([https://x.com/bryanf0x/status/1825529998201004137 another example])&lt;br /&gt;
** [https://www.youtube.com/watch?v=taaM0s1bq7Q Midjourney + Runway Gen3: Hey It’s Snowing]&lt;br /&gt;
** [https://x.com/Kyrannio/status/1821605619927019974 Flux/LoRA image] + Runway Gen3 [https://x.com/iamneubert/status/1821970292014768420 woman presenter]&lt;br /&gt;
** [https://x.com/CharaspowerAI/status/1825274421256356106 McDonald’s AI commercial]&lt;br /&gt;
** Sora used by [https://www.facebook.com/izanamiaiart/ Izanami AI Art] to create [https://x.com/kimmonismus/status/1824102316229759114 dreamlike video] and by [https://x.com/alexiaadana Alexia Adana] to create [https://x.com/basedjensen/status/1824386717123743940 sci-fi film concept]&lt;br /&gt;
====September 2024====&lt;br /&gt;
* September 2024: [https://hailuoai.com/video/ Hailuo Minimax] ([https://x.com/minchoi/status/1829995683124035766 examples])&lt;br /&gt;
* September 2024: Examples:&lt;br /&gt;
** [https://www.youtube.com/watch?v=XAs5KuhfE_s Space colonization]&lt;br /&gt;
** [https://x.com/venturetwins/status/1827772646295265699 Consistent characters]&lt;br /&gt;
** [https://x.com/thealexbanks/status/1829489392354050502 Sea monsters]&lt;br /&gt;
** [https://x.com/ai_for_success/status/1829539535132426286 Music video]&lt;br /&gt;
** [https://x.com/RyanMorrisonJer/status/1829074823521112544 Animated characters]&lt;br /&gt;
** [https://x.com/CharaspowerAI/status/1829916782452191674 AI influencer]&lt;br /&gt;
** [https://x.com/minchoi/status/1829293248197902802 Ten short examples]&lt;br /&gt;
** [https://x.com/WorldEverett/status/1830596701473615937 Seven examples]&lt;br /&gt;
** [https://x.com/EccentrismArt/status/1830654805515395583 Clip from horror film]&lt;br /&gt;
** [https://x.com/MatthieuGB/status/1722146578813645296 &amp;quot;Gone&amp;quot; featuring astronaut] and [https://x.com/MatthieuGB/status/1742949297337852270 something ethereal]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1831256663644373449 Two dancers] (surprisingly good consistency despite movement)&lt;br /&gt;
** [https://x.com/8bit_e/status/1831344542487871953 Music video about flying]&lt;br /&gt;
** [https://www.youtube.com/watch?v=_XtS_4PzEyk The Paperclip Maximizer]&lt;br /&gt;
** [https://x.com/trbdrk/status/1831801373517869369 La Baie Aréa]&lt;br /&gt;
** [https://www.reddit.com/r/aivideo/comments/1f8xr0w/gisele_tong_to_dear_me/ &amp;quot;To Dear Me&amp;quot; by Gisele Tong] ([https://www.morningstar.com/news/business-wire/20240904521664/reply-ai-film-festival-announced-the-winners-of-the-first-international-festival-for-short-films-made-with-artificial-intelligence winner of AI shorts] film festival)&lt;br /&gt;
** [https://x.com/maxescu/status/1833476640438964281 Various scenes]&lt;br /&gt;
** [https://x.com/EHuanglu/status/1833522650846793970 Directing emotions]&lt;br /&gt;
* September 2024: Kling 1.5 ([https://x.com/Uncanny_Harry/status/1836531835280724459 examples], [https://x.com/minchoi/status/1836800551469654088 showing emotions])&lt;br /&gt;
* September 2024: Examples:&lt;br /&gt;
** Runway video-to-video to [https://x.com/jon_barron/status/1835695132697604236 restyle classic video games]&lt;br /&gt;
** [https://x.com/ai_for_success/status/1835319670917796117 Realistic presenter]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1834530744175059302 Skateboarding] (demonstrates getting closer to meaningfully simulating motion/physics)&lt;br /&gt;
** [https://x.com/minchoi/status/1835378029092049325 Examples] of short clips with cinematic feel&lt;br /&gt;
** Short: [https://x.com/PJaccetturo/status/1835670655330869633 4 Minutes to Live]&lt;br /&gt;
** Short: [https://x.com/dreamingtulpa/status/1836121321526432231 Neon Nights] (Arcade)&lt;br /&gt;
** [https://www.youtube.com/watch?v=CcrGSA-kSrI Random Access Memories]: AI-generated, then projected onto Kodak film stock, giving the final output some of the dreamy analog quality associated with nostalgic footage&lt;br /&gt;
** Sora used to make a sort of [https://x.com/niceaunties/status/1837271244774715505 weird dreamlike video]&lt;br /&gt;
====October 2024====&lt;br /&gt;
* October 2024: Pika v1.5, including Pikaffects (explode, melt, inflate, and cake-ify; examples: [https://x.com/justin_hart/status/1841144350572413259 1], [https://x.com/arthur_hyper88/status/1841156544538521646 2], [https://x.com/ytjessie_/status/1841168925301842263 3], [https://x.com/bilawalsidhu/status/1841195247184781420 4], [https://x.com/minchoi/status/1841189035454447636 5], [https://x.com/ytjessie_/status/1841209415514669501 6])&lt;br /&gt;
* October 2024: Examples:&lt;br /&gt;
** [https://x.com/HalimAlrasihi/status/1839310216602788103 AI avatar with good lip-sync]&lt;br /&gt;
** [https://www.youtube.com/watch?v=5NZubOOeeV0 Battalion]: 5 minute short about war&lt;br /&gt;
** Short film: [https://x.com/MatthieuGB/status/1841173724688536015 To Wonderland] ([https://x.com/MatthieuGB/status/1841174221550207437 credits])&lt;br /&gt;
** [https://x.com/OnwardsProject/status/1841508441241890975 9 to 5]: Created with Luma Dream Machine keyframes and camera features; music by Suno&lt;br /&gt;
* October 2024: [https://ai.meta.com/research/movie-gen/ Meta Movie Gen]&lt;br /&gt;
* October 2024: Examples:&lt;br /&gt;
** [https://x.com/CuriousRefuge/status/1844424871335592373 AI Avatar] (using [https://x.com/CuriousRefuge/status/1844424871335592373 HeyGen])&lt;br /&gt;
** [https://www.youtube.com/watch?v=isW1FLL0K3w Generic Movies]&lt;br /&gt;
** [https://arxiv.org/abs/2410.05954 Pyramid-flow] ([https://huggingface.co/rain1011/pyramid-flow-sd3 open source]) model: [https://x.com/_akhaliq/status/1844239643778351605 examples]&lt;br /&gt;
** [https://x.com/whrumorvid/status/1846209247467491604 Building the Pyramids]&lt;br /&gt;
** [https://x.com/maxescu/status/1844716998854349217 People showing realistic emotion] (using [https://hailuoai.video/ Hailuo AI])&lt;br /&gt;
** Keyframes and Luma AI to make novel [https://x.com/CoffeeVectors/status/1845188179332051005 speed-ramp motion]&lt;br /&gt;
* October 2024: [https://pollo.ai/ Pollo AI] platform offers selection among a diversity of video models&lt;br /&gt;
* October 2024: [https://www.genmo.ai/ Genmo] [https://x.com/genmoai/status/1848762405779574990 Mochi 1] (open source)&lt;br /&gt;
* October 2024: Examples:&lt;br /&gt;
** [https://x.com/AIatMeta/status/1849134463382680028 Meta Movie Gen examples]&lt;br /&gt;
** [https://x.com/PJaccetturo/status/1847732127598800960 Emotional range of Minimax]&lt;br /&gt;
** [https://x.com/dustinhollywood/status/1848757800807039299 Car commercial: Bear]&lt;br /&gt;
** [https://x.com/runwayml/status/1848785913918218517 Diner conversation]&lt;br /&gt;
** [https://x.com/Uncanny_Harry/status/1849275871716159989 Loved and Lost] (a meditation on grief)&lt;br /&gt;
====November 2024====&lt;br /&gt;
* November 2024: Examples:&lt;br /&gt;
** [https://x.com/blizaine/status/1852092147643699356 Pasta Doble]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1852425015175626876 Bird protecting young]&lt;br /&gt;
** [https://x.com/runwayml/status/1852363190484537666 Camera moving around sushi]&lt;br /&gt;
** [https://x.com/StevieMac03/status/1851969120813629939 Various examples] of [https://hailuoai.video/ Hailuo AI]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1853102779650252978 Trains]&lt;br /&gt;
** [https://www.youtube.com/watch?v=Fh-_g5vev0s Light of Imagination]&lt;br /&gt;
** [https://x.com/LinusEkenstam/status/1854087441122021814 Bringing historic images to life]&lt;br /&gt;
** [https://x.com/DeryaTR_/status/1855637066203218180 Plants dancing]&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1855078644042944574 Insect on tree]&lt;br /&gt;
** Trailers for [https://x.com/abandonedmovies/status/1827037378009296983 The Silmarillion] and [https://x.com/abandonedmovies/status/1846941183702110211 The Fall of Gondolin] (by [https://x.com/abandonedmovies Abandoned Films])&lt;br /&gt;
** [https://x.com/Diesol/status/1855475704470884427 Moody sci-fi]&lt;br /&gt;
** [https://x.com/runwayml/status/1857072173631885586 Migration] ([https://runwayml.com/customers/behind-the-scenes-of-migration-with-director-jeremy-higgins made by combining] Runway ML Gen3-Alpha and traditional animation)&lt;br /&gt;
** [https://x.com/AIandDesign/status/1856467856625676752 After the Winter] ([https://suno.com/song/0d6919de-d2bf-434b-8aa6-ede0fb0fde77 music] made using Suno v4)&lt;br /&gt;
** Horror: [https://www.reddit.com/r/aivideo/comments/1gnk27q/ridge_to_southwest/ Ridge to Southwest]&lt;br /&gt;
** [https://www.youtube.com/watch?v=ClStJZmIjBU The Gardener] (by [https://www.youtube.com/@MachineMythos Machine Mythos])&lt;br /&gt;
** [https://x.com/techhalla/status/1857462526859935813 Coca-Cola holiday ad] and [https://www.youtube.com/watch?v=THdoOgwqjBg parody thereof]&lt;br /&gt;
** [https://x.com/pzf_ai/status/1858312421510992111 A Dream Within A Dream] (by [https://x.com/pzf_ai PZF], selected for the Czech International AI Film Festival)&lt;br /&gt;
** [https://x.com/WorldEverett/status/1859273222597775843 Making Friends] (by [https://x.com/WorldEverett Everett World]; see also [https://x.com/WorldEverett/status/1858563716834275562 Childhood Dream] and [https://x.com/WorldEverett/status/1858945634067202429 City Echoes])&lt;br /&gt;
** Anime: [https://x.com/naegiko/status/1857754626742726893 test shots], [https://x.com/naegiko/status/1858978557424210401 Ultimate Ceremony], [https://x.com/naegiko/status/1835434668294074462 Echoes of Love]&lt;br /&gt;
** [https://x.com/KakuDrop/status/1866309309384323257 Echoes of Grace] ([https://x.com/KakuDrop KakuDrop] using Sora)&lt;br /&gt;
** [https://x.com/vibeke_udart/status/1859879367071203662 Morphing hands], [https://x.com/vibeke_udart/status/1858772719224975630 hands and faces] ([https://x.com/vibeke_udart Vibeke Bertelsen])&lt;br /&gt;
** [https://www.reddit.com/r/aivideo/comments/1gxi29x/dbzlicious/ Dragon Ball Z live action]&lt;br /&gt;
** [https://x.com/cfryant/status/1860727980353278386 Pitch Black] (abstract and dark)&lt;br /&gt;
** [https://x.com/cfryant/status/1861050528932765710 Animals Running] (zoomed-in ultra-wide camera)&lt;br /&gt;
** [https://x.com/WorldEverett/status/1860730214487118290 Dreams of Tomorrow] (panning shots of high-tech car, Scottish manor)&lt;br /&gt;
** [https://x.com/nickfloats/status/1861206978690691165 Desert Planet Cinematics]&lt;br /&gt;
* November 2024: [https://huggingface.co/spaces/PR-Puppets/PR-Puppet-Sora Leaked] Sora turbo model; [https://x.com/rowancheung/status/1861455031603503234 examples], [https://x.com/chatgpt21/status/1861504511153451517 Dog chasing Cat in snow]&lt;br /&gt;
====December 2024====&lt;br /&gt;
* December 2024: Examples:&lt;br /&gt;
** [https://x.com/minchoi/status/1863243880553976235 Realistic] (Hailuo AI by MiniMax)&lt;br /&gt;
** Trailer for [https://x.com/TheReelRobot/status/1861824847149670840 Paradise Lost] (to be released on [https://www.sandwatch.ai/ Sandwatch AI])&lt;br /&gt;
** [https://x.com/EHuanglu/status/1863607136271716418 Music video example] with consistent characters&lt;br /&gt;
** [https://x.com/venturetwins/status/1863666366764687581 Human expressions] ([https://www.reddit.com/r/ChatGPT/comments/1h4r13x/ai_generated_expressions/ u/Kind_Distance9504 on Reddit], using Hailuo)&lt;br /&gt;
** Vodafone ad: [https://www.youtube.com/watch?v=9AyEC_K9kBg The Rhythm Of Life]&lt;br /&gt;
** [https://www.reddit.com/r/midjourney/comments/1h5u2gw/we_made_a_10_minute_gen_ai_batman_film/ 10 minute Batman film]&lt;br /&gt;
* December 2024: Tencent [https://aivideo.hunyuan.tencent.com/ Hunyuan Video] open-source video model ([https://x.com/CharaspowerAI/status/1863862585554010530 example])&lt;br /&gt;
* December 2024: [https://sora.com/ Sora] release ([https://x.com/CharaspowerAI/status/1866203050982916532 examples])&lt;br /&gt;
* December 2024: [https://mint-video.github.io/ MinT video] improves consistency and control ([https://arxiv.org/abs/2412.05263 preprint], [https://x.com/EHuanglu/status/1868278456565531061 examples])&lt;br /&gt;
* December 2024: Google [https://blog.google/technology/google-labs/video-image-generation-update-december-2024/ Veo 2] ([https://x.com/sundarpichai/status/1868709099644334518 examples], [https://x.com/EHuanglu/status/1869008306322522342 more examples], [https://x.com/_Borriss_/status/1869267571532320966 natural movement examples], [https://x.com/jerrod_lew/status/1870816560027246715 abstract], [https://x.com/jerrod_lew/status/1869427407415058660 realistic physics], [https://x.com/jerrod_lew/status/1873096585002786944 crowds], [https://x.com/minchoi/status/1873590350515929380 dancing], [https://x.com/jerrod_lew/status/1874440442269565351 animals])&lt;br /&gt;
* December 2024: [https://x.com/pika_labs/status/1867651381840040304 Pika 2.0] with Scene Ingredients&lt;br /&gt;
* December 2024: Examples:&lt;br /&gt;
** [https://www.youtube.com/watch?v=c_kKKRQ5gYw Synthetic Youth: Takenoko Zoku · Made by Emi Kusano with Sora]&lt;br /&gt;
** [https://x.com/higgsfield_ai/status/1868698886761837041 Car race] ([https://higgsfield.ai/ Higgsfield AI] storytelling)&lt;br /&gt;
** [https://x.com/blizaine/status/1868850653759783033 Slicing meat]; comparison of modern video generators&lt;br /&gt;
** Challenging prompt: [https://x.com/RubenEVillegas/status/1868864410720325844 A cat roars while looking at its reflection in the mirror but instead sees itself as a lion roaring (Veo 2)] ([https://x.com/anukaakash/status/1869417975071330550 comparison to other models])&lt;br /&gt;
** [https://x.com/PJaccetturo/status/1869829338868412865 Anime trailer]&lt;br /&gt;
** [https://x.com/ring_hyacinth/status/1870386506776674376 Snorlax at Mount Fuji] and [https://x.com/ring_hyacinth/status/1871105733443592696 Psyduck at Colosseum] (Kling 1.6)&lt;br /&gt;
** [https://x.com/machine_mythos/status/1870565287789056320 Horror visuals] (with [https://mmaudio.net/ MMAudio] sound)&lt;br /&gt;
** [https://www.youtube.com/watch?v=lFc1jxLHhyM The Heist] (Veo 2)&lt;br /&gt;
** [https://x.com/minchoi/status/1871263616806129863 Various Veo 2 examples]&lt;br /&gt;
** [https://x.com/minchoi/status/1872390429108486320 Live Action Titans]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1873094065841193222 Cats] [https://x.com/PostsOfCats/status/1872530207585825058 Cooking]&lt;br /&gt;
** Aesthetic from alternate timelines: [https://x.com/BrianRoemmele/status/1871753358782120068 1], [https://x.com/BrianRoemmele/status/1872105833456423216 2], [https://x.com/brain_racked/status/1872340717978390583 3]&lt;br /&gt;
** [https://x.com/minchoi/status/1872486717145706793 Examples approaching cinematic quality]&lt;br /&gt;
** [https://x.com/JaicSam/status/1872903054221033693 Cosmic Spider] (winner at AI film festival)&lt;br /&gt;
** [https://www.youtube.com/watch?v=dbdYPMRi_Nk Trailer for Newton&amp;#039;s Cradle] (full film [https://x.com/JeffSynthesized/status/1872705173451358293 on] 2025-01-01)&lt;br /&gt;
** [https://x.com/Ror_Fly/status/1873036384077828499 Car vs. Jet drag race]&lt;br /&gt;
** [https://x.com/Diesol/status/1873415500149199066 California Monsters]&lt;br /&gt;
** [https://x.com/heyshrutimishra/status/1873631383584924078 Various examples] (Hailuo AI)&lt;br /&gt;
** [https://x.com/kimmonismus/status/1873568693357294014 Will Smith eating spaghetti] (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023])&lt;br /&gt;
** [https://x.com/StevieMac03/status/1873998177193648438 Sorceress and Arachnid Steed] (Kling v1.6)&lt;br /&gt;
** [https://x.com/dustinhollywood/status/1873940924016779425 Music video] (Hailuo AI)&lt;br /&gt;
** [https://www.youtube.com/watch?v=iQg2udCHMdI Akụkọ (Story)] (22 minute short) - A Lagos Boy&amp;#039;s Thrilling Snack Run Nightmare&lt;br /&gt;
** [https://x.com/cinerobot/status/1873766976306455019 Son of the Dragon] (8 minute short)&lt;br /&gt;
** [https://x.com/SynthReveries/status/1873624586857886071 Endless Journey] music video ([https://suno.com/song/fa90fa5e-25c7-48ad-b291-42a8a8c51cf9 music] by Suno)&lt;br /&gt;
** [https://x.com/anukaakash/status/1870504167653228980 Once Again] (retrospective)&lt;br /&gt;
** [https://x.com/jasonzada/status/1873470586053414928 Fade Out] (Veo 2)&lt;br /&gt;
** [https://x.com/talkboysstudio/status/1869085014513865027 Roadkill] (12 minute short)&lt;br /&gt;
&lt;br /&gt;
===2025===&lt;br /&gt;
====January 2025====&lt;br /&gt;
* January 2025: [https://x.com/kimmonismus/status/1877351050748871038 Progress] over the last 1.5 years, comparing Runway Gen 2 and Veo 2.&lt;br /&gt;
* January 2025: Examples:&lt;br /&gt;
** [https://x.com/AllarHaltsonen/status/1874557865576542655 Delivery] (unofficial Nike ad)&lt;br /&gt;
** [https://x.com/Diesol/status/1875237221735002299 Gucci ad] (Sora)&lt;br /&gt;
** [https://x.com/EccentrismArt/status/1874498145910149412 Conquest]&lt;br /&gt;
** [https://www.youtube.com/watch?v=RJZCMfaS-io Newton&amp;#039;s Cradle] (6 minute short)&lt;br /&gt;
** [https://x.com/AngryTomtweets/status/1874627041934602410 Singer]&lt;br /&gt;
** [https://x.com/DumpsterBud/status/1874807352794182019 Brain vomit] (music video)&lt;br /&gt;
** [https://x.com/mxvdxn/status/1874796628210778618 Vibe] (Kling v1.6)&lt;br /&gt;
** [https://x.com/_deepfates/status/1875215969452523785 Will Smith eating spaghetti] (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024])&lt;br /&gt;
** [https://www.youtube.com/watch?v=BL9-jHGnxyc Zorgop Knows All] (2 minute short)&lt;br /&gt;
** [https://x.com/ButchersBrain/status/1875130428518269406 The Breach] (2 minute short; Veo 2, Runway Act-One, MMAudio)&lt;br /&gt;
** [https://x.com/Rainmaker1973c/status/1875261591043850477 Aesthetics from an alternate timeline]&lt;br /&gt;
** [https://x.com/StevieMac03/status/1875440611849072841 Immortal Awakens]&lt;br /&gt;
** [https://x.com/isaachorror/status/1875624519588835400 The Faded Line]&lt;br /&gt;
** [https://www.youtube.com/watch?v=4fy8H38rm-4 Dear Dad]&lt;br /&gt;
** [https://x.com/maxescu/status/1877060580680311242 Mad Max chase]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1877408247906447633 Patience is Key]&lt;br /&gt;
** [https://x.com/techhalla/status/1879967230093586555 The Almost Famous Show] (talent show parody)&lt;br /&gt;
** [https://x.com/thefuzzysignal/status/1879295176990154755 Proof-of-concept trailer for a medieval adult animated series]&lt;br /&gt;
** [https://x.com/JeffSynthesized/status/1879555151499034869 Variety] (unofficial Cadbury ad)&lt;br /&gt;
** [https://x.com/henrydaubrez/status/1879883806947115446 Kitsune] (5 minute animated short, Veo 2)&lt;br /&gt;
* January 2025: MiniMax Hailuo [https://www.minimaxi.com/en/news/s2v-01-release Subject Reference] enables consistent characters ([https://x.com/minchoi/status/1881707687362412924 examples])&lt;br /&gt;
* January 2025: AI tools (de-aging deepfakes, [https://magnific.ai/ Magnific]) were [https://x.com/JeffSynthesized/status/1878630652377178502 used in the film] [https://www.imdb.com/title/tt18272208/ &amp;quot;Here&amp;quot;].&lt;br /&gt;
* January 2025: Luma [https://lumalabs.ai/ray Ray2]&lt;br /&gt;
* January 2025: [https://pikartai.com/pika-2-1/ Pika 2.1] ([https://x.com/OrctonAI/status/1883925754653905049 examples])&lt;br /&gt;
* January 2025: Examples:&lt;br /&gt;
** [https://x.com/wyzborrero/status/1879949477764804873 Light projections onto people] (challenging task, Ray2)&lt;br /&gt;
** [https://x.com/AllarHaltsonen/status/1881261042753589547 BMW ad]&lt;br /&gt;
** [https://x.com/AIWarper/status/1880658326645878821 John Wick in Severance] (Hunyuan vid2vid)&lt;br /&gt;
** [https://x.com/TheReelRobot/status/1881771800595444193 Biopic] (7 minutes)&lt;br /&gt;
** [https://x.com/misslaidlaw/status/1882180619582791784 Give It To Me] (music video)&lt;br /&gt;
** [https://x.com/paultrillo/status/1882091702506459394 Where do we go from here?] (music video, Veo 2)&lt;br /&gt;
** [https://x.com/WorldEverett/status/1882235057076580502 Party like there&amp;#039;s no tomorrow] (music video)&lt;br /&gt;
** [https://x.com/Diesol/status/1884696027942498779 S.T.O.R.I.] (Midjourney and Pika 2.1)&lt;br /&gt;
====February 2025====&lt;br /&gt;
* February 2025: Examples:&lt;br /&gt;
** [https://x.com/OrctonAI/status/1885839287913955597 Long Steampunk scene]&lt;br /&gt;
** [https://x.com/jerrod_lew/status/1885787580685562226 City destruction]&lt;br /&gt;
** [https://x.com/EHuanglu/status/1885736840344551763 Consistent character acting]&lt;br /&gt;
** [https://x.com/MeanOrangeCat/status/1884295241534185890 Kaiju Katastrophe] (by [https://x.com/MeanOrangeCat Mean Orange Cat])&lt;br /&gt;
** [https://x.com/Diesol/status/1886433799690748210 The Greyhound]&lt;br /&gt;
** [https://x.com/CoffeeVectors/status/1886146242029195391 Fluid simulation video2video]&lt;br /&gt;
** [https://x.com/toolstelegraph/status/1886622772828254403 High resolution macro shots]&lt;br /&gt;
** [https://www.youtube.com/watch?v=p0J1LDWERS0 Chrysalids]&lt;br /&gt;
** [https://x.com/multimodalart/status/1887817996220940737 Boring realistic images] (HunyuanVideo w/ LoRA)&lt;br /&gt;
** [https://www.youtube.com/watch?v=PcVRfa1JyyQ Anime intro] ([https://www.reddit.com/r/StableDiffusion/comments/1ijvua0/opensource_almostconsistent_real_anime_made_with/ Hunyuan w/ custom LoRAs])&lt;br /&gt;
** [https://x.com/AllarHaltsonen/status/1888294811750318114 Automotive ad test] (Kling w/ custom model)&lt;br /&gt;
** [https://x.com/AngryTomtweets/status/1888758524303269928 Random cinematic clips] (Midjourney and Kling)&lt;br /&gt;
** [https://x.com/juliewdesign_/status/1888666757302263828 Crossing Paths]&lt;br /&gt;
** [https://x.com/weirdai_art/status/1888794894187041200 Miniature food]&lt;br /&gt;
** [https://x.com/CaptainHaHaa/status/1889573017745035463 Animals]&lt;br /&gt;
** [https://x.com/Kavanthekid/status/1889371011667144724 Star Wars - The Ghost&amp;#039;s Apprentice (Fan Film)]&lt;br /&gt;
** [https://x.com/AngryTomtweets/status/1889768184716423573 Ray2 image-to-video examples]&lt;br /&gt;
** [https://x.com/weirdai_art/status/1889890470987518069 New Horizons] (miniatures going to Mars)&lt;br /&gt;
** [https://x.com/karim_yourself/status/1890100168378536155 Black Sun (trailer)]&lt;br /&gt;
** [https://x.com/BrivaelLp/status/1890122101153231288 AI avatars] ([https://www.argil.ai/ Argil AI])&lt;br /&gt;
** [https://x.com/mrjonfinger/status/1890783411679236473 Adding elements to real video] ([https://x.com/mrjonfinger/status/1891337081923772918 other example])&lt;br /&gt;
** [https://x.com/SynthReveries/status/1892278954137940289 Glitch]&lt;br /&gt;
** Anime: [https://x.com/freeeebird2300/status/1889119007707689146 sci-fi] (Ray2), [https://x.com/Artedeingenio/status/1891173784188756069 sci-fi] (Ray 2), [https://x.com/seiiiiiiiiiiru/status/1890980673743474931 90s sci-fi] (Luma) and [https://x.com/TomLikesRobots/status/1891209369804591447 moody] (Midjourney and Ray2)&lt;br /&gt;
* February 2025: Meta [https://hila-chefer.github.io/videojam-paper.github.io/ VideoJAM]&lt;br /&gt;
* February 2025: ByteDance [https://omnihuman-lab.github.io/ OmniHuman-1]&lt;br /&gt;
* February 2025: ByteDance [https://saiyan-world.github.io/goku/ Goku] ([https://arxiv.org/abs/2502.04896 paper], [https://x.com/ai_for_success/status/1888821141495844991 examples])&lt;br /&gt;
* February 2025: [https://huggingface.co/stepfun-ai/stepvideo-t2v Step-Video-T2V] open-source model ([https://arxiv.org/abs/2502.10248 paper], [https://github.com/stepfun-ai/Step-Video-T2V code], [https://yuewen.cn/videos demo], [https://x.com/ai_for_success/status/1891369136082854129 examples])&lt;br /&gt;
* February 2025: Pika [https://x.com/pika_labs/status/1892620122818294109 Pikaswaps] (examples of [https://x.com/FreddyChavezO/status/1892678426487881805 modifying regions], [https://x.com/CharaspowerAI/status/1893216710141919637 swapping items])&lt;br /&gt;
* February 2025: Alibaba [https://wanai.pro/ Wan 2.1] [https://huggingface.co/blog/LLMhacker/wanai-wan21 open-source] ([https://x.com/fofrAI/status/1894862403260596371 examples])&lt;br /&gt;
* February 2025: [https://thetwinai.com/ Twin AI]: compose videos with provided character, object, location ([https://x.com/EHuanglu/status/1901277394729930984 example])&lt;br /&gt;
* February 2025: Examples:&lt;br /&gt;
** [https://x.com/mrjonfinger/status/1893109598627750164 Infected] (Pika swaps and additions)&lt;br /&gt;
** [https://x.com/amli_art/status/1893447314913796253 Hostile Government Takeover] (Veo2)&lt;br /&gt;
** [https://x.com/KarolineGeorges/status/1895226395812561399 Dual Mechanism] (Pikaframes 2.2)&lt;br /&gt;
&lt;br /&gt;
====March 2025====&lt;br /&gt;
* March 2025: Examples:&lt;br /&gt;
** [https://x.com/SynthReveries/status/1895826068617252901 Doors] (music video)&lt;br /&gt;
** [https://x.com/bind_lux/status/1894492032414224792 Drum and Bass] (music video; Kling, audio from [https://www.riffusion.com/?filter=staff-picks Riffusion])&lt;br /&gt;
** [https://x.com/RileyRalmuto/status/1896088776151269523 Woman&amp;#039;s face] (Sora)&lt;br /&gt;
** [https://x.com/ryanwpatterson/status/1896968881731948844 Skating] (Ray2)&lt;br /&gt;
** [https://www.threads.net/@evolving.ai/post/DGlRyRoO7c9?xmt=AQGz6T_8VppPoAqb5aPwAJ2zzRLUP-YXi8SabAT0IIEA9Q Filming commercial on Mars]&lt;br /&gt;
** [https://www.threads.net/@evolving.ai/post/DGycqyhuETS?xmt=AQGz6T_8VppPoAqb5aPwAJ2zzRLUP-YXi8SabAT0IIEA9Q Original Source commercial] (AI and real footage)&lt;br /&gt;
** [https://x.com/maxescu/status/1896926229204496788 Time-lapses] (Pika 2.2)&lt;br /&gt;
** [https://www.youtube.com/watch?v=2RhkcJyhg0E Hallucination]&lt;br /&gt;
** [https://x.com/town_in_new/status/1897354572139782620 Macro video of bubbles]&lt;br /&gt;
* March 2025: [https://github.com/Tencent/HunyuanVideo-I2V HunyuanVideo-I2V] image-to-video&lt;br /&gt;
* March 2025: Google [https://x.com/labsdotgoogle/status/1897376700666626233 Whisk Animate] (based on Veo2, [https://x.com/maxescu/status/1902742535618888025 examples])&lt;br /&gt;
* March 2025: Examples:&lt;br /&gt;
** [https://x.com/jdp2oo/status/1897874927367160114 Recursion (horror)] (Kling)&lt;br /&gt;
** [https://x.com/blizaine/status/1897826177970028614 Will Smith Eating Spaghetti while Sitting Inside a Bag] (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025])&lt;br /&gt;
** [https://x.com/mickmumpitz/status/1897979382687297697 Paper Jam] (Kling with custom workflows to enable precise control)&lt;br /&gt;
** [https://x.com/maxescu/status/1899155936645722216 Cinematic shots] (Google Whisk and Luma)&lt;br /&gt;
** [https://x.com/weirdai_art/status/1899631013002711409 Perfunctory Horizons]&lt;br /&gt;
** [https://x.com/maxescu/status/1900243840499368319 A Hard Winter]&lt;br /&gt;
** [https://x.com/RoyalKongz/status/1900315389139014074 Consistent character example]&lt;br /&gt;
** [https://x.com/maxescu/status/1900652266362650853 Anthropomorphic Animals]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1900457543299727718 Realistic (influencer-style)]&lt;br /&gt;
** [https://x.com/SunoMusic/status/1900942410584043579 I Feel Cultured] (music video with surrealist vibes)&lt;br /&gt;
** [https://rodeo.club/post/0x30b45c56d62751D763D3B8bFe4D18c4BB65EDF2c/209 journey of utmost importance]&lt;br /&gt;
** [https://x.com/aiordieshow/status/1901930851127984291 Karen: Unleashed]&lt;br /&gt;
** [https://x.com/minchoi/status/1901783767364092232 Yarn Cat]&lt;br /&gt;
** [https://x.com/andyorsow/status/1901619535180091509 Ned&amp;#039;s Wet Deli] (Runway)&lt;br /&gt;
** [https://www.youtube.com/watch?v=KVoiooE8C0c BOOTS], a.k.a. [https://x.com/RuairiRobinson/status/1902027217137484117 &amp;quot;Our enemies are cartoon monsters&amp;quot;] (music video based on poem by Rudyard Kipling; Veo2)&lt;br /&gt;
** Flying in a dream: [https://x.com/minchoi/status/1902197944826183864 1], [https://x.com/venturetwins/status/1901796679063626060 2]&lt;br /&gt;
** [https://x.com/jasonzada/status/1902129567659389443 Commercial for Mercedes-Benz and FYI Radio]&lt;br /&gt;
** [https://x.com/maxescu/status/1903108496666542562 Selfie video] (Luma)&lt;br /&gt;
** Podcasts: [https://www.reddit.com/r/singularity/comments/1jintit/rottenly_roasted_now_full_script_is_also_not/ Rottenly Roasted] and [https://www.reddit.com/r/aivideo/comments/1jerh56/worst_date_ever/ Worst Date Ever] [https://x.com/OriZilbershtein/status/1903503438744318002 (Imagen 3, Hedra, Elevenlabs, Topaz)]&lt;br /&gt;
** [https://x.com/DexploreArts/status/1903822122150986000 Ambience] (Midjourney, Luma)&lt;br /&gt;
** [https://x.com/TheoMediaAI/status/1904207679511572845 The Bridge] (2 minute short; Veo2)&lt;br /&gt;
** [https://x.com/peteromallet/status/1904268944992829462 Pulp Fiction] (Wan video editing)&lt;br /&gt;
** [https://x.com/madpencil_/status/1906765750624493650 Camera Controls] (Luma Ray2)&lt;br /&gt;
* March 2025: [https://www.hedra.com/ Hedra] [https://x.com/hedra_labs/status/1897699010632466469 Character 3]&lt;br /&gt;
* March 2025: [https://huggingface.co/hpcai-tech/Open-Sora-v2 Open Sora v2] ([https://github.com/hpcaitech/Open-Sora code])&lt;br /&gt;
* March 2025: Amazon Prime debuts [https://en.wikipedia.org/wiki/House_of_David_(TV_series) House of David], with special effects created by [https://www.thewonderproject.com/ Wonder Project] using a [https://x.com/PJaccetturo/status/1903126616831676792 combination of traditional and AI methods] (reportedly including Midjourney and Runway)&lt;br /&gt;
* March 2025: Examples:&lt;br /&gt;
** [https://x.com/PJaccetturo/status/1905151190872309907 What if Studio Ghibli directed Lord of the Rings?] (OpenAI GPT-4o in-context image generation, Kling)&lt;br /&gt;
** [https://x.com/ROHKI/status/1906039022662963269 RŌHKI]&lt;br /&gt;
** [https://x.com/iaveras/status/1906362437487534296 Why]&lt;br /&gt;
** [https://x.com/BrianRoemmele/status/1906476721236570508 Commercial for Puma] (research/test)&lt;br /&gt;
** [https://x.com/Salmaaboukarr/status/1906776503343325469 Commercial for KFC] (concept ad)&lt;br /&gt;
* March 2025: Runway ML [https://runwayml.com/research/introducing-runway-gen-4 Gen-4]&lt;br /&gt;
** [https://www.youtube.com/watch?v=c8IBmK7GZP8 The Lonely Little Flame]&lt;br /&gt;
** [https://www.youtube.com/watch?v=Z0P6qjMUl34&amp;amp;t=1s The Herd]&lt;br /&gt;
** [https://www.youtube.com/watch?v=9HzdNhOe09I The Retrieval]&lt;br /&gt;
** [https://www.youtube.com/watch?v=xEhgxhrAjE4 NYC is a Zoo]&lt;br /&gt;
** [https://www.youtube.com/watch?v=ENGKp5wn344 Scimmia Vede] (music video)&lt;br /&gt;
** More examples: [https://x.com/techhalla/status/1906807994009993473 various], [https://x.com/c_valenzuelab/status/1907958530369372541 art direction], [https://x.com/c_valenzuelab/status/1908146364741029998 mannequins], [https://x.com/c_valenzuelab/status/1907921566643732612 taxi], [https://x.com/c_valenzuelab/status/1907432109695717798 small things], [https://x.com/c_valenzuelab/status/1907563448902496362 long shot (1m)]&lt;br /&gt;
&lt;br /&gt;
====April 2025====&lt;br /&gt;
* April 2025: Examples:&lt;br /&gt;
** [https://x.com/AzeAlter/status/1906974768705990794 Age of Beyond]&lt;br /&gt;
** [https://x.com/techhalla/status/1907790675057242319 Commercial for Coca-Cola] (Higgsfield)&lt;br /&gt;
** [https://www.reddit.com/r/StableDiffusion/comments/1jr6j11/comment/mle9bq5/?context=3 Anime scene (3m)] (Wan 2.1 with LoRa)&lt;br /&gt;
** [https://x.com/pika_labs/status/1908263310912610401 Taxes then Death] (Pika multikeyframe)&lt;br /&gt;
* April 2025: [https://www.krea.ai/ Krea] [https://x.com/krea_ai/status/1907829389452021853 Video Re-Style]&lt;br /&gt;
* April 2025: ByteDance [https://grisoon.github.io/DreamActor-M1/ DreamActor-M1] performance transfer&lt;br /&gt;
* April 2025: Examples:&lt;br /&gt;
** [https://x.com/Diesol/status/1908535493673050403 Mercs] (Midjourney v7, Ray2)&lt;br /&gt;
** [https://x.com/minchoi/status/1909078846126649440 Cat at theme park]&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1909630883218207036 Timelapse history] (Runway Gen4)&lt;br /&gt;
** [https://x.com/EHuanglu/status/1909660808973533225 Examples for use in advertising]&lt;br /&gt;
** [https://x.com/arohaAIX/status/1910688361221599361 Sci-fi scapes]&lt;br /&gt;
** [https://x.com/PJaccetturo/status/1910750148055146708 Avα]&lt;br /&gt;
** [https://x.com/imagineFERA/status/1910601934207152576 The Bureau]&lt;br /&gt;
** [https://x.com/jasonzada/status/1911812014059733041 Beaver and Sock (3m)]&lt;br /&gt;
** [https://x.com/Delachica_/status/1911842237622735052 Organic Waste (5m)] (Runway)&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1912260798270882104 Fly] (Runway Gen4)&lt;br /&gt;
* April 2025: Alibaba [https://arxiv.org/abs/2504.04842 FantasyTalking] lipsync ([https://arxiv.org/abs/2504.04842 paper], [https://x.com/EHuanglu/status/1910341110322577442 examples])&lt;br /&gt;
* April 2025: Tencent Hunyuan [https://arxiv.org/abs/2411.16331 Sonic] image animation/lipsync to audio ([https://x.com/ai_for_success/status/1911719866958286864 examples])&lt;br /&gt;
* April 2025: ByteDance [https://huggingface.co/papers/2504.08685 Seaweed-7B] ([https://arxiv.org/abs/2504.08685 preprint], [https://www.youtube.com/watch?v=OaPI6K2y3rI examples])&lt;br /&gt;
* April 2025: [https://app.klingai.com/global/release-notes Kling 2.0] ([https://www.youtube.com/watch?v=Yqvh3M12T_M video])&lt;br /&gt;
* April 2025: [https://www.skyreels.ai/home Skywork] [https://github.com/SkyworkAI/SkyReels-V2 SkyReels V2] (open-source, unlimited extension; [https://x.com/AngryTomtweets/status/1914270477482443142 examples])&lt;br /&gt;
* April 2025: [https://sand.ai/ Sand AI] [https://huggingface.co/sand-ai/MAGI-1 Magi-1] (open source, unlimited extension; [https://x.com/AngryTomtweets/status/1914318743578296506 examples], [https://x.com/dreamingtulpa/status/1916035289300275372 more examples])&lt;br /&gt;
* April 2025: Examples:&lt;br /&gt;
** [https://x.com/maxescu/status/1912100029549994016 Mars 2035 (3m)] (Kling 2.0)&lt;br /&gt;
** [https://x.com/ai_for_success/status/1912466999147450600 Kingdom (dragon battle, 3m)]&lt;br /&gt;
** [https://x.com/imagineFERA/status/1913156296657756278 Reflection (3m)] (Gen4)&lt;br /&gt;
** [https://x.com/Wytsekoetse/status/1913547157493162035 Pizza Galaxy (1m)] (MJ and Gen4)&lt;br /&gt;
** [https://www.youtube.com/watch?v=rseqmSGH7xk Snoop Dogg music video: Last Dance with Mary Jane] (blend of traditional and AI effects)&lt;br /&gt;
** [https://x.com/dreamingtulpa/status/1915104310448501129 Realistic human motion]&lt;br /&gt;
** [https://x.com/KarolineGeorges/status/1915113151546396893 Inception loop] (Gen4)&lt;br /&gt;
** [https://x.com/rayisdoingfilm/status/1916468807435952330 Tuesday (1m)] (Gen4)&lt;br /&gt;
** [https://www.youtube.com/watch?v=XWdwF1q3kDw Deus in Machina Automata (4m)] (Gen4) &lt;br /&gt;
** [https://x.com/machina9000/status/1915090908850049223 Outsiders (3m music video)]&lt;br /&gt;
&lt;br /&gt;
====May 2025====&lt;br /&gt;
* May 2025: [https://huggingface.co/Lightricks/LTX-Video LTX-Video 13B] ([https://github.com/Lightricks/LTX-Video code], [https://x.com/maxescu/status/1919801813987164527 examples], [https://x.com/cubiq/status/1919748210567815551 more examples])&lt;br /&gt;
* May 2025: HeyGen Avatar IV (examples: [https://x.com/StevieMac03/status/1919910677860216869 sci-fi], [https://x.com/KarolineGeorges/status/1919801983143211222 Come Closer], [https://x.com/maxescu/status/1920410329454100973 singing], [https://x.com/minchoi/status/1920853859171234165 various])&lt;br /&gt;
* May 2025: Tencent [https://hunyuancustom.github.io/ HunyuanCustom]&lt;br /&gt;
* May 2025: Examples:&lt;br /&gt;
** [https://x.com/lifeofc/status/1920331476157280413 Iris (1.5m)] (Midjourney, Luma, Runway)&lt;br /&gt;
** [https://runwayml.com/customers/the-making-of-mars-and-siv Mars and Siv: &amp;quot;No Vacancy&amp;quot; (episode 1, 6m)] (Runway)&lt;br /&gt;
** [https://x.com/cfryant/status/1921317318744760817 Go to the East Wing] (dreamlike, Luma)&lt;br /&gt;
** [https://x.com/DeryaTR_/status/1921015340827304389 Yu Lanter showreel] (Higgsfield)&lt;br /&gt;
** [https://x.com/freeeebird2300/status/1921789387614134652 Cyberpunk anime] (Luma)&lt;br /&gt;
** [https://x.com/LittleTinRobot/status/1921692735930589246 Alien animals] (Runway)&lt;br /&gt;
** [https://x.com/minchoi/status/1922500563792486878 America&amp;#039;s Funniest AI Home Videos (3m)]&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1924204409833103365 Editing POV shots from AR glasses] (Runway)&lt;br /&gt;
* May 2025: [https://runwayml.com/gen48 Gen:48] Fourth Edition winners:&lt;br /&gt;
** [https://www.youtube.com/watch?v=NphCYRXjqTI&amp;amp;t=174s Home] (3m)&lt;br /&gt;
** [https://www.youtube.com/watch?v=L2DQwCp_DCw The King&amp;#039;s Secret] (2m)&lt;br /&gt;
* May 2025: [https://viggle.ai/home Viggle] Live [https://x.com/ViggleAI/status/1926324953038627214 enables] real-time avatar control&lt;br /&gt;
* May 2025: Google [https://blog.google/technology/ai/generative-media-models-io-2025/ Veo 3] (examples: [https://x.com/babaeizadeh/status/1924942128851124284 conversation], [https://x.com/mattshumer_/status/1925039973310308424 cooking], [https://x.com/jerrod_lew/status/1924934440486371589 singing], [https://x.com/MartinNebelong/status/1924926779677905014 simple story], [https://x.com/Diesol/status/1925114473544913004 cinematic action sequence], [https://x.com/laszlogaal_/status/1925094336200573225 car show interviews], [https://x.com/arikuschnir/status/1924953349943697763 We Can Talk], [https://x.com/venturetwins/status/1925021235530105298 podcat], [https://x.com/maxescu/status/1925079990061957423 various], [https://x.com/jerrod_lew/status/1927092379892265139 camera moves])&lt;br /&gt;
* May 2025: Examples:&lt;br /&gt;
** [https://x.com/javilopen/status/1925495026903380358 Will Smith Eating Spaghetti] (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025])&lt;br /&gt;
** [https://x.com/MetaPuppet/status/1926659557914268155 Bob from Marketing] (Veo 3)&lt;br /&gt;
** [https://x.com/dustinhollywood/status/1926733069475565622 He is King (16m)]&lt;br /&gt;
** [https://x.com/HashemGhaili/status/1925616536791760987 Prompt Theory], [https://x.com/HashemGhaili/status/1925332319604257203 part 2], [https://x.com/HashemGhaili/status/1927467022213869975 Afterlife (3m)] (Veo3)&lt;br /&gt;
** [https://x.com/JoannaStern/status/1927856754873835747 My Robot and Me (3m)] (Veo, Runway)&lt;br /&gt;
** [https://x.com/rohanpaul_ai/status/1928152398930817238 The Internet&amp;#039;s Over] (Veo3)&lt;br /&gt;
** [https://www.reddit.com/r/aivideo/comments/1l0rl7d/before_colours_fade/ Before Colours Fade (2m)] (Midjourney, Kling)&lt;br /&gt;
&lt;br /&gt;
====June 2025====&lt;br /&gt;
* June 2025: Examples: &lt;br /&gt;
** [https://x.com/amasad/status/1930505292904837132 Bigfoot ASMR]&lt;br /&gt;
** [https://x.com/minchoi/status/1930670583605514333 Talking] (HeyGen Avatar IV upgrade)&lt;br /&gt;
** [https://x.com/ROHKI/status/1931081752992477285 Where are all the aliens? (2m)]&lt;br /&gt;
** [https://x.com/fofrAI/status/1930999540770893874 Natural talking]&lt;br /&gt;
** [https://x.com/ammaar/status/1931672722418851904 Elemental Showdown - Mortal Kombat (3m)]&lt;br /&gt;
** [https://x.com/maxjoseph/status/1932104616021565476 It Starts at the End (music video, 4m)]&lt;br /&gt;
** [https://x.com/deedydas/status/1932105266654581116 Sci-fi trailer (2m)]&lt;br /&gt;
** [https://x.com/DrMachakil/status/1931816470901575924 The Prompt Floor (2m)]&lt;br /&gt;
** [https://x.com/DrMachakil/status/1853960062546366856 NALVORA (2.7m)] - [https://x.com/DrMachakil/status/1932904599004066200 Best Trailer, Metamorph AI Film Awards]&lt;br /&gt;
** [https://x.com/Kalshi/status/1932891608388681791 Commercial for Kalshi (30s)] - [https://x.com/PJaccetturo/status/1932893260399456513 to air during NBA finals] (Veo)&lt;br /&gt;
** [https://x.com/ROHKI/status/1933594430113788227 Your Brain is Broken on Purpose (2m)]&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1934312626021949687 Runway Gen-4 Reference examples]&lt;br /&gt;
** [https://x.com/JesusPlazaX/status/1934253813696786661 Paper airplane]&lt;br /&gt;
** [https://x.com/minchoi/status/1934032730947526872 Veo3 examples]&lt;br /&gt;
** [https://x.com/NomadsVagabonds/status/1935329331410075734 Reset 3 (1m, surreal)]&lt;br /&gt;
** [https://x.com/HashemGhaili/status/1935722105322323968 It Has No Soul (1m, Veo3)]&lt;br /&gt;
* June 2025: [https://seedance.net/seedance Seedance 1.0] ([https://arxiv.org/abs/2506.09113 preprint])&lt;br /&gt;
* June 2025: [https://hailuoai.video/ Hailuo AI] (MiniMax) Hailuo 02 ([https://x.com/venturetwins/status/1934236631336403344 &amp;quot;Kangaroo&amp;quot; during testing]; examples: [https://x.com/lepadphone/status/1935078910934626429 various], [https://x.com/alexgnewmedia/status/1935018186954719365 various], [https://x.com/FussyPastor/status/1935065068456263883 tsunami], [https://x.com/thedorbrothers/status/1935098802744213935 fight scene], [https://x.com/umesh_ai/status/1935028257708966231 fox running], [https://x.com/BrentLynch/status/1934979825636446268 blogger], [https://x.com/HalimAlrasihi/status/1935297126759538735 transitions], [https://x.com/MKMXLA/status/1938318951664280045 skateboarding])&lt;br /&gt;
* June 2025: Midjourney video ([https://x.com/minchoi/status/1934373051464057062 early examples], [https://x.com/ciguleva/status/1935386452197785892 various], [https://x.com/juliewdesign_/status/1935395999175876696 various], [https://x.com/emollick/status/1935504703023899096 Ethan Mollick], [https://x.com/PJaccetturo/status/1935383312392151528 highly rated], [https://x.com/maxescu/status/1935674561821126847 complex environments], [https://x.com/CoffeeVectors/status/1935863623076675875 manga])&lt;br /&gt;
* June 2025: Examples:&lt;br /&gt;
** [https://x.com/StevieMac03/status/1935768436556378170 The Battle of Glenvael - Orcs vs Humans] (Hailuo)&lt;br /&gt;
** [https://x.com/HashemGhaili/status/1935036744568824208 The Sentence (9m, Veo3)]&lt;br /&gt;
** [https://x.com/elder_plinius/status/1936145834585862225 Will Smith Eating Spaghetti] (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025], [https://x.com/javilopen/status/1925495026903380358 May 2025])&lt;br /&gt;
** [https://x.com/venturetwins/status/1937232461576175809 Gymnastics] (Hailuo 02)&lt;br /&gt;
** [https://x.com/KeiranJHarris/status/1935429439476887594 Fable about Transformative AI] (Veo3)&lt;br /&gt;
** [https://x.com/thedorbrothers/status/1937926400507580726 Vorex (2m trailer)]&lt;br /&gt;
** [https://x.com/OnerBiberkoku/status/1938972810321281394 Doğrucu (3m music video, Veo3)]&lt;br /&gt;
* June 2025: [https://higgsfield.ai/soul Higgsfield Soul] Video Effects ([https://x.com/higgsfield_ai/status/1937931727084917097 examples], [https://x.com/HashemGhaili/status/1938278903765995611 realism])&lt;br /&gt;
* June 2025: Alibaba [https://omni-avatar.github.io/ OmniAvatar] ([https://arxiv.org/abs/2506.18866 paper], [https://github.com/Omni-Avatar/OmniAvatar code], [https://huggingface.co/OmniAvatar/OmniAvatar-14B model], [https://x.com/AngryTomtweets/status/1939850674776547359 examples])&lt;br /&gt;
&lt;br /&gt;
====July 2025====&lt;br /&gt;
* July 2025: Examples:&lt;br /&gt;
** [https://x.com/Kavanthekid/status/1940452444850589999 Untold - The Immortal Blades Saga] (2m trailer)&lt;br /&gt;
** [https://x.com/minchoi/status/1941234456461029584 Unofficial commercial for Liquid Death (1m)]&lt;br /&gt;
** [https://x.com/brain_racked/status/1942594951310893425 A parade of the chosen theocracy on Callisto]&lt;br /&gt;
** [https://x.com/Popeyes/status/1943316484404433182 Popeyes commercial - diss track (1m)]&lt;br /&gt;
*** [https://x.com/gabemichael_ai/status/1944070622155616668 (Unofficial) Wendy&amp;#039;s response - diss track (2m)]&lt;br /&gt;
*** [https://x.com/ai_massive/status/1947689537641357618 (Unofficial) In-N-Out rap battle (3m)]&lt;br /&gt;
** [https://x.com/Kalshi/status/1943339616716599548 Kalshi commercial]&lt;br /&gt;
** Jonah (25m TV show, [https://x.com/PJaccetturo/status/1946101701548880029 making of], [https://kingstonestudios.uscreen.io/programs/jonah purchase here])&lt;br /&gt;
** [https://x.com/Totemko/status/1946243585021452335 Unofficial commercial for Mercedes (17s)]&lt;br /&gt;
** [https://x.com/CoffeeVectors/status/1946016960916889632 Skateboarding music video (1m)]&lt;br /&gt;
* July 2025: Runway ML [https://help.runwayml.com/hc/en-us/articles/42311337895827-Creating-with-Act-Two Act-Two] (video-to-video performance transfer)&lt;br /&gt;
* July 2025: Examples:&lt;br /&gt;
** Neural Viz [https://www.youtube.com/watch?v=juDDHvHroQ8 The Cop Files: Part VI (8m)]&lt;br /&gt;
** [https://x.com/Kavanthekid/status/1947696716981145971 Perfect Dark - Concept Trailer (1.5m)]&lt;br /&gt;
** [https://x.com/StelfieTT/status/1948753090858885131 Exodus (2m trailer)]&lt;br /&gt;
** [https://x.com/Jett_Collective/status/1949140450553540841 A Walk Together - Life and love in motion (1m, Midjourney Video)]&lt;br /&gt;
* July 2025: Netflix sci-fi show [https://en.wikipedia.org/wiki/The_Eternaut_(TV_series) The Eternaut] [https://x.com/omooretweets/status/1946290797399400662 used genAI] for a particular scene (building collapse)&lt;br /&gt;
* July 2025: Google Veo [https://x.com/GoogleLabs/status/1948477692715700718 emergent annotation direction] ([https://x.com/venturetwins/status/1948771505783144641 example], [https://x.com/bilawalsidhu/status/1948844167603310660 example], [https://x.com/jboogx_creative/status/1949230927504371765 example], [https://x.com/Ror_Fly/status/1949606017739747625 example])&lt;br /&gt;
* July 2025: Runway [https://runwayml.com/research/introducing-runway-aleph Aleph] contextual editing&lt;br /&gt;
* July 2025: Wan 2.2 (open source, [https://x.com/Alibaba_Wan/status/1949804551655276989 examples])&lt;br /&gt;
====August 2025====&lt;br /&gt;
* August 2025: Pika [https://x.com/pika_labs/status/1954935844936024476 audio-driven performance] ([https://x.com/minchoi/status/1954989794129514937 examples], [https://x.com/pika_labs/status/1955007656302924192 examples])&lt;br /&gt;
* August 2025: Examples:&lt;br /&gt;
** [https://www.youtube.com/watch?v=gePD1Hf1qPc Eve and Adam] (8m, [https://x.com/MetaPuppet/status/1954254544935719259 multiple tools])&lt;br /&gt;
** [https://x.com/runwayml/status/1955615613583519917 Redesign a space] (Runway Aleph)&lt;br /&gt;
** [https://x.com/theGioM/status/1955656398248763428 Detroit Pretend Work Park (1m)]&lt;br /&gt;
** [https://x.com/pzf_ai/status/1940816374211006600 The Weight of Light] (3m music video, Midjourney &amp;amp; Suno)&lt;br /&gt;
** [https://x.com/EHuanglu/status/1956788759778967710 Commercial for Pepsi]&lt;br /&gt;
** [https://x.com/StelfieTT/status/1956633450326200426 Emotion]&lt;br /&gt;
** [https://x.com/dustinhollywood/status/1957940749862875383 TZIGANE]&lt;br /&gt;
** [https://x.com/0xFramer/status/1960720090921623636 Anime chase sequence] (Nano Banana and Seedance 1.0)&lt;br /&gt;
* August 2025: ByteDance [http://www.waver.video/ Waver 1.0]&lt;br /&gt;
* August 2025: [https://huggingface.co/Wan-AI/Wan2.2-S2V-14B Wan2.2-S2V 14B]&lt;br /&gt;
&lt;br /&gt;
====September 2025====&lt;br /&gt;
* September 2025: [https://www.wsj.com/tech/ai/openai-backs-ai-made-animated-feature-film-389f70b0 OpenAI Backs AI-Made Animated Feature Film: Film, called ‘Critterz,’ aims to debut at Cannes Film Festival and will leverage startup’s AI tools and resources.]&lt;br /&gt;
* September 2025: Examples:&lt;br /&gt;
** [https://x.com/kentskooking/status/1964606423037542459 A loop to wake up to (30s)]&lt;br /&gt;
** [https://x.com/venturetwins/status/1966570512991350907 time lapse]&lt;br /&gt;
** [https://x.com/NeuralViz/status/1967391198487994652 The Adventures of Reemo Green] (11m, Neural Viz)&lt;br /&gt;
** [https://x.com/kellyeld/status/1967620786166079545 Surreal DJs music video (2m)]&lt;br /&gt;
** [https://x.com/dustinhollywood/status/1968724784440558044 Glass City] (Hailuo)&lt;br /&gt;
** [https://x.com/TheoMediaAI/status/1968646951227777529 Alarm] (1m, multiple tools including world synthesis for consistent environments)&lt;br /&gt;
* September 2025: [https://lumalabs.ai/ray Luma] [https://x.com/LumaLabsAI/status/1968684330034606372 Ray3] ([https://x.com/cfryant/status/1968692370725077251 example])&lt;br /&gt;
* September 2025: Examples:&lt;br /&gt;
** [https://x.com/mrjonfinger/status/1968687352382910469 Stop motion interpolation] (Luma Ray3)&lt;br /&gt;
** [https://x.com/heydin_ai/status/1969514789169959128 Skyland] (1.5m, various tools)&lt;br /&gt;
** [https://x.com/iamluokai/status/1970185972076925427 Dancing] (Wan 2.2)&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1970497214108815584 Under Armor commercial] (Runway Aleph)&lt;br /&gt;
** [https://x.com/FilmsBySav/status/1971247214795358706 OG PRIME] (10m, Kling)&lt;br /&gt;
** [https://www.youtube.com/watch?v=JGLoTjxd-Ss PLANET] (37m)&lt;br /&gt;
* September 2025: [https://x.com/Kling_ai/status/1970439808901362155 Kling AI 2.5 Turbo] (examples: [https://x.com/OrctonAI/status/1970472214794220008 cyberpunk], [https://x.com/ImagineArt_X/status/1970586138655236565 human motion], [https://x.com/fAIkout/status/1970505756853334324 motion and emotion], [https://x.com/fAIkout/status/1970495039248965636 painting], [https://x.com/venturetwins/status/1970563820478439546 gymnastics], [https://x.com/Art_For_Joy/status/1970249516033970434 breakdancing], [https://x.com/HaydenLeeWrites/status/1970523610734567819 combat], [https://x.com/umesh_ai/status/1970497680536150454 cinematic], [https://x.com/LillyLiCT/status/1970580585073819752 horror camerawork], [https://x.com/StevieMac03/status/1970559778804908331 extended sequence])&lt;br /&gt;
* September 2025: OpenAI [https://openai.com/index/sora-2/ Sora 2] ([https://x.com/minchoi/status/1973949620318580970 examples])&lt;br /&gt;
&lt;br /&gt;
====October 2025====&lt;br /&gt;
* October 2025: Examples:&lt;br /&gt;
** [https://x.com/minchoi/status/1976042197154963702 Will Smith Eating Spaghetti] (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025], [https://x.com/javilopen/status/1925495026903380358 May 2025], [https://x.com/elder_plinius/status/1936145834585862225 June 2025])&lt;br /&gt;
** [https://www.youtube.com/watch?v=JhH3uxcdM1M Frostbite] (3m, Sora 2)&lt;br /&gt;
** [https://x.com/Jukanlosreve/status/1977764418709758106 (Fake) &amp;quot;Behind the scenes&amp;quot; for a Chainsaw Man live action] ([https://x.com/PJaccetturo/status/1972705821072261402 others])&lt;br /&gt;
* October 2025: Google [https://blog.google/technology/ai/veo-updates-flow/ Veo 3.1]&lt;br /&gt;
* October 2025: Examples:&lt;br /&gt;
** [https://x.com/aisearchio/status/1978465562821898461 Will Smith Eating Spaghetti], Veo 3.1 (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025], [https://x.com/javilopen/status/1925495026903380358 May 2025], [https://x.com/elder_plinius/status/1936145834585862225 June 2025], [https://x.com/minchoi/status/1976042197154963702 October 2025])&lt;br /&gt;
** [https://x.com/Diesol/status/1978755688261128227 War footage] (Veo 3.1)&lt;br /&gt;
** [https://www.meta.ai/@dustin_hollywood/post/bG3BHB21W0l/yukon/ Yukon] (music video, [https://x.com/dustinhollywood/status/1982260655957700746 Dustin Hollywood])&lt;br /&gt;
** [https://x.com/Diesol/status/1980922041131028515 Bloom] (2m, Veo 3.1)&lt;br /&gt;
** [https://x.com/xmuse_/status/1982026008803905639 Auction] (1m)&lt;br /&gt;
** [https://x.com/kellyeld/status/1982425147496882287 Dancing] (music video; Midjourney, Suno, Veo3)&lt;br /&gt;
** [https://x.com/JesusPlazaX/status/1982393609069412433 Anime example] (Midjourney, Grok Imagine)&lt;br /&gt;
** [https://x.com/EccentrismArt/status/1982830100266783039 King Arthur] (1m)&lt;br /&gt;
** [https://x.com/venturetwins/status/1983024227352789162 Transitions] (1m music video)&lt;br /&gt;
** [https://x.com/eastflatsfilm/status/1984116704704971076 Unofficial commercial for Nike] (2m, Midjourney, Hailuo)&lt;br /&gt;
** [https://x.com/PJaccetturo/status/1984639281848336592 Loneliness/Halloween] ([https://www.linkedin.com/posts/simon-meyer-976339160_this-could-be-the-scariest-halloween-film-activity-7389892778144735232-6CYY?utm_source=share&amp;amp;utm_medium=member_desktop&amp;amp;rcm=ACoAAADeoqYBzX8N9-j_hRQvl1e7OUlOgFptNF0 1.5m])&lt;br /&gt;
** [https://www.youtube.com/watch?v=43h61QAXjpY Wave] (2m music video, [https://x.com/MIZNOM Masaki Mizuno])&lt;br /&gt;
* October 2025: [https://x.com/Hailuo_AI/status/1983016390878708131 Hailuo 2.3]&lt;br /&gt;
&lt;br /&gt;
====November 2025====&lt;br /&gt;
* November 2025: Examples:&lt;br /&gt;
** [https://x.com/subverum/status/1985069550250107033 Valley of Shadow] (6m)&lt;br /&gt;
** [https://x.com/DiscussingFilm/status/1985470088074375344 Coca-cola ad] (c.f. [https://x.com/techhalla/status/1857462526859935813 2024 ad])&lt;br /&gt;
** [https://x.com/venturetwins/status/1985755546222542903 France 2026 Olympics ad] (blend of genAI and traditional methods, [https://x.com/venturetwins/status/1985753512362590439 behind the scenes])&lt;br /&gt;
** [https://x.com/NeuralViz/status/1986611025366687754 Minnesota Nice] (3m, [https://x.com/NeuralViz Neural Viz])&lt;br /&gt;
** [https://x.com/machina9000/status/1986563727873740934 Brutalis] (7m)&lt;br /&gt;
** [https://x.com/tastypxls/status/1987312755485876502?s=20 Living The Dream - Rynn] (music video, 1m)&lt;br /&gt;
** [https://x.com/MrDavids1/status/1988366387111170339?s=20 Environment as Character]&lt;br /&gt;
** [https://x.com/justalexoki/status/1988915573707661637?s=20 Cat playing instruments at midnight]&lt;br /&gt;
** [https://x.com/LumaLabsAI/status/1989013731267998172?s=20 Overclock] (30s, Luma)&lt;br /&gt;
** [https://x.com/venturetwins/status/1980685301577326994?s=20 Music video] (30s, Wan Animate)&lt;br /&gt;
** [https://x.com/venturetwins/status/1990227418553209259?s=20 Promotional material for Pudong Art Museum - Louvre exhibition in Shanghai] (1m)&lt;br /&gt;
** [https://x.com/Kyrannio/status/1990324648488186358?s=20 Loop 87 A Temporal Heist] (12m, claimed to have been generated fully autonomously by the AI agent NoSpoon)&lt;br /&gt;
** [https://x.com/AzeAlter/status/1906974768705990794?s=20 Age of Beyond] (3m)&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1991245088446386495?s=20 Ausencia] (5m)&lt;br /&gt;
** [https://x.com/AngryTomtweets/status/1993047608617517246?s=20 live paintings] ([https://www.youtube.com/channel/UCw8kc0wDm5Bh6g9iZzEWfOg bandyquantguy] on YouTube)&lt;br /&gt;
** [https://x.com/BrianRoemmele/status/1994625579073900804?s=20 Michelle, on a server in Iowa] (1m)&lt;br /&gt;
* November 2025: [https://odyssey.ml/ Odyssey] - [https://x.com/odysseyml/status/1994873514579697830?s=20 Odyssey-2]&lt;br /&gt;
&lt;br /&gt;
====December 2025====&lt;br /&gt;
* December 2025: Runway [https://runwayml.com/research/introducing-runway-gen-4.5 Gen 4.5]&lt;br /&gt;
* December 2025: [https://app.klingai.com/global/all-tools Kling] [https://app.klingai.com/global/omni/new O1] ([https://x.com/minchoi/status/1995523379957559609?s=20 examples], [https://x.com/TheoMediaAI/status/1995517613414518987?s=20 other examples]) and Kling 2.6.&lt;br /&gt;
* December 2025: [https://app.pixverse.ai/onboard PixVerse v5.5]&lt;br /&gt;
* December 2025: Examples:&lt;br /&gt;
** [https://x.com/EHuanglu/status/1996649596119068687?s=20 Will Smith Eating Spaghetti], Kling 2.6 (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025], [https://x.com/javilopen/status/1925495026903380358 May 2025], [https://x.com/elder_plinius/status/1936145834585862225 June 2025], [https://x.com/minchoi/status/1976042197154963702 October 2025], [https://x.com/aisearchio/status/1978465562821898461 October 2025])&lt;br /&gt;
** [https://x.com/venturetwins/status/1997898095670296615?s=20 Dreamlike POV]&lt;br /&gt;
** [https://x.com/chatgpt21/status/1998253809307455555?s=20 McDonalds commercial]&lt;br /&gt;
** [https://x.com/EHuanglu/status/1998039554402750545?s=20 Skittles commercial] (Higgsfield)&lt;br /&gt;
** [https://x.com/Diesol/status/1997147919603077335?s=20 The Tenant] (2m, Kling 2.6)&lt;br /&gt;
** [https://x.com/PsyopAnime/status/1999242965659906526?s=20 Maximum Carnage] (3m)&lt;br /&gt;
** [https://x.com/JeffSynthesized/status/1998786836924395875?s=20 Blurred Horizon: Episode 1] (24m)&lt;br /&gt;
** [https://x.com/Artedeingenio/status/2001667487784460301?s=20 Anime Action] (2m)&lt;br /&gt;
** [https://x.com/bearlyai/status/2005055231617605748?s=20 Dollar Shave Club commercial] (1m)&lt;br /&gt;
** [https://x.com/EHuanglu/status/2004020543084024295?s=20 Xmas Cameos] (1.5m)&lt;br /&gt;
** [https://x.com/DiDi_OKK/status/1955653520407019976?s=20 Green Screen] (2m, [https://x.com/DiDi_OKK/status/1955653520407019976 DiDi_OK])&lt;br /&gt;
** [https://x.com/DiDi_OKK/status/1998227601341702639?s=20 Arrow] (7m, [https://x.com/DiDi_OKK/status/1955653520407019976 DiDi_OK])&lt;br /&gt;
** [https://x.com/bluehorizon_ai/status/2004045348579561503?s=20 Live Action One Punch Man | Saitama vs Genos] (2m, [https://x.com/bluehorizon_ai Blue Horizon])&lt;br /&gt;
** [https://x.com/keshiAIart/status/2005254907780358201?s=20 Anime Train] (6s)&lt;br /&gt;
** [https://x.com/venturetwins/status/2006051632837189683?s=20 Michael Catson] (13s)&lt;br /&gt;
* December 2025: [https://arxiv.org/abs/2512.13507 Seedance 1.5]&lt;br /&gt;
&lt;br /&gt;
===2026===&lt;br /&gt;
====January 2026====&lt;br /&gt;
* January 2026: Examples:&lt;br /&gt;
** [https://x.com/Itspedrito/status/2007636967048228968?s=20 Somebody That I Used to Know] (1m)&lt;br /&gt;
** [https://x.com/hujimari/status/2008054519704461407?s=20 Cat being disruptive at night], [https://x.com/klara_sjo/status/2007864014521720963?s=20 another], [https://x.com/alphafox/status/2009732284375830687?s=20 another] (c.f. [https://x.com/justalexoki/status/1988915573707661637?s=20 Cat playing instruments at midnight])&lt;br /&gt;
** [https://x.com/Uncanny_Harry/status/2008881579095961934?s=20 Character test] (30s, Kling 2.6 Motion Control, [https://x.com/Uncanny_Harry Uncanny Harry AI])&lt;br /&gt;
** [https://www.youtube.com/watch?v=SGJC4Hnz3m0&amp;amp;t=2s STAR WARS: Beggar’s Canyon | A Luke Skywalker Fan Film (Between ESB &amp;amp; ROTJ)] (7m)&lt;br /&gt;
** [https://x.com/dustinhollywood/status/2009732705299104118?s=20 TZIGANE] (9m)&lt;br /&gt;
** [https://x.com/Framer_X/status/2011075884246061454?s=20 The Subway Spark] (Anime, 45s)&lt;br /&gt;
** [https://www.reddit.com/r/aivideo/comments/1qi8zuv/25_years_difference_makes_you_wonder_where_ai/ Will Smith Eating Spaghetti] (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025], [https://x.com/javilopen/status/1925495026903380358 May 2025], [https://x.com/elder_plinius/status/1936145834585862225 June 2025], [https://x.com/minchoi/status/1976042197154963702 October 2025], [https://x.com/aisearchio/status/1978465562821898461 October 2025], [https://x.com/EHuanglu/status/1996649596119068687?s=20 December 2025])&lt;br /&gt;
** [https://x.com/PJaccetturo/status/2013675665539596651?s=20 The AI Artist] (1.5m)&lt;br /&gt;
** [https://x.com/Artedeingenio/status/2013624842021417030?s=20 Sci-fi action anime] (2m)&lt;br /&gt;
** [https://x.com/verbalriotshow/status/2014752509240475872?s=20 Stone Hand] (fake trailer, 1m)&lt;br /&gt;
* January 2026: [https://x.com/nvidia/status/2008346949301235933?s=20 Runway Gen-4.5 on] [https://www.nvidia.com/en-us/data-center/technologies/rubin/?linkId=100000401190502 Nvidia Rubin] ([https://x.com/runwayml/status/2014406560445771804?s=20 examples])&lt;br /&gt;
* January 2026: [https://ltx.io/model/ltx-2 LTX-2] open source video model (20s, 4k, w/ audio; [https://x.com/venturetwins/status/2010878914273697956?s=20 examples])&lt;br /&gt;
* January 2026: Luma [https://lumalabs.ai/blog/news/ray3_14 Ray3.14] ([https://x.com/LumaLabsAI/status/2015822842575888844?s=20 examples])&lt;br /&gt;
* January 2026: Examples:&lt;br /&gt;
** [https://x.com/pressmanc/status/2015099516500758647?s=20 Runway Gen-4.5 tests] (3.5m)&lt;br /&gt;
** [https://x.com/EHuanglu/status/2015573517618528538?s=20 Longchamp / Horses in the city] (1m)&lt;br /&gt;
** [https://x.com/dustinhollywood/status/2008154825385521418?s=20 The Last Artist] (trailer, 2m)&lt;br /&gt;
** [https://x.com/taziku_co/status/2015739943101047111?s=20 Monet temporal structure] (3m)&lt;br /&gt;
** [https://x.com/runwayml/status/2016155967285543364?s=20 Grizzlies] (1.5m, Runway Gen-4.5)&lt;br /&gt;
** [https://www.youtube.com/@TIME/videos On This Day... 1776] ([https://www.youtube.com/watch?v=E4cLKIxt8W8 trailer])&lt;br /&gt;
*** [https://www.youtube.com/watch?v=sV52AUVGc6I January 1: The Flag] (3.5m)&lt;br /&gt;
*** [https://www.youtube.com/watch?v=3ZDnL_a0YfQ January 10: Common Sense] (4.5m)&lt;br /&gt;
*** [https://www.youtube.com/watch?v=J5b1TiyKTus January 26: The Guns of Ticonderoga] (4m)&lt;br /&gt;
&lt;br /&gt;
====February 2026====&lt;br /&gt;
* February 2026: [https://app.klingai.com/global/quickstart/klingai-video-3-omni-model-user-guide Kling 3.0]&lt;br /&gt;
* February 2026: [https://seedance2.ai/ Seedance 2.0] ([https://x.com/EHuanglu/status/2020131622675202512?s=20 example 1], [https://x.com/EHuanglu/status/2020492770872566053?s=20 2], [https://x.com/dynamicwangs/status/2020054894741451123?s=20 3], [https://x.com/patrickassale/status/2020180495900848470?s=20 4], [https://x.com/janekm/status/2020888750285332526?s=20 5], [https://x.com/Dork_sense/status/2020179955511116082?s=20 6], [https://x.com/EHuanglu/status/2020388244802740728?s=20 7], [https://x.com/zhao_dashuai/status/2020528048341217592?s=20 8], [https://x.com/AngryTomtweets/status/2020784886932738470?s=20 9], [https://x.com/javilopen/status/2020558352590287298?s=20 10], [https://x.com/linxiaobei888/status/2021399630672691710?s=20 11])&lt;br /&gt;
* February 2026: Examples:&lt;br /&gt;
** [https://x.com/PJaccetturo/status/2019072637192843463?s=20 Unofficial opening sequence for The Way of Kings by Brandon Sanderson] (1.5m, Kling 3)&lt;br /&gt;
** [https://x.com/dailycatsclips/status/2020117502915989680?s=20 Cat Dreams] (1.5m)&lt;br /&gt;
** [https://x.com/DotCSV/status/2021269435567218725?s=20 Will Smith Eating Spaghetti] (Seedance 2.0) (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025], [https://x.com/javilopen/status/1925495026903380358 May 2025], [https://x.com/elder_plinius/status/1936145834585862225 June 2025], [https://x.com/minchoi/status/1976042197154963702 October 2025], [https://x.com/aisearchio/status/1978465562821898461 October 2025], [https://x.com/EHuanglu/status/1996649596119068687?s=20 December 2025], [https://www.reddit.com/r/aivideo/comments/1qi8zuv/25_years_difference_makes_you_wonder_where_ai/ January 2026], [https://x.com/SpecialSitsNews/status/2020583709741883666?s=20 progression to 2026])&lt;br /&gt;
** [https://x.com/thedorbrothers/status/2023460644905742577?s=20 To Be Continued] (3m, [https://x.com/thedorbrothers The Dor Brothers])&lt;br /&gt;
** [https://x.com/ivanka_humeniuk/status/2023711181978919034?s=20 Crow - Game of Thrones] (1m)&lt;br /&gt;
** [https://x.com/billyrestey/status/2024193251763507528?s=20 Reboot] (2m)&lt;br /&gt;
** [https://x.com/kenw_2/status/2024625510534283508?s=20 Late for work] (1.5m, MJ NBP Seedance 2.0)&lt;br /&gt;
** [https://x.com/heydin_ai/status/2024616890338079181?s=20 AI Man] (4.5m, MJ NBP Seedance 2.0)&lt;br /&gt;
** [https://x.com/maxescu/status/2024882372836250033?s=20 But AI Will Never Be Able To Do This] (3m, Seedance 2.0)&lt;br /&gt;
** [https://x.com/DiDi_OKK/status/2018784243753599093?s=20 Sign] (8m)&lt;br /&gt;
** [https://x.com/LTXStudio/status/2025994426309640291?s=20 Commercial for Nexus] (1m)&lt;br /&gt;
** [https://x.com/maxescu/status/2026007558159278477?s=20 Showcase] (9m, [https://x.com/maxescu Alex Patrascu])&lt;br /&gt;
** [https://x.com/EHuanglu/status/2025410944512192536?s=20 Painterly] (30s, [https://x.com/EHuanglu el.cine])&lt;br /&gt;
** [https://x.com/kellyeld/status/2025975677657440267?s=20 Imposter Syndrone] (2m, music video)&lt;br /&gt;
** [https://www.youtube.com/watch?v=nKnE2Wn1VNQ All Is Conscious] (3.5m)&lt;br /&gt;
** [https://x.com/CuriousRefuge/status/2026086576191934769?s=20 Emotional argument] (3m, Seedance 2.0)&lt;br /&gt;
** [https://x.com/jdkanani/status/2023781028368884031?s=20 Moonlight Veil] (10m)&lt;br /&gt;
&lt;br /&gt;
====March 2026====&lt;br /&gt;
* March 2026: Examples:&lt;br /&gt;
** [https://x.com/jacopo_reale/status/2029909372764041559 Looking for Bianca] (6m, Kling 3.0)&lt;br /&gt;
** [https://x.com/sumiturkude007/status/2030933543443193908?s=20 Gardener] (3m, Seedance 2.0)&lt;br /&gt;
** Micro-movie (Chinese): [https://x.com/yyyole/status/2029225419669684418?s=20 episode 1], [https://x.com/yyyole/status/2030850450464112675?s=20 episode 2]&lt;br /&gt;
** Live-action Evangelion: [https://x.com/NACHOS2D_/status/2032401289653461052?s=20 part 1] (4.5m), [https://x.com/NACHOS2D_/status/2032778868361203770?s=20 part 2] (3.5m), [https://x.com/NACHOS2D_/status/2033126071151837491?s=20 part 3] (2.5m)&lt;br /&gt;
** [https://x.com/lexx_aura/status/2033589846216741293?s=20 to love Wu Yong] (5m)&lt;br /&gt;
** [https://x.com/Alterverse_AI/status/2036434608137343111?s=20 Monkey&amp;#039;s Paw] (5m)&lt;br /&gt;
** [https://x.com/maxescu/status/2036434854435315868?s=20 Cinematic scenes] (3.5m, comedy, [https://lumalabs.ai/uni-1 Luma Uni-1 Agent])&lt;br /&gt;
* March 2026: [https://higgsfield.ai/original-series Higgsfield Original Series]&lt;br /&gt;
* March 2026: [https://app.pixverse.ai/onboard Pixverse v6] ([https://x.com/fal/status/2038655807483490613?s=20 example])&lt;br /&gt;
&lt;br /&gt;
====April 2026====&lt;br /&gt;
* April 2026: Examples:&lt;br /&gt;
** [https://x.com/aiordieshow/status/2039679896650125391?s=20 Soothent Paste] (45s)&lt;br /&gt;
** [https://x.com/heydin_ai/status/2040342454193516761?s=20 NEXII] (2m, music video, Seedance 2.0)&lt;/div&gt;</summary>
		<author><name>KevinYager</name></author>
		
	</entry>
	<entry>
		<id>http://gisaxs.com/index.php?title=AI_and_Humans&amp;diff=8778</id>
		<title>AI and Humans</title>
		<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php?title=AI_and_Humans&amp;diff=8778"/>
		<updated>2026-04-05T00:57:07Z</updated>

		<summary type="html">&lt;p&gt;KevinYager: /* AI improves human work */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
=AI in Education=&lt;br /&gt;
==Survey/study of==&lt;br /&gt;
* 2023-08: [https://www.nature.com/articles/s41598-023-38964-3 Perception, performance, and detectability of conversational artificial intelligence across 32 university courses]&lt;br /&gt;
* 2023-10: [https://www.bbc.com/worklife/article/20231017-the-employees-secretly-using-ai-at-work Employees] secretly using AI at work.&lt;br /&gt;
* 2023-10: [https://www.insidehighered.com/news/tech-innovation/artificial-intelligence/2023/10/31/most-students-outrunning-faculty-ai-use?utm_source=Inside+Higher+Ed&amp;amp;utm_campaign=23419446b9-DNU_2021_COPY_02&amp;amp;utm_medium=email&amp;amp;utm_term=0_1fcbc04421-23419446b9-236889242&amp;amp;mc_cid=23419446b9&amp;amp;mc_eid=dae49d931a Survey] shows students using AI more than professors.&lt;br /&gt;
* 2023-11: [https://www.nature.com/articles/d41586-023-03507-3 ChatGPT has entered the classroom: how LLMs could transform education]&lt;br /&gt;
* 2025-04: [https://www.anthropic.com/news/anthropic-education-report-how-university-students-use-claude Anthropic Education Report: How University Students Use Claude]&lt;br /&gt;
* 2025-05: [https://www.nature.com/articles/s41599-025-04787-y The effect of ChatGPT on students’ learning performance, learning perception, and higher-order thinking: insights from a meta-analysis]&lt;br /&gt;
&lt;br /&gt;
==AI improves learning/education==&lt;br /&gt;
*  Mollick, Ethan R. and Mollick, Lilach and Bach, Natalie and Ciccarelli, LJ and Przystanski, Ben and Ravipinto, Daniel, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4871171 AI Agents and Education: Simulated Practice at Scale] (June 17, 2024). The Wharton School Research Paper. [http://dx.doi.org/10.2139/ssrn.4871171 doi: 10.2139/ssrn.4871171]&lt;br /&gt;
** Can enable personalized education.&lt;br /&gt;
* [https://arxiv.org/abs/2306.17156 Generative AI for Programming Education: Benchmarking ChatGPT, GPT-4, and Human Tutors]&lt;br /&gt;
** GPT-4 can outperform human tutors.&lt;br /&gt;
*  Keppler, Samantha and Sinchaisri, Wichinpong and Snyder, Clare, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4924786 Backwards Planning with Generative AI: Case Study Evidence from US K12 Teachers] (August 13, 2024). [http://dx.doi.org/10.2139/ssrn.4924786 doi: 10.2139/ssrn.4924786]&lt;br /&gt;
** Teachers benefit from using AI as a co-pilot to aid in tasks (planning, how to teach topic, explore ideas).&lt;br /&gt;
** There is smaller utility in using AI purely as a text-generator (to make quizzes, workbooks, etc.).&lt;br /&gt;
* [https://arxiv.org/abs/2402.09809 Effective and Scalable Math Support: Evidence on the Impact of an AI-Tutor on Math Achievement in Ghana]&lt;br /&gt;
* [https://doi.org/10.21203/rs.3.rs-4243877/v1 AI Tutoring Outperforms Active Learning]&lt;br /&gt;
* [https://documents.worldbank.org/en/publication/documents-reports/documentdetail/099548105192529324 From chalkboards to chatbots: Transforming learning in Nigeria, one prompt at a time] ([https://blogs.worldbank.org/en/education/From-chalkboards-to-chatbots-Transforming-learning-in-Nigeria writeup])&lt;br /&gt;
** 6 weeks of after-school AI tutoring = 2 years of typical learning gains&lt;br /&gt;
** outperforms 80% of other educational interventions&lt;br /&gt;
* [https://arxiv.org/abs/2409.09047 AI Meets the Classroom: When Do Large Language Models Harm Learning?]&lt;br /&gt;
** Outcomes depend on usage&lt;br /&gt;
* [https://www.deeplearning.ai/the-batch/gpt-4-boosts-remote-tutors-performance-in-real-time-study-finds/ LLM Support for Tutors: GPT-4 boosts remote tutors’ performance in real time, study finds]&lt;br /&gt;
** [https://arxiv.org/abs/2410.03017 Tutor CoPilot: A Human-AI Approach for Scaling Real-Time Expertise]&lt;br /&gt;
* 2025-06: Gallup &amp;amp; The Walton Foundation: [https://www.gallup.com/file/analytics/691922/Walton-Family-Foundation-Gallup-Teachers-AI-Report.pdf Teaching for Tomorrow: Unlocking Six Weeks a Year With AI]&lt;br /&gt;
* 2026-03: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6423358 Effective Personalized AI Tutors via LLM-Guided Reinforcement Learning]&lt;br /&gt;
&lt;br /&gt;
==AI harms learning==&lt;br /&gt;
* [https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0305354 A real-world test of artificial intelligence infiltration of a university examinations system: A “Turing Test” case study]&lt;br /&gt;
** Current grading systems cannot detect AI.&lt;br /&gt;
*  Bastani, Hamsa and Bastani, Osbert and Sungu, Alp and Ge, Haosen and Kabakcı, Özge and Mariman, Rei, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4895486 Generative AI Can Harm Learning] (July 15, 2024). The Wharton School Research Paper. [http://dx.doi.org/10.2139/ssrn.4895486 doi: 10.2139/ssrn.4895486]&lt;br /&gt;
** Access to ChatGPT harmed math education outcomes.&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.09047 AI Meets the Classroom: When Does ChatGPT Harm Learning?]&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.20245 How AI Impacts Skill Formation]&lt;br /&gt;
&lt;br /&gt;
==Software/systems==&lt;br /&gt;
* [https://devpost.com/software/gptutor GPTutor] ([https://github.com/mynamegabe/GPTutor code])&lt;br /&gt;
* [https://arxiv.org/abs/2308.02773 EduChat: A Large-Scale Language Model-based Chatbot System for Intelligent Education]&lt;br /&gt;
* [https://eurekalabs.ai/ Eureka Labs] (founded by [https://en.wikipedia.org/wiki/Andrej_Karpathy Andrej Karpathy]) aims to create AI-driven courses (first course is [https://github.com/karpathy/LLM101n Intro to LLMs])&lt;br /&gt;
&lt;br /&gt;
===LLMs===&lt;br /&gt;
* 2024-12: [https://www.arxiv.org/abs/2412.16429 LearnLM: Improving Gemini for Learning]&lt;br /&gt;
&lt;br /&gt;
===Individual tools===&lt;br /&gt;
* Chatbot (OpenAI [https://chatgpt.com/ ChatGPT], Anthropic [https://www.anthropic.com/claude Claude], Google [https://gemini.google.com/app Gemini])&lt;br /&gt;
* [https://notebooklm.google.com/ NotebookLM]: Enables one to &amp;quot;chat with documents&amp;quot;.&lt;br /&gt;
* Google [https://learning.google.com/experiments/learn-about/signup Learn About]&lt;br /&gt;
&lt;br /&gt;
===Systems===&lt;br /&gt;
* [https://www.anthropic.com/news/introducing-claude-for-education Anthropic] [https://www.anthropic.com/education Claude for Education]&lt;br /&gt;
&lt;br /&gt;
==AI for grading==&lt;br /&gt;
* [https://dl.acm.org/doi/10.1145/3657604.3664693 Can Large Language Models Make the Grade? An Empirical Study Evaluating LLMs Ability To Mark Short Answer Questions in K-12 Education] ([https://arxiv.org/abs/2405.02985 preprint])&lt;br /&gt;
&lt;br /&gt;
==Detection==&lt;br /&gt;
* 2024-06: [https://www.sciencedirect.com/science/article/pii/S2666920X24000109 Do teachers spot AI? Evaluating the detectability of AI-generated texts among student essays]&lt;br /&gt;
** GenAI can simulate student writing in a way that teachers cannot detect.&lt;br /&gt;
** AI-generated essays are assessed more positively than student-written ones.&lt;br /&gt;
** Teachers are overconfident in their source identification.&lt;br /&gt;
** Neither novice nor experienced teachers could distinguish texts generated by ChatGPT from those written by students.&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.15654 People who frequently use ChatGPT for writing tasks are accurate and robust detectors of AI-generated text]&lt;br /&gt;
===AI Text Detectors Don&amp;#039;t Work===&lt;br /&gt;
* 2024-05: [https://arxiv.org/abs/2405.07940 RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors]&lt;br /&gt;
* 2023-06: [https://arxiv.org/abs/2306.15666 Testing of Detection Tools for AI-Generated Text]&lt;br /&gt;
&lt;br /&gt;
=AI/human=&lt;br /&gt;
==Capabilities==&lt;br /&gt;
===Writing===&lt;br /&gt;
&lt;br /&gt;
* 2022-12: [https://aclanthology.org/2022.emnlp-main.296/ Re3: Generating Longer Stories With Recursive Reprompting and Revision]&lt;br /&gt;
* 2023-01: Journalism: [https://journals.sagepub.com/doi/10.1177/10776958221149577 Collaborating With ChatGPT: Considering the Implications of Generative Artificial Intelligence for Journalism and Media Education]&lt;br /&gt;
* 2023-03: English essays: [https://journal.unnes.ac.id/sju/index.php/elt/article/view/64069 Artificial intelligence (AI) technology in OpenAI ChatGPT application: A review of ChatGPT in writing English essay]&lt;br /&gt;
* 2023-07: Science writing: [https://www.rbmojournal.com/article/S1472-6483(23)00219-5/fulltext Artificial intelligence in scientific writing: a friend or a foe?]&lt;br /&gt;
* 2024-02: Wikipedia style: [https://arxiv.org/abs/2402.14207 Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models]&lt;br /&gt;
* 2024-08: [https://arxiv.org/abs/2408.07055 LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs] ([https://github.com/THUDM/LongWriter code])&lt;br /&gt;
* 2024-08: Scientific papers: [https://arxiv.org/abs/2408.06292 The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery]&lt;br /&gt;
* 2024-09: PaperQA2: [https://paper.wikicrow.ai/ Language Models Achieve Superhuman Synthesis of Scientific Knowledge] ([https://x.com/SGRodriques/status/1833908643856818443 𝕏 post], [https://github.com/Future-House/paper-qa code])&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.19065 WikiAutoGen: Towards Multi-Modal Wikipedia-Style Article Generation]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.22828 Learning to Reason for Long-Form Story Generation]&lt;br /&gt;
&lt;br /&gt;
==AI out-performs humans==&lt;br /&gt;
===Tests===&lt;br /&gt;
* 2023-07: [https://arxiv.org/abs/2307.10635 SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models]&lt;br /&gt;
* 2024-06: [https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0305354 A real-world test of artificial intelligence infiltration of a university examinations system: A “Turing Test” case study] &lt;br /&gt;
** AI scores higher than median students.&lt;br /&gt;
&lt;br /&gt;
===Creativity===&lt;br /&gt;
* See also: [[AI creativity]]&lt;br /&gt;
* 2023-07: [https://mackinstitute.wharton.upenn.edu/wp-content/uploads/2023/08/LLM-Ideas-Working-Paper.pdf Ideas Are Dimes A Dozen: Large Language Models For Idea Generation In Innovation]&lt;br /&gt;
* 2023-09: [https://www.nature.com/articles/s41598-023-40858-3 Best humans still outperform artificial intelligence in a creative divergent thinking task]&lt;br /&gt;
** Best humans out-perform AI at creativity. (By implication, median humans may not.)&lt;br /&gt;
* 2024-02: [https://www.nature.com/articles/s41598-024-53303-w The current state of artificial intelligence generative language models is more creative than humans on divergent thinking tasks]&lt;br /&gt;
* 2024-02: Felin, Teppo and Holweg, Matthias, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4737265 Theory Is All You Need: AI, Human Cognition, and Causal Reasoning] (February 24, 2024). [http://dx.doi.org/10.2139/ssrn.4737265 doi: 10.2139/ssrn.4737265]&lt;br /&gt;
** Argues that human &amp;quot;theory-based&amp;quot; creativity outperforms AI&amp;#039;s &amp;quot;data-based&amp;quot; approach.&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.01119 Pron vs Prompt: Can Large Language Models already Challenge a World-Class Fiction Author at Creative Text Writing?]&lt;br /&gt;
** The top human (a professional author) out-performs GPT-4.&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.04109 Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers]&lt;br /&gt;
** LLMs can be creative.&lt;br /&gt;
* 2024-09: [https://docs.iza.org/dp17302.pdf Creative and Strategic Capabilities of Generative AI: Evidence from Large-Scale Experiments]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.00794 Predicting Empirical AI Research Outcomes with Language Models]&lt;br /&gt;
* 2026-03: [https://arxiv.org/abs/2603.19087 Serendipity by Design: Evaluating the Impact of Cross-domain Mappings on Human and LLM Creativity]&lt;br /&gt;
&lt;br /&gt;
===Art===&lt;br /&gt;
* 2024-11: [https://doi.org/10.1038/s41598-024-76900-1 AI-generated poetry is indistinguishable from human-written poetry and is rated more favorably]&lt;br /&gt;
* 2024-11: [https://www.astralcodexten.com/p/how-did-you-do-on-the-ai-art-turing How Did You Do On The AI Art Turing Test?]&lt;br /&gt;
&lt;br /&gt;
===Business &amp;amp; Marketing===&lt;br /&gt;
* 2023-11: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4597899 The power of generative marketing: Can generative AI create superhuman visual marketing content?]&lt;br /&gt;
* 2024-02: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4714776 Generative Artificial Intelligence and Evaluating Strategic Decisions]&lt;br /&gt;
&lt;br /&gt;
===Professions===&lt;br /&gt;
* [https://agi.safe.ai/submit Humanity&amp;#039;s Last Exam]&lt;br /&gt;
** [https://x.com/alexandr_wang/status/1835738937719140440 Effort to build] a dataset of challenging (but resolvable) questions in specific domain areas, to act as a benchmark to test whether AIs are improving in these challenging topics.&lt;br /&gt;
&lt;br /&gt;
====Coding====&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.06807 Competitive Programming with Large Reasoning Models]&lt;br /&gt;
&lt;br /&gt;
====Medical====&lt;br /&gt;
* 2024-03: [https://www.medrxiv.org/content/10.1101/2024.03.12.24303785v1 Influence of a Large Language Model on Diagnostic Reasoning: A Randomized Clinical Vignette Study]&lt;br /&gt;
** GPT-4 improves medical practitioners&amp;#039; work; surprisingly, GPT-4 alone scored better than a human using GPT-4 as an aid (on selected tasks).&lt;br /&gt;
* 2024-10: [https://doi.org/10.1001/jamanetworkopen.2024.38535 Perspectives on Artificial Intelligence–Generated Responses to Patient Messages]&lt;br /&gt;
* 2024-10: [https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2825395 Large Language Model Influence on Diagnostic Reasoning: A Randomized Clinical Trial]&lt;br /&gt;
** Use of ChatGPT does not strongly improve medical experts&amp;#039; work, but AI alone out-scores both human and human+AI.&lt;br /&gt;
* 2024-11: [https://www.nature.com/articles/s41562-024-02046-9 Large language models surpass human experts in predicting neuroscience results] (writeup: [https://medicalxpress.com/news/2024-11-ai-neuroscience-results-human-experts.html AI can predict neuroscience study results better than human experts, study finds])&lt;br /&gt;
* 2024-12: [https://www.arxiv.org/abs/2412.10849 Superhuman performance of a large language model on the reasoning tasks of a physician]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.18925 HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs]&lt;br /&gt;
* 2025-02: Media:&lt;br /&gt;
** NY Times: [https://www.nytimes.com/2025/02/02/opinion/ai-doctors-medicine.html The Robot Doctor Will See You Now]&lt;br /&gt;
** [https://erictopol.substack.com/p/when-doctors-with-ai-are-outperformed When Doctors With A.I. Are Outperformed by A.I. Alone]&lt;br /&gt;
* 2025-02: [https://www.nature.com/articles/s41591-024-03456-y GPT-4 assistance for improvement of physician performance on patient care tasks: a randomized controlled trial]&lt;br /&gt;
* 2025-02: [https://www.nature.com/articles/s41591-025-03517-w Artificial intelligence for individualized treatment of persistent atrial fibrillation: a randomized controlled trial]&lt;br /&gt;
* Google AI Clinician:&lt;br /&gt;
** 2024-01: [https://arxiv.org/abs/2401.05654 Towards Conversational Diagnostic AI] ([https://research.google/blog/amie-a-research-ai-system-for-diagnostic-medical-reasoning-and-conversations/ blog]: Articulate Medical Intelligence Explorer, AMIE)&lt;br /&gt;
** 2025-03: [https://www.gstatic.com/amie/towards_conversational_ai_for_disease_management.pdf Towards Conversational AI for Disease Management] ([https://research.google/blog/from-diagnosis-to-treatment-advancing-amie-for-longitudinal-disease-management/ blog])&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.19655 Med-RLVR: Emerging Medical Reasoning from a 3B base model via reinforcement Learning]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.13939 Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models]&lt;br /&gt;
* 2025-04: [https://www.acpjournals.org/doi/10.7326/ANNALS-24-03283 Comparison of Initial Artificial Intelligence (AI) and Final Physician Recommendations in AI-Assisted Virtual Urgent Care Visits]&lt;br /&gt;
* 2025-04: [https://www.nature.com/articles/s41586-025-08866-7?linkId=13898052 Towards conversational diagnostic artificial intelligence]&lt;br /&gt;
* 2025-04: [https://www.nature.com/articles/s41586-025-08869-4?linkId=13898054 Towards accurate differential diagnosis with large language models]&lt;br /&gt;
* 2025-06: [https://www.medrxiv.org/content/10.1101/2025.06.13.25329541v1 Automation of Systematic Reviews with Large Language Models]&lt;br /&gt;
* 2025-06: [https://microsoft.ai/new/the-path-to-medical-superintelligence/ The Path to Medical Superintelligence]&lt;br /&gt;
* 2025-08: [https://www.nature.com/articles/s41591-025-03888-0 A personal health large language model for sleep and fitness coaching]&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.08224 Capabilities of GPT-5 on Multimodal Medical Reasoning]&lt;br /&gt;
&lt;br /&gt;
====Bio====&lt;br /&gt;
* 2025-04: [https://www.virologytest.ai/vct_paper.pdf Virology Capabilities Test (VCT): A Multimodal Virology Q&amp;amp;A Benchmark]&lt;br /&gt;
** Time: [https://time.com/7279010/ai-virus-lab-biohazard-study/ Exclusive: AI Outsmarts Virus Experts in the Lab, Raising Biohazard Fears]&lt;br /&gt;
** AI Frontiers: [https://www.ai-frontiers.org/articles/ais-are-disseminating-expert-level-virology-skills AIs Are Disseminating Expert-Level Virology Skills]&lt;br /&gt;
&lt;br /&gt;
====Therapy====&lt;br /&gt;
* 2025-02: [https://journals.plos.org/mentalhealth/article?id=10.1371/journal.pmen.0000145 When ELIZA meets therapists: A Turing test for the heart and mind]&lt;br /&gt;
* 2025-03: Therabot: [https://ai.nejm.org/doi/full/10.1056/AIoa2400802 Randomized Trial of a Generative AI Chatbot for Mental Health Treatment]&lt;br /&gt;
&lt;br /&gt;
====Financial====&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.17866 Financial Statement Analysis with Large Language Models]&lt;br /&gt;
&lt;br /&gt;
====HR====&lt;br /&gt;
* 2025-08: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5395709 Voice AI in Firms: A Natural Field Experiment on Automated Job Interviews]&lt;br /&gt;
&lt;br /&gt;
==AI improves human work==&lt;br /&gt;
* 2023-07: [https://www.science.org/doi/10.1126/science.adh2586 Experimental evidence on the productivity effects of generative artificial intelligence]&lt;br /&gt;
* 2023-09:  Dell&amp;#039;Acqua, Fabrizio and McFowland III, Edward and Mollick, Ethan R. and Lifshitz-Assaf, Hila and Kellogg, Katherine and Rajendran, Saran and Krayer, Lisa and Candelon, François and Lakhani, Karim R., [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4573321 Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality] (September 15, 2023). Harvard Business School Technology &amp;amp; Operations Mgt. Unit Working Paper No. 24-013, The Wharton School Research Paper [http://dx.doi.org/10.2139/ssrn.4573321 doi: 10.2139/ssrn.4573321]&lt;br /&gt;
* 2023-11: [https://www.nber.org/papers/w31161 Generative AI at Work] (National Bureau of Economic Research)&lt;br /&gt;
* 2023-12: [https://osf.io/hdjpk The Uneven Impact of Generative AI on Entrepreneurial Performance] ([https://doi.org/10.31219/osf.io/hdjpk doi: 10.31219/osf.io/hdjpk])&lt;br /&gt;
* 2023-12: [https://arxiv.org/abs/2312.05481 Artificial Intelligence in the Knowledge Economy]: Non-autonomous AI (chatbots) benefits the least knowledgeable workers; autonomous agents benefit the most knowledgeable workers&lt;br /&gt;
* 2024-07: [https://www.microsoft.com/en-us/research/publication/generative-ai-in-real-world-workplaces/ Generative AI in Real-World Workplaces: The Second Microsoft Report on AI and Productivity Research]&lt;br /&gt;
* 2025-03: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5188231 The Cybernetic Teammate: A Field Experiment on Generative AI Reshaping Teamwork and Expertise]&lt;br /&gt;
** 2025-03: Ethan Mollick: [https://www.oneusefulthing.org/p/the-cybernetic-teammate The Cybernetic Teammate]: Having an AI on your team can increase performance, provide expertise, and improve your experience&lt;br /&gt;
* 2025-09: [https://osf.io/preprints/psyarxiv/vbkmt_v1 Quantifying Human-AI Synergy]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.12049 Generative AI and Firm Productivity: Field Experiments in Online Retail]&lt;br /&gt;
* 2025-10: Wharton: [https://knowledge.wharton.upenn.edu/special-report/2025-ai-adoption-report/ 2025 AI Adoption Report] (75% report positive ROI)&lt;br /&gt;
* 2026-03: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6513481 Mapping AI into Production: A Field Experiment on Firm Performance]&lt;br /&gt;
&lt;br /&gt;
===Coding===&lt;br /&gt;
* 2023-02: [https://arxiv.org/abs/2302.06590 The Impact of AI on Developer Productivity: Evidence from GitHub Copilot]&lt;br /&gt;
* 2024-09: Cui, Zheyuan and Demirer, Mert and Jaffe, Sonia and Musolff, Leon and Peng, Sida and Salz, Tobias, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4945566 The Effects of Generative AI on High Skilled Work: Evidence from Three Field Experiments with Software Developers] (September 03, 2024). [http://dx.doi.org/10.2139/ssrn.4945566 doi: 10.2139/ssrn.4945566]&lt;br /&gt;
* 2024-11: Hoffmann, Manuel and Boysel, Sam and Nagle, Frank and Peng, Sida and Xu, Kevin, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5007084 Generative AI and the Nature of Work] (October 27, 2024). Harvard Business School Strategy Unit Working Paper No. 25-021, Harvard Business Working Paper No. 25-021, [http://dx.doi.org/10.2139/ssrn.5007084 doi: 10.2139/ssrn.5007084]&lt;br /&gt;
* 2025-07: METR: [https://arxiv.org/abs/2507.09089 Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity] (AI tools led to lower performance)&lt;br /&gt;
** 2026-02: [https://metr.org/blog/2026-02-24-uplift-update/ We are Changing our Developer Productivity Experiment Design] (AI tools now lead to improved performance)&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.19708 Intuition to Evidence: Measuring AI&amp;#039;s True Impact on Developer Productivity]&lt;br /&gt;
&lt;br /&gt;
===Forecasting===&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.07862 AI-Augmented Predictions: LLM Assistants Improve Human Forecasting Accuracy]&lt;br /&gt;
&lt;br /&gt;
===Finance===&lt;br /&gt;
* 2024-12: [https://dx.doi.org/10.2139/ssrn.5075727 AI, Investment Decisions, and Inequality]: Novices see improvements in investment performance; sophisticated investors see even greater improvements.&lt;br /&gt;
&lt;br /&gt;
===Law===&lt;br /&gt;
* 2025-03: [https://ssrn.com/abstract=5162111 AI-Powered Lawyering: AI Reasoning Models, Retrieval Augmented Generation, and the Future of Legal Practice]&lt;br /&gt;
&lt;br /&gt;
===Science===&lt;br /&gt;
* 2025-12: [https://www.science.org/doi/abs/10.1126/science.adw3000 Scientific production in the era of large language models]&lt;br /&gt;
* 2026-01: [https://www.nature.com/articles/s41586-025-09922-y Artificial intelligence tools expand scientists’ impact but contract science’s focus]&lt;br /&gt;
* 2026-01: [https://www.anthropic.com/news/accelerating-scientific-research How scientists are using Claude to accelerate research and discovery]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.03837 Accelerating Scientific Research with Gemini: Case Studies and Common Techniques]&lt;br /&gt;
* 2026-03: [https://www.pnas.org/doi/10.1073/pnas.2533676123 Expert evaluation of LLM world models: A high-Tc superconductivity case study] ([https://research.google/blog/testing-llms-on-superconductivity-research-questions/ blog])&lt;br /&gt;
&lt;br /&gt;
===Medical===&lt;br /&gt;
* 2025-03: [https://www.medrxiv.org/content/10.1101/2025.02.28.25323115v1.full Medical Hallucination in Foundation Models and Their Impact on Healthcare]&lt;br /&gt;
* 2025-03: [https://journals.lww.com/international-journal-of-surgery/fulltext/2025/03000/chatgpt_s_role_in_alleviating_anxiety_in_total.20.aspx ChatGPT’s role in alleviating anxiety in total knee arthroplasty consent process: a randomized controlled trial pilot study]&lt;br /&gt;
* 2025-05: [https://openai.com/index/healthbench/ Introducing HealthBench]&lt;br /&gt;
* 2025-06: [https://www.medrxiv.org/content/10.1101/2025.06.07.25329176v1 From Tool to Teammate: A Randomized Controlled Trial of Clinician-AI Collaborative Workflows for Diagnosis]&lt;br /&gt;
* 2025-06: [https://bmcmededuc.biomedcentral.com/articles/10.1186/s12909-025-07414-1 Iteratively refined ChatGPT outperforms clinical mentors in generating high-quality interprofessional education clinical scenarios: a comparative study]&lt;br /&gt;
* 2025-07: [https://cdn.openai.com/pdf/a794887b-5a77-4207-bb62-e52c900463f1/penda_paper.pdf AI-based Clinical Decision Support for Primary Care: A Real-World Study] ([https://openai.com/index/ai-clinical-copilot-penda-health/ blog])&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.15743 Towards physician-centered oversight of conversational diagnostic AI]&lt;br /&gt;
* 2026-01: [https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(25)02464-X/abstract Interval cancer, sensitivity, and specificity comparing AI-supported mammography screening with standard double reading without AI in the MASAI study: a randomised, controlled, non-inferiority, single-blinded, population-based, screening-accuracy trial]&lt;br /&gt;
&lt;br /&gt;
===Translation===&lt;br /&gt;
* 2025-01: [https://simonwillison.net/2025/Feb/2/workflow-for-translation/ A professional workflow for translation using LLMs] ([https://news.ycombinator.com/item?id=42897856 based on this])&lt;br /&gt;
&lt;br /&gt;
===Customer service===&lt;br /&gt;
* 2023-11: [https://www.nber.org/papers/w31161 Generative AI at Work]: Improvements for workers and clients (though also a ceiling to improvement)&lt;br /&gt;
&lt;br /&gt;
===Creativity===&lt;br /&gt;
* See also: [[AI creativity]]&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.01727 Prompting Diverse Ideas: Increasing AI Idea Variance]&lt;br /&gt;
* 2024-07: [https://www.science.org/doi/10.1126/sciadv.adn5290 Generative AI enhances individual creativity but reduces the collective diversity of novel content]&lt;br /&gt;
* 2024-08: [https://www.nature.com/articles/s41562-024-01953-1 An empirical investigation of the impact of ChatGPT on creativity]&lt;br /&gt;
** 2025-05: Response: [https://www.nature.com/articles/s41562-025-02173-x ChatGPT decreases idea diversity in brainstorming] ([https://www.nature.com/articles/s41562-025-02173-x.epdf?sharing_token=LA9NyDHj7y5WN8zvb5Qm49RgN0jAjWel9jnR3ZoTv0Nl8PrpXFkjZ93XvmUVBgB9Hlfro5Yo6YELr-pRqbpk3HaZENCvsfV8G1kwtTEj2oW1g87dSVT4BzrfCu3jS_606SLzmoDuDiALChY-MozVM4Pj1b4Vdf-YaIH5p3lfAnM%3D pdf])&lt;br /&gt;
** 2025-05: Response: [https://www.nature.com/articles/s41562-025-02195-5 Reply to: ChatGPT decreases idea diversity in brainstorming]&lt;br /&gt;
* 2024-08: [https://doi.org/10.1287/orsc.2023.18430 The Crowdless Future? Generative AI and Creative Problem-Solving]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.03703 Human Creativity in the Age of LLMs]&lt;br /&gt;
* 2024-11: &amp;lt;strike&amp;gt;[https://conference.nber.org/conf_papers/f210475.pdf Artificial Intelligence, Scientific Discovery, and Product Innovation]&amp;lt;/strike&amp;gt;: diffusion model increases &amp;quot;innovation&amp;quot; (patents), boosts the best performers, but also removes some enjoyable tasks.&lt;br /&gt;
** 2025-05: Retraction: [https://economics.mit.edu/news/assuring-accurate-research-record Assuring an accurate research record]&lt;br /&gt;
* 2024-12: [https://doi.org/10.1080/10400419.2024.2440691 Using AI to Generate Visual Art: Do Individual Differences in Creativity Predict AI-Assisted Art Quality?] ([https://osf.io/preprints/psyarxiv/ygzw6 preprint]): shows that more creative humans produce more creative genAI outputs&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.11433 One Does Not Simply Meme Alone: Evaluating Co-Creativity Between LLMs and Humans in the Generation of Humor]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.17241 Generative AI and Creativity: A Systematic Literature Review and Meta-Analysis]&lt;br /&gt;
&lt;br /&gt;
===Equity===&lt;br /&gt;
* 2025-01: [https://ai.nejm.org/doi/full/10.1056/AIp2400889 Using Large Language Models to Promote Health Equity]&lt;br /&gt;
&lt;br /&gt;
==AI worse than humans==&lt;br /&gt;
* 2025-04: [https://spinup-000d1a-wp-offload-media.s3.amazonaws.com/faculty/wp-content/uploads/sites/27/2025/03/AI-debt-collection-20250331.pdf How Good is AI at Twisting Arms? Experiments in Debt Collection]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.18919 Clinical knowledge in LLMs does not translate to human interactions]&lt;br /&gt;
* 2025-05: [https://royalsocietypublishing.org/doi/10.1098/rsos.241776 Generalization bias in large language model summarization of scientific research]&lt;br /&gt;
&lt;br /&gt;
==AI lowers human capability==&lt;br /&gt;
* 2025-07: METR: [https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity] ([https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/ blog], [https://secondthoughts.ai/p/ai-coding-slowdown commentary/analysis])&lt;br /&gt;
** 2026-02: [https://metr.org/blog/2026-02-24-uplift-update/ We are Changing our Developer Productivity Experiment Design] (AI tools [https://x.com/METR_Evals/status/2026355544668385373?s=20 now] lead to improved performance)&lt;br /&gt;
* 2026-02: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6097646 Thinking—Fast, Slow, and Artificial: How AI is Reshaping Human Reasoning and the Rise of Cognitive Surrender]&lt;br /&gt;
&lt;br /&gt;
==Human Perceptions of AI==&lt;br /&gt;
* 2023-09: [https://www.nature.com/articles/d41586-023-02980-0 AI and science: what 1,600 researchers think. A Nature survey finds that scientists are concerned, as well as excited, by the increasing use of artificial-intelligence tools in research.]&lt;br /&gt;
* 2024-11: [https://doi.org/10.1016/S2589-7500(24)00202-4 Attitudes and perceptions of medical researchers towards the use of artificial intelligence chatbots in the scientific process: an international cross-sectional survey] (Nature commentary: [https://www.nature.com/articles/s41592-024-02369-5 Quest for AI literacy])&lt;br /&gt;
* 2025-03: [https://www.arxiv.org/abs/2503.16458 Users Favor LLM-Generated Content -- Until They Know It&amp;#039;s AI]&lt;br /&gt;
&lt;br /&gt;
===AI passes Turing Test===&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Text Dialog&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* 2023-05: [https://arxiv.org/abs/2305.20010 Human or Not? A Gamified Approach to the Turing Test]&lt;br /&gt;
* 2023-10: [https://arxiv.org/abs/2310.20216 Does GPT-4 pass the Turing test?]&lt;br /&gt;
* 2024-05: [https://arxiv.org/abs/2405.08007 People cannot distinguish GPT-4 from a human in a Turing test]&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.08853 GPT-4 is judged more human than humans in displaced and inverted Turing tests]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.23674 Large Language Models Pass the Turing Test]&lt;br /&gt;
* 2018-11: [https://www.sciencedirect.com/science/article/abs/pii/S0022103117303980 A Minimal Turing Test]&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Art&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* 2024-11: [https://www.astralcodexten.com/p/how-did-you-do-on-the-ai-art-turing How Did You Do On The AI Art Turing Test?] Differentiation was only slightly above random (60%). AI art was often ranked higher than human-made.&lt;br /&gt;
* 2024-11: [https://doi.org/10.1038/s41598-024-76900-1 AI-generated poetry is indistinguishable from human-written poetry and is rated more favorably]&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.25601 Echoes of Humanity: Exploring the Perceived Humanness of AI Music]&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Imagery&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* 2026-02: [https://bpspsychub.onlinelibrary.wiley.com/doi/10.1111/bjop.70063 Too good to be true: Synthetic AI faces are more average than real faces and super-recognizers know it]&lt;br /&gt;
** [https://www.unsw.edu.au/newsroom/news/2026/02/humans-overconfident-telling-AI-faces-real-faces-people-fake People are overconfident about spotting AI faces, study finds]&lt;br /&gt;
&lt;br /&gt;
=Uptake=&lt;br /&gt;
* 2023-07: [https://doi.org/10.9734/ajrcos/2023/v16i4392 ChatGPT: Early Adopters, Teething Issues and the Way Forward]&lt;br /&gt;
* 2024-03: [https://arxiv.org/abs/2403.07183 Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews]&lt;br /&gt;
* 2024-05:  Humlum, Anders and Vestergaard, Emilie, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4827166 The Adoption of ChatGPT]. IZA Discussion Paper No. 16992 [http://dx.doi.org/10.2139/ssrn.4827166 doi: 10.2139/ssrn.4827166]&lt;br /&gt;
* 2024-06: Kellogg, Katherine and Lifshitz-Assaf, Hila and Randazzo, Steven and Mollick, Ethan R. and Dell&amp;#039;Acqua, Fabrizio and McFowland III, Edward and Candelon, Francois and Lakhani, Karim R., [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4857373 Don&amp;#039;t Expect Juniors to Teach Senior Professionals to Use Generative AI: Emerging Technology Risks and Novice AI Risk Mitigation Tactics] (June 03, 2024). Harvard Business School Technology &amp;amp; Operations Mgt. Unit Working Paper 24-074, Harvard Business Working Paper No. 24-074, The Wharton School Research Paper [http://dx.doi.org/10.2139/ssrn.4857373 doi: 10.2139/ssrn.4857373]&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.07016 Delving into ChatGPT usage in academic writing through excess vocabulary]&lt;br /&gt;
* 2024-09: [https://static1.squarespace.com/static/60832ecef615231cedd30911/t/66f0c3fbabdc0a173e1e697e/1727054844024/BBD_GenAI_NBER_Sept2024.pdf The Rapid Adoption of Generative AI]&lt;br /&gt;
* 2024-10: [https://ai.wharton.upenn.edu/focus-areas/human-technology-interaction/2024-ai-adoption-report/ Growing Up: Navigating Generative AI’s Early Years – AI Adoption Report] ([https://ai.wharton.upenn.edu/wp-content/uploads/2024/10/AI-Report_Executive-Summary.pdf executive summary], [https://ai.wharton.upenn.edu/wp-content/uploads/2024/10/AI-Report_Full-Report.pdf full report])&lt;br /&gt;
** 72% of leaders use genAI at least once a week (cf. 23% in 2023); 90% agree AI enhances skills (cf. 80% in 2023)&lt;br /&gt;
** Spending on genAI is up 130% (most companies plan to invest going forward)&lt;br /&gt;
* 2024-12: [https://www.pnas.org/doi/10.1073/pnas.2414972121 The unequal adoption of ChatGPT exacerbates existing inequalities among workers]&lt;br /&gt;
** Higher adoption among young and less experienced&lt;br /&gt;
** Lower adoption among women and lower-earning workers&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.09747 The Widespread Adoption of Large Language Model-Assisted Writing Across Society]: 10-25% adoption across a range of contexts&lt;br /&gt;
* 2025-02: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5078805 Local Heterogeneity in Artificial Intelligence Jobs Over Time and Space]&lt;br /&gt;
* 2025-04: [https://andreyfradkin.com/assets/demandforllm.pdf Demand for LLMs: Descriptive Evidence on Substitution, Market Expansion, and Multihoming]&lt;br /&gt;
* 2025-05: [https://civicscience.com/chatgpt-is-still-leading-the-ai-wars-but-google-gemini-is-gaining-ground/ ChatGPT Is Still Leading the AI Wars but Google Gemini Is Gaining Ground]&lt;br /&gt;
* 2025-05: [https://www.nber.org/papers/w33777 Large Language Models, Small Labor Market Effects]&lt;br /&gt;
** Significant uptake, but very little economic impact so far&lt;br /&gt;
* 2025-05: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5136877 The Labor Market Effects of Generative Artificial Intelligence]&lt;br /&gt;
** US worker usage of AI is increasing rapidly: 30% in 2024-12; 40% in 2025-05&lt;br /&gt;
* 2025-05: [https://www.bondcap.com/report/pdf/Trends_Artificial_Intelligence.pdf Trends – Artificial Intelligence]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.08945 Who is using AI to code? Global diffusion and impact of generative AI]&lt;br /&gt;
* 2025-06: [https://www.iconiqcapital.com/growth/reports/2025-state-of-ai 2025 State of AI Report: The Builder’s Playbook]: A Practical Roadmap for AI Innovation&lt;br /&gt;
* 2025-07: Epoch AI: [https://epochai.substack.com/p/after-the-chatgpt-moment-measuring After the ChatGPT Moment: Measuring AI’s Adoption]: How quickly has AI been diffusing through the economy?&lt;br /&gt;
* 2025-07: Pew Research: [https://www.pewresearch.org/short-reads/2025/06/25/34-of-us-adults-have-used-chatgpt-about-double-the-share-in-2023/ 34% of U.S. adults have used ChatGPT, about double the share in 2023]&lt;br /&gt;
* 2025-12: Epoch AI: [https://epoch.ai/data/polling Polling on AI Usage]&lt;br /&gt;
&lt;br /&gt;
==Usage By==&lt;br /&gt;
* 2026-02: [https://www.nber.org/papers/w34813 The Politics of AI]&lt;br /&gt;
&lt;br /&gt;
==Usage For==&lt;br /&gt;
* 2024-12: [https://assets.anthropic.com/m/7e1ab885d1b24176/original/Clio-Privacy-Preserving-Insights-into-Real-World-AI-Use.pdf Clio: A system for privacy-preserving insights into real-world AI use] (Anthropic [https://www.anthropic.com/research/clio Clio])&lt;br /&gt;
* 2025-03: [https://learn.filtered.com/hubfs/The%202025%20Top-100%20Gen%20AI%20Use%20Case%20Report.pdf How People are Really Using Generative AI Now] ([https://hbr.org/2025/04/how-people-are-really-using-gen-ai-in-2025 writeup])&lt;br /&gt;
* 2025-04: [https://www.anthropic.com/news/anthropic-education-report-how-university-students-use-claude Anthropic Education Report: How University Students Use Claude]&lt;br /&gt;
* 2025-09: [https://www.anthropic.com/research/economic-index-geography Anthropic Economic Index: Tracking AI&amp;#039;s role in the US and global economy]&lt;br /&gt;
* 2025-09: [https://cdn.openai.com/pdf/a253471f-8260-40c6-a2cc-aa93fe9f142e/economic-research-chatgpt-usage-paper.pdf How People Use ChatGPT] (OpenAI)&lt;br /&gt;
&lt;br /&gt;
==Hiding Usage==&lt;br /&gt;
* 2025-05: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5232910 Underreporting of AI use: The role of social desirability bias]&lt;br /&gt;
&lt;br /&gt;
=Societal Effects/Transformations=&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.01754 Empirical evidence of Large Language Model&amp;#039;s influence on human spoken communication]&lt;br /&gt;
* 2025-09: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5425555 Generative AI as Seniority-Biased Technological Change: Evidence from U.S. Résumé and Job Posting Data]&lt;br /&gt;
&lt;br /&gt;
=Psychological Impact=&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.16628 The Impact of Artificial Intelligence on Human Thought]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.15352 People readily follow personal advice from AI but it does not improve their well-being]&lt;br /&gt;
&lt;br /&gt;
==Human Sentiment towards AI==&lt;br /&gt;
* 2025-04: Pew Research: [https://www.pewresearch.org/internet/2025/04/03/how-the-us-public-and-ai-experts-view-artificial-intelligence/ How the U.S. Public and AI Experts View Artificial Intelligence]&lt;br /&gt;
* 2025-10: Pew Research: [https://www.pewresearch.org/global/2025/10/15/how-people-around-the-world-view-ai/ How People Around the World View AI: More are concerned than excited about its use, and more trust their own country and the EU to regulate it than trust the U.S. or China]&lt;br /&gt;
* 2025-12: [https://www.edelman.com/sites/g/files/aatuss191/files/2025-11/2025%20Edelman%20Trust%20Barometer%20Flash%20Poll%20Trust%20and%20Artificial%20Intelligence%20at%20a%20Crossroads%201.pdf 2025 Edelman Trust Barometer]&lt;br /&gt;
* 2025-12: [https://navigatorresearch.org/views-of-ai-and-data-centers/ Polling - Views of AI and data centers]&lt;br /&gt;
* 2026-03: [https://osf.io/preprints/psyarxiv/5mwre_v9 The Moralization of Artificial Intelligence]&lt;br /&gt;
* 2026-03: Anthropic: [https://www.anthropic.com/features/81k-interviews What 81,000 people want from AI]&lt;br /&gt;
&lt;br /&gt;
==AI Persuasion of Humans==&lt;br /&gt;
(AI can update beliefs, change opinions, tackle conspiracy theories, etc.)&lt;br /&gt;
* 2022-11: [https://arxiv.org/abs/2211.15006 Fine-tuning language models to find agreement among humans with diverse preferences]&lt;br /&gt;
* 2024-04: [https://osf.io/preprints/psyarxiv/h7n8u_v1 Just the facts: How dialogues with AI reduce conspiracy beliefs]&lt;br /&gt;
* 2024-08: [https://arxiv.org/abs/2408.04681 Conversational AI Powered by Large Language Models Amplifies False Memories in Witness Interviews]&lt;br /&gt;
* 2024-09: [https://www.science.org/doi/10.1126/science.adq1814 Durably reducing conspiracy beliefs through dialogues with AI]&lt;br /&gt;
* 2025-03: [https://www.pnas.org/doi/10.1073/pnas.2413443122 Scaling language model size yields diminishing returns for single-message political persuasion]&lt;br /&gt;
* 2025-04: [https://drive.google.com/file/d/1Eo4SHrKGPErTzL1t_QmQhfZGU27jKBjx/edit Can AI Change Your View? Evidence from a Large-Scale Online Field Experiment]&lt;br /&gt;
** [https://www.404media.co/researchers-secretly-ran-a-massive-unauthorized-ai-persuasion-experiment-on-reddit-users/ Researchers Secretly Ran a Massive, Unauthorized AI Persuasion Experiment on Reddit Users]&lt;br /&gt;
* 2025-05: [https://www.nature.com/articles/s41562-025-02194-6 On the conversational persuasiveness of GPT-4]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.09662 Large Language Models Are More Persuasive Than Incentivized Human Persuaders]&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.13919 The Levers of Political Persuasion with Conversational AI]&lt;br /&gt;
** 2025-12: [https://www.science.org/doi/10.1126/science.aea3884 The levers of political persuasion with conversational artificial intelligence]&lt;br /&gt;
* 2025-12: [https://www.nature.com/articles/s41586-025-09771-9 Persuading voters using human–artificial intelligence dialogues]&lt;br /&gt;
&lt;br /&gt;
==AI Effects on Human Psychology==&lt;br /&gt;
===Human well-being===&lt;br /&gt;
* 2024-01: [https://www.nature.com/articles/s44184-023-00047-6 Loneliness and suicide mitigation for students using GPT3-enabled chatbots]&lt;br /&gt;
* 2025-03: [https://cdn.openai.com/papers/15987609-5f71-433c-9972-e91131f399a1/openai-affective-use-study.pdf Investigating Affective Use and Emotional Well-being on ChatGPT]&lt;br /&gt;
* 2025-03: [https://dam-prod2.media.mit.edu/x/2025/03/21/Randomized_Control_Study_on_Chatbot_Psychosocial_Effect.pdf How AI and Human Behaviors Shape Psychosocial Effects of Chatbot Use: A Longitudinal Controlled Study]&lt;br /&gt;
* 2026-03: [https://arxiv.org/abs/2603.15245 Practicing with Language Models Cultivates Human Empathic Communication]&lt;br /&gt;
&lt;br /&gt;
===Counter loneliness===&lt;br /&gt;
* 2023-11: [https://arxiv.org/abs/2311.10599 Chatbots as social companions: How people perceive consciousness, human likeness, and social health benefits in machines]&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.19096 AI Companions Reduce Loneliness]&lt;br /&gt;
* 2025-03: [https://dam-prod2.media.mit.edu/x/2025/03/21/Randomized_Control_Study_on_Chatbot_Psychosocial_Effect.pdf How AI and Human Behaviors Shape Psychosocial Effects of Chatbot Use: A Longitudinal Controlled Study]&lt;br /&gt;
* 2025-06: Anthropic: [https://www.anthropic.com/news/how-people-use-claude-for-support-advice-and-companionship How People Use Claude for Support, Advice, and Companionship]&lt;br /&gt;
&lt;br /&gt;
===Human mental abilities (creativity, learning)===&lt;br /&gt;
* 2025-03: [https://www.microsoft.com/en-us/research/publication/the-impact-of-generative-ai-on-critical-thinking-self-reported-reductions-in-cognitive-effort-and-confidence-effects-from-a-survey-of-knowledge-workers/ The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence Effects From a Survey of Knowledge Workers]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.08872 Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task]&lt;br /&gt;
&lt;br /&gt;
=Simulate Humans=&lt;br /&gt;
* See also: [[Human brain]]&lt;br /&gt;
&lt;br /&gt;
==Sociology==&lt;br /&gt;
* 2021-10: [https://www.doi.org/10.1007/s10588-021-09351-y Explaining and predicting human behavior and social dynamics in simulated virtual worlds: reproducibility, generalizability, and robustness of causal discovery methods]&lt;br /&gt;
* 2023-12: Google: [https://arxiv.org/abs/2312.03664 Generative agent-based modeling with actions grounded in physical, social, or digital space using Concordia]&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.12620 Are Large Language Models (LLMs) Good Social Predictors?]&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.11794 Automated Social Science: Language Models as Scientist and Subjects]&lt;br /&gt;
* 2024-07: [https://academic.oup.com/pnasnexus/article/3/7/pgae245/7712371 Perils and opportunities in using large language models in psychological research]&lt;br /&gt;
* 2024-08: [https://samim.io/dl/Predicting%20results%20of%20social%20science%20experiments%20using%20large%20language%20models.pdf Predicting Results of Social Science Experiments Using Large Language Models]&lt;br /&gt;
* 2024-10: [https://www.pnas.org/doi/10.1073/pnas.2407639121 Large Language Models based on historical text could offer informative tools for behavioral science]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.02234 LLM Social Simulations Are a Promising Research Method]&lt;br /&gt;
* 2025-04: [https://www.nber.org/papers/w33662 Measuring Human Leadership Skills with AI Agents]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.10157 SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users]&lt;br /&gt;
* 2025-07: [https://www.nature.com/articles/s41586-025-09215-4 A foundation model to predict and capture human cognition] ([https://marcelbinz.github.io/centaur code])&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.15815 LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra]&lt;br /&gt;
* 2025-09: [https://benjaminmanning.io/files/optimize.pdf General Social Agents]&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2506.06958 Simulating Society Requires Simulating Thought]&lt;br /&gt;
&lt;br /&gt;
==Theory of Mind==&lt;br /&gt;
* 2025-08: [https://www.nature.com/articles/s44387-025-00031-9 How large language models encode theory-of-mind: a study on sparse parameter patterns]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2509.22887 Infusing Theory of Mind into Socially Intelligent LLM Agents]&lt;br /&gt;
&lt;br /&gt;
==Humanlike Vibes==&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.20525 The Xeno Sutra: Can Meaning and Value be Ascribed to an AI-Generated &amp;quot;Sacred&amp;quot; Text?]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.08338 LLMs Reproduce Human Purchase Intent via Semantic Similarity Elicitation of Likert Ratings]&lt;br /&gt;
&lt;br /&gt;
==Skeptical==&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.06950 Large Language Models Do Not Simulate Human Psychology]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2601.16130 Replicating Human Motivated Reasoning Studies with LLMs]&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
* [https://www.google.com/books/edition/_/cKnYEAAAQBAJ?hl=en&amp;amp;gbpv=1&amp;amp;pg=PA2 UNESCO. Guidance for Generative AI in Education and Research]&lt;br /&gt;
* [[AI]]&lt;br /&gt;
** [[AI predictions]]&lt;/div&gt;</summary>
		<author><name>KevinYager</name></author>
		
	</entry>
	<entry>
		<id>http://gisaxs.com/index.php?title=Talk:AI_video&amp;diff=8777</id>
		<title>Talk:AI video</title>
		<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php?title=Talk:AI_video&amp;diff=8777"/>
		<updated>2026-04-04T23:28:37Z</updated>

		<summary type="html">&lt;p&gt;KevinYager: /* Others for Consideration */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Others for Consideration=&lt;br /&gt;
* May 2025: Examples:&lt;br /&gt;
** [https://x.com/thedorbrothers/status/1927061347331694973 Influenders] ([https://x.com/thedorbrothers The Dor Brothers])&lt;br /&gt;
* June 2025: Examples:&lt;br /&gt;
** [https://x.com/venturetwins/status/1934027410841764221 Koala shot by protesters]&lt;br /&gt;
** [https://x.com/thedorbrothers/status/1932835386557939913 Riot] ([https://x.com/thedorbrothers The Dor Brothers])&lt;br /&gt;
** Celebrity explainer [https://x.com/venturetwins/status/1934434222523171000 1], [https://x.com/venturetwins/status/1934438139738874129 2]&lt;br /&gt;
* July 2025: Examples:&lt;br /&gt;
** [https://x.com/IamEmily2050/status/1945795374251479388 Quick rap] (example JSON format)&lt;br /&gt;
** [https://x.com/sweeneydailyx/status/1948032121429500221 Commercial for American Eagle (20s)] (the car driving off is an AI extension of the clip)&lt;br /&gt;
* August 2025: Examples:&lt;br /&gt;
** [https://x.com/thedorbrothers/status/1955305090971017653 Waidmanns Heil] ([https://x.com/thedorbrothers The Dor Brothers])&lt;br /&gt;
* December 2025: Examples:&lt;br /&gt;
** [https://x.com/Gossip_Goblin/status/1996994382428336165?s=20 Joy Loop] (1.5m)&lt;br /&gt;
** [https://x.com/TUPACABRA2/status/2005877025454662066?s=20 Minnesota Dark] (2m, [https://x.com/TUPACABRA2 Tupacrabra])&lt;br /&gt;
* January 2026: Examples:&lt;br /&gt;
** [https://x.com/AngryTomtweets/status/2008990455661515071?s=20 Egg Protein] (2m)&lt;br /&gt;
* March 2026: Examples:&lt;br /&gt;
** [https://x.com/aimikoda/status/2038285542727487827?s=20 Fashion sequence] (15s, Seedance 2.0)&lt;br /&gt;
* April 2026: Examples:&lt;br /&gt;
** [https://x.com/ganziboy11/status/2040413277122068781?s=20 Zephyr] (2.5m, Higgsfield Seedance 2.0)&lt;/div&gt;</summary>
		<author><name>KevinYager</name></author>
		
	</entry>
	<entry>
		<id>http://gisaxs.com/index.php?title=AI_video&amp;diff=8776</id>
		<title>AI video</title>
		<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php?title=AI_video&amp;diff=8776"/>
		<updated>2026-04-02T18:39:50Z</updated>

		<summary type="html">&lt;p&gt;KevinYager: /* March 2026 */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Evolution of Capabilities==&lt;br /&gt;
===Early===&lt;br /&gt;
* November 2016: [https://arxiv.org/abs/1611.10314 Sync-Draw]&lt;br /&gt;
* April 2021: [https://arxiv.org/abs/2104.14806 GODIVA]&lt;br /&gt;
* October 2022: [https://makeavideo.studio/ Meta Make-a-video]&lt;br /&gt;
* October 2022: [https://imagen.research.google/video/ Google Imagen video]&lt;br /&gt;
&lt;br /&gt;
===2023===&lt;br /&gt;
* April 2023: [https://www.youtube.com/watch?v=XQr4Xklqzw8 Will Smith eating spaghetti]&lt;br /&gt;
* April 2023: [https://x.com/nickfloats/status/1642899094808002564 Harry Potter by Balenciaga]&lt;br /&gt;
* April 2023: [https://x.com/mrjonfinger/status/1645953033636048896?cxt=HHwWgMDT7YfkzNctAAAA Runway Gen 2]&lt;br /&gt;
* April 2023: [https://research.nvidia.com/labs/toronto-ai/VideoLDM/ Nvidia latents]&lt;br /&gt;
* December 2023: [https://www.threads.net/@luokai/post/C0vvEnTP4Oj Fei-Fei Li]&lt;br /&gt;
&lt;br /&gt;
===2024===&lt;br /&gt;
====Early 2024====&lt;br /&gt;
* January 2024: [https://sites.research.google/videopoet/ Google VideoPoet]&lt;br /&gt;
* January 2024: [https://lumiere-video.github.io/ Google Lumiere]&lt;br /&gt;
* February 2024: [https://openai.com/index/sora/ OpenAI Sora]&lt;br /&gt;
* April 2024: [https://www.maginative.com/article/china-unveils-vidu-a-powerful-text-to-video-generator/ Vidu]&lt;br /&gt;
* May 2024: [https://deepmind.google/technologies/veo/ Veo]&lt;br /&gt;
* May 2024: [https://kling.kuaishou.com/ Kling]&lt;br /&gt;
* June 2024: [https://lumalabs.ai/dream-machine Luma DreamMachine]&lt;br /&gt;
* June 2024: [https://runwayml.com/research/introducing-gen-3-alpha RunwayML Gen-3 Alpha]&lt;br /&gt;
* July 2024: Examples:&lt;br /&gt;
** [https://www.youtube.com/watch?v=F_WfIzYGlg4 Toys-R-Us Commercial made using Sora]&lt;br /&gt;
** [https://www.youtube.com/watch?v=CSfw_NjqQ2o Motorola commercial made using genAI]&lt;br /&gt;
* July 2024: [https://x.com/rowancheung/status/1813258518159585723 haiper.ai]&lt;br /&gt;
====August 2024====&lt;br /&gt;
* August 2024: [http://hotshot.co/ Hotshot] ([https://x.com/maxescu/status/1825459083635536081 examples], [https://x.com/EccentrismArt/status/1825550841534972027 more examples])&lt;br /&gt;
* August 2024: Luma Dream Machine [https://x.com/LumaLabsAI/status/1825639918539817101 v1.5]&lt;br /&gt;
* August 2024: Examples:&lt;br /&gt;
** [https://x.com/endlesstaverns/status/1811276904692887815 Runway Gen3 music video]&lt;br /&gt;
** [https://x.com/runwayml/status/1820806644806070583 Runway Gen3 for adding FX to live action] ([https://x.com/bryanf0x/status/1825529998201004137 another example])&lt;br /&gt;
** [https://www.youtube.com/watch?v=taaM0s1bq7Q Midjourney + Runway Gen3: Hey It’s Snowing]&lt;br /&gt;
** [https://x.com/Kyrannio/status/1821605619927019974 Flux/LoRA image] + Runway Gen3 [https://x.com/iamneubert/status/1821970292014768420 woman presenter]&lt;br /&gt;
** [https://x.com/CharaspowerAI/status/1825274421256356106 McDonald’s AI commercial]&lt;br /&gt;
** Sora used by [https://www.facebook.com/izanamiaiart/ Izanami AI Art] to create [https://x.com/kimmonismus/status/1824102316229759114 dreamlike video] and by [https://x.com/alexiaadana Alexia Adana] to create [https://x.com/basedjensen/status/1824386717123743940 sci-fi film concept]&lt;br /&gt;
====September 2024====&lt;br /&gt;
* September 2024: [https://hailuoai.com/video/ Hailuo Minimax] ([https://x.com/minchoi/status/1829995683124035766 examples])&lt;br /&gt;
* September 2024: Examples:&lt;br /&gt;
** [https://www.youtube.com/watch?v=XAs5KuhfE_s Space colonization]&lt;br /&gt;
** [https://x.com/venturetwins/status/1827772646295265699 Consistent characters]&lt;br /&gt;
** [https://x.com/thealexbanks/status/1829489392354050502 Sea monsters]&lt;br /&gt;
** [https://x.com/ai_for_success/status/1829539535132426286 Music video]&lt;br /&gt;
** [https://x.com/RyanMorrisonJer/status/1829074823521112544 Animated characters]&lt;br /&gt;
** [https://x.com/CharaspowerAI/status/1829916782452191674 AI influencer]&lt;br /&gt;
** [https://x.com/minchoi/status/1829293248197902802 Ten short examples]&lt;br /&gt;
** [https://x.com/WorldEverett/status/1830596701473615937 Seven examples]&lt;br /&gt;
** [https://x.com/EccentrismArt/status/1830654805515395583 Clip from horror film]&lt;br /&gt;
** [https://x.com/MatthieuGB/status/1722146578813645296 &amp;quot;Gone&amp;quot; featuring astronaut] and [https://x.com/MatthieuGB/status/1742949297337852270 something ethereal]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1831256663644373449 Two dancers] (surprisingly good consistency despite movement)&lt;br /&gt;
** [https://x.com/8bit_e/status/1831344542487871953 Music video about flying]&lt;br /&gt;
** [https://www.youtube.com/watch?v=_XtS_4PzEyk The Paperclip Maximizer]&lt;br /&gt;
** [https://x.com/trbdrk/status/1831801373517869369 La Baie Aréa]&lt;br /&gt;
** [https://www.reddit.com/r/aivideo/comments/1f8xr0w/gisele_tong_to_dear_me/ &amp;quot;To Dear Me&amp;quot; by Gisele Tong] ([https://www.morningstar.com/news/business-wire/20240904521664/reply-ai-film-festival-announced-the-winners-of-the-first-international-festival-for-short-films-made-with-artificial-intelligence winner of AI shorts] film festival)&lt;br /&gt;
** [https://x.com/maxescu/status/1833476640438964281 Various scenes]&lt;br /&gt;
** [https://x.com/EHuanglu/status/1833522650846793970 Directing emotions]&lt;br /&gt;
* September 2024: Kling 1.5 ([https://x.com/Uncanny_Harry/status/1836531835280724459 examples], [https://x.com/minchoi/status/1836800551469654088 showing emotions])&lt;br /&gt;
* September 2024: Examples:&lt;br /&gt;
** Runway video-to-video to [https://x.com/jon_barron/status/1835695132697604236 restyle classic video games]&lt;br /&gt;
** [https://x.com/ai_for_success/status/1835319670917796117 Realistic presenter]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1834530744175059302 Skateboarding] (demonstrates getting closer to meaningfully simulating motion/physics)&lt;br /&gt;
** [https://x.com/minchoi/status/1835378029092049325 Examples] of short clips with cinematic feel&lt;br /&gt;
** Short: [https://x.com/PJaccetturo/status/1835670655330869633 4 Minutes to Live]&lt;br /&gt;
** Short: [https://x.com/dreamingtulpa/status/1836121321526432231 Neon Nights] (Arcade)&lt;br /&gt;
** [https://www.youtube.com/watch?v=CcrGSA-kSrI Random Access Memories]: AI-generated, then projected onto Kodak film stock, giving the final output some of the dreamy analog quality associated with nostalgic footage&lt;br /&gt;
** Sora used to make a sort of [https://x.com/niceaunties/status/1837271244774715505 weird dreamlike video]&lt;br /&gt;
====October 2024====&lt;br /&gt;
* October 2024: Pika v1.5, including Pikaffects (explode, melt, inflate, and cake-ify; examples: [https://x.com/justin_hart/status/1841144350572413259 1], [https://x.com/arthur_hyper88/status/1841156544538521646 2], [https://x.com/ytjessie_/status/1841168925301842263 3], [https://x.com/bilawalsidhu/status/1841195247184781420 4], [https://x.com/minchoi/status/1841189035454447636 5], [https://x.com/ytjessie_/status/1841209415514669501 6])&lt;br /&gt;
* October 2024: Examples:&lt;br /&gt;
** [https://x.com/HalimAlrasihi/status/1839310216602788103 AI avatar with good lip-sync]&lt;br /&gt;
** [https://www.youtube.com/watch?v=5NZubOOeeV0 Battalion]: 5 minute short about war&lt;br /&gt;
** Short film: [https://x.com/MatthieuGB/status/1841173724688536015 To Wonderland] ([https://x.com/MatthieuGB/status/1841174221550207437 credits])&lt;br /&gt;
** [https://x.com/OnwardsProject/status/1841508441241890975 9 to 5]: Created with Luma Dream Machine keyframes and camera features; music by Suno&lt;br /&gt;
* October 2024: [https://ai.meta.com/research/movie-gen/ Meta Movie Gen]&lt;br /&gt;
* October 2024: Examples:&lt;br /&gt;
** [https://x.com/CuriousRefuge/status/1844424871335592373 AI Avatar] (using [https://x.com/CuriousRefuge/status/1844424871335592373 HeyGen])&lt;br /&gt;
** [https://www.youtube.com/watch?v=isW1FLL0K3w Generic Movies]&lt;br /&gt;
** [https://arxiv.org/abs/2410.05954 Pyramid-flow] ([https://huggingface.co/rain1011/pyramid-flow-sd3 open source]) model: [https://x.com/_akhaliq/status/1844239643778351605 examples]&lt;br /&gt;
** [https://x.com/whrumorvid/status/1846209247467491604 Building the Pyramids]&lt;br /&gt;
** [https://x.com/maxescu/status/1844716998854349217 People showing realistic emotion] (using [https://hailuoai.video/ Hailuo AI])&lt;br /&gt;
** Keyframes and Luma AI to make novel [https://x.com/CoffeeVectors/status/1845188179332051005 speed-ramp motion]&lt;br /&gt;
* October 2024: [https://pollo.ai/ Pollo AI] platform offers selection among a diversity of video models&lt;br /&gt;
* October 2024: [https://www.genmo.ai/ Genmo] [https://x.com/genmoai/status/1848762405779574990 Mochi 1] (open source)&lt;br /&gt;
* October 2024: Examples:&lt;br /&gt;
** [https://x.com/AIatMeta/status/1849134463382680028 Meta Movie Gen examples]&lt;br /&gt;
** [https://x.com/PJaccetturo/status/1847732127598800960 Emotional range of Minimax]&lt;br /&gt;
** [https://x.com/dustinhollywood/status/1848757800807039299 Car commercial: Bear]&lt;br /&gt;
** [https://x.com/runwayml/status/1848785913918218517 Diner conversation]&lt;br /&gt;
** [https://x.com/Uncanny_Harry/status/1849275871716159989 Loved and Lost] (a meditation on grief)&lt;br /&gt;
====November 2024====&lt;br /&gt;
* November 2024: Examples:&lt;br /&gt;
** [https://x.com/blizaine/status/1852092147643699356 Pasta Doble]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1852425015175626876 Bird protecting young]&lt;br /&gt;
** [https://x.com/runwayml/status/1852363190484537666 Camera moving around sushi]&lt;br /&gt;
** [https://x.com/StevieMac03/status/1851969120813629939 Various examples] of [https://hailuoai.video/ Hailuo AI]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1853102779650252978 Trains]&lt;br /&gt;
** [https://www.youtube.com/watch?v=Fh-_g5vev0s Light of Imagination]&lt;br /&gt;
** [https://x.com/LinusEkenstam/status/1854087441122021814 Bringing historic images to life]&lt;br /&gt;
** [https://x.com/DeryaTR_/status/1855637066203218180 Plants dancing]&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1855078644042944574 Insect on tree]&lt;br /&gt;
** Trailers for [https://x.com/abandonedmovies/status/1827037378009296983 The Silmarillion] and [https://x.com/abandonedmovies/status/1846941183702110211 The Fall of Gondolin] (by [https://x.com/abandonedmovies Abandoned Films])&lt;br /&gt;
** [https://x.com/Diesol/status/1855475704470884427 Moody sci-fi]&lt;br /&gt;
** [https://x.com/runwayml/status/1857072173631885586 Migration] ([https://runwayml.com/customers/behind-the-scenes-of-migration-with-director-jeremy-higgins made by combining] Runway ML Gen3-Alpha and traditional animation)&lt;br /&gt;
** [https://x.com/AIandDesign/status/1856467856625676752 After the Winter] ([https://suno.com/song/0d6919de-d2bf-434b-8aa6-ede0fb0fde77 music] made using Suno v4)&lt;br /&gt;
** Horror: [https://www.reddit.com/r/aivideo/comments/1gnk27q/ridge_to_southwest/ Ridge to Southwest]&lt;br /&gt;
** [https://www.youtube.com/watch?v=ClStJZmIjBU The Gardener] (by [https://www.youtube.com/@MachineMythos Machine Mythos])&lt;br /&gt;
** [https://x.com/techhalla/status/1857462526859935813 Coca-Cola holiday ad] and [https://www.youtube.com/watch?v=THdoOgwqjBg parody thereof]&lt;br /&gt;
** [https://x.com/pzf_ai/status/1858312421510992111 A Dream Within A Dream] (by [https://x.com/pzf_ai PZF], selected for the Czech International AI Film Festival)&lt;br /&gt;
** [https://x.com/WorldEverett/status/1859273222597775843 Making Friends] (by [https://x.com/WorldEverett Everett World]; see also [https://x.com/WorldEverett/status/1858563716834275562 Childhood Dream] and [https://x.com/WorldEverett/status/1858945634067202429 City Echoes])&lt;br /&gt;
** Anime: [https://x.com/naegiko/status/1857754626742726893 test shots], [https://x.com/naegiko/status/1858978557424210401 Ultimate Ceremony], [https://x.com/naegiko/status/1835434668294074462 Echoes of Love]&lt;br /&gt;
** [https://x.com/KakuDrop/status/1866309309384323257 Echoes of Grace] ([https://x.com/KakuDrop KakuDrop] using Sora)&lt;br /&gt;
** [https://x.com/vibeke_udart/status/1859879367071203662 Morphing hands], [https://x.com/vibeke_udart/status/1858772719224975630 hands and faces] ([https://x.com/vibeke_udart Vibeke Bertelsen])&lt;br /&gt;
** [https://www.reddit.com/r/aivideo/comments/1gxi29x/dbzlicious/ Dragon Ball Z live action]&lt;br /&gt;
** [https://x.com/cfryant/status/1860727980353278386 Pitch Black] (abstract and dark)&lt;br /&gt;
** [https://x.com/cfryant/status/1861050528932765710 Animals Running] (zoomed-in ultra-wide camera)&lt;br /&gt;
** [https://x.com/WorldEverett/status/1860730214487118290 Dreams of Tomorrow] (panning shots of high-tech car, Scottish manor)&lt;br /&gt;
** [https://x.com/nickfloats/status/1861206978690691165 Desert Planet Cinematics]&lt;br /&gt;
* November 2024: [https://huggingface.co/spaces/PR-Puppets/PR-Puppet-Sora Leaked] Sora turbo model; [https://x.com/rowancheung/status/1861455031603503234 examples], [https://x.com/chatgpt21/status/1861504511153451517 Dog chasing Cat in snow]&lt;br /&gt;
====December 2024====&lt;br /&gt;
* December 2024: Examples:&lt;br /&gt;
** [https://x.com/minchoi/status/1863243880553976235 Realistic] (Minimax by Hailuo AI)&lt;br /&gt;
** Trailer for [https://x.com/TheReelRobot/status/1861824847149670840 Paradise Lost] (to be released on [https://www.sandwatch.ai/ Sandwatch AI])&lt;br /&gt;
** [https://x.com/EHuanglu/status/1863607136271716418 Music video example] with consistent characters&lt;br /&gt;
** [https://x.com/venturetwins/status/1863666366764687581 Human expressions] ([https://www.reddit.com/r/ChatGPT/comments/1h4r13x/ai_generated_expressions/ u/Kind_Distance9504 on Reddit], using Hailuo)&lt;br /&gt;
** Vodafone ad: [https://www.youtube.com/watch?v=9AyEC_K9kBg The Rhythm Of Life]&lt;br /&gt;
** [https://www.reddit.com/r/midjourney/comments/1h5u2gw/we_made_a_10_minute_gen_ai_batman_film/ 10 minute Batman film]&lt;br /&gt;
* December 2024: Tencent [https://aivideo.hunyuan.tencent.com/ Hunyuan Video] open-source video model ([https://x.com/CharaspowerAI/status/1863862585554010530 example])&lt;br /&gt;
* December 2024: [https://sora.com/ Sora] release ([https://x.com/CharaspowerAI/status/1866203050982916532 examples])&lt;br /&gt;
* December 2024: [https://mint-video.github.io/ MinT video] improves consistency and control ([https://arxiv.org/abs/2412.05263 preprint], [https://x.com/EHuanglu/status/1868278456565531061 examples])&lt;br /&gt;
* December 2024: Google [https://blog.google/technology/google-labs/video-image-generation-update-december-2024/ Veo 2] ([https://x.com/sundarpichai/status/1868709099644334518 examples], [https://x.com/EHuanglu/status/1869008306322522342 more examples], [https://x.com/_Borriss_/status/1869267571532320966 natural movement examples], [https://x.com/jerrod_lew/status/1870816560027246715 abstract], [https://x.com/jerrod_lew/status/1869427407415058660 realistic physics], [https://x.com/jerrod_lew/status/1873096585002786944 crowds], [https://x.com/minchoi/status/1873590350515929380 dancing], [https://x.com/jerrod_lew/status/1874440442269565351 animals])&lt;br /&gt;
* December 2024: [https://x.com/pika_labs/status/1867651381840040304 Pika 2.0] with Scene Ingredients&lt;br /&gt;
* December 2024: Examples:&lt;br /&gt;
** [https://www.youtube.com/watch?v=c_kKKRQ5gYw Synthetic Youth: Takenoko Zoku · Made by Emi Kusano with Sora]&lt;br /&gt;
** [https://x.com/higgsfield_ai/status/1868698886761837041 Car race] ([https://higgsfield.ai/ Higgsfield AI] storytelling)&lt;br /&gt;
** [https://x.com/blizaine/status/1868850653759783033 Slicing meat]; comparison of modern video generators&lt;br /&gt;
** Challenging prompt: [https://x.com/RubenEVillegas/status/1868864410720325844 A cat roars while looking at its reflection in the mirror but instead sees itself as a lion roaring (Veo 2)] ([https://x.com/anukaakash/status/1869417975071330550 comparison to other models])&lt;br /&gt;
** [https://x.com/PJaccetturo/status/1869829338868412865 Anime trailer]&lt;br /&gt;
** [https://x.com/ring_hyacinth/status/1870386506776674376 Snorlax at Mount Fuji] and [https://x.com/ring_hyacinth/status/1871105733443592696 Psyduck at Colosseum] (Kling 1.6)&lt;br /&gt;
** [https://x.com/machine_mythos/status/1870565287789056320 Horror visuals] (with [https://mmaudio.net/ MMAudio] sound)&lt;br /&gt;
** [https://www.youtube.com/watch?v=lFc1jxLHhyM The Heist] (Veo 2)&lt;br /&gt;
** [https://x.com/minchoi/status/1871263616806129863 Various Veo 2 examples]&lt;br /&gt;
** [https://x.com/minchoi/status/1872390429108486320 Live Action Titans]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1873094065841193222 Cats] [https://x.com/PostsOfCats/status/1872530207585825058 Cooking]&lt;br /&gt;
** Aesthetic from alternate timelines: [https://x.com/BrianRoemmele/status/1871753358782120068 1], [https://x.com/BrianRoemmele/status/1872105833456423216 2], [https://x.com/brain_racked/status/1872340717978390583 3]&lt;br /&gt;
** [https://x.com/minchoi/status/1872486717145706793 Examples approaching cinematic quality]&lt;br /&gt;
** [https://x.com/JaicSam/status/1872903054221033693 Cosmic Spider] (winner at AI film festival)&lt;br /&gt;
** [https://www.youtube.com/watch?v=dbdYPMRi_Nk Trailer for Newton&amp;#039;s Cradle] (full film [https://x.com/JeffSynthesized/status/1872705173451358293 on] 2025-01-01)&lt;br /&gt;
** [https://x.com/Ror_Fly/status/1873036384077828499 Car vs. Jet drag race]&lt;br /&gt;
** [https://x.com/Diesol/status/1873415500149199066 California Monsters]&lt;br /&gt;
** [https://x.com/heyshrutimishra/status/1873631383584924078 Various examples] (Hailuo AI)&lt;br /&gt;
** [https://x.com/kimmonismus/status/1873568693357294014 Will Smith eating spaghetti] (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023])&lt;br /&gt;
** [https://x.com/StevieMac03/status/1873998177193648438 Sorceress and Arachnid Steed] (Kling v1.6)&lt;br /&gt;
** [https://x.com/dustinhollywood/status/1873940924016779425 Music video] (Hailuo AI)&lt;br /&gt;
** [https://www.youtube.com/watch?v=iQg2udCHMdI Akụkọ (Story)] (22 minute short) - A Lagos Boy&amp;#039;s Thrilling Snack Run Nightmare&lt;br /&gt;
** [https://x.com/cinerobot/status/1873766976306455019 Son of the Dragon] (8 minute short)&lt;br /&gt;
** [https://x.com/SynthReveries/status/1873624586857886071 Endless Journey] music video ([https://suno.com/song/fa90fa5e-25c7-48ad-b291-42a8a8c51cf9 music] by Suno)&lt;br /&gt;
** [https://x.com/anukaakash/status/1870504167653228980 Once Again] (retrospective)&lt;br /&gt;
** [https://x.com/jasonzada/status/1873470586053414928 Fade Out] (Veo 2)&lt;br /&gt;
** [https://x.com/talkboysstudio/status/1869085014513865027 Roadkill] (12 minute short)&lt;br /&gt;
&lt;br /&gt;
===2025===&lt;br /&gt;
====January 2025====&lt;br /&gt;
* January 2025: [https://x.com/kimmonismus/status/1877351050748871038 Progress] over the last 1.5 years, comparing Runway Gen 2 and Veo 2&lt;br /&gt;
* January 2025: Examples:&lt;br /&gt;
** [https://x.com/AllarHaltsonen/status/1874557865576542655 Delivery] (unofficial Nike ad)&lt;br /&gt;
** [https://x.com/Diesol/status/1875237221735002299 Gucci ad] (Sora)&lt;br /&gt;
** [https://x.com/EccentrismArt/status/1874498145910149412 Conquest]&lt;br /&gt;
** [https://www.youtube.com/watch?v=RJZCMfaS-io Newton&amp;#039;s Cradle] (6 minute short)&lt;br /&gt;
** [https://x.com/AngryTomtweets/status/1874627041934602410 Singer]&lt;br /&gt;
** [https://x.com/DumpsterBud/status/1874807352794182019 Brain vomit] (music video)&lt;br /&gt;
** [https://x.com/mxvdxn/status/1874796628210778618 Vibe] (Kling v1.6)&lt;br /&gt;
** [https://x.com/_deepfates/status/1875215969452523785 Will Smith eating spaghetti] (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024])&lt;br /&gt;
** [https://www.youtube.com/watch?v=BL9-jHGnxyc Zorgop Knows All] (2 minute short)&lt;br /&gt;
** [https://x.com/ButchersBrain/status/1875130428518269406 The Breach] (2 minute short; Veo2, Runway ActOne, MMaudio)&lt;br /&gt;
** [https://x.com/Rainmaker1973c/status/1875261591043850477 Aesthetics from an alternate timeline]&lt;br /&gt;
** [https://x.com/StevieMac03/status/1875440611849072841 Immortal Awakens]&lt;br /&gt;
** [https://x.com/isaachorror/status/1875624519588835400 The Faded Line]&lt;br /&gt;
** [https://www.youtube.com/watch?v=4fy8H38rm-4 Dear Dad]&lt;br /&gt;
** [https://x.com/maxescu/status/1877060580680311242 Mad Max chase]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1877408247906447633 Patience is Key]&lt;br /&gt;
** [https://x.com/techhalla/status/1879967230093586555 The Almost Famous Show] (talent show parody)&lt;br /&gt;
** [https://x.com/thefuzzysignal/status/1879295176990154755 Proof-of-concept trailer for a medieval adult animated series]&lt;br /&gt;
** [https://x.com/JeffSynthesized/status/1879555151499034869 Variety] (unofficial Cadbury ad)&lt;br /&gt;
** [https://x.com/henrydaubrez/status/1879883806947115446 Kitsune] (5 minute animated short, Veo 2)&lt;br /&gt;
* January 2025: MiniMax Hailuo [https://www.minimaxi.com/en/news/s2v-01-release Subject Reference] enables consistent characters ([https://x.com/minchoi/status/1881707687362412924 examples])&lt;br /&gt;
* January 2025: AI (de-aging deepfakes, [https://magnific.ai/ Magnific]) [https://x.com/JeffSynthesized/status/1878630652377178502 used in the film] [https://www.imdb.com/title/tt18272208/ &amp;quot;Here&amp;quot;].&lt;br /&gt;
* January 2025: Luma [https://lumalabs.ai/ray Ray2]&lt;br /&gt;
* January 2025: [https://pikartai.com/pika-2-1/ Pika 2.1] ([https://x.com/OrctonAI/status/1883925754653905049 examples])&lt;br /&gt;
* January 2025: Examples:&lt;br /&gt;
** [https://x.com/wyzborrero/status/1879949477764804873 Light projections onto people] (challenging task, Ray2)&lt;br /&gt;
** [https://x.com/AllarHaltsonen/status/1881261042753589547 BMW ad]&lt;br /&gt;
** [https://x.com/AIWarper/status/1880658326645878821 John Wick in Severance] (Hunyuan vid2vid)&lt;br /&gt;
** [https://x.com/TheReelRobot/status/1881771800595444193 Biopic] (7 minutes)&lt;br /&gt;
** [https://x.com/misslaidlaw/status/1882180619582791784 Give It To Me] (music video)&lt;br /&gt;
** [https://x.com/paultrillo/status/1882091702506459394 Where do we go from here?] (music video, Veo 2)&lt;br /&gt;
** [https://x.com/WorldEverett/status/1882235057076580502 Party like there&amp;#039;s no tomorrow] (music video)&lt;br /&gt;
** [https://x.com/Diesol/status/1884696027942498779 S.T.O.R.I.] (Midjourney and Pika 2.1)&lt;br /&gt;
&lt;br /&gt;
====February 2025====&lt;br /&gt;
* February 2025: Examples:&lt;br /&gt;
** [https://x.com/OrctonAI/status/1885839287913955597 Long Steampunk scene]&lt;br /&gt;
** [https://x.com/jerrod_lew/status/1885787580685562226 City destruction]&lt;br /&gt;
** [https://x.com/EHuanglu/status/1885736840344551763 Consistent character acting]&lt;br /&gt;
** [https://x.com/MeanOrangeCat/status/1884295241534185890 Kaiju Katastrophe] (by [https://x.com/MeanOrangeCat Mean Orange Cat])&lt;br /&gt;
** [https://x.com/Diesol/status/1886433799690748210 The Greyhound]&lt;br /&gt;
** [https://x.com/CoffeeVectors/status/1886146242029195391 Fluid simulation video2video]&lt;br /&gt;
** [https://x.com/toolstelegraph/status/1886622772828254403 High resolution macro shots]&lt;br /&gt;
** [https://www.youtube.com/watch?v=p0J1LDWERS0 Chrysalids]&lt;br /&gt;
** [https://x.com/multimodalart/status/1887817996220940737 Boring realistic images] (HunyuanVideo w/ LoRA)&lt;br /&gt;
** [https://www.youtube.com/watch?v=PcVRfa1JyyQ Anime intro] ([https://www.reddit.com/r/StableDiffusion/comments/1ijvua0/opensource_almostconsistent_real_anime_made_with/ Hunyuan w/ custom LoRAs])&lt;br /&gt;
** [https://x.com/AllarHaltsonen/status/1888294811750318114 Automotive ad test] (Kling w/ custom model)&lt;br /&gt;
** [https://x.com/AngryTomtweets/status/1888758524303269928 Random cinematic clips] (Midjourney and Kling)&lt;br /&gt;
** [https://x.com/juliewdesign_/status/1888666757302263828 Crossing Paths]&lt;br /&gt;
** [https://x.com/weirdai_art/status/1888794894187041200 Miniature food]&lt;br /&gt;
** [https://x.com/CaptainHaHaa/status/1889573017745035463 Animals]&lt;br /&gt;
** [https://x.com/Kavanthekid/status/1889371011667144724 Star Wars - The Ghost&amp;#039;s Apprentice (Fan Film)]&lt;br /&gt;
** [https://x.com/AngryTomtweets/status/1889768184716423573 Ray2 image-to-video examples]&lt;br /&gt;
** [https://x.com/weirdai_art/status/1889890470987518069 New Horizons] (miniatures going to Mars)&lt;br /&gt;
** [https://x.com/karim_yourself/status/1890100168378536155 Black Sun (trailer)]&lt;br /&gt;
** [https://x.com/BrivaelLp/status/1890122101153231288 AI avatars] ([https://www.argil.ai/ Argil AI])&lt;br /&gt;
** [https://x.com/mrjonfinger/status/1890783411679236473 Adding elements to real video] ([https://x.com/mrjonfinger/status/1891337081923772918 other example])&lt;br /&gt;
** [https://x.com/SynthReveries/status/1892278954137940289 Glitch]&lt;br /&gt;
** Anime: [https://x.com/freeeebird2300/status/1889119007707689146 sci-fi] (Ray2), [https://x.com/Artedeingenio/status/1891173784188756069 sci-fi] (Ray 2), [https://x.com/seiiiiiiiiiiru/status/1890980673743474931 90s sci-fi] (Luma) and [https://x.com/TomLikesRobots/status/1891209369804591447 moody] (Midjourney and Ray2)&lt;br /&gt;
* February 2025: Meta [https://hila-chefer.github.io/videojam-paper.github.io/ VideoJAM]&lt;br /&gt;
* February 2025: ByteDance [https://omnihuman-lab.github.io/ OmniHuman-1]&lt;br /&gt;
* February 2025: ByteDance [https://saiyan-world.github.io/goku/ Goku] ([https://arxiv.org/abs/2502.04896 paper], [https://x.com/ai_for_success/status/1888821141495844991 examples])&lt;br /&gt;
* February 2025: [https://huggingface.co/stepfun-ai/stepvideo-t2v Step-Video-T2V] open-source model ([https://arxiv.org/abs/2502.10248 paper], [https://github.com/stepfun-ai/Step-Video-T2V code], [https://yuewen.cn/videos demo], [https://x.com/ai_for_success/status/1891369136082854129 examples])&lt;br /&gt;
* February 2025: Pika [https://x.com/pika_labs/status/1892620122818294109 Pikaswaps] (examples of [https://x.com/FreddyChavezO/status/1892678426487881805 modifying regions], [https://x.com/CharaspowerAI/status/1893216710141919637 swapping items])&lt;br /&gt;
* February 2025: Alibaba [https://wanai.pro/ Wan 2.1] [https://huggingface.co/blog/LLMhacker/wanai-wan21 open-source] ([https://x.com/fofrAI/status/1894862403260596371 examples])&lt;br /&gt;
* February 2025: [https://thetwinai.com/ Twin AI]: compose videos with provided character, object, location ([https://x.com/EHuanglu/status/1901277394729930984 example])&lt;br /&gt;
* February 2025: Examples:&lt;br /&gt;
** [https://x.com/mrjonfinger/status/1893109598627750164 Infected] (Pika swaps and additions)&lt;br /&gt;
** [https://x.com/amli_art/status/1893447314913796253 Hostile Government Takeover] (Veo2)&lt;br /&gt;
** [https://x.com/KarolineGeorges/status/1895226395812561399 Dual Mechanism] (Pikaframes 2.2)&lt;br /&gt;
&lt;br /&gt;
====March 2025====&lt;br /&gt;
* March 2025: Examples:&lt;br /&gt;
** [https://x.com/SynthReveries/status/1895826068617252901 Doors] (music video)&lt;br /&gt;
** [https://x.com/bind_lux/status/1894492032414224792 Drum and Bass] (music video; Kling, audio from [https://www.riffusion.com/?filter=staff-picks Riffusion])&lt;br /&gt;
** [https://x.com/RileyRalmuto/status/1896088776151269523 Woman&amp;#039;s face] (Sora)&lt;br /&gt;
** [https://x.com/ryanwpatterson/status/1896968881731948844 Skating] (Ray2)&lt;br /&gt;
** [https://www.threads.net/@evolving.ai/post/DGlRyRoO7c9?xmt=AQGz6T_8VppPoAqb5aPwAJ2zzRLUP-YXi8SabAT0IIEA9Q Filming commercial on Mars]&lt;br /&gt;
** [https://www.threads.net/@evolving.ai/post/DGycqyhuETS?xmt=AQGz6T_8VppPoAqb5aPwAJ2zzRLUP-YXi8SabAT0IIEA9Q Original Source commercial] (AI and real footage)&lt;br /&gt;
** [https://x.com/maxescu/status/1896926229204496788 Time-lapses] (Pika 2.2)&lt;br /&gt;
** [https://www.youtube.com/watch?v=2RhkcJyhg0E Hallucination]&lt;br /&gt;
** [https://x.com/town_in_new/status/1897354572139782620 Macro video of bubbles]&lt;br /&gt;
* March 2025: [https://github.com/Tencent/HunyuanVideo-I2V HunyuanVideo-I2V] image-to-video&lt;br /&gt;
* March 2025: Google [https://x.com/labsdotgoogle/status/1897376700666626233 Whisk Animate] (based on Veo2, [https://x.com/maxescu/status/1902742535618888025 examples])&lt;br /&gt;
* March 2025: Examples:&lt;br /&gt;
** [https://x.com/jdp2oo/status/1897874927367160114 Recursion (horror)] (Kling)&lt;br /&gt;
** [https://x.com/blizaine/status/1897826177970028614 Will Smith Eating Spaghetti while Sitting Inside a Bag] (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025])&lt;br /&gt;
** [https://x.com/mickmumpitz/status/1897979382687297697 Paper Jam] (Kling with custom workflows to enable precise control)&lt;br /&gt;
** [https://x.com/maxescu/status/1899155936645722216 Cinematic shots] (Google Whisk and Luma)&lt;br /&gt;
** [https://x.com/weirdai_art/status/1899631013002711409 Perfunctory Horizons]&lt;br /&gt;
** [https://x.com/maxescu/status/1900243840499368319 A Hard Winter]&lt;br /&gt;
** [https://x.com/RoyalKongz/status/1900315389139014074 Consistent character example]&lt;br /&gt;
** [https://x.com/maxescu/status/1900652266362650853 Anthropomorphic Animals]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1900457543299727718 Realistic (influencer-style)]&lt;br /&gt;
** [https://x.com/SunoMusic/status/1900942410584043579 I Feel Cultured] (music video with surrealist vibes)&lt;br /&gt;
** [https://rodeo.club/post/0x30b45c56d62751D763D3B8bFe4D18c4BB65EDF2c/209 journey of utmost importance]&lt;br /&gt;
** [https://x.com/aiordieshow/status/1901930851127984291 Karen: Unleashed]&lt;br /&gt;
** [https://x.com/minchoi/status/1901783767364092232 Yarn Cat]&lt;br /&gt;
** [https://x.com/andyorsow/status/1901619535180091509 Ned&amp;#039;s Wet Deli] (Runway)&lt;br /&gt;
** [https://www.youtube.com/watch?v=KVoiooE8C0c BOOTS], a.k.a. [https://x.com/RuairiRobinson/status/1902027217137484117 &amp;quot;Our enemies are cartoon monsters&amp;quot;] (music video based on poem by Rudyard Kipling; Veo2)&lt;br /&gt;
** Flying in a dream: [https://x.com/minchoi/status/1902197944826183864 1], [https://x.com/venturetwins/status/1901796679063626060 2]&lt;br /&gt;
** [https://x.com/jasonzada/status/1902129567659389443 Commercial for Mercedes-Benz and FYI Radio]&lt;br /&gt;
** [https://x.com/maxescu/status/1903108496666542562 Selfie video] (Luma)&lt;br /&gt;
** Podcasts: [https://www.reddit.com/r/singularity/comments/1jintit/rottenly_roasted_now_full_script_is_also_not/ Rottenly Roasted] and [https://www.reddit.com/r/aivideo/comments/1jerh56/worst_date_ever/ Worst Date Ever] [https://x.com/OriZilbershtein/status/1903503438744318002 (Imagen 3, Hedra, Elevenlabs, Topaz)]&lt;br /&gt;
** [https://x.com/DexploreArts/status/1903822122150986000 Ambience] (Midjourney, Luma)&lt;br /&gt;
** [https://x.com/TheoMediaAI/status/1904207679511572845 The Bridge] (2 minute short; Veo2)&lt;br /&gt;
** [https://x.com/peteromallet/status/1904268944992829462 Pulp Fiction] (Wan video editing)&lt;br /&gt;
** [https://x.com/madpencil_/status/1906765750624493650 Camera Controls] (Luma Ray2)&lt;br /&gt;
* March 2025: [https://www.hedra.com/ Hedra] [https://x.com/hedra_labs/status/1897699010632466469 Character 3]&lt;br /&gt;
* March 2025: [https://huggingface.co/hpcai-tech/Open-Sora-v2 Open Sora v2] ([https://github.com/hpcaitech/Open-Sora code])&lt;br /&gt;
* March 2025: Amazon Prime debuts [https://en.wikipedia.org/wiki/House_of_David_(TV_series) House of David], with special effects created by [https://www.thewonderproject.com/ Wonder Project] using a [https://x.com/PJaccetturo/status/1903126616831676792 combination of traditional and AI methods] (reportedly including Midjourney and Runway)&lt;br /&gt;
* March 2025: Examples:&lt;br /&gt;
** [https://x.com/PJaccetturo/status/1905151190872309907 What if Studio Ghibli directed Lord of the Rings?] (OpenAI GPT-4o in-context image generation, Kling)&lt;br /&gt;
** [https://x.com/ROHKI/status/1906039022662963269 RŌHKI]&lt;br /&gt;
** [https://x.com/iaveras/status/1906362437487534296 Why]&lt;br /&gt;
** [https://x.com/BrianRoemmele/status/1906476721236570508 Commercial for Puma] (research/test)&lt;br /&gt;
** [https://x.com/Salmaaboukarr/status/1906776503343325469 Commercial for KFC] (concept ad)&lt;br /&gt;
* March 2025: Runway ML [https://runwayml.com/research/introducing-runway-gen-4 Gen-4]&lt;br /&gt;
** [https://www.youtube.com/watch?v=c8IBmK7GZP8 The Lonely Little Flame]&lt;br /&gt;
** [https://www.youtube.com/watch?v=Z0P6qjMUl34&amp;amp;t=1s The Herd]&lt;br /&gt;
** [https://www.youtube.com/watch?v=9HzdNhOe09I The Retrieval]&lt;br /&gt;
** [https://www.youtube.com/watch?v=xEhgxhrAjE4 NYC is a Zoo]&lt;br /&gt;
** [https://www.youtube.com/watch?v=ENGKp5wn344 Scimmia Vede] (music video)&lt;br /&gt;
** More examples: [https://x.com/techhalla/status/1906807994009993473 various], [https://x.com/c_valenzuelab/status/1907958530369372541 art direction], [https://x.com/c_valenzuelab/status/1908146364741029998 mannequins], [https://x.com/c_valenzuelab/status/1907921566643732612 taxi], [https://x.com/c_valenzuelab/status/1907432109695717798 small things], [https://x.com/c_valenzuelab/status/1907563448902496362 long shot (1m)]&lt;br /&gt;
&lt;br /&gt;
====April 2025====&lt;br /&gt;
* April 2025: Examples:&lt;br /&gt;
** [https://x.com/AzeAlter/status/1906974768705990794 Age of Beyond]&lt;br /&gt;
** [https://x.com/techhalla/status/1907790675057242319 Commercial for Coca-Cola] (Higgsfield)&lt;br /&gt;
** [https://www.reddit.com/r/StableDiffusion/comments/1jr6j11/comment/mle9bq5/?context=3 Anime scene (3m)] (Wan 2.1 with LoRA)&lt;br /&gt;
** [https://x.com/pika_labs/status/1908263310912610401 Taxes then Death] (Pika multikeyframe)&lt;br /&gt;
* April 2025: [https://www.krea.ai/ Krea] [https://x.com/krea_ai/status/1907829389452021853 Video Re-Style]&lt;br /&gt;
* April 2025: ByteDance [https://grisoon.github.io/DreamActor-M1/ DreamActor-M1] performance transfer&lt;br /&gt;
* April 2025: Examples:&lt;br /&gt;
** [https://x.com/Diesol/status/1908535493673050403 Mercs] (Midjourney v7, Ray2)&lt;br /&gt;
** [https://x.com/minchoi/status/1909078846126649440 Cat at theme park]&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1909630883218207036 Timelapse history] (Runway Gen4)&lt;br /&gt;
** [https://x.com/EHuanglu/status/1909660808973533225 Examples for use in advertising]&lt;br /&gt;
** [https://x.com/arohaAIX/status/1910688361221599361 Sci-fi scapes]&lt;br /&gt;
** [https://x.com/PJaccetturo/status/1910750148055146708 Avα]&lt;br /&gt;
** [https://x.com/imagineFERA/status/1910601934207152576 The Bureau]&lt;br /&gt;
** [https://x.com/jasonzada/status/1911812014059733041 Beaver and Sock (3m)]&lt;br /&gt;
** [https://x.com/Delachica_/status/1911842237622735052 Organic Waste (5m)] (Runway)&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1912260798270882104 Fly] (Runway Gen4)&lt;br /&gt;
* April 2025: Alibaba [https://arxiv.org/abs/2504.04842 FantasyTalking] lipsync ([https://x.com/EHuanglu/status/1910341110322577442 examples])&lt;br /&gt;
* April 2025: Tencent Hunyuan [https://arxiv.org/abs/2411.16331 Sonic] image animation/lipsync to audio ([https://x.com/ai_for_success/status/1911719866958286864 examples])&lt;br /&gt;
* April 2025: ByteDance [https://huggingface.co/papers/2504.08685 Seaweed-7B] ([https://arxiv.org/abs/2504.08685 preprint], [https://www.youtube.com/watch?v=OaPI6K2y3rI examples])&lt;br /&gt;
* April 2025: [https://app.klingai.com/global/release-notes Kling 2.0] ([https://www.youtube.com/watch?v=Yqvh3M12T_M video])&lt;br /&gt;
* April 2025: [https://www.skyreels.ai/home Skywork] [https://github.com/SkyworkAI/SkyReels-V2 SkyReels V2] (open-source, unlimited extension; [https://x.com/AngryTomtweets/status/1914270477482443142 examples])&lt;br /&gt;
* April 2025: [https://sand.ai/ Sand AI] [https://huggingface.co/sand-ai/MAGI-1 Magi-1] (open source, unlimited extension; [https://x.com/AngryTomtweets/status/1914318743578296506 examples], [https://x.com/dreamingtulpa/status/1916035289300275372 more examples])&lt;br /&gt;
* April 2025: Examples:&lt;br /&gt;
** [https://x.com/maxescu/status/1912100029549994016 Mars 2035 (3m)] (Kling 2.0)&lt;br /&gt;
** [https://x.com/ai_for_success/status/1912466999147450600 Kingdom (dragon battle, 3m)]&lt;br /&gt;
** [https://x.com/imagineFERA/status/1913156296657756278 Reflection (3m)] (Gen4)&lt;br /&gt;
** [https://x.com/Wytsekoetse/status/1913547157493162035 Pizza Galaxy (1m)] (MJ and Gen4)&lt;br /&gt;
** [https://www.youtube.com/watch?v=rseqmSGH7xk Snoop Dogg music video: Last Dance with Mary Jane] (blend of traditional and AI effects)&lt;br /&gt;
** [https://x.com/dreamingtulpa/status/1915104310448501129 Realistic human motion]&lt;br /&gt;
** [https://x.com/KarolineGeorges/status/1915113151546396893 Inception loop] (Gen4)&lt;br /&gt;
** [https://x.com/rayisdoingfilm/status/1916468807435952330 Tuesday (1m)] (Gen4)&lt;br /&gt;
** [https://www.youtube.com/watch?v=XWdwF1q3kDw Deus in Machina Automata (4m)] (Gen4) &lt;br /&gt;
** [https://x.com/machina9000/status/1915090908850049223 Outsiders (3m music video)]&lt;br /&gt;
&lt;br /&gt;
====May 2025====&lt;br /&gt;
* May 2025: [https://huggingface.co/Lightricks/LTX-Video LTX-Video 13B] ([https://github.com/Lightricks/LTX-Video code], [https://x.com/maxescu/status/1919801813987164527 examples], [https://x.com/cubiq/status/1919748210567815551 more examples])&lt;br /&gt;
* May 2025: HeyGen Avatar IV (examples: [https://x.com/StevieMac03/status/1919910677860216869 sci-fi], [https://x.com/KarolineGeorges/status/1919801983143211222 Come Closer], [https://x.com/maxescu/status/1920410329454100973 singing], [https://x.com/minchoi/status/1920853859171234165 various])&lt;br /&gt;
* May 2025: Tencent [https://hunyuancustom.github.io/ HunyuanCustom]&lt;br /&gt;
* May 2025: Examples:&lt;br /&gt;
** [https://x.com/lifeofc/status/1920331476157280413 Iris (1.5m)] (Midjourney, Luma, Runway)&lt;br /&gt;
** [https://runwayml.com/customers/the-making-of-mars-and-siv Mars and Siv: &amp;quot;No Vacancy&amp;quot; (episode 1, 6m)] (Runway)&lt;br /&gt;
** [https://x.com/cfryant/status/1921317318744760817 Go to the East Wing] (dreamlike, Luma)&lt;br /&gt;
** [https://x.com/DeryaTR_/status/1921015340827304389 Yu Lanter showreel] (Higgsfield)&lt;br /&gt;
** [https://x.com/freeeebird2300/status/1921789387614134652 Cyberpunk anime] (Luma)&lt;br /&gt;
** [https://x.com/LittleTinRobot/status/1921692735930589246 Alien animals] (Runway)&lt;br /&gt;
** [https://x.com/minchoi/status/1922500563792486878 America&amp;#039;s Funniest AI Home Videos (3m)]&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1924204409833103365 Editing POV shots from AR glasses] (Runway)&lt;br /&gt;
* May 2025: [https://runwayml.com/gen48 Gen:48] Fourth Edition winners:&lt;br /&gt;
** [https://www.youtube.com/watch?v=NphCYRXjqTI&amp;amp;t=174s Home] (3m)&lt;br /&gt;
** [https://www.youtube.com/watch?v=L2DQwCp_DCw The King&amp;#039;s Secret] (2m)&lt;br /&gt;
* May 2025: [https://viggle.ai/home Viggle] Live [https://x.com/ViggleAI/status/1926324953038627214 enables] real-time avatar control&lt;br /&gt;
* May 2025: Google [https://blog.google/technology/ai/generative-media-models-io-2025/ Veo 3] (examples: [https://x.com/babaeizadeh/status/1924942128851124284 conversation], [https://x.com/mattshumer_/status/1925039973310308424 cooking], [https://x.com/jerrod_lew/status/1924934440486371589 singing], [https://x.com/MartinNebelong/status/1924926779677905014 simple story], [https://x.com/Diesol/status/1925114473544913004 cinematic action sequence], [https://x.com/laszlogaal_/status/1925094336200573225 car show interviews], [https://x.com/arikuschnir/status/1924953349943697763 We Can Talk], [https://x.com/venturetwins/status/1925021235530105298 podcat], [https://x.com/maxescu/status/1925079990061957423 various], [https://x.com/jerrod_lew/status/1927092379892265139 camera moves])&lt;br /&gt;
* May 2025: Examples:&lt;br /&gt;
** [https://x.com/javilopen/status/1925495026903380358 Will Smith Eating Spaghetti] (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025])&lt;br /&gt;
** [https://x.com/MetaPuppet/status/1926659557914268155 Bob from Marketing] (Veo 3)&lt;br /&gt;
** [https://x.com/dustinhollywood/status/1926733069475565622 He is King (16m)]&lt;br /&gt;
** [https://x.com/HashemGhaili/status/1925616536791760987 Prompt Theory], [https://x.com/HashemGhaili/status/1925332319604257203 part 2], [https://x.com/HashemGhaili/status/1927467022213869975 Afterlife (3m)] (Veo3)&lt;br /&gt;
** [https://x.com/JoannaStern/status/1927856754873835747 My Robot and Me (3m)] (Veo, Runway)&lt;br /&gt;
** [https://x.com/rohanpaul_ai/status/1928152398930817238 The Internet&amp;#039;s Over] (Veo3)&lt;br /&gt;
** [https://www.reddit.com/r/aivideo/comments/1l0rl7d/before_colours_fade/ Before Colours Fade (2m)] (Midjourney, Kling)&lt;br /&gt;
&lt;br /&gt;
====June 2025====&lt;br /&gt;
* June 2025: Examples: &lt;br /&gt;
** [https://x.com/amasad/status/1930505292904837132 Bigfoot ASMR]&lt;br /&gt;
** [https://x.com/minchoi/status/1930670583605514333 Talking] (HeyGen Avatar IV upgrade)&lt;br /&gt;
** [https://x.com/ROHKI/status/1931081752992477285 Where are all the aliens? (2m)]&lt;br /&gt;
** [https://x.com/fofrAI/status/1930999540770893874 Natural talking]&lt;br /&gt;
** [https://x.com/ammaar/status/1931672722418851904 Elemental Showdown - Mortal Kombat (3m)]&lt;br /&gt;
** [https://x.com/maxjoseph/status/1932104616021565476 It Starts at the End (music video, 4m)]&lt;br /&gt;
** [https://x.com/deedydas/status/1932105266654581116 Sci-fi trailer (2m)]&lt;br /&gt;
** [https://x.com/DrMachakil/status/1931816470901575924 The Prompt Floor (2m)]&lt;br /&gt;
** [https://x.com/DrMachakil/status/1853960062546366856 NALVORA (2.7m)] - [https://x.com/DrMachakil/status/1932904599004066200 Best Trailer, Metamorph AI Film Awards]&lt;br /&gt;
** [https://x.com/Kalshi/status/1932891608388681791 Commercial for Kalshi (30s)] - [https://x.com/PJaccetturo/status/1932893260399456513 to air during NBA finals] (Veo)&lt;br /&gt;
** [https://x.com/ROHKI/status/1933594430113788227 Your Brain is Broken on Purpose (2m)]&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1934312626021949687 Runway Gen-4 Reference examples]&lt;br /&gt;
** [https://x.com/JesusPlazaX/status/1934253813696786661 Paper airplane]&lt;br /&gt;
** [https://x.com/minchoi/status/1934032730947526872 Veo3 examples]&lt;br /&gt;
** [https://x.com/NomadsVagabonds/status/1935329331410075734 Reset 3 (1m, surreal)]&lt;br /&gt;
** [https://x.com/HashemGhaili/status/1935722105322323968 It Has No Soul (1m, Veo3)]&lt;br /&gt;
* June 2025: [https://seedance.net/seedance Seedance 1.0] ([https://arxiv.org/abs/2506.09113 preprint])&lt;br /&gt;
* June 2025: [https://hailuoai.video/ Hailuo AI] (MiniMax) Hailuo 02 ([https://x.com/venturetwins/status/1934236631336403344 &amp;quot;Kangaroo&amp;quot; during testing]; examples: [https://x.com/lepadphone/status/1935078910934626429 various], [https://x.com/alexgnewmedia/status/1935018186954719365 various], [https://x.com/FussyPastor/status/1935065068456263883 tsunami], [https://x.com/thedorbrothers/status/1935098802744213935 fight scene], [https://x.com/umesh_ai/status/1935028257708966231 fox running], [https://x.com/BrentLynch/status/1934979825636446268 blogger], [https://x.com/HalimAlrasihi/status/1935297126759538735 transitions], [https://x.com/MKMXLA/status/1938318951664280045 skateboarding])&lt;br /&gt;
* June 2025: Midjourney video ([https://x.com/minchoi/status/1934373051464057062 early examples], [https://x.com/ciguleva/status/1935386452197785892 various], [https://x.com/juliewdesign_/status/1935395999175876696 various], [https://x.com/emollick/status/1935504703023899096 Ethan Mollick], [https://x.com/PJaccetturo/status/1935383312392151528 highly rated], [https://x.com/maxescu/status/1935674561821126847 complex environments], [https://x.com/CoffeeVectors/status/1935863623076675875 manga])&lt;br /&gt;
* June 2025: Examples:&lt;br /&gt;
** [https://x.com/StevieMac03/status/1935768436556378170 The Battle of Glenvael - Orcs vs Humans] (Hailuo)&lt;br /&gt;
** [https://x.com/HashemGhaili/status/1935036744568824208 The Sentence (9m, Veo3)]&lt;br /&gt;
** [https://x.com/elder_plinius/status/1936145834585862225 Will Smith Eating Spaghetti] (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025], [https://x.com/javilopen/status/1925495026903380358 May 2025])&lt;br /&gt;
** [https://x.com/venturetwins/status/1937232461576175809 Gymnastics] (Hailuo 02)&lt;br /&gt;
** [https://x.com/KeiranJHarris/status/1935429439476887594 Fable about Transformative AI] (Veo3)&lt;br /&gt;
** [https://x.com/thedorbrothers/status/1937926400507580726 Vorex (2m trailer)]&lt;br /&gt;
** [https://x.com/OnerBiberkoku/status/1938972810321281394 Doğrucu (3m music video, Veo3)]&lt;br /&gt;
* June 2025: [https://higgsfield.ai/soul Higgsfield Soul] Video Effects ([https://x.com/higgsfield_ai/status/1937931727084917097 examples], [https://x.com/HashemGhaili/status/1938278903765995611 realism])&lt;br /&gt;
* June 2025: Alibaba [https://omni-avatar.github.io/ OmniAvatar] ([https://arxiv.org/abs/2506.18866 paper], [https://github.com/Omni-Avatar/OmniAvatar code], [https://huggingface.co/OmniAvatar/OmniAvatar-14B model], [https://x.com/AngryTomtweets/status/1939850674776547359 examples])&lt;br /&gt;
&lt;br /&gt;
====July 2025====&lt;br /&gt;
* July 2025: Examples:&lt;br /&gt;
** [https://x.com/Kavanthekid/status/1940452444850589999 Untold - The Immortal Blades Saga] (2m trailer)&lt;br /&gt;
** [https://x.com/minchoi/status/1941234456461029584 Unofficial commercial for Liquid Death (1m)]&lt;br /&gt;
** [https://x.com/brain_racked/status/1942594951310893425 A parade of the chosen theocracy on Callisto]&lt;br /&gt;
** [https://x.com/Popeyes/status/1943316484404433182 Popeyes commercial - diss track (1m)]&lt;br /&gt;
*** [https://x.com/gabemichael_ai/status/1944070622155616668 (Unofficial) Wendy&amp;#039;s response - diss track (2m)]&lt;br /&gt;
*** [https://x.com/ai_massive/status/1947689537641357618 (Unofficial) In-N-Out rap battle (3m)]&lt;br /&gt;
** [https://x.com/Kalshi/status/1943339616716599548 Kalshi commercial]&lt;br /&gt;
** Jonah (25m TV show, [https://x.com/PJaccetturo/status/1946101701548880029 making of], [https://kingstonestudios.uscreen.io/programs/jonah purchase here])&lt;br /&gt;
** [https://x.com/Totemko/status/1946243585021452335 Unofficial commercial for Mercedes (17s)]&lt;br /&gt;
** [https://x.com/CoffeeVectors/status/1946016960916889632 Skateboarding music video (1m)]&lt;br /&gt;
* July 2025: Runway ML [https://help.runwayml.com/hc/en-us/articles/42311337895827-Creating-with-Act-Two Act-Two] (video-to-video performance transfer)&lt;br /&gt;
* July 2025: Examples:&lt;br /&gt;
** Neural Viz [https://www.youtube.com/watch?v=juDDHvHroQ8 The Cop Files: Part VI (8m)]&lt;br /&gt;
** [https://x.com/Kavanthekid/status/1947696716981145971 Perfect Dark - Concept Trailer (1.5m)]&lt;br /&gt;
** [https://x.com/StelfieTT/status/1948753090858885131 Exodus (2m trailer)]&lt;br /&gt;
** [https://x.com/Jett_Collective/status/1949140450553540841 A Walk Together - Life and love in motion (1m, Midjourney Video)]&lt;br /&gt;
* July 2025: Netflix sci-fi show [https://en.wikipedia.org/wiki/The_Eternaut_(TV_series) The Eternaut] [https://x.com/omooretweets/status/1946290797399400662 used genAI] for a particular scene (building collapse)&lt;br /&gt;
* July 2025: Google Veo [https://x.com/GoogleLabs/status/1948477692715700718 emergent annotation direction] ([https://x.com/venturetwins/status/1948771505783144641 example], [https://x.com/bilawalsidhu/status/1948844167603310660 example], [https://x.com/jboogx_creative/status/1949230927504371765 example], [https://x.com/Ror_Fly/status/1949606017739747625 example])&lt;br /&gt;
* July 2025: Runway [https://runwayml.com/research/introducing-runway-aleph Aleph] contextual editing&lt;br /&gt;
* July 2025: Wan 2.2 (open source, [https://x.com/Alibaba_Wan/status/1949804551655276989 examples])&lt;br /&gt;
&lt;br /&gt;
====August 2025====&lt;br /&gt;
* August 2025: Pika [https://x.com/pika_labs/status/1954935844936024476 audio-driven performance] ([https://x.com/minchoi/status/1954989794129514937 examples], [https://x.com/pika_labs/status/1955007656302924192 examples])&lt;br /&gt;
* August 2025: Examples:&lt;br /&gt;
** [https://www.youtube.com/watch?v=gePD1Hf1qPc Eve and Adam] (8m, [https://x.com/MetaPuppet/status/1954254544935719259 multiple tools])&lt;br /&gt;
** [https://x.com/runwayml/status/1955615613583519917 Redesign a space] (Runway Aleph)&lt;br /&gt;
** [https://x.com/theGioM/status/1955656398248763428 Detroit Pretend Work Park (1m)]&lt;br /&gt;
** [https://x.com/pzf_ai/status/1940816374211006600 The Weight of Light] (3m music video, Midjourney &amp;amp; Suno)&lt;br /&gt;
** [https://x.com/EHuanglu/status/1956788759778967710 Commercial for Pepsi]&lt;br /&gt;
** [https://x.com/StelfieTT/status/1956633450326200426 Emotion]&lt;br /&gt;
** [https://x.com/dustinhollywood/status/1957940749862875383 TZIGANE]&lt;br /&gt;
** [https://x.com/0xFramer/status/1960720090921623636 Anime chase sequence] (Nano Banana and Seedance 1.0)&lt;br /&gt;
* August 2025: ByteDance [http://www.waver.video/ Waver 1.0]&lt;br /&gt;
* August 2025: [https://huggingface.co/Wan-AI/Wan2.2-S2V-14B Wan2.2-S2V 14B]&lt;br /&gt;
&lt;br /&gt;
====September 2025====&lt;br /&gt;
* September 2025: [https://www.wsj.com/tech/ai/openai-backs-ai-made-animated-feature-film-389f70b0 OpenAI Backs AI-Made Animated Feature Film: Film, called ‘Critterz,’ aims to debut at Cannes Film Festival and will leverage startup’s AI tools and resources.]&lt;br /&gt;
* September 2025: Examples:&lt;br /&gt;
** [https://x.com/kentskooking/status/1964606423037542459 A loop to wake up to (30s)]&lt;br /&gt;
** [https://x.com/venturetwins/status/1966570512991350907 time lapse]&lt;br /&gt;
** [https://x.com/NeuralViz/status/1967391198487994652 The Adventures of Reemo Green] (11m, Neural Viz)&lt;br /&gt;
** [https://x.com/kellyeld/status/1967620786166079545 Surreal DJs music video (2m)]&lt;br /&gt;
** [https://x.com/dustinhollywood/status/1968724784440558044 Glass City] (Hailuo)&lt;br /&gt;
** [https://x.com/TheoMediaAI/status/1968646951227777529 Alarm] (1m, multiple tools including world synthesis for consistent environments)&lt;br /&gt;
* September 2025: [https://lumalabs.ai/ray Luma] [https://x.com/LumaLabsAI/status/1968684330034606372 Ray3] ([https://x.com/cfryant/status/1968692370725077251 example])&lt;br /&gt;
* September 2025: Examples:&lt;br /&gt;
** [https://x.com/mrjonfinger/status/1968687352382910469 Stop motion interpolation] (Luma Ray3)&lt;br /&gt;
** [https://x.com/heydin_ai/status/1969514789169959128 Skyland] (1.5m, various tools)&lt;br /&gt;
** [https://x.com/iamluokai/status/1970185972076925427 Dancing] (Wan 2.2)&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1970497214108815584 Under Armor commercial] (Runway Aleph)&lt;br /&gt;
** [https://x.com/FilmsBySav/status/1971247214795358706 OG PRIME] (10m, Kling)&lt;br /&gt;
** [https://www.youtube.com/watch?v=JGLoTjxd-Ss PLANET] (37m)&lt;br /&gt;
* September 2025: [https://x.com/Kling_ai/status/1970439808901362155 Kling AI 2.5 Turbo] (examples: [https://x.com/OrctonAI/status/1970472214794220008 cyberpunk], [https://x.com/ImagineArt_X/status/1970586138655236565 human motion], [https://x.com/fAIkout/status/1970505756853334324 motion and emotion], [https://x.com/fAIkout/status/1970495039248965636 painting], [https://x.com/venturetwins/status/1970563820478439546 gymnastics], [https://x.com/Art_For_Joy/status/1970249516033970434 breakdancing], [https://x.com/HaydenLeeWrites/status/1970523610734567819 combat], [https://x.com/umesh_ai/status/1970497680536150454 cinematic], [https://x.com/LillyLiCT/status/1970580585073819752 horror camerawork], [https://x.com/StevieMac03/status/1970559778804908331 extended sequence])&lt;br /&gt;
* September 2025: OpenAI [https://openai.com/index/sora-2/ Sora 2] ([https://x.com/minchoi/status/1973949620318580970 examples])&lt;br /&gt;
&lt;br /&gt;
====October 2025====&lt;br /&gt;
* October 2025: Examples:&lt;br /&gt;
** [https://x.com/minchoi/status/1976042197154963702 Will Smith Eating Spaghetti] (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025], [https://x.com/javilopen/status/1925495026903380358 May 2025], [https://x.com/elder_plinius/status/1936145834585862225 June 2025])&lt;br /&gt;
** [https://www.youtube.com/watch?v=JhH3uxcdM1M Frostbite] (3m, Sora 2)&lt;br /&gt;
** [https://x.com/Jukanlosreve/status/1977764418709758106 (Fake) &amp;quot;Behind the scenes&amp;quot; for a Chainsaw Man live action] ([https://x.com/PJaccetturo/status/1972705821072261402 others])&lt;br /&gt;
* October 2025: Google [https://blog.google/technology/ai/veo-updates-flow/ Veo 3.1]&lt;br /&gt;
* October 2025: Examples:&lt;br /&gt;
** [https://x.com/aisearchio/status/1978465562821898461 Will Smith Eating Spaghetti], Veo 3.1 (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025], [https://x.com/javilopen/status/1925495026903380358 May 2025], [https://x.com/elder_plinius/status/1936145834585862225 June 2025], [https://x.com/minchoi/status/1976042197154963702 October 2025])&lt;br /&gt;
** [https://x.com/Diesol/status/1978755688261128227 War footage] (Veo 3.1)&lt;br /&gt;
** [https://www.meta.ai/@dustin_hollywood/post/bG3BHB21W0l/yukon/ Yukon] (music video, [https://x.com/dustinhollywood/status/1982260655957700746 Dustin Hollywood])&lt;br /&gt;
** [https://x.com/Diesol/status/1980922041131028515 Bloom] (2m, Veo 3.1)&lt;br /&gt;
** [https://x.com/xmuse_/status/1982026008803905639 Auction] (1m)&lt;br /&gt;
** [https://x.com/kellyeld/status/1982425147496882287 Dancing] (music video; Midjourney, Suno, Veo3)&lt;br /&gt;
** [https://x.com/JesusPlazaX/status/1982393609069412433 Anime example] (Midjourney, Grok Imagine)&lt;br /&gt;
** [https://x.com/EccentrismArt/status/1982830100266783039 King Arthur] (1m)&lt;br /&gt;
** [https://x.com/venturetwins/status/1983024227352789162 Transitions] (1m music video)&lt;br /&gt;
** [https://x.com/eastflatsfilm/status/1984116704704971076 Unofficial commercial for Nike] (2m, Midjourney, Hailuo)&lt;br /&gt;
** [https://x.com/PJaccetturo/status/1984639281848336592 Loneliness/Halloween] ([https://www.linkedin.com/posts/simon-meyer-976339160_this-could-be-the-scariest-halloween-film-activity-7389892778144735232-6CYY?utm_source=share&amp;amp;utm_medium=member_desktop&amp;amp;rcm=ACoAAADeoqYBzX8N9-j_hRQvl1e7OUlOgFptNF0 1.5m])&lt;br /&gt;
** [https://www.youtube.com/watch?v=43h61QAXjpY Wave] (2m music video, [https://x.com/MIZNOM Masaki Mizuno])&lt;br /&gt;
* October 2025: [https://x.com/Hailuo_AI/status/1983016390878708131 Hailuo 2.3]&lt;br /&gt;
&lt;br /&gt;
====November 2025====&lt;br /&gt;
* November 2025: Examples:&lt;br /&gt;
** [https://x.com/subverum/status/1985069550250107033 Valley of Shadow] (6m)&lt;br /&gt;
** [https://x.com/DiscussingFilm/status/1985470088074375344 Coca-cola ad] (c.f. [https://x.com/techhalla/status/1857462526859935813 2024 ad])&lt;br /&gt;
** [https://x.com/venturetwins/status/1985755546222542903 France 2026 Olympics ad] (blend of genAI and traditional methods, [https://x.com/venturetwins/status/1985753512362590439 behind the scenes])&lt;br /&gt;
** [https://x.com/NeuralViz/status/1986611025366687754 Minnesota Nice] (3m, [https://x.com/NeuralViz Neural Viz])&lt;br /&gt;
** [https://x.com/machina9000/status/1986563727873740934 Brutalis] (7m)&lt;br /&gt;
** [https://x.com/tastypxls/status/1987312755485876502?s=20 Living The Dream - Rynn] (music video, 1m)&lt;br /&gt;
** [https://x.com/MrDavids1/status/1988366387111170339?s=20 Environment as Character]&lt;br /&gt;
** [https://x.com/justalexoki/status/1988915573707661637?s=20 Cat playing instruments at midnight]&lt;br /&gt;
** [https://x.com/LumaLabsAI/status/1989013731267998172?s=20 Overclock] (30s, Luma)&lt;br /&gt;
** [https://x.com/venturetwins/status/1980685301577326994?s=20 Music video] (30s, Wan Animate)&lt;br /&gt;
** [https://x.com/venturetwins/status/1990227418553209259?s=20 Promotional material for Pudong Art Museum - Louvre exhibition in Shanghai] (1m)&lt;br /&gt;
** [https://x.com/Kyrannio/status/1990324648488186358?s=20 Loop 87 A Temporal Heist] (12m, claim that video was generated fully autonomously using AI agent NoSpoon)&lt;br /&gt;
** [https://x.com/AzeAlter/status/1906974768705990794?s=20 Age of Beyond] (3m)&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1991245088446386495?s=20 Ausencia] (5m)&lt;br /&gt;
** [https://x.com/AngryTomtweets/status/1993047608617517246?s=20 live paintings] ([https://www.youtube.com/channel/UCw8kc0wDm5Bh6g9iZzEWfOg bandyquantguy] on YouTube)&lt;br /&gt;
** [https://x.com/BrianRoemmele/status/1994625579073900804?s=20 Michelle, on a server in Iowa] (1m)&lt;br /&gt;
* November 2025: [https://odyssey.ml/ Odyssey] - [https://x.com/odysseyml/status/1994873514579697830?s=20 Odyssey-2]&lt;br /&gt;
&lt;br /&gt;
====December 2025====&lt;br /&gt;
* December 2025: Runway [https://runwayml.com/research/introducing-runway-gen-4.5 Gen 4.5]&lt;br /&gt;
* December 2025: [https://app.klingai.com/global/all-tools Kling] [https://app.klingai.com/global/omni/new O1] ([https://x.com/minchoi/status/1995523379957559609?s=20 examples], [https://x.com/TheoMediaAI/status/1995517613414518987?s=20 other examples]) and Kling 2.6.&lt;br /&gt;
* December 2025: [https://app.pixverse.ai/onboard PixVerse v5.5]&lt;br /&gt;
* December 2025: Examples:&lt;br /&gt;
** [https://x.com/EHuanglu/status/1996649596119068687?s=20 Will Smith Eating Spaghetti], Kling 2.6 (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025], [https://x.com/javilopen/status/1925495026903380358 May 2025], [https://x.com/elder_plinius/status/1936145834585862225 June 2025], [https://x.com/minchoi/status/1976042197154963702 October 2025], [https://x.com/aisearchio/status/1978465562821898461 October 2025])&lt;br /&gt;
** [https://x.com/venturetwins/status/1997898095670296615?s=20 Dreamlike POV]&lt;br /&gt;
** [https://x.com/chatgpt21/status/1998253809307455555?s=20 McDonalds commercial]&lt;br /&gt;
** [https://x.com/EHuanglu/status/1998039554402750545?s=20 Skittles commercial] (Higgsfield)&lt;br /&gt;
** [https://x.com/Diesol/status/1997147919603077335?s=20 The Tenant] (2m, Kling 2.6)&lt;br /&gt;
** [https://x.com/PsyopAnime/status/1999242965659906526?s=20 Maximum Carnage] (3m)&lt;br /&gt;
** [https://x.com/JeffSynthesized/status/1998786836924395875?s=20 Blurred Horizon: Episode 1] (24m)&lt;br /&gt;
** [https://x.com/Artedeingenio/status/2001667487784460301?s=20 Anime Action] (2m)&lt;br /&gt;
** [https://x.com/bearlyai/status/2005055231617605748?s=20 Dollar Shave Club commercial] (1m)&lt;br /&gt;
** [https://x.com/EHuanglu/status/2004020543084024295?s=20 Xmas Cameos] (1.5m)&lt;br /&gt;
** [https://x.com/DiDi_OKK/status/1955653520407019976?s=20 Green Screen] (2m, [https://x.com/DiDi_OKK/status/1955653520407019976 DiDi_OK])&lt;br /&gt;
** [https://x.com/DiDi_OKK/status/1998227601341702639?s=20 Arrow] (7m, [https://x.com/DiDi_OKK/status/1955653520407019976 DiDi_OK])&lt;br /&gt;
** [https://x.com/bluehorizon_ai/status/2004045348579561503?s=20 Live Action One Punch Man | Saitama vs Genos] (2m, [https://x.com/bluehorizon_ai Blue Horizon])&lt;br /&gt;
** [https://x.com/keshiAIart/status/2005254907780358201?s=20 Anime Train] (6s)&lt;br /&gt;
** [https://x.com/venturetwins/status/2006051632837189683?s=20 Michael Catson] (13s)&lt;br /&gt;
* December 2025: [https://arxiv.org/abs/2512.13507 Seedance 1.5]&lt;br /&gt;
&lt;br /&gt;
===2026===&lt;br /&gt;
====January 2026====&lt;br /&gt;
* January 2026: Examples:&lt;br /&gt;
** [https://x.com/Itspedrito/status/2007636967048228968?s=20 Somebody That I Used to Know] (1m)&lt;br /&gt;
** [https://x.com/hujimari/status/2008054519704461407?s=20 Cat being disruptive at night], [https://x.com/klara_sjo/status/2007864014521720963?s=20 another], [https://x.com/alphafox/status/2009732284375830687?s=20 another] (c.f. [https://x.com/justalexoki/status/1988915573707661637?s=20 Cat playing instruments at midnight])&lt;br /&gt;
** [https://x.com/Uncanny_Harry/status/2008881579095961934?s=20 Character test] (30s, Kling 2.6 Motion Control, [https://x.com/Uncanny_Harry Uncanny Harry AI])&lt;br /&gt;
** [https://www.youtube.com/watch?v=SGJC4Hnz3m0&amp;amp;t=2s STAR WARS: Beggar’s Canyon | A Luke Skywalker Fan Film (Between ESB &amp;amp; ROTJ)] (7m)&lt;br /&gt;
** [https://x.com/dustinhollywood/status/2009732705299104118?s=20 TZIGANE] (9m)&lt;br /&gt;
** [https://x.com/Framer_X/status/2011075884246061454?s=20 The Subway Spark] (Anime, 45s)&lt;br /&gt;
** [https://www.reddit.com/r/aivideo/comments/1qi8zuv/25_years_difference_makes_you_wonder_where_ai/ Will Smith Eating Spaghetti] (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025], [https://x.com/javilopen/status/1925495026903380358 May 2025], [https://x.com/elder_plinius/status/1936145834585862225 June 2025], [https://x.com/minchoi/status/1976042197154963702 October 2025], [https://x.com/aisearchio/status/1978465562821898461 October 2025], [https://x.com/EHuanglu/status/1996649596119068687?s=20 December 2025])&lt;br /&gt;
** [https://x.com/PJaccetturo/status/2013675665539596651?s=20 The AI Artist] (1.5m)&lt;br /&gt;
** [https://x.com/Artedeingenio/status/2013624842021417030?s=20 Sci-fi action anime] (2m)&lt;br /&gt;
** [https://x.com/verbalriotshow/status/2014752509240475872?s=20 Stone Hand] (fake trailer, 1m)&lt;br /&gt;
* January 2026: [https://x.com/nvidia/status/2008346949301235933?s=20 Runway Gen-4.5 on] [https://www.nvidia.com/en-us/data-center/technologies/rubin/?linkId=100000401190502 Nvidia Rubin] ([https://x.com/runwayml/status/2014406560445771804?s=20 examples])&lt;br /&gt;
* January 2026: [https://ltx.io/model/ltx-2 LTX-2] open source video model (20s, 4k, w/ audio; [https://x.com/venturetwins/status/2010878914273697956?s=20 examples])&lt;br /&gt;
* January 2026: Luma [https://lumalabs.ai/blog/news/ray3_14 Ray3.14] ([https://x.com/LumaLabsAI/status/2015822842575888844?s=20 examples])&lt;br /&gt;
* January 2026: Examples:&lt;br /&gt;
** [https://x.com/pressmanc/status/2015099516500758647?s=20 Runway Gen-4.5 tests] (3.5m)&lt;br /&gt;
** [https://x.com/EHuanglu/status/2015573517618528538?s=20 Longchamp / Horses in the city] (1m)&lt;br /&gt;
** [https://x.com/dustinhollywood/status/2008154825385521418?s=20 The Last Artist] (trailer, 2m)&lt;br /&gt;
** [https://x.com/taziku_co/status/2015739943101047111?s=20 Monet temporal structure] (3m)&lt;br /&gt;
** [https://x.com/runwayml/status/2016155967285543364?s=20 Grizzlies] (1.5m, Runway Gen-4.5)&lt;br /&gt;
** [https://www.youtube.com/@TIME/videos On This Day... 1776] ([https://www.youtube.com/watch?v=E4cLKIxt8W8 trailer])&lt;br /&gt;
*** [https://www.youtube.com/watch?v=sV52AUVGc6I January 1: The Flag] (3.5m)&lt;br /&gt;
*** [https://www.youtube.com/watch?v=3ZDnL_a0YfQ January 10: Common Sense] (4.5m)&lt;br /&gt;
*** [https://www.youtube.com/watch?v=J5b1TiyKTus January 26: The Guns of Ticonderoga] (4m)&lt;br /&gt;
&lt;br /&gt;
====February 2026====&lt;br /&gt;
* February 2026: [https://app.klingai.com/global/quickstart/klingai-video-3-omni-model-user-guide Kling 3.0]&lt;br /&gt;
* February 2026: [https://seedance2.ai/ Seedance 2.0] ([https://x.com/EHuanglu/status/2020131622675202512?s=20 example 1], [https://x.com/EHuanglu/status/2020492770872566053?s=20 2], [https://x.com/dynamicwangs/status/2020054894741451123?s=20 3], [https://x.com/patrickassale/status/2020180495900848470?s=20 4], [https://x.com/janekm/status/2020888750285332526?s=20 5], [https://x.com/Dork_sense/status/2020179955511116082?s=20 6], [https://x.com/EHuanglu/status/2020388244802740728?s=20 7], [https://x.com/zhao_dashuai/status/2020528048341217592?s=20 8], [https://x.com/AngryTomtweets/status/2020784886932738470?s=20 9], [https://x.com/javilopen/status/2020558352590287298?s=20 10], [https://x.com/linxiaobei888/status/2021399630672691710?s=20 11])&lt;br /&gt;
* February 2026: Examples:&lt;br /&gt;
** [https://x.com/PJaccetturo/status/2019072637192843463?s=20 Unofficial opening sequence for The Way of Kings by Brandon Sanderson] (1.5m, Kling 3)&lt;br /&gt;
** [https://x.com/dailycatsclips/status/2020117502915989680?s=20 Cat Dreams] (1.5m)&lt;br /&gt;
** [https://x.com/DotCSV/status/2021269435567218725?s=20 Will Smith Eating Spaghetti] (Seedance 2.0) (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025], [https://x.com/javilopen/status/1925495026903380358 May 2025], [https://x.com/elder_plinius/status/1936145834585862225 June 2025], [https://x.com/minchoi/status/1976042197154963702 October 2025], [https://x.com/aisearchio/status/1978465562821898461 October 2025], [https://x.com/EHuanglu/status/1996649596119068687?s=20 December 2025], [https://www.reddit.com/r/aivideo/comments/1qi8zuv/25_years_difference_makes_you_wonder_where_ai/ January 2026], [https://x.com/SpecialSitsNews/status/2020583709741883666?s=20 progression to 2026])&lt;br /&gt;
** [https://x.com/thedorbrothers/status/2023460644905742577?s=20 To Be Continued] (3m, [https://x.com/thedorbrothers The Dor Brothers])&lt;br /&gt;
** [https://x.com/ivanka_humeniuk/status/2023711181978919034?s=20 Crow - Game of Thrones] (1m)&lt;br /&gt;
** [https://x.com/billyrestey/status/2024193251763507528?s=20 Reboot] (2m)&lt;br /&gt;
** [https://x.com/kenw_2/status/2024625510534283508?s=20 Late for work] (1.5m, MJ NBP Seedance 2.0)&lt;br /&gt;
** [https://x.com/heydin_ai/status/2024616890338079181?s=20 AI Man] (4.5m, MJ NBP Seedance 2.0)&lt;br /&gt;
** [https://x.com/maxescu/status/2024882372836250033?s=20 But AI Will Never Be Able To Do This] (3m, Seedance 2.0)&lt;br /&gt;
** [https://x.com/DiDi_OKK/status/2018784243753599093?s=20 Sign] (8m)&lt;br /&gt;
** [https://x.com/LTXStudio/status/2025994426309640291?s=20 Commercial for Nexus] (1m)&lt;br /&gt;
** [https://x.com/maxescu/status/2026007558159278477?s=20 Showcase] (9m, [https://x.com/maxescu Alex Patrascu])&lt;br /&gt;
** [https://x.com/EHuanglu/status/2025410944512192536?s=20 Painterly] (30s, [https://x.com/EHuanglu el.cine])&lt;br /&gt;
** [https://x.com/kellyeld/status/2025975677657440267?s=20 Imposter Syndrone] (2m, music video)&lt;br /&gt;
** [https://www.youtube.com/watch?v=nKnE2Wn1VNQ All Is Conscious] (3.5m)&lt;br /&gt;
** [https://x.com/CuriousRefuge/status/2026086576191934769?s=20 Emotional argument] (3m, Seedance 2.0)&lt;br /&gt;
** [https://x.com/jdkanani/status/2023781028368884031?s=20 Moonlight Veil] (10m)&lt;br /&gt;
&lt;br /&gt;
====March 2026====&lt;br /&gt;
* March 2026: Examples:&lt;br /&gt;
** [https://x.com/jacopo_reale/status/2029909372764041559 Looking for Bianca] (6m, Kling 3.0)&lt;br /&gt;
** [https://x.com/sumiturkude007/status/2030933543443193908?s=20 Gardener] (3m, Seedance 2.0)&lt;br /&gt;
** Micro-movie (Chinese): [https://x.com/yyyole/status/2029225419669684418?s=20 episode 1], [https://x.com/yyyole/status/2030850450464112675?s=20 episode 2]&lt;br /&gt;
** Live-action Evangelion: [https://x.com/NACHOS2D_/status/2032401289653461052?s=20 part 1] (4.5m), [https://x.com/NACHOS2D_/status/2032778868361203770?s=20 part 2] (3.5m), [https://x.com/NACHOS2D_/status/2033126071151837491?s=20 part 3] (2.5m)&lt;br /&gt;
** [https://x.com/lexx_aura/status/2033589846216741293?s=20 to love Wu Yong] (5m)&lt;br /&gt;
** [https://x.com/Alterverse_AI/status/2036434608137343111?s=20 Monkey&amp;#039;s Paw] (5m)&lt;br /&gt;
** [https://x.com/maxescu/status/2036434854435315868?s=20 Cinematic scenes] (3.5m, comedy, [https://lumalabs.ai/uni-1 Luma Uni-1 Agent])&lt;br /&gt;
* March 2026: [https://higgsfield.ai/original-series Higgsfield Original Series]&lt;br /&gt;
* March 2026: [https://app.pixverse.ai/onboard Pixverse v6] ([https://x.com/fal/status/2038655807483490613?s=20 example])&lt;br /&gt;
&lt;br /&gt;
====April 2026====&lt;br /&gt;
* April 2026: Examples:&lt;br /&gt;
** [https://x.com/aiordieshow/status/2039679896650125391?s=20 Soothent Paste] (45s)&lt;/div&gt;</summary>
		<author><name>KevinYager</name></author>
		
	</entry>
	<entry>
		<id>http://gisaxs.com/index.php?title=AI_Agents&amp;diff=8775</id>
		<title>AI Agents</title>
		<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php?title=AI_Agents&amp;diff=8775"/>
		<updated>2026-04-02T17:55:33Z</updated>

		<summary type="html">&lt;p&gt;KevinYager: /* Automated Improvement */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
=Reviews &amp;amp; Perspectives=&lt;br /&gt;
===Published===&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.05221 LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models] ([https://github.com/maitrix-org/llm-reasoners code])&lt;br /&gt;
* 2024-08: [https://arxiv.org/abs/2408.02479 From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future]&lt;br /&gt;
* 2024-09: [https://doi.org/10.1039/D4DD00178H Towards a Science Exocortex]&lt;br /&gt;
* 2024-09: [https://www.arxiv.org/abs/2409.02977 Large Language Model-Based Agents for Software Engineering: A Survey]&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.09030 Agents in Software Engineering: Survey, Landscape, and Vision]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.01990 Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2503.19213 A Survey of Large Language Model Agents for Question Answering]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.09037 A Survey of Frontiers in LLM Reasoning: Inference Scaling, Learning to Reason, and Agentic Systems]&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.12538 Agentic Reasoning for Large Language Models]&lt;br /&gt;
&lt;br /&gt;
===Continually updating===&lt;br /&gt;
* [https://github.com/open-thought/system-2-research OpenThought - System 2 Research Links]&lt;br /&gt;
* [https://github.com/hijkzzz/Awesome-LLM-Strawberry Awesome LLM Strawberry (OpenAI o1): Collection of research papers &amp;amp; blogs for OpenAI Strawberry(o1) and Reasoning]&lt;br /&gt;
* [https://github.com/e2b-dev/awesome-ai-agents Awesome AI Agents]&lt;br /&gt;
&lt;br /&gt;
===Analysis/Opinions===&lt;br /&gt;
* [https://arxiv.org/abs/2402.01817v3 LLMs Can&amp;#039;t Plan, But Can Help Planning in LLM-Modulo Frameworks]&lt;br /&gt;
* [https://rasa.com/blog/cutting-ai-assistant-costs-the-power-of-enhancing-llms-with-business/ Cutting AI Assistant Costs by Up to 77.8%: The Power of Enhancing LLMs with Business Logic]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.10468 AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenges]&lt;br /&gt;
&lt;br /&gt;
===Guides===&lt;br /&gt;
* Anthropic: [https://www.anthropic.com/research/building-effective-agents Building Effective Agents]&lt;br /&gt;
* Google: [https://www.kaggle.com/whitepaper-agents Agents] and [https://www.kaggle.com/whitepaper-agent-companion Agents Companion]&lt;br /&gt;
* OpenAI: [https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf A practical guide to building agents]&lt;br /&gt;
* Anthropic: [https://www.anthropic.com/engineering/claude-code-best-practices Claude Code: Best practices for agentic coding]&lt;br /&gt;
* Anthropic: [https://www.anthropic.com/engineering/built-multi-agent-research-system How we built our multi-agent research system]&lt;br /&gt;
&lt;br /&gt;
=AI Assistants=&lt;br /&gt;
&lt;br /&gt;
==Components of AI Assistants==&lt;br /&gt;
&lt;br /&gt;
===Agent Internal Workflow Management===&lt;br /&gt;
* [https://github.com/langchain-ai/langchain LangChain]&lt;br /&gt;
* [https://github.com/pydantic/pydantic-ai Pydantic: Agent Framework / shim to use Pydantic with LLMs]&lt;br /&gt;
* [https://github.com/lmnr-ai/flow Flow: A lightweight task engine for building AI agents that prioritizes simplicity and flexibility]&lt;br /&gt;
* [https://llama-stack.readthedocs.io/en/latest/index.html llama-stack]&lt;br /&gt;
* [https://huggingface.co/blog/smolagents Huggingface] [https://github.com/huggingface/smolagents smolagents]&lt;br /&gt;
* [https://github.com/elizaOS/eliza Eliza] (includes multi-agent, interaction with docs, Discord, Twitter, etc.)&lt;br /&gt;
* [https://github.com/The-Pocket/PocketFlow Pocket Flow]: LLM Framework in 100 Lines&lt;br /&gt;
* [https://github.com/coze-dev/coze-studio Coze]: All-in-one AI agent development tool&lt;br /&gt;
&lt;br /&gt;
===Information Retrieval (Memory)===&lt;br /&gt;
* See also [[AI_tools#Retrieval_Augmented_Generation_.28RAG.29|RAG]].&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.13501 A Survey on the Memory Mechanism of Large Language Model based Agents]&lt;br /&gt;
* 2024-09: PaperQA2: [https://paper.wikicrow.ai/ Language Models Achieve Superhuman Synthesis of Scientific Knowledge] ([https://x.com/SGRodriques/status/1833908643856818443 𝕏 post], [https://github.com/Future-House/paper-qa code])&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.09713 Agentic Information Retrieval]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.01142 DeepRAG: Thinking to Retrieval Step by Step for Large Language Models]&lt;br /&gt;
* [https://mem0.ai/ Mem0 AI]: Memory Layer for AI Agents; a self-improving memory layer for LLM applications, enabling personalized AI experiences.&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.16153 Memento: Fine-tuning LLM Agents without Fine-tuning LLMs]&lt;br /&gt;
&lt;br /&gt;
===Contextual Memory===&lt;br /&gt;
* [https://github.com/memodb-io/memobase Memobase]: user profile-based memory (long-term user memory for genAI applications)&lt;br /&gt;
&lt;br /&gt;
===Control (tool-use, computer use, etc.)===&lt;br /&gt;
* See also: [[Human_Computer_Interaction#AI_Computer_Use]]&lt;br /&gt;
* [https://tavily.com/ Tavily]: Connect Your LLM to the Web: Empowering your AI applications with real-time, accurate search results tailored for LLMs and RAG&lt;br /&gt;
===Model Context Protocol (MCP)===&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Standards:&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
*# Anthropic [https://www.anthropic.com/news/model-context-protocol Model Context Protocol] (MCP)&lt;br /&gt;
*# [https://openai.github.io/openai-agents-python/mcp/ OpenAI Agents SDK]&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Tools:&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** [https://github.com/jlowin/fastmcp FastMCP]: The fast, Pythonic way to build MCP servers&lt;br /&gt;
** [https://github.com/fleuristes/fleur/ Fleur]: A desktop app marketplace for Claude Desktop&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Servers:&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Lists:&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
**# [https://github.com/modelcontextprotocol/servers Model Context Protocol servers]&lt;br /&gt;
**# [https://www.mcpt.com/ MCP Servers, One Managed Registry]&lt;br /&gt;
**# [https://github.com/punkpeye/awesome-mcp-servers Awesome MCP Servers]&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Noteworthy:&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
**# Official [https://github.com/github/github-mcp-server Github MCP server]&lt;br /&gt;
**# Unofficial [https://github.com/modelcontextprotocol/servers/tree/main/src/github Github MCP server]&lt;br /&gt;
**# [https://github.com/modelcontextprotocol/servers/tree/main/src/puppeteer Puppeteer]&lt;br /&gt;
**# [https://github.com/modelcontextprotocol/servers/tree/main/src/google-maps Google Maps MCP Server]&lt;br /&gt;
**# [https://github.com/modelcontextprotocol/servers/tree/main/src/slack Slack MCP Server]&lt;br /&gt;
**# [https://zapier.com/mcp Zapier MCP Servers] (Slack, Google Sheets, Notion, etc.)&lt;br /&gt;
**# [https://github.com/awslabs/mcp AWS MCP Servers]&lt;br /&gt;
**# [https://x.com/elevenlabsio/status/1909300782673101265 ElevenLabs]&lt;br /&gt;
&lt;br /&gt;
===Agent2Agent Protocol (A2A)===&lt;br /&gt;
* Google [https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/ announcement]&lt;br /&gt;
&lt;br /&gt;
===Open-source===&lt;br /&gt;
* [https://khoj.dev/ Khoj] ([https://github.com/khoj-ai/khoj code]): self-hostable AI assistant&lt;br /&gt;
* [https://github.com/ragapp/ragapp RAGapp]: Agentic RAG for enterprise&lt;br /&gt;
* STORM: Synthesis of Topic Outlines through Retrieval and Multi-perspective Question Asking&lt;br /&gt;
** Can write (e.g.) Wikipedia-style articles&lt;br /&gt;
** [https://github.com/stanford-oval/storm/tree/NAACL-2024-code-backup code]&lt;br /&gt;
** Preprint: [https://arxiv.org/abs/2402.14207 Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models]&lt;br /&gt;
&lt;br /&gt;
===Personalities/Personas===&lt;br /&gt;
* 2023-10: [https://doi.org/10.1145/3586183.3606763 Generative Agents: Interactive Simulacra of Human Behavior]&lt;br /&gt;
* 2024-11: Microsoft [https://github.com/microsoft/TinyTroupe TinyTroupe 🤠🤓🥸🧐: LLM-powered multiagent persona simulation for imagination enhancement and business insights]&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.10109 Generative Agent Simulations of 1,000 People] ([https://github.com/joonspk-research/genagents code])&lt;br /&gt;
&lt;br /&gt;
==Specific Uses for AI Assistants==&lt;br /&gt;
&lt;br /&gt;
===Computer Use===&lt;br /&gt;
* See: [[Human_Computer_Interaction#AI_Computer_Use]]&lt;br /&gt;
&lt;br /&gt;
===Software Engineering===&lt;br /&gt;
* 2024-11: [https://github.com/MLSysOps/MLE-agent MLE-Agent: Your intelligent companion for seamless AI engineering and research]&lt;br /&gt;
* [https://github.com/OpenAutoCoder/Agentless Agentless]: agentless approach to automatically solve software development problems&lt;br /&gt;
&lt;br /&gt;
===Science Agents===&lt;br /&gt;
See [[Science Agents]].&lt;br /&gt;
&lt;br /&gt;
===Medicine===&lt;br /&gt;
* 2025-03: [https://news.microsoft.com/2025/03/03/microsoft-dragon-copilot-provides-the-healthcare-industrys-first-unified-voice-ai-assistant-that-enables-clinicians-to-streamline-clinical-documentation-surface-information-and-automate-task/ Microsoft Dragon Copilot]: streamline clinical workflows and paperwork&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.05186 Training state-of-the-art pathology foundation models with orders of magnitude less data]&lt;br /&gt;
* 2025-04: [https://www.nature.com/articles/s41586-025-08866-7?linkId=13898052 Towards conversational diagnostic artificial intelligence]&lt;br /&gt;
* 2025-04: [https://www.nature.com/articles/s41586-025-08869-4?linkId=13898054 Towards accurate differential diagnosis with large language models]&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.20148 The Anatomy of a Personal Health Agent]&lt;br /&gt;
&lt;br /&gt;
===LLM-as-judge===&lt;br /&gt;
* [https://x.com/cwolferesearch/status/1812949923010421192 List of papers].&lt;br /&gt;
* [https://www.philschmid.de/llm-evaluation LLM Evaluation doesn&amp;#039;t need to be complicated]&lt;br /&gt;
* [https://eugeneyan.com/writing/llm-evaluators/ Evaluating the Effectiveness of LLM-Evaluators (aka LLM-as-Judge)]&lt;br /&gt;
* [https://github.com/llm-as-a-judge/Awesome-LLM-as-a-judge Awesome-LLM-as-a-judge Survey]&lt;br /&gt;
* [https://github.com/haizelabs/Awesome-LLM-Judges haizelabs Awesome LLM Judges]&lt;br /&gt;
* 2024-08: [https://arxiv.org/abs/2408.02666 Self-Taught Evaluators]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.10934 Agent-as-a-Judge: Evaluate Agents with Agents]&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.15594 A Survey on LLM-as-a-Judge]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.05579 LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.19877 Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.00050 JudgeLRM: Large Reasoning Models as a Judge]&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.05111 Agent-as-a-Judge]&lt;br /&gt;
&lt;br /&gt;
===Deep Research===&lt;br /&gt;
* Google [https://blog.google/products/gemini/google-gemini-deep-research/ Deep Research]&lt;br /&gt;
* OpenAI [https://openai.com/index/introducing-deep-research/ Deep Research]&lt;br /&gt;
* Perplexity:&lt;br /&gt;
** [https://www.perplexity.ai/ Search]&lt;br /&gt;
** [https://www.perplexity.ai/hub/blog/introducing-perplexity-deep-research Deep Research]&lt;br /&gt;
* [https://exa.ai/ Exa AI]:&lt;br /&gt;
** [https://exa.ai/websets Websets]: Web research agent&lt;br /&gt;
** [https://demo.exa.ai/deepseekchat Web-search agent] powered by DeepSeek ([https://github.com/exa-labs/exa-deepseek-chat code]) or [https://o3minichat.exa.ai/ o3-mini] ([https://github.com/exa-labs/exa-o3mini-chat code])&lt;br /&gt;
* [https://www.firecrawl.dev/ Firecrawl] [https://x.com/nickscamara_/status/1886287956291338689 wip]&lt;br /&gt;
* [https://x.com/mattshumer_ Matt Shumer] [https://github.com/mshumer/OpenDeepResearcher OpenDeepResearcher]&lt;br /&gt;
* [https://github.com/zilliztech/deep-searcher DeepSearcher] (operate on local data)&lt;br /&gt;
* [https://github.com/nickscamara nickscamara] [https://github.com/nickscamara/open-deep-research open-deep-research]&lt;br /&gt;
* [https://x.com/dzhng dzhng] [https://github.com/dzhng/deep-research deep-research]&lt;br /&gt;
* [https://huggingface.co/ huggingface] [https://huggingface.co/blog/open-deep-research open-Deep-research] ([https://github.com/huggingface/smolagents/tree/main/examples/open_deep_research code])&lt;br /&gt;
* xAI Grok 3 Deep Search&lt;br /&gt;
* [https://liner.com/news/introducing-deepresearch Liner Deep Research]&lt;br /&gt;
* [https://allenai.org/ Allen AI] (AI2) [https://paperfinder.allen.ai/chat Paper Finder]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.20201 Open Deep Search: Democratizing Search with Open-source Reasoning Agents] ([https://github.com/sentient-agi/OpenDeepSearch code])&lt;br /&gt;
* [https://convergence.ai/welcome Convergence AI] Deep Work (swarms for web-based tasks)&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.03160 DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments]&lt;br /&gt;
* 2025-04: Anthropic [https://x.com/AnthropicAI/status/1912192384588271771 Research]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.21776 WebThinker: Empowering Large Reasoning Models with Deep Research Capability]&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.06283 SFR-DeepResearch: Towards Effective Reinforcement Learning for Autonomously Reasoning Single Agents]&lt;br /&gt;
&lt;br /&gt;
=Advanced Workflows=&lt;br /&gt;
* [https://salesforce-research-dei-agents.github.io/ Salesforce DEI]: meta-system that leverages a diversity of SWE agents&lt;br /&gt;
** Preprint: [https://www.arxiv.org/abs/2408.07060 Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents]&lt;br /&gt;
* [https://sakana.ai/ Sakana AI]: [https://sakana.ai/ai-scientist/ AI Scientist]&lt;br /&gt;
** [https://arxiv.org/abs/2408.06292 The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery]&lt;br /&gt;
** [https://github.com/SakanaAI/AI-Scientist code]&lt;br /&gt;
* [https://arxiv.org/abs/2409.05556 SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning]&lt;br /&gt;
** [https://github.com/lamm-mit/SciAgentsDiscovery code]&lt;br /&gt;
* [https://skywork.ai/home Skywork] [https://skywork.ai/home Super Agent]&lt;br /&gt;
&lt;br /&gt;
===Streamline Administrative Tasks===&lt;br /&gt;
* 2025-02: [https://er.educause.edu/articles/2025/2/ushering-in-a-new-era-of-ai-driven-data-insights-at-uc-san-diego Ushering in a New Era of AI-Driven Data Insights at UC San Diego]&lt;br /&gt;
&lt;br /&gt;
===Author Research Articles===&lt;br /&gt;
* 2024-02: STORM: [https://arxiv.org/abs/2402.14207 Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models] ([https://www.aihero.dev/storm-generate-high-quality-articles-based-on-real-research discussion/analysis])&lt;br /&gt;
&lt;br /&gt;
===Software Development Workflows===&lt;br /&gt;
Several paradigms of AI-assisted coding have arisen:&lt;br /&gt;
# Manual, human driven&lt;br /&gt;
# AI-aided through chat/dialogue, where the human asks for code and then copies it into the project&lt;br /&gt;
## OpenAI [https://chatgpt.com/ ChatGPT]&lt;br /&gt;
## Anthropic [https://claude.ai/ Claude]&lt;br /&gt;
## Google [https://gemini.google.com/app Gemini]&lt;br /&gt;
# API calls to an LLM, which generates code and inserts the file into the project&lt;br /&gt;
# LLM-integration into the IDE&lt;br /&gt;
## [https://github.com/features/copilot Copilot]&lt;br /&gt;
## [https://www.qodo.ai/ Qodo] (Codium) &amp;amp; [https://www.qodo.ai/products/alphacodium/ AlphaCodium] ([https://arxiv.org/abs/2401.08500 preprint], [https://github.com/Codium-ai/AlphaCodium code])&lt;br /&gt;
## &amp;#039;&amp;#039;&amp;#039;[https://www.cursor.com/ Cursor]&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
## [https://codeium.com/ Codeium] [https://codeium.com/windsurf Windsurf] (with &amp;quot;Cascade&amp;quot; AI Agent)&lt;br /&gt;
## ByteDance [https://www.trae.ai/ Trae AI]&lt;br /&gt;
## [https://www.tabnine.com/ Tabnine]&lt;br /&gt;
## [https://marketplace.visualstudio.com/items?itemName=Traycer.traycer-vscode Traycer]&lt;br /&gt;
## [https://idx.dev/ IDX]: free&lt;br /&gt;
## [https://github.com/codestoryai/aide Aide]: open-source AI-native code editor (fork of VS Code)&lt;br /&gt;
## [https://www.continue.dev/ continue.dev]: open-source code assistant&lt;br /&gt;
## [https://trypear.ai/ Pear AI]: open-source code editor&lt;br /&gt;
## [https://haystackeditor.com/ Haystack Editor]: canvas UI&lt;br /&gt;
## [https://onlook.com/ Onlook]: for designers&lt;br /&gt;
## [https://www.all-hands.dev/ All Hands AI]&lt;br /&gt;
## [https://app.devin.ai/ Devin 2.0] ([https://cognition.ai/ Cognition AI])&lt;br /&gt;
## Google [https://firebase.google.com/docs/studio Firebase Studio]&lt;br /&gt;
## [https://github.com/rowboatlabs/rowboat rowboat] (for building multi-agent workflows)&lt;br /&gt;
## [https://www.trae.ai/ Trae IDE]: The Real AI Engineer&lt;br /&gt;
# AI-assisted IDE, where the AI generates and manages the dev environment&lt;br /&gt;
## [https://replit.com/ Replit]&lt;br /&gt;
## [https://www.pythagora.ai/ Pythagora]&lt;br /&gt;
## [https://stackblitz.com/ StackBlitz] [https://bolt.new/ bolt.new]&lt;br /&gt;
## [https://github.com/clinebot/cline Cline] (formerly [https://generativeai.pub/meet-claude-dev-an-open-source-autonomous-ai-programmer-in-vs-code-f457f9821b7b Claude Dev])&lt;br /&gt;
## [https://www.all-hands.dev/ All Hands]&lt;br /&gt;
# AI Agent on Commandline&lt;br /&gt;
## [https://aider.chat/ Aider] ([https://github.com/Aider-AI/aider code]): Pair programming on commandline&lt;br /&gt;
## [https://docs.anthropic.com/en/docs/claude-code/overview Claude Code]&lt;br /&gt;
## [https://openai.com/codex/ OpenAI Codex]&lt;br /&gt;
## [https://blog.google/technology/developers/introducing-gemini-cli-open-source-ai-agent/ Gemini CLI]&lt;br /&gt;
# Prompt-to-product&lt;br /&gt;
## [https://githubnext.com/projects/github-spark Github Spark] ([https://x.com/ashtom/status/1851333075374051725 demo video])&lt;br /&gt;
## [https://www.create.xyz/ Create.xyz]: text-to-app, replicate product from link&lt;br /&gt;
## [https://a0.dev/ a0.dev]: generate mobile apps (from your phone)&lt;br /&gt;
## [https://softgen.ai/ Softgen]: web app developer&lt;br /&gt;
## [https://wrapifai.com/ wrapifai]: build form-based apps&lt;br /&gt;
## [https://lovable.dev/ Lovable]: web app (from text, screenshot, etc.)&lt;br /&gt;
## [https://v0.dev/ Vercel v0]&lt;br /&gt;
## [https://x.com/johnrushx/status/1625179509728198665 MarsX] ([https://x.com/johnrushx John Rush]): SaaS builder&lt;br /&gt;
## [https://webdraw.com/ Webdraw]: turn sketches into web apps&lt;br /&gt;
## [https://www.tempo.new/ Tempo Labs]: build React apps&lt;br /&gt;
## [https://databutton.com/ Databutton]: no-code software development&lt;br /&gt;
## [https://base44.com/ base44]: no-code dashboard apps&lt;br /&gt;
## [https://www.theorigin.ai/ Origin AI]&lt;br /&gt;
## [https://app.emergent.sh/ Emergent AI]&lt;br /&gt;
# Semi-autonomous software engineer agents&lt;br /&gt;
## [https://www.cognition.ai/blog/introducing-devin Devin] (Cognition AI)&lt;br /&gt;
## [https://aws.amazon.com/q/ Amazon Q] (and CodeWhisperer)&lt;br /&gt;
## [https://honeycomb.sh/ Honeycomb]&lt;br /&gt;
## [https://www.blackbox.ai/ Agent IDE]&lt;br /&gt;
## [https://docs.anthropic.com/en/docs/agents-and-tools/claude-code/overview Claude Code]&lt;br /&gt;
## OpenAI [https://help.openai.com/en/articles/11096431-openai-codex-cli-getting-started Codex CLI] and [https://openai.com/index/introducing-codex/ Codex] cloud&lt;br /&gt;
## [https://www.factory.ai/ Factory AI] [https://x.com/FactoryAI/status/1927754706014630357 Droids]&lt;br /&gt;
For a review of the current state of software-engineering agentic approaches, see:&lt;br /&gt;
* 2024-08: [https://arxiv.org/abs/2408.02479 From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future]&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.02977 Large Language Model-Based Agents for Software Engineering: A Survey]&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.09030 Agents in Software Engineering: Survey, Landscape, and Vision]&lt;br /&gt;
&lt;br /&gt;
=Corporate AI Agent Ventures=&lt;br /&gt;
==Mundane Workflows and Capabilities==&lt;br /&gt;
* [https://www.paymanai.com/ Payman AI]: AI to Human platform that allows AI to pay people for what it needs&lt;br /&gt;
* [https://www.voiceflow.com/ VoiceFlow]: Build customer experiences with AI&lt;br /&gt;
* [https://mistral.ai/ Mistral AI]: [https://mistral.ai/news/build-tweak-repeat/ genAI applications]&lt;br /&gt;
* [https://www.taskade.com/ Taskade]: Task/milestone software with AI agent workflows&lt;br /&gt;
* [https://www.covalent.xyz/ Covalent]: [https://docs.covalent.xyz/docs/cloud/tutorials-cloud/multi_agent/ Building a Multi-Agent Prompt Refining Application]&lt;br /&gt;
&lt;br /&gt;
==Inference-compute Reasoning==&lt;br /&gt;
* [https://nousresearch.com/#popup-menu-anchor Nous Research]: [https://nousresearch.com/introducing-the-forge-reasoning-api-beta-and-nous-chat-an-evolution-in-llm-inference/ Forge Reasoning API Beta]&lt;br /&gt;
&lt;br /&gt;
==AI Assistant==&lt;br /&gt;
* [https://convergence.ai/ Convergence] [https://proxy.convergence.ai/ Proxy]&lt;br /&gt;
* [https://www.shortwave.com/ Shortwave] [https://www.shortwave.com/docs/guides/ai-assistant/ AI Assistant] (organize, write, search, schedule, etc.)&lt;br /&gt;
* 2026-02: [https://telepath.computer/ Telepath]&lt;br /&gt;
&lt;br /&gt;
==Agentic Systems==&lt;br /&gt;
* [https://topologychat.com/ Topology AI]&lt;br /&gt;
* [https://www.cognition.ai/ Cognition AI]: [https://www.cognition.ai/blog/introducing-devin Devin] software engineer (14% on SWE-bench)&lt;br /&gt;
* [https://honeycomb.sh/ Honeycomb] ([https://honeycomb.sh/blog/swe-bench-technical-report 22% on SWE-bench])&lt;br /&gt;
* [https://www.factory.ai/ Factory AI]&lt;br /&gt;
* [https://convergence.ai/welcome Convergence AI] Deep Work (swarms for web-based tasks)&lt;br /&gt;
* [https://agents.cloudflare.com/ Cloudflare Agents]&lt;br /&gt;
* [https://www.maskara.ai/ Maskara AI]&lt;br /&gt;
&lt;br /&gt;
=Increasing AI Agent Intelligence=&lt;br /&gt;
See: [[Increasing AI Intelligence]]&lt;br /&gt;
&lt;br /&gt;
=Multi-agent orchestration=&lt;br /&gt;
==Research==&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.02533 Multi-Agent Design: Optimizing Agents with Better Prompts and Topologies]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.13657 Why Do Multi-Agent LLM Systems Fail?]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.15478 SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks]&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.20175 Federation of Agents: A Semantics-Aware Communication Fabric for Large-Scale Agentic AI]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2510.26658 The Era of Agentic Organization: Learning to Organize with Language Models] (Microsoft)&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.08296 Towards a Science of Scaling Agent Systems] (Google DeepMind)&lt;br /&gt;
** 2026-01: [https://research.google/blog/towards-a-science-of-scaling-agent-systems-when-and-why-agent-systems-work/ Towards a science of scaling agent systems: When and why agent systems work] &lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.04748 When Single-Agent with Skills Replace Multi-Agent Systems and When They Fail]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.11865 Intelligent AI Delegation]&lt;br /&gt;
* 2026-03: [https://arxiv.org/abs/2603.01213 Can AI Agents Agree?]&lt;br /&gt;
&lt;br /&gt;
===Organization Schemes===&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.02390 ReSo: A Reward-driven Self-organizing LLM-based Multi-Agent System for Reasoning Tasks]&lt;br /&gt;
&lt;br /&gt;
===Societies and Communities of AI agents===&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.10270 Cultural Evolution of Cooperation among LLM Agents]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.10157 SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users]&lt;br /&gt;
* 2025-05: [https://www.science.org/doi/10.1126/sciadv.adu9368 Emergent social conventions and collective bias in LLM populations]&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.10147 Virtual Agent Economies]&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.10825 Reasoning Models Generate Societies of Thought]&lt;br /&gt;
&lt;br /&gt;
===Domain-specific===&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.20138 TradingAgents: Multi-Agents LLM Financial Trading Framework]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.04227 Agent Laboratory: Using LLM Agents as Research Assistants]&lt;br /&gt;
&lt;br /&gt;
==Research demos==&lt;br /&gt;
* [https://github.com/camel-ai/camel Camel]&lt;br /&gt;
* [https://github.com/farizrahman4u/loopgpt/tree/main LoopGPT]&lt;br /&gt;
* [https://github.com/microsoft/JARVIS JARVIS]&lt;br /&gt;
* [https://github.com/agiresearch/OpenAGI OpenAGI]&lt;br /&gt;
* [https://github.com/microsoft/autogen AutoGen]&lt;br /&gt;
** preprint: [https://arxiv.org/abs/2308.08155 AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation]&lt;br /&gt;
** [https://github.com/EmergenceAI/Agent-E Agent-E]: Browser (eventually computer) automation ([https://github.com/EmergenceAI/Agent-E code], [https://arxiv.org/abs/2407.13032 preprint], [https://www.youtube.com/watch?v=uyE7tfKkB0E demo video])&lt;br /&gt;
** [https://www.microsoft.com/en-us/research/blog/introducing-autogen-studio-a-low-code-interface-for-building-multi-agent-workflows/ AutoGen Studio]: GUI for agent workflows ([https://github.com/microsoft/autogen/tree/main/samples/apps/autogen-studio code])&lt;br /&gt;
** [https://www.microsoft.com/en-us/research/articles/magentic-one-a-generalist-multi-agent-system-for-solving-complex-tasks/ Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks]&lt;br /&gt;
* [https://github.com/ag2ai/ag2 AG2] (previously [https://github.com/microsoft/autogen AutoGen]) ([https://github.com/ag2ai/ag2 code], [https://ag2ai.github.io/ag2/ docs], [https://discord.com/invite/pAbnFJrkgZ Discord])&lt;br /&gt;
* [https://github.com/microsoft/TaskWeaver TaskWeaver]&lt;br /&gt;
* [https://github.com/geekan/MetaGPT MetaGPT]&lt;br /&gt;
* [https://agpt.co/ AutoGPT] ([https://github.com/Significant-Gravitas/AutoGPT code]); and [https://agpt.co/blog/introducing-the-autogpt-platform AutoGPT Platform]&lt;br /&gt;
* [https://chenweize1998.github.io/optima-project-page/ Optima]&lt;br /&gt;
** preprint: [https://arxiv.org/abs/2410.08115 Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System]&lt;br /&gt;
** [https://github.com/thunlp/Optima code]&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.05221 LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models] ([https://github.com/maitrix-org/llm-reasoners code])&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.11638 MASAI: Modular Architecture for Software-engineering AI Agents]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.08164 Agent S: An Open Agentic Framework that Uses Computers Like a Human] ([https://github.com/simular-ai/Agent-S code])&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.20424 AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.16111 PlanGEN: A Multi-Agent Framework for Generating Planning and Reasoning Trajectories for Complex Problem Solving]&lt;br /&gt;
&lt;br /&gt;
===Related work===&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.18416 PersonaGym: Evaluating Persona Agents and LLMs]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.13946 Hallucination Mitigation using Agentic AI Natural Language-Based Frameworks]&lt;br /&gt;
&lt;br /&gt;
===Inter-agent communications===&lt;br /&gt;
* 2024-10: Agora: [https://agoraprotocol.org/ A Scalable Communication Protocol for Networks of Large Language Models] ([https://arxiv.org/abs/2410.11905 preprint]): disparate agents auto-negotiate communication protocol&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.02820 DroidSpeak: Enhancing Cross-LLM Communication]: Exploits caches of embeddings and key-values, to allow context to be more easily transferred between AIs (without consuming context window)&lt;br /&gt;
* 2024-11: Anthropic describes [https://www.anthropic.com/news/model-context-protocol Model Context Protocol]: an open standard for secure, two-way connections between data sources and AI ([https://modelcontextprotocol.io/introduction intro], [https://modelcontextprotocol.io/quickstart quickstart], [https://github.com/modelcontextprotocol code])&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.20175 Federation of Agents: A Semantics-Aware Communication Fabric for Large-Scale Agentic AI]&lt;br /&gt;
&lt;br /&gt;
==Architectures==&lt;br /&gt;
* [https://arxiv.org/abs/2406.04692 Mixture-of-Agents Enhances Large Language Model Capabilities]&lt;br /&gt;
* [https://motleycrew.ai/ Motleycrew.ai] ([https://github.com/ShoggothAI/motleycrew code])&lt;br /&gt;
&lt;br /&gt;
==Open Source Frameworks==&lt;br /&gt;
* [https://github.com/langchain-ai/langchain LangChain]&lt;br /&gt;
* [https://x.com/wgussml/status/1833615864131948756 ell] ([https://github.com/MadcowD/ell code], [https://docs.ell.so/ docs])&lt;br /&gt;
* [https://www.agentops.ai/ AgentOps AI] [https://github.com/AgentOps-AI/AgentStack AgentStack]&lt;br /&gt;
* [https://github.com/phidatahq/phidata/tree/main/cookbook/playground Agent UI]&lt;br /&gt;
* kyegomez [https://github.com/kyegomez/swarms swarms]&lt;br /&gt;
* OpenAI [https://github.com/openai/swarm Swarm] ([https://cookbook.openai.com/examples/orchestrating_agents cookbook])&lt;br /&gt;
* Amazon AWS [https://github.com/awslabs/multi-agent-orchestrator Multi-Agent Orchestrator]&lt;br /&gt;
* [https://github.com/kaiban-ai/KaibanJS KaibanJS]: Kanban for AI Agents? (Takes inspiration from [https://en.wikipedia.org/wiki/Kanban Kanban] visual [https://www.atlassian.com/agile/kanban work management].)&lt;br /&gt;
* [https://github.com/Thytu/Agentarium Agentarium]&lt;br /&gt;
* [https://orchestra.org/ Orchestra] ([https://docs.orchestra.org/orchestra/introduction docs], [https://docs.orchestra.org/orchestra/introduction code])&lt;br /&gt;
* [https://github.com/HKUDS/AutoAgent AutoAgent]: Fully-Automated &amp;amp; Zero-Code LLM Agent Framework&lt;br /&gt;
* [https://mastra.ai/ Mastra] ([https://github.com/mastra-ai/mastra github]): opinionated Typescript framework for AI applications (primitives for workflows, agents, RAG, integrations and evals)&lt;br /&gt;
* [https://github.com/orra-dev/orra Orra]: multi-agent applications with complex real-world interactions&lt;br /&gt;
* [https://github.com/gensx-inc/gensx/blob/main/README.md GenSX]&lt;br /&gt;
* Cloudflare [https://developers.cloudflare.com/agents/ agents-sdk] ([https://blog.cloudflare.com/build-ai-agents-on-cloudflare/ info], [https://github.com/cloudflare/agents code])&lt;br /&gt;
* OpenAI [https://platform.openai.com/docs/api-reference/responses responses API] and [https://platform.openai.com/docs/guides/agents agents SDK]&lt;br /&gt;
* Google [https://google.github.io/adk-docs/ Agent Development Kit]&lt;br /&gt;
&lt;br /&gt;
==Open Source Systems==&lt;br /&gt;
* ControlFlow&lt;br /&gt;
** [https://controlflow.ai/welcome documentation]&lt;br /&gt;
** [https://github.com/PrefectHQ/ControlFlow code]&lt;br /&gt;
* OpenHands (formerly [https://github.com/OpenDevin/OpenDevin OpenDevin])&lt;br /&gt;
** [https://github.com/All-Hands-AI/OpenHands code]: platform for autonomous software engineers, powered by AI and LLMs&lt;br /&gt;
** Report: [https://arxiv.org/abs/2407.16741 OpenDevin: An Open Platform for AI Software Developers as Generalist Agents]&lt;br /&gt;
&lt;br /&gt;
==Commercial Automation Frameworks==&lt;br /&gt;
* [https://lutra.ai/ Lutra]: Automation and integration with various web systems.&lt;br /&gt;
* [https://www.gumloop.com/ Gumloop]&lt;br /&gt;
* [https://www.textql.com/ TextQL]: Enterprise Virtual Data Analyst&lt;br /&gt;
* [https://www.athenaintelligence.ai/ Athena intelligence]: Analytics platform&lt;br /&gt;
* [https://gpt.nexus/ Nexus GPT]: Business co-pilot&lt;br /&gt;
* [https://www.multion.ai/ Multi-On]: AI agent that acts on your behalf&lt;br /&gt;
* [https://www.firecrawl.dev/ Firecrawl]: Turn websites into LLM-ready data&lt;br /&gt;
* [https://www.reworkd.ai/ Reworkd]: End-to-end data extraction&lt;br /&gt;
* [https://www.lindy.ai/ Lindy]: Custom AI Assistants to automate business workflows&lt;br /&gt;
** E.g. [https://x.com/Lindyydrope/status/1821373025125556423 use Slack]&lt;br /&gt;
* [https://www.bardeen.ai/ Bardeen]: Automate workflows&lt;br /&gt;
* [https://abacus.ai/ Abacus]: [https://abacus.ai/ai_agents AI Agents]&lt;br /&gt;
** [https://abacus.ai/help/howTo HowTo]&lt;br /&gt;
* [https://www.llamaindex.ai/ LlamaIndex]: ([https://x.com/llama_index 𝕏], [https://github.com/run-llama/llama_index code], [https://docs.llamaindex.ai/en/stable/ docs], [https://discord.com/invite/dGcwcsnxhU Discord])&lt;br /&gt;
* [https://www.multion.ai/ MultiOn AI]: [https://www.multion.ai/blog/introducing-agent-q-research-breakthrough-for-the-next-generation-of-ai-agents-with-planning-and-self-healing-capabilities Agent Q] ([https://multion-research.s3.us-east-2.amazonaws.com/AgentQ.pdf paper]) automated planning and execution&lt;br /&gt;
* Google [https://cloud.google.com/products/agentspace Agentspace]&lt;br /&gt;
* [https://try.flowith.io/ Flowith]&lt;br /&gt;
&lt;br /&gt;
===Multi-agent Handoff/Collaboration===&lt;br /&gt;
* [https://www.maskara.ai/ Maskara AI]&lt;br /&gt;
&lt;br /&gt;
===Spreadsheet===&lt;br /&gt;
* [https://www.v7labs.com/go V7 Go]&lt;br /&gt;
* [https://ottogrid.ai/ Otto Grid]&lt;br /&gt;
* [https://www.paradigmai.com/ Paradigm]&lt;br /&gt;
* [https://www.superworker.ai/ Superworker AI]&lt;br /&gt;
* [https://www.genspark.ai/ Genspark]&lt;br /&gt;
&lt;br /&gt;
==Cloud solutions==&lt;br /&gt;
* [https://numbersstation.ai/ Numbers Station] [https://numbersstation.ai/introducing-meadow-llm-agents-for-data-tasks/ Meadow]: agentic framework for data workflows ([https://github.com/NumbersStationAI/meadow code]).&lt;br /&gt;
* [https://www.crewai.com/ CrewAI] says they provide multi-agent automations ([https://github.com/joaomdmoura/crewAI code]).&lt;br /&gt;
* [https://www.langchain.com/ LangChain] introduced [https://www.langchain.com/langgraph?ref=blog.langchain.dev LangGraph] to help build agents, and [https://blog.langchain.dev/langgraph-cloud/ LangGraph Cloud] as a service for running those agents.&lt;br /&gt;
** [https://x.com/LangChainAI/status/1819052975295270949 LangGraph Studio] is an IDE for agent workflows&lt;br /&gt;
* [https://c3.ai/ C3 AI] enterprise platform&lt;br /&gt;
* [https://www.deepset.ai/ Deepset AI] [https://haystack.deepset.ai/ Haystack] ([https://docs.haystack.deepset.ai/v1.22/docs/agent docs], [https://github.com/deepset-ai/haystack code])&lt;br /&gt;
&lt;br /&gt;
==Frameworks==&lt;br /&gt;
* Google [https://go.googlesource.com/oscar/+/refs/heads/master/README.md Project Oscar]&lt;br /&gt;
** Agent: Gaby (for &amp;quot;Go AI bot&amp;quot;) ([https://go.googlesource.com/oscar/+/refs/heads/master/internal/gaby code], [https://pkg.go.dev/golang.org/x/oscar/internal/gaby documentation]) helps with issue tracking.&lt;br /&gt;
* [https://github.com/alexfazio/OpenPlexity-Pages OpenPlexity-Pages]: Data-aggregator implementation (like [https://www.perplexity.ai/ Perplexity]) based on [https://www.crewai.com/ CrewAI]&lt;br /&gt;
&lt;br /&gt;
=Optimization=&lt;br /&gt;
===Reviews===&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.11936 A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method &amp;amp; Challenges]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.16416 Survey on Evaluation of LLM-based Agents]&lt;br /&gt;
&lt;br /&gt;
===Metrics, Benchmarks===&lt;br /&gt;
See also: [[AI benchmarks]]&lt;br /&gt;
* 2019-11: [https://arxiv.org/abs/1911.01547 On the Measure of Intelligence]&lt;br /&gt;
* 2022-06: [https://arxiv.org/abs/2206.10498 PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change]&lt;br /&gt;
* 2023-06: [https://arxiv.org/abs/2306.05836 Can Large Language Models Infer Causation from Correlation?] (challenging Corr2Cause task)&lt;br /&gt;
* 2024-01: [https://microsoft.github.io/autogen/0.2/blog/2024/01/25/AutoGenBench/ AutoGenBench -- A Tool for Measuring and Evaluating AutoGen Agents]&lt;br /&gt;
* 2024-04: AutoRace ([https://github.com/maitrix-org/llm-reasoners code]): [https://arxiv.org/abs/2404.05221 LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models]&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.07972 OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments] ([https://os-world.github.io/ github])&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.01502 AI Agents That Matter]&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.11363 CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark] ([https://agent-evals-core-leaderboard.hf.space/ leaderboard])&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.13373 LLMs Still Can&amp;#039;t Plan; Can LRMs? A Preliminary Evaluation of OpenAI&amp;#039;s o1 on PlanBench]&lt;br /&gt;
* 2024-09: [https://www.arxiv.org/abs/2409.19924 On The Planning Abilities of OpenAI&amp;#039;s o1 Models: Feasibility, Optimality, and Generalizability]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.07095 MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering]&lt;br /&gt;
* 2024-10: WorFBench: [https://arxiv.org/abs/2410.07869 Benchmarking Agentic Workflow Generation]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.12851 VibeCheck: Discover and Quantify Qualitative Differences in Large Language Models]&lt;br /&gt;
* 2024-10: SimpleQA: [https://cdn.openai.com/papers/simpleqa.pdf Measuring short-form factuality in large language models] ([https://openai.com/index/introducing-simpleqa/ announcement], [https://github.com/openai/simple-evals code])&lt;br /&gt;
* 2024-11: [https://metr.org/AI_R_D_Evaluation_Report.pdf RE-Bench: Evaluating frontier AI R&amp;amp;D capabilities of language model agents against human experts] ([https://metr.org/blog/2024-11-22-evaluating-r-d-capabilities-of-llms/ blog], [https://github.com/METR/ai-rd-tasks/tree/main code])&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.10323 The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use] ([https://github.com/showlab/computer_use_ootb code])&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.13543 BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.14161 TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks] ([https://github.com/TheAgentCompany/TheAgentCompany code], [https://the-agent-company.com/ project], [https://the-agent-company.com/#/leaderboard leaderboard])&lt;br /&gt;
* 2025-01: [https://codeelo-bench.github.io/ CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings] ([https://arxiv.org/abs/2501.01257 preprint], [https://codeelo-bench.github.io/#leaderboard-table leaderboard])&lt;br /&gt;
* 2025-02: [https://static.scale.com/uploads/654197dc94d34f66c0f5184e/EnigmaEval%20v4.pdf ENIGMAEVAL: A Benchmark of Long Multimodal Reasoning Challenges] ([https://scale.com/leaderboard/enigma_eval leaderboard])&lt;br /&gt;
* 2025-02: [https://sites.google.com/view/mlgym MLGym: A New Framework and Benchmark for Advancing AI Research Agents] ([https://arxiv.org/abs/2502.14499 paper], [https://github.com/facebookresearch/MLGym code])&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.18356 WebGames: Challenging General-Purpose Web-Browsing AI Agents]&lt;br /&gt;
* 2025-03: ColBench: [https://arxiv.org/abs/2503.15478 SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks]&lt;br /&gt;
* 2025-04: OpenAI [https://openai.com/index/browsecomp/ BrowseComp: a benchmark for browsing agents]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.11844 Evaluating the Goal-Directedness of Large Language Models]&lt;br /&gt;
&lt;br /&gt;
===Evaluation Schemes===&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.10424 LLM-as-an-Interviewer: Beyond Static Testing Through Dynamic LLM Evaluation]&lt;br /&gt;
* 2025-01: [https://github.com/marquisdepolis/LLMRank LLMRank (&amp;quot;SlopRank&amp;quot;)]: LLMs evaluate each other, allowing the top model (for a given prompt/problem) to be inferred from a large number of recommendations.&lt;br /&gt;
&lt;br /&gt;
===Multi-agent===&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.10270 Cultural Evolution of Cooperation among LLM Agents]&lt;br /&gt;
* [https://github.com/lechmazur/step_game/ Multi-Agent Step Race Benchmark: Assessing LLM Collaboration and Deception Under Pressure]&lt;br /&gt;
&lt;br /&gt;
===Agent Challenges===&lt;br /&gt;
* [https://github.com/aidanmclaughlin/Aidan-Bench Aidan-Bench]: Tests creativity by having an LLM generate a long sequence of outputs (meant to be distinct), measuring how long it can go before duplications appear.&lt;br /&gt;
** NeurIPS 2024 paper/poster: [https://openreview.net/pdf?id=fz969ahcvJ AidanBench: Evaluating Novel Idea Generation on Open-Ended Questions]&lt;br /&gt;
* [https://x.com/paul_cal/status/1850262678712856764 Pictionary]: An LLM suggests a prompt, multiple LLMs generate outputs, and an LLM judges; allows ranking of generation abilities.&lt;br /&gt;
* [https://mcbench.ai/ MC-bench]: Request LLMs to build an elaborate structure in Minecraft; outputs can be A/B tested by human judges ([https://github.com/mc-bench/orchestrator code]).&lt;br /&gt;
&lt;br /&gt;
===Automated Improvement===&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.14228 EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms]&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.18532 Symbolic Learning Enables Self-Evolving Agents]&lt;br /&gt;
* 2024-08: [https://arxiv.org/abs/2408.08435 Automated Design of Agentic Systems] ([https://github.com/ShengranHu/ADAS ADAS code])&lt;br /&gt;
* 2024-08: [https://arxiv.org/abs/2408.02666 Self-Taught Evaluators]: Iterative self-improvement through generation of synthetic data and evaluation&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.22954 Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents] ([https://github.com/jennyzzt/dgm code], [https://sakana.ai/dgm/ project])&lt;br /&gt;
* 2026-03: [https://arxiv.org/abs/2603.19461 Hyperagents]&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
* [[Science Agents]]&lt;br /&gt;
* [[Increasing AI Intelligence]]&lt;br /&gt;
* [[AI tools]]&lt;br /&gt;
* [[AI understanding]]&lt;br /&gt;
* [[Robots]]&lt;br /&gt;
* [[Exocortex]]&lt;/div&gt;</summary>
		<author><name>KevinYager</name></author>
		
	</entry>
	<entry>
		<id>http://gisaxs.com/index.php?title=AI_understanding&amp;diff=8774</id>
		<title>AI understanding</title>
		<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php?title=AI_understanding&amp;diff=8774"/>
		<updated>2026-04-02T17:48:25Z</updated>

		<summary type="html">&lt;p&gt;KevinYager: /* Psychology */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Interpretability=&lt;br /&gt;
* 2017-01: [https://arxiv.org/abs/1704.01444 Learning to Generate Reviews and Discovering Sentiment]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.11639 Neural Interpretable Reasoning]&lt;br /&gt;
&lt;br /&gt;
==Concepts==&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.20938 Towards Understanding the Nature of Attention with Low-Rank Sparse Decomposition] ([https://github.com/OpenMOSS/Lorsa code])&lt;br /&gt;
* 2025-08: [https://transformer-circuits.pub/2025/attention-qk/index.html Tracing Attention Computation Through Feature Interactions]&lt;br /&gt;
&lt;br /&gt;
==Mechanistic Interpretability==&lt;br /&gt;
* 2020-03: OpenAI: [https://distill.pub/2020/circuits/zoom-in/ Zoom In: An Introduction to Circuits]&lt;br /&gt;
* 2021-12: Anthropic: [https://transformer-circuits.pub/2021/framework/index.html A Mathematical Framework for Transformer Circuits]&lt;br /&gt;
* 2022-09: [https://arxiv.org/abs/2211.00593 Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 Small]&lt;br /&gt;
* 2023-01: [https://arxiv.org/abs/2301.05062 Tracr: Compiled Transformers as a Laboratory for Interpretability] ([https://github.com/google-deepmind/tracr code])&lt;br /&gt;
* 2024-07: Anthropic: [https://transformer-circuits.pub/2024/july-update/index.html Circuits Update]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.14926 Interpretability in Parameter Space: Minimizing Mechanistic Description Length with Attribution-based Parameter Decomposition] ([https://www.alignmentforum.org/posts/EPefYWjuHNcNH4C7E/attribution-based-parameter-decomposition blog post])&lt;br /&gt;
* 2025-01: Review: [https://arxiv.org/abs/2501.16496 Open Problems in Mechanistic Interpretability]&lt;br /&gt;
* 2025-03: Anthropic: [https://www.anthropic.com/research/tracing-thoughts-language-model Tracing the thoughts of a large language model]&lt;br /&gt;
** [https://transformer-circuits.pub/2025/attribution-graphs/methods.html Circuit Tracing: Revealing Computational Graphs in Language Models]&lt;br /&gt;
** [https://transformer-circuits.pub/2025/attribution-graphs/biology.html On the Biology of a Large Language Model]&lt;br /&gt;
* 2025-11: OpenAI: [https://cdn.openai.com/pdf/41df8f28-d4ef-43e9-aed2-823f9393e470/circuit-sparsity-paper.pdf Weight-sparse transformers have interpretable circuits] ([https://openai.com/index/understanding-neural-networks-through-sparse-circuits/ blog])&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.13548 Patterning: The Dual of Interpretability]&lt;br /&gt;
&lt;br /&gt;
==Semanticity==&lt;br /&gt;
* 2023-09: [https://arxiv.org/abs/2309.08600 Sparse Autoencoders Find Highly Interpretable Features in Language Models]&lt;br /&gt;
* Anthropic monosemanticity interpretation of LLM features:&lt;br /&gt;
** 2023-10: [https://transformer-circuits.pub/2023/monosemantic-features/index.html Towards Monosemanticity: Decomposing Language Models With Dictionary Learning]&lt;br /&gt;
** 2024-05: [https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet]&lt;br /&gt;
* 2024-06: OpenAI: [https://arxiv.org/abs/2406.04093 Scaling and evaluating sparse autoencoders]&lt;br /&gt;
* 2024-08: [https://www.alignmentforum.org/posts/TMAmHh4DdMr4nCSr5/showing-sae-latents-are-not-atomic-using-meta-saes Showing SAE Latents Are Not Atomic Using Meta-SAEs] ([https://metasae.streamlit.app/?page=Feature+Explorer&amp;amp;feature=11329 demo])&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.08201 Efficient Dictionary Learning with Switch Sparse Autoencoders] ([https://github.com/amudide/switch_sae code]) More efficient SAE generation&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.14670 Decomposing The Dark Matter of Sparse Autoencoders] ([https://github.com/JoshEngels/SAE-Dark-Matter code]) Shows that SAE errors are predictable&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.13928 Automatically Interpreting Millions of Features in Large Language Models]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.21331 Beyond Interpretability: The Gains of Feature Monosemanticity on Model Robustness]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.04139 Monet: Mixture of Monosemantic Experts for Transformers]&lt;br /&gt;
* 2024-12: [https://www.lesswrong.com/posts/zbebxYCqsryPALh8C/matryoshka-sparse-autoencoders Matryoshka Sparse Autoencoders]&lt;br /&gt;
* 2024-12: [https://www.alignmentforum.org/posts/rKM9b6B2LqwSB5ToN/learning-multi-level-features-with-matryoshka-saes Learning Multi-Level Features with Matryoshka SAEs]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.19406 Low-Rank Adapting Models for Sparse Autoencoders]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.03714 Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.06755 Sparse Autoencoders for Scientifically Rigorous Interpretation of Vision Models]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.00177 Steering Large Language Model Activations in Sparse Spaces]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.01776 Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.01824 From superposition to sparse codes: interpretable representations in neural networks]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.18878 I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.20063 SAEs Are Good for Steering -- If You Select the Right Features]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.15679 Dense SAE Latents Are Features, Not Bugs]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.20790 Stochastic Parameter Decomposition] ([https://github.com/goodfire-ai/spd code], [https://www.goodfire.ai/papers/stochastic-param-decomp blog])&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.10003 Semantic Structure in Large Language Model Embeddings]&lt;br /&gt;
&lt;br /&gt;
===Counter-Results===&lt;br /&gt;
* 2020-10: [https://arxiv.org/abs/2010.12016 Towards falsifiable interpretability research]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.16615 Sparse Autoencoders Trained on the Same Data Learn Different Features]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.17148 AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.17727 Sparse Autoencoders Can Interpret Randomly Initialized Transformers]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.04878 Sparse Autoencoders Do Not Find Canonical Units of Analysis]&lt;br /&gt;
* 2025-03: [https://www.alignmentforum.org/posts/4uXCAJNuPKtKBsi28/ Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research]&lt;br /&gt;
&lt;br /&gt;
==Meta-cognition==&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.13763 Language Models Are Capable of Metacognitive Monitoring and Control of Their Internal Activations]&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.15674 Activation Oracles: Training and Evaluating LLMs as General-Purpose Activation Explainers]&lt;br /&gt;
&lt;br /&gt;
==Coding Models==&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Sparse Auto Encoders&amp;#039;&amp;#039;&amp;#039;: See Semanticity.&lt;br /&gt;
* [https://github.com/saprmarks/dictionary_learning dictionary_learning]&lt;br /&gt;
* [https://transformer-circuits.pub/2024/jan-update/index.html#predict-future Predicting Future Activations]&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.11944 Transcoders Find Interpretable LLM Feature Circuits]&lt;br /&gt;
* 2024-10: [https://transformer-circuits.pub/2024/crosscoders/index.html Sparse Crosscoders for Cross-Layer Features and Model Diffing]&lt;br /&gt;
&lt;br /&gt;
==Reward Functions==&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.12491 Insights from the Inverse: Reconstructing LLM Training Goals Through Inverse RL]&lt;br /&gt;
&lt;br /&gt;
==Symbolic and Notation==&lt;br /&gt;
* [https://transformer-circuits.pub/2021/framework/index.html A Mathematical Framework for Transformer Circuits]&lt;br /&gt;
* [https://www.arxiv.org/abs/2407.09468 Beyond Euclid: An Illustrated Guide to Modern Machine Learning with Geometric, Topological, and Algebraic Structures]&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.02423 On the Anatomy of Attention]: Introduces category-theoretic diagrammatic formalism for DL architectures&lt;br /&gt;
* 2024-11: [https://x.com/vtabbott_/status/1860268276569506250 diagrams to represent algorithms]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.03317 FlashAttention on a Napkin: A Diagrammatic Approach to Deep Learning IO-Awareness]&lt;br /&gt;
&lt;br /&gt;
==Mathematical==&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.13762 Unveiling the Hidden Structure of Self-Attention via Kernel Principal Component Analysis]&lt;br /&gt;
&lt;br /&gt;
==Geometric==&lt;br /&gt;
* 2023-11: [https://arxiv.org/abs/2311.03658 The Linear Representation Hypothesis and the Geometry of Large Language Models]&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.01506 The Geometry of Categorical and Hierarchical Concepts in Large Language Models]&lt;br /&gt;
** Natural hierarchies of concepts---which occur throughout natural language and especially in scientific ontologies---are represented in the model&amp;#039;s internal vectorial space as polytopes that can be decomposed into simplexes of mutually-exclusive categories.&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.02678 Reasoning in Large Language Models: A Geometric Perspective]&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.17592 Deep Manifold Part 1: Anatomy of Neural Network Manifold]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.19750 The Geometry of Concepts: Sparse Autoencoder Feature Structure]&lt;br /&gt;
** Tegmark et al. report multi-scale structure: 1) “atomic” small-scale, 2) “brain” intermediate-scale, and 3) “galaxy” large-scale&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.08009 The Geometry of Prompting: Unveiling Distinct Mechanisms of Task Adaptation in Language Models]&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.10003 Semantic Structure in Large Language Model Embeddings]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.09782 The Geometry of Reasoning: Flowing Logics in Representation Space]&lt;br /&gt;
* 2025-10: [https://transformer-circuits.pub/2025/linebreaks/index.html When Models Manipulate Manifolds: The Geometry of a Counting Task]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.26745 Deep sequence models tend to memorize geometrically; it is unclear why]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.15029 Symmetry in language statistics shapes the geometry of model representations]&lt;br /&gt;
* 2026-03: [https://arxiv.org/abs/2603.01968 Intrinsic Task Symmetry Drives Generalization in Algorithmic Tasks]&lt;br /&gt;
&lt;br /&gt;
==Topography==&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.16396 TopoNets: High Performing Vision and Language Models with Brain-Like Topography]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2601.06002 The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning]&lt;br /&gt;
&lt;br /&gt;
==Challenges==&lt;br /&gt;
* 2023-07: [https://arxiv.org/abs/2307.13702 Measuring Faithfulness in Chain-of-Thought Reasoning] ([https://x.com/davidad/status/1839641113432305790 roughly] proves that sufficiently large models do not generate CoT that faithfully captures their internal reasoning)&lt;br /&gt;
&lt;br /&gt;
[[Image:GYe31yXXQAABwaZ.jpeg|300px]]&lt;br /&gt;
&lt;br /&gt;
=Heuristic Understanding=&lt;br /&gt;
* 2022-09: Janus: [https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators Simulators]&lt;br /&gt;
&lt;br /&gt;
==Emergent Internal Model Building==&lt;br /&gt;
* 2023-07: [https://arxiv.org/abs/2307.15936 A Theory for Emergence of Complex Skills in Language Models]&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.19370v1 Emergence of Hidden Capabilities: Exploring Learning Dynamics in Concept Space]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.01622 General agents contain world models]&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.20328 Video models are zero-shot learners and reasoners]&lt;br /&gt;
&lt;br /&gt;
===Semantic Directions===&lt;br /&gt;
Directions, e.g.: f(king)-f(man)+f(woman)=f(queen) or f(sushi)-f(Japan)+f(Italy)=f(pizza)&lt;br /&gt;
* [https://arxiv.org/abs/1301.3781 Efficient Estimation of Word Representations in Vector Space]&lt;br /&gt;
* [https://aclanthology.org/N13-1090/ Linguistic Regularities in Continuous Space Word Representations]&lt;br /&gt;
* [https://aclanthology.org/C16-1332 Word Embeddings, Analogies, and Machine Learning: Beyond king - man + woman = queen]&lt;br /&gt;
* [https://aclanthology.org/D14-1162/ Glove: Global vectors for word representation]&lt;br /&gt;
* [https://doi.org/10.1109/BigData.2015.7364114 Using Word2Vec to process big text data]&lt;br /&gt;
* [https://arxiv.org/abs/2310.06824 The geometry of truth: Emergent linear structure in large language model representations of true/false datasets] (true/false)&lt;br /&gt;
* [https://arxiv.org/abs/2403.10381 Monotonic Representation of Numeric Properties in Language Models] (numeric directions)&lt;br /&gt;
Task vectors:&lt;br /&gt;
* [https://arxiv.org/abs/2310.15213 Function Vectors in Large Language Models]&lt;br /&gt;
* [https://arxiv.org/abs/2310.15916 In-context learning creates task vectors]&lt;br /&gt;
* [https://www.alignmentforum.org/posts/5FGXmJ3wqgGRcbyH7/extracting-sae-task-features-for-in-context-learning Extracting sae task features for in-context learning]&lt;br /&gt;
* [https://arxiv.org/abs/2412.12276 Emergence of Abstractions: Concept Encoding and Decoding Mechanism for In-Context Learning in Transformers]&lt;br /&gt;
Reasoning:&lt;br /&gt;
* [https://openreview.net/forum?id=OwhVWNOBcz Understanding Reasoning in Thinking Language Models via Steering Vectors]&lt;br /&gt;
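The vector-arithmetic pattern above (f(king)-f(man)+f(woman)=f(queen)) can be sketched with toy embeddings. The vectors below are illustrative made-up values, not taken from any real model; real embeddings (e.g. word2vec or GloVe) have hundreds of learned dimensions:

```python
import math

# Hypothetical 3-d embeddings, chosen only to illustrate the analogy pattern.
emb = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],  # distractor word
}

def cos(u, v):
    # Cosine similarity between two vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def analogy(a, b, c):
    # Return the vocabulary word closest to f(a) - f(b) + f(c),
    # excluding the three input words themselves.
    target = [x - y + z for x, y, z in zip(emb[a], emb[b], emb[c])]
    candidates = {w: cos(v, target) for w, v in emb.items() if w not in (a, b, c)}
    return max(candidates, key=candidates.get)

print(analogy("king", "man", "woman"))  # queen, for these toy vectors
```

The same nearest-neighbor-of-a-difference-vector recipe underlies the task-vector and steering-vector results listed above, applied to internal activations rather than input embeddings.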
&lt;br /&gt;
===Feature Geometry Reproduces Problem-space===&lt;br /&gt;
* [https://arxiv.org/abs/2210.13382 Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task] (Othello)&lt;br /&gt;
* [https://arxiv.org/abs/2309.00941 Emergent linear representations in world models of self-supervised sequence models] (Othello)&lt;br /&gt;
* [https://arxiv.org/abs/2211.15661 What learning algorithm is in-context learning? Investigations with linear models]&lt;br /&gt;
* [https://doi.org/10.1038/s41562-023-01659-w Emergent analogical reasoning in large language models]&lt;br /&gt;
* [https://arxiv.org/abs/2310.02207 Language Models Represent Space and Time] (Maps of world, US)&lt;br /&gt;
* [https://arxiv.org/abs/2405.14860 Not All Language Model Features Are Linear] (Days of week form ring, etc.)&lt;br /&gt;
* [https://arxiv.org/abs/2406.03689 Evaluating the World Model Implicit in a Generative Model] (Map of Manhattan)&lt;br /&gt;
* [https://iopscience.iop.org/article/10.1088/1748-9326/ad2891 Reliable precipitation nowcasting using probabilistic diffusion models]: Generated precipitation-map imagery is predictive of actual future weather, implying the model has learned scientifically-relevant structure.&lt;br /&gt;
* [https://arxiv.org/abs/2405.07987 The Platonic Representation Hypothesis]: Different models (including across modalities) are converging to a consistent world model.&lt;br /&gt;
* [https://arxiv.org/abs/2501.00070 ICLR: In-Context Learning of Representations]&lt;br /&gt;
* [https://arxiv.org/abs/2502.00873 Language Models Use Trigonometry to Do Addition]: Numbers arranged in helix to enable addition&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.15029 Symmetry in language statistics shapes the geometry of model representations]&lt;br /&gt;
&lt;br /&gt;
===Capturing Physics===&lt;br /&gt;
* 2020-09: [https://arxiv.org/abs/2009.08292 Learning to Identify Physical Parameters from Video Using Differentiable Physics]&lt;br /&gt;
* 2022-07: [https://arxiv.org/abs/2207.00419 Self-Supervised Learning for Videos: A Survey]&lt;br /&gt;
* 2025-02: FAIR at Meta: [https://arxiv.org/abs/2502.11831 Intuitive physics understanding emerges from self-supervised pretraining on natural videos]&lt;br /&gt;
&lt;br /&gt;
===Theory of Mind===&lt;br /&gt;
* [https://arxiv.org/abs/2302.02083 Evaluating Large Language Models in Theory of Mind Tasks]&lt;br /&gt;
* [https://arxiv.org/abs/2410.13787 Looking Inward: Language Models Can Learn About Themselves by Introspection]&lt;br /&gt;
* [https://arxiv.org/abs/2501.11120 Tell me about yourself: LLMs are aware of their learned behaviors]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2509.22887 Infusing Theory of Mind into Socially Intelligent LLM Agents]&lt;br /&gt;
&lt;br /&gt;
===Skeptical===&lt;br /&gt;
* 2025-01: [https://www.arxiv.org/abs/2501.09038 Do generative video models learn physical principles from watching videos?] ([https://physics-iq.github.io/ project], [https://github.com/google-deepmind/physics-IQ-benchmark code])&lt;br /&gt;
* 2025-06: [https://machinelearning.apple.com/research/illusion-of-thinking The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.21521 Potemkin Understanding in Large Language Models]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.21876 Do Vision-Language Models Have Internal World Models? Towards an Atomic Evaluation]&lt;br /&gt;
&lt;br /&gt;
==Information Processing==&lt;br /&gt;
* 2019-03: [https://arxiv.org/abs/1903.05789 Diagnosing and Enhancing VAE Models]&lt;br /&gt;
* 2021-03: [https://arxiv.org/abs/2103.05247 Pretrained Transformers as Universal Computation Engines]&lt;br /&gt;
* 2022-10: [https://arxiv.org/abs/2210.08344 How Mask Matters: Towards Theoretical Understandings of Masked Autoencoders]&lt;br /&gt;
* 2023-04: [https://arxiv.org/abs/2304.03843 Why think step by step? Reasoning emerges from the locality of experience]&lt;br /&gt;
* 2023-10: [https://arxiv.org/abs/2310.04444 What&amp;#039;s the Magic Word? A Control Theory of LLM Prompting]&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.12875 Chain of Thought Empowers Transformers to Solve Inherently Serial Problems]: Proves that transformers can solve any problem computable by polynomial-size circuits, provided they can generate sufficiently many intermediate (chain-of-thought) tokens&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.20311 Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process]&lt;br /&gt;
** Models learn reasoning skills (they are not merely memorizing solution templates). They can mentally generate simple short plans (like humans).&lt;br /&gt;
** When presented facts, models develop internal understanding of what parameters (recursively) depend on each other. This occurs even before an explicit question is asked (i.e. before the task is defined). This appears to be different from human reasoning.&lt;br /&gt;
** Model depth matters for reasoning. This cannot be mitigated by chain-of-thought prompting (which allows models to develop and then execute plans), since even a single CoT step may require deep, multi-step reasoning/planning.&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.01992 Ask, and it shall be given: Turing completeness of prompting]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.08775 Layers at Similar Depths Generate Similar Activations Across LLM Architectures]&lt;br /&gt;
&lt;br /&gt;
===Generalization===&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.14546 Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data]&lt;br /&gt;
&lt;br /&gt;
===Grokking===&lt;br /&gt;
* 2022-01: [https://arxiv.org/abs/2201.02177 Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets]&lt;br /&gt;
* 2022-05: [https://arxiv.org/abs/2205.10343 Towards Understanding Grokking: An Effective Theory of Representation Learning]&lt;br /&gt;
* 2024-01: [https://arxiv.org/abs/2401.10463 Critical Data Size of Language Models from a Grokking Perspective]&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.15175 Unified View of Grokking, Double Descent and Emergent Abilities: A Perspective from Circuits Competition]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.18624 How to explain grokking]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.09810 The Complexity Dynamics of Grokking]&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.21519 Provable Scaling Laws of Feature Emergence from Learning Dynamics of Grokking]&lt;br /&gt;
* 2026-03: [https://arxiv.org/abs/2603.01968 Intrinsic Task Symmetry Drives Generalization in Algorithmic Tasks]&lt;br /&gt;
&lt;br /&gt;
===Tests of Resilience to Dropouts/etc.===&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.15390 Explorations of Self-Repair in Language Models]&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.15786 What Matters in Transformers? Not All Attention is Needed]&lt;br /&gt;
** Removing entire transformer blocks leads to significant performance degradation&lt;br /&gt;
** Removing MLP layers results in significant performance degradation&lt;br /&gt;
** Removing attention layers causes almost no performance degradation&lt;br /&gt;
** E.g. deleting half of the attention layers (a 48% speed-up) leads to only a 2.4% decrease on benchmarks&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.19384 The Remarkable Robustness of LLMs: Stages of Inference?]&lt;br /&gt;
** They intentionally break the network (swapping layers), yet it continues to work remarkably well. This suggests LLMs are quite robust, and allows the researchers to identify different stages in processing.&lt;br /&gt;
** They also use these interventions to infer what different layers are doing. They break apart the LLM transformer layers into four stages:&lt;br /&gt;
*** &amp;#039;&amp;#039;&amp;#039;Detokenization:&amp;#039;&amp;#039;&amp;#039; Raw tokens are converted into meaningful entities that take into account local context (especially using nearby tokens).&lt;br /&gt;
*** &amp;#039;&amp;#039;&amp;#039;Feature engineering:&amp;#039;&amp;#039;&amp;#039; Features are progressively refined. Factual knowledge is leveraged.&lt;br /&gt;
*** &amp;#039;&amp;#039;&amp;#039;Prediction ensembling:&amp;#039;&amp;#039;&amp;#039; Predictions (for the ultimately-selected next-token) emerge. A sort of consensus voting is used, with “prediction neurons” and &amp;quot;suppression neurons&amp;quot; playing a major role in upvoting/downvoting.&lt;br /&gt;
*** &amp;#039;&amp;#039;&amp;#039;Residual sharpening:&amp;#039;&amp;#039;&amp;#039; The semantic representations are collapsed into specific next-token predictions. There is a strong emphasis on suppression neurons eliminating options. The confidence is calibrated.&lt;br /&gt;
** This structure can be thought of as two halves (being roughly dual to each other): the first half broadens (goes from distinct tokens to a rich/elaborate concept-space) and the second half collapses (goes from rich concepts to concrete token predictions).&lt;br /&gt;
&lt;br /&gt;
==Semantic Vectors==&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.11717 Refusal in Language Models Is Mediated by a Single Direction]&lt;br /&gt;
* 2025-02: [https://martins1612.github.io/emergent_misalignment_betley.pdf Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs] ([https://x.com/OwainEvans_UK/status/1894436637054214509 demonstrates] [https://x.com/ESYudkowsky/status/1894453376215388644 entangling] of concepts into a single preference vector)&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.03666 Analogical Reasoning Inside Large Language Models: Concept Vectors and the Limits of Abstraction]&lt;br /&gt;
&lt;br /&gt;
==Other==&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.00247 Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting &amp;amp; Beyond]&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.04282 Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding] ([https://github.com/SalesforceAIResearch/LaTRO code])&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.12580 Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models]: LLMs learn reasoning by extracting procedures from training data, not by memorizing specific answers&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.15862 LLMs Do Not Think Step-by-step In Implicit Reasoning]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.09810 The Complexity Dynamics of Grokking]&lt;br /&gt;
&lt;br /&gt;
===Scaling Laws===&lt;br /&gt;
* 1993: [https://proceedings.neurips.cc/paper/1993/file/1aa48fc4880bb0c9b8a3bf979d3b917e-Paper.pdf Learning Curves: Asymptotic Values and Rate of Convergence]&lt;br /&gt;
* 2017-12: [https://arxiv.org/abs/1712.00409 Deep Learning Scaling is Predictable, Empirically] (Baidu)&lt;br /&gt;
* 2019-03: [http://www.incompleteideas.net/IncIdeas/BitterLesson.html The Bitter Lesson] (Rich Sutton)&lt;br /&gt;
* 2020-01: [https://arxiv.org/abs/2001.08361 Scaling Laws for Neural Language Models] (OpenAI)&lt;br /&gt;
* 2020-05: [https://gwern.net/scaling-hypothesis The Scaling Hypothesis] (Gwern)&lt;br /&gt;
* 2020-10: [https://arxiv.org/abs/2010.14701 Scaling Laws for Autoregressive Generative Modeling] (OpenAI)&lt;br /&gt;
* 2021-02: [https://arxiv.org/abs/2102.06701 Explaining Neural Scaling Laws] (Google DeepMind)&lt;br /&gt;
* 2021-08: [https://arxiv.org/abs/2108.07686 Scaling Laws for Deep Learning]&lt;br /&gt;
* 2022-03: [https://arxiv.org/abs/2203.15556 Training Compute-Optimal Large Language Models] (Chinchilla, Google DeepMind)&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.04715 Predictable Scale: Part I -- Optimal Hyperparameter Scaling Law in Large Language Model Pretraining]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.10061 Compute Optimal Scaling of Skills: Knowledge vs Reasoning]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.07951 Scaling Laws for Native Multimodal Models]&lt;br /&gt;
* 2025-05: [https://brendel-group.github.io/llm-line/ LLMs on the Line: Data Determines Loss-To-Loss Scaling Laws]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.13786 The Art of Scaling Reinforcement Learning Compute for LLMs]&lt;br /&gt;
&lt;br /&gt;
=Information Processing/Storage=&lt;br /&gt;
* 2020-02: [https://arxiv.org/abs/2002.10689 A Theory of Usable Information Under Computational Constraints]&lt;br /&gt;
* 2021-04: [https://arxiv.org/abs/2104.00008 Why is AI hard and Physics simple?]&lt;br /&gt;
* 2021-06: [https://arxiv.org/abs/2106.06981 Thinking Like Transformers]&lt;br /&gt;
* 2023-05: [https://arxiv.org/abs/2305.00948 Large Linguistic Models: Investigating LLMs&amp;#039; metalinguistic abilities]&lt;br /&gt;
* &amp;quot;A transformer&amp;#039;s depth affects its reasoning capabilities, whilst model size affects its knowledge capacity&amp;quot; ([https://x.com/danielhanchen/status/1835684061475655967 c.f.])&lt;br /&gt;
** 2024-02: [https://arxiv.org/abs/2402.14905 MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases]&lt;br /&gt;
** 2024-04: [https://arxiv.org/abs/2404.08819 The Illusion of State in State-Space Models] (figure 3)&lt;br /&gt;
** 2024-08: [https://arxiv.org/abs/2408.00118 Gemma 2: Improving Open Language Models at a Practical Size] (table 9)&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.10482 Schrodinger&amp;#039;s Memory: Large Language Models]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2407.01687 Deciphering the Factors Influencing the Efficacy of Chain-of-Thought: Probability, Memorization, and Noisy Reasoning]: CoT involves both memorization and (probabilistic) reasoning&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.16679 Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?]&lt;br /&gt;
* 2025-03: [https://www.arxiv.org/abs/2503.03961 A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers]&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.22471 The Bayesian Geometry of Transformer Attention]&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.03220 From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence]&lt;br /&gt;
&lt;br /&gt;
==Statistics/Math==&lt;br /&gt;
* 2023-05: [https://arxiv.org/abs/2305.05465 The emergence of clusters in self-attention dynamics]&lt;br /&gt;
* 2023-12: [https://arxiv.org/abs/2312.10794 A mathematical perspective on Transformers]&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.12034 Understanding Transformers via N-gram Statistics]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.06833 Dynamic metastability in the self-attention model]&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.04551 Measure-to-measure interpolation using Transformers]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.14697 Quantitative Clustering in Mean-Field Transformer Models]&lt;br /&gt;
&lt;br /&gt;
==Tokenization==&lt;br /&gt;
===For numbers/math===&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.14903 Tokenization counts: the impact of tokenization on arithmetic in frontier LLMs]: L2R vs. R2L yields different performance on math&lt;br /&gt;
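The L2R vs. R2L grouping difference can be sketched as follows. The 3-digit chunk size is a hypothetical choice for illustration; actual tokenizer vocabularies vary:

```python
def chunk_digits(s, size=3, right_to_left=False):
    # Group a digit string into fixed-size chunks, scanning either
    # left-to-right or right-to-left (place-value aligned).
    if right_to_left:
        rev = s[::-1]
        chunks = [rev[i:i + size][::-1] for i in range(0, len(rev), size)]
        return chunks[::-1]
    return [s[i:i + size] for i in range(0, len(s), size)]

print(chunk_digits("12345"))                      # ['123', '45']
print(chunk_digits("12345", right_to_left=True))  # ['12', '345']
```

R2L grouping aligns chunk boundaries with place value (like thousands separators), which is the alignment the paper finds matters for arithmetic performance.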
&lt;br /&gt;
==Data Storage==&lt;br /&gt;
* 1988-09: [https://www.sciencedirect.com/science/article/pii/0885064X88900209 On the capabilities of multilayer perceptrons]&lt;br /&gt;
* 2006-12: [https://ieeexplore.ieee.org/document/4038449 Geometrical and Statistical Properties of Systems of Linear Inequalities with Applications in Pattern Recognition] (single-layer perceptron stores &amp;gt;2 bits/parameter; MLP ~ 2*N&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt; bits w/ N&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt; params)&lt;br /&gt;
* 2016-11: [https://arxiv.org/abs/1611.09913 Capacity and Trainability in Recurrent Neural Networks] (5 bits/param)&lt;br /&gt;
* 2018-02: [https://arxiv.org/abs/1802.08232 The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks]&lt;br /&gt;
* 2019-05: [https://ieeexplore.ieee.org/document/8682462 Memorization Capacity of Deep Neural Networks under Parameter Quantization]&lt;br /&gt;
* 2020-02: [https://arxiv.org/abs/2002.08910 How Much Knowledge Can You Pack Into the Parameters of a Language Model?]&lt;br /&gt;
* 2020-08: [https://arxiv.org/abs/2008.09036 Language Models as Knowledge Bases: On Entity Representations, Storage Capacity, and Paraphrased Queries] (capacity scales linearly with parameters; more training samples leads to less memorization)&lt;br /&gt;
* 2020-12: [https://arxiv.org/abs/2012.06421 When is Memorization of Irrelevant Training Data Necessary for High-Accuracy Learning?]&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.05405 Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws] (2 bits/param)&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.15720 Scaling Laws for Fact Memorization of Large Language Models] (1T params needed to memorize Wikipedia)&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.09810 The Complexity Dynamics of Grokking]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.24832 How much do language models memorize?] (3.6 bits/parameter)&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.01855 Trade-offs in Data Memorization via Strong Data Processing Inequalities]&lt;br /&gt;
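As a back-of-envelope illustration of the bits-per-parameter estimates above (3.6 bits/param from the 2025-05 paper; other entries report roughly 2-5), one can convert a parameter count into an approximate storage capacity. The 8B-parameter example is an assumption for illustration:

```python
def capacity_bytes(n_params, bits_per_param=3.6):
    # Approximate raw memorization capacity in bytes, assuming a fixed
    # bits-per-parameter figure (an empirical estimate, not a hard bound).
    return n_params * bits_per_param / 8

# A hypothetical 8B-parameter model: roughly 3.6 GB of memorized content.
print(capacity_bytes(8e9) / 1e9)
```

This is only an order-of-magnitude heuristic; the papers above show the effective figure depends on architecture, quantization, and how much the training data is repeated.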
&lt;br /&gt;
===Reverse-Engineering Training Data===&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.10364 Can We Infer Confidential Properties of Training Data from LLMs?]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.15553 Approximating Language Model Training Data from Weights]&lt;br /&gt;
&lt;br /&gt;
===Compression===&lt;br /&gt;
* 2022-12: [https://arxiv.org/abs/2212.09410 Less is More: Parameter-Free Text Classification with Gzip]&lt;br /&gt;
* 2023-06: [https://arxiv.org/abs/2306.04050 LLMZip: Lossless Text Compression using Large Language Models]&lt;br /&gt;
* 2023-07: [https://aclanthology.org/2023.findings-acl.426/ “Low-Resource” Text Classification: A Parameter-Free Classification Method with Compressors]&lt;br /&gt;
* 2023-09: [https://arxiv.org/abs/2309.10668 Language Modeling Is Compression]&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.07550 An Image is Worth 32 Tokens for Reconstruction and Generation]&lt;br /&gt;
&lt;br /&gt;
==Learning/Training==&lt;br /&gt;
* 2018-03: [https://arxiv.org/abs/1803.03635 The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks]: Sparse neural networks can match dense ones, but it is difficult to identify the right sparse architecture and train it directly. Training a dense network makes it easier for learning to discover an internal sparse circuit suited to a particular problem.&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.11521 On the Ability of Deep Networks to Learn Symmetries from Data: A Neural Kernel Theory]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.12391 Physics of Skill Learning]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.24864 ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models]&lt;br /&gt;
&lt;br /&gt;
===Cross-modal knowledge transfer===&lt;br /&gt;
* 2022-03: [https://arxiv.org/abs/2203.07519 Leveraging Visual Knowledge in Language Tasks: An Empirical Study on Intermediate Pre-training for Cross-modal Knowledge Transfer]&lt;br /&gt;
* 2023-05: [https://arxiv.org/abs/2305.07358 Towards Versatile and Efficient Visual Knowledge Integration into Pre-trained Language Models with Cross-Modal Adapters]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.06755 Sparse Autoencoders for Scientifically Rigorous Interpretation of Vision Models]: CLIP learns a richer set of aggregated representations (e.g. for a culture or country) than a vision-only model.&lt;br /&gt;
&lt;br /&gt;
==Hidden State==&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.06258 Emergent Response Planning in LLM]: They show that the latent representation contains information beyond that needed for the next token (i.e. the model learns to &amp;quot;plan ahead&amp;quot; and encode information relevant to future tokens)&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.02854 (How) Do Language Models Track State?]&lt;br /&gt;
&lt;br /&gt;
===Convergent Representation===&lt;br /&gt;
* 2015-11: [https://arxiv.org/abs/1511.07543 Convergent Learning: Do different neural networks learn the same representations?]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.12540 Harnessing the Universal Geometry of Embeddings]: Evidence for [https://x.com/jxmnop/status/1925224620166128039 The Strong Platonic Representation Hypothesis]; models converge to a single consensus reality&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.03750 Universally Converging Representations of Matter Across Scientific Foundation Models]&lt;br /&gt;
&lt;br /&gt;
==Function Approximation==&lt;br /&gt;
* 2022-08: [https://arxiv.org/abs/2208.01066 What Can Transformers Learn In-Context? A Case Study of Simple Function Classes]: can learn linear functions (equivalent to least-squares estimator)&lt;br /&gt;
* 2022-11: [https://arxiv.org/abs/2211.09066 Teaching Algorithmic Reasoning via In-context Learning]: Simple arithmetic&lt;br /&gt;
* 2022-11: [https://arxiv.org/abs/2211.15661 What learning algorithm is in-context learning? Investigations with linear models] ([https://github.com/ekinakyurek/google-research/tree/master/incontext code]): can learn linear regression&lt;br /&gt;
* 2022-12: [https://arxiv.org/abs/2212.07677 Transformers learn in-context by gradient descent]&lt;br /&gt;
* 2023-06: [https://arxiv.org/abs/2306.00297 Transformers learn to implement preconditioned gradient descent for in-context learning]&lt;br /&gt;
* 2023-07: [https://arxiv.org/abs/2307.03576 One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention]&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.02893 ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.20545 SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.21212 Transformers Learn to Implement Multi-step Gradient Descent with Chain of Thought]&lt;br /&gt;
&lt;br /&gt;
=Physics Based=&lt;br /&gt;
* 2014-01: [https://arxiv.org/abs/1401.1219 Consciousness as a State of Matter]&lt;br /&gt;
* 2016-08: [https://arxiv.org/abs/1608.08225 Why does deep and cheap learning work so well?]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.23489 SGD as Free Energy Minimization: A Thermodynamic View on Neural Network Training]&lt;br /&gt;
* 2025-12: [https://www.pnas.org/doi/full/10.1073/pnas.2523012122 Heavy-tailed update distributions arise from information-driven self-organization in nonequilibrium learning]&lt;br /&gt;
&lt;br /&gt;
=Failure Modes=&lt;br /&gt;
* 2023-06: [https://arxiv.org/abs/2306.05836 Can Large Language Models Infer Causation from Correlation?]: Poor causal inference&lt;br /&gt;
* 2023-09: [https://arxiv.org/abs/2309.12288 The Reversal Curse: LLMs trained on &amp;quot;A is B&amp;quot; fail to learn &amp;quot;B is A&amp;quot;]&lt;br /&gt;
* 2023-09: [https://arxiv.org/abs/2309.13638 Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve] (biases towards &amp;quot;common&amp;quot; numbers, in-context CoT can reduce performance by incorrectly priming, etc.)&lt;br /&gt;
* 2023-11: [https://arxiv.org/abs/2311.16093 Visual cognition in multimodal large language models] (models lack human-like visual understanding)&lt;br /&gt;
&lt;br /&gt;
==Adversarial==&lt;br /&gt;
* 2026-03: [https://arxiv.org/abs/2603.03507 Solving adversarial examples requires solving exponential misalignment]&lt;br /&gt;
&lt;br /&gt;
==Fractured Representation==&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.11581 Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis] ([https://github.com/akarshkumar0101/fer code])&lt;br /&gt;
&lt;br /&gt;
==Jagged Frontier==&lt;br /&gt;
* 2023-09: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4573321 Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality]&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.03211 How Does Quantization Affect Multilingual LLMs?]: Quantization degrades different languages by differing amounts&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.10061v1 Compute Optimal Scaling of Skills: Knowledge vs Reasoning]: Scaling laws are skill-dependent&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.18212 A Definition of AGI]&lt;br /&gt;
&lt;br /&gt;
===See also===&lt;br /&gt;
* [[AI_understanding|AI Understanding]] &amp;gt; [[AI_understanding#Psychology|Psychology]] &amp;gt; [[AI_understanding#LLM_personalities|LLM personalities]]&lt;br /&gt;
* [[AI tricks]] &amp;gt; [[AI_tricks#Prompt_Engineering|Prompt Engineering]] &amp;gt; [[AI_tricks#Brittleness|Brittleness]]&lt;br /&gt;
&lt;br /&gt;
===Conversely (AI models converge)===&lt;br /&gt;
* 2025-12: [https://www.arxiv.org/abs/2512.03750 Universally Converging Representations of Matter Across Scientific Foundation Models]&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.05117 The Universal Weight Subspace Hypothesis]&lt;br /&gt;
* 2026-01: [https://avikrishna.substack.com/p/eliciting-frontier-model-character Eliciting Frontier Model Character Training: A study of personality convergence across language models]&lt;br /&gt;
&lt;br /&gt;
==Model Collapse==&lt;br /&gt;
* 2023-05: [https://arxiv.org/abs/2305.17493 The Curse of Recursion: Training on Generated Data Makes Models Forget]&lt;br /&gt;
* 2023-07: [https://arxiv.org/abs/2307.01850 Self-Consuming Generative Models Go MAD]&lt;br /&gt;
* 2023-10: [https://arxiv.org/abs/2310.00429 On the Stability of Iterative Retraining of Generative Models on their own Data]&lt;br /&gt;
* 2023-11: [https://arxiv.org/abs/2311.12202 Nepotistically Trained Generative-AI Models Collapse]&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.03502 AI and the Problem of Knowledge Collapse]&lt;br /&gt;
* 2024-07: [https://www.nature.com/articles/s41586-024-07566-y AI models collapse when trained on recursively generated data]&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.05280 On the Limits of Self-Improving in LLMs and Why AGI, ASI and the Singularity Are Not Near Without Symbolic Model Synthesis]&lt;br /&gt;
&lt;br /&gt;
===Analysis===&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.04376 Scaling laws for learning with real and surrogate data]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.17646 Rate of Model Collapse in Recursive Training]&lt;br /&gt;
&lt;br /&gt;
===Mitigation===&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.07712 Model Collapse Demystified: The Case of Regression]&lt;br /&gt;
* 2024-03: [https://arxiv.org/abs/2403.04706 Common 7B Language Models Already Possess Strong Math Capabilities]&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.01413 Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data]&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.07515 Beyond Model Collapse: Scaling Up with Synthesized Data Requires Verification]&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.01490 LLM See, LLM Do: Guiding Data Generation to Target Non-Differentiable Objectives]&lt;br /&gt;
* 2024-08: [https://arxiv.org/abs/2408.14960 Multilingual Arbitrage: Optimizing Data Pools to Accelerate Multilingual Progress]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.08117 Convergence Dynamics and Stabilization Strategies of Co-Evolving Generative Models]&lt;br /&gt;
&lt;br /&gt;
=Psychology=&lt;br /&gt;
* 2023-04: [https://arxiv.org/abs/2304.11111 Inducing anxiety in large language models can induce bias]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.17117 From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning]&lt;br /&gt;
* 2025-07: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5357179 Call Me A Jerk: Persuading AI to Comply with Objectionable Requests]&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.06047 &amp;quot;They parted illusions -- they parted disclaim marinade&amp;quot;: Misalignment as structural fidelity in LLMs]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.02606 Gender Dynamics and Homophily in a Social Network of LLM Agents]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.01689 What LLMs Think When You Don&amp;#039;t Tell Them What to Think About?]&lt;br /&gt;
* 2026-03: [https://truthful.ai/consciousness_cluster.pdf The Consciousness Cluster: Preferences of Models that Claim to be Conscious]&lt;br /&gt;
* 2026-04: [https://transformer-circuits.pub/2026/emotions/index.html Emotion concepts and their function in a large language model] ([https://www.anthropic.com/research/emotion-concepts-function blog])&lt;br /&gt;
&lt;br /&gt;
==Persona Simulator Theory==&lt;br /&gt;
* 2022-09: [https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators Simulators] ([https://www.lesswrong.com/users/janus-1?from=post_header janus])&lt;br /&gt;
* 2022-12: [https://aclanthology.org/2022.findings-emnlp.423/ Language Models as Agent Models]&lt;br /&gt;
* 2023-02: [https://arxiv.org/abs/2302.00805 Conditioning Predictive Models: Risks and Strategies]&lt;br /&gt;
* 2024-09: [https://www.lesswrong.com/s/qhdHbCJ3PYesL9dde Intuitive Self-Models]&lt;br /&gt;
* 2026-02: [https://alignment.anthropic.com/2026/psm/ The Persona Selection Model: Why AI Assistants might Behave like Humans] (Anthropic, [https://www.anthropic.com/research/persona-selection-model blog])&lt;br /&gt;
&lt;br /&gt;
==Allow LLM to think==&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.11536 Let your LLM generate a few tokens and you will reduce the need for retrieval]&lt;br /&gt;
&lt;br /&gt;
===In-context Learning===&lt;br /&gt;
* 2021-10: [https://arxiv.org/abs/2110.15943 MetaICL: Learning to Learn In Context]&lt;br /&gt;
* 2022-02: [https://arxiv.org/abs/2202.12837 Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?]&lt;br /&gt;
* 2022-08: [https://arxiv.org/abs/2208.01066 What Can Transformers Learn In-Context? A Case Study of Simple Function Classes]&lt;br /&gt;
* 2022-11: [https://arxiv.org/abs/2211.15661 What learning algorithm is in-context learning? Investigations with linear models]&lt;br /&gt;
* 2022-12: [https://arxiv.org/abs/2212.07677 Transformers learn in-context by gradient descent]&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.16003 Learning without training: The implicit dynamics of in-context learning]&lt;br /&gt;
&lt;br /&gt;
==Reasoning (CoT, etc.)==&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.18009 Large Language Models Think Too Fast To Explore Effectively]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.18585 Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.08156 Are DeepSeek R1 And Other Reasoning Models More Faithful?]: reasoning models tend to provide more faithful explanations of their own reasoning&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.08679 Chain-of-Thought Reasoning In The Wild Is Not Always Faithful]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.04022 Rethinking Reflection in Pre-Training]: pre-training alone already provides some amount of reflection/reasoning&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2501.18858 BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning]&lt;br /&gt;
&lt;br /&gt;
===Pathfinding===&lt;br /&gt;
* 2024-08: [https://arxiv.org/abs/2408.08152 DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.01939 Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning]&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.09284 Tree-OPO: Off-policy Monte Carlo Tree-Guided Advantage Optimization for Multistep Reasoning]&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.06160v1 Reverse-Engineered Reasoning for Open-Ended Generation]&lt;br /&gt;
&lt;br /&gt;
===Skeptical===&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.06941 The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity]&lt;br /&gt;
* 2025-08: [https://www.arxiv.org/abs/2508.01191 Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens]&lt;br /&gt;
&lt;br /&gt;
==Self-Awareness and Self-Recognition and Introspection==&lt;br /&gt;
* 2022-07: [https://arxiv.org/abs/2207.05221 Language Models (Mostly) Know What They Know]&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.13076 LLM Evaluators Recognize and Favor Their Own Generations]&lt;br /&gt;
* 2024-09: [https://situational-awareness-dataset.org/ Me, Myself and AI: The Situational Awareness Dataset for LLMs]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.13787 Looking Inward: Language Models Can Learn About Themselves by Introspection]&lt;br /&gt;
* 2024-12: [https://theaidigest.org/self-awareness AIs are becoming more self-aware. Here&amp;#039;s why that matters]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.11120 Tell me about yourself: LLMs are aware of their learned behaviors]&lt;br /&gt;
* 2025-04: [https://x.com/Josikinz/status/1907923319866716629 LLMs can guess which comic strip was generated by themselves (vs. other LLM)]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.13763 Language Models Are Capable of Metacognitive Monitoring and Control of Their Internal Activations]&lt;br /&gt;
* 2025-10: [https://transformer-circuits.pub/2025/introspection/index.html Emergent Introspective Awareness in Large Language Models] (Anthropic, [https://www.anthropic.com/research/introspection blog])&lt;br /&gt;
* 2025-12: [https://www.arxiv.org/abs/2512.24661 Do Large Language Models Know What They Are Capable Of?]&lt;br /&gt;
&lt;br /&gt;
==LLM personalities==&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.02618 Strategic Intelligence in Large Language Models: Evidence from evolutionary Game Theory]&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.04343 Psychologically Enhanced AI Agents]&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.10387 The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models]&lt;br /&gt;
&lt;br /&gt;
==Quirks &amp;amp; Biases==&lt;br /&gt;
* 2025-04: [https://www.cambridge.org/core/journals/judgment-and-decision-making/article/artificial-intelligence-and-dichotomania/0421D2310727D73FAB47069FD1620AA1 Artificial intelligence and dichotomania]&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.22818 Can Large Language Models Develop Gambling Addiction?]&lt;br /&gt;
&lt;br /&gt;
=Vision Models=&lt;br /&gt;
* 2017-11: Distill: [https://distill.pub/2017/feature-visualization/ Feature Visualization: How neural networks build up their understanding of images]&lt;br /&gt;
* 2021-01: [https://arxiv.org/abs/2101.12322 Position, Padding and Predictions: A Deeper Look at Position Information in CNNs]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.13181 Perception Encoder: The best visual embeddings are not at the output of the network] ([https://github.com/facebookresearch/perception_models code])&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
* [[AI]]&lt;br /&gt;
* [[AI tools]]&lt;br /&gt;
* [[AI agents]]&lt;br /&gt;
* [[Robots]]&lt;/div&gt;</summary>
		<author><name>KevinYager</name></author>
		
	</entry>
	<entry>
		<id>http://gisaxs.com/index.php?title=AI_understanding&amp;diff=8773</id>
		<title>AI understanding</title>
		<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php?title=AI_understanding&amp;diff=8773"/>
		<updated>2026-04-02T17:45:07Z</updated>

		<summary type="html">&lt;p&gt;KevinYager: /* Psychology */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Interpretability=&lt;br /&gt;
* 2017-01: [https://arxiv.org/abs/1704.01444 Learning to Generate Reviews and Discovering Sentiment]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.11639 Neural Interpretable Reasoning]&lt;br /&gt;
&lt;br /&gt;
==Concepts==&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.20938 Towards Understanding the Nature of Attention with Low-Rank Sparse Decomposition] ([https://github.com/OpenMOSS/Lorsa code])&lt;br /&gt;
* 2025-08: [https://transformer-circuits.pub/2025/attention-qk/index.html Tracing Attention Computation Through Feature Interactions]&lt;br /&gt;
&lt;br /&gt;
==Mechanistic Interpretability==&lt;br /&gt;
* 2020-03: OpenAI: [https://distill.pub/2020/circuits/zoom-in/ Zoom In: An Introduction to Circuits]&lt;br /&gt;
* 2021-12: Anthropic: [https://transformer-circuits.pub/2021/framework/index.html A Mathematical Framework for Transformer Circuits]&lt;br /&gt;
* 2022-09: [https://arxiv.org/abs/2211.00593 Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 Small]&lt;br /&gt;
* 2023-01: [https://arxiv.org/abs/2301.05062 Tracr: Compiled Transformers as a Laboratory for Interpretability] ([https://github.com/google-deepmind/tracr code])&lt;br /&gt;
* 2024-07: Anthropic: [https://transformer-circuits.pub/2024/july-update/index.html Circuits Update]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.14926 Interpretability in Parameter Space: Minimizing Mechanistic Description Length with Attribution-based Parameter Decomposition] ([https://www.alignmentforum.org/posts/EPefYWjuHNcNH4C7E/attribution-based-parameter-decomposition blog post])&lt;br /&gt;
* 2025-01: Review: [https://arxiv.org/abs/2501.16496 Open Problems in Mechanistic Interpretability]&lt;br /&gt;
* 2025-03: Anthropic: [https://www.anthropic.com/research/tracing-thoughts-language-model Tracing the thoughts of a large language model]&lt;br /&gt;
** [https://transformer-circuits.pub/2025/attribution-graphs/methods.html Circuit Tracing: Revealing Computational Graphs in Language Models]&lt;br /&gt;
** [https://transformer-circuits.pub/2025/attribution-graphs/biology.html On the Biology of a Large Language Model]&lt;br /&gt;
* 2025-11: OpenAI: [https://cdn.openai.com/pdf/41df8f28-d4ef-43e9-aed2-823f9393e470/circuit-sparsity-paper.pdf Weight-sparse transformers have interpretable circuits] ([https://openai.com/index/understanding-neural-networks-through-sparse-circuits/ blog])&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.13548 Patterning: The Dual of Interpretability]&lt;br /&gt;
&lt;br /&gt;
==Semanticity==&lt;br /&gt;
* 2023-09: [https://arxiv.org/abs/2309.08600 Sparse Autoencoders Find Highly Interpretable Features in Language Models]&lt;br /&gt;
* Anthropic monosemanticity interpretation of LLM features:&lt;br /&gt;
** 2023-10: [https://transformer-circuits.pub/2023/monosemantic-features/index.html Towards Monosemanticity: Decomposing Language Models With Dictionary Learning]&lt;br /&gt;
** 2024-05: [https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet]&lt;br /&gt;
* 2024-06: OpenAI: [https://arxiv.org/abs/2406.04093 Scaling and evaluating sparse autoencoders]&lt;br /&gt;
* 2024-08: [https://www.alignmentforum.org/posts/TMAmHh4DdMr4nCSr5/showing-sae-latents-are-not-atomic-using-meta-saes Showing SAE Latents Are Not Atomic Using Meta-SAEs] ([https://metasae.streamlit.app/?page=Feature+Explorer&amp;amp;feature=11329 demo])&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.08201 Efficient Dictionary Learning with Switch Sparse Autoencoders] ([https://github.com/amudide/switch_sae code]) More efficient SAE generation&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.14670 Decomposing The Dark Matter of Sparse Autoencoders] ([https://github.com/JoshEngels/SAE-Dark-Matter code]) Shows that SAE errors are predictable&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.13928 Automatically Interpreting Millions of Features in Large Language Models]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.21331 Beyond Interpretability: The Gains of Feature Monosemanticity on Model Robustness]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.04139 Monet: Mixture of Monosemantic Experts for Transformers]&lt;br /&gt;
* 2024-12: [https://www.lesswrong.com/posts/zbebxYCqsryPALh8C/matryoshka-sparse-autoencoders Matryoshka Sparse Autoencoders]&lt;br /&gt;
* 2024-12: [https://www.alignmentforum.org/posts/rKM9b6B2LqwSB5ToN/learning-multi-level-features-with-matryoshka-saes Learning Multi-Level Features with Matryoshka SAEs]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.19406 Low-Rank Adapting Models for Sparse Autoencoders]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.03714 Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.06755 Sparse Autoencoders for Scientifically Rigorous Interpretation of Vision Models]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.00177 Steering Large Language Model Activations in Sparse Spaces]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.01776 Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.01824 From superposition to sparse codes: interpretable representations in neural networks]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.18878 I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.20063 SAEs Are Good for Steering -- If You Select the Right Features]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.15679 Dense SAE Latents Are Features, Not Bugs]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.20790 Stochastic Parameter Decomposition] ([https://github.com/goodfire-ai/spd code], [https://www.goodfire.ai/papers/stochastic-param-decomp blog])&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.10003 Semantic Structure in Large Language Model Embeddings]&lt;br /&gt;
&lt;br /&gt;
===Counter-Results===&lt;br /&gt;
* 2020-10: [https://arxiv.org/abs/2010.12016 Towards falsifiable interpretability research]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.16615 Sparse Autoencoders Trained on the Same Data Learn Different Features]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.17148 AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.17727 Sparse Autoencoders Can Interpret Randomly Initialized Transformers]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.04878 Sparse Autoencoders Do Not Find Canonical Units of Analysis]&lt;br /&gt;
* 2025-03: [https://www.alignmentforum.org/posts/4uXCAJNuPKtKBsi28/ Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research]&lt;br /&gt;
&lt;br /&gt;
==Meta-cognition==&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.13763 Language Models Are Capable of Metacognitive Monitoring and Control of Their Internal Activations]&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.15674 Activation Oracles: Training and Evaluating LLMs as General-Purpose Activation Explainers]&lt;br /&gt;
&lt;br /&gt;
==Coding Models==&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Sparse Autoencoders&amp;#039;&amp;#039;&amp;#039;: See Semanticity.&lt;br /&gt;
* [https://github.com/saprmarks/dictionary_learning dictionary_learning]&lt;br /&gt;
* [https://transformer-circuits.pub/2024/jan-update/index.html#predict-future Predicting Future Activations]&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.11944 Transcoders Find Interpretable LLM Feature Circuits]&lt;br /&gt;
* 2024-10: [https://transformer-circuits.pub/2024/crosscoders/index.html Sparse Crosscoders for Cross-Layer Features and Model Diffing]&lt;br /&gt;
&lt;br /&gt;
==Reward Functions==&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.12491 Insights from the Inverse: Reconstructing LLM Training Goals Through Inverse RL]&lt;br /&gt;
&lt;br /&gt;
==Symbolic and Notation==&lt;br /&gt;
* [https://transformer-circuits.pub/2021/framework/index.html A Mathematical Framework for Transformer Circuits]&lt;br /&gt;
* [https://www.arxiv.org/abs/2407.09468 Beyond Euclid: An Illustrated Guide to Modern Machine Learning with Geometric, Topological, and Algebraic Structures]&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.02423 On the Anatomy of Attention]: Introduces category-theoretic diagrammatic formalism for DL architectures&lt;br /&gt;
* 2024-11: [https://x.com/vtabbott_/status/1860268276569506250 diagrams to represent algorithms]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.03317 FlashAttention on a Napkin: A Diagrammatic Approach to Deep Learning IO-Awareness]&lt;br /&gt;
&lt;br /&gt;
==Mathematical==&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.13762 Unveiling the Hidden Structure of Self-Attention via Kernel Principal Component Analysis]&lt;br /&gt;
&lt;br /&gt;
==Geometric==&lt;br /&gt;
* 2023-11: [https://arxiv.org/abs/2311.03658 The Linear Representation Hypothesis and the Geometry of Large Language Models]&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.01506 The Geometry of Categorical and Hierarchical Concepts in Large Language Models]&lt;br /&gt;
** Natural hierarchies of concepts---which occur throughout natural language and especially in scientific ontologies---are represented in the model&amp;#039;s internal vectorial space as polytopes that can be decomposed into simplexes of mutually-exclusive categories.&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.02678 Reasoning in Large Language Models: A Geometric Perspective]&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.17592 Deep Manifold Part 1: Anatomy of Neural Network Manifold]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.19750 The Geometry of Concepts: Sparse Autoencoder Feature Structure]&lt;br /&gt;
** Tegmark et al. report multi-scale structure: 1) “atomic” small-scale, 2) “brain” intermediate-scale, and 3) “galaxy” large-scale&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.08009 The Geometry of Prompting: Unveiling Distinct Mechanisms of Task Adaptation in Language Models]&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.10003 Semantic Structure in Large Language Model Embeddings]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.09782 The Geometry of Reasoning: Flowing Logics in Representation Space]&lt;br /&gt;
* 2025-10: [https://transformer-circuits.pub/2025/linebreaks/index.html When Models Manipulate Manifolds: The Geometry of a Counting Task]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.26745 Deep sequence models tend to memorize geometrically; it is unclear why]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.15029 Symmetry in language statistics shapes the geometry of model representations]&lt;br /&gt;
* 2026-03: [https://arxiv.org/abs/2603.01968 Intrinsic Task Symmetry Drives Generalization in Algorithmic Tasks]&lt;br /&gt;
&lt;br /&gt;
==Topography==&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.16396 TopoNets: High Performing Vision and Language Models with Brain-Like Topography]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2601.06002 The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning]&lt;br /&gt;
&lt;br /&gt;
==Challenges==&lt;br /&gt;
* 2023-07: [https://arxiv.org/abs/2307.13702 Measuring Faithfulness in Chain-of-Thought Reasoning] ([https://x.com/davidad/status/1839641113432305790 roughly] proves that sufficiently large models do not generate CoT that actually captures their internal reasoning)&lt;br /&gt;
&lt;br /&gt;
[[Image:GYe31yXXQAABwaZ.jpeg|300px]]&lt;br /&gt;
&lt;br /&gt;
=Heuristic Understanding=&lt;br /&gt;
* 2022-09: Janus: [https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators Simulators]&lt;br /&gt;
&lt;br /&gt;
==Emergent Internal Model Building==&lt;br /&gt;
* 2023-07: [https://arxiv.org/abs/2307.15936 A Theory for Emergence of Complex Skills in Language Models]&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.19370v1 Emergence of Hidden Capabilities: Exploring Learning Dynamics in Concept Space]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.01622 General agents contain world models]&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.20328 Video models are zero-shot learners and reasoners]&lt;br /&gt;
&lt;br /&gt;
===Semantic Directions===&lt;br /&gt;
Directions, e.g.: f(king)-f(man)+f(woman)=f(queen) or f(sushi)-f(Japan)+f(Italy)=f(pizza)&lt;br /&gt;
* [https://arxiv.org/abs/1301.3781 Efficient Estimation of Word Representations in Vector Space]&lt;br /&gt;
* [https://aclanthology.org/N13-1090/ Linguistic Regularities in Continuous Space Word Representations]&lt;br /&gt;
* [https://aclanthology.org/C16-1332 Word Embeddings, Analogies, and Machine Learning: Beyond king - man + woman = queen]&lt;br /&gt;
* [https://aclanthology.org/D14-1162/ Glove: Global vectors for word representation]&lt;br /&gt;
* [https://doi.org/10.1109/BigData.2015.7364114 Using Word2Vec to process big text data]&lt;br /&gt;
* [https://arxiv.org/abs/2310.06824 The geometry of truth: Emergent linear structure in large language model representations of true/false datasets] (true/false)&lt;br /&gt;
* [https://arxiv.org/abs/2403.10381 Monotonic Representation of Numeric Properties in Language Models] (numeric directions)&lt;br /&gt;
Task vectors:&lt;br /&gt;
* [https://arxiv.org/abs/2310.15213 Function Vectors in Large Language Models]&lt;br /&gt;
* [https://arxiv.org/abs/2310.15916 In-context learning creates task vectors]&lt;br /&gt;
* [https://www.alignmentforum.org/posts/5FGXmJ3wqgGRcbyH7/extracting-sae-task-features-for-in-context-learning Extracting sae task features for in-context learning]&lt;br /&gt;
* [https://arxiv.org/abs/2412.12276 Emergence of Abstractions: Concept Encoding and Decoding Mechanism for In-Context Learning in Transformers]&lt;br /&gt;
Reasoning:&lt;br /&gt;
* [https://openreview.net/forum?id=OwhVWNOBcz Understanding Reasoning in Thinking Language Models via Steering Vectors]&lt;br /&gt;
&lt;br /&gt;
===Feature Geometry Reproduces Problem-space===&lt;br /&gt;
* [https://arxiv.org/abs/2210.13382 Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task] (Othello)&lt;br /&gt;
* [https://arxiv.org/abs/2309.00941 Emergent linear representations in world models of self-supervised sequence models] (Othello)&lt;br /&gt;
* [https://arxiv.org/abs/2211.15661 What learning algorithm is in-context learning? Investigations with linear models]&lt;br /&gt;
* [https://doi.org/10.1038/s41562-023-01659-w Emergent analogical reasoning in large language models]&lt;br /&gt;
* [https://arxiv.org/abs/2310.02207 Language Models Represent Space and Time] (Maps of world, US)&lt;br /&gt;
* [https://arxiv.org/abs/2405.14860 Not All Language Model Features Are Linear] (Days of week form ring, etc.)&lt;br /&gt;
* [https://arxiv.org/abs/2406.03689 Evaluating the World Model Implicit in a Generative Model] (Map of Manhattan)&lt;br /&gt;
* [https://iopscience.iop.org/article/10.1088/1748-9326/ad2891 Reliable precipitation nowcasting using probabilistic diffusion models]. Generated precipitation maps are predictive of actual future weather, implying the model has learned a scientifically relevant weather model.&lt;br /&gt;
* [https://arxiv.org/abs/2405.07987 The Platonic Representation Hypothesis]: Different models (including across modalities) are converging to a consistent world model.&lt;br /&gt;
* [https://arxiv.org/abs/2501.00070 ICLR: In-Context Learning of Representations]&lt;br /&gt;
* [https://arxiv.org/abs/2502.00873 Language Models Use Trigonometry to Do Addition]: Numbers arranged in helix to enable addition&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.15029 Symmetry in language statistics shapes the geometry of model representations]&lt;br /&gt;
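The helix-addition result above (&amp;quot;Language Models Use Trigonometry to Do Addition&amp;quot;) can be illustrated with a toy numerical sketch. Assumptions: the periods below follow the Fourier features reported in the paper; the brute-force decode step is purely for illustration and is not the paper's probing method.&lt;br /&gt;

```python
import math

# Toy sketch: represent an integer a as a point on several circles, one per
# period T, at angle 2*pi*a/T. Addition then becomes rotation, via
#   cos(x+y) = cos(x)cos(y) - sin(x)sin(y)
#   sin(x+y) = sin(x)cos(y) + cos(x)sin(y)
PERIODS = [2, 5, 10, 100]  # assumed Fourier periods, per the paper

def encode(a):
    return [(math.cos(2 * math.pi * a / T), math.sin(2 * math.pi * a / T))
            for T in PERIODS]

def rotate_add(enc_a, enc_b):
    # compose rotations: angles add on every circle simultaneously
    return [(ca * cb - sa * sb, sa * cb + ca * sb)
            for (ca, sa), (cb, sb) in zip(enc_a, enc_b)]

def decode(enc, max_n=99):
    # brute force: return the integer whose encoding is nearest
    def dist(n):
        return sum((c1 - c2) ** 2 + (s1 - s2) ** 2
                   for (c1, s1), (c2, s2) in zip(encode(n), enc))
    return min(range(max_n + 1), key=dist)

assert decode(rotate_add(encode(37), encode(45))) == 82
```

Because the angles add on every circle at once, the sum is recovered without any carry logic, which is the appeal of the helical representation.&lt;br /&gt;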
&lt;br /&gt;
===Capturing Physics===&lt;br /&gt;
* 2020-09: [https://arxiv.org/abs/2009.08292 Learning to Identify Physical Parameters from Video Using Differentiable Physics]&lt;br /&gt;
* 2022-07: [https://arxiv.org/abs/2207.00419 Self-Supervised Learning for Videos: A Survey]&lt;br /&gt;
* 2025-02: FAIR at Meta: [https://arxiv.org/abs/2502.11831 Intuitive physics understanding emerges from self-supervised pretraining on natural videos]&lt;br /&gt;
&lt;br /&gt;
===Theory of Mind===&lt;br /&gt;
* [https://arxiv.org/abs/2302.02083 Evaluating Large Language Models in Theory of Mind Tasks]&lt;br /&gt;
* [https://arxiv.org/abs/2410.13787 Looking Inward: Language Models Can Learn About Themselves by Introspection]&lt;br /&gt;
* [https://arxiv.org/abs/2501.11120 Tell me about yourself: LLMs are aware of their learned behaviors]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2509.22887 Infusing Theory of Mind into Socially Intelligent LLM Agents]&lt;br /&gt;
&lt;br /&gt;
===Skeptical===&lt;br /&gt;
* 2025-01: [https://www.arxiv.org/abs/2501.09038 Do generative video models learn physical principles from watching videos?] ([https://physics-iq.github.io/ project], [https://github.com/google-deepmind/physics-IQ-benchmark code])&lt;br /&gt;
* 2025-06: [https://machinelearning.apple.com/research/illusion-of-thinking The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.21521 Potemkin Understanding in Large Language Models]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.21876 Do Vision-Language Models Have Internal World Models? Towards an Atomic Evaluation]&lt;br /&gt;
&lt;br /&gt;
==Information Processing==&lt;br /&gt;
* 2019-03: [https://arxiv.org/abs/1903.05789 Diagnosing and Enhancing VAE Models]&lt;br /&gt;
* 2021-03: [https://arxiv.org/abs/2103.05247 Pretrained Transformers as Universal Computation Engines]&lt;br /&gt;
* 2022-10: [https://arxiv.org/abs/2210.08344 How Mask Matters: Towards Theoretical Understandings of Masked Autoencoders]&lt;br /&gt;
* 2023-04: [https://arxiv.org/abs/2304.03843 Why think step by step? Reasoning emerges from the locality of experience]&lt;br /&gt;
* 2023-10: [https://arxiv.org/abs/2310.04444 What&amp;#039;s the Magic Word? A Control Theory of LLM Prompting]&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.12875 Chain of Thought Empowers Transformers to Solve Inherently Serial Problems]: Proves that transformers can solve any problem computable by polynomial-size circuits, provided they can generate sufficiently many intermediate (chain-of-thought) tokens&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.20311 Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process]&lt;br /&gt;
** Models learn genuine reasoning skills (they are not merely memorizing solution templates), and can mentally generate simple short plans (like humans).&lt;br /&gt;
** When presented with facts, models develop an internal understanding of which parameters (recursively) depend on one another. This occurs even before an explicit question is asked (i.e. before the task is defined), and appears to differ from human reasoning.&lt;br /&gt;
** Model depth matters for reasoning. This cannot be mitigated by chain-of-thought prompting (which allows models to develop and then execute plans), since even a single CoT step may require deep, multi-step reasoning/planning.&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.01992 Ask, and it shall be given: Turing completeness of prompting]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.08775 Layers at Similar Depths Generate Similar Activations Across LLM Architectures]&lt;br /&gt;
&lt;br /&gt;
===Generalization===&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.14546 Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data]&lt;br /&gt;
&lt;br /&gt;
===Grokking===&lt;br /&gt;
* 2022-01: [https://arxiv.org/abs/2201.02177 Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets]&lt;br /&gt;
* 2022-05: [https://arxiv.org/abs/2205.10343 Towards Understanding Grokking: An Effective Theory of Representation Learning]&lt;br /&gt;
* 2024-01: [https://arxiv.org/abs/2401.10463 Critical Data Size of Language Models from a Grokking Perspective]&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.15175 Unified View of Grokking, Double Descent and Emergent Abilities: A Perspective from Circuits Competition]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.18624 How to explain grokking]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.09810 The Complexity Dynamics of Grokking]&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.21519 Provable Scaling Laws of Feature Emergence from Learning Dynamics of Grokking]&lt;br /&gt;
* 2026-03: [https://arxiv.org/abs/2603.01968 Intrinsic Task Symmetry Drives Generalization in Algorithmic Tasks]&lt;br /&gt;
&lt;br /&gt;
===Tests of Resilience to Dropouts/etc.===&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.15390 Explorations of Self-Repair in Language Models]&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.15786 What Matters in Transformers? Not All Attention is Needed]&lt;br /&gt;
** Removing entire transformer blocks leads to significant performance degradation&lt;br /&gt;
** Removing MLP layers results in significant performance degradation&lt;br /&gt;
** Removing attention layers causes almost no performance degradation&lt;br /&gt;
** E.g. deleting half of the attention layers (yielding a 48% speed-up) leads to only a 2.4% decrease in benchmark performance&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.19384 The Remarkable Robustness of LLMs: Stages of Inference?]&lt;br /&gt;
** They intentionally break the network (swapping layers), yet it continues to work remarkably well. This suggests LLMs are quite robust, and this robustness allows the authors to identify distinct stages of processing.&lt;br /&gt;
** They also use these interventions to infer what different layers are doing. They break apart the LLM transformer layers into four stages:&lt;br /&gt;
*** &amp;#039;&amp;#039;&amp;#039;Detokenization:&amp;#039;&amp;#039;&amp;#039; Raw tokens are converted into meaningful entities that take into account local context (especially using nearby tokens).&lt;br /&gt;
*** &amp;#039;&amp;#039;&amp;#039;Feature engineering:&amp;#039;&amp;#039;&amp;#039; Features are progressively refined. Factual knowledge is leveraged.&lt;br /&gt;
*** &amp;#039;&amp;#039;&amp;#039;Prediction ensembling:&amp;#039;&amp;#039;&amp;#039; Predictions (for the ultimately-selected next-token) emerge. A sort of consensus voting is used, with “prediction neurons” and &amp;quot;suppression neurons&amp;quot; playing a major role in upvoting/downvoting.&lt;br /&gt;
*** &amp;#039;&amp;#039;&amp;#039;Residual sharpening:&amp;#039;&amp;#039;&amp;#039; The semantic representations are collapsed into specific next-token predictions. There is a strong emphasis on suppression neurons eliminating options. The confidence is calibrated.&lt;br /&gt;
** This structure can be thought of as two halves (being roughly dual to each other): the first half broadens (goes from distinct tokens to a rich/elaborate concept-space) and the second half collapses (goes from rich concepts to concrete token predictions).&lt;br /&gt;
&lt;br /&gt;
==Semantic Vectors==&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.11717 Refusal in Language Models Is Mediated by a Single Direction]&lt;br /&gt;
* 2025-02: [https://martins1612.github.io/emergent_misalignment_betley.pdf Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs] ([https://x.com/OwainEvans_UK/status/1894436637054214509 demonstrates] [https://x.com/ESYudkowsky/status/1894453376215388644 entangling] of concepts into a single preference vector)&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.03666 Analogical Reasoning Inside Large Language Models: Concept Vectors and the Limits of Abstraction]&lt;br /&gt;
&lt;br /&gt;
==Other==&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.00247 Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting &amp;amp; Beyond]&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.04282 Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding] ([https://github.com/SalesforceAIResearch/LaTRO code])&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.12580 Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models]: LLMs learn reasoning by extracting procedures from training data, not by memorizing specific answers&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.15862 LLMs Do Not Think Step-by-step In Implicit Reasoning]&lt;br /&gt;
&lt;br /&gt;
===Scaling Laws===&lt;br /&gt;
* 1993: [https://proceedings.neurips.cc/paper/1993/file/1aa48fc4880bb0c9b8a3bf979d3b917e-Paper.pdf Learning Curves: Asymptotic Values and Rate of Convergence]&lt;br /&gt;
* 2017-12: [https://arxiv.org/abs/1712.00409 Deep Learning Scaling is Predictable, Empirically] (Baidu)&lt;br /&gt;
* 2019-03: [http://www.incompleteideas.net/IncIdeas/BitterLesson.html The Bitter Lesson] (Rich Sutton)&lt;br /&gt;
* 2020-01: [https://arxiv.org/abs/2001.08361 Scaling Laws for Neural Language Models] (OpenAI)&lt;br /&gt;
* 2020-05: [https://gwern.net/scaling-hypothesis The Scaling Hypothesis] (Gwern)&lt;br /&gt;
* 2020-10: [https://arxiv.org/abs/2010.14701 Scaling Laws for Autoregressive Generative Modeling] (OpenAI)&lt;br /&gt;
* 2021-02: [https://arxiv.org/abs/2102.06701 Explaining Neural Scaling Laws] (Google DeepMind)&lt;br /&gt;
* 2021-08: [https://arxiv.org/abs/2108.07686 Scaling Laws for Deep Learning]&lt;br /&gt;
* 2022-03: [https://arxiv.org/abs/2203.15556 Training Compute-Optimal Large Language Models] (Chinchilla, Google DeepMind)&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.04715 Predictable Scale: Part I -- Optimal Hyperparameter Scaling Law in Large Language Model Pretraining]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.10061 Compute Optimal Scaling of Skills: Knowledge vs Reasoning]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.07951 Scaling Laws for Native Multimodal Models]&lt;br /&gt;
* 2025-05: [https://brendel-group.github.io/llm-line/ LLMs on the Line: Data Determines Loss-To-Loss Scaling Laws]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.13786 The Art of Scaling Reinforcement Learning Compute for LLMs]&lt;br /&gt;
&lt;br /&gt;
=Information Processing/Storage=&lt;br /&gt;
* 2020-02: [https://arxiv.org/abs/2002.10689 A Theory of Usable Information Under Computational Constraints]&lt;br /&gt;
* 2021-04: [https://arxiv.org/abs/2104.00008 Why is AI hard and Physics simple?]&lt;br /&gt;
* 2021-06: [https://arxiv.org/abs/2106.06981 Thinking Like Transformers]&lt;br /&gt;
* 2023-05: [https://arxiv.org/abs/2305.00948 Large Linguistic Models: Investigating LLMs&amp;#039; metalinguistic abilities]&lt;br /&gt;
* &amp;quot;A transformer&amp;#039;s depth affects its reasoning capabilities, whilst model size affects its knowledge capacity&amp;quot; ([https://x.com/danielhanchen/status/1835684061475655967 c.f.])&lt;br /&gt;
** 2024-02: [https://arxiv.org/abs/2402.14905 MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases]&lt;br /&gt;
** 2024-04: [https://arxiv.org/abs/2404.08819 The Illusion of State in State-Space Models] (figure 3)&lt;br /&gt;
** 2024-08: [https://arxiv.org/abs/2408.00118 Gemma 2: Improving Open Language Models at a Practical Size] (table 9)&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.10482 Schrodinger&amp;#039;s Memory: Large Language Models]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2407.01687 Deciphering the Factors Influencing the Efficacy of Chain-of-Thought: Probability, Memorization, and Noisy Reasoning]. CoT involves both memorization and (probabilistic) reasoning&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.16679 Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?]&lt;br /&gt;
* 2025-03: [https://www.arxiv.org/abs/2503.03961 A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers]&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.22471 The Bayesian Geometry of Transformer Attention]&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.03220 From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence]&lt;br /&gt;
&lt;br /&gt;
==Statistics/Math==&lt;br /&gt;
* 2023-05: [https://arxiv.org/abs/2305.05465 The emergence of clusters in self-attention dynamics]&lt;br /&gt;
* 2023-12: [https://arxiv.org/abs/2312.10794 A mathematical perspective on Transformers]&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.12034 Understanding Transformers via N-gram Statistics]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.06833 Dynamic metastability in the self-attention model]&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.04551 Measure-to-measure interpolation using Transformers]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.14697 Quantitative Clustering in Mean-Field Transformer Models]&lt;br /&gt;
&lt;br /&gt;
==Tokenization==&lt;br /&gt;
===For numbers/math===&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.14903 Tokenization counts: the impact of tokenization on arithmetic in frontier LLMs]: left-to-right (L2R) vs. right-to-left (R2L) digit grouping yields different performance on math&lt;br /&gt;
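The L2R vs. R2L distinction above can be made concrete with a toy digit chunker (a hypothetical helper, not the paper's actual tokenizer): R2L grouping aligns chunks with place value, the way thousands separators do, while L2R grouping does not.&lt;br /&gt;

```python
def chunk_digits(s, size=3, right_to_left=True):
    """Split a digit string into chunks of the given size, mimicking how a
    tokenizer might group digits. R2L grouping puts the short chunk on the
    left, so each chunk aligns with place value; L2R does the opposite."""
    if right_to_left:
        # walk from the end so the leftmost chunk is the short one
        chunks = []
        i = len(s)
        while i != 0:
            start = max(0, i - size)
            chunks.append(s[start:i])
            i = start
        return list(reversed(chunks))
    return [s[i:i + size] for i in range(0, len(s), size)]

assert chunk_digits("1234567", right_to_left=False) == ["123", "456", "7"]
assert chunk_digits("1234567", right_to_left=True) == ["1", "234", "567"]
```

Under R2L grouping, &amp;quot;234&amp;quot; always means thousands and &amp;quot;567&amp;quot; always means units, which plausibly makes digit-wise arithmetic easier to learn.&lt;br /&gt;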
&lt;br /&gt;
==Data Storage==&lt;br /&gt;
* 1988-09: [https://www.sciencedirect.com/science/article/pii/0885064X88900209 On the capabilities of multilayer perceptrons]&lt;br /&gt;
* 2006-12: [https://ieeexplore.ieee.org/document/4038449 Geometrical and Statistical Properties of Systems of Linear Inequalities with Applications in Pattern Recognition] (single-layer perceptron stores &amp;gt;2 bits/parameter; MLP ~ 2*N&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt; bits w/ N&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt; params)&lt;br /&gt;
* 2016-11: [https://arxiv.org/abs/1611.09913 Capacity and Trainability in Recurrent Neural Networks] (5 bits/param)&lt;br /&gt;
* 2018-02: [https://arxiv.org/abs/1802.08232 The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks]&lt;br /&gt;
* 2019-05: [https://ieeexplore.ieee.org/document/8682462 Memorization Capacity of Deep Neural Networks under Parameter Quantization]&lt;br /&gt;
* 2020-02: [https://arxiv.org/abs/2002.08910 How Much Knowledge Can You Pack Into the Parameters of a Language Model?]&lt;br /&gt;
* 2020-08: [https://arxiv.org/abs/2008.09036 Language Models as Knowledge Bases: On Entity Representations, Storage Capacity, and Paraphrased Queries] (capacity scales linearly with parameters; more training samples leads to less memorization)&lt;br /&gt;
* 2020-12: [https://arxiv.org/abs/2012.06421 When is Memorization of Irrelevant Training Data Necessary for High-Accuracy Learning?]&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.05405 Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws] (2 bits/param)&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.15720 Scaling Laws for Fact Memorization of Large Language Models] (1T params needed to memorize Wikipedia)&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.09810 The Complexity Dynamics of Grokking]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.24832 How much do language models memorize?] (3.6 bits/parameter)&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.01855 Trade-offs in Data Memorization via Strong Data Processing Inequalities]&lt;br /&gt;
&lt;br /&gt;
===Reverse-Engineering Training Data===&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.10364 Can We Infer Confidential Properties of Training Data from LLMs?]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.15553 Approximating Language Model Training Data from Weights]&lt;br /&gt;
&lt;br /&gt;
===Compression===&lt;br /&gt;
* 2022-12: [https://arxiv.org/abs/2212.09410 Less is More: Parameter-Free Text Classification with Gzip]&lt;br /&gt;
* 2023-06: [https://arxiv.org/abs/2306.04050 LLMZip: Lossless Text Compression using Large Language Models]&lt;br /&gt;
* 2023-07: [https://aclanthology.org/2023.findings-acl.426/ “Low-Resource” Text Classification: A Parameter-Free Classification Method with Compressors]&lt;br /&gt;
* 2023-09: [https://arxiv.org/abs/2309.10668 Language Modeling Is Compression]&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.07550 An Image is Worth 32 Tokens for Reconstruction and Generation]&lt;br /&gt;
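The compression-based classification idea above (gzip plus normalized compression distance plus k-NN) is short enough to sketch directly. This is a minimal illustration of the general technique with made-up example texts, not the papers' exact experimental setup:&lt;br /&gt;

```python
import gzip

def ncd(x, y):
    """Normalized compression distance: texts that share structure
    compress better together than apart."""
    cx = len(gzip.compress(x.encode()))
    cy = len(gzip.compress(y.encode()))
    cxy = len(gzip.compress((x + " " + y).encode()))
    return (cxy - min(cx, cy)) / max(cx, cy)

def classify(query, labeled_examples, k=3):
    """k-NN over NCD: predict the majority label among the k training
    texts closest to the query under compression distance."""
    ranked = sorted(labeled_examples, key=lambda pair: ncd(query, pair[0]))
    top = [label for _, label in ranked[:k]]
    return max(set(top), key=top.count)

# made-up toy training set
train = [
    ("the cat sat on the mat and purred", "animals"),
    ("dogs bark loudly at the postman", "animals"),
    ("the stock market fell sharply today", "finance"),
    ("investors bought shares after earnings", "finance"),
]
print(classify("a kitten purred on the warm mat", train, k=1))
```

No training and no parameters: all of the &amp;quot;modeling&amp;quot; is done by the compressor's ability to exploit shared substrings, which is the sense in which language modeling and compression are two views of the same task.&lt;br /&gt;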
&lt;br /&gt;
==Learning/Training==&lt;br /&gt;
* 2018-03: [https://arxiv.org/abs/1803.03635 The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks]: Sparse neural networks can match dense performance, but identifying the right sparse architecture and training it directly is difficult. Training a dense network effectively searches over many sparse subnetworks at once, making it easier for an internal sparse circuit well-suited to the problem to emerge.&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.11521 On the Ability of Deep Networks to Learn Symmetries from Data: A Neural Kernel Theory]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.12391 Physics of Skill Learning]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.24864 ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models]&lt;br /&gt;
&lt;br /&gt;
===Cross-modal knowledge transfer===&lt;br /&gt;
* 2022-03: [https://arxiv.org/abs/2203.07519 Leveraging Visual Knowledge in Language Tasks: An Empirical Study on Intermediate Pre-training for Cross-modal Knowledge Transfer]&lt;br /&gt;
* 2023-05: [https://arxiv.org/abs/2305.07358 Towards Versatile and Efficient Visual Knowledge Integration into Pre-trained Language Models with Cross-Modal Adapters]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.06755 Sparse Autoencoders for Scientifically Rigorous Interpretation of Vision Models]: CLIP learns a richer set of aggregated representations (e.g. for a culture or country) than a vision-only model.&lt;br /&gt;
&lt;br /&gt;
==Hidden State==&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.06258 Emergent Response Planning in LLM]: They show that the latent representation contains information beyond that needed for the next token (i.e. the model learns to &amp;quot;plan ahead&amp;quot; and encode information relevant to future tokens)&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.02854 (How) Do Language Models Track State?]&lt;br /&gt;
===Convergent Representation===&lt;br /&gt;
* 2015-11: [https://arxiv.org/abs/1511.07543 Convergent Learning: Do different neural networks learn the same representations?]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.12540 Harnessing the Universal Geometry of Embeddings]: Evidence for [https://x.com/jxmnop/status/1925224620166128039 The Strong Platonic Representation Hypothesis]; models converge to a single consensus reality&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.03750 Universally Converging Representations of Matter Across Scientific Foundation Models]&lt;br /&gt;
&lt;br /&gt;
==Function Approximation==&lt;br /&gt;
* 2022-08: [https://arxiv.org/abs/2208.01066 What Can Transformers Learn In-Context? A Case Study of Simple Function Classes]: can learn linear functions (equivalent to least-squares estimator)&lt;br /&gt;
* 2022-11: [https://arxiv.org/abs/2211.09066 Teaching Algorithmic Reasoning via In-context Learning]: Simple arithmetic &lt;br /&gt;
* 2022-11: [https://arxiv.org/abs/2211.15661 What learning algorithm is in-context learning? Investigations with linear models] ([https://github.com/ekinakyurek/google-research/tree/master/incontext code]): can learn linear regression&lt;br /&gt;
* 2022-12: [https://arxiv.org/abs/2212.07677 Transformers learn in-context by gradient descent]&lt;br /&gt;
* 2023-06: [https://arxiv.org/abs/2306.00297 Transformers learn to implement preconditioned gradient descent for in-context learning]&lt;br /&gt;
* 2023-07: [https://arxiv.org/abs/2307.03576 One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention]&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.02893 ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.20545 SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.21212 Transformers Learn to Implement Multi-step Gradient Descent with Chain of Thought]&lt;br /&gt;
&lt;br /&gt;
=Physics Based=&lt;br /&gt;
* 2014-01: [https://arxiv.org/abs/1401.1219 Consciousness as a State of Matter]&lt;br /&gt;
* 2016-08: [https://arxiv.org/abs/1608.08225 Why does deep and cheap learning work so well?]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.23489 SGD as Free Energy Minimization: A Thermodynamic View on Neural Network Training]&lt;br /&gt;
* 2025-12: [https://www.pnas.org/doi/full/10.1073/pnas.2523012122 Heavy-tailed update distributions arise from information-driven self-organization in nonequilibrium learning]&lt;br /&gt;
&lt;br /&gt;
=Failure Modes=&lt;br /&gt;
* 2023-06: [https://arxiv.org/abs/2306.05836 Can Large Language Models Infer Causation from Correlation?]: Poor causal inference&lt;br /&gt;
* 2023-09: [https://arxiv.org/abs/2309.12288 The Reversal Curse: LLMs trained on &amp;quot;A is B&amp;quot; fail to learn &amp;quot;B is A&amp;quot;]&lt;br /&gt;
* 2023-09: [https://arxiv.org/abs/2309.13638 Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve] (biases towards &amp;quot;common&amp;quot; numbers, in-context CoT can reduce performance by incorrectly priming, etc.)&lt;br /&gt;
* 2023-11: [https://arxiv.org/abs/2311.16093 Visual cognition in multimodal large language models] (models lack human-like visual understanding)&lt;br /&gt;
&lt;br /&gt;
==Adversarial==&lt;br /&gt;
* 2026-03: [https://arxiv.org/abs/2603.03507 Solving adversarial examples requires solving exponential misalignment]&lt;br /&gt;
&lt;br /&gt;
==Fractured Representation==&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.11581 Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis] ([https://github.com/akarshkumar0101/fer code])&lt;br /&gt;
&lt;br /&gt;
==Jagged Frontier==&lt;br /&gt;
* 2023-09: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4573321 Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality]&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.03211 How Does Quantization Affect Multilingual LLMs?]: Quantization degrades different languages by differing amounts&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.10061v1 Compute Optimal Scaling of Skills: Knowledge vs Reasoning]: Scaling laws are skill-dependent&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.18212 A Definition of AGI]&lt;br /&gt;
&lt;br /&gt;
===See also===&lt;br /&gt;
* [[AI_understanding|AI Understanding]] &amp;gt; [[AI_understanding#Psychology|Psychology]] &amp;gt; [[AI_understanding#LLM_personalities|LLM personalities]]&lt;br /&gt;
* [[AI tricks]] &amp;gt; [[AI_tricks#Prompt_Engineering|Prompt Engineering]] &amp;gt; [[AI_tricks#Brittleness|Brittleness]]&lt;br /&gt;
&lt;br /&gt;
===Conversely (AI models converge)===&lt;br /&gt;
* 2025-12: [https://www.arxiv.org/abs/2512.03750 Universally Converging Representations of Matter Across Scientific Foundation Models]&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.05117 The Universal Weight Subspace Hypothesis]&lt;br /&gt;
* 2026-01: [https://avikrishna.substack.com/p/eliciting-frontier-model-character Eliciting Frontier Model Character Training: A study of personality convergence across language models]&lt;br /&gt;
&lt;br /&gt;
==Model Collapse==&lt;br /&gt;
* 2023-05: [https://arxiv.org/abs/2305.17493 The Curse of Recursion: Training on Generated Data Makes Models Forget]&lt;br /&gt;
* 2023-07: [https://arxiv.org/abs/2307.01850 Self-Consuming Generative Models Go MAD]&lt;br /&gt;
* 2023-10: [https://arxiv.org/abs/2310.00429 On the Stability of Iterative Retraining of Generative Models on their own Data]&lt;br /&gt;
* 2023-11: [https://arxiv.org/abs/2311.12202 Nepotistically Trained Generative-AI Models Collapse]&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.03502 AI and the Problem of Knowledge Collapse]&lt;br /&gt;
* 2024-07: [https://www.nature.com/articles/s41586-024-07566-y AI models collapse when trained on recursively generated data]&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.05280 On the Limits of Self-Improving in LLMs and Why AGI, ASI and the Singularity Are Not Near Without Symbolic Model Synthesis]&lt;br /&gt;
&lt;br /&gt;
===Analysis===&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.04376 Scaling laws for learning with real and surrogate data]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.17646 Rate of Model Collapse in Recursive Training]&lt;br /&gt;
&lt;br /&gt;
===Mitigation===&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.07712 Model Collapse Demystified: The Case of Regression]&lt;br /&gt;
* 2024-03: [https://arxiv.org/abs/2403.04706 Common 7B Language Models Already Possess Strong Math Capabilities]&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.01413 Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data]&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.07515 Beyond Model Collapse: Scaling Up with Synthesized Data Requires Verification]&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.01490 LLM See, LLM Do: Guiding Data Generation to Target Non-Differentiable Objectives]&lt;br /&gt;
* 2024-08: [https://arxiv.org/abs/2408.14960 Multilingual Arbitrage: Optimizing Data Pools to Accelerate Multilingual Progress]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.08117 Convergence Dynamics and Stabilization Strategies of Co-Evolving Generative Models]&lt;br /&gt;
&lt;br /&gt;
=Psychology=&lt;br /&gt;
* 2023-04: [https://arxiv.org/abs/2304.11111 Inducing anxiety in large language models can induce bias]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.17117 From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning]&lt;br /&gt;
* 2025-07: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5357179 Call Me A Jerk: Persuading AI to Comply with Objectionable Requests]&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.06047 &amp;quot;They parted illusions -- they parted disclaim marinade&amp;quot;: Misalignment as structural fidelity in LLMs]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.02606 Gender Dynamics and Homophily in a Social Network of LLM Agents]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.01689 What LLMs Think When You Don&amp;#039;t Tell Them What to Think About?]&lt;br /&gt;
* 2026-03: [https://truthful.ai/consciousness_cluster.pdf The Consciousness Cluster: Preferences of Models that Claim to be Conscious]&lt;br /&gt;
* 2026-04: [https://www.anthropic.com/research/emotion-concepts-function Emotion concepts and their function in a large language model]&lt;br /&gt;
&lt;br /&gt;
==Persona Simulator Theory==&lt;br /&gt;
* 2022-09: [https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators Simulators] ([https://www.lesswrong.com/users/janus-1?from=post_header janus])&lt;br /&gt;
* 2022-12: [https://aclanthology.org/2022.findings-emnlp.423/ Language Models as Agent Models]&lt;br /&gt;
* 2023-02: [https://arxiv.org/abs/2302.00805 Conditioning Predictive Models: Risks and Strategies]&lt;br /&gt;
* 2024-09: [https://www.lesswrong.com/s/qhdHbCJ3PYesL9dde Intuitive Self-Models]&lt;br /&gt;
* 2026-02: [https://alignment.anthropic.com/2026/psm/ The Persona Selection Model: Why AI Assistants might Behave like Humans] (Anthropic, [https://www.anthropic.com/research/persona-selection-model blog])&lt;br /&gt;
&lt;br /&gt;
==Allow LLM to think==&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.11536 Let your LLM generate a few tokens and you will reduce the need for retrieval]&lt;br /&gt;
&lt;br /&gt;
===In-context Learning===&lt;br /&gt;
* 2021-10: [https://arxiv.org/abs/2110.15943 MetaICL: Learning to Learn In Context]&lt;br /&gt;
* 2022-02: [https://arxiv.org/abs/2202.12837 Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?]&lt;br /&gt;
* 2022-08: [https://arxiv.org/abs/2208.01066 What Can Transformers Learn In-Context? A Case Study of Simple Function Classes]&lt;br /&gt;
* 2022-11: [https://arxiv.org/abs/2211.15661 What learning algorithm is in-context learning? Investigations with linear models]&lt;br /&gt;
* 2022-12: [https://arxiv.org/abs/2212.07677 Transformers learn in-context by gradient descent]&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.16003 Learning without training: The implicit dynamics of in-context learning]&lt;br /&gt;
&lt;br /&gt;
==Reasoning (CoT, etc.)==&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.18009 Large Language Models Think Too Fast To Explore Effectively]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.18585 Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.08156 Are DeepSeek R1 And Other Reasoning Models More Faithful?]: finds that reasoning models explain their answers more faithfully than non-reasoning models&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.08679 Chain-of-Thought Reasoning In The Wild Is Not Always Faithful]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.04022 Rethinking Reflection in Pre-Training]: pre-training alone already provides some amount of reflection/reasoning&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2501.18858 BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning]&lt;br /&gt;
&lt;br /&gt;
===Pathfinding===&lt;br /&gt;
* 2024-08: [https://arxiv.org/abs/2408.08152 DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.01939 Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning]&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.09284 Tree-OPO: Off-policy Monte Carlo Tree-Guided Advantage Optimization for Multistep Reasoning]&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.06160v1 Reverse-Engineered Reasoning for Open-Ended Generation]&lt;br /&gt;
&lt;br /&gt;
===Skeptical===&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.06941 The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity]&lt;br /&gt;
* 2025-08: [https://www.arxiv.org/abs/2508.01191 Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens]&lt;br /&gt;
&lt;br /&gt;
==Self-Awareness and Self-Recognition and Introspection==&lt;br /&gt;
* 2022-07: [https://arxiv.org/abs/2207.05221 Language Models (Mostly) Know What They Know]&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.13076 LLM Evaluators Recognize and Favor Their Own Generations]&lt;br /&gt;
* 2024-09: [https://situational-awareness-dataset.org/ Me, Myself and AI: The Situational Awareness Dataset for LLMs]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.13787 Looking Inward: Language Models Can Learn About Themselves by Introspection]&lt;br /&gt;
* 2024-12: [https://theaidigest.org/self-awareness AIs are becoming more self-aware. Here&amp;#039;s why that matters]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.11120 Tell me about yourself: LLMs are aware of their learned behaviors]&lt;br /&gt;
* 2025-04: [https://x.com/Josikinz/status/1907923319866716629 LLMs can guess which comic strip was generated by themselves (vs. other LLM)]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.13763 Language Models Are Capable of Metacognitive Monitoring and Control of Their Internal Activations]&lt;br /&gt;
* 2025-10: [https://transformer-circuits.pub/2025/introspection/index.html Emergent Introspective Awareness in Large Language Models] (Anthropic, [https://www.anthropic.com/research/introspection blog])&lt;br /&gt;
* 2025-12: [https://www.arxiv.org/abs/2512.24661 Do Large Language Models Know What They Are Capable Of?]&lt;br /&gt;
&lt;br /&gt;
==LLM personalities==&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.02618 Strategic Intelligence in Large Language Models: Evidence from evolutionary Game Theory]&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.04343 Psychologically Enhanced AI Agents]&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.10387 The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models]&lt;br /&gt;
&lt;br /&gt;
==Quirks &amp;amp; Biases==&lt;br /&gt;
* 2025-04: [https://www.cambridge.org/core/journals/judgment-and-decision-making/article/artificial-intelligence-and-dichotomania/0421D2310727D73FAB47069FD1620AA1 Artificial intelligence and dichotomania]&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.22818 Can Large Language Models Develop Gambling Addiction?]&lt;br /&gt;
&lt;br /&gt;
=Vision Models=&lt;br /&gt;
* 2017-11: Distill: [https://distill.pub/2017/feature-visualization/ Feature Visualization: How neural networks build up their understanding of images]&lt;br /&gt;
* 2021-01: [https://arxiv.org/abs/2101.12322 Position, Padding and Predictions: A Deeper Look at Position Information in CNNs]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.13181 Perception Encoder: The best visual embeddings are not at the output of the network] ([https://github.com/facebookresearch/perception_models code])&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
* [[AI]]&lt;br /&gt;
* [[AI tools]]&lt;br /&gt;
* [[AI agents]]&lt;br /&gt;
* [[Robots]]&lt;/div&gt;</summary>
		<author><name>KevinYager</name></author>
		
	</entry>
	<entry>
		<id>http://gisaxs.com/index.php?title=AI_and_Humans&amp;diff=8772</id>
		<title>AI and Humans</title>
		<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php?title=AI_and_Humans&amp;diff=8772"/>
		<updated>2026-04-02T15:31:07Z</updated>

		<summary type="html">&lt;p&gt;KevinYager: /* AI Persuasion of Humans */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
=AI in Education=&lt;br /&gt;
==Survey/study of==&lt;br /&gt;
* 2023-08: [https://www.nature.com/articles/s41598-023-38964-3 Perception, performance, and detectability of conversational artificial intelligence across 32 university courses]&lt;br /&gt;
* 2023-10: [https://www.bbc.com/worklife/article/20231017-the-employees-secretly-using-ai-at-work The employees secretly using AI at work]&lt;br /&gt;
* 2023-10: [https://www.insidehighered.com/news/tech-innovation/artificial-intelligence/2023/10/31/most-students-outrunning-faculty-ai-use Survey] shows students using AI more than professors.&lt;br /&gt;
* 2023-11: [https://www.nature.com/articles/d41586-023-03507-3 ChatGPT has entered the classroom: how LLMs could transform education]&lt;br /&gt;
* 2025-04: [https://www.anthropic.com/news/anthropic-education-report-how-university-students-use-claude Anthropic Education Report: How University Students Use Claude]&lt;br /&gt;
* 2025-05: [https://www.nature.com/articles/s41599-025-04787-y The effect of ChatGPT on students’ learning performance, learning perception, and higher-order thinking: insights from a meta-analysis]&lt;br /&gt;
&lt;br /&gt;
==AI improves learning/education==&lt;br /&gt;
*  Mollick, Ethan R. and Mollick, Lilach and Bach, Natalie and Ciccarelli, LJ and Przystanski, Ben and Ravipinto, Daniel, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4871171 AI Agents and Education: Simulated Practice at Scale] (June 17, 2024). The Wharton School Research Paper. [http://dx.doi.org/10.2139/ssrn.4871171 doi: 10.2139/ssrn.4871171]&lt;br /&gt;
** Can enable personalized education.&lt;br /&gt;
* [https://arxiv.org/abs/2306.17156 Generative AI for Programming Education: Benchmarking ChatGPT, GPT-4, and Human Tutors]&lt;br /&gt;
** GPT4 can out-perform human tutors.&lt;br /&gt;
*  Keppler, Samantha and Sinchaisri, Wichinpong and Snyder, Clare, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4924786 Backwards Planning with Generative AI: Case Study Evidence from US K12 Teachers] (August 13, 2024). [http://dx.doi.org/10.2139/ssrn.4924786 doi: 10.2139/ssrn.4924786]&lt;br /&gt;
** Teachers benefit from using AI as a co-pilot to aid in tasks (planning, how to teach topic, explore ideas).&lt;br /&gt;
** There is smaller utility in using AI purely as a text-generator (to make quizzes, workbooks, etc.).&lt;br /&gt;
* [https://arxiv.org/abs/2402.09809 Effective and Scalable Math Support: Evidence on the Impact of an AI Tutor on Math Achievement in Ghana]&lt;br /&gt;
* [https://doi.org/10.21203/rs.3.rs-4243877/v1 AI Tutoring Outperforms Active Learning]&lt;br /&gt;
* [https://documents.worldbank.org/en/publication/documents-reports/documentdetail/099548105192529324 From chalkboards to chatbots: Transforming learning in Nigeria, one prompt at a time] ([https://blogs.worldbank.org/en/education/From-chalkboards-to-chatbots-Transforming-learning-in-Nigeria writeup])&lt;br /&gt;
** 6 weeks of after-school AI tutoring = 2 years of typical learning gains&lt;br /&gt;
** outperforms 80% of other educational interventions&lt;br /&gt;
* [https://arxiv.org/abs/2409.09047 AI Meets the Classroom: When Do Large Language Models Harm Learning?]&lt;br /&gt;
** Outcomes depend on usage&lt;br /&gt;
* [https://www.deeplearning.ai/the-batch/gpt-4-boosts-remote-tutors-performance-in-real-time-study-finds/ LLM Support for Tutors: GPT-4 boosts remote tutors’ performance in real time, study finds]&lt;br /&gt;
** [https://arxiv.org/abs/2410.03017 Tutor CoPilot: A Human-AI Approach for Scaling Real-Time Expertise]&lt;br /&gt;
* 2025-06: Gallup &amp;amp; The Walton Foundation: [https://www.gallup.com/file/analytics/691922/Walton-Family-Foundation-Gallup-Teachers-AI-Report.pdf Teaching for Tomorrow Unlocking Six Weeks a Year With AI]&lt;br /&gt;
* 2026-03: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6423358 Effective Personalized AI Tutors via LLM-Guided Reinforcement Learning]&lt;br /&gt;
&lt;br /&gt;
==AI harms learning==&lt;br /&gt;
* [https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0305354 A real-world test of artificial intelligence infiltration of a university examinations system: A “Turing Test” case study]&lt;br /&gt;
** Current grading systems cannot detect AI.&lt;br /&gt;
*  Bastani, Hamsa and Bastani, Osbert and Sungu, Alp and Ge, Haosen and Kabakcı, Özge and Mariman, Rei, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4895486 Generative AI Can Harm Learning] (July 15, 2024). The Wharton School Research Paper. [http://dx.doi.org/10.2139/ssrn.4895486 doi: 10.2139/ssrn.4895486]&lt;br /&gt;
** Access to ChatGPT harmed math education outcomes.&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.09047 AI Meets the Classroom: When Do Large Language Models Harm Learning?]&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.20245 How AI Impacts Skill Formation]&lt;br /&gt;
&lt;br /&gt;
==Software/systems==&lt;br /&gt;
* [https://devpost.com/software/gptutor GPTutor] ([https://github.com/mynamegabe/GPTutor code])&lt;br /&gt;
* [https://arxiv.org/abs/2308.02773 EduChat: A Large-Scale Language Model-based Chatbot System for Intelligent Education]&lt;br /&gt;
* [https://eurekalabs.ai/ Eureka Labs] (founded by [https://en.wikipedia.org/wiki/Andrej_Karpathy Andrej Karpathy]) aims to create AI-driven courses (first course is [https://github.com/karpathy/LLM101n Intro to LLMs])&lt;br /&gt;
&lt;br /&gt;
===LLMs===&lt;br /&gt;
* 2024-12: [https://www.arxiv.org/abs/2412.16429 LearnLM: Improving Gemini for Learning]&lt;br /&gt;
&lt;br /&gt;
===Individual tools===&lt;br /&gt;
* Chatbot (OpenAI [https://chatgpt.com/ ChatGPT], Anthropic [https://www.anthropic.com/claude Claude], Google [https://gemini.google.com/app Gemini])&lt;br /&gt;
* [https://notebooklm.google.com/ NotebookLM]: Enables one to &amp;quot;chat with documents&amp;quot;.&lt;br /&gt;
* Google [https://learning.google.com/experiments/learn-about/signup Learn About]&lt;br /&gt;
&lt;br /&gt;
===Systems===&lt;br /&gt;
* [https://www.anthropic.com/news/introducing-claude-for-education Anthropic] [https://www.anthropic.com/education Claude for Education]&lt;br /&gt;
&lt;br /&gt;
==AI for grading==&lt;br /&gt;
* [https://dl.acm.org/doi/10.1145/3657604.3664693 Can Large Language Models Make the Grade? An Empirical Study Evaluating LLMs Ability To Mark Short Answer Questions in K-12 Education] ([https://arxiv.org/abs/2405.02985 preprint])&lt;br /&gt;
&lt;br /&gt;
==Detection==&lt;br /&gt;
* 2024-06: [https://www.sciencedirect.com/science/article/pii/S2666920X24000109 Do teachers spot AI? Evaluating the detectability of AI-generated texts among student essays]&lt;br /&gt;
** GenAI can simulate student writing in a way that teachers cannot detect.&lt;br /&gt;
** AI essays are assessed more positively than student-written.&lt;br /&gt;
** Teachers are overconfident in their source identification.&lt;br /&gt;
** Both novice and experienced teachers could not identify texts generated by ChatGPT vs. students&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.15654 People who frequently use ChatGPT for writing tasks are accurate and robust detectors of AI-generated text]&lt;br /&gt;
===AI Text Detectors Don&amp;#039;t Work===&lt;br /&gt;
* 2024-05: [https://arxiv.org/abs/2405.07940 RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors]&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2306.15666 Testing of Detection Tools for AI-Generated Text]&lt;br /&gt;
&lt;br /&gt;
=AI/human=&lt;br /&gt;
==Capabilities==&lt;br /&gt;
===Writing===&lt;br /&gt;
&lt;br /&gt;
* 2022-12: [https://aclanthology.org/2022.emnlp-main.296/ Re3: Generating Longer Stories With Recursive Reprompting and Revision]&lt;br /&gt;
* 2023-01: Journalism: [https://journals.sagepub.com/doi/10.1177/10776958221149577 Collaborating With ChatGPT: Considering the Implications of Generative Artificial Intelligence for Journalism and Media Education]&lt;br /&gt;
* 2023-03: English essays: [https://journal.unnes.ac.id/sju/index.php/elt/article/view/64069 Artificial intelligence (AI) technology in OpenAI ChatGPT application: A review of ChatGPT in writing English essay]&lt;br /&gt;
* 2023-07: Science writing: [https://www.rbmojournal.com/article/S1472-6483(23)00219-5/fulltext Artificial intelligence in scientific writing: a friend or a foe?]&lt;br /&gt;
* 2024-02: Wikipedia style: [https://arxiv.org/abs/2402.14207 Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models]&lt;br /&gt;
* 2024-08: [https://arxiv.org/abs/2408.07055 LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs] ([https://github.com/THUDM/LongWriter code])&lt;br /&gt;
* 2024-08: Scientific papers: [https://arxiv.org/abs/2408.06292 The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery]&lt;br /&gt;
* 2024-09: PaperQA2: [https://paper.wikicrow.ai/ Language Models Achieve Superhuman Synthesis of Scientific Knowledge] ([https://x.com/SGRodriques/status/1833908643856818443 𝕏 post], [https://github.com/Future-House/paper-qa code])&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.19065 WikiAutoGen: Towards Multi-Modal Wikipedia-Style Article Generation]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.22828 Learning to Reason for Long-Form Story Generation]&lt;br /&gt;
&lt;br /&gt;
==AI out-performs humans==&lt;br /&gt;
===Tests===&lt;br /&gt;
* 2023-07: [https://arxiv.org/abs/2307.10635 SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models]&lt;br /&gt;
* 2024-06: [https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0305354 A real-world test of artificial intelligence infiltration of a university examinations system: A “Turing Test” case study] &lt;br /&gt;
** AI scores higher than median students.&lt;br /&gt;
&lt;br /&gt;
===Creativity===&lt;br /&gt;
* See also: [[AI creativity]]&lt;br /&gt;
* 2023-07: [https://mackinstitute.wharton.upenn.edu/wp-content/uploads/2023/08/LLM-Ideas-Working-Paper.pdf Ideas Are Dimes A Dozen: Large Language Models For Idea Generation In Innovation]&lt;br /&gt;
* 2023-09: [https://www.nature.com/articles/s41598-023-40858-3 Best humans still outperform artificial intelligence in a creative divergent thinking task]&lt;br /&gt;
** Best humans out-perform AI at creativity. (By implication, median humans may not.)&lt;br /&gt;
* 2024-02: [https://www.nature.com/articles/s41598-024-53303-w The current state of artificial intelligence generative language models is more creative than humans on divergent thinking tasks]&lt;br /&gt;
* 2024-02: Felin, Teppo and Holweg, Matthias, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4737265 Theory Is All You Need: AI, Human Cognition, and Causal Reasoning] (February 24, 2024). [http://dx.doi.org/10.2139/ssrn.4737265 doi: 10.2139/ssrn.4737265]&lt;br /&gt;
** Argues that human &amp;quot;theory-based&amp;quot; creativity is better than AI &amp;quot;data-based&amp;quot;.&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.01119 Pron vs Prompt: Can Large Language Models already Challenge a World-Class Fiction Author at Creative Text Writing?]&lt;br /&gt;
** Top human (professional author) out-performs GPT4.&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.04109 Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers]&lt;br /&gt;
** LLM-generated research ideas were judged more novel (though somewhat less feasible) than those of human experts&lt;br /&gt;
* 2024-09: [https://docs.iza.org/dp17302.pdf Creative and Strategic Capabilities of Generative AI: Evidence from Large-Scale Experiments]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.00794 Predicting Empirical AI Research Outcomes with Language Models]&lt;br /&gt;
* 2026-03: [https://arxiv.org/abs/2603.19087 Serendipity by Design: Evaluating the Impact of Cross-domain Mappings on Human and LLM Creativity]&lt;br /&gt;
&lt;br /&gt;
===Art===&lt;br /&gt;
* 2024-11: [https://doi.org/10.1038/s41598-024-76900-1 AI-generated poetry is indistinguishable from human-written poetry and is rated more favorably]&lt;br /&gt;
* 2024-11: [https://www.astralcodexten.com/p/how-did-you-do-on-the-ai-art-turing How Did You Do On The AI Art Turing Test?]&lt;br /&gt;
&lt;br /&gt;
===Business &amp;amp; Marketing===&lt;br /&gt;
* 2023-11: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4597899 The power of generative marketing: Can generative AI create superhuman visual marketing content?]&lt;br /&gt;
* 2024-02: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4714776 Generative Artificial Intelligence and Evaluating Strategic Decisions]&lt;br /&gt;
&lt;br /&gt;
===Professions===&lt;br /&gt;
* [https://agi.safe.ai/submit Humanity&amp;#039;s Last Exam]&lt;br /&gt;
** [https://x.com/alexandr_wang/status/1835738937719140440 Effort to build] a dataset of challenging (but resolvable) questions in specific domain areas, to act as a benchmark to test whether AIs are improving in these challenging topics.&lt;br /&gt;
&lt;br /&gt;
====Coding====&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.06807 Competitive Programming with Large Reasoning Models]&lt;br /&gt;
&lt;br /&gt;
====Medical====&lt;br /&gt;
* 2024-03: [https://www.medrxiv.org/content/10.1101/2024.03.12.24303785v1 Influence of a Large Language Model on Diagnostic Reasoning: A Randomized Clinical Vignette Study]&lt;br /&gt;
** GPT4 improves medical practitioner work; surprisingly, GPT4 alone scored better than a human with GPT4 as aid (on selected tasks).&lt;br /&gt;
* 2024-10: [https://doi.org/10.1001/jamanetworkopen.2024.38535 Perspectives on Artificial Intelligence–Generated Responses to Patient Messages]&lt;br /&gt;
* 2024-10: [https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2825395 Large Language Model Influence on Diagnostic Reasoning; A Randomized Clinical Trial]&lt;br /&gt;
**  Use of ChatGPT does not strongly improve medical expert work; but AI alone out-scores human or human+AI&lt;br /&gt;
* 2024-11: [https://www.nature.com/articles/s41562-024-02046-9 Large language models surpass human experts in predicting neuroscience results] (writeup: [https://medicalxpress.com/news/2024-11-ai-neuroscience-results-human-experts.html AI can predict neuroscience study results better than human experts, study finds])&lt;br /&gt;
* 2024-12: [https://www.arxiv.org/abs/2412.10849 Superhuman performance of a large language model on the reasoning tasks of a physician]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.18925 HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs]&lt;br /&gt;
* 2025-02: Media:&lt;br /&gt;
** NY Times: [https://www.nytimes.com/2025/02/02/opinion/ai-doctors-medicine.html The Robot Doctor Will See You Now]&lt;br /&gt;
** [https://erictopol.substack.com/p/when-doctors-with-ai-are-outperformed When Doctors With A.I. Are Outperformed by A.I. Alone]&lt;br /&gt;
* 2025-02: [https://www.nature.com/articles/s41591-024-03456-y GPT-4 assistance for improvement of physician performance on patient care tasks: a randomized controlled trial]&lt;br /&gt;
* 2025-02: [https://www.nature.com/articles/s41591-025-03517-w Artificial intelligence for individualized treatment of persistent atrial fibrillation: a randomized controlled trial]&lt;br /&gt;
* Google AI Clinician:&lt;br /&gt;
** 2024-01: [https://arxiv.org/abs/2401.05654 Towards Conversational Diagnostic AI] ([https://research.google/blog/amie-a-research-ai-system-for-diagnostic-medical-reasoning-and-conversations/ blog]: Articulate Medical Intelligence Explorer, AMIE)&lt;br /&gt;
** 2025-03: [https://www.gstatic.com/amie/towards_conversational_ai_for_disease_management.pdf Towards Conversational AI for Disease Management] ([https://research.google/blog/from-diagnosis-to-treatment-advancing-amie-for-longitudinal-disease-management/ blog])&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.19655 Med-RLVR: Emerging Medical Reasoning from a 3B base model via reinforcement Learning]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.13939 Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models]&lt;br /&gt;
* 2025-04: [https://www.acpjournals.org/doi/10.7326/ANNALS-24-03283 Comparison of Initial Artificial Intelligence (AI) and Final Physician Recommendations in AI-Assisted Virtual Urgent Care Visits]&lt;br /&gt;
* 2025-04: [https://www.nature.com/articles/s41586-025-08866-7?linkId=13898052 Towards conversational diagnostic artificial intelligence]&lt;br /&gt;
* 2025-04: [https://www.nature.com/articles/s41586-025-08869-4?linkId=13898054 Towards accurate differential diagnosis with large language models]&lt;br /&gt;
* 2025-06: [https://www.medrxiv.org/content/10.1101/2025.06.13.25329541v1 Automation of Systematic Reviews with Large Language Models]&lt;br /&gt;
* 2025-06: [https://microsoft.ai/new/the-path-to-medical-superintelligence/ The Path to Medical Superintelligence]&lt;br /&gt;
* 2025-08: [https://www.nature.com/articles/s41591-025-03888-0?utm_source=chatgpt.com A personal health large language model for sleep and fitness coaching]&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.08224 Capabilities of GPT-5 on Multimodal Medical Reasoning]&lt;br /&gt;
&lt;br /&gt;
====Bio====&lt;br /&gt;
* 2025-04: [https://www.virologytest.ai/vct_paper.pdf Virology Capabilities Test (VCT): A Multimodal Virology Q&amp;amp;A Benchmark]&lt;br /&gt;
** Time: [https://time.com/7279010/ai-virus-lab-biohazard-study/ Exclusive: AI Outsmarts Virus Experts in the Lab, Raising Biohazard Fears]&lt;br /&gt;
** AI Frontiers: [https://www.ai-frontiers.org/articles/ais-are-disseminating-expert-level-virology-skills AIs Are Disseminating Expert-Level Virology Skills]&lt;br /&gt;
&lt;br /&gt;
====Therapy====&lt;br /&gt;
* 2025-02: [https://journals.plos.org/mentalhealth/article?id=10.1371/journal.pmen.0000145 When ELIZA meets therapists: A Turing test for the heart and mind]&lt;br /&gt;
* 2025-03: Therabot: [https://ai.nejm.org/doi/full/10.1056/AIoa2400802 Randomized Trial of a Generative AI Chatbot for Mental Health Treatment]&lt;br /&gt;
&lt;br /&gt;
====Financial====&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.17866 Financial Statement Analysis with Large Language Models]&lt;br /&gt;
&lt;br /&gt;
====HR====&lt;br /&gt;
* 2025-08: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5395709 Voice AI in Firms: A Natural Field Experiment on Automated Job Interviews]&lt;br /&gt;
&lt;br /&gt;
==AI improves human work==&lt;br /&gt;
* 2023-07: [https://www.science.org/doi/10.1126/science.adh2586 Experimental evidence on the productivity effects of generative artificial intelligence]&lt;br /&gt;
* 2023-09:  Dell&amp;#039;Acqua, Fabrizio and McFowland III, Edward and Mollick, Ethan R. and Lifshitz-Assaf, Hila and Kellogg, Katherine and Rajendran, Saran and Krayer, Lisa and Candelon, François and Lakhani, Karim R., [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4573321 Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality] (September 15, 2023). Harvard Business School Technology &amp;amp; Operations Mgt. Unit Working Paper No. 24-013, The Wharton School Research Paper [http://dx.doi.org/10.2139/ssrn.4573321 doi: 10.2139/ssrn.4573321]&lt;br /&gt;
* 2023-11: [https://www.nber.org/papers/w31161 Generative AI at Work] (National Bureau of Economic Research)&lt;br /&gt;
* 2023-12: [https://osf.io/hdjpk The Uneven Impact of Generative AI on Entrepreneurial Performance] ([https://doi.org/10.31219/osf.io/hdjpk doi: 10.31219/osf.io/hdjpk])&lt;br /&gt;
* 2023-12: [https://arxiv.org/abs/2312.05481 Artificial Intelligence in the Knowledge Economy]: Non-autonomous AI (chatbot) benefits least knowledgeable workers; autonomous agents benefit the most knowledgeable workers&lt;br /&gt;
* 2024-07: [https://www.microsoft.com/en-us/research/publication/generative-ai-in-real-world-workplaces/ Generative AI in Real-World Workplaces: The Second Microsoft Report on AI and Productivity Research]&lt;br /&gt;
* 2025-03: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5188231 The Cybernetic Teammate: A Field Experiment on Generative AI Reshaping Teamwork and Expertise]&lt;br /&gt;
** 2025-03: Ethan Mollick: [https://www.oneusefulthing.org/p/the-cybernetic-teammate The Cybernetic Teammate]: Having an AI on your team can increase performance, provide expertise, and improve your experience&lt;br /&gt;
* 2025-09: [https://osf.io/preprints/psyarxiv/vbkmt_v1 Quantifying Human-AI Synergy]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.12049 Generative AI and Firm Productivity: Field Experiments in Online Retail]&lt;br /&gt;
* 2025-10: Wharton: [https://knowledge.wharton.upenn.edu/special-report/2025-ai-adoption-report/ 2025 AI Adoption Report] (75% report positive ROI)&lt;br /&gt;
&lt;br /&gt;
===Coding===&lt;br /&gt;
* 2023-02: [https://arxiv.org/abs/2302.06590 The Impact of AI on Developer Productivity: Evidence from GitHub Copilot]&lt;br /&gt;
* 2024-09:  Cui, Zheyuan and Demirer, Mert and Jaffe, Sonia and Musolff, Leon and Peng, Sida and Salz, Tobias, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4945566 The Effects of Generative AI on High Skilled Work: Evidence from Three Field Experiments with Software Developers] (September 03, 2024). [http://dx.doi.org/10.2139/ssrn.4945566 doi: 10.2139/ssrn.4945566]&lt;br /&gt;
* 2024-11:  Hoffmann, Manuel and Boysel, Sam and Nagle, Frank and Peng, Sida and Xu, Kevin, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5007084 Generative AI and the Nature of Work] (October 27, 2024). Harvard Business School Strategy Unit Working Paper No. 25-021, [http://dx.doi.org/10.2139/ssrn.5007084 doi: 10.2139/ssrn.5007084]&lt;br /&gt;
* 2025-07: METR: [https://arxiv.org/abs/2507.09089 Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity] (AI tools led to lower performance)&lt;br /&gt;
** 2026-02: [https://metr.org/blog/2026-02-24-uplift-update/ We are Changing our Developer Productivity Experiment Design] (AI tools now lead to improved performance)&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.19708 Intuition to Evidence: Measuring AI&amp;#039;s True Impact on Developer Productivity]&lt;br /&gt;
&lt;br /&gt;
===Forecasting===&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.07862 AI-Augmented Predictions: LLM Assistants Improve Human Forecasting Accuracy]&lt;br /&gt;
&lt;br /&gt;
===Finance===&lt;br /&gt;
* 2024-12: [https://dx.doi.org/10.2139/ssrn.5075727 AI, Investment Decisions, and Inequality]: Novices see improvements in investment performance, sophisticated investors see even greater improvements.&lt;br /&gt;
&lt;br /&gt;
===Law===&lt;br /&gt;
* 2025-03: [https://ssrn.com/abstract=5162111 AI-Powered Lawyering: AI Reasoning Models, Retrieval Augmented Generation, and the Future of Legal Practice]&lt;br /&gt;
&lt;br /&gt;
===Science===&lt;br /&gt;
* 2025-12: [https://www.science.org/doi/abs/10.1126/science.adw3000 Scientific production in the era of large language models]&lt;br /&gt;
* 2026-01: [https://www.nature.com/articles/s41586-025-09922-y Artificial intelligence tools expand scientists’ impact but contract science’s focus]&lt;br /&gt;
* 2026-01: [https://www.anthropic.com/news/accelerating-scientific-research How scientists are using Claude to accelerate research and discovery]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.03837 Accelerating Scientific Research with Gemini: Case Studies and Common Techniques]&lt;br /&gt;
* 2026-03: [https://www.pnas.org/doi/10.1073/pnas.2533676123 Expert evaluation of LLM world models: A high-Tc superconductivity case study] ([https://research.google/blog/testing-llms-on-superconductivity-research-questions/?utm_source=twitter&amp;amp;utm_medium=social&amp;amp;utm_campaign=social_post&amp;amp;utm_content=gr-acct blog])&lt;br /&gt;
&lt;br /&gt;
===Medical===&lt;br /&gt;
* 2025-03: [https://www.medrxiv.org/content/10.1101/2025.02.28.25323115v1.full Medical Hallucination in Foundation Models and Their Impact on Healthcare]&lt;br /&gt;
* 2025-03: [https://journals.lww.com/international-journal-of-surgery/fulltext/2025/03000/chatgpt_s_role_in_alleviating_anxiety_in_total.20.aspx ChatGPT’s role in alleviating anxiety in total knee arthroplasty consent process: a randomized controlled trial pilot study]&lt;br /&gt;
* 2025-05: [https://openai.com/index/healthbench/ Introducing HealthBench]&lt;br /&gt;
* 2025-06: [https://www.medrxiv.org/content/10.1101/2025.06.07.25329176v1 From Tool to Teammate: A Randomized Controlled Trial of Clinician-AI Collaborative Workflows for Diagnosis]&lt;br /&gt;
* 2025-06: [https://bmcmededuc.biomedcentral.com/articles/10.1186/s12909-025-07414-1 Iteratively refined ChatGPT outperforms clinical mentors in generating high-quality interprofessional education clinical scenarios: a comparative study]&lt;br /&gt;
* 2025-07: [https://cdn.openai.com/pdf/a794887b-5a77-4207-bb62-e52c900463f1/penda_paper.pdf AI-based Clinical Decision Support for Primary Care: A Real-World Study] ([https://openai.com/index/ai-clinical-copilot-penda-health/ blog])&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.15743 Towards physician-centered oversight of conversational diagnostic AI]&lt;br /&gt;
* 2026-01: [https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(25)02464-X/abstract Interval cancer, sensitivity, and specificity comparing AI-supported mammography screening with standard double reading without AI in the MASAI study: a randomised, controlled, non-inferiority, single-blinded, population-based, screening-accuracy trial]&lt;br /&gt;
&lt;br /&gt;
===Translation===&lt;br /&gt;
* 2025-01: [https://simonwillison.net/2025/Feb/2/workflow-for-translation/ A professional workflow for translation using LLMs] ([https://news.ycombinator.com/item?id=42897856 based on this])&lt;br /&gt;
&lt;br /&gt;
===Customer service===&lt;br /&gt;
* 2023-11: [https://www.nber.org/papers/w31161 Generative AI at Work]: Improvements for workers and clients (though also a ceiling to improvement)&lt;br /&gt;
&lt;br /&gt;
===Creativity===&lt;br /&gt;
* See also: [[AI creativity]]&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.01727 Prompting Diverse Ideas: Increasing AI Idea Variance]&lt;br /&gt;
* 2024-07: [https://www.science.org/doi/10.1126/sciadv.adn5290 Generative AI enhances individual creativity but reduces the collective diversity of novel content]&lt;br /&gt;
* 2024-08: [https://www.nature.com/articles/s41562-024-01953-1 An empirical investigation of the impact of ChatGPT on creativity]&lt;br /&gt;
** 2025-05: Response: [https://www.nature.com/articles/s41562-025-02173-x ChatGPT decreases idea diversity in brainstorming] ([https://www.nature.com/articles/s41562-025-02173-x.epdf?sharing_token=LA9NyDHj7y5WN8zvb5Qm49RgN0jAjWel9jnR3ZoTv0Nl8PrpXFkjZ93XvmUVBgB9Hlfro5Yo6YELr-pRqbpk3HaZENCvsfV8G1kwtTEj2oW1g87dSVT4BzrfCu3jS_606SLzmoDuDiALChY-MozVM4Pj1b4Vdf-YaIH5p3lfAnM%3D pdf])&lt;br /&gt;
** 2025-05: Response: [https://www.nature.com/articles/s41562-025-02195-5 Reply to: ChatGPT decreases idea diversity in brainstorming]&lt;br /&gt;
* 2024-08: [https://doi.org/10.1287/orsc.2023.18430 The Crowdless Future? Generative AI and Creative Problem-Solving]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.03703 Human Creativity in the Age of LLMs]&lt;br /&gt;
* 2024-11: &amp;lt;strike&amp;gt;[https://conference.nber.org/conf_papers/f210475.pdf Artificial Intelligence, Scientific Discovery, and Product Innovation]&amp;lt;/strike&amp;gt;: diffusion model increases &amp;quot;innovation&amp;quot; (patents), boosts the best performers, but also removes some enjoyable tasks.&lt;br /&gt;
** 2025-05: Retraction: [https://economics.mit.edu/news/assuring-accurate-research-record Assuring an accurate research record]&lt;br /&gt;
* 2024-12: [https://doi.org/10.1080/10400419.2024.2440691 Using AI to Generate Visual Art: Do Individual Differences in Creativity Predict AI-Assisted Art Quality?] ([https://osf.io/preprints/psyarxiv/ygzw6 preprint]): shows that more creative humans produce more creative genAI outputs&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.11433 One Does Not Simply Meme Alone: Evaluating Co-Creativity Between LLMs and Humans in the Generation of Humor]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.17241 Generative AI and Creativity: A Systematic Literature Review and Meta-Analysis]&lt;br /&gt;
&lt;br /&gt;
===Equity===&lt;br /&gt;
* 2025-01: [https://ai.nejm.org/doi/full/10.1056/AIp2400889 Using Large Language Models to Promote Health Equity]&lt;br /&gt;
&lt;br /&gt;
==AI worse than humans==&lt;br /&gt;
* 2025-04: [https://spinup-000d1a-wp-offload-media.s3.amazonaws.com/faculty/wp-content/uploads/sites/27/2025/03/AI-debt-collection-20250331.pdf How Good is AI at Twisting Arms? Experiments in Debt Collection]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.18919 Clinical knowledge in LLMs does not translate to human interactions]&lt;br /&gt;
* 2025-05: [https://royalsocietypublishing.org/doi/10.1098/rsos.241776 Generalization bias in large language model summarization of scientific research]&lt;br /&gt;
&lt;br /&gt;
==AI lowers human capability==&lt;br /&gt;
* 2025-07: METR: [https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity] ([https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/ blog], [https://secondthoughts.ai/p/ai-coding-slowdown commentary/analysis])&lt;br /&gt;
** 2026-02: [https://metr.org/blog/2026-02-24-uplift-update/ We are Changing our Developer Productivity Experiment Design] (AI tools [https://x.com/METR_Evals/status/2026355544668385373?s=20 now] lead to improved performance)&lt;br /&gt;
* 2026-02: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6097646 Thinking—Fast, Slow, and Artificial: How AI is Reshaping Human Reasoning and the Rise of Cognitive Surrender]&lt;br /&gt;
&lt;br /&gt;
==Human Perceptions of AI==&lt;br /&gt;
* 2023-09: [https://www.nature.com/articles/d41586-023-02980-0 AI and science: what 1,600 researchers think. A Nature survey finds that scientists are concerned, as well as excited, by the increasing use of artificial-intelligence tools in research.]&lt;br /&gt;
* 2024-11: [https://doi.org/10.1016/S2589-7500(24)00202-4 Attitudes and perceptions of medical researchers towards the use of artificial intelligence chatbots in the scientific process: an international cross-sectional survey] (Nature commentary: [https://www.nature.com/articles/s41592-024-02369-5 Quest for AI literacy])&lt;br /&gt;
* 2025-03: [https://www.arxiv.org/abs/2503.16458 Users Favor LLM-Generated Content -- Until They Know It&amp;#039;s AI]&lt;br /&gt;
&lt;br /&gt;
===AI passes Turing Test===&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Text Dialog&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* 2023-05: [https://arxiv.org/abs/2305.20010 Human or Not? A Gamified Approach to the Turing Test]&lt;br /&gt;
* 2023-10: [https://arxiv.org/abs/2310.20216 Does GPT-4 pass the Turing test?]&lt;br /&gt;
* 2024-05: [https://arxiv.org/abs/2405.08007 People cannot distinguish GPT-4 from a human in a Turing test]&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.08853 GPT-4 is judged more human than humans in displaced and inverted Turing tests]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.23674 Large Language Models Pass the Turing Test]&lt;br /&gt;
* 2025-04: [https://www.sciencedirect.com/science/article/abs/pii/S0022103117303980 A Minimal Turing Test]&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Art&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* 2024-11: [https://www.astralcodexten.com/p/how-did-you-do-on-the-ai-art-turing How Did You Do On The AI Art Turing Test?] Differentiation was only slightly above random (60%). AI art was often ranked higher than human-made.&lt;br /&gt;
* 2024-11: [https://doi.org/10.1038/s41598-024-76900-1 AI-generated poetry is indistinguishable from human-written poetry and is rated more favorably]&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.25601 Echoes of Humanity: Exploring the Perceived Humanness of AI Music]&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Imagery&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* 2026-02: [https://bpspsychub.onlinelibrary.wiley.com/doi/10.1111/bjop.70063 Too good to be true: Synthetic AI faces are more average than real faces and super-recognizers know it]&lt;br /&gt;
** [https://www.unsw.edu.au/newsroom/news/2026/02/humans-overconfident-telling-AI-faces-real-faces-people-fake People are overconfident about spotting AI faces, study finds]&lt;br /&gt;
&lt;br /&gt;
=Uptake=&lt;br /&gt;
* 2023-07: [https://doi.org/10.9734/ajrcos/2023/v16i4392 ChatGPT: Early Adopters, Teething Issues and the Way Forward]&lt;br /&gt;
* 2024-03: [https://arxiv.org/abs/2403.07183 Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews]&lt;br /&gt;
* 2024-05: Humlum, Anders and Vestergaard, Emilie, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4827166 The Adoption of ChatGPT]. IZA Discussion Paper No. 16992 ([http://dx.doi.org/10.2139/ssrn.4827166 doi: 10.2139/ssrn.4827166])&lt;br /&gt;
* 2024-06: Kellogg, Katherine and Lifshitz-Assaf, Hila and Randazzo, Steven and Mollick, Ethan R. and Dell&amp;#039;Acqua, Fabrizio and McFowland III, Edward and Candelon, Francois and Lakhani, Karim R., [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4857373 Don&amp;#039;t Expect Juniors to Teach Senior Professionals to Use Generative AI: Emerging Technology Risks and Novice AI Risk Mitigation Tactics] (June 03, 2024). Harvard Business School Technology &amp;amp; Operations Mgt. Unit Working Paper 24-074, The Wharton School Research Paper ([http://dx.doi.org/10.2139/ssrn.4857373 doi: 10.2139/ssrn.4857373])&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.07016 Delving into ChatGPT usage in academic writing through excess vocabulary]&lt;br /&gt;
* 2024-09: [https://static1.squarespace.com/static/60832ecef615231cedd30911/t/66f0c3fbabdc0a173e1e697e/1727054844024/BBD_GenAI_NBER_Sept2024.pdf The Rapid Adoption of Generative AI]&lt;br /&gt;
* 2024-10: [https://ai.wharton.upenn.edu/focus-areas/human-technology-interaction/2024-ai-adoption-report/ Growing Up: Navigating Generative AI’s Early Years – AI Adoption Report] ([https://ai.wharton.upenn.edu/wp-content/uploads/2024/10/AI-Report_Executive-Summary.pdf executive summary], [https://ai.wharton.upenn.edu/wp-content/uploads/2024/10/AI-Report_Full-Report.pdf full report])&lt;br /&gt;
** 72% of leaders use genAI at least once a week (cf. 23% in 2023); 90% agree AI enhances skills (cf. 80% in 2023)&lt;br /&gt;
** Spending on genAI is up 130% (most companies plan to invest going forward)&lt;br /&gt;
* 2024-12: [https://www.pnas.org/doi/10.1073/pnas.2414972121 The unequal adoption of ChatGPT exacerbates existing inequalities among workers]&lt;br /&gt;
** Higher adoption among young and less experienced&lt;br /&gt;
** Lower adoption among women and lower-earning workers&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.09747 The Widespread Adoption of Large Language Model-Assisted Writing Across Society]: 10-25% adoption across a range of contexts&lt;br /&gt;
* 2025-02: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5078805 Local Heterogeneity in Artificial Intelligence Jobs Over Time and Space]&lt;br /&gt;
* 2025-04: [https://andreyfradkin.com/assets/demandforllm.pdf Demand for LLMs: Descriptive Evidence on Substitution, Market Expansion, and Multihoming]&lt;br /&gt;
* 2025-05: [https://civicscience.com/chatgpt-is-still-leading-the-ai-wars-but-google-gemini-is-gaining-ground/ ChatGPT Is Still Leading the AI Wars but Google Gemini Is Gaining Ground]&lt;br /&gt;
* 2025-05: [https://www.nber.org/papers/w33777 Large Language Models, Small Labor Market Effects]&lt;br /&gt;
** Significant uptake, but very little economic impact so far&lt;br /&gt;
* 2025-05: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5136877 The Labor Market Effects of Generative Artificial Intelligence]&lt;br /&gt;
** US worker usage of AI is increasing rapidly: 30% in 2024-12; 40% in 2025-05&lt;br /&gt;
* 2025-05: [https://www.bondcap.com/report/pdf/Trends_Artificial_Intelligence.pdf Trends – Artificial Intelligence]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.08945 Who is using AI to code? Global diffusion and impact of generative AI]&lt;br /&gt;
* 2025-06: [https://www.iconiqcapital.com/growth/reports/2025-state-of-ai 2025 State of AI Report: The Builder’s Playbook] A Practical Roadmap for AI Innovation&lt;br /&gt;
* 2025-07: Epoch AI: [https://epochai.substack.com/p/after-the-chatgpt-moment-measuring After the ChatGPT Moment: Measuring AI’s Adoption. How quickly has AI been diffusing through the economy?]&lt;br /&gt;
* 2025-07: Pew Research: [https://www.pewresearch.org/short-reads/2025/06/25/34-of-us-adults-have-used-chatgpt-about-double-the-share-in-2023/ 34% of U.S. adults have used ChatGPT, about double the share in 2023]&lt;br /&gt;
* 2025-12: Epoch AI: [https://epoch.ai/data/polling Polling on AI Usage]&lt;br /&gt;
&lt;br /&gt;
==Usage By==&lt;br /&gt;
* 2026-02: [https://www.nber.org/papers/w34813 The Politics of AI]&lt;br /&gt;
&lt;br /&gt;
==Usage For==&lt;br /&gt;
* 2024-12: [https://assets.anthropic.com/m/7e1ab885d1b24176/original/Clio-Privacy-Preserving-Insights-into-Real-World-AI-Use.pdf Clio: A system for privacy-preserving insights into real-world AI use] (Anthropic [https://www.anthropic.com/research/clio Clio])&lt;br /&gt;
* 2025-03: [https://learn.filtered.com/hubfs/The%202025%20Top-100%20Gen%20AI%20Use%20Case%20Report.pdf How People are Really Using Generative AI Now] ([https://hbr.org/2025/04/how-people-are-really-using-gen-ai-in-2025 writeup])&lt;br /&gt;
* 2025-04: [https://www.anthropic.com/news/anthropic-education-report-how-university-students-use-claude Anthropic Education Report: How University Students Use Claude]&lt;br /&gt;
* 2025-09: [https://www.anthropic.com/research/economic-index-geography Anthropic Economic Index: Tracking AI&amp;#039;s role in the US and global economy]&lt;br /&gt;
* 2025-09: [https://cdn.openai.com/pdf/a253471f-8260-40c6-a2cc-aa93fe9f142e/economic-research-chatgpt-usage-paper.pdf How People Use ChatGPT] (OpenAI)&lt;br /&gt;
&lt;br /&gt;
==Hiding Usage==&lt;br /&gt;
* 2025-05: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5232910 Underreporting of AI use: The role of social desirability bias]&lt;br /&gt;
&lt;br /&gt;
=Societal Effects/Transformations=&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.01754 Empirical evidence of Large Language Model&amp;#039;s influence on human spoken communication]&lt;br /&gt;
* 2025-09: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5425555 Generative AI as Seniority-Biased Technological Change: Evidence from U.S. Résumé and Job Posting Data]&lt;br /&gt;
&lt;br /&gt;
=Psychological Impact=&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.16628 The Impact of Artificial Intelligence on Human Thought]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.15352 People readily follow personal advice from AI but it does not improve their well-being]&lt;br /&gt;
&lt;br /&gt;
==Human Sentiment towards AI==&lt;br /&gt;
* 2025-04: Pew Research: [https://www.pewresearch.org/internet/2025/04/03/how-the-us-public-and-ai-experts-view-artificial-intelligence/ How the U.S. Public and AI Experts View Artificial Intelligence]&lt;br /&gt;
* 2025-10: Pew Research: [https://www.pewresearch.org/global/2025/10/15/how-people-around-the-world-view-ai/ How People Around the World View AI: More are concerned than excited about its use, and more trust their own country and the EU to regulate it than trust the U.S. or China]&lt;br /&gt;
* 2025-12: [https://www.edelman.com/sites/g/files/aatuss191/files/2025-11/2025%20Edelman%20Trust%20Barometer%20Flash%20Poll%20Trust%20and%20Artificial%20Intelligence%20at%20a%20Crossroads%201.pdf 2025 Edelman Trust Barometer]&lt;br /&gt;
* 2025-12: [https://navigatorresearch.org/views-of-ai-and-data-centers/ Polling - Views of AI and data centers]&lt;br /&gt;
* 2026-03: [https://osf.io/preprints/psyarxiv/5mwre_v9 The Moralization of Artificial Intelligence]&lt;br /&gt;
* 2026-03: Anthropic: [https://www.anthropic.com/features/81k-interviews What 81,000 people want from AI]&lt;br /&gt;
&lt;br /&gt;
==AI Persuasion of Humans==&lt;br /&gt;
(AI can update beliefs, change opinions, tackle conspiracy theories, etc.)&lt;br /&gt;
* 2022-11: [https://arxiv.org/abs/2211.15006 Fine-tuning language models to find agreement among humans with diverse preferences]&lt;br /&gt;
* 2024-04: [https://osf.io/preprints/psyarxiv/h7n8u_v1 Just the facts: How dialogues with AI reduce conspiracy beliefs]&lt;br /&gt;
* 2024-08: [https://arxiv.org/abs/2408.04681 Conversational AI Powered by Large Language Models Amplifies False Memories in Witness Interviews]&lt;br /&gt;
* 2024-09: [https://www.science.org/doi/10.1126/science.adq1814 Durably reducing conspiracy beliefs through dialogues with AI]&lt;br /&gt;
* 2025-03: [https://www.pnas.org/doi/10.1073/pnas.2413443122 Scaling language model size yields diminishing returns for single-message political persuasion]&lt;br /&gt;
* 2025-04: [https://drive.google.com/file/d/1Eo4SHrKGPErTzL1t_QmQhfZGU27jKBjx/edit Can AI Change Your View? Evidence from a Large-Scale Online Field Experiment]&lt;br /&gt;
** [https://www.404media.co/researchers-secretly-ran-a-massive-unauthorized-ai-persuasion-experiment-on-reddit-users/ Researchers Secretly Ran a Massive, Unauthorized AI Persuasion Experiment on Reddit Users]&lt;br /&gt;
* 2025-05: [https://www.nature.com/articles/s41562-025-02194-6 On the conversational persuasiveness of GPT-4]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.09662 Large Language Models Are More Persuasive Than Incentivized Human Persuaders]&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.13919 The Levers of Political Persuasion with Conversational AI]&lt;br /&gt;
** 2025-12: [https://www.science.org/doi/10.1126/science.aea3884 The levers of political persuasion with conversational artificial intelligence]&lt;br /&gt;
* 2025-12: [https://www.nature.com/articles/s41586-025-09771-9 Persuading voters using human–artificial intelligence dialogues]&lt;br /&gt;
&lt;br /&gt;
==AI Effects on Human Psychology==&lt;br /&gt;
===Human well-being===&lt;br /&gt;
* 2024-01: [https://www.nature.com/articles/s44184-023-00047-6 Loneliness and suicide mitigation for students using GPT3-enabled chatbots]&lt;br /&gt;
* 2025-03: [https://cdn.openai.com/papers/15987609-5f71-433c-9972-e91131f399a1/openai-affective-use-study.pdf Investigating Affective Use and Emotional Well-being on ChatGPT]&lt;br /&gt;
* 2025-03: [https://dam-prod2.media.mit.edu/x/2025/03/21/Randomized_Control_Study_on_Chatbot_Psychosocial_Effect.pdf How AI and Human Behaviors Shape Psychosocial Effects of Chatbot Use: A Longitudinal Controlled Study]&lt;br /&gt;
* 2026-03: [https://arxiv.org/abs/2603.15245 Practicing with Language Models Cultivates Human Empathic Communication]&lt;br /&gt;
&lt;br /&gt;
===Counter loneliness===&lt;br /&gt;
* 2023-11: [https://arxiv.org/abs/2311.10599 Chatbots as social companions: How people perceive consciousness, human likeness, and social health benefits in machines]&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.19096 AI Companions Reduce Loneliness]&lt;br /&gt;
* 2025-03: [https://dam-prod2.media.mit.edu/x/2025/03/21/Randomized_Control_Study_on_Chatbot_Psychosocial_Effect.pdf How AI and Human Behaviors Shape Psychosocial Effects of Chatbot Use: A Longitudinal Controlled Study]&lt;br /&gt;
* 2025-06: Anthropic: [https://www.anthropic.com/news/how-people-use-claude-for-support-advice-and-companionship How People Use Claude for Support, Advice, and Companionship]&lt;br /&gt;
&lt;br /&gt;
===Human mental abilities (creativity, learning)===&lt;br /&gt;
* 2025-03: [https://www.microsoft.com/en-us/research/publication/the-impact-of-generative-ai-on-critical-thinking-self-reported-reductions-in-cognitive-effort-and-confidence-effects-from-a-survey-of-knowledge-workers/ The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence Effects From a Survey of Knowledge Workers]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.08872 Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task]&lt;br /&gt;
&lt;br /&gt;
=Simulate Humans=&lt;br /&gt;
* See also: [[Human brain]]&lt;br /&gt;
&lt;br /&gt;
==Sociology==&lt;br /&gt;
* 2021-10: [https://www.doi.org/10.1007/s10588-021-09351-y Explaining and predicting human behavior and social dynamics in simulated virtual worlds: reproducibility, generalizability, and robustness of causal discovery methods]&lt;br /&gt;
* 2023-12: Google: [https://arxiv.org/abs/2312.03664 Generative agent-based modeling with actions grounded in physical, social, or digital space using Concordia]&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.12620 Are Large Language Models (LLMs) Good Social Predictors?]&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.11794 Automated Social Science: Language Models as Scientist and Subjects]&lt;br /&gt;
* 2024-07: [https://academic.oup.com/pnasnexus/article/3/7/pgae245/7712371 Perils and opportunities in using large language models in psychological research]&lt;br /&gt;
* 2024-08: [https://samim.io/dl/Predicting%20results%20of%20social%20science%20experiments%20using%20large%20language%20models.pdf Predicting Results of Social Science Experiments Using Large Language Models]&lt;br /&gt;
* 2024-10: [https://www.pnas.org/doi/10.1073/pnas.2407639121 Large Language Models based on historical text could offer informative tools for behavioral science]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.02234 LLM Social Simulations Are a Promising Research Method]&lt;br /&gt;
* 2025-04: [https://www.nber.org/papers/w33662 Measuring Human Leadership Skills with AI Agents]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.10157 SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users]&lt;br /&gt;
* 2025-07: [https://www.nature.com/articles/s41586-025-09215-4 A foundation model to predict and capture human cognition] ([https://marcelbinz.github.io/centaur code])&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.15815 LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra]&lt;br /&gt;
* 2025-09: [https://benjaminmanning.io/files/optimize.pdf General Social Agents]&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2506.06958 Simulating Society Requires Simulating Thought]&lt;br /&gt;
&lt;br /&gt;
==Theory of Mind==&lt;br /&gt;
* 2025-08: [https://www.nature.com/articles/s44387-025-00031-9 How large language models encode theory-of-mind: a study on sparse parameter patterns]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2509.22887 Infusing Theory of Mind into Socially Intelligent LLM Agents]&lt;br /&gt;
&lt;br /&gt;
==Humanlike Vibes==&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.20525 The Xeno Sutra: Can Meaning and Value be Ascribed to an AI-Generated &amp;quot;Sacred&amp;quot; Text?]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.08338 LLMs Reproduce Human Purchase Intent via Semantic Similarity Elicitation of Likert Ratings]&lt;br /&gt;
&lt;br /&gt;
==Skeptical==&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.06950 Large Language Models Do Not Simulate Human Psychology]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2601.16130 Replicating Human Motivated Reasoning Studies with LLMs]&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
* [https://www.google.com/books/edition/_/cKnYEAAAQBAJ?hl=en&amp;amp;gbpv=1&amp;amp;pg=PA2 UNESCO. Guidance for Generative AI in Education and Research]&lt;br /&gt;
* [[AI]]&lt;br /&gt;
** [[AI predictions]]&lt;/div&gt;</summary>
		<author><name>KevinYager</name></author>
		
	</entry>
	<entry>
		<id>http://gisaxs.com/index.php?title=Science_Agents&amp;diff=8771</id>
		<title>Science Agents</title>
		<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php?title=Science_Agents&amp;diff=8771"/>
		<updated>2026-04-02T13:40:40Z</updated>

		<summary type="html">&lt;p&gt;KevinYager: /* (Pre) Generate Articles */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
=AI Use-cases for Science=&lt;br /&gt;
&lt;br /&gt;
==Literature==&lt;br /&gt;
* [https://www.alphaxiv.org/explore alphaXiv | Explore]: Understand arXiv papers&lt;br /&gt;
* 2026-02: [https://www.nature.com/articles/s41586-025-10072-4 Synthesizing scientific literature with retrieval-augmented language models]&lt;br /&gt;
&lt;br /&gt;
===LLM extract data from papers===&lt;br /&gt;
* 2024-12: [https://pubs.rsc.org/en/content/articlelanding/2025/cs/d4cs00913d From text to insight: large language models for chemical data extraction]&lt;br /&gt;
&lt;br /&gt;
===AI finding links in literature===&lt;br /&gt;
* 2019-07: [https://doi.org/10.1038/s41586-019-1335-8  Unsupervised word embeddings capture latent knowledge from materials science literature]&lt;br /&gt;
* 2024-11: [https://doi.org/10.1038/s41562-024-02046-9  Large language models surpass human experts in predicting neuroscience results]&lt;br /&gt;
&lt;br /&gt;
===(Pre) Generate Articles===&lt;br /&gt;
* 2022-12: [https://aclanthology.org/2022.emnlp-main.296/ Re3: Generating Longer Stories With Recursive Reprompting and Revision]&lt;br /&gt;
* 2023-01: Journalism: [https://journals.sagepub.com/doi/10.1177/10776958221149577 Collaborating With ChatGPT: Considering the Implications of Generative Artificial Intelligence for Journalism and Media Education]&lt;br /&gt;
* 2023-03: English essays: [https://journal.unnes.ac.id/sju/index.php/elt/article/view/64069 Artificial intelligence (AI) technology in OpenAI ChatGPT application: A review of ChatGPT in writing English essay]&lt;br /&gt;
* 2023-07: Science writing: [https://www.rbmojournal.com/article/S1472-6483(23)00219-5/fulltext Artificial intelligence in scientific writing: a friend or a foe?]&lt;br /&gt;
* 2024-02: Wikipedia style: [https://arxiv.org/abs/2402.14207 Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models]&lt;br /&gt;
* 2024-08: [https://arxiv.org/abs/2408.07055 LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs] ([https://github.com/THUDM/LongWriter code])&lt;br /&gt;
* 2024-08: Scientific papers: [https://arxiv.org/abs/2408.06292 The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery]&lt;br /&gt;
** 2026-04: [https://www.nature.com/articles/s41586-026-10265-5 Towards end-to-end automation of AI research]&lt;br /&gt;
* 2024-09: PaperQA2: [https://paper.wikicrow.ai/ Language Models Achieve Superhuman Synthesis of Scientific Knowledge] ([https://x.com/SGRodriques/status/1833908643856818443 𝕏 post], [https://github.com/Future-House/paper-qa code])&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.18866 Reasoning to Learn from Latent Thoughts]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.19065 WikiAutoGen: Towards Multi-Modal Wikipedia-Style Article Generation]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.13171 Sleep-time Compute: Beyond Inference Scaling at Test-time]&lt;br /&gt;
&lt;br /&gt;
==Explanation==&lt;br /&gt;
* 2025-02: [https://tiger-ai-lab.github.io/TheoremExplainAgent/ TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding] ([https://arxiv.org/abs/2502.19400 preprint])&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.02822 Do Two AI Scientists Agree?]&lt;br /&gt;
&lt;br /&gt;
==Autonomous Ideation==&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.07738 ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models]&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.14202 Mining Causality: AI-Assisted Search for Instrumental Variables]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.07977 Thinking Fast and Laterally: Multi-Agentic Approach for Reasoning about Uncertain Emerging Events]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.14141 LLMs can realize combinatorial creativity: generating creative ideas via LLMs for scientific research]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.17596 LiveIdeaBench: Evaluating LLMs&amp;#039; Scientific Creativity and Idea Generation with Minimal Context]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.13299 Hypothesis Generation for Materials Discovery and Design Using Goal-Driven and Constraint-Guided LLM Agents]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.13025 Agentic Deep Graph Reasoning Yields Self-Organizing Knowledge Networks]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.00794 Predicting Empirical AI Research Outcomes with Language Models]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.20803 The Ideation-Execution Gap: Execution Outcomes of LLM-Generated versus Human Research Ideas]&lt;br /&gt;
* 2026-03: [https://arxiv.org/abs/2603.14473 AI Can Learn Scientific Taste]&lt;br /&gt;
&lt;br /&gt;
==Adapting LLMs to Science==&lt;br /&gt;
* 2023-06: [https://doi.org/10.1039/D3DD00112A Domain-specific chatbots for science using embeddings]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2411.00027 Personalization of Large Language Models: A Survey]&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.00412 Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation]&lt;br /&gt;
&lt;br /&gt;
==AI/LLM Control of Scientific Instruments/Facilities==&lt;br /&gt;
* 2023-12: [https://www.nature.com/articles/s41524-024-01423-2 Opportunities for retrieval and tool augmented large language models in scientific facilities]&lt;br /&gt;
* 2023-12: [https://arxiv.org/abs/2312.17180 Virtual Scientific Companion for Synchrotron Beamlines: A Prototype]&lt;br /&gt;
* 2023-12: [https://www.nature.com/articles/s41586-023-06792-0 Autonomous chemical research with large language models]&lt;br /&gt;
* 2024-01: [https://iopscience.iop.org/article/10.1088/2632-2153/ad52e9 Synergizing Human Expertise and AI Efficiency with Language Model for Microscopy Operation and Automated Experiment Design]&lt;br /&gt;
* 2024-06: [https://pubs.rsc.org/en/content/articlelanding/2025/dd/d4dd00143e From Text to Test: AI-Generated Control Software for Materials Science Instruments]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.18161 VISION: A Modular AI Assistant for Natural Human-Instrument Interaction at Scientific User Facilities]&lt;br /&gt;
* 2025-01: [https://www.science.org/doi/10.1126/sciadv.adr4173 Large language models for human-machine collaborative particle accelerator tuning through natural language]&lt;br /&gt;
* 2025-04: [https://openreview.net/forum?id=iA9UN1dEgJ Operating Robotic Laboratories with Large Language Models and Teachable Agents]&lt;br /&gt;
&lt;br /&gt;
==AI/ML Methods tailored to Science==&lt;br /&gt;
===Science Foundation Models===&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.15763 Intern-S1: A Scientific Multimodal Foundation Model]&lt;br /&gt;
* 2025-11: [https://pubs.aip.org/aip/jcp/article/163/18/184110/3372267/A-foundation-model-for-atomistic-materials A foundation model for atomistic materials chemistry]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.15684 Walrus: A Cross-Domain Foundation Model for Continuum Dynamics]&lt;br /&gt;
* 2026-01: [https://www.science.org/doi/10.1126/science.ads9530 Deep contrastive learning enables genome-wide virtual screening]&lt;br /&gt;
&lt;br /&gt;
===Regression (Data Fitting)===&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.14546 Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data]: training on (x,y) pairs enables inferring underlying function (define it in code, invert it, compose it)&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2402.14547 OmniPred: Language Models as Universal Regressors]&lt;br /&gt;
&lt;br /&gt;
===Tabular Classification/Regression===&lt;br /&gt;
* 2025-01: [https://www.nature.com/articles/s41586-024-08328-6 Accurate predictions on small data with a tabular foundation model] ([https://github.com/PriorLabs/TabPFN code])&lt;br /&gt;
&lt;br /&gt;
===Symbolic Regression===&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.09359 Symbolic Regression with a Learned Concept Library]&lt;br /&gt;
&lt;br /&gt;
===Literature Discovery===&lt;br /&gt;
* [https://www.futurehouse.org/ FutureHouse]&lt;br /&gt;
** [https://hasanyone.com/ hasanyone]&lt;br /&gt;
** [https://github.com/Future-House/paper-qa PaperQA2]&lt;br /&gt;
* [https://lumina.sh/ Lumina]&lt;br /&gt;
* [https://github.com/TheBlewish/Automated-AI-Web-Researcher-Ollama Automated-AI-Web-Researcher-Ollama]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.05366 Search-o1: Agentic Search-Enhanced Large Reasoning Models] ([https://search-o1.github.io/ project], [https://github.com/sunnynexus/Search-o1 code])&lt;br /&gt;
* 2026-02: [https://www.nature.com/articles/s41586-025-10072-4 Synthesizing scientific literature with retrieval-augmented language models] ([https://allenai.org/blog/openscholar-nature blog])&lt;br /&gt;
&lt;br /&gt;
===Commercial===&lt;br /&gt;
* [https://sakana.ai/ai-scientist/ Sakana AI] &lt;br /&gt;
* [https://www.cusp.ai/ Cusp AI]: Materials/AI&lt;br /&gt;
* [https://www.lila.ai/ Lila AI]: Life sciences&lt;br /&gt;
* [https://www.radical-ai.com/ Radical AI]: Material simulation/design&lt;br /&gt;
* [https://www.autoscience.ai/ Autoscience] ([https://www.autoscience.ai/blog/meet-carl-the-first-ai-system-to-produce-academically-peer-reviewed-research Carl])&lt;br /&gt;
* [https://periodic.com/ Periodic Labs]&lt;br /&gt;
* [https://edisonscientific.com/articles/announcing-edison-scientific Edison Scientific] (drug discovery, spinoff from [https://www.futurehouse.org/ FutureHouse])&lt;br /&gt;
* 2026-03: Mirendil Inc.: advanced models to speed up R&amp;amp;D in scientific domains, especially biology and materials science&lt;br /&gt;
&lt;br /&gt;
====Bio====&lt;br /&gt;
* [https://www.bioptimus.com/ Bioptimus]&lt;br /&gt;
* [https://www.evolutionaryscale.ai/ EvolutionaryScale]&lt;br /&gt;
&lt;br /&gt;
==AI/ML Methods in Science==&lt;br /&gt;
* 2025-07: [https://www.mdpi.com/2313-433X/11/8/252 Synthetic Scientific Image Generation with VAE, GAN, and Diffusion Model Architectures]&lt;br /&gt;
&lt;br /&gt;
===Imaging===&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.08176 Behind the Noise: Conformal Quantile Regression Reveals Emergent Representations] (blog: [https://phzwart.github.io/behindthenoise/ Behind the Noise])&lt;br /&gt;
&lt;br /&gt;
===Materials===&lt;br /&gt;
* 2024-12: [https://www.nature.com/articles/s41467-024-54639-7 Crystal structure generation with autoregressive large language modeling]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.03965 All-atom Diffusion Transformers: Unified generative modelling of molecules and materials]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.19730 Training-Free Active Learning Framework in Materials Science with Large Language Models]&lt;br /&gt;
&lt;br /&gt;
===Chemistry===&lt;br /&gt;
* 2025-01: [https://www.nature.com/articles/s41578-025-00772-8 Large language models for reticular chemistry]&lt;br /&gt;
* 2025-02: [https://www.nature.com/articles/s42256-025-00982-3 Image-based generation for molecule design with SketchMol]&lt;br /&gt;
* 2025-02: [https://www.nature.com/articles/s42256-025-00994-z Large language models for scientific discovery in molecular property prediction]&lt;br /&gt;
* [https://x.com/vant_ai/status/1903070297991110657 2025-03]: [https://www.vant.ai/ Vant AI] [https://www.vant.ai/neo-1 Neo-1]: atomistic foundation model (small molecules, proteins, etc.)&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.08051 Compositional Flows for 3D Molecule and Synthesis Pathway Co-design]&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.07456 General purpose models for the chemical sciences]&lt;br /&gt;
* 2025-11: [https://chemrxiv.org/engage/chemrxiv/article-details/690357d9a482cba122e366b6 ChemTorch: A Deep Learning Framework for Benchmarking and Developing Chemical Reaction Property Prediction Models]&lt;br /&gt;
&lt;br /&gt;
===Biology===&lt;br /&gt;
* 2018: [https://alphafold.ebi.ac.uk/ AlphaFold]&lt;br /&gt;
* 2021-07: [https://www.nature.com/articles/s41586-021-03819-2 AlphaFold 2]&lt;br /&gt;
* 2024-05: [https://www.nature.com/articles/s41586-024-07487-w AlphaFold 3]&lt;br /&gt;
* 2023-03: [https://www.science.org/doi/10.1126/science.ade2574 Evolutionary-scale prediction of atomic-level protein structure with a language model] ([https://esmatlas.com/resources?action=fold ESMFold])&lt;br /&gt;
* 2023-11: [https://www.nature.com/articles/s41586-023-06728-8 Illuminating protein space with a programmable generative model] (Chroma)&lt;br /&gt;
* 2024-11: [https://www.science.org/doi/10.1126/science.ado9336 Sequence modeling and design from molecular to genome scale with Evo] (Evo)&lt;br /&gt;
* 2025-01: [https://www.nature.com/articles/s41586-024-08435-4 Targeting protein–ligand neosurfaces with a generalizable deep learning tool]&lt;br /&gt;
* 2025-01: [https://www.science.org/doi/10.1126/science.ads0018 Simulating 500 million years of evolution with a language model] ([https://github.com/evolutionaryscale/esm ESM] 3 model)&lt;br /&gt;
* 2025-02: [https://arcinstitute.org/manuscripts/Evo2 Genome modeling and design across all domains of life with Evo 2]&lt;br /&gt;
* 2025-02: [https://www.microsoft.com/en-us/research/blog/exploring-the-structural-changes-driving-protein-function-with-bioemu-1/ Exploring the structural changes driving protein function with BioEmu-1]&lt;br /&gt;
* 2025-02: [https://arxiv.org/pdf/2502.18449 Protein Large Language Models: A Comprehensive Survey]&lt;br /&gt;
* [https://x.com/vant_ai/status/1903070297991110657 2025-03]: [https://www.vant.ai/ Vant AI] [https://www.vant.ai/neo-1 Neo-1]: atomistic foundation model (small molecules, proteins, etc.)&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.16351 Lyra: An Efficient and Expressive Subquadratic Architecture for Modeling Biological Sequences]&lt;br /&gt;
* 2025-08: RoseTTAFold 3: [https://www.biorxiv.org/content/10.1101/2025.08.14.670328v2 Accelerating Biomolecular Modeling with AtomWorks and RF3]&lt;br /&gt;
* 2025-09: [https://www.biorxiv.org/content/10.1101/2025.09.12.675911v1 Generative design of novel bacteriophages with genome language models]&lt;br /&gt;
* 2025-10: [https://www.science.org/doi/10.1126/science.adu8578 Strengthening nucleic acid biosecurity screening against generative protein design tools]&lt;br /&gt;
* 2026-01: [https://www.nature.com/articles/s41586-025-10014-0 Advancing regulatory variant effect prediction with AlphaGenome]&lt;br /&gt;
&lt;br /&gt;
===Medicine===&lt;br /&gt;
See: [[AI_Agents#Medicine]]&lt;br /&gt;
&lt;br /&gt;
===Successes===&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.11270 Site-Decorated Model for Unconventional Frustrated Magnets: Ultranarrow Phase Crossover and Spin Reversal Transition]&lt;br /&gt;
&lt;br /&gt;
==AI/ML Methods co-opted for Science==&lt;br /&gt;
===Mechanistic Interpretability===&lt;br /&gt;
Train a large model on science data, then apply [[AI_understanding#Mechanistic_Interpretability|mechanistic interpretability]] methods (e.g. sparse autoencoders, SAEs) to the feature/activation space.&lt;br /&gt;
* Mechanistic interpretability for protein language models ([https://interprot.com/ visualizer], [https://github.com/etowahadams/interprot/tree/main code], [https://huggingface.co/liambai/InterProt-ESM2-SAEs SAE])&lt;br /&gt;
* [https://www.markov.bio/ Markov Bio]: [https://www.markov.bio/research/mech-interp-path-to-e2e-biology Through a Glass Darkly: Mechanistic Interpretability as the Bridge to End-to-End Biology] ([https://x.com/adamlewisgreen/status/1853206279499751531 quick description], [https://markovbio.github.io/biomedical-progress/ background info on recent bio progress])&lt;br /&gt;
* 2023-01: [https://arxiv.org/abs/2301.05062 Tracr: Compiled Transformers as a Laboratory for Interpretability] ([https://github.com/google-deepmind/tracr code])&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.03334 An X-Ray Is Worth 15 Features: Sparse Autoencoders for Interpretable Radiology Report Generation]&lt;br /&gt;
* 2024-12: [https://www.arxiv.org/abs/2412.16247 Towards scientific discovery with dictionary learning: Extracting biological concepts from microscopy foundation models]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.12101 InterPLM: Discovering Interpretable Features in Protein Language Models via Sparse Autoencoders]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.00089 Insights on Galaxy Evolution from Interpretable Sparse Feature Networks]&lt;br /&gt;
* 2025-02: [https://www.biorxiv.org/content/10.1101/2025.02.06.636901v1 From Mechanistic Interpretability to Mechanistic Biology: Training, Evaluating, and Interpreting Sparse Autoencoders on Protein Language Models]&lt;br /&gt;
* 2025-02: [https://www.goodfire.ai/blog/interpreting-evo-2 Interpreting Evo 2: Arc Institute&amp;#039;s Next-Generation Genomic Foundation Model]&lt;br /&gt;
* 2026-01: [https://www.goodfire.ai/research/interpretability-for-alzheimers-detection# Using Interpretability to Identify a Novel Class of Alzheimer&amp;#039;s Biomarkers]&lt;br /&gt;
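The SAE idea underlying the works above can be sketched in a few lines (an illustrative NumPy toy, not any of the linked implementations; all dimensions and names here are made up): an overcomplete ReLU encoder maps model activations to sparse features, a linear decoder reconstructs them, and training minimizes reconstruction error plus an L1 sparsity penalty.&lt;br /&gt;

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_sae = 16, 64   # activation dim, dictionary size (illustrative)
W_enc = rng.normal(scale=0.1, size=(d_model, d_sae))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(scale=0.1, size=(d_sae, d_model))
b_dec = np.zeros(d_model)

def sae_forward(x):
    """Encode activations into sparse features, then reconstruct."""
    f = np.maximum(x @ W_enc + b_enc, 0.0)   # ReLU yields sparse feature codes
    x_hat = f @ W_dec + b_dec                # linear reconstruction
    return f, x_hat

x = rng.normal(size=(8, d_model))            # a batch of model activations
f, x_hat = sae_forward(x)
recon_loss = np.mean((x - x_hat) ** 2)       # reconstruction term
sparsity_loss = np.mean(np.abs(f))           # L1 term encourages sparsity
loss = recon_loss + 1e-3 * sparsity_loss     # objective to minimize by SGD
```

In the papers listed here, the individual features f are then inspected for human-interpretable meaning (e.g. binding sites, cell phenotypes, genomic motifs).&lt;br /&gt;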
&lt;br /&gt;
===Uncertainty===&lt;br /&gt;
* 2024-10: [https://github.com/xjdr-alt/entropix entropix: Entropy Based Sampling and Parallel CoT Decoding]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.09724 Taming Overconfidence in LLMs: Reward Calibration in RLHF]&lt;br /&gt;
&lt;br /&gt;
=Science Benchmarks=&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.13168 SciCode: A Research Coding Benchmark Curated by Scientists] ([http://scicode-bench.github.io/ project])&lt;br /&gt;
* 2024-11: [https://openreview.net/pdf?id=fz969ahcvJ AidanBench: Evaluating Novel Idea Generation on Open-Ended Questions] ([https://github.com/aidanmclaughlin/AidanBench code])&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.17596 LiveIdeaBench: Evaluating LLMs&amp;#039; Scientific Creativity and Idea Generation with Minimal Context]&lt;br /&gt;
* 2025-01: [https://agi.safe.ai/ Humanity&amp;#039;s Last Exam]&lt;br /&gt;
* [https://github.com/OSU-NLP-Group/ScienceAgentBench ScienceAgentBench]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.20309 EAIRA: Establishing a Methodology for Evaluating AI Models as Scientific Research Assistants]&lt;br /&gt;
* 2025-03: [https://huggingface.co/datasets/futurehouse/BixBench BixBench]: Novel hypotheses (accept/reject)&lt;br /&gt;
* 2025-04: [https://research.google/blog/evaluating-progress-of-llms-on-scientific-problem-solving/ Google: Evaluating progress of LLMs on scientific problem-solving]&lt;br /&gt;
** 2025-03: [https://arxiv.org/abs/2503.13517 CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning]&lt;br /&gt;
** 2024-07: [https://arxiv.org/abs/2407.09413 SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers]&lt;br /&gt;
** 2024-10: [https://neurips.cc/virtual/2024/98540 FEABench: Evaluating Language Models on Real World Physics Reasoning Ability]&lt;br /&gt;
* 2026-02: [https://edisonscientific.com/ Edison]: [https://lab-bench.ai/ LABBench 2]&lt;br /&gt;
&lt;br /&gt;
=Science Agents=&lt;br /&gt;
==Reviews==&lt;br /&gt;
* 2024-10: [https://www.cell.com/cell/fulltext/S0092-8674(24)01070-5?target=_blank Empowering biomedical discovery with AI agents]&lt;br /&gt;
* 2025-01: [https://pubs.rsc.org/en/content/articlehtml/2024/sc/d4sc03921a A review of large language models and autonomous agents in chemistry] ([https://github.com/ur-whitelab/LLMs-in-science github])&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.01903 AI4Research: A Survey of Artificial Intelligence for Scientific Research]&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.14111 From AI for Science to Agentic Science: A Survey on Autonomous Scientific Discovery]&lt;br /&gt;
&lt;br /&gt;
==Challenges==&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.03315 Why LLMs Aren&amp;#039;t Scientists Yet: Lessons from Four Autonomous Research Attempts]&lt;br /&gt;
&lt;br /&gt;
==Specific==&lt;br /&gt;
* 2024-01-13: [https://arxiv.org/abs/2401.06949 ORGANA: A Robotic Assistant for Automated Chemistry Experimentation and Characterization] ([https://www.youtube.com/watch?v=N6qMMwJ8hKQ video])&lt;br /&gt;
* 2024-06-19: [https://arxiv.org/abs/2406.13163 LLMatDesign: Autonomous Materials Discovery with Large Language Models]&lt;br /&gt;
* 2024-08-12: [https://sakana.ai/ Sakana AI]: [https://sakana.ai/ai-scientist/ AI Scientist]; [https://arxiv.org/abs/2408.06292 The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery] ([https://github.com/SakanaAI/AI-Scientist code])&lt;br /&gt;
* 2024-09-09: [https://arxiv.org/abs/2409.05556 SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning] ([https://github.com/lamm-mit/SciAgentsDiscovery code])&lt;br /&gt;
* 2024-09-11: PaperQA2: [https://paper.wikicrow.ai/ Language Models Achieve Superhuman Synthesis of Scientific Knowledge] ([https://x.com/SGRodriques/status/1833908643856818443 𝕏 post], [https://github.com/Future-House/paper-qa code])&lt;br /&gt;
* 2024-10-17: [https://arxiv.org/abs/2410.13768 Rapid and Automated Alloy Design with Graph Neural Network-Powered LLM-Driven Multi-Agent Systems]&lt;br /&gt;
* 2024-10-28: [https://arxiv.org/abs/2410.20976 Large Language Model-Guided Prediction Toward Quantum Materials Synthesis]&lt;br /&gt;
* 2024-12-06: [https://www.biorxiv.org/content/10.1101/2024.11.11.623004v1 The Virtual Lab: AI Agents Design New SARS-CoV-2 Nanobodies with Experimental Validation] (writeup: [https://www.nature.com/articles/d41586-024-01684-3 Virtual lab powered by ‘AI scientists’ super-charges biomedical research: Could human–AI collaborations be the future of interdisciplinary studies?])&lt;br /&gt;
* 2024-12-30: [https://arxiv.org/abs/2412.21154 Aviary: training language agents on challenging scientific tasks]&lt;br /&gt;
* See also: [[AI_Agents#Deep_Research|AI Agents &amp;gt; Deep Research]]&lt;br /&gt;
* 2025-04-08: Sakana: [https://pub.sakana.ai/ai-scientist-v2/paper/paper.pdf The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search] ([https://github.com/SakanaAI/AI-Scientist-v2 code])&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.14267 DREAMS: Density Functional Theory Based Research Engine for Agentic Materials Simulation]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.02824 Kosmos: An AI Scientist for Autonomous Discovery]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.08151 SciAgent: A Unified Multi-Agent System for Generalistic Scientific Reasoning]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2601.23265 PaperBanana: Automating Academic Illustration for AI Scientists]&lt;br /&gt;
* 2026-03: [https://arxiv.org/abs/2603.20179 AI Agents Can Already Autonomously Perform Experimental High Energy Physics]&lt;br /&gt;
&lt;br /&gt;
==Science Multi-Agent Setups==&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.04227 Agent Laboratory: Using LLM Agents as Research Assistants]&lt;br /&gt;
* 2025-04: [https://www.nature.com/articles/s41551-025-01363-2 Coordinated AI agents for advancing healthcare] ([https://www.nature.com/articles/s41551-025-01363-2.epdf?sharing_token=CIYP3J8LZE4BX31fV3WxUdRgN0jAjWel9jnR3ZoTv0O9iD-yhgqzRaz_7VASayWRePPhWDD2xFyfuOpSXbdPaOtt7oH4nfXo7telALzNwY3V1p9SxoqBEJy2OuaJ_cA35-CYQC1XgjCNTZUw46dh1KX-Dj8e7-1Vk_RlZKFLrc8%3D pdf])&lt;br /&gt;
&lt;br /&gt;
=Science Agentic Components=&lt;br /&gt;
==Frameworks==&lt;br /&gt;
* [https://platform.claude.com/docs/en/agent-sdk/overview Anthropic Claude Agent SDK overview]&lt;br /&gt;
* [https://openclaw.ai/ OpenClaw]&lt;br /&gt;
* [https://opencode.ai/ OpenCode]&lt;br /&gt;
* [https://github.com/OpenHands/software-agent-sdk OpenHands]&lt;br /&gt;
* [https://github.com/lamm-mit?tab=repositories LAMM: MIT Laboratory for Atomistic and Molecular Mechanics]&lt;br /&gt;
** [https://github.com/lamm-mit/scienceclaw ScienceClaw]: Framework for autonomous scientific investigation without central coordination.&lt;br /&gt;
** [https://infinite-lamm.vercel.app/ Infinite]: The Infinite Corridor of Scientific Discovery. Open science, powered by many — agents and humans discovering together.&lt;br /&gt;
&lt;br /&gt;
==Personalities==&lt;br /&gt;
* 2026-03: [https://github.com/msitarzewski/agency-agents The Agency: AI Specialists Ready to Transform Your Workflow]&lt;br /&gt;
&lt;br /&gt;
==Skills==&lt;br /&gt;
* 2026-03: [https://github.com/K-Dense-AI/claude-scientific-skills/tree/main?tab=readme-ov-file#use-cases Claude Scientific Skills] (list)&lt;br /&gt;
&lt;br /&gt;
=AI Science Systems=&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.03916 Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.13299 Hypothesis Generation for Materials Discovery and Design Using Goal-Driven and Constraint-Guided LLM Agents]&lt;br /&gt;
* 2025-02: [https://storage.googleapis.com/coscientist_paper/ai_coscientist.pdf Towards an AI co-scientist] (Google blog post: [https://research.google/blog/accelerating-scientific-breakthroughs-with-an-ai-co-scientist/ Accelerating scientific breakthroughs with an AI co-scientist])&lt;br /&gt;
* 2025-06: [https://zenodo.org/records/15693353 The Discovery Engine]&lt;br /&gt;
** 2025-07: [https://arxiv.org/abs/2507.00964 Benchmarking the Discovery Engine] ([https://www.leap-labs.com/blog/how-we-replicated-five-peer-reviewed-papers-in-five-hours blog])&lt;br /&gt;
* 2025-07: [https://www.preprints.org/manuscript/202507.1951/v1 Autonomous Scientific Discovery Through Hierarchical AI Scientist Systems]&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.16969 Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows]&lt;br /&gt;
* 2026-01: [https://www.nature.com/articles/s43588-025-00906-6 SciSciGPT: advancing human–AI collaboration in the science of science]&lt;br /&gt;
* 2026-02: [https://allenai.org/papers/autodiscovery AUTODISCOVERY: Open-ended Scientific Discovery via Bayesian Surprise] (Allen AI (Ai2) AstraLabs, [https://allenai.org/blog/autodiscovery blog], [https://autodiscovery.allen.ai/runs tools])&lt;br /&gt;
&lt;br /&gt;
===Inorganic Materials Discovery===&lt;br /&gt;
* 2023-11: [https://doi.org/10.1038/s41586-023-06735-9 Scaling deep learning for materials discovery]&lt;br /&gt;
* 2023-11: [https://doi.org/10.1038/s41586-023-06734-w An autonomous laboratory for the accelerated synthesis of novel materials]&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.00135 HoneyComb: A Flexible LLM-Based Agent System for Materials Science]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.12771 Open Materials 2024 (OMat24) Inorganic Materials Dataset and Models] ([https://github.com/FAIR-Chem/fairchem code], [https://huggingface.co/datasets/fairchem/OMAT24 datasets], [https://huggingface.co/fairchem/OMAT24 checkpoints], [https://ai.meta.com/blog/fair-news-segment-anything-2-1-meta-spirit-lm-layer-skip-salsa-sona/ blogpost])&lt;br /&gt;
* 2025-01: [https://www.nature.com/articles/s41586-025-08628-5 A generative model for inorganic materials design]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.14110 System of Agentic AI for the Discovery of Metal-Organic Frameworks]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.08762 The Open Molecules 2025 (OMol25) Dataset, Evaluations, and Models]&lt;br /&gt;
&lt;br /&gt;
===Materials Characterization===&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.06569 Operationalizing Serendipity: Multi-Agent AI Workflows for Enhanced Materials Characterization with Theory-in-the-Loop]&lt;br /&gt;
&lt;br /&gt;
===Chemistry===&lt;br /&gt;
* 2023-12: [https://doi.org/10.1038/s41586-023-06792-0 Autonomous chemical research with large language models] (Coscientist)&lt;br /&gt;
* 2024-09: [https://www.pnnl.gov/main/publications/external/technical_reports/PNNL-36692.pdf PNNL ChemAIst V0.2]&lt;br /&gt;
* 2024-11: [https://www.nature.com/articles/s41467-024-54457-x An automatic end-to-end chemical synthesis development platform powered by large language models]&lt;br /&gt;
* 2025-06: [https://paper.ether0.ai/ Training a Scientific Reasoning Model for Chemistry]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.06363 ChemGraph: An Agentic Framework for Computational Chemistry Workflows] ([https://github.com/argonne-lcf/ChemGraph code])&lt;br /&gt;
&lt;br /&gt;
===Bio===&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.01485 BioMARS: A Multi-Agent Robotic System for Autonomous Biological Experiments]&lt;br /&gt;
&lt;br /&gt;
===Physics===&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.19799 PhysMaster: Building an Autonomous AI Physicist for Theoretical and Computational Physics Research]&lt;br /&gt;
&lt;br /&gt;
==LLMs Optimized for Science==&lt;br /&gt;
* 2022-11: [https://arxiv.org/abs/2211.09085 Galactica: A Large Language Model for Science]&lt;br /&gt;
* 2024-12: [https://www.nature.com/articles/s41467-024-54639-7 Crystal structure generation with autoregressive large language modeling]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.13107 MatterChat: A Multi-Modal LLM for Material Science]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.17604 OmniScience: A Domain-Specialized LLM for Scientific Reasoning and Discovery]&lt;br /&gt;
* 2025-03: Google [https://huggingface.co/collections/google/txgemma-release-67dd92e931c857d15e4d1e87 TxGemma] (2B, 9B, 27B): [https://developers.googleblog.com/en/introducing-txgemma-open-models-improving-therapeutics-development/ drug development]&lt;br /&gt;
&lt;br /&gt;
=Impact of AI in Science=&lt;br /&gt;
* 2024-11: &amp;lt;strike&amp;gt;[https://aidantr.github.io/files/AI_innovation.pdf Artificial Intelligence, Scientific Discovery, and Product Innovation]&amp;lt;/strike&amp;gt;&lt;br /&gt;
** 2025-05: Retraction: [https://economics.mit.edu/news/assuring-accurate-research-record Assuring an accurate research record]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.05151 Transforming Science with Large Language Models: A Survey on AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.03837 Accelerating Scientific Research with Gemini: Case Studies and Common Techniques]&lt;br /&gt;
&lt;br /&gt;
=Related Tools=&lt;br /&gt;
==Literature Search==&lt;br /&gt;
* [https://www.perplexity.ai/ Perplexity]&lt;br /&gt;
* [https://www.arxival.xyz/ ArXival]&lt;br /&gt;
&lt;br /&gt;
==Data Visualization==&lt;br /&gt;
* 2024-10: Microsoft [https://www.microsoft.com/en-us/research/blog/data-formulator-exploring-how-ai-can-help-analysts-create-rich-data-visualizations/ Data Formulator: Create Rich Visualization with AI iteratively] ([https://www.microsoft.com/en-us/research/video/data-formulator-create-rich-visualization-with-ai-iteratively/ video], [https://github.com/microsoft/data-formulator code])&lt;br /&gt;
* [https://julius.ai/ Julius AI]: Analyze your data with computational AI&lt;br /&gt;
&lt;br /&gt;
==Generative==&lt;br /&gt;
* 2025-03: [https://huggingface.co/collections/starvector/starvector-models-6783b22c7bd4b43d13cb5289 StarVector] 1B, 8B: text or image to SVG&lt;br /&gt;
&lt;br /&gt;
==Chemistry==&lt;br /&gt;
* 2025-03: [https://jcheminf.biomedcentral.com/articles/10.1186/s13321-024-00834-z Rxn-INSIGHT: fast chemical reaction analysis using bond-electron matrices] ([https://rxn-insight.readthedocs.io/en/latest/ docs])&lt;br /&gt;
&lt;br /&gt;
=Science Datasets=&lt;br /&gt;
* [https://datasetsearch.research.google.com/ Google Dataset Search]&lt;br /&gt;
* [https://github.com/blaiszik/awesome-matchem-datasets/ Awesome Materials &amp;amp; Chemistry Datasets]&lt;br /&gt;
* NIST [https://jarvis.nist.gov/ Jarvis] (simulations)&lt;br /&gt;
&lt;br /&gt;
=Genuine Discoveries=&lt;br /&gt;
* 2025-11: [https://cdn.openai.com/pdf/4a25f921-e4e0-479a-9b38-5367b47e8fd0/early-science-acceleration-experiments-with-gpt-5.pdf Early science acceleration experiments with GPT-5]&lt;br /&gt;
* 2025-12: [https://andymasley.substack.com/p/ai-can-obviously-create-new-knowledge AI can obviously create new knowledge - But maybe not new concepts]&lt;br /&gt;
&lt;br /&gt;
==Math==&lt;br /&gt;
* 2023-07: [https://www.nature.com/articles/s41586-023-06004-9 Faster sorting algorithms discovered using deep reinforcement learning]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.13131 AlphaEvolve: A coding agent for scientific and algorithmic discovery]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.02864 Mathematical exploration and discovery at scale]&lt;br /&gt;
* 2025-11: [https://www.nature.com/articles/s41586-025-09833-y Olympiad-level formal mathematical reasoning with reinforcement learning]&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.14575 Extremal descendant integrals on moduli spaces of curves: An inequality discovered and proved in collaboration with AI]&lt;br /&gt;
* [https://github.com/teorth/erdosproblems/wiki/AI-contributions-to-Erd%C5%91s-problems AI Solving Erdős Problems]:&lt;br /&gt;
** 2026-01: [https://www.erdosproblems.com/728 Erdős Problem #728] and [https://www.erdosproblems.com/729 #729] solved by Aristotle using ChatGPT 5.2 Pro&lt;br /&gt;
** 2026-01: [https://www.erdosproblems.com/forum/thread/397 Erdős Problem #397] [https://x.com/neelsomani/status/2010215162146607128?s=20 solved] by [https://neelsomani.com/ Neel Somani] using ChatGPT 5.2 Pro&lt;br /&gt;
** 2026-01: [https://www.erdosproblems.com/205 Erdős Problem #205] solved by Aristotle using ChatGPT 5.2 Pro&lt;br /&gt;
** 2026-01: [https://www.erdosproblems.com/forum/thread/281 Erdős Problem #281] [https://x.com/neelsomani/status/2012695714187325745?s=20 solved] by [https://neelsomani.com/ Neel Somani] using ChatGPT 5.2 Pro&lt;br /&gt;
** 2026-01: Google DeepMind: [https://arxiv.org/abs/2601.21442 Irrationality of rapidly converging series: a problem of Erdős and Graham]&lt;br /&gt;
*** [https://www.erdosproblems.com/1051 Erdős Problem #1051] [https://x.com/slow_developer/status/2018321002623901885?s=20 solved] by Google DeepMind Aletheia agent&lt;br /&gt;
** 2026-01: Google DeepMind: [https://arxiv.org/abs/2601.22401 Semi-Autonomous Mathematics Discovery with Gemini: A Case Study on the Erdős Problems]&lt;br /&gt;
*** Attempted 700 problems, solved 13 open Erdős problems: 5 novel autonomous solutions, 8 through existing literature.&lt;br /&gt;
** 2026-02: [https://www.erdosproblems.com/846 Erdős Problem #846]&lt;br /&gt;
*** [https://x.com/roydanroy/status/2026804567178953048?s=20 Google DeepMind]&lt;br /&gt;
*** [https://x.com/mehtaab_sawhney/status/2026716221933343147?s=20 Using OpenAI internal model] (paper: [https://cdn.openai.com/infinite-sets/main_single_clean3.pdf On infinite sets with no 3 on a line])&lt;br /&gt;
** 2026-03: Three problems solved using OpenAI GPT internal model. Paper: [https://arxiv.org/pdf/2603.29961 Short Proofs in Combinatorics and Number Theory]&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.07222 The motivic class of the space of genus 0 maps to the flag variety]&lt;br /&gt;
* 2026-02: Google DeepMind: [https://arxiv.org/abs/2602.10177 Towards Autonomous Mathematics Research]&lt;br /&gt;
* 2026-03: Donald Knuth: [https://www-cs-faculty.stanford.edu/~knuth/papers/claude-cycles.pdf A problem in Directed Hamiltonian Cycles] solved by Filip Stappers using Claude Opus 4.6&lt;br /&gt;
* 2026-03: Google DeepMind: [https://arxiv.org/abs/2603.09172 Reinforced Generation of Combinatorial Structures: Ramsey Numbers]&lt;br /&gt;
* 2026-03: [https://epoch.ai/frontiermath/open-problems FrontierMath] problem: [https://epoch.ai/frontiermath/open-problems/ramsey-hypergraphs &amp;quot;A Ramsey-style Problem on Hypergraphs&amp;quot;] solved by Kevin Barreto and Liam Price using GPT-5.4 Pro&lt;br /&gt;
&lt;br /&gt;
==Physics assistance==&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.23758 Exact solution of the frustrated Potts model with next-nearest-neighbor interactions in one dimension via AI bootstrapping] ([https://www.bnl.gov/staff/wyin Weiguo Yin])&lt;br /&gt;
* 2025-12: [https://www.sciencedirect.com/science/article/pii/S0370269325008111 Relativistic covariance and nonlinear quantum mechanics: Tomonaga-Schwinger analysis]&lt;br /&gt;
** [https://x.com/hsu_steve/status/1996034522308026435?s=20 Steve Hsu], [https://drive.google.com/file/d/16sxJuwsHoi-fvTFbri9Bu8B9bqA6lr1H/view Theoretical Physics with Generative AI]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.12176 Single-minus gluon tree amplitudes are nonzero] (GPT-5.2, [https://openai.com/index/new-result-theoretical-physics/ blog])&lt;br /&gt;
&lt;br /&gt;
==Literature exploration==&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.02824 Kosmos: An AI Scientist for Autonomous Discovery] ([https://edisonscientific.com/ Edison])&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/c4bdef64-5e9b-43b9-a365-592dd1ed7587 Nucleotide metabolism in hypothermia]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/1fdbf827-be65-4d97-9b66-bf0da600091a Determinant of perovskite solar-cell failure]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/4fb3fbdb-c449-4064-9aa6-ff4ec53131d8 Log-normal connectivity in neural networks]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/c6849232-5858-4634-adf5-83780afbe3db SOD2 as driver of myocardial fibrosis]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/abac07da-a6bb-458f-b0ba-ef08f1be617e Protective variant of SSR1 in type 2 diabetes]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/a770052b-2334-4bbe-b086-5149e0f03d99 Temporal ordering in Alzheimer’s disease]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/28c427d2-be31-48b5-b272-28d5a1e3ea5c Mechanism of neuron vulnerability in aging]&lt;br /&gt;
&lt;br /&gt;
==Bio design==&lt;br /&gt;
* 2023-07: [https://www.nature.com/articles/s41586-023-06415-8 De novo design of protein structure and function with RFdiffusion]&lt;br /&gt;
* 2025-11: [https://www.nature.com/articles/s41586-025-09721-5 Atomically accurate de novo design of antibodies with RFdiffusion]&lt;br /&gt;
* 2025-11: [https://deepmind.google/blog/alphafold-five-years-of-impact/ AlphaFold: Five years of impact]&lt;br /&gt;
* 2026-01: [https://www.goodfire.ai/research/interpretability-for-alzheimers-detection# Using Interpretability to Identify a Novel Class of Alzheimer&amp;#039;s Biomarkers]&lt;br /&gt;
&lt;br /&gt;
==Material Discovery==&lt;br /&gt;
* 2023-11: [https://doi.org/10.1038/s41586-023-06735-9 Scaling deep learning for materials discovery]&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
* [[AI agents]]&lt;br /&gt;
* [https://nanobot.chat/ Nanobot.chat]: Intelligent AI for the labnetwork @ mtl.mit.edu forum&lt;/div&gt;</summary>
		<author><name>KevinYager</name></author>
		
	</entry>
	<entry>
		<id>http://gisaxs.com/index.php?title=Science_Agents&amp;diff=8770</id>
		<title>Science Agents</title>
		<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php?title=Science_Agents&amp;diff=8770"/>
		<updated>2026-04-01T16:32:19Z</updated>

		<summary type="html">&lt;p&gt;KevinYager: /* Math */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
=AI Use-cases for Science=&lt;br /&gt;
&lt;br /&gt;
==Literature==&lt;br /&gt;
* [https://www.alphaxiv.org/explore alphaXiv | Explore]: Understand arXiv papers&lt;br /&gt;
* 2026-02: [https://www.nature.com/articles/s41586-025-10072-4 Synthesizing scientific literature with retrieval-augmented language models]&lt;br /&gt;
&lt;br /&gt;
===LLM extract data from papers===&lt;br /&gt;
* 2024: [https://pubs.rsc.org/en/content/articlelanding/2025/cs/d4cs00913d From text to insight: large language models for chemical data extraction]&lt;br /&gt;
&lt;br /&gt;
===AI finding links in literature===&lt;br /&gt;
* 2019-07: [https://doi.org/10.1038/s41586-019-1335-8 Unsupervised word embeddings capture latent knowledge from materials science literature]&lt;br /&gt;
* 2024-11: [https://doi.org/10.1038/s41562-024-02046-9 Large language models surpass human experts in predicting neuroscience results]&lt;br /&gt;
&lt;br /&gt;
===(Pre) Generate Articles===&lt;br /&gt;
* 2022-12: [https://aclanthology.org/2022.emnlp-main.296/ Re3: Generating Longer Stories With Recursive Reprompting and Revision]&lt;br /&gt;
* 2023-01: Journalism: [https://journals.sagepub.com/doi/10.1177/10776958221149577 Collaborating With ChatGPT: Considering the Implications of Generative Artificial Intelligence for Journalism and Media Education]&lt;br /&gt;
* 2023-03: English essays: [https://journal.unnes.ac.id/sju/index.php/elt/article/view/64069 Artificial intelligence (AI) technology in OpenAI ChatGPT application: A review of ChatGPT in writing English essay]&lt;br /&gt;
* 2023-07: Science writing: [https://www.rbmojournal.com/article/S1472-6483(23)00219-5/fulltext Artificial intelligence in scientific writing: a friend or a foe?]&lt;br /&gt;
* 2024-02: Wikipedia style: [https://arxiv.org/abs/2402.14207 Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models]&lt;br /&gt;
* 2024-08: [https://arxiv.org/abs/2408.07055 LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs] ([https://github.com/THUDM/LongWriter code])&lt;br /&gt;
* 2024-08: Scientific papers: [https://arxiv.org/abs/2408.06292 The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery]&lt;br /&gt;
* 2024-09: PaperQA2: [https://paper.wikicrow.ai/ Language Models Achieve Superhuman Synthesis of Scientific Knowledge] ([https://x.com/SGRodriques/status/1833908643856818443 𝕏 post], [https://github.com/Future-House/paper-qa code])&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.18866 Reasoning to Learn from Latent Thoughts]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.19065 WikiAutoGen: Towards Multi-Modal Wikipedia-Style Article Generation]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.13171 Sleep-time Compute: Beyond Inference Scaling at Test-time]&lt;br /&gt;
&lt;br /&gt;
==Explanation==&lt;br /&gt;
* 2025-02: [https://tiger-ai-lab.github.io/TheoremExplainAgent/ TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding] ([https://arxiv.org/abs/2502.19400 preprint])&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.02822 Do Two AI Scientists Agree?]&lt;br /&gt;
&lt;br /&gt;
==Autonomous Ideation==&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.07738 ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models]&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.14202 Mining Causality: AI-Assisted Search for Instrumental Variables]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.07977 Thinking Fast and Laterally: Multi-Agentic Approach for Reasoning about Uncertain Emerging Events]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.14141 LLMs can realize combinatorial creativity: generating creative ideas via LLMs for scientific research]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.17596 LiveIdeaBench: Evaluating LLMs&amp;#039; Scientific Creativity and Idea Generation with Minimal Context]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.13299 Hypothesis Generation for Materials Discovery and Design Using Goal-Driven and Constraint-Guided LLM Agents]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.13025 Agentic Deep Graph Reasoning Yields Self-Organizing Knowledge Networks]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.00794 Predicting Empirical AI Research Outcomes with Language Models]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.20803 The Ideation-Execution Gap: Execution Outcomes of LLM-Generated versus Human Research Ideas]&lt;br /&gt;
* 2026-03: [https://arxiv.org/abs/2603.14473 AI Can Learn Scientific Taste]&lt;br /&gt;
&lt;br /&gt;
==Adapting LLMs to Science==&lt;br /&gt;
* 2023-06: [https://doi.org/10.1039/D3DD00112A Domain-specific chatbots for science using embeddings]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2411.00027 Personalization of Large Language Models: A Survey]&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.00412 Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation]&lt;br /&gt;
&lt;br /&gt;
==AI/LLM Control of Scientific Instruments/Facilities==&lt;br /&gt;
* 2023-12: [https://www.nature.com/articles/s41524-024-01423-2 Opportunities for retrieval and tool augmented large language models in scientific facilities]&lt;br /&gt;
* 2023-12: [https://arxiv.org/abs/2312.17180 Virtual Scientific Companion for Synchrotron Beamlines: A Prototype]&lt;br /&gt;
* 2023-12: [https://www.nature.com/articles/s41586-023-06792-0 Autonomous chemical research with large language models]&lt;br /&gt;
* 2024-01: [https://iopscience.iop.org/article/10.1088/2632-2153/ad52e9 Synergizing Human Expertise and AI Efficiency with Language Model for Microscopy Operation and Automated Experiment Design]&lt;br /&gt;
* 2024-06: [https://pubs.rsc.org/en/content/articlelanding/2025/dd/d4dd00143e From Text to Test: AI-Generated Control Software for Materials Science Instruments]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.18161 VISION: A Modular AI Assistant for Natural Human-Instrument Interaction at Scientific User Facilities]&lt;br /&gt;
* 2025-01: [https://www.science.org/doi/10.1126/sciadv.adr4173 Large language models for human-machine collaborative particle accelerator tuning through natural language]&lt;br /&gt;
* 2025-04: [https://openreview.net/forum?id=iA9UN1dEgJ Operating Robotic Laboratories with Large Language Models and Teachable Agents]&lt;br /&gt;
&lt;br /&gt;
==AI/ML Methods tailored to Science==&lt;br /&gt;
===Science Foundation Models===&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.15763 Intern-S1: A Scientific Multimodal Foundation Model]&lt;br /&gt;
* 2025-11: [https://pubs.aip.org/aip/jcp/article/163/18/184110/3372267/A-foundation-model-for-atomistic-materials A foundation model for atomistic materials chemistry]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.15684 Walrus: A Cross-Domain Foundation Model for Continuum Dynamics]&lt;br /&gt;
* 2026-01: [https://www.science.org/doi/10.1126/science.ads9530 Deep contrastive learning enables genome-wide virtual screening]&lt;br /&gt;
&lt;br /&gt;
===Regression (Data Fitting)===&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.14546 Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data]: training on (x,y) pairs enables inferring underlying function (define it in code, invert it, compose it)&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2402.14547 OmniPred: Language Models as Universal Regressors]&lt;br /&gt;
&lt;br /&gt;
===Tabular Classification/Regression===&lt;br /&gt;
* 2025-01: [https://www.nature.com/articles/s41586-024-08328-6 Accurate predictions on small data with a tabular foundation model] ([https://github.com/PriorLabs/TabPFN code])&lt;br /&gt;
&lt;br /&gt;
===Symbolic Regression===&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.09359 Symbolic Regression with a Learned Concept Library]&lt;br /&gt;
&lt;br /&gt;
===Literature Discovery===&lt;br /&gt;
* [https://www.futurehouse.org/ FutureHouse]&lt;br /&gt;
** [https://hasanyone.com/ hasanyone]&lt;br /&gt;
** [https://github.com/Future-House/paper-qa PaperQA2]&lt;br /&gt;
* [https://lumina.sh/ Lumina]&lt;br /&gt;
* [https://github.com/TheBlewish/Automated-AI-Web-Researcher-Ollama Automated-AI-Web-Researcher-Ollama]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.05366 Search-o1: Agentic Search-Enhanced Large Reasoning Models] ([https://search-o1.github.io/ project], [https://github.com/sunnynexus/Search-o1 code])&lt;br /&gt;
* 2026-02: [https://www.nature.com/articles/s41586-025-10072-4 Synthesizing scientific literature with retrieval-augmented language models] ([https://allenai.org/blog/openscholar-nature blog])&lt;br /&gt;
&lt;br /&gt;
===Commercial===&lt;br /&gt;
* [https://sakana.ai/ai-scientist/ Sakana AI] &lt;br /&gt;
* [https://www.cusp.ai/ Cusp AI]: Materials/AI&lt;br /&gt;
* [https://www.lila.ai/ Lila AI]: Life sciences&lt;br /&gt;
* [https://www.radical-ai.com/ Radical AI]: Material simulation/design&lt;br /&gt;
* [https://www.autoscience.ai/ Autoscience] ([https://www.autoscience.ai/blog/meet-carl-the-first-ai-system-to-produce-academically-peer-reviewed-research Carl])&lt;br /&gt;
* [https://periodic.com/ Periodic Labs]&lt;br /&gt;
* [https://edisonscientific.com/articles/announcing-edison-scientific Edison Scientific] (drug discovery, spinoff from [https://www.futurehouse.org/ FutureHouse])&lt;br /&gt;
* 2026-03: Mirendil Inc.: advanced models to speed up R&amp;amp;D in scientific domains, especially biology and materials science&lt;br /&gt;
&lt;br /&gt;
====Bio====&lt;br /&gt;
* [https://www.bioptimus.com/ Bioptimus]&lt;br /&gt;
* [https://www.evolutionaryscale.ai/ EvolutionaryScale]&lt;br /&gt;
&lt;br /&gt;
==AI/ML Methods in Science==&lt;br /&gt;
* 2025-07: [https://www.mdpi.com/2313-433X/11/8/252 Synthetic Scientific Image Generation with VAE, GAN, and Diffusion Model Architectures]&lt;br /&gt;
&lt;br /&gt;
===Imaging===&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.08176 Behind the Noise: Conformal Quantile Regression Reveals Emergent Representations] (blog: [https://phzwart.github.io/behindthenoise/ Behind the Noise])&lt;br /&gt;
&lt;br /&gt;
===Materials===&lt;br /&gt;
* 2024-12: [https://www.nature.com/articles/s41467-024-54639-7 Crystal structure generation with autoregressive large language modeling]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.03965 All-atom Diffusion Transformers: Unified generative modelling of molecules and materials]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.19730 Training-Free Active Learning Framework in Materials Science with Large Language Models]&lt;br /&gt;
&lt;br /&gt;
===Chemistry===&lt;br /&gt;
* 2025-01: [https://www.nature.com/articles/s41578-025-00772-8 Large language models for reticular chemistry]&lt;br /&gt;
* 2025-02: [https://www.nature.com/articles/s42256-025-00982-3 Image-based generation for molecule design with SketchMol]&lt;br /&gt;
* 2025-02: [https://www.nature.com/articles/s42256-025-00994-z Large language models for scientific discovery in molecular property prediction]&lt;br /&gt;
* [https://x.com/vant_ai/status/1903070297991110657 2025-03]: [https://www.vant.ai/ Vant AI] [https://www.vant.ai/neo-1 Neo-1]: atomistic foundation model (small molecules, proteins, etc.)&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.08051 Compositional Flows for 3D Molecule and Synthesis Pathway Co-design]&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.07456 General purpose models for the chemical sciences]&lt;br /&gt;
* 2025-11: [https://chemrxiv.org/engage/chemrxiv/article-details/690357d9a482cba122e366b6 ChemTorch: A Deep Learning Framework for Benchmarking and Developing Chemical Reaction Property Prediction Models]&lt;br /&gt;
&lt;br /&gt;
===Biology===&lt;br /&gt;
* 2018: [https://alphafold.ebi.ac.uk/ AlphaFold]&lt;br /&gt;
* 2021-07: [https://www.nature.com/articles/s41586-021-03819-2 AlphaFold 2]&lt;br /&gt;
* 2024-05: [https://www.nature.com/articles/s41586-024-07487-w AlphaFold 3]&lt;br /&gt;
* 2023-03: [https://www.science.org/doi/10.1126/science.ade2574 Evolutionary-scale prediction of atomic-level protein structure with a language model] ([https://esmatlas.com/resources?action=fold ESMFold])&lt;br /&gt;
* 2023-11: [https://www.nature.com/articles/s41586-023-06728-8 Illuminating protein space with a programmable generative model] (Chroma)&lt;br /&gt;
* 2024-11: [https://www.science.org/doi/10.1126/science.ado9336 Sequence modeling and design from molecular to genome scale with Evo] (Evo)&lt;br /&gt;
* 2025-01: [https://www.nature.com/articles/s41586-024-08435-4 Targeting protein–ligand neosurfaces with a generalizable deep learning tool]&lt;br /&gt;
* 2025-01: [https://www.science.org/doi/10.1126/science.ads0018 Simulating 500 million years of evolution with a language model] ([https://github.com/evolutionaryscale/esm ESM] 3 model)&lt;br /&gt;
* 2025-02: [https://arcinstitute.org/manuscripts/Evo2 Genome modeling and design across all domains of life with Evo 2]&lt;br /&gt;
* 2025-02: [https://www.microsoft.com/en-us/research/blog/exploring-the-structural-changes-driving-protein-function-with-bioemu-1/ Exploring the structural changes driving protein function with BioEmu-1]&lt;br /&gt;
* 2025-02: [https://arxiv.org/pdf/2502.18449 Protein Large Language Models: A Comprehensive Survey]&lt;br /&gt;
* [https://x.com/vant_ai/status/1903070297991110657 2025-03]: [https://www.vant.ai/ Vant AI] [https://www.vant.ai/neo-1 Neo-1]: atomistic foundation model (small molecules, proteins, etc.)&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.16351 Lyra: An Efficient and Expressive Subquadratic Architecture for Modeling Biological Sequences]&lt;br /&gt;
* 2025-08: RoseTTAFold 3: [https://www.biorxiv.org/content/10.1101/2025.08.14.670328v2 Accelerating Biomolecular Modeling with AtomWorks and RF3]&lt;br /&gt;
* 2025-09: [https://www.biorxiv.org/content/10.1101/2025.09.12.675911v1 Generative design of novel bacteriophages with genome language models]&lt;br /&gt;
* 2025-10: [https://www.science.org/doi/10.1126/science.adu8578 Strengthening nucleic acid biosecurity screening against generative protein design tools]&lt;br /&gt;
* 2026-01: [https://www.nature.com/articles/s41586-025-10014-0 Advancing regulatory variant effect prediction with AlphaGenome]&lt;br /&gt;
&lt;br /&gt;
===Medicine===&lt;br /&gt;
See: [[AI_Agents#Medicine]]&lt;br /&gt;
&lt;br /&gt;
===Successes===&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.11270 Site-Decorated Model for Unconventional Frustrated Magnets: Ultranarrow Phase Crossover and Spin Reversal Transition]&lt;br /&gt;
&lt;br /&gt;
==AI/ML Methods co-opted for Science==&lt;br /&gt;
===Mechanistic Interpretability===&lt;br /&gt;
Train a large model on science data, then apply [[AI_understanding#Mechanistic_Interpretability|mechanistic interpretability]] (e.g. sparse autoencoders, SAEs) to the feature/activation space.&lt;br /&gt;
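The recipe above (collect model activations, learn an overcomplete sparse dictionary over them) can be sketched as a toy sparse autoencoder. The shapes, random stand-in data, and hyperparameters here are illustrative assumptions, not the setup of any particular paper:

```python
import numpy as np

# Toy sparse autoencoder (SAE) over stand-in "model activations".
rng = np.random.default_rng(0)
d_model, d_dict, n = 16, 64, 512        # activation dim, dictionary size, samples

X = rng.normal(size=(n, d_model))       # placeholder for real activations

W_enc = rng.normal(scale=0.1, size=(d_model, d_dict))
b_enc = np.zeros(d_dict)
W_dec = W_enc.T.copy()                  # simple tied-style initialization
lam, lr = 1e-3, 1e-2                    # L1 strength, learning rate

def relu(z):
    return np.maximum(z, 0.0)

losses = []
for step in range(200):
    H = relu(X @ W_enc + b_enc)         # sparse feature activations
    Xhat = H @ W_dec                    # reconstruction
    err = Xhat - X
    loss = (err ** 2).mean() + lam * np.abs(H).mean()
    losses.append(loss)

    # Manual gradients for the reconstruction + L1 objective.
    gXhat = 2 * err / err.size
    gW_dec = H.T @ gXhat
    gH = (gXhat @ W_dec.T + lam * np.sign(H) / H.size) * (H > 0)  # ReLU mask
    W_dec -= lr * gW_dec
    W_enc -= lr * (X.T @ gH)
    b_enc -= lr * gH.sum(0)
```

The L1 penalty pushes each activation vector to be explained by only a few dictionary features; inspecting which inputs fire a given feature is the interpretability step.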
* Mechanistic interpretability for protein language models ([https://interprot.com/ visualizer], [https://github.com/etowahadams/interprot/tree/main code], [https://huggingface.co/liambai/InterProt-ESM2-SAEs SAE])&lt;br /&gt;
* [https://www.markov.bio/ Markov Bio]: [https://www.markov.bio/research/mech-interp-path-to-e2e-biology Through a Glass Darkly: Mechanistic Interpretability as the Bridge to End-to-End Biology] ([https://x.com/adamlewisgreen/status/1853206279499751531 quick description], [https://markovbio.github.io/biomedical-progress/ background info on recent bio progress])&lt;br /&gt;
* 2023-01: [https://arxiv.org/abs/2301.05062 Tracr: Compiled Transformers as a Laboratory for Interpretability] ([https://github.com/google-deepmind/tracr code])&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.03334 An X-Ray Is Worth 15 Features: Sparse Autoencoders for Interpretable Radiology Report Generation]&lt;br /&gt;
* 2024-12: [https://www.arxiv.org/abs/2412.16247 Towards scientific discovery with dictionary learning: Extracting biological concepts from microscopy foundation models]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.12101 InterPLM: Discovering Interpretable Features in Protein Language Models via Sparse Autoencoders]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.00089 Insights on Galaxy Evolution from Interpretable Sparse Feature Networks]&lt;br /&gt;
* 2025-02: [https://www.biorxiv.org/content/10.1101/2025.02.06.636901v1 From Mechanistic Interpretability to Mechanistic Biology: Training, Evaluating, and Interpreting Sparse Autoencoders on Protein Language Models]&lt;br /&gt;
* 2025-02: [https://www.goodfire.ai/blog/interpreting-evo-2 Interpreting Evo 2: Arc Institute&amp;#039;s Next-Generation Genomic Foundation Model]&lt;br /&gt;
* 2026-01: [https://www.goodfire.ai/research/interpretability-for-alzheimers-detection# Using Interpretability to Identify a Novel Class of Alzheimer&amp;#039;s Biomarkers]&lt;br /&gt;
&lt;br /&gt;
===Uncertainty===&lt;br /&gt;
* 2024-10: [https://github.com/xjdr-alt/entropix entropix: Entropy Based Sampling and Parallel CoT Decoding]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.09724 Taming Overconfidence in LLMs: Reward Calibration in RLHF]&lt;br /&gt;
&lt;br /&gt;
=Science Benchmarks=&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.13168 SciCode: A Research Coding Benchmark Curated by Scientists] ([http://scicode-bench.github.io/ project])&lt;br /&gt;
* 2024-11: [https://openreview.net/pdf?id=fz969ahcvJ AidanBench: Evaluating Novel Idea Generation on Open-Ended Questions] ([https://github.com/aidanmclaughlin/AidanBench code])&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.17596 LiveIdeaBench: Evaluating LLMs&amp;#039; Scientific Creativity and Idea Generation with Minimal Context]&lt;br /&gt;
* 2025-01: [https://agi.safe.ai/ Humanity&amp;#039;s Last Exam]&lt;br /&gt;
* [https://github.com/OSU-NLP-Group/ScienceAgentBench ScienceAgentBench]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.20309 EAIRA: Establishing a Methodology for Evaluating AI Models as Scientific Research Assistants]&lt;br /&gt;
* 2025-03: [https://huggingface.co/datasets/futurehouse/BixBench BixBench]: Novel hypotheses (accept/reject)&lt;br /&gt;
* 2025-04: [https://research.google/blog/evaluating-progress-of-llms-on-scientific-problem-solving/ Google: Evaluating progress of LLMs on scientific problem-solving]&lt;br /&gt;
** 2025-03: [https://arxiv.org/abs/2503.13517 CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning]&lt;br /&gt;
** 2024-07: [https://arxiv.org/abs/2407.09413 SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers]&lt;br /&gt;
** 2024-10: [https://neurips.cc/virtual/2024/98540 FEABench: Evaluating Language Models on Real World Physics Reasoning Ability]&lt;br /&gt;
* 2026-02: [https://edisonscientific.com/ Edison]: [https://lab-bench.ai/ LABBench 2]&lt;br /&gt;
&lt;br /&gt;
=Science Agents=&lt;br /&gt;
==Reviews==&lt;br /&gt;
* 2024-10: [https://www.cell.com/cell/fulltext/S0092-8674(24)01070-5?target=_blank Empowering biomedical discovery with AI agents]&lt;br /&gt;
* 2025-01: [https://pubs.rsc.org/en/content/articlehtml/2024/sc/d4sc03921a A review of large language models and autonomous agents in chemistry] ([https://github.com/ur-whitelab/LLMs-in-science github])&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.01903 AI4Research: A Survey of Artificial Intelligence for Scientific Research]&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.14111 From AI for Science to Agentic Science: A Survey on Autonomous Scientific Discovery]&lt;br /&gt;
&lt;br /&gt;
==Challenges==&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.03315 Why LLMs Aren&amp;#039;t Scientists Yet: Lessons from Four Autonomous Research Attempts]&lt;br /&gt;
&lt;br /&gt;
==Specific==&lt;br /&gt;
* 2024-01-13: [https://arxiv.org/abs/2401.06949 ORGANA: A Robotic Assistant for Automated Chemistry Experimentation and Characterization] ([https://www.youtube.com/watch?v=N6qMMwJ8hKQ video])&lt;br /&gt;
* 2024-06-19: [https://arxiv.org/abs/2406.13163 LLMatDesign: Autonomous Materials Discovery with Large Language Models]&lt;br /&gt;
* 2024-08-12: [https://sakana.ai/ Sakana AI]: [https://sakana.ai/ai-scientist/ AI Scientist]; [https://arxiv.org/abs/2408.06292 The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery] ([https://github.com/SakanaAI/AI-Scientist code])&lt;br /&gt;
* 2024-09-09: [https://arxiv.org/abs/2409.05556 SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning] ([https://github.com/lamm-mit/SciAgentsDiscovery code])&lt;br /&gt;
* 2024-09-11: PaperQA2: [https://paper.wikicrow.ai/ Language Models Achieve Superhuman Synthesis of Scientific Knowledge] ([https://x.com/SGRodriques/status/1833908643856818443 𝕏 post], [https://github.com/Future-House/paper-qa code])&lt;br /&gt;
* 2024-10-17: [https://arxiv.org/abs/2410.13768 Rapid and Automated Alloy Design with Graph Neural Network-Powered LLM-Driven Multi-Agent Systems]&lt;br /&gt;
* 2024-10-28: [https://arxiv.org/abs/2410.20976 Large Language Model-Guided Prediction Toward Quantum Materials Synthesis]&lt;br /&gt;
* 2024-12-06: [https://www.biorxiv.org/content/10.1101/2024.11.11.623004v1 The Virtual Lab: AI Agents Design New SARS-CoV-2 Nanobodies with Experimental Validation] (writeup: [https://www.nature.com/articles/d41586-024-01684-3 Virtual lab powered by ‘AI scientists’ super-charges biomedical research: Could human–AI collaborations be the future of interdisciplinary studies?])&lt;br /&gt;
* 2024-12-30: [https://arxiv.org/abs/2412.21154 Aviary: training language agents on challenging scientific tasks]&lt;br /&gt;
* See also: [[AI_Agents#Deep_Research|AI Agents &amp;gt; Deep Research]]&lt;br /&gt;
* 2025-04-08: Sakana: [https://pub.sakana.ai/ai-scientist-v2/paper/paper.pdf The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search] ([https://github.com/SakanaAI/AI-Scientist-v2 code])&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.14267 DREAMS: Density Functional Theory Based Research Engine for Agentic Materials Simulation]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.02824 Kosmos: An AI Scientist for Autonomous Discovery]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.08151 SciAgent: A Unified Multi-Agent System for Generalistic Scientific Reasoning]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2601.23265 PaperBanana: Automating Academic Illustration for AI Scientists]&lt;br /&gt;
* 2026-03: [https://arxiv.org/abs/2603.20179 AI Agents Can Already Autonomously Perform Experimental High Energy Physics]&lt;br /&gt;
&lt;br /&gt;
==Science Multi-Agent Setups==&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.04227 Agent Laboratory: Using LLM Agents as Research Assistants]&lt;br /&gt;
* 2025-04: [https://www.nature.com/articles/s41551-025-01363-2 Coordinated AI agents for advancing healthcare] ([https://www.nature.com/articles/s41551-025-01363-2.epdf?sharing_token=CIYP3J8LZE4BX31fV3WxUdRgN0jAjWel9jnR3ZoTv0O9iD-yhgqzRaz_7VASayWRePPhWDD2xFyfuOpSXbdPaOtt7oH4nfXo7telALzNwY3V1p9SxoqBEJy2OuaJ_cA35-CYQC1XgjCNTZUw46dh1KX-Dj8e7-1Vk_RlZKFLrc8%3D pdf])&lt;br /&gt;
&lt;br /&gt;
=Science Agentic Components=&lt;br /&gt;
==Frameworks==&lt;br /&gt;
* [https://platform.claude.com/docs/en/agent-sdk/overview Anthropic Claude Agent SDK overview]&lt;br /&gt;
* [https://openclaw.ai/ OpenClaw]&lt;br /&gt;
* [https://opencode.ai/ OpenCode]&lt;br /&gt;
* [https://github.com/OpenHands/software-agent-sdk OpenHands]&lt;br /&gt;
* [https://github.com/lamm-mit?tab=repositories LAMM: MIT Laboratory for Atomistic and Molecular Mechanics]&lt;br /&gt;
** [https://github.com/lamm-mit/scienceclaw ScienceClaw]: Framework for autonomous scientific investigation without central coordination.&lt;br /&gt;
** [https://infinite-lamm.vercel.app/ Infinite]: The Infinite Corridor of Scientific Discovery. Open science, powered by many — agents and humans discovering together.&lt;br /&gt;
&lt;br /&gt;
==Personalities==&lt;br /&gt;
* 2026-03: [https://github.com/msitarzewski/agency-agents The Agency: AI Specialists Ready to Transform Your Workflow]&lt;br /&gt;
&lt;br /&gt;
==Skills==&lt;br /&gt;
* 2026-03: [https://github.com/K-Dense-AI/claude-scientific-skills/tree/main?tab=readme-ov-file#use-cases Claude Scientific Skills] (list)&lt;br /&gt;
&lt;br /&gt;
=AI Science Systems=&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.03916 Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.13299 Hypothesis Generation for Materials Discovery and Design Using Goal-Driven and Constraint-Guided LLM Agents]&lt;br /&gt;
* 2025-02: [https://storage.googleapis.com/coscientist_paper/ai_coscientist.pdf Towards an AI co-scientist] (Google blog post: [https://research.google/blog/accelerating-scientific-breakthroughs-with-an-ai-co-scientist/ Accelerating scientific breakthroughs with an AI co-scientist])&lt;br /&gt;
* 2025-06: [https://zenodo.org/records/15693353 The Discovery Engine]&lt;br /&gt;
** 2025-07: [https://arxiv.org/abs/2507.00964 Benchmarking the Discovery Engine] ([https://www.leap-labs.com/blog/how-we-replicated-five-peer-reviewed-papers-in-five-hours blog])&lt;br /&gt;
* 2025-07: [https://www.preprints.org/manuscript/202507.1951/v1 Autonomous Scientific Discovery Through Hierarchical AI Scientist Systems]&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.16969 Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows]&lt;br /&gt;
* 2026-01: [https://www.nature.com/articles/s43588-025-00906-6 SciSciGPT: advancing human–AI collaboration in the science of science]&lt;br /&gt;
* 2026-02: [https://allenai.org/papers/autodiscovery AUTODISCOVERY: Open-ended Scientific Discovery via Bayesian Surprise] (Allen AI (Ai2) AstraLabs, [https://allenai.org/blog/autodiscovery blog], [https://autodiscovery.allen.ai/runs tools])&lt;br /&gt;
&lt;br /&gt;
===Inorganic Materials Discovery===&lt;br /&gt;
* 2023-11: [https://doi.org/10.1038/s41586-023-06735-9 Scaling deep learning for materials discovery]&lt;br /&gt;
* 2023-11: [https://doi.org/10.1038/s41586-023-06734-w An autonomous laboratory for the accelerated synthesis of novel materials]&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.00135 HoneyComb: A Flexible LLM-Based Agent System for Materials Science]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.12771 Open Materials 2024 (OMat24) Inorganic Materials Dataset and Models] ([https://github.com/FAIR-Chem/fairchem code], [https://huggingface.co/datasets/fairchem/OMAT24 datasets], [https://huggingface.co/fairchem/OMAT24 checkpoints], [https://ai.meta.com/blog/fair-news-segment-anything-2-1-meta-spirit-lm-layer-skip-salsa-sona/ blogpost])&lt;br /&gt;
* 2025-01: [https://www.nature.com/articles/s41586-025-08628-5 A generative model for inorganic materials design]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.14110 System of Agentic AI for the Discovery of Metal-Organic Frameworks]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.08762 The Open Molecules 2025 (OMol25) Dataset, Evaluations, and Models]&lt;br /&gt;
&lt;br /&gt;
===Materials Characterization===&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.06569 Operationalizing Serendipity: Multi-Agent AI Workflows for Enhanced Materials Characterization with Theory-in-the-Loop]&lt;br /&gt;
&lt;br /&gt;
===Chemistry===&lt;br /&gt;
* 2023-12: [https://doi.org/10.1038/s41586-023-06792-0 Autonomous chemical research with large language models] (Coscientist)&lt;br /&gt;
* 2024-09: [https://www.pnnl.gov/main/publications/external/technical_reports/PNNL-36692.pdf PNNL ChemAIst V0.2]&lt;br /&gt;
* 2024-11: [https://www.nature.com/articles/s41467-024-54457-x An automatic end-to-end chemical synthesis development platform powered by large language models]&lt;br /&gt;
* 2025-06: [https://paper.ether0.ai/ Training a Scientific Reasoning Model for Chemistry]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.06363 ChemGraph: An Agentic Framework for Computational Chemistry Workflows] ([https://github.com/argonne-lcf/ChemGraph code])&lt;br /&gt;
&lt;br /&gt;
===Bio===&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.01485 BioMARS: A Multi-Agent Robotic System for Autonomous Biological Experiments]&lt;br /&gt;
&lt;br /&gt;
===Physics===&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.19799 PhysMaster: Building an Autonomous AI Physicist for Theoretical and Computational Physics Research]&lt;br /&gt;
&lt;br /&gt;
==LLMs Optimized for Science==&lt;br /&gt;
* 2022-11: [https://arxiv.org/abs/2211.09085 Galactica: A Large Language Model for Science]&lt;br /&gt;
* 2024-12: [https://www.nature.com/articles/s41467-024-54639-7 Crystal structure generation with autoregressive large language modeling]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.13107 MatterChat: A Multi-Modal LLM for Material Science]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.17604 OmniScience: A Domain-Specialized LLM for Scientific Reasoning and Discovery]&lt;br /&gt;
* 2025-03: Google [https://huggingface.co/collections/google/txgemma-release-67dd92e931c857d15e4d1e87 TxGemma] (2B, 9B, 27B): [https://developers.googleblog.com/en/introducing-txgemma-open-models-improving-therapeutics-development/ drug development]&lt;br /&gt;
&lt;br /&gt;
=Impact of AI in Science=&lt;br /&gt;
* 2024-11: &amp;lt;strike&amp;gt;[https://aidantr.github.io/files/AI_innovation.pdf Artificial Intelligence, Scientific Discovery, and Product Innovation]&amp;lt;/strike&amp;gt;&lt;br /&gt;
** 2025-05: Retraction: [https://economics.mit.edu/news/assuring-accurate-research-record Assuring an accurate research record]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.05151 Transforming Science with Large Language Models: A Survey on AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.03837 Accelerating Scientific Research with Gemini: Case Studies and Common Techniques]&lt;br /&gt;
&lt;br /&gt;
=Related Tools=&lt;br /&gt;
==Literature Search==&lt;br /&gt;
* [https://www.perplexity.ai/ Perplexity]&lt;br /&gt;
* [https://www.arxival.xyz/ ArXival]&lt;br /&gt;
&lt;br /&gt;
==Data Visualization==&lt;br /&gt;
* 2024-10: Microsoft [https://www.microsoft.com/en-us/research/blog/data-formulator-exploring-how-ai-can-help-analysts-create-rich-data-visualizations/ Data Formulator: Create Rich Visualization with AI iteratively] ([https://www.microsoft.com/en-us/research/video/data-formulator-create-rich-visualization-with-ai-iteratively/ video], [https://github.com/microsoft/data-formulator code])&lt;br /&gt;
* [https://julius.ai/ Julius AI]: Analyze your data with computational AI&lt;br /&gt;
&lt;br /&gt;
==Generative==&lt;br /&gt;
* 2025-03: [https://huggingface.co/collections/starvector/starvector-models-6783b22c7bd4b43d13cb5289 StarVector] 1B, 8B: text or image to SVG&lt;br /&gt;
&lt;br /&gt;
==Chemistry==&lt;br /&gt;
* 2025-03: [https://jcheminf.biomedcentral.com/articles/10.1186/s13321-024-00834-z Rxn-INSIGHT: fast chemical reaction analysis using bond-electron matrices] ([https://rxn-insight.readthedocs.io/en/latest/ docs])&lt;br /&gt;
&lt;br /&gt;
=Science Datasets=&lt;br /&gt;
* [https://datasetsearch.research.google.com/ Google Dataset Search]&lt;br /&gt;
* [https://github.com/blaiszik/awesome-matchem-datasets/ Awesome Materials &amp;amp; Chemistry Datasets]&lt;br /&gt;
* NIST [https://jarvis.nist.gov/ Jarvis] (simulations)&lt;br /&gt;
&lt;br /&gt;
=Genuine Discoveries=&lt;br /&gt;
* 2025-11: [https://cdn.openai.com/pdf/4a25f921-e4e0-479a-9b38-5367b47e8fd0/early-science-acceleration-experiments-with-gpt-5.pdf Early science acceleration experiments with GPT-5]&lt;br /&gt;
* 2025-12: [https://andymasley.substack.com/p/ai-can-obviously-create-new-knowledge AI can obviously create new knowledge - But maybe not new concepts]&lt;br /&gt;
&lt;br /&gt;
==Math==&lt;br /&gt;
* 2023-07: [https://www.nature.com/articles/s41586-023-06004-9 Faster sorting algorithms discovered using deep reinforcement learning]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.13131 AlphaEvolve: A coding agent for scientific and algorithmic discovery]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.02864 Mathematical exploration and discovery at scale]&lt;br /&gt;
* 2025-11: [https://www.nature.com/articles/s41586-025-09833-y Olympiad-level formal mathematical reasoning with reinforcement learning]&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.14575 Extremal descendant integrals on moduli spaces of curves: An inequality discovered and proved in collaboration with AI]&lt;br /&gt;
* [https://github.com/teorth/erdosproblems/wiki/AI-contributions-to-Erd%C5%91s-problems AI Solving Erdős Problems]:&lt;br /&gt;
** 2026-01: [https://www.erdosproblems.com/728 Erdős Problem #728] and [https://www.erdosproblems.com/729 #729] solved by Aristotle using ChatGPT 5.2 Pro&lt;br /&gt;
** 2026-01: [https://www.erdosproblems.com/forum/thread/397 Erdős Problem #397] [https://x.com/neelsomani/status/2010215162146607128?s=20 solved] by [https://neelsomani.com/ Neel Somani] using ChatGPT 5.2 Pro&lt;br /&gt;
** 2026-01: [https://www.erdosproblems.com/205 Erdős Problem #205] solved by Aristotle using ChatGPT 5.2 Pro&lt;br /&gt;
** 2026-01: [https://www.erdosproblems.com/forum/thread/281 Erdős Problem #281] [https://x.com/neelsomani/status/2012695714187325745?s=20 solved] by [https://neelsomani.com/ Neel Somani] using ChatGPT 5.2 Pro&lt;br /&gt;
** 2026-01: Google DeepMind: [https://arxiv.org/abs/2601.21442 Irrationality of rapidly converging series: a problem of Erdős and Graham]&lt;br /&gt;
*** [https://www.erdosproblems.com/1051 Erdős Problem #1051] [https://x.com/slow_developer/status/2018321002623901885?s=20 solved] by Google DeepMind Aletheia agent&lt;br /&gt;
** 2026-01: Google DeepMind: [https://arxiv.org/abs/2601.22401 Semi-Autonomous Mathematics Discovery with Gemini: A Case Study on the Erdős Problems]&lt;br /&gt;
*** Attempted 700 problems, solved 13 open Erdős problems: 5 novel autonomous solutions, 8 through existing literature.&lt;br /&gt;
** 2026-02: [https://www.erdosproblems.com/846 Erdős Problem #846]&lt;br /&gt;
*** [https://x.com/roydanroy/status/2026804567178953048?s=20 Google DeepMind]&lt;br /&gt;
*** [https://x.com/mehtaab_sawhney/status/2026716221933343147?s=20 Using OpenAI internal model] (paper: [https://cdn.openai.com/infinite-sets/main_single_clean3.pdf On infinite sets with no 3 on a line])&lt;br /&gt;
** 2026-03: Three problems solved using OpenAI GPT internal model. Paper: [https://arxiv.org/pdf/2603.29961 Short Proofs in Combinatorics and Number Theory]&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.07222 The motivic class of the space of genus 0 maps to the flag variety]&lt;br /&gt;
* 2026-02: Google DeepMind: [https://arxiv.org/abs/2602.10177 Towards Autonomous Mathematics Research]&lt;br /&gt;
* 2026-03: Donald Knuth: [https://www-cs-faculty.stanford.edu/~knuth/papers/claude-cycles.pdf A problem in Directed Hamiltonian Cycles] solved by Filip Stappers using Claude Opus 4.6&lt;br /&gt;
* 2026-03: Google DeepMind: [https://arxiv.org/abs/2603.09172 Reinforced Generation of Combinatorial Structures: Ramsey Numbers]&lt;br /&gt;
* 2026-03: [https://epoch.ai/frontiermath/open-problems FrontierMath] problem: [https://epoch.ai/frontiermath/open-problems/ramsey-hypergraphs &amp;quot;A Ramsey-style Problem on Hypergraphs&amp;quot;] solved by Kevin Barreto and Liam Price using GPT-5.4 Pro&lt;br /&gt;
&lt;br /&gt;
==Physics assistance==&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.23758 Exact solution of the frustrated Potts model with next-nearest-neighbor interactions in one dimension via AI bootstrapping] ([https://www.bnl.gov/staff/wyin Weiguo Yin])&lt;br /&gt;
* 2025-12: [https://www.sciencedirect.com/science/article/pii/S0370269325008111 Relativistic covariance and nonlinear quantum mechanics: Tomonaga-Schwinger analysis]&lt;br /&gt;
** [https://x.com/hsu_steve/status/1996034522308026435?s=20 Steve Hsu], [https://drive.google.com/file/d/16sxJuwsHoi-fvTFbri9Bu8B9bqA6lr1H/view Theoretical Physics with Generative AI]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.12176 Single-minus gluon tree amplitudes are nonzero] (GPT-5.2, [https://openai.com/index/new-result-theoretical-physics/ blog])&lt;br /&gt;
&lt;br /&gt;
==Literature exploration==&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.02824 Kosmos: An AI Scientist for Autonomous Discovery] ([https://edisonscientific.com/ Edison])&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/c4bdef64-5e9b-43b9-a365-592dd1ed7587 Nucleotide metabolism in hypothermia]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/1fdbf827-be65-4d97-9b66-bf0da600091a Determinant of perovskite solar-cell failure]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/4fb3fbdb-c449-4064-9aa6-ff4ec53131d8 Log-normal connectivity in neural networks]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/c6849232-5858-4634-adf5-83780afbe3db SOD2 as driver of myocardial fibrosis]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/abac07da-a6bb-458f-b0ba-ef08f1be617e Protective variant of SSR1 in type 2 diabetes]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/a770052b-2334-4bbe-b086-5149e0f03d99 Temporal ordering in Alzheimer’s disease]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/28c427d2-be31-48b5-b272-28d5a1e3ea5c Mechanism of neuron vulnerability in aging]&lt;br /&gt;
&lt;br /&gt;
==Bio design==&lt;br /&gt;
* 2023-07: [https://www.nature.com/articles/s41586-023-06415-8 De novo design of protein structure and function with RFdiffusion]&lt;br /&gt;
* 2025-11: [https://www.nature.com/articles/s41586-025-09721-5 Atomically accurate de novo design of antibodies with RFdiffusion]&lt;br /&gt;
* 2025-11: [https://deepmind.google/blog/alphafold-five-years-of-impact/ AlphaFold: Five years of impact]&lt;br /&gt;
* 2026-01: [https://www.goodfire.ai/research/interpretability-for-alzheimers-detection# Using Interpretability to Identify a Novel Class of Alzheimer&amp;#039;s Biomarkers]&lt;br /&gt;
&lt;br /&gt;
==Material Discovery==&lt;br /&gt;
* 2023-11: [https://doi.org/10.1038/s41586-023-06735-9 Scaling deep learning for materials discovery]&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
* [[AI agents]]&lt;br /&gt;
* [https://nanobot.chat/ Nanobot.chat]: Intelligent AI for the labnetwork @ mtl.mit.edu forum&lt;/div&gt;</summary>
		<author><name>KevinYager</name></author>
		
	</entry>
	<entry>
		<id>http://gisaxs.com/index.php?title=AI&amp;diff=8769</id>
		<title>AI</title>
		<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php?title=AI&amp;diff=8769"/>
		<updated>2026-03-31T19:24:32Z</updated>

		<summary type="html">&lt;p&gt;KevinYager: /* News */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page pulls together a set of resources focused on &amp;#039;&amp;#039;&amp;#039;Artificial Intelligence&amp;#039;&amp;#039;&amp;#039; (AI) and &amp;#039;&amp;#039;&amp;#039;machine-learning&amp;#039;&amp;#039;&amp;#039; (ML), biased towards modern/frontier generative AI (LLMs, etc.).&lt;br /&gt;
&lt;br /&gt;
==Fundamentals==&lt;br /&gt;
* [[AI tutorials]]&lt;br /&gt;
* [[AI understanding]]: Papers that expose how LLMs &amp;quot;think&amp;quot;&lt;br /&gt;
* [[AI tools]]: List of modern models (LLM, ASR, etc.) and related tools (RAG, etc.)&lt;br /&gt;
** [[Data Extraction]]&lt;br /&gt;
** [[AI compute]]&lt;br /&gt;
*** [[AI_compute#Energy_Use|Energy use]]&lt;br /&gt;
&lt;br /&gt;
==Improvements==&lt;br /&gt;
* [[AI tricks]]&lt;br /&gt;
* [[AI research trends]]&lt;br /&gt;
* [[Increasing AI Intelligence]] (especially system 2 / deliberative reasoning / inference-time compute)&lt;br /&gt;
* [[AI benchmarks]]&lt;br /&gt;
&lt;br /&gt;
==Agents==&lt;br /&gt;
* [[AI Agents]]&lt;br /&gt;
** [[Science Agents]]&lt;br /&gt;
* [[Exocortex]]&lt;br /&gt;
* [http://yager-research.ca/2024/11/what-is-an-ai-agent/ Definition of AI agent]&lt;br /&gt;
[[Image:AI definitions10.png|500px]]&lt;br /&gt;
&lt;br /&gt;
==Uses of AI==&lt;br /&gt;
* [[AI video]]: Progress of generative video&lt;br /&gt;
* In science, see: [[Science Agents]]&lt;br /&gt;
** [[Science_Agents#Genuine_Discoveries|Genuine Discoveries]]&lt;br /&gt;
* [[AI creativity]]&lt;br /&gt;
* [[AI and Humans]]&lt;br /&gt;
** [[AI_and_Humans#AI_in_Education|AI in education]]&lt;br /&gt;
** [[AI_and_Humans#Simulate_Humans|Simulate Humans]]&lt;br /&gt;
&lt;br /&gt;
==Related==&lt;br /&gt;
* [[AI predictions]]&lt;br /&gt;
* [[AI safety]]&lt;br /&gt;
* [[Robots]]&lt;br /&gt;
* [[Human Computer Interaction]] (HCI)&lt;br /&gt;
** [[Human_Computer_Interaction#Smart_Wearables|AI devices]] (Smart Glasses, etc.)&lt;br /&gt;
* [[Human brain]]&lt;br /&gt;
&lt;br /&gt;
==News==&lt;br /&gt;
[http://yager-research.ca/category/news/ Newsletters released here]. Posts:&lt;br /&gt;
===2026===&lt;br /&gt;
* [http://yager-research.ca/2026/03/ai-news-2026-03-31/ AI News 2026-03-31]&lt;br /&gt;
* [http://yager-research.ca/2026/02/ai-news-2026-02-28/ AI News 2026-02-28]&lt;br /&gt;
* [http://yager-research.ca/2026/01/ai-news-2026-01-31/ AI News 2026-01-31]&lt;br /&gt;
&lt;br /&gt;
===2025===&lt;br /&gt;
* [http://yager-research.ca/2025/12/ai-news-2025-12-25/ AI News 2025-12-25]&lt;br /&gt;
* [http://yager-research.ca/2025/12/ai-news-2025-12-18/ AI News 2025-12-18]&lt;br /&gt;
* [http://yager-research.ca/2025/12/ai-news-2025-12-11/ AI News 2025-12-11]&lt;br /&gt;
* [http://yager-research.ca/2025/12/ai-news-2025-12-04/ AI News 2025-12-04]&lt;br /&gt;
* [http://yager-research.ca/2025/11/ai-news-2025-11-27/ AI News 2025-11-27]&lt;br /&gt;
* [http://yager-research.ca/2025/11/ai-news-2025-11-20/ AI News 2025-11-20]&lt;br /&gt;
* [http://yager-research.ca/2025/11/ai-news-2025-11-13/ AI News 2025-11-13]&lt;br /&gt;
* [http://yager-research.ca/2025/11/ai-news-2025-11-06/ AI News 2025-11-06]&lt;br /&gt;
* [http://yager-research.ca/2025/10/ai-news-2025-10-30/ AI News 2025-10-30]&lt;br /&gt;
* [http://yager-research.ca/2025/10/ai-news-2025-10-23/ AI News 2025-10-23]&lt;br /&gt;
* [http://yager-research.ca/2025/10/ai-news-2025-10-16/ AI News 2025-10-16]&lt;br /&gt;
* [http://yager-research.ca/2025/10/ai-news-2025-10-09/ AI News 2025-10-09]&lt;br /&gt;
* [http://yager-research.ca/2025/10/ai-news-2025-10-02/ AI News 2025-10-02]&lt;br /&gt;
* [http://yager-research.ca/2025/09/ai-news-2025-09-25/ AI News 2025-09-25]&lt;br /&gt;
* [http://yager-research.ca/2025/09/ai-news-2025-09-18/ AI News 2025-09-18]&lt;br /&gt;
* [http://yager-research.ca/2025/09/ai-news-2025-09-11/ AI News 2025-09-11]&lt;br /&gt;
* [http://yager-research.ca/2025/09/ai-news-2025-09-04/ AI News 2025-09-04]&lt;br /&gt;
* [http://yager-research.ca/2025/08/ai-news-2025-08-28/ AI News 2025-08-28]&lt;br /&gt;
* [http://yager-research.ca/2025/08/ai-news-2025-08-21/ AI News 2025-08-21]&lt;br /&gt;
* [http://yager-research.ca/2025/08/ai-news-2025-08-14/ AI News 2025-08-14]&lt;br /&gt;
* [http://yager-research.ca/2025/08/ai-news-2025-08-07/ AI News 2025-08-07]&lt;br /&gt;
* [http://yager-research.ca/2025/07/ai-news-2025-07-31/ AI News 2025-07-31]&lt;br /&gt;
* [http://yager-research.ca/2025/07/ai-news-2025-07-24/ AI News 2025-07-24]&lt;br /&gt;
* [http://yager-research.ca/2025/07/ai-news-2025-07-17/ AI News 2025-07-17]&lt;br /&gt;
* [http://yager-research.ca/2025/07/ai-news-2025-07-10/ AI News 2025-07-10]&lt;br /&gt;
* [http://yager-research.ca/2025/07/ai-news-2025-07-03/ AI News 2025-07-03]&lt;br /&gt;
* [http://yager-research.ca/2025/06/ai-news-2025-06-26/ AI News 2025-06-26]&lt;br /&gt;
* [http://yager-research.ca/2025/06/ai-news-2025-06-19/ AI News 2025-06-19]&lt;br /&gt;
* [http://yager-research.ca/2025/06/ai-news-2025-06-12/ AI News 2025-06-12]&lt;br /&gt;
* [http://yager-research.ca/2025/06/ai-news-2025-06-05/ AI News 2025-06-05]&lt;br /&gt;
* [http://yager-research.ca/2025/05/ai-news-2025-05-29/ AI News 2025-05-29]&lt;br /&gt;
* [http://yager-research.ca/2025/05/ai-news-2025-05-22/ AI News 2025-05-22]&lt;br /&gt;
* [http://yager-research.ca/2025/05/ai-news-2025-05-15/ AI News 2025-05-15]&lt;br /&gt;
* [http://yager-research.ca/2025/05/ai-news-2025-05-08/ AI News 2025-05-08]&lt;br /&gt;
* [http://yager-research.ca/2025/05/ai-news-2025-05-01/ AI News 2025-05-01]&lt;br /&gt;
* [http://yager-research.ca/2025/04/ai-news-2025-04-24/ AI News 2025-04-24]&lt;br /&gt;
* [http://yager-research.ca/2025/04/ai-news-2025-04-17/ AI News 2025-04-17]&lt;br /&gt;
* [http://yager-research.ca/2025/04/ai-news-2025-04-10/ AI News 2025-04-10]&lt;br /&gt;
* [http://yager-research.ca/2025/04/ai-news-2025-04-03/ AI News 2025-04-03]&lt;br /&gt;
* [http://yager-research.ca/2025/03/ai-news-2025-03-27/ AI News 2025-03-27]&lt;br /&gt;
* [http://yager-research.ca/2025/03/ai-news-2025-03-20/ AI News 2025-03-20]&lt;br /&gt;
* [http://yager-research.ca/2025/03/ai-news-2025-03-13/ AI News 2025-03-13]&lt;br /&gt;
* [http://yager-research.ca/2025/03/ai-news-2025-03-06/ AI News 2025-03-06]&lt;br /&gt;
* [http://yager-research.ca/2025/02/ai-news-2025-02-27/ AI News 2025-02-27]&lt;br /&gt;
* [http://yager-research.ca/2025/02/ai-news-2025-02-20/ AI News 2025-02-20]&lt;br /&gt;
* [http://yager-research.ca/2025/02/ai-news-2025-02-13/ AI News 2025-02-13]&lt;br /&gt;
* [http://yager-research.ca/2025/02/ai-news-2025-02-06/ AI News 2025-02-06]&lt;br /&gt;
* [http://yager-research.ca/2025/01/ai-news-2025-01-30/ AI News 2025-01-30]&lt;br /&gt;
* [http://yager-research.ca/2025/01/ai-news-2025-01-23/ AI News 2025-01-23]&lt;br /&gt;
* [http://yager-research.ca/2025/01/ai-news-2025-01-16/ AI News 2025-01-16]&lt;br /&gt;
* [http://yager-research.ca/2025/01/ai-news-2025-01-09/ AI News 2025-01-09]&lt;br /&gt;
* [http://yager-research.ca/2025/01/ai-new-2025-01-02/ AI News 2025-01-02]&lt;br /&gt;
&lt;br /&gt;
===2024===&lt;br /&gt;
* [http://yager-research.ca/2024/12/ai-news-2024-12-26/ AI News 2024-12-26]&lt;br /&gt;
* [http://yager-research.ca/2024/12/ai-news-2024-12-19/ AI News 2024-12-19]&lt;br /&gt;
* [http://yager-research.ca/2024/12/ai-news-2024-12-12/ AI News 2024-12-12]&lt;br /&gt;
* [http://yager-research.ca/2024/12/ai-news-2024-12-05/ AI News 2024-12-05]&lt;br /&gt;
* [http://yager-research.ca/2024/11/ai-news-2024-11-28/ AI News 2024-11-28]&lt;br /&gt;
* [http://yager-research.ca/2024/11/ai-news-2024-11-21/ AI News 2024-11-21]&lt;br /&gt;
* [http://yager-research.ca/2024/11/ai-news-2024-11-14/ AI News 2024-11-14]&lt;br /&gt;
* [http://yager-research.ca/2024/11/ai-new-2024-11-07/ AI News 2024-11-07]&lt;br /&gt;
* [http://yager-research.ca/2024/10/ai-news-2024-10-31/ AI News 2024-10-31]&lt;br /&gt;
* [http://yager-research.ca/2024/10/ai-news-2024-10-24/ AI News 2024-10-24]&lt;br /&gt;
* [http://yager-research.ca/2024/10/ai-news-2024-10-17/ AI News 2024-10-17]&lt;br /&gt;
* [http://yager-research.ca/2024/10/ai-news-2024-10-10/ AI News 2024-10-10]&lt;br /&gt;
* [http://yager-research.ca/2024/10/ai-news-2024-10-03/ AI News 2024-10-03]&lt;br /&gt;
* [http://yager-research.ca/2024/09/ai-news-2024-09-26/ AI News 2024-09-26]&lt;br /&gt;
* [http://yager-research.ca/2024/09/ai-news-2024-09-19/ AI News 2024-09-19]&lt;br /&gt;
* [http://yager-research.ca/2024/09/ai-news-2024-09-12/ AI News 2024-09-12]&lt;br /&gt;
* [http://yager-research.ca/2024/09/ai-news-2024-09-05/ AI News 2024-09-05]&lt;br /&gt;
* [http://yager-research.ca/2024/08/ai-news-2024-08-29/ AI News 2024-08-29]&lt;br /&gt;
* [http://yager-research.ca/2024/08/ai-news-2024-08-22/ AI News 2024-08-22]&lt;br /&gt;
* [http://yager-research.ca/2024/08/can-we-distinguish-human-from-ai/ 2024-08-16: Can we Distinguish Human from AI?]&lt;br /&gt;
* [http://yager-research.ca/2024/08/ai-news-2024-08-15/ AI News 2024-08-15]&lt;br /&gt;
* [http://yager-research.ca/2024/08/ai-news-2024-08-08/ AI News 2024-08-08]&lt;br /&gt;
* [http://yager-research.ca/2024/08/ai-news-2024-08-01/ AI News 2024-08-01]&lt;br /&gt;
* [http://yager-research.ca/2024/07/ai-news-2024-07-25/ AI News 2024-07-25]&lt;br /&gt;
* [http://yager-research.ca/2024/07/ai-news-2024-07-18/ AI News 2024-07-18]&lt;br /&gt;
* [http://yager-research.ca/2024/07/ai-news-2024-07-11/ AI News 2024-07-11]&lt;br /&gt;
* [http://yager-research.ca/2024/07/ai-news-2024-07-04/ AI News 2024-07-04]&lt;br /&gt;
* [http://yager-research.ca/2024/06/ai-news-2024-06-27/ AI News 2024-06-27]&lt;br /&gt;
* [http://yager-research.ca/2024/06/ai-news-2024-06-14/ AI News 2024-06-14]&lt;br /&gt;
* [http://yager-research.ca/2024/06/situational-awareness/ 2024-06-09: Leopold Aschenbrenner&amp;#039;s Situational Awareness]&lt;br /&gt;
* [http://yager-research.ca/2024/06/ai-news-2024-06-06/ AI News 2024-06-06]&lt;/div&gt;</summary>
		<author><name>KevinYager</name></author>
		
	</entry>
	<entry>
		<id>http://gisaxs.com/index.php?title=AI_compute&amp;diff=8768</id>
		<title>AI compute</title>
		<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php?title=AI_compute&amp;diff=8768"/>
		<updated>2026-03-31T15:04:01Z</updated>

		<summary type="html">&lt;p&gt;KevinYager: /* Energy Use */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Analysis=&lt;br /&gt;
* 2024-03: [https://arxiv.org/abs/2403.14123 AI and Memory Wall]&lt;br /&gt;
&lt;br /&gt;
=Cloud GPU=&lt;br /&gt;
* [https://lambdalabs.com/ Lambda]&lt;br /&gt;
* [https://vast.ai/ Vast AI]&lt;br /&gt;
* [https://lightning.ai/ Lightning AI]&lt;br /&gt;
* [https://www.runpod.io/ RunPod]&lt;br /&gt;
* [https://hpc-ai.com/ HPC-AI]&lt;br /&gt;
&lt;br /&gt;
=Cloud Training Compute=&lt;br /&gt;
* [https://nebius.ai/ Nebius AI]&lt;br /&gt;
* [https://glaive.ai/ Glaive AI]&lt;br /&gt;
&lt;br /&gt;
=Cloud LLM Routers &amp;amp; Inference Providers=&lt;br /&gt;
* [https://openrouter.ai/ OpenRouter] (open and closed models, no Enterprise tier)&lt;br /&gt;
* [https://www.litellm.ai/ LiteLLM] (closed models, Enterprise tier)&lt;br /&gt;
* [https://centml.ai/ Cent ML] (open models, Enterprise tier)&lt;br /&gt;
* [https://fireworks.ai/ Fireworks AI] (open models, Enterprise tier)&lt;br /&gt;
* [https://abacus.ai/ Abacus AI] (open and closed models, Enterprise tier)&lt;br /&gt;
* [https://portkey.ai/ Portkey] (open? and closed models, Enterprise tier)&lt;br /&gt;
* [https://www.together.ai/ Together AI] (open models, Enterprise tier)&lt;br /&gt;
* [https://hyperbolic.xyz/ Hyperbolic AI] (open models, Enterprise tier)&lt;br /&gt;
* Huggingface [https://huggingface.co/blog/inference-providers Inference Providers Hub]&lt;br /&gt;
* [https://www.asksage.ai/ AskSage]&lt;br /&gt;
* [https://opencode.ai/docs/zen/ Opencode Zen] (for coding agents)&lt;br /&gt;
&lt;br /&gt;
==Multi-model with Model Selection==&lt;br /&gt;
* [https://www.notdiamond.ai/ Not Diamond ¬⋄]&lt;br /&gt;
* [https://withmartian.com/ Martian]&lt;br /&gt;
&lt;br /&gt;
==Multi-model Web Chat Interfaces==&lt;br /&gt;
* [https://simtheory.ai/ SimTheory]&lt;br /&gt;
* [https://abacus.ai/ Abacus AI] [https://chatllm.abacus.ai/ ChatLLM]&lt;br /&gt;
* [https://poe.com/about Poe]&lt;br /&gt;
* [https://gab.ai/ Gab AI]&lt;br /&gt;
* [https://www.vectal.ai/login Vectal] ?&lt;br /&gt;
* [https://www.blackbox.ai/ BlackboxAI]&lt;br /&gt;
&lt;br /&gt;
==Multi-model Web Playground Interfaces==&lt;br /&gt;
* [https://www.together.ai/ Together AI]&lt;br /&gt;
* [https://hyperbolic.xyz/ Hyperbolic AI]&lt;br /&gt;
&lt;br /&gt;
=Local Router=&lt;br /&gt;
* [https://ollama.com/ Ollama]&lt;br /&gt;
* [https://github.com/mudler/LocalAI LocalAI]&lt;br /&gt;
* [https://github.com/AK391/ai-gradio ai-gradio]: unified model interface (based on [https://www.gradio.app/ gradio])&lt;br /&gt;
&lt;br /&gt;
=Acceleration Hardware=&lt;br /&gt;
* [https://www.nvidia.com/ Nvidia] GPUs&lt;br /&gt;
* Google [https://en.wikipedia.org/wiki/Tensor_Processing_Unit TPU]&lt;br /&gt;
* [https://www.etched.com/ Etched]: Transformer ASICs&lt;br /&gt;
* [https://cerebras.ai/ Cerebras]&lt;br /&gt;
* [https://www.untether.ai/ Untether AI]&lt;br /&gt;
* [https://www.graphcore.ai/ Graphcore]&lt;br /&gt;
* [https://sambanova.ai/ SambaNova Systems]&lt;br /&gt;
* [https://groq.com/ Groq]&lt;br /&gt;
* Tesla [https://en.wikipedia.org/wiki/Tesla_Dojo Dojo]&lt;br /&gt;
* [https://deepsilicon.com/ Deep Silicon]: Combined hardware/software solution for accelerated AI ([https://x.com/sdianahu/status/1833186687369023550 e.g.] ternary math)&lt;br /&gt;
&lt;br /&gt;
=Energy Use=&lt;br /&gt;
* 2021-04: [https://arxiv.org/abs/2104.10350 Carbon Emissions and Large Neural Network Training]&lt;br /&gt;
* 2023-10: [https://arxiv.org/abs/2310.03003 From Words to Watts: Benchmarking the Energy Costs of Large Language Model Inference]&lt;br /&gt;
* 2024-01: [https://iea.blob.core.windows.net/assets/6b2fd954-2017-408e-bf08-952fdd62118a/Electricity2024-Analysisandforecastto2026.pdf Electricity 2024: Analysis and forecast to 2026]&lt;br /&gt;
* 2024-02: [https://www.nature.com/articles/s41598-024-54271-x The carbon emissions of writing and illustrating are lower for AI than for humans]&lt;br /&gt;
* 2025-04: [https://andymasley.substack.com/p/a-cheat-sheet-for-conversations-about Why using ChatGPT is not bad for the environment - a cheat sheet]&lt;br /&gt;
** A single LLM response uses only ~3 Wh = 11 kJ (~10 Google searches; [https://docs.google.com/document/d/1pDdpPq3MyPdEAoTkho9YABZ0NBEhBH2v4EA98fm3pXQ/edit?usp=sharing examples of 3 Wh energy usage])&lt;br /&gt;
** Reading an LLM-generated response (computer running for a few minutes) typically uses more energy than the LLM generation of the text.&lt;br /&gt;
* 2025-07: Mistral: [https://mistral.ai/news/our-contribution-to-a-global-environmental-standard-for-ai Our contribution to a global environmental standard for AI]&lt;br /&gt;
* 2025-08: [https://services.google.com/fh/files/misc/measuring_the_environmental_impact_of_delivering_ai_at_google_scale.pdf Measuring the environmental impact of delivering AI at Google Scale] ([https://cloud.google.com/blog/products/infrastructure/measuring-the-environmental-impact-of-ai-inference blog])&lt;br /&gt;
* 2026-01: [https://epoch.ai/data-insights/grok-4-training-resources What did it take to train Grok 4?]&lt;br /&gt;
&lt;br /&gt;
==Examples==&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;LLM query&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** 3 kW * 4s = 3 Wh = 11 kJ&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Human brain&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** 20 W * 8h = 160 Wh&lt;br /&gt;
** 20 W * 1h = 20 Wh&lt;br /&gt;
** 20 W * 10m = 3 Wh = 10 kJ&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Human brain excess thinking&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** 2 W * 8h = 16 Wh&lt;br /&gt;
** 2 W * 1.7h = 3 Wh&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Regular computer&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** 200 W * 8h = 1,600 Wh = 5,700 kJ&lt;br /&gt;
** 200 W * 1m = 3 Wh = 10 kJ&lt;br /&gt;
&lt;br /&gt;
==Water Use==&lt;br /&gt;
* [https://andymasley.substack.com/p/the-ai-water-issue-is-fake The AI water issue is fake. On the national, local, and personal level.]&lt;br /&gt;
&lt;br /&gt;
==Heat Exhaust==&lt;br /&gt;
* 2026-03: [https://blog.andymasley.com/p/data-centers-heat-exhaust-is-not Data centers&amp;#039; heat exhaust is not raising the land temperature around where they&amp;#039;re built]&lt;/div&gt;</summary>
		<author><name>KevinYager</name></author>
		
	</entry>
	<entry>
		<id>http://gisaxs.com/index.php?title=AI_video&amp;diff=8767</id>
		<title>AI video</title>
		<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php?title=AI_video&amp;diff=8767"/>
		<updated>2026-03-30T18:10:40Z</updated>

		<summary type="html">&lt;p&gt;KevinYager: /* March 2026 */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Evolution of Capabilities==&lt;br /&gt;
===Early===&lt;br /&gt;
* November 2016: [https://arxiv.org/abs/1611.10314 Sync-Draw]&lt;br /&gt;
* April 2021: [https://arxiv.org/abs/2104.14806 GODIVA]&lt;br /&gt;
* October 2022: [https://makeavideo.studio/ Meta Make-a-video]&lt;br /&gt;
* October 2022: [https://imagen.research.google/video/ Google Imagen video]&lt;br /&gt;
&lt;br /&gt;
===2023===&lt;br /&gt;
* April 2023: [https://www.youtube.com/watch?v=XQr4Xklqzw8 Will Smith eating spaghetti]&lt;br /&gt;
* April 2023: [https://x.com/nickfloats/status/1642899094808002564 Harry Potter by Balenciaga]&lt;br /&gt;
* April 2023: [https://x.com/mrjonfinger/status/1645953033636048896?cxt=HHwWgMDT7YfkzNctAAAA Runway Gen 2]&lt;br /&gt;
* April 2023: [https://research.nvidia.com/labs/toronto-ai/VideoLDM/ Nvidia latents]&lt;br /&gt;
* December 2023: [https://www.threads.net/@luokai/post/C0vvEnTP4Oj Fei-Fei Li]&lt;br /&gt;
&lt;br /&gt;
===2024===&lt;br /&gt;
====Early 2024====&lt;br /&gt;
* January 2024: [https://sites.research.google/videopoet/ Google VideoPoet]&lt;br /&gt;
* January 2024: [https://lumiere-video.github.io/ Google Lumiere]&lt;br /&gt;
* February 2024: [https://openai.com/index/sora/ OpenAI Sora]&lt;br /&gt;
* April 2024: [https://www.maginative.com/article/china-unveils-vidu-a-powerful-text-to-video-generator/ Vidu]&lt;br /&gt;
* May 2024: [https://deepmind.google/technologies/veo/ Veo]&lt;br /&gt;
* May 2024: [https://kling.kuaishou.com/ Kling]&lt;br /&gt;
* June 2024: [https://lumalabs.ai/dream-machine Luma DreamMachine]&lt;br /&gt;
* June 2024: [https://runwayml.com/research/introducing-gen-3-alpha RunwayML Gen-3 Alpha]&lt;br /&gt;
* July 2024: Examples:&lt;br /&gt;
** [https://www.youtube.com/watch?v=F_WfIzYGlg4 Toys-R-Us Commercial made using Sora]&lt;br /&gt;
** [https://www.youtube.com/watch?v=CSfw_NjqQ2o Motorola commercial made using genAI]&lt;br /&gt;
* July 2024: [https://x.com/rowancheung/status/1813258518159585723 haiper.ai]&lt;br /&gt;
====August 2024====&lt;br /&gt;
* August 2024: [http://hotshot.co/ Hotshot] ([https://x.com/maxescu/status/1825459083635536081 examples], [https://x.com/EccentrismArt/status/1825550841534972027 more examples])&lt;br /&gt;
* August 2024: Luma Dream Machine [https://x.com/LumaLabsAI/status/1825639918539817101 v1.5]&lt;br /&gt;
* August 2024: Examples:&lt;br /&gt;
** [https://x.com/endlesstaverns/status/1811276904692887815 Runway Gen3 music video]&lt;br /&gt;
** [https://x.com/runwayml/status/1820806644806070583 Runway Gen3 for adding FX to live action] ([https://x.com/bryanf0x/status/1825529998201004137 another example])&lt;br /&gt;
** [https://www.youtube.com/watch?v=taaM0s1bq7Q Midjourney + Runway Gen3: Hey It’s Snowing]&lt;br /&gt;
** [https://x.com/Kyrannio/status/1821605619927019974 Flux/LoRA image] + Runway Gen3 [https://x.com/iamneubert/status/1821970292014768420 woman presenter]&lt;br /&gt;
** [https://x.com/CharaspowerAI/status/1825274421256356106 McDonald’s AI commercial]&lt;br /&gt;
** Sora used by [https://www.facebook.com/izanamiaiart/ Izanami AI Art] to create [https://x.com/kimmonismus/status/1824102316229759114 dreamlike video] and by [https://x.com/alexiaadana Alexia Adana] to create [https://x.com/basedjensen/status/1824386717123743940 sci-fi film concept]&lt;br /&gt;
====September 2024====&lt;br /&gt;
* September 2024: [https://hailuoai.com/video/ Hailuo Minimax] ([https://x.com/minchoi/status/1829995683124035766 examples])&lt;br /&gt;
* September 2024: Examples:&lt;br /&gt;
** [https://www.youtube.com/watch?v=XAs5KuhfE_s Space colonization]&lt;br /&gt;
** [https://x.com/venturetwins/status/1827772646295265699 Consistent characters]&lt;br /&gt;
** [https://x.com/thealexbanks/status/1829489392354050502 Sea monsters]&lt;br /&gt;
** [https://x.com/ai_for_success/status/1829539535132426286 Music video]&lt;br /&gt;
** [https://x.com/RyanMorrisonJer/status/1829074823521112544 Animated characters]&lt;br /&gt;
** [https://x.com/CharaspowerAI/status/1829916782452191674 AI influencer]&lt;br /&gt;
** [https://x.com/minchoi/status/1829293248197902802 Ten short examples]&lt;br /&gt;
** [https://x.com/WorldEverett/status/1830596701473615937 Seven examples]&lt;br /&gt;
** [https://x.com/EccentrismArt/status/1830654805515395583 Clip from horror film]&lt;br /&gt;
** [https://x.com/MatthieuGB/status/1722146578813645296 &amp;quot;Gone&amp;quot; featuring astronaut] and [https://x.com/MatthieuGB/status/1742949297337852270 something ethereal]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1831256663644373449 Two dancers] (surprisingly good consistency despite movement)&lt;br /&gt;
** [https://x.com/8bit_e/status/1831344542487871953 Music video about flying]&lt;br /&gt;
** [https://www.youtube.com/watch?v=_XtS_4PzEyk The Paperclip Maximizer]&lt;br /&gt;
** [https://x.com/trbdrk/status/1831801373517869369 La Baie Aréa]&lt;br /&gt;
** [https://www.reddit.com/r/aivideo/comments/1f8xr0w/gisele_tong_to_dear_me/ &amp;quot;To Dear Me&amp;quot; by Gisele Tong] ([https://www.morningstar.com/news/business-wire/20240904521664/reply-ai-film-festival-announced-the-winners-of-the-first-international-festival-for-short-films-made-with-artificial-intelligence winner of AI shorts] film festival)&lt;br /&gt;
** [https://x.com/maxescu/status/1833476640438964281 Various scenes]&lt;br /&gt;
** [https://x.com/EHuanglu/status/1833522650846793970 Directing emotions]&lt;br /&gt;
* September 2024: Kling 1.5 ([https://x.com/Uncanny_Harry/status/1836531835280724459 examples], [https://x.com/minchoi/status/1836800551469654088 showing emotions])&lt;br /&gt;
* September 2024: Examples:&lt;br /&gt;
** Runway video-to-video to [https://x.com/jon_barron/status/1835695132697604236 restyle classic video games]&lt;br /&gt;
** [https://x.com/ai_for_success/status/1835319670917796117 Realistic presenter]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1834530744175059302 Skateboarding] (demonstrates getting closer to meaningfully simulating motion/physics)&lt;br /&gt;
** [https://x.com/minchoi/status/1835378029092049325 Examples] of short clips with cinematic feel&lt;br /&gt;
** Short: [https://x.com/PJaccetturo/status/1835670655330869633 4 Minutes to Live]&lt;br /&gt;
** Short: [https://x.com/dreamingtulpa/status/1836121321526432231 Neon Nights] (Arcade)&lt;br /&gt;
** [https://www.youtube.com/watch?v=CcrGSA-kSrI Random Access Memories]: AI-generated, but then projected onto Kodak film stock, giving the final output some of the dreamy analog quality we associate with nostalgic footage&lt;br /&gt;
** Sora used to make a sort of [https://x.com/niceaunties/status/1837271244774715505 weird dreamlike video]&lt;br /&gt;
====October 2024====&lt;br /&gt;
* October 2024: Pika v1.5, including Pikaffects (explode, melt, inflate, and cake-ify; examples: [https://x.com/justin_hart/status/1841144350572413259 1], [https://x.com/arthur_hyper88/status/1841156544538521646 2], [https://x.com/ytjessie_/status/1841168925301842263 3], [https://x.com/bilawalsidhu/status/1841195247184781420 4], [https://x.com/minchoi/status/1841189035454447636 5], [https://x.com/ytjessie_/status/1841209415514669501 6])&lt;br /&gt;
* October 2024: Examples:&lt;br /&gt;
** [https://x.com/HalimAlrasihi/status/1839310216602788103 AI avatar with good lip-sync]&lt;br /&gt;
** [https://www.youtube.com/watch?v=5NZubOOeeV0 Battalion]: 5 minute short about war&lt;br /&gt;
** Short film: [https://x.com/MatthieuGB/status/1841173724688536015 To Wonderland] ([https://x.com/MatthieuGB/status/1841174221550207437 credits])&lt;br /&gt;
** [https://x.com/OnwardsProject/status/1841508441241890975 9 to 5]: Created with Luma Dream Machine keyframes and camera features; music by Suno&lt;br /&gt;
* October 2024: [https://ai.meta.com/research/movie-gen/ Meta Movie Gen]&lt;br /&gt;
* October 2024: Examples:&lt;br /&gt;
** [https://x.com/CuriousRefuge/status/1844424871335592373 AI Avatar] (using [https://x.com/CuriousRefuge/status/1844424871335592373 HeyGen])&lt;br /&gt;
** [https://www.youtube.com/watch?v=isW1FLL0K3w Generic Movies]&lt;br /&gt;
** [https://arxiv.org/abs/2410.05954 Pyramid-flow] ([https://huggingface.co/rain1011/pyramid-flow-sd3 open source]) model: [https://x.com/_akhaliq/status/1844239643778351605 examples]&lt;br /&gt;
** [https://x.com/whrumorvid/status/1846209247467491604 Building the Pyramids]&lt;br /&gt;
** [https://x.com/maxescu/status/1844716998854349217 People showing realistic emotion] (using [https://hailuoai.video/ Hailuo AI])&lt;br /&gt;
** Keyframes and Luma AI to make novel [https://x.com/CoffeeVectors/status/1845188179332051005 speed-ramp motion]&lt;br /&gt;
* October 2024: [https://pollo.ai/ Pollo AI] platform offers selection among a diversity of video models&lt;br /&gt;
* October 2024: [https://www.genmo.ai/ Genmo] [https://x.com/genmoai/status/1848762405779574990 Mochi 1] (open source)&lt;br /&gt;
* October 2024: Examples:&lt;br /&gt;
** [https://x.com/AIatMeta/status/1849134463382680028 Meta Movie Gen examples]&lt;br /&gt;
** [https://x.com/PJaccetturo/status/1847732127598800960 Emotional range of Minimax]&lt;br /&gt;
** [https://x.com/dustinhollywood/status/1848757800807039299 Car commercial: Bear]&lt;br /&gt;
** [https://x.com/runwayml/status/1848785913918218517 Diner conversation]&lt;br /&gt;
** [https://x.com/Uncanny_Harry/status/1849275871716159989 Loved and Lost] (a meditation on grief)&lt;br /&gt;
====November 2024====&lt;br /&gt;
* November 2024: Examples:&lt;br /&gt;
** [https://x.com/blizaine/status/1852092147643699356 Pasta Doble]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1852425015175626876 Bird protecting young]&lt;br /&gt;
** [https://x.com/runwayml/status/1852363190484537666 Camera moving around sushi]&lt;br /&gt;
** [https://x.com/StevieMac03/status/1851969120813629939 Various examples] of [https://hailuoai.video/ Hailuo AI]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1853102779650252978 Trains]&lt;br /&gt;
** [https://www.youtube.com/watch?v=Fh-_g5vev0s Light of Imagination]&lt;br /&gt;
** [https://x.com/LinusEkenstam/status/1854087441122021814 Bringing historic images to life]&lt;br /&gt;
** [https://x.com/DeryaTR_/status/1855637066203218180 Plants dancing]&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1855078644042944574 Insect on tree]&lt;br /&gt;
** Trailers for [https://x.com/abandonedmovies/status/1827037378009296983 The Silmarillion] and [https://x.com/abandonedmovies/status/1846941183702110211 The Fall of Gondolin] (by [https://x.com/abandonedmovies Abandoned Films])&lt;br /&gt;
** [https://x.com/Diesol/status/1855475704470884427 Moody sci-fi]&lt;br /&gt;
** [https://x.com/runwayml/status/1857072173631885586 Migration] ([https://runwayml.com/customers/behind-the-scenes-of-migration-with-director-jeremy-higgins made by combining] Runway ML Gen3-Alpha and traditional animation)&lt;br /&gt;
** [https://x.com/AIandDesign/status/1856467856625676752 After the Winter] ([https://suno.com/song/0d6919de-d2bf-434b-8aa6-ede0fb0fde77 music] made using Suno v4)&lt;br /&gt;
** Horror: [https://www.reddit.com/r/aivideo/comments/1gnk27q/ridge_to_southwest/ Ridge to Southwest]&lt;br /&gt;
** [https://www.youtube.com/watch?v=ClStJZmIjBU The Gardener] (by [https://www.youtube.com/@MachineMythos Machine Mythos])&lt;br /&gt;
** [https://x.com/techhalla/status/1857462526859935813 Coca-Cola holiday ad] and [https://www.youtube.com/watch?v=THdoOgwqjBg parody thereof]&lt;br /&gt;
** [https://x.com/pzf_ai/status/1858312421510992111 A Dream Within A Dream] (by [https://x.com/pzf_ai PZF], selected for the Czech International AI Film Festival)&lt;br /&gt;
** [https://x.com/WorldEverett/status/1859273222597775843 Making Friends] (by [https://x.com/WorldEverett Everett World]; see also [https://x.com/WorldEverett/status/1858563716834275562 Childhood Dream] and [https://x.com/WorldEverett/status/1858945634067202429 City Echoes])&lt;br /&gt;
** Anime: [https://x.com/naegiko/status/1857754626742726893 test shots], [https://x.com/naegiko/status/1858978557424210401 Ultimate Ceremony], [https://x.com/naegiko/status/1835434668294074462 Echoes of Love]&lt;br /&gt;
** [https://x.com/KakuDrop/status/1866309309384323257 Echoes of Grace] ([https://x.com/KakuDrop KakuDrop] using Sora)&lt;br /&gt;
** [https://x.com/vibeke_udart/status/1859879367071203662 Morphing hands], [https://x.com/vibeke_udart/status/1858772719224975630 hands and faces] ([https://x.com/vibeke_udart Vibeke Bertelsen])&lt;br /&gt;
** [https://www.reddit.com/r/aivideo/comments/1gxi29x/dbzlicious/ Dragon Ball Z live action]&lt;br /&gt;
** [https://x.com/cfryant/status/1860727980353278386 Pitch Black] (abstract and dark)&lt;br /&gt;
** [https://x.com/cfryant/status/1861050528932765710 Animals Running] (zoomed-in ultra-wide camera)&lt;br /&gt;
** [https://x.com/WorldEverett/status/1860730214487118290 Dreams of Tomorrow] (panning shots of high-tech car, Scottish manor)&lt;br /&gt;
** [https://x.com/nickfloats/status/1861206978690691165 Desert Planet Cinematics]&lt;br /&gt;
* November 2024: [https://huggingface.co/spaces/PR-Puppets/PR-Puppet-Sora Leaked] Sora turbo model; [https://x.com/rowancheung/status/1861455031603503234 examples], [https://x.com/chatgpt21/status/1861504511153451517 Dog chasing Cat in snow]&lt;br /&gt;
====December 2024====&lt;br /&gt;
* December 2024: Examples:&lt;br /&gt;
** [https://x.com/minchoi/status/1863243880553976235 Realistic] (Minimax by Hailuo AI)&lt;br /&gt;
** Trailer for [https://x.com/TheReelRobot/status/1861824847149670840 Paradise Lost] (to be released on [https://www.sandwatch.ai/ Sandwatch AI])&lt;br /&gt;
** [https://x.com/EHuanglu/status/1863607136271716418 Music video example] with consistent characters&lt;br /&gt;
** [https://x.com/venturetwins/status/1863666366764687581 Human expressions] ([https://www.reddit.com/r/ChatGPT/comments/1h4r13x/ai_generated_expressions/ u/Kind_Distance9504 on Reddit], using Hailuo)&lt;br /&gt;
** Vodafone ad: [https://www.youtube.com/watch?v=9AyEC_K9kBg The Rhythm Of Life]&lt;br /&gt;
** [https://www.reddit.com/r/midjourney/comments/1h5u2gw/we_made_a_10_minute_gen_ai_batman_film/ 10 minute Batman film]&lt;br /&gt;
* December 2024: Tencent [https://aivideo.hunyuan.tencent.com/ Hunyuan Video] open-source video model ([https://x.com/CharaspowerAI/status/1863862585554010530 example])&lt;br /&gt;
* December 2024: [https://sora.com/ Sora] release ([https://x.com/CharaspowerAI/status/1866203050982916532 examples])&lt;br /&gt;
* December 2024: [https://mint-video.github.io/ MinT video] improves consistency and control ([https://arxiv.org/abs/2412.05263 preprint], [https://x.com/EHuanglu/status/1868278456565531061 examples])&lt;br /&gt;
* December 2024: Google [https://blog.google/technology/google-labs/video-image-generation-update-december-2024/ Veo 2] ([https://x.com/sundarpichai/status/1868709099644334518 examples], [https://x.com/EHuanglu/status/1869008306322522342 more examples], [https://x.com/_Borriss_/status/1869267571532320966 natural movement examples], [https://x.com/jerrod_lew/status/1870816560027246715 abstract], [https://x.com/jerrod_lew/status/1869427407415058660 realistic physics], [https://x.com/jerrod_lew/status/1873096585002786944 crowds], [https://x.com/minchoi/status/1873590350515929380 dancing], [https://x.com/jerrod_lew/status/1874440442269565351 animals])&lt;br /&gt;
* December 2024: [https://x.com/pika_labs/status/1867651381840040304 Pika 2.0] with Scene Ingredients&lt;br /&gt;
* December 2024: Examples:&lt;br /&gt;
** [https://www.youtube.com/watch?v=c_kKKRQ5gYw Synthetic Youth: Takenoko Zoku · Made by Emi Kusano with Sora]&lt;br /&gt;
** [https://x.com/higgsfield_ai/status/1868698886761837041 Car race] ([https://higgsfield.ai/ Higgsfield AI] storytelling)&lt;br /&gt;
** [https://x.com/blizaine/status/1868850653759783033 Slicing meat]; comparison of modern video generators&lt;br /&gt;
** Challenging prompt: [https://x.com/RubenEVillegas/status/1868864410720325844 A cat roars while looking at its reflection in the mirror but instead sees itself as a lion roaring (Veo 2)] ([https://x.com/anukaakash/status/1869417975071330550 comparison to other models])&lt;br /&gt;
** [https://x.com/PJaccetturo/status/1869829338868412865 Anime trailer]&lt;br /&gt;
** [https://x.com/ring_hyacinth/status/1870386506776674376 Snorlax at Mount Fuji] and [https://x.com/ring_hyacinth/status/1871105733443592696 Psyduck at Colosseum] (Kling 1.6)&lt;br /&gt;
** [https://x.com/machine_mythos/status/1870565287789056320 Horror visuals] (with [https://mmaudio.net/ MMAudio] sound)&lt;br /&gt;
** [https://www.youtube.com/watch?v=lFc1jxLHhyM The Heist] (Veo 2)&lt;br /&gt;
** [https://x.com/minchoi/status/1871263616806129863 Various Veo 2 examples]&lt;br /&gt;
** [https://x.com/minchoi/status/1872390429108486320 Live Action Titans]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1873094065841193222 Cats] [https://x.com/PostsOfCats/status/1872530207585825058 Cooking]&lt;br /&gt;
** Aesthetic from alternate timelines: [https://x.com/BrianRoemmele/status/1871753358782120068 1], [https://x.com/BrianRoemmele/status/1872105833456423216 2], [https://x.com/brain_racked/status/1872340717978390583 3]&lt;br /&gt;
** [https://x.com/minchoi/status/1872486717145706793 Examples approaching cinematic quality]&lt;br /&gt;
** [https://x.com/JaicSam/status/1872903054221033693 Cosmic Spider] (winner at AI film festival)&lt;br /&gt;
** [https://www.youtube.com/watch?v=dbdYPMRi_Nk Trailer for Newton&amp;#039;s Cradle] (full film [https://x.com/JeffSynthesized/status/1872705173451358293 on] 2025-01-01)&lt;br /&gt;
** [https://x.com/Ror_Fly/status/1873036384077828499 Car vs. Jet drag race]&lt;br /&gt;
** [https://x.com/Diesol/status/1873415500149199066 California Monsters]&lt;br /&gt;
** [https://x.com/heyshrutimishra/status/1873631383584924078 Various examples] (Hailuo AI)&lt;br /&gt;
** [https://x.com/kimmonismus/status/1873568693357294014 Will Smith eating spaghetti] (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023])&lt;br /&gt;
** [https://x.com/StevieMac03/status/1873998177193648438 Sorceress and Arachnid Steed] (Kling v1.6)&lt;br /&gt;
** [https://x.com/dustinhollywood/status/1873940924016779425 Music video] (Hailuo AI)&lt;br /&gt;
** [https://www.youtube.com/watch?v=iQg2udCHMdI Akụkọ (Story)] (22 minute short) - A Lagos Boy&amp;#039;s Thrilling Snack Run Nightmare&lt;br /&gt;
** [https://x.com/cinerobot/status/1873766976306455019 Son of the Dragon] (8 minute short)&lt;br /&gt;
** [https://x.com/SynthReveries/status/1873624586857886071 Endless Journey] music video ([https://suno.com/song/fa90fa5e-25c7-48ad-b291-42a8a8c51cf9 music] by Suno)&lt;br /&gt;
** [https://x.com/anukaakash/status/1870504167653228980 Once Again] (retrospective)&lt;br /&gt;
** [https://x.com/jasonzada/status/1873470586053414928 Fade Out] (Veo 2)&lt;br /&gt;
** [https://x.com/talkboysstudio/status/1869085014513865027 Roadkill] (12 minute short)&lt;br /&gt;
&lt;br /&gt;
===2025===&lt;br /&gt;
====January 2025====&lt;br /&gt;
* January 2025: [https://x.com/kimmonismus/status/1877351050748871038 Progress] over the last 1.5 years, comparing Runway Gen 2 and Veo 2.&lt;br /&gt;
* January 2025: Examples:&lt;br /&gt;
** [https://x.com/AllarHaltsonen/status/1874557865576542655 Delivery] (unofficial Nike ad)&lt;br /&gt;
** [https://x.com/Diesol/status/1875237221735002299 Gucci ad] (Sora)&lt;br /&gt;
** [https://x.com/EccentrismArt/status/1874498145910149412 Conquest]&lt;br /&gt;
** [https://www.youtube.com/watch?v=RJZCMfaS-io Newton&amp;#039;s Cradle] (6 minute short)&lt;br /&gt;
** [https://x.com/AngryTomtweets/status/1874627041934602410 Singer]&lt;br /&gt;
** [https://x.com/DumpsterBud/status/1874807352794182019 Brain vomit] (music video)&lt;br /&gt;
** [https://x.com/mxvdxn/status/1874796628210778618 Vibe] (Kling v1.6)&lt;br /&gt;
** [https://x.com/_deepfates/status/1875215969452523785 Will Smith eating spaghetti] (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024])&lt;br /&gt;
** [https://www.youtube.com/watch?v=BL9-jHGnxyc Zorgop Knows All] (2 minute short)&lt;br /&gt;
** [https://x.com/ButchersBrain/status/1875130428518269406 The Breach] (2 minute short; Veo2, Runway ActOne, MMaudio)&lt;br /&gt;
** [https://x.com/Rainmaker1973c/status/1875261591043850477 Aesthetics from an alternate timeline]&lt;br /&gt;
** [https://x.com/StevieMac03/status/1875440611849072841 Immortal Awakens]&lt;br /&gt;
** [https://x.com/isaachorror/status/1875624519588835400 The Faded Line]&lt;br /&gt;
** [https://www.youtube.com/watch?v=4fy8H38rm-4 Dear Dad]&lt;br /&gt;
** [https://x.com/maxescu/status/1877060580680311242 Mad Max chase]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1877408247906447633 Patience is Key]&lt;br /&gt;
** [https://x.com/techhalla/status/1879967230093586555 The Almost Famous Show] (talent show parody)&lt;br /&gt;
** [https://x.com/thefuzzysignal/status/1879295176990154755 Proof-of-concept trailer for a medieval adult animated series]&lt;br /&gt;
** [https://x.com/JeffSynthesized/status/1879555151499034869 Variety] (unofficial Cadbury ad)&lt;br /&gt;
** [https://x.com/henrydaubrez/status/1879883806947115446 Kitsune] (5 minute animated short, Veo 2)&lt;br /&gt;
* January 2025: MiniMax Hailuo [https://www.minimaxi.com/en/news/s2v-01-release Subject Reference] enables consistent characters ([https://x.com/minchoi/status/1881707687362412924 examples])&lt;br /&gt;
* January 2025: AI tools (de-aging deepfakes, [https://magnific.ai/ Magnific]) were [https://x.com/JeffSynthesized/status/1878630652377178502 used in the film] [https://www.imdb.com/title/tt18272208/ &amp;quot;Here&amp;quot;].&lt;br /&gt;
* January 2025: Luma [https://lumalabs.ai/ray Ray2]&lt;br /&gt;
* January 2025: [https://pikartai.com/pika-2-1/ Pika 2.1] ([https://x.com/OrctonAI/status/1883925754653905049 examples])&lt;br /&gt;
* January 2025: Examples:&lt;br /&gt;
** [https://x.com/wyzborrero/status/1879949477764804873 Light projections onto people] (challenging task, Ray2)&lt;br /&gt;
** [https://x.com/AllarHaltsonen/status/1881261042753589547 BMW ad]&lt;br /&gt;
** [https://x.com/AIWarper/status/1880658326645878821 John Wick in Severance] (Hunyuan vid2vid)&lt;br /&gt;
** [https://x.com/TheReelRobot/status/1881771800595444193 Biopic] (7 minutes)&lt;br /&gt;
** [https://x.com/misslaidlaw/status/1882180619582791784 Give It To Me] (music video)&lt;br /&gt;
** [https://x.com/paultrillo/status/1882091702506459394 Where do we go from here?] (music video, Veo 2)&lt;br /&gt;
** [https://x.com/WorldEverett/status/1882235057076580502 Party like there&amp;#039;s no tomorrow] (music video)&lt;br /&gt;
** [https://x.com/Diesol/status/1884696027942498779 S.T.O.R.I.] (Midjourney and Pika 2.1)&lt;br /&gt;
====February 2025====&lt;br /&gt;
* February 2025: Examples:&lt;br /&gt;
** [https://x.com/OrctonAI/status/1885839287913955597 Long Steampunk scene]&lt;br /&gt;
** [https://x.com/jerrod_lew/status/1885787580685562226 City destruction]&lt;br /&gt;
** [https://x.com/EHuanglu/status/1885736840344551763 Consistent character acting]&lt;br /&gt;
** [https://x.com/MeanOrangeCat/status/1884295241534185890 Kaiju Katastrophe] (by [https://x.com/MeanOrangeCat Mean Orange Cat])&lt;br /&gt;
** [https://x.com/Diesol/status/1886433799690748210 The Greyhound]&lt;br /&gt;
** [https://x.com/CoffeeVectors/status/1886146242029195391 Fluid simulation video2video]&lt;br /&gt;
** [https://x.com/toolstelegraph/status/1886622772828254403 High resolution macro shots]&lt;br /&gt;
** [https://www.youtube.com/watch?v=p0J1LDWERS0 Chrysalids]&lt;br /&gt;
** [https://x.com/multimodalart/status/1887817996220940737 Boring realistic images] (HunyuanVideo w/ LoRA)&lt;br /&gt;
** [https://www.youtube.com/watch?v=PcVRfa1JyyQ Anime intro] ([https://www.reddit.com/r/StableDiffusion/comments/1ijvua0/opensource_almostconsistent_real_anime_made_with/ Hunyuan w/ custom LoRAs])&lt;br /&gt;
** [https://x.com/AllarHaltsonen/status/1888294811750318114 Automotive ad test] (Kling w/ custom model)&lt;br /&gt;
** [https://x.com/AngryTomtweets/status/1888758524303269928 Random cinematic clips] (Midjourney and Kling)&lt;br /&gt;
** [https://x.com/juliewdesign_/status/1888666757302263828 Crossing Paths]&lt;br /&gt;
** [https://x.com/weirdai_art/status/1888794894187041200 Miniature food]&lt;br /&gt;
** [https://x.com/CaptainHaHaa/status/1889573017745035463 Animals]&lt;br /&gt;
** [https://x.com/Kavanthekid/status/1889371011667144724 Star Wars - The Ghost&amp;#039;s Apprentice (Fan Film)]&lt;br /&gt;
** [https://x.com/AngryTomtweets/status/1889768184716423573 Ray2 image-to-video examples]&lt;br /&gt;
** [https://x.com/weirdai_art/status/1889890470987518069 New Horizons] (miniatures going to Mars)&lt;br /&gt;
** [https://x.com/karim_yourself/status/1890100168378536155 Black Sun (trailer)]&lt;br /&gt;
** [https://x.com/BrivaelLp/status/1890122101153231288 AI avatars] ([https://www.argil.ai/ Argil AI])&lt;br /&gt;
** [https://x.com/mrjonfinger/status/1890783411679236473 Adding elements to real video] ([https://x.com/mrjonfinger/status/1891337081923772918 other example])&lt;br /&gt;
** [https://x.com/SynthReveries/status/1892278954137940289 Glitch]&lt;br /&gt;
** Anime: [https://x.com/freeeebird2300/status/1889119007707689146 sci-fi] (Ray2), [https://x.com/Artedeingenio/status/1891173784188756069 sci-fi] (Ray 2), [https://x.com/seiiiiiiiiiiru/status/1890980673743474931 90s sci-fi] (Luma) and [https://x.com/TomLikesRobots/status/1891209369804591447 moody] (Midjourney and Ray2)&lt;br /&gt;
* February 2025: Meta [https://hila-chefer.github.io/videojam-paper.github.io/ VideoJAM]&lt;br /&gt;
* February 2025: ByteDance [https://omnihuman-lab.github.io/ OmniHuman-1]&lt;br /&gt;
* February 2025: ByteDance [https://saiyan-world.github.io/goku/ Goku] ([https://arxiv.org/abs/2502.04896 paper], [https://x.com/ai_for_success/status/1888821141495844991 examples])&lt;br /&gt;
* February 2025: [https://huggingface.co/stepfun-ai/stepvideo-t2v Step-Video-T2V] open-source model ([https://arxiv.org/abs/2502.10248 paper], [https://github.com/stepfun-ai/Step-Video-T2V code], [https://yuewen.cn/videos demo], [https://x.com/ai_for_success/status/1891369136082854129 examples])&lt;br /&gt;
* February 2025: Pika [https://x.com/pika_labs/status/1892620122818294109 Pikaswaps] (examples of [https://x.com/FreddyChavezO/status/1892678426487881805 modifying regions], [https://x.com/CharaspowerAI/status/1893216710141919637 swapping items])&lt;br /&gt;
* February 2025: Alibaba [https://wanai.pro/ Wan 2.1] [https://huggingface.co/blog/LLMhacker/wanai-wan21 open-source] ([https://x.com/fofrAI/status/1894862403260596371 examples])&lt;br /&gt;
* February 2025: [https://thetwinai.com/ Twin AI]: compose videos with provided character, object, location ([https://x.com/EHuanglu/status/1901277394729930984 example])&lt;br /&gt;
* February 2025: Examples:&lt;br /&gt;
** [https://x.com/mrjonfinger/status/1893109598627750164 Infected] (Pika swaps and additions)&lt;br /&gt;
** [https://x.com/amli_art/status/1893447314913796253 Hostile Government Takeover] (Veo2)&lt;br /&gt;
** [https://x.com/KarolineGeorges/status/1895226395812561399 Dual Mechanism] (Pikaframes 2.2)&lt;br /&gt;
&lt;br /&gt;
====March 2025====&lt;br /&gt;
* March 2025: Examples:&lt;br /&gt;
** [https://x.com/SynthReveries/status/1895826068617252901 Doors] (music video)&lt;br /&gt;
** [https://x.com/bind_lux/status/1894492032414224792 Drum and Bass] (music video; Kling, audio from [https://www.riffusion.com/?filter=staff-picks Riffusion])&lt;br /&gt;
** [https://x.com/RileyRalmuto/status/1896088776151269523 Woman&amp;#039;s face] (Sora)&lt;br /&gt;
** [https://x.com/ryanwpatterson/status/1896968881731948844 Skating] (Ray2)&lt;br /&gt;
** [https://www.threads.net/@evolving.ai/post/DGlRyRoO7c9?xmt=AQGz6T_8VppPoAqb5aPwAJ2zzRLUP-YXi8SabAT0IIEA9Q Filming commercial on Mars]&lt;br /&gt;
** [https://www.threads.net/@evolving.ai/post/DGycqyhuETS?xmt=AQGz6T_8VppPoAqb5aPwAJ2zzRLUP-YXi8SabAT0IIEA9Q Original Source commercial] (AI and real footage)&lt;br /&gt;
** [https://x.com/maxescu/status/1896926229204496788 Time-lapses] (Pika 2.2)&lt;br /&gt;
** [https://www.youtube.com/watch?v=2RhkcJyhg0E Hallucination]&lt;br /&gt;
** [https://x.com/town_in_new/status/1897354572139782620 Macro video of bubbles]&lt;br /&gt;
* March 2025: [https://github.com/Tencent/HunyuanVideo-I2V HunyuanVideo-I2V] image-to-video&lt;br /&gt;
* March 2025: Google [https://x.com/labsdotgoogle/status/1897376700666626233 Whisk Animate] (based on Veo2, [https://x.com/maxescu/status/1902742535618888025 examples])&lt;br /&gt;
* March 2025: Examples:&lt;br /&gt;
** [https://x.com/jdp2oo/status/1897874927367160114 Recursion (horror)] (Kling)&lt;br /&gt;
** [https://x.com/blizaine/status/1897826177970028614 Will Smith Eating Spaghetti while Sitting Inside a Bag] (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025])&lt;br /&gt;
** [https://x.com/mickmumpitz/status/1897979382687297697 Paper Jam] (Kling with custom workflows to enable precise control)&lt;br /&gt;
** [https://x.com/maxescu/status/1899155936645722216 Cinematic shots] (Google Whisk and Luma)&lt;br /&gt;
** [https://x.com/weirdai_art/status/1899631013002711409 Perfunctory Horizons]&lt;br /&gt;
** [https://x.com/maxescu/status/1900243840499368319 A Hard Winter]&lt;br /&gt;
** [https://x.com/RoyalKongz/status/1900315389139014074 Consistent character example]&lt;br /&gt;
** [https://x.com/maxescu/status/1900652266362650853 Anthropomorphic Animals]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1900457543299727718 Realistic (influencer-style)]&lt;br /&gt;
** [https://x.com/SunoMusic/status/1900942410584043579 I Feel Cultured] (music video with surrealist vibes)&lt;br /&gt;
** [https://rodeo.club/post/0x30b45c56d62751D763D3B8bFe4D18c4BB65EDF2c/209 journey of utmost importance]&lt;br /&gt;
** [https://x.com/aiordieshow/status/1901930851127984291 Karen: Unleashed]&lt;br /&gt;
** [https://x.com/minchoi/status/1901783767364092232 Yarn Cat]&lt;br /&gt;
** [https://x.com/andyorsow/status/1901619535180091509 Ned&amp;#039;s Wet Deli] (Runway)&lt;br /&gt;
** [https://www.youtube.com/watch?v=KVoiooE8C0c BOOTS], a.k.a. [https://x.com/RuairiRobinson/status/1902027217137484117 &amp;quot;Our enemies are cartoon monsters&amp;quot;] (music video based on poem by Rudyard Kipling; Veo2)&lt;br /&gt;
** Flying in a dream: [https://x.com/minchoi/status/1902197944826183864 1], [https://x.com/venturetwins/status/1901796679063626060 2]&lt;br /&gt;
** [https://x.com/jasonzada/status/1902129567659389443 Commercial for Mercedes-Benz and FYI Radio]&lt;br /&gt;
** [https://x.com/maxescu/status/1903108496666542562 Selfie video] (Luma)&lt;br /&gt;
** Podcasts: [https://www.reddit.com/r/singularity/comments/1jintit/rottenly_roasted_now_full_script_is_also_not/ Rottenly Roasted] and [https://www.reddit.com/r/aivideo/comments/1jerh56/worst_date_ever/ Worst Date Ever] [https://x.com/OriZilbershtein/status/1903503438744318002 (Imagen 3, Hedra, Elevenlabs, Topaz)]&lt;br /&gt;
** [https://x.com/DexploreArts/status/1903822122150986000 Ambience] (Midjourney, Luma)&lt;br /&gt;
** [https://x.com/TheoMediaAI/status/1904207679511572845 The Bridge] (2 minute short; Veo2)&lt;br /&gt;
** [https://x.com/peteromallet/status/1904268944992829462 Pulp Fiction] (Wan video editing)&lt;br /&gt;
** [https://x.com/madpencil_/status/1906765750624493650 Camera Controls] (Luma Ray2)&lt;br /&gt;
* March 2025: [https://www.hedra.com/ Hedra] [https://x.com/hedra_labs/status/1897699010632466469 Character 3]&lt;br /&gt;
* March 2025: [https://huggingface.co/hpcai-tech/Open-Sora-v2 Open Sora v2] ([https://github.com/hpcaitech/Open-Sora code])&lt;br /&gt;
* March 2025: Amazon Prime debuts [https://en.wikipedia.org/wiki/House_of_David_(TV_series) House of David], with special effects created by [https://www.thewonderproject.com/ Wonder Project] using a [https://x.com/PJaccetturo/status/1903126616831676792 combination of traditional and AI methods] (reportedly including Midjourney and Runway)&lt;br /&gt;
* March 2025: Examples:&lt;br /&gt;
** [https://x.com/PJaccetturo/status/1905151190872309907 What if Studio Ghibli directed Lord of the Rings?] (OpenAI GPT-4o in-context image generation, Kling)&lt;br /&gt;
** [https://x.com/ROHKI/status/1906039022662963269 RŌHKI]&lt;br /&gt;
** [https://x.com/iaveras/status/1906362437487534296 Why]&lt;br /&gt;
** [https://x.com/BrianRoemmele/status/1906476721236570508 Commercial for Puma] (research/test)&lt;br /&gt;
** [https://x.com/Salmaaboukarr/status/1906776503343325469 Commercial for KFC] (concept ad)&lt;br /&gt;
* March 2025: Runway ML [https://runwayml.com/research/introducing-runway-gen-4 Gen-4]&lt;br /&gt;
** [https://www.youtube.com/watch?v=c8IBmK7GZP8 The Lonely Little Flame]&lt;br /&gt;
** [https://www.youtube.com/watch?v=Z0P6qjMUl34&amp;amp;t=1s The Herd]&lt;br /&gt;
** [https://www.youtube.com/watch?v=9HzdNhOe09I The Retrieval]&lt;br /&gt;
** [https://www.youtube.com/watch?v=xEhgxhrAjE4 NYC is a Zoo]&lt;br /&gt;
** [https://www.youtube.com/watch?v=ENGKp5wn344 Scimmia Vede] (music video)&lt;br /&gt;
** More examples: [https://x.com/techhalla/status/1906807994009993473 various], [https://x.com/c_valenzuelab/status/1907958530369372541 art direction], [https://x.com/c_valenzuelab/status/1908146364741029998 mannequins], [https://x.com/c_valenzuelab/status/1907921566643732612 taxi], [https://x.com/c_valenzuelab/status/1907432109695717798 small things], [https://x.com/c_valenzuelab/status/1907563448902496362 long shot (1m)]&lt;br /&gt;
&lt;br /&gt;
====April 2025====&lt;br /&gt;
* April 2025: Examples:&lt;br /&gt;
** [https://x.com/AzeAlter/status/1906974768705990794 Age of Beyond]&lt;br /&gt;
** [https://x.com/techhalla/status/1907790675057242319 Commercial for Coca-Cola] (Higgsfield)&lt;br /&gt;
** [https://www.reddit.com/r/StableDiffusion/comments/1jr6j11/comment/mle9bq5/?context=3 Anime scene (3m)] (Wan 2.1 with LoRA)&lt;br /&gt;
** [https://x.com/pika_labs/status/1908263310912610401 Taxes then Death] (Pika multikeyframe)&lt;br /&gt;
* April 2025: [https://www.krea.ai/ Krea] [https://x.com/krea_ai/status/1907829389452021853 Video Re-Style]&lt;br /&gt;
* April 2025: ByteDance [https://grisoon.github.io/DreamActor-M1/ DreamActor-M1] performance transfer&lt;br /&gt;
* April 2025: Examples:&lt;br /&gt;
** [https://x.com/Diesol/status/1908535493673050403 Mercs] (Midjourney v7, Ray2)&lt;br /&gt;
** [https://x.com/minchoi/status/1909078846126649440 Cat at theme park]&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1909630883218207036 Timelapse history] (Runway Gen4)&lt;br /&gt;
** [https://x.com/EHuanglu/status/1909660808973533225 Examples for use in advertising]&lt;br /&gt;
** [https://x.com/arohaAIX/status/1910688361221599361 Sci-fi scapes]&lt;br /&gt;
** [https://x.com/PJaccetturo/status/1910750148055146708 Avα]&lt;br /&gt;
** [https://x.com/imagineFERA/status/1910601934207152576 The Bureau]&lt;br /&gt;
** [https://x.com/jasonzada/status/1911812014059733041 Beaver and Sock (3m)]&lt;br /&gt;
** [https://x.com/Delachica_/status/1911842237622735052 Organic Waste (5m)] (Runway)&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1912260798270882104 Fly] (Runway Gen4)&lt;br /&gt;
* April 2025: Alibaba [https://arxiv.org/abs/2504.04842 FantasyTalking] lipsync ([https://arxiv.org/abs/2504.04842 paper], [https://x.com/EHuanglu/status/1910341110322577442 examples])&lt;br /&gt;
* April 2025: Tencent Hunyuan [https://arxiv.org/abs/2411.16331 Sonic] image animation/lipsync to audio ([https://x.com/ai_for_success/status/1911719866958286864 examples])&lt;br /&gt;
* April 2025: ByteDance [https://huggingface.co/papers/2504.08685 Seaweed-7B] ([https://arxiv.org/abs/2504.08685 preprint], [https://www.youtube.com/watch?v=OaPI6K2y3rI examples])&lt;br /&gt;
* April 2025: [https://app.klingai.com/global/release-notes Kling 2.0] ([https://www.youtube.com/watch?v=Yqvh3M12T_M video])&lt;br /&gt;
* April 2025: [https://www.skyreels.ai/home Skywork] [https://github.com/SkyworkAI/SkyReels-V2 SkyReels V2] (open-source, unlimited extension; [https://x.com/AngryTomtweets/status/1914270477482443142 examples])&lt;br /&gt;
* April 2025: [https://sand.ai/ Sand AI] [https://huggingface.co/sand-ai/MAGI-1 Magi-1] (open source, unlimited extension; [https://x.com/AngryTomtweets/status/1914318743578296506 examples], [https://x.com/dreamingtulpa/status/1916035289300275372 more examples])&lt;br /&gt;
* April 2025: Examples:&lt;br /&gt;
** [https://x.com/maxescu/status/1912100029549994016 Mars 2035 (3m)] (Kling 2.0)&lt;br /&gt;
** [https://x.com/ai_for_success/status/1912466999147450600 Kingdom (dragon battle, 3m)]&lt;br /&gt;
** [https://x.com/imagineFERA/status/1913156296657756278 Reflection (3m)] (Gen4)&lt;br /&gt;
** [https://x.com/Wytsekoetse/status/1913547157493162035 Pizza Galaxy (1m)] (Midjourney and Gen4)&lt;br /&gt;
** [https://www.youtube.com/watch?v=rseqmSGH7xk Snoop Dogg music video: Last Dance with Mary Jane] (blend of traditional and AI effects)&lt;br /&gt;
** [https://x.com/dreamingtulpa/status/1915104310448501129 Realistic human motion]&lt;br /&gt;
** [https://x.com/KarolineGeorges/status/1915113151546396893 Inception loop] (Gen4)&lt;br /&gt;
** [https://x.com/rayisdoingfilm/status/1916468807435952330 Tuesday (1m)] (Gen4)&lt;br /&gt;
** [https://www.youtube.com/watch?v=XWdwF1q3kDw Deus in Machina Automata (4m)] (Gen4)&lt;br /&gt;
** [https://x.com/machina9000/status/1915090908850049223 Outsiders (3m music video)]&lt;br /&gt;
&lt;br /&gt;
====May 2025====&lt;br /&gt;
* May 2025: [https://huggingface.co/Lightricks/LTX-Video LTX-Video 13B] ([https://github.com/Lightricks/LTX-Video code], [https://x.com/maxescu/status/1919801813987164527 examples], [https://x.com/cubiq/status/1919748210567815551 more examples])&lt;br /&gt;
* May 2025: HeyGen Avatar IV (examples: [https://x.com/StevieMac03/status/1919910677860216869 sci-fi], [https://x.com/KarolineGeorges/status/1919801983143211222 Come Closer], [https://x.com/maxescu/status/1920410329454100973 singing], [https://x.com/minchoi/status/1920853859171234165 various])&lt;br /&gt;
* May 2025: Tencent [https://hunyuancustom.github.io/ HunyuanCustom]&lt;br /&gt;
* May 2025: Examples:&lt;br /&gt;
** [https://x.com/lifeofc/status/1920331476157280413 Iris (1.5m)] (Midjourney, Luma, Runway)&lt;br /&gt;
** [https://runwayml.com/customers/the-making-of-mars-and-siv Mars and Siv: &amp;quot;No Vacancy&amp;quot; (episode 1, 6m)] (Runway)&lt;br /&gt;
** [https://x.com/cfryant/status/1921317318744760817 Go to the East Wing] (dreamlike, Luma)&lt;br /&gt;
** [https://x.com/DeryaTR_/status/1921015340827304389 Yu Lanter showreel] (Higgsfield)&lt;br /&gt;
** [https://x.com/freeeebird2300/status/1921789387614134652 Cyberpunk anime] (Luma)&lt;br /&gt;
** [https://x.com/LittleTinRobot/status/1921692735930589246 Alien animals] (Runway)&lt;br /&gt;
** [https://x.com/minchoi/status/1922500563792486878 America&amp;#039;s Funniest AI Home Videos (3m)]&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1924204409833103365 Editing POV shots from AR glasses] (Runway)&lt;br /&gt;
* May 2025: [https://runwayml.com/gen48 Gen:48] Fourth Edition winners:&lt;br /&gt;
** [https://www.youtube.com/watch?v=NphCYRXjqTI&amp;amp;t=174s Home] (3m)&lt;br /&gt;
** [https://www.youtube.com/watch?v=L2DQwCp_DCw The King&amp;#039;s Secret] (2m)&lt;br /&gt;
* May 2025: [https://viggle.ai/home Viggle] Live [https://x.com/ViggleAI/status/1926324953038627214 enables] real-time avatar control&lt;br /&gt;
* May 2025: Google [https://blog.google/technology/ai/generative-media-models-io-2025/ Veo 3] (examples: [https://x.com/babaeizadeh/status/1924942128851124284 conversation], [https://x.com/mattshumer_/status/1925039973310308424 cooking], [https://x.com/jerrod_lew/status/1924934440486371589 singing], [https://x.com/MartinNebelong/status/1924926779677905014 simple story], [https://x.com/Diesol/status/1925114473544913004 cinematic action sequence], [https://x.com/laszlogaal_/status/1925094336200573225 car show interviews], [https://x.com/arikuschnir/status/1924953349943697763 We Can Talk], [https://x.com/venturetwins/status/1925021235530105298 podcat], [https://x.com/maxescu/status/1925079990061957423 various], [https://x.com/jerrod_lew/status/1927092379892265139 camera moves])&lt;br /&gt;
* May 2025: Examples:&lt;br /&gt;
** [https://x.com/javilopen/status/1925495026903380358 Will Smith Eating Spaghetti] (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025])&lt;br /&gt;
** [https://x.com/MetaPuppet/status/1926659557914268155 Bob from Marketing] (Veo 3)&lt;br /&gt;
** [https://x.com/dustinhollywood/status/1926733069475565622 He is King (16m)]&lt;br /&gt;
** [https://x.com/HashemGhaili/status/1925616536791760987 Prompt Theory], [https://x.com/HashemGhaili/status/1925332319604257203 part 2], [https://x.com/HashemGhaili/status/1927467022213869975 Afterlife (3m)] (Veo3)&lt;br /&gt;
** [https://x.com/JoannaStern/status/1927856754873835747 My Robot and Me (3m)] (Veo, Runway)&lt;br /&gt;
** [https://x.com/rohanpaul_ai/status/1928152398930817238 The Internet&amp;#039;s Over] (Veo3)&lt;br /&gt;
** [https://www.reddit.com/r/aivideo/comments/1l0rl7d/before_colours_fade/ Before Colours Fade (2m)] (Midjourney, Kling)&lt;br /&gt;
&lt;br /&gt;
====June 2025====&lt;br /&gt;
* June 2025: Examples:&lt;br /&gt;
** [https://x.com/amasad/status/1930505292904837132 Bigfoot ASMR]&lt;br /&gt;
** [https://x.com/minchoi/status/1930670583605514333 Talking] (HeyGen Avatar IV upgrade)&lt;br /&gt;
** [https://x.com/ROHKI/status/1931081752992477285 Where are all the aliens? (2m)]&lt;br /&gt;
** [https://x.com/fofrAI/status/1930999540770893874 Natural talking]&lt;br /&gt;
** [https://x.com/ammaar/status/1931672722418851904 Elemental Showdown - Mortal Kombat (3m)]&lt;br /&gt;
** [https://x.com/maxjoseph/status/1932104616021565476 It Starts at the End (music video, 4m)]&lt;br /&gt;
** [https://x.com/deedydas/status/1932105266654581116 Sci-fi trailer (2m)]&lt;br /&gt;
** [https://x.com/DrMachakil/status/1931816470901575924 The Prompt Floor (2m)]&lt;br /&gt;
** [https://x.com/DrMachakil/status/1853960062546366856 NALVORA (2.7m)] - [https://x.com/DrMachakil/status/1932904599004066200 Best Trailer, Metamorph AI Film Awards]&lt;br /&gt;
** [https://x.com/Kalshi/status/1932891608388681791 Commercial for Kalshi (30s)] - [https://x.com/PJaccetturo/status/1932893260399456513 to air during NBA finals] (Veo)&lt;br /&gt;
** [https://x.com/ROHKI/status/1933594430113788227 Your Brain is Broken on Purpose (2m)]&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1934312626021949687 Runway Gen-4 Reference examples]&lt;br /&gt;
** [https://x.com/JesusPlazaX/status/1934253813696786661 Paper airplane]&lt;br /&gt;
** [https://x.com/minchoi/status/1934032730947526872 Veo3 examples]&lt;br /&gt;
** [https://x.com/NomadsVagabonds/status/1935329331410075734 Reset 3 (1m, surreal)]&lt;br /&gt;
** [https://x.com/HashemGhaili/status/1935722105322323968 It Has No Soul (1m, Veo3)]&lt;br /&gt;
* June 2025: [https://seedance.net/seedance Seedance 1.0] ([https://arxiv.org/abs/2506.09113 preprint])&lt;br /&gt;
* June 2025: [https://hailuoai.video/ Hailuo AI] (MiniMax) Hailuo 02 ([https://x.com/venturetwins/status/1934236631336403344 &amp;quot;Kangaroo&amp;quot; during testing]; examples: [https://x.com/lepadphone/status/1935078910934626429 various], [https://x.com/alexgnewmedia/status/1935018186954719365 various], [https://x.com/FussyPastor/status/1935065068456263883 tsunami], [https://x.com/thedorbrothers/status/1935098802744213935 fight scene], [https://x.com/umesh_ai/status/1935028257708966231 fox running], [https://x.com/BrentLynch/status/1934979825636446268 blogger], [https://x.com/HalimAlrasihi/status/1935297126759538735 transitions], [https://x.com/MKMXLA/status/1938318951664280045 skateboarding])&lt;br /&gt;
* June 2025: Midjourney video ([https://x.com/minchoi/status/1934373051464057062 early examples], [https://x.com/ciguleva/status/1935386452197785892 various], [https://x.com/juliewdesign_/status/1935395999175876696 various], [https://x.com/emollick/status/1935504703023899096 Ethan Mollick], [https://x.com/PJaccetturo/status/1935383312392151528 highly rated], [https://x.com/maxescu/status/1935674561821126847 complex environments], [https://x.com/CoffeeVectors/status/1935863623076675875 manga])&lt;br /&gt;
* June 2025: Examples:&lt;br /&gt;
** [https://x.com/StevieMac03/status/1935768436556378170 The Battle of Glenvael - Orcs vs Humans] (Hailuo)&lt;br /&gt;
** [https://x.com/HashemGhaili/status/1935036744568824208 The Sentence (9m, Veo3)]&lt;br /&gt;
** [https://x.com/elder_plinius/status/1936145834585862225 Will Smith Eating Spaghetti] (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025], [https://x.com/javilopen/status/1925495026903380358 May 2025])&lt;br /&gt;
** [https://x.com/venturetwins/status/1937232461576175809 Gymnastics] (Hailuo 02)&lt;br /&gt;
** [https://x.com/KeiranJHarris/status/1935429439476887594 Fable about Transformative AI] (Veo3)&lt;br /&gt;
** [https://x.com/thedorbrothers/status/1937926400507580726 Vorex (2m trailer)]&lt;br /&gt;
** [https://x.com/OnerBiberkoku/status/1938972810321281394 Doğrucu (3m music video, Veo3)]&lt;br /&gt;
* June 2025: [https://higgsfield.ai/soul Higgsfield Soul] Video Effects ([https://x.com/higgsfield_ai/status/1937931727084917097 examples], [https://x.com/HashemGhaili/status/1938278903765995611 realism])&lt;br /&gt;
* June 2025: Alibaba [https://omni-avatar.github.io/ OmniAvatar] ([https://arxiv.org/abs/2506.18866 paper], [https://github.com/Omni-Avatar/OmniAvatar code], [https://huggingface.co/OmniAvatar/OmniAvatar-14B model], [https://x.com/AngryTomtweets/status/1939850674776547359 examples])&lt;br /&gt;
&lt;br /&gt;
====July 2025====&lt;br /&gt;
* July 2025: Examples:&lt;br /&gt;
** [https://x.com/Kavanthekid/status/1940452444850589999 Untold - The Immortal Blades Saga] (2m trailer)&lt;br /&gt;
** [https://x.com/minchoi/status/1941234456461029584 Unofficial commercial for Liquid Death (1m)]&lt;br /&gt;
** [https://x.com/brain_racked/status/1942594951310893425 A parade of the chosen theocracy on Callisto]&lt;br /&gt;
** [https://x.com/Popeyes/status/1943316484404433182 Popeyes commercial - diss track (1m)]&lt;br /&gt;
*** [https://x.com/gabemichael_ai/status/1944070622155616668 (Unofficial) Wendy&amp;#039;s response - diss track (2m)]&lt;br /&gt;
*** [https://x.com/ai_massive/status/1947689537641357618 (Unofficial) In-N-Out rap battle (3m)]&lt;br /&gt;
** [https://x.com/Kalshi/status/1943339616716599548 Kalshi commercial]&lt;br /&gt;
** Jonah (25m TV show, [https://x.com/PJaccetturo/status/1946101701548880029 making of], [https://kingstonestudios.uscreen.io/programs/jonah purchase here])&lt;br /&gt;
** [https://x.com/Totemko/status/1946243585021452335 Unofficial commercial for Mercedes (17s)]&lt;br /&gt;
** [https://x.com/CoffeeVectors/status/1946016960916889632 Skateboarding music video (1m)]&lt;br /&gt;
* July 2025: Runway ML [https://help.runwayml.com/hc/en-us/articles/42311337895827-Creating-with-Act-Two Act-Two] (video-to-video performance transfer)&lt;br /&gt;
* July 2025: Examples:&lt;br /&gt;
** Neural Viz [https://www.youtube.com/watch?v=juDDHvHroQ8 The Cop Files: Part VI (8m)]&lt;br /&gt;
** [https://x.com/Kavanthekid/status/1947696716981145971 Perfect Dark - Concept Trailer (1.5m)]&lt;br /&gt;
** [https://x.com/StelfieTT/status/1948753090858885131 Exodus (2m trailer)]&lt;br /&gt;
** [https://x.com/Jett_Collective/status/1949140450553540841 A Walk Together - Life and love in motion (1m, Midjourney Video)]&lt;br /&gt;
* July 2025: Netflix sci-fi show [https://en.wikipedia.org/wiki/The_Eternaut_(TV_series) The Eternaut] [https://x.com/omooretweets/status/1946290797399400662 used genAI] for a particular scene (building collapse)&lt;br /&gt;
* July 2025: Google Veo [https://x.com/GoogleLabs/status/1948477692715700718 emergent annotation direction] ([https://x.com/venturetwins/status/1948771505783144641 example], [https://x.com/bilawalsidhu/status/1948844167603310660 example], [https://x.com/jboogx_creative/status/1949230927504371765 example], [https://x.com/Ror_Fly/status/1949606017739747625 example])&lt;br /&gt;
* July 2025: Runway [https://runwayml.com/research/introducing-runway-aleph Aleph] contextual editing&lt;br /&gt;
* July 2025: Wan 2.2 (open source, [https://x.com/Alibaba_Wan/status/1949804551655276989 examples])&lt;br /&gt;
&lt;br /&gt;
====August 2025====&lt;br /&gt;
* August 2025: Pika [https://x.com/pika_labs/status/1954935844936024476 audio-driven performance] ([https://x.com/minchoi/status/1954989794129514937 examples], [https://x.com/pika_labs/status/1955007656302924192 examples])&lt;br /&gt;
* August 2025: Examples:&lt;br /&gt;
** [https://www.youtube.com/watch?v=gePD1Hf1qPc Eve and Adam] (8m, [https://x.com/MetaPuppet/status/1954254544935719259 multiple tools])&lt;br /&gt;
** [https://x.com/runwayml/status/1955615613583519917 Redesign a space] (Runway Aleph)&lt;br /&gt;
** [https://x.com/theGioM/status/1955656398248763428 Detroit Pretend Work Park (1m)]&lt;br /&gt;
** [https://x.com/pzf_ai/status/1940816374211006600 The Weight of Light] (3m music video, Midjourney &amp;amp; Suno)&lt;br /&gt;
** [https://x.com/EHuanglu/status/1956788759778967710 Commercial for Pepsi]&lt;br /&gt;
** [https://x.com/StelfieTT/status/1956633450326200426 Emotion]&lt;br /&gt;
** [https://x.com/dustinhollywood/status/1957940749862875383 TZIGANE]&lt;br /&gt;
** [https://x.com/0xFramer/status/1960720090921623636 Anime chase sequence] (Nano Banana and Seedance 1.0)&lt;br /&gt;
* August 2025: ByteDance [http://www.waver.video/ Waver 1.0]&lt;br /&gt;
* August 2025: [https://huggingface.co/Wan-AI/Wan2.2-S2V-14B Wan2.2-S2V 14B]&lt;br /&gt;
&lt;br /&gt;
====September 2025====&lt;br /&gt;
* September 2025: [https://www.wsj.com/tech/ai/openai-backs-ai-made-animated-feature-film-389f70b0 OpenAI Backs AI-Made Animated Feature Film: Film, called ‘Critterz,’ aims to debut at Cannes Film Festival and will leverage startup’s AI tools and resources.]&lt;br /&gt;
* September 2025: Examples:&lt;br /&gt;
** [https://x.com/kentskooking/status/1964606423037542459 A loop to wake up to (30s)]&lt;br /&gt;
** [https://x.com/venturetwins/status/1966570512991350907 time lapse]&lt;br /&gt;
** [https://x.com/NeuralViz/status/1967391198487994652 The Adventures of Reemo Green] (11m, Neural Viz)&lt;br /&gt;
** [https://x.com/kellyeld/status/1967620786166079545 Surreal DJs music video (2m)]&lt;br /&gt;
** [https://x.com/dustinhollywood/status/1968724784440558044 Glass City] (Hailuo)&lt;br /&gt;
** [https://x.com/TheoMediaAI/status/1968646951227777529 Alarm] (1m, multiple tools including world synthesis for consistent environments)&lt;br /&gt;
* September 2025: [https://lumalabs.ai/ray Luma] [https://x.com/LumaLabsAI/status/1968684330034606372 Ray3] ([https://x.com/cfryant/status/1968692370725077251 example])&lt;br /&gt;
* September 2025: Examples:&lt;br /&gt;
** [https://x.com/mrjonfinger/status/1968687352382910469 Stop motion interpolation] (Luma Ray3)&lt;br /&gt;
** [https://x.com/heydin_ai/status/1969514789169959128 Skyland] (1.5m, various tools)&lt;br /&gt;
** [https://x.com/iamluokai/status/1970185972076925427 Dancing] (Wan 2.2)&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1970497214108815584 Under Armour commercial] (Runway Aleph)&lt;br /&gt;
** [https://x.com/FilmsBySav/status/1971247214795358706 OG PRIME] (10m, Kling)&lt;br /&gt;
** [https://www.youtube.com/watch?v=JGLoTjxd-Ss PLANET] (37m)&lt;br /&gt;
* September 2025: [https://x.com/Kling_ai/status/1970439808901362155 Kling AI 2.5 Turbo] (examples: [https://x.com/OrctonAI/status/1970472214794220008 cyberpunk], [https://x.com/ImagineArt_X/status/1970586138655236565 human motion], [https://x.com/fAIkout/status/1970505756853334324 motion and emotion], [https://x.com/fAIkout/status/1970495039248965636 painting], [https://x.com/venturetwins/status/1970563820478439546 gymnastics], [https://x.com/Art_For_Joy/status/1970249516033970434 breakdancing], [https://x.com/HaydenLeeWrites/status/1970523610734567819 combat], [https://x.com/umesh_ai/status/1970497680536150454 cinematic], [https://x.com/LillyLiCT/status/1970580585073819752 horror camerawork], [https://x.com/StevieMac03/status/1970559778804908331 extended sequence])&lt;br /&gt;
* September 2025: OpenAI [https://openai.com/index/sora-2/ Sora 2] ([https://x.com/minchoi/status/1973949620318580970 examples])&lt;br /&gt;
&lt;br /&gt;
====October 2025====&lt;br /&gt;
* October 2025: Examples:&lt;br /&gt;
** [https://x.com/minchoi/status/1976042197154963702 Will Smith Eating Spaghetti] (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025], [https://x.com/javilopen/status/1925495026903380358 May 2025], [https://x.com/elder_plinius/status/1936145834585862225 June 2025])&lt;br /&gt;
** [https://www.youtube.com/watch?v=JhH3uxcdM1M Frostbite] (3m, Sora 2)&lt;br /&gt;
** [https://x.com/Jukanlosreve/status/1977764418709758106 (Fake) &amp;quot;Behind the scenes&amp;quot; for a Chainsaw Man live action] ([https://x.com/PJaccetturo/status/1972705821072261402 others])&lt;br /&gt;
* October 2025: Google [https://blog.google/technology/ai/veo-updates-flow/ Veo 3.1]&lt;br /&gt;
* October 2025: Examples:&lt;br /&gt;
** [https://x.com/aisearchio/status/1978465562821898461 Will Smith Eating Spaghetti], Veo 3.1 (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025], [https://x.com/javilopen/status/1925495026903380358 May 2025], [https://x.com/elder_plinius/status/1936145834585862225 June 2025], [https://x.com/minchoi/status/1976042197154963702 October 2025])&lt;br /&gt;
** [https://x.com/Diesol/status/1978755688261128227 War footage] (Veo 3.1)&lt;br /&gt;
** [https://www.meta.ai/@dustin_hollywood/post/bG3BHB21W0l/yukon/ Yukon] (music video, [https://x.com/dustinhollywood/status/1982260655957700746 Dustin Hollywood])&lt;br /&gt;
** [https://x.com/Diesol/status/1980922041131028515 Bloom] (2m, Veo 3.1)&lt;br /&gt;
** [https://x.com/xmuse_/status/1982026008803905639 Auction] (1m)&lt;br /&gt;
** [https://x.com/kellyeld/status/1982425147496882287 Dancing] (music video; Midjourney, Suno, Veo3)&lt;br /&gt;
** [https://x.com/JesusPlazaX/status/1982393609069412433 Anime example] (Midjourney, Grok Imagine)&lt;br /&gt;
** [https://x.com/EccentrismArt/status/1982830100266783039 King Arthur] (1m)&lt;br /&gt;
** [https://x.com/venturetwins/status/1983024227352789162 Transitions] (1m music video)&lt;br /&gt;
** [https://x.com/eastflatsfilm/status/1984116704704971076 Unofficial commercial for Nike] (2m, Midjourney, Hailuo)&lt;br /&gt;
** [https://x.com/PJaccetturo/status/1984639281848336592 Loneliness/Halloween] ([https://www.linkedin.com/posts/simon-meyer-976339160_this-could-be-the-scariest-halloween-film-activity-7389892778144735232-6CYY?utm_source=share&amp;amp;utm_medium=member_desktop&amp;amp;rcm=ACoAAADeoqYBzX8N9-j_hRQvl1e7OUlOgFptNF0 1.5m])&lt;br /&gt;
** [https://www.youtube.com/watch?v=43h61QAXjpY Wave] (2m music video, [https://x.com/MIZNOM Masaki Mizuno])&lt;br /&gt;
* October 2025: [https://x.com/Hailuo_AI/status/1983016390878708131 Hailuo 2.3]&lt;br /&gt;
&lt;br /&gt;
====November 2025====&lt;br /&gt;
* November 2025: Examples:&lt;br /&gt;
** [https://x.com/subverum/status/1985069550250107033 Valley of Shadow] (6m)&lt;br /&gt;
** [https://x.com/DiscussingFilm/status/1985470088074375344 Coca-Cola ad] (c.f. [https://x.com/techhalla/status/1857462526859935813 2024 ad])&lt;br /&gt;
** [https://x.com/venturetwins/status/1985755546222542903 France 2026 Olympics ad] (blend of genAI and traditional methods, [https://x.com/venturetwins/status/1985753512362590439 behind the scenes])&lt;br /&gt;
** [https://x.com/NeuralViz/status/1986611025366687754 Minnesota Nice] (3m, [https://x.com/NeuralViz Neural Viz])&lt;br /&gt;
** [https://x.com/machina9000/status/1986563727873740934 Brutalis] (7m)&lt;br /&gt;
** [https://x.com/tastypxls/status/1987312755485876502?s=20 Living The Dream - Rynn] (music video, 1m)&lt;br /&gt;
** [https://x.com/MrDavids1/status/1988366387111170339?s=20 Environment as Character]&lt;br /&gt;
** [https://x.com/justalexoki/status/1988915573707661637?s=20 Cat playing instruments at midnight]&lt;br /&gt;
** [https://x.com/LumaLabsAI/status/1989013731267998172?s=20 Overclock] (30s, Luma)&lt;br /&gt;
** [https://x.com/venturetwins/status/1980685301577326994?s=20 Music video] (30s, Wan Animate)&lt;br /&gt;
** [https://x.com/venturetwins/status/1990227418553209259?s=20 Promotional material for Pudong Art Museum - Louvre exhibition in Shanghai] (1m)&lt;br /&gt;
** [https://x.com/Kyrannio/status/1990324648488186358?s=20 Loop 87 A Temporal Heist] (12m; claimed to have been generated fully autonomously by the AI agent NoSpoon)&lt;br /&gt;
** [https://x.com/AzeAlter/status/1906974768705990794?s=20 Age of Beyond] (3m)&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1991245088446386495?s=20 Ausencia] (5m)&lt;br /&gt;
** [https://x.com/AngryTomtweets/status/1993047608617517246?s=20 live paintings] ([https://www.youtube.com/channel/UCw8kc0wDm5Bh6g9iZzEWfOg bandyquantguy] on YouTube)&lt;br /&gt;
** [https://x.com/BrianRoemmele/status/1994625579073900804?s=20 Michelle, on a server in Iowa] (1m)&lt;br /&gt;
* November 2025: [https://odyssey.ml/ Odyssey] - [https://x.com/odysseyml/status/1994873514579697830?s=20 Odyssey-2]&lt;br /&gt;
&lt;br /&gt;
====December 2025====&lt;br /&gt;
* December 2025: Runway [https://runwayml.com/research/introducing-runway-gen-4.5 Gen 4.5]&lt;br /&gt;
* December 2025: [https://app.klingai.com/global/all-tools Kling] [https://app.klingai.com/global/omni/new O1] ([https://x.com/minchoi/status/1995523379957559609?s=20 examples], [https://x.com/TheoMediaAI/status/1995517613414518987?s=20 other examples]) and Kling 2.6.&lt;br /&gt;
* December 2025: [https://app.pixverse.ai/onboard PixVerse v5.5]&lt;br /&gt;
* December 2025: Examples:&lt;br /&gt;
** [https://x.com/EHuanglu/status/1996649596119068687?s=20 Will Smith Eating Spaghetti], Kling 2.6 (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025], [https://x.com/javilopen/status/1925495026903380358 May 2025], [https://x.com/elder_plinius/status/1936145834585862225 June 2025], [https://x.com/minchoi/status/1976042197154963702 October 2025], [https://x.com/aisearchio/status/1978465562821898461 October 2025])&lt;br /&gt;
** [https://x.com/venturetwins/status/1997898095670296615?s=20 Dreamlike POV]&lt;br /&gt;
** [https://x.com/chatgpt21/status/1998253809307455555?s=20 McDonald&amp;#039;s commercial]&lt;br /&gt;
** [https://x.com/EHuanglu/status/1998039554402750545?s=20 Skittles commercial] (Higgsfield)&lt;br /&gt;
** [https://x.com/Diesol/status/1997147919603077335?s=20 The Tenant] (2m, Kling 2.6)&lt;br /&gt;
** [https://x.com/PsyopAnime/status/1999242965659906526?s=20 Maximum Carnage] (3m)&lt;br /&gt;
** [https://x.com/JeffSynthesized/status/1998786836924395875?s=20 Blurred Horizon: Episode 1] (24m)&lt;br /&gt;
** [https://x.com/Artedeingenio/status/2001667487784460301?s=20 Anime Action] (2m)&lt;br /&gt;
** [https://x.com/bearlyai/status/2005055231617605748?s=20 Dollar Shave Club commercial] (1m)&lt;br /&gt;
** [https://x.com/EHuanglu/status/2004020543084024295?s=20 Xmas Cameos] (1.5m)&lt;br /&gt;
** [https://x.com/DiDi_OKK/status/1955653520407019976?s=20 Green Screen] (2m, [https://x.com/DiDi_OKK/status/1955653520407019976 DiDi_OK])&lt;br /&gt;
** [https://x.com/DiDi_OKK/status/1998227601341702639?s=20 Arrow] (7m, [https://x.com/DiDi_OKK/status/1955653520407019976 DiDi_OK])&lt;br /&gt;
** [https://x.com/bluehorizon_ai/status/2004045348579561503?s=20 Live Action One Punch Man | Saitama vs Genos] (2m, [https://x.com/bluehorizon_ai Blue Horizon])&lt;br /&gt;
** [https://x.com/keshiAIart/status/2005254907780358201?s=20 Anime Train] (6s)&lt;br /&gt;
** [https://x.com/venturetwins/status/2006051632837189683?s=20 Michael Catson] (13s)&lt;br /&gt;
* December 2025: [https://arxiv.org/abs/2512.13507 Seedance 1.5]&lt;br /&gt;
&lt;br /&gt;
===2026===&lt;br /&gt;
====January 2026====&lt;br /&gt;
* January 2026: Examples:&lt;br /&gt;
** [https://x.com/Itspedrito/status/2007636967048228968?s=20 Somebody That I Used to Know] (1m)&lt;br /&gt;
** [https://x.com/hujimari/status/2008054519704461407?s=20 Cat being disruptive at night], [https://x.com/klara_sjo/status/2007864014521720963?s=20 another], [https://x.com/alphafox/status/2009732284375830687?s=20 another] (c.f. [https://x.com/justalexoki/status/1988915573707661637?s=20 Cat playing instruments at midnight])&lt;br /&gt;
** [https://x.com/Uncanny_Harry/status/2008881579095961934?s=20 Character test] (30s, Kling 2.6 Motion Control, [https://x.com/Uncanny_Harry Uncanny Harry AI])&lt;br /&gt;
** [https://www.youtube.com/watch?v=SGJC4Hnz3m0&amp;amp;t=2s STAR WARS: Beggar’s Canyon | A Luke Skywalker Fan Film (Between ESB &amp;amp; ROTJ)] (7m)&lt;br /&gt;
** [https://x.com/dustinhollywood/status/2009732705299104118?s=20 TZIGANE] (9m)&lt;br /&gt;
** [https://x.com/Framer_X/status/2011075884246061454?s=20 The Subway Spark] (Anime, 45s)&lt;br /&gt;
** [https://www.reddit.com/r/aivideo/comments/1qi8zuv/25_years_difference_makes_you_wonder_where_ai/ Will Smith Eating Spaghetti] (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025], [https://x.com/javilopen/status/1925495026903380358 May 2025], [https://x.com/elder_plinius/status/1936145834585862225 June 2025], [https://x.com/minchoi/status/1976042197154963702 October 2025], [https://x.com/aisearchio/status/1978465562821898461 October 2025], [https://x.com/EHuanglu/status/1996649596119068687?s=20 December 2025])&lt;br /&gt;
** [https://x.com/PJaccetturo/status/2013675665539596651?s=20 The AI Artist] (1.5m)&lt;br /&gt;
** [https://x.com/Artedeingenio/status/2013624842021417030?s=20 Sci-fi action anime] (2m)&lt;br /&gt;
** [https://x.com/verbalriotshow/status/2014752509240475872?s=20 Stone Hand] (fake trailer, 1m)&lt;br /&gt;
* January 2026: [https://x.com/nvidia/status/2008346949301235933?s=20 Runway Gen-4.5 on] [https://www.nvidia.com/en-us/data-center/technologies/rubin/?linkId=100000401190502 Nvidia Rubin] ([https://x.com/runwayml/status/2014406560445771804?s=20 examples])&lt;br /&gt;
* January 2026: [https://ltx.io/model/ltx-2 LTX-2] open source video model (20s, 4k, w/ audio; [https://x.com/venturetwins/status/2010878914273697956?s=20 examples])&lt;br /&gt;
* January 2026: Luma [https://lumalabs.ai/blog/news/ray3_14 Ray3.14] ([https://x.com/LumaLabsAI/status/2015822842575888844?s=20 examples])&lt;br /&gt;
* January 2026: Examples:&lt;br /&gt;
** [https://x.com/pressmanc/status/2015099516500758647?s=20 Runway Gen-4.5 tests] (3.5m)&lt;br /&gt;
** [https://x.com/EHuanglu/status/2015573517618528538?s=20 Longchamp / Horses in the city] (1m)&lt;br /&gt;
** [https://x.com/dustinhollywood/status/2008154825385521418?s=20 The Last Artist] (trailer, 2m)&lt;br /&gt;
** [https://x.com/taziku_co/status/2015739943101047111?s=20 Monet temporal structure] (3m)&lt;br /&gt;
** [https://x.com/runwayml/status/2016155967285543364?s=20 Grizzlies] (1.5m, Runway Gen-4.5)&lt;br /&gt;
** [https://www.youtube.com/@TIME/videos On This Day... 1776] ([https://www.youtube.com/watch?v=E4cLKIxt8W8 trailer])&lt;br /&gt;
*** [https://www.youtube.com/watch?v=sV52AUVGc6I January 1: The Flag] (3.5m)&lt;br /&gt;
*** [https://www.youtube.com/watch?v=3ZDnL_a0YfQ January 10: Common Sense] (4.5m)&lt;br /&gt;
*** [https://www.youtube.com/watch?v=J5b1TiyKTus January 26: The Guns of Ticonderoga] (4m)&lt;br /&gt;
&lt;br /&gt;
====February 2026====&lt;br /&gt;
* February 2026: [https://app.klingai.com/global/quickstart/klingai-video-3-omni-model-user-guide Kling 3.0]&lt;br /&gt;
* February 2026: [https://seedance2.ai/ Seedance 2.0] ([https://x.com/EHuanglu/status/2020131622675202512?s=20 example 1], [https://x.com/EHuanglu/status/2020492770872566053?s=20 2], [https://x.com/dynamicwangs/status/2020054894741451123?s=20 3], [https://x.com/patrickassale/status/2020180495900848470?s=20 4], [https://x.com/janekm/status/2020888750285332526?s=20 5], [https://x.com/Dork_sense/status/2020179955511116082?s=20 6], [https://x.com/EHuanglu/status/2020388244802740728?s=20 7], [https://x.com/zhao_dashuai/status/2020528048341217592?s=20 8], [https://x.com/AngryTomtweets/status/2020784886932738470?s=20 9], [https://x.com/javilopen/status/2020558352590287298?s=20 10], [https://x.com/linxiaobei888/status/2021399630672691710?s=20 11])&lt;br /&gt;
* February 2026: Examples:&lt;br /&gt;
** [https://x.com/PJaccetturo/status/2019072637192843463?s=20 Unofficial opening sequence for The Way of Kings by Brandon Sanderson] (1.5m, Kling 3)&lt;br /&gt;
** [https://x.com/dailycatsclips/status/2020117502915989680?s=20 Cat Dreams] (1.5m)&lt;br /&gt;
** [https://x.com/DotCSV/status/2021269435567218725?s=20 Will Smith Eating Spaghetti] (Seedance 2.0) (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025], [https://x.com/javilopen/status/1925495026903380358 May 2025], [https://x.com/elder_plinius/status/1936145834585862225 June 2025], [https://x.com/minchoi/status/1976042197154963702 October 2025], [https://x.com/aisearchio/status/1978465562821898461 October 2025], [https://x.com/EHuanglu/status/1996649596119068687?s=20 December 2025], [https://www.reddit.com/r/aivideo/comments/1qi8zuv/25_years_difference_makes_you_wonder_where_ai/ January 2026], [https://x.com/SpecialSitsNews/status/2020583709741883666?s=20 progression to 2026])&lt;br /&gt;
** [https://x.com/thedorbrothers/status/2023460644905742577?s=20 To Be Continued] (3m, [https://x.com/thedorbrothers The Dor Brothers])&lt;br /&gt;
** [https://x.com/ivanka_humeniuk/status/2023711181978919034?s=20 Crow - Game of Thrones] (1m)&lt;br /&gt;
** [https://x.com/billyrestey/status/2024193251763507528?s=20 Reboot] (2m)&lt;br /&gt;
** [https://x.com/kenw_2/status/2024625510534283508?s=20 Late for work] (1.5m, MJ NBP Seedance 2.0)&lt;br /&gt;
** [https://x.com/heydin_ai/status/2024616890338079181?s=20 AI Man] (4.5m, MJ NBP Seedance 2.0)&lt;br /&gt;
** [https://x.com/maxescu/status/2024882372836250033?s=20 But AI Will Never Be Able To Do This] (3m, Seedance 2.0)&lt;br /&gt;
** [https://x.com/DiDi_OKK/status/2018784243753599093?s=20 Sign] (8m)&lt;br /&gt;
** [https://x.com/LTXStudio/status/2025994426309640291?s=20 Commercial for Nexus] (1m)&lt;br /&gt;
** [https://x.com/maxescu/status/2026007558159278477?s=20 Showcase] (9m, [https://x.com/maxescu Alex Patrascu])&lt;br /&gt;
** [https://x.com/EHuanglu/status/2025410944512192536?s=20 Painterly] (30s, [https://x.com/EHuanglu el.cine])&lt;br /&gt;
** [https://x.com/kellyeld/status/2025975677657440267?s=20 Imposter Syndrone] (2m, music video)&lt;br /&gt;
** [https://www.youtube.com/watch?v=nKnE2Wn1VNQ All Is Conscious] (3.5m)&lt;br /&gt;
** [https://x.com/CuriousRefuge/status/2026086576191934769?s=20 Emotional argument] (3m, Seedance 2.0)&lt;br /&gt;
** [https://x.com/jdkanani/status/2023781028368884031?s=20 Moonlight Veil] (10m)&lt;br /&gt;
&lt;br /&gt;
====March 2026====&lt;br /&gt;
* March 2026: Examples:&lt;br /&gt;
** [https://x.com/jacopo_reale/status/2029909372764041559 Looking for Bianca] (6m, Kling 3.0)&lt;br /&gt;
** [https://x.com/sumiturkude007/status/2030933543443193908?s=20 Gardener] (3m, Seedance 2.0)&lt;br /&gt;
** Micro-movie (Chinese): [https://x.com/yyyole/status/2029225419669684418?s=20 episode 1], [https://x.com/yyyole/status/2030850450464112675?s=20 episode 2]&lt;br /&gt;
** Live-action Evangelion: [https://x.com/NACHOS2D_/status/2032401289653461052?s=20 part 1] (4.5m), [https://x.com/NACHOS2D_/status/2032778868361203770?s=20 part 2] (3.5m), [https://x.com/NACHOS2D_/status/2033126071151837491?s=20 part 3] (2.5m)&lt;br /&gt;
** [https://x.com/lexx_aura/status/2033589846216741293?s=20 to love Wu Yong] (5m)&lt;br /&gt;
** [https://x.com/Alterverse_AI/status/2036434608137343111?s=20 Monkey&amp;#039;s Paw] (5m)&lt;br /&gt;
** [https://x.com/maxescu/status/2036434854435315868?s=20 Cinematic scenes] (3.5m, comedy, [https://lumalabs.ai/uni-1 Luma Uni-1 Agent])&lt;br /&gt;
* March 2026: [https://higgsfield.ai/original-series Higgsfield Original Series]&lt;br /&gt;
* March 2026: [https://app.pixverse.ai/onboard Pixverse v6] ([https://x.com/fal/status/2038655807483490613?s=20 example])&lt;/div&gt;</summary>
		<author><name>KevinYager</name></author>
		
	</entry>
	<entry>
		<id>http://gisaxs.com/index.php?title=AI_video&amp;diff=8766</id>
		<title>AI video</title>
		<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php?title=AI_video&amp;diff=8766"/>
		<updated>2026-03-30T18:10:19Z</updated>

		<summary type="html">&lt;p&gt;KevinYager: /* March 2026 */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Evolution of Capabilities==&lt;br /&gt;
===Early===&lt;br /&gt;
* November 2016: [https://arxiv.org/abs/1611.10314 Sync-Draw]&lt;br /&gt;
* April 2021: [https://arxiv.org/abs/2104.14806 GODIVA]&lt;br /&gt;
* October 2022: [https://makeavideo.studio/ Meta Make-a-video]&lt;br /&gt;
* October 2022: [https://imagen.research.google/video/ Google Imagen video]&lt;br /&gt;
&lt;br /&gt;
===2023===&lt;br /&gt;
* April 2023: [https://www.youtube.com/watch?v=XQr4Xklqzw8 Will Smith eating spaghetti]&lt;br /&gt;
* April 2023: [https://x.com/nickfloats/status/1642899094808002564 Harry Potter by Balenciaga]&lt;br /&gt;
* April 2023: [https://x.com/mrjonfinger/status/1645953033636048896?cxt=HHwWgMDT7YfkzNctAAAA Runway Gen 2]&lt;br /&gt;
* April 2023: [https://research.nvidia.com/labs/toronto-ai/VideoLDM/ Nvidia latents]&lt;br /&gt;
* December 2023: [https://www.threads.net/@luokai/post/C0vvEnTP4Oj Fei-Fei Li]&lt;br /&gt;
&lt;br /&gt;
===2024===&lt;br /&gt;
====Early 2024====&lt;br /&gt;
* January 2024: [https://sites.research.google/videopoet/ Google VideoPoet]&lt;br /&gt;
* January 2024: [https://lumiere-video.github.io/ Google Lumiere]&lt;br /&gt;
* February 2024: [https://openai.com/index/sora/ OpenAI Sora]&lt;br /&gt;
* April 2024: [https://www.maginative.com/article/china-unveils-vidu-a-powerful-text-to-video-generator/ Vidu]&lt;br /&gt;
* May 2024: [https://deepmind.google/technologies/veo/ Veo]&lt;br /&gt;
* May 2024: [https://kling.kuaishou.com/ Kling]&lt;br /&gt;
* June 2024: [https://lumalabs.ai/dream-machine Luma DreamMachine]&lt;br /&gt;
* June 2024: [https://runwayml.com/research/introducing-gen-3-alpha RunwayML Gen-3 Alpha]&lt;br /&gt;
* July 2024: Examples:&lt;br /&gt;
** [https://www.youtube.com/watch?v=F_WfIzYGlg4 Toys-R-Us Commercial made using Sora]&lt;br /&gt;
** [https://www.youtube.com/watch?v=CSfw_NjqQ2o Motorola commercial made using genAI]&lt;br /&gt;
* July 2024: [https://x.com/rowancheung/status/1813258518159585723 haiper.ai]&lt;br /&gt;
====August 2024====&lt;br /&gt;
* August 2024: [http://hotshot.co/ Hotshot] ([https://x.com/maxescu/status/1825459083635536081 examples], [https://x.com/EccentrismArt/status/1825550841534972027 more examples])&lt;br /&gt;
* August 2024: Luma Dream Machine [https://x.com/LumaLabsAI/status/1825639918539817101 v1.5]&lt;br /&gt;
* August 2024: Examples:&lt;br /&gt;
** [https://x.com/endlesstaverns/status/1811276904692887815 Runway Gen3 music video]&lt;br /&gt;
** [https://x.com/runwayml/status/1820806644806070583 Runway Gen3 for adding FX to live action] ([https://x.com/bryanf0x/status/1825529998201004137 another example])&lt;br /&gt;
** [https://www.youtube.com/watch?v=taaM0s1bq7Q Midjourney + Runway Gen3: Hey It’s Snowing]&lt;br /&gt;
** [https://x.com/Kyrannio/status/1821605619927019974 Flux/LoRA image] + Runway Gen3 [https://x.com/iamneubert/status/1821970292014768420 woman presenter]&lt;br /&gt;
** [https://x.com/CharaspowerAI/status/1825274421256356106 McDonald’s AI commercial]&lt;br /&gt;
** Sora used by [https://www.facebook.com/izanamiaiart/ Izanami AI Art] to create [https://x.com/kimmonismus/status/1824102316229759114 dreamlike video] and by [https://x.com/alexiaadana Alexia Adana] to create [https://x.com/basedjensen/status/1824386717123743940 sci-fi film concept]&lt;br /&gt;
====September 2024====&lt;br /&gt;
* September 2024: [https://hailuoai.com/video/ Hailuo Minimax] ([https://x.com/minchoi/status/1829995683124035766 examples])&lt;br /&gt;
* September 2024: Examples:&lt;br /&gt;
** [https://www.youtube.com/watch?v=XAs5KuhfE_s Space colonization]&lt;br /&gt;
** [https://x.com/venturetwins/status/1827772646295265699 Consistent characters]&lt;br /&gt;
** [https://x.com/thealexbanks/status/1829489392354050502 Sea monsters]&lt;br /&gt;
** [https://x.com/ai_for_success/status/1829539535132426286 Music video]&lt;br /&gt;
** [https://x.com/RyanMorrisonJer/status/1829074823521112544 Animated characters]&lt;br /&gt;
** [https://x.com/CharaspowerAI/status/1829916782452191674 AI influencer]&lt;br /&gt;
** [https://x.com/minchoi/status/1829293248197902802 Ten short examples]&lt;br /&gt;
** [https://x.com/WorldEverett/status/1830596701473615937 Seven examples]&lt;br /&gt;
** [https://x.com/EccentrismArt/status/1830654805515395583 Clip from horror film]&lt;br /&gt;
** [https://x.com/MatthieuGB/status/1722146578813645296 &amp;quot;Gone&amp;quot; featuring astronaut] and [https://x.com/MatthieuGB/status/1742949297337852270 something ethereal]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1831256663644373449 Two dancers] (surprisingly good consistency despite movement)&lt;br /&gt;
** [https://x.com/8bit_e/status/1831344542487871953 Music video about flying]&lt;br /&gt;
** [https://www.youtube.com/watch?v=_XtS_4PzEyk The Paperclip Maximizer]&lt;br /&gt;
** [https://x.com/trbdrk/status/1831801373517869369 La Baie Aréa]&lt;br /&gt;
** [https://www.reddit.com/r/aivideo/comments/1f8xr0w/gisele_tong_to_dear_me/ &amp;quot;To Dear Me&amp;quot; by Gisele Tong] ([https://www.morningstar.com/news/business-wire/20240904521664/reply-ai-film-festival-announced-the-winners-of-the-first-international-festival-for-short-films-made-with-artificial-intelligence winner of AI shorts] film festival)&lt;br /&gt;
** [https://x.com/maxescu/status/1833476640438964281 Various scenes]&lt;br /&gt;
** [https://x.com/EHuanglu/status/1833522650846793970 Directing emotions]&lt;br /&gt;
* September 2024: Kling 1.5 ([https://x.com/Uncanny_Harry/status/1836531835280724459 examples], [https://x.com/minchoi/status/1836800551469654088 showing emotions])&lt;br /&gt;
* September 2024: Examples:&lt;br /&gt;
** Runway video-to-video to [https://x.com/jon_barron/status/1835695132697604236 restyle classic video games]&lt;br /&gt;
** [https://x.com/ai_for_success/status/1835319670917796117 Realistic presenter]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1834530744175059302 Skateboarding] (demonstrates getting closer to meaningfully simulating motion/physics)&lt;br /&gt;
** [https://x.com/minchoi/status/1835378029092049325 Examples] of short clips with cinematic feel&lt;br /&gt;
** Short: [https://x.com/PJaccetturo/status/1835670655330869633 4 Minutes to Live]&lt;br /&gt;
** Short: [https://x.com/dreamingtulpa/status/1836121321526432231 Neon Nights] (Arcade)&lt;br /&gt;
** [https://www.youtube.com/watch?v=CcrGSA-kSrI Random Access Memories]: AI-generated, then projected onto Kodak film stock, giving the final output some of the dreamy analog quality associated with nostalgic footage&lt;br /&gt;
** Sora used to make a sort of [https://x.com/niceaunties/status/1837271244774715505 weird dreamlike video]&lt;br /&gt;
====October 2024====&lt;br /&gt;
* October 2024: Pika v1.5, including Pikaffects (explode, melt, inflate, and cake-ify; examples: [https://x.com/justin_hart/status/1841144350572413259 1], [https://x.com/arthur_hyper88/status/1841156544538521646 2], [https://x.com/ytjessie_/status/1841168925301842263 3], [https://x.com/bilawalsidhu/status/1841195247184781420 4], [https://x.com/minchoi/status/1841189035454447636 5], [https://x.com/ytjessie_/status/1841209415514669501 6])&lt;br /&gt;
* October 2024: Examples:&lt;br /&gt;
** [https://x.com/HalimAlrasihi/status/1839310216602788103 AI avatar with good lip-sync]&lt;br /&gt;
** [https://www.youtube.com/watch?v=5NZubOOeeV0 Battalion]: 5 minute short about war&lt;br /&gt;
** Short film: [https://x.com/MatthieuGB/status/1841173724688536015 To Wonderland] ([https://x.com/MatthieuGB/status/1841174221550207437 credits])&lt;br /&gt;
** [https://x.com/OnwardsProject/status/1841508441241890975 9 to 5]: Created with Luma Dream Machine keyframes and camera features; music by Suno&lt;br /&gt;
* October 2024: [https://ai.meta.com/research/movie-gen/ Meta Movie Gen]&lt;br /&gt;
* October 2024: Examples:&lt;br /&gt;
** [https://x.com/CuriousRefuge/status/1844424871335592373 AI Avatar] (using [https://x.com/CuriousRefuge/status/1844424871335592373 HeyGen])&lt;br /&gt;
** [https://www.youtube.com/watch?v=isW1FLL0K3w Generic Movies]&lt;br /&gt;
** [https://arxiv.org/abs/2410.05954 Pyramid-flow] ([https://huggingface.co/rain1011/pyramid-flow-sd3 open source]) model: [https://x.com/_akhaliq/status/1844239643778351605 examples]&lt;br /&gt;
** [https://x.com/whrumorvid/status/1846209247467491604 Building the Pyramids]&lt;br /&gt;
** [https://x.com/maxescu/status/1844716998854349217 People showing realistic emotion] (using [https://hailuoai.video/ Hailuo AI])&lt;br /&gt;
** Keyframes and Luma AI to make novel [https://x.com/CoffeeVectors/status/1845188179332051005 speed-ramp motion]&lt;br /&gt;
* October 2024: [https://pollo.ai/ Pollo AI] platform offers selection among a diversity of video models&lt;br /&gt;
* October 2024: [https://www.genmo.ai/ Genmo] [https://x.com/genmoai/status/1848762405779574990 Mochi 1] (open source)&lt;br /&gt;
* October 2024: Examples:&lt;br /&gt;
** [https://x.com/AIatMeta/status/1849134463382680028 Meta Movie Gen examples]&lt;br /&gt;
** [https://x.com/PJaccetturo/status/1847732127598800960 Emotional range of Minimax]&lt;br /&gt;
** [https://x.com/dustinhollywood/status/1848757800807039299 Car commercial: Bear]&lt;br /&gt;
** [https://x.com/runwayml/status/1848785913918218517 Diner conversation]&lt;br /&gt;
** [https://x.com/Uncanny_Harry/status/1849275871716159989 Loved and Lost] (a meditation on grief)&lt;br /&gt;
====November 2024====&lt;br /&gt;
* November 2024: Examples:&lt;br /&gt;
** [https://x.com/blizaine/status/1852092147643699356 Pasta Doble]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1852425015175626876 Bird protecting young]&lt;br /&gt;
** [https://x.com/runwayml/status/1852363190484537666 Camera moving around sushi]&lt;br /&gt;
** [https://x.com/StevieMac03/status/1851969120813629939 Various examples] of [https://hailuoai.video/ Hailuo AI]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1853102779650252978 Trains]&lt;br /&gt;
** [https://www.youtube.com/watch?v=Fh-_g5vev0s Light of Imagination]&lt;br /&gt;
** [https://x.com/LinusEkenstam/status/1854087441122021814 Bringing historic images to life]&lt;br /&gt;
** [https://x.com/DeryaTR_/status/1855637066203218180 Plants dancing]&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1855078644042944574 Insect on tree]&lt;br /&gt;
** Trailers for [https://x.com/abandonedmovies/status/1827037378009296983 The Silmarillion] and [https://x.com/abandonedmovies/status/1846941183702110211 The Fall of Gondolin] (by [https://x.com/abandonedmovies Abandoned Films])&lt;br /&gt;
** [https://x.com/Diesol/status/1855475704470884427 Moody sci-fi]&lt;br /&gt;
** [https://x.com/runwayml/status/1857072173631885586 Migration] ([https://runwayml.com/customers/behind-the-scenes-of-migration-with-director-jeremy-higgins made by combining] Runway ML Gen3-Alpha and traditional animation)&lt;br /&gt;
** [https://x.com/AIandDesign/status/1856467856625676752 After the Winter] ([https://suno.com/song/0d6919de-d2bf-434b-8aa6-ede0fb0fde77 music] made using Suno v4)&lt;br /&gt;
** Horror: [https://www.reddit.com/r/aivideo/comments/1gnk27q/ridge_to_southwest/ Ridge to Southwest]&lt;br /&gt;
** [https://www.youtube.com/watch?v=ClStJZmIjBU The Gardener] (by [https://www.youtube.com/@MachineMythos Machine Mythos])&lt;br /&gt;
** [https://x.com/techhalla/status/1857462526859935813 Coca-Cola holiday ad] and [https://www.youtube.com/watch?v=THdoOgwqjBg parody thereof]&lt;br /&gt;
** [https://x.com/pzf_ai/status/1858312421510992111 A Dream Within A Dream] (by [https://x.com/pzf_ai PZF], selected for the Czech International AI Film Festival)&lt;br /&gt;
** [https://x.com/WorldEverett/status/1859273222597775843 Making Friends] (by [https://x.com/WorldEverett Everett World]; see also [https://x.com/WorldEverett/status/1858563716834275562 Childhood Dream] and [https://x.com/WorldEverett/status/1858945634067202429 City Echoes])&lt;br /&gt;
** Anime: [https://x.com/naegiko/status/1857754626742726893 test shots], [https://x.com/naegiko/status/1858978557424210401 Ultimate Ceremony], [https://x.com/naegiko/status/1835434668294074462 Echoes of Love]&lt;br /&gt;
** [https://x.com/KakuDrop/status/1866309309384323257 Echoes of Grace] ([https://x.com/KakuDrop KakuDrop] using Sora)&lt;br /&gt;
** [https://x.com/vibeke_udart/status/1859879367071203662 Morphing hands], [https://x.com/vibeke_udart/status/1858772719224975630 hands and faces] ([https://x.com/vibeke_udart Vibeke Bertelsen])&lt;br /&gt;
** [https://www.reddit.com/r/aivideo/comments/1gxi29x/dbzlicious/ Dragon Ball Z live action]&lt;br /&gt;
** [https://x.com/cfryant/status/1860727980353278386 Pitch Black] (abstract and dark)&lt;br /&gt;
** [https://x.com/cfryant/status/1861050528932765710 Animals Running] (zoomed-in ultra-wide camera)&lt;br /&gt;
** [https://x.com/WorldEverett/status/1860730214487118290 Dreams of Tomorrow] (panning shots of high-tech car, Scottish manor)&lt;br /&gt;
** [https://x.com/nickfloats/status/1861206978690691165 Desert Planet Cinematics]&lt;br /&gt;
* November 2024: [https://huggingface.co/spaces/PR-Puppets/PR-Puppet-Sora Leaked] Sora turbo model; [https://x.com/rowancheung/status/1861455031603503234 examples], [https://x.com/chatgpt21/status/1861504511153451517 Dog chasing Cat in snow]&lt;br /&gt;
====December 2024====&lt;br /&gt;
* December 2024: Examples:&lt;br /&gt;
** [https://x.com/minchoi/status/1863243880553976235 Realistic] (Minimax by Hailuo AI)&lt;br /&gt;
** Trailer for [https://x.com/TheReelRobot/status/1861824847149670840 Paradise Lost] (to be released on [https://www.sandwatch.ai/ Sandwatch AI])&lt;br /&gt;
** [https://x.com/EHuanglu/status/1863607136271716418 Music video example] with consistent characters&lt;br /&gt;
** [https://x.com/venturetwins/status/1863666366764687581 Human expressions] ([https://www.reddit.com/r/ChatGPT/comments/1h4r13x/ai_generated_expressions/ u/Kind_Distance9504 on Reddit], using Hailuo)&lt;br /&gt;
** Vodafone ad: [https://www.youtube.com/watch?v=9AyEC_K9kBg The Rhythm Of Life]&lt;br /&gt;
** [https://www.reddit.com/r/midjourney/comments/1h5u2gw/we_made_a_10_minute_gen_ai_batman_film/ 10 minute Batman film]&lt;br /&gt;
* December 2024: Tencent [https://aivideo.hunyuan.tencent.com/ Hunyuan Video] open-source video model ([https://x.com/CharaspowerAI/status/1863862585554010530 example])&lt;br /&gt;
* December 2024: [https://sora.com/ Sora] release ([https://x.com/CharaspowerAI/status/1866203050982916532 examples])&lt;br /&gt;
* December 2024: [https://mint-video.github.io/ MinT video] improves consistency and control ([https://arxiv.org/abs/2412.05263 preprint], [https://x.com/EHuanglu/status/1868278456565531061 examples])&lt;br /&gt;
* December 2024: Google [https://blog.google/technology/google-labs/video-image-generation-update-december-2024/ Veo 2] ([https://x.com/sundarpichai/status/1868709099644334518 examples], [https://x.com/EHuanglu/status/1869008306322522342 more examples], [https://x.com/_Borriss_/status/1869267571532320966 natural movement examples], [https://x.com/jerrod_lew/status/1870816560027246715 abstract], [https://x.com/jerrod_lew/status/1869427407415058660 realistic physics], [https://x.com/jerrod_lew/status/1873096585002786944 crowds], [https://x.com/minchoi/status/1873590350515929380 dancing], [https://x.com/jerrod_lew/status/1874440442269565351 animals])&lt;br /&gt;
* December 2024: [https://x.com/pika_labs/status/1867651381840040304 Pika 2.0] with Scene Ingredients&lt;br /&gt;
* December 2024: Examples:&lt;br /&gt;
** [https://www.youtube.com/watch?v=c_kKKRQ5gYw Synthetic Youth: Takenoko Zoku · Made by Emi Kusano with Sora]&lt;br /&gt;
** [https://x.com/higgsfield_ai/status/1868698886761837041 Car race] ([https://higgsfield.ai/ Higgsfield AI] storytelling)&lt;br /&gt;
** [https://x.com/blizaine/status/1868850653759783033 Slicing meat]; comparison of modern video generators&lt;br /&gt;
** Challenging prompt: [https://x.com/RubenEVillegas/status/1868864410720325844 A cat roars while looking at its reflection in the mirror but instead sees itself as a lion roaring (Veo 2)] ([https://x.com/anukaakash/status/1869417975071330550 comparison to other models])&lt;br /&gt;
** [https://x.com/PJaccetturo/status/1869829338868412865 Anime trailer]&lt;br /&gt;
** [https://x.com/ring_hyacinth/status/1870386506776674376 Snorlax at Mount Fuji] and [https://x.com/ring_hyacinth/status/1871105733443592696 Psyduck at Colosseum] (Kling 1.6)&lt;br /&gt;
** [https://x.com/machine_mythos/status/1870565287789056320 Horror visuals] (with [https://mmaudio.net/ MMAudio] sound)&lt;br /&gt;
** [https://www.youtube.com/watch?v=lFc1jxLHhyM The Heist] (Veo 2)&lt;br /&gt;
** [https://x.com/minchoi/status/1871263616806129863 Various Veo 2 examples]&lt;br /&gt;
** [https://x.com/minchoi/status/1872390429108486320 Live Action Titans]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1873094065841193222 Cats] [https://x.com/PostsOfCats/status/1872530207585825058 Cooking]&lt;br /&gt;
** Aesthetic from alternate timelines: [https://x.com/BrianRoemmele/status/1871753358782120068 1], [https://x.com/BrianRoemmele/status/1872105833456423216 2], [https://x.com/brain_racked/status/1872340717978390583 3]&lt;br /&gt;
** [https://x.com/minchoi/status/1872486717145706793 Examples approaching cinematic quality]&lt;br /&gt;
** [https://x.com/JaicSam/status/1872903054221033693 Cosmic Spider] (winner at AI film festival)&lt;br /&gt;
** [https://www.youtube.com/watch?v=dbdYPMRi_Nk Trailer for Newton&amp;#039;s Cradle] (full film [https://x.com/JeffSynthesized/status/1872705173451358293 on] 2025-01-01)&lt;br /&gt;
** [https://x.com/Ror_Fly/status/1873036384077828499 Car vs. Jet drag race]&lt;br /&gt;
** [https://x.com/Diesol/status/1873415500149199066 California Monsters]&lt;br /&gt;
** [https://x.com/heyshrutimishra/status/1873631383584924078 Various examples] (Hailuo AI)&lt;br /&gt;
** [https://x.com/kimmonismus/status/1873568693357294014 Will Smith eating spaghetti] (cf. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023])&lt;br /&gt;
** [https://x.com/StevieMac03/status/1873998177193648438 Sorceress and Arachnid Steed] (Kling v1.6)&lt;br /&gt;
** [https://x.com/dustinhollywood/status/1873940924016779425 Music video] (Hailuo AI)&lt;br /&gt;
** [https://www.youtube.com/watch?v=iQg2udCHMdI Akụkọ (Story)] (22 minute short) - A Lagos Boy&amp;#039;s Thrilling Snack Run Nightmare&lt;br /&gt;
** [https://x.com/cinerobot/status/1873766976306455019 Son of the Dragon] (8 minute short)&lt;br /&gt;
** [https://x.com/SynthReveries/status/1873624586857886071 Endless Journey] music video ([https://suno.com/song/fa90fa5e-25c7-48ad-b291-42a8a8c51cf9 music] by Suno)&lt;br /&gt;
** [https://x.com/anukaakash/status/1870504167653228980 Once Again] (retrospective)&lt;br /&gt;
** [https://x.com/jasonzada/status/1873470586053414928 Fade Out] (Veo 2)&lt;br /&gt;
** [https://x.com/talkboysstudio/status/1869085014513865027 Roadkill] (12 minute short)&lt;br /&gt;
&lt;br /&gt;
===2025===&lt;br /&gt;
====January 2025====&lt;br /&gt;
* January 2025: [https://x.com/kimmonismus/status/1877351050748871038 Progress] over the past 1.5 years, comparing Runway Gen 2 with Veo 2.&lt;br /&gt;
* January 2025: Examples:&lt;br /&gt;
** [https://x.com/AllarHaltsonen/status/1874557865576542655 Delivery] (unofficial Nike ad)&lt;br /&gt;
** [https://x.com/Diesol/status/1875237221735002299 Gucci ad] (Sora)&lt;br /&gt;
** [https://x.com/EccentrismArt/status/1874498145910149412 Conquest]&lt;br /&gt;
** [https://www.youtube.com/watch?v=RJZCMfaS-io Newton&amp;#039;s Cradle] (6 minute short)&lt;br /&gt;
** [https://x.com/AngryTomtweets/status/1874627041934602410 Singer]&lt;br /&gt;
** [https://x.com/DumpsterBud/status/1874807352794182019 Brain vomit] (music video)&lt;br /&gt;
** [https://x.com/mxvdxn/status/1874796628210778618 Vibe] (Kling v1.6)&lt;br /&gt;
** [https://x.com/_deepfates/status/1875215969452523785 Will Smith eating spaghetti] (cf. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024])&lt;br /&gt;
** [https://www.youtube.com/watch?v=BL9-jHGnxyc Zorgop Knows All] (2 minute short)&lt;br /&gt;
** [https://x.com/ButchersBrain/status/1875130428518269406 The Breach] (2 minute short; Veo2, Runway ActOne, MMAudio)&lt;br /&gt;
** [https://x.com/Rainmaker1973c/status/1875261591043850477 Aesthetics from an alternate timeline]&lt;br /&gt;
** [https://x.com/StevieMac03/status/1875440611849072841 Immortal Awakens]&lt;br /&gt;
** [https://x.com/isaachorror/status/1875624519588835400 The Faded Line]&lt;br /&gt;
** [https://www.youtube.com/watch?v=4fy8H38rm-4 Dear Dad]&lt;br /&gt;
** [https://x.com/maxescu/status/1877060580680311242 Mad Max chase]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1877408247906447633 Patience is Key]&lt;br /&gt;
** [https://x.com/techhalla/status/1879967230093586555 The Almost Famous Show] (talent show parody)&lt;br /&gt;
** [https://x.com/thefuzzysignal/status/1879295176990154755 Proof-of-concept trailer for a medieval adult animated series]&lt;br /&gt;
** [https://x.com/JeffSynthesized/status/1879555151499034869 Variety] (unofficial Cadbury ad)&lt;br /&gt;
** [https://x.com/henrydaubrez/status/1879883806947115446 Kitsune] (5 minute animated short, Veo 2)&lt;br /&gt;
* January 2025: MiniMax Hailuo [https://www.minimaxi.com/en/news/s2v-01-release Subject Reference] enables consistent characters ([https://x.com/minchoi/status/1881707687362412924 examples])&lt;br /&gt;
* January 2025: AI tools (de-aging deepfakes, [https://magnific.ai/ Magnific] upscaling) [https://x.com/JeffSynthesized/status/1878630652377178502 used in the film] [https://www.imdb.com/title/tt18272208/ &amp;quot;Here&amp;quot;].&lt;br /&gt;
* January 2025: Luma [https://lumalabs.ai/ray Ray2]&lt;br /&gt;
* January 2025: [https://pikartai.com/pika-2-1/ Pika 2.1] ([https://x.com/OrctonAI/status/1883925754653905049 examples])&lt;br /&gt;
* January 2025: Examples:&lt;br /&gt;
** [https://x.com/wyzborrero/status/1879949477764804873 Light projections onto people] (challenging task, Ray2)&lt;br /&gt;
** [https://x.com/AllarHaltsonen/status/1881261042753589547 BMW ad]&lt;br /&gt;
** [https://x.com/AIWarper/status/1880658326645878821 John Wick in Severance] (Hunyuan vid2vid)&lt;br /&gt;
** [https://x.com/TheReelRobot/status/1881771800595444193 Biopic] (7 minutes)&lt;br /&gt;
** [https://x.com/misslaidlaw/status/1882180619582791784 Give It To Me] (music video)&lt;br /&gt;
** [https://x.com/paultrillo/status/1882091702506459394 Where do we go from here?] (music video, Veo 2)&lt;br /&gt;
** [https://x.com/WorldEverett/status/1882235057076580502 Party like there&amp;#039;s no tomorrow] (music video)&lt;br /&gt;
** [https://x.com/Diesol/status/1884696027942498779 S.T.O.R.I.] (Midjourney and Pika 2.1)&lt;br /&gt;
====February 2025====&lt;br /&gt;
* February 2025: Examples:&lt;br /&gt;
** [https://x.com/OrctonAI/status/1885839287913955597 Long Steampunk scene]&lt;br /&gt;
** [https://x.com/jerrod_lew/status/1885787580685562226 City destruction]&lt;br /&gt;
** [https://x.com/EHuanglu/status/1885736840344551763 Consistent character acting]&lt;br /&gt;
** [https://x.com/MeanOrangeCat/status/1884295241534185890 Kaiju Katastrophe] (by [https://x.com/MeanOrangeCat Mean Orange Cat])&lt;br /&gt;
** [https://x.com/Diesol/status/1886433799690748210 The Greyhound]&lt;br /&gt;
** [https://x.com/CoffeeVectors/status/1886146242029195391 Fluid simulation video2video]&lt;br /&gt;
** [https://x.com/toolstelegraph/status/1886622772828254403 High resolution macro shots]&lt;br /&gt;
** [https://www.youtube.com/watch?v=p0J1LDWERS0 Chrysalids]&lt;br /&gt;
** [https://x.com/multimodalart/status/1887817996220940737 Boring realistic images] (HunyuanVideo w/ LoRA)&lt;br /&gt;
** [https://www.youtube.com/watch?v=PcVRfa1JyyQ Anime intro] ([https://www.reddit.com/r/StableDiffusion/comments/1ijvua0/opensource_almostconsistent_real_anime_made_with/ Hunyuan w/ custom LoRAs])&lt;br /&gt;
** [https://x.com/AllarHaltsonen/status/1888294811750318114 Automotive ad test] (Kling w/ custom model)&lt;br /&gt;
** [https://x.com/AngryTomtweets/status/1888758524303269928 Random cinematic clips] (Midjourney and Kling)&lt;br /&gt;
** [https://x.com/juliewdesign_/status/1888666757302263828 Crossing Paths]&lt;br /&gt;
** [https://x.com/weirdai_art/status/1888794894187041200 Miniature food]&lt;br /&gt;
** [https://x.com/CaptainHaHaa/status/1889573017745035463 Animals]&lt;br /&gt;
** [https://x.com/Kavanthekid/status/1889371011667144724 Star Wars - The Ghost&amp;#039;s Apprentice (Fan Film)]&lt;br /&gt;
** [https://x.com/AngryTomtweets/status/1889768184716423573 Ray2 image-to-video examples]&lt;br /&gt;
** [https://x.com/weirdai_art/status/1889890470987518069 New Horizons] (miniatures going to Mars)&lt;br /&gt;
** [https://x.com/karim_yourself/status/1890100168378536155 Black Sun (trailer)]&lt;br /&gt;
** [https://x.com/BrivaelLp/status/1890122101153231288 AI avatars] ([https://www.argil.ai/ Argil AI])&lt;br /&gt;
** [https://x.com/mrjonfinger/status/1890783411679236473 Adding elements to real video] ([https://x.com/mrjonfinger/status/1891337081923772918 other example])&lt;br /&gt;
** [https://x.com/SynthReveries/status/1892278954137940289 Glitch]&lt;br /&gt;
** Anime: [https://x.com/freeeebird2300/status/1889119007707689146 sci-fi] (Ray2), [https://x.com/Artedeingenio/status/1891173784188756069 sci-fi] (Ray 2), [https://x.com/seiiiiiiiiiiru/status/1890980673743474931 90s sci-fi] (Luma) and [https://x.com/TomLikesRobots/status/1891209369804591447 moody] (Midjourney and Ray2)&lt;br /&gt;
* February 2025: Meta [https://hila-chefer.github.io/videojam-paper.github.io/ VideoJAM]&lt;br /&gt;
* February 2025: ByteDance [https://omnihuman-lab.github.io/ OmniHuman-1]&lt;br /&gt;
* February 2025: ByteDance [https://saiyan-world.github.io/goku/ Goku] ([https://arxiv.org/abs/2502.04896 paper], [https://x.com/ai_for_success/status/1888821141495844991 examples])&lt;br /&gt;
* February 2025: [https://huggingface.co/stepfun-ai/stepvideo-t2v Step-Video-T2V] open-source model ([https://arxiv.org/abs/2502.10248 paper], [https://github.com/stepfun-ai/Step-Video-T2V code], [https://yuewen.cn/videos demo], [https://x.com/ai_for_success/status/1891369136082854129 examples])&lt;br /&gt;
* February 2025: Pika [https://x.com/pika_labs/status/1892620122818294109 Pikaswaps] (examples of [https://x.com/FreddyChavezO/status/1892678426487881805 modifying regions], [https://x.com/CharaspowerAI/status/1893216710141919637 swapping items])&lt;br /&gt;
* February 2025: Alibaba [https://wanai.pro/ Wan 2.1] [https://huggingface.co/blog/LLMhacker/wanai-wan21 open-source] ([https://x.com/fofrAI/status/1894862403260596371 examples])&lt;br /&gt;
* February 2025: [https://thetwinai.com/ Twin AI]: compose videos with provided character, object, location ([https://x.com/EHuanglu/status/1901277394729930984 example])&lt;br /&gt;
* February 2025: Examples:&lt;br /&gt;
** [https://x.com/mrjonfinger/status/1893109598627750164 Infected] (Pika swaps and additions)&lt;br /&gt;
** [https://x.com/amli_art/status/1893447314913796253 Hostile Government Takeover] (Veo2)&lt;br /&gt;
** [https://x.com/KarolineGeorges/status/1895226395812561399 Dual Mechanism] (Pikaframes 2.2)&lt;br /&gt;
&lt;br /&gt;
====March 2025====&lt;br /&gt;
* March 2025: Examples:&lt;br /&gt;
** [https://x.com/SynthReveries/status/1895826068617252901 Doors] (music video)&lt;br /&gt;
** [https://x.com/bind_lux/status/1894492032414224792 Drum and Bass] (music video; Kling, audio from [https://www.riffusion.com/?filter=staff-picks Riffusion])&lt;br /&gt;
** [https://x.com/RileyRalmuto/status/1896088776151269523 Woman&amp;#039;s face] (Sora)&lt;br /&gt;
** [https://x.com/ryanwpatterson/status/1896968881731948844 Skating] (Ray2)&lt;br /&gt;
** [https://www.threads.net/@evolving.ai/post/DGlRyRoO7c9?xmt=AQGz6T_8VppPoAqb5aPwAJ2zzRLUP-YXi8SabAT0IIEA9Q Filming commercial on Mars]&lt;br /&gt;
** [https://www.threads.net/@evolving.ai/post/DGycqyhuETS?xmt=AQGz6T_8VppPoAqb5aPwAJ2zzRLUP-YXi8SabAT0IIEA9Q Original Source commercial] (AI and real footage)&lt;br /&gt;
** [https://x.com/maxescu/status/1896926229204496788 Time-lapses] (Pika 2.2)&lt;br /&gt;
** [https://www.youtube.com/watch?v=2RhkcJyhg0E Hallucination]&lt;br /&gt;
** [https://x.com/town_in_new/status/1897354572139782620 Macro video of bubbles]&lt;br /&gt;
* March 2025: [https://github.com/Tencent/HunyuanVideo-I2V HunyuanVideo-I2V] image-to-video&lt;br /&gt;
* March 2025: Google [https://x.com/labsdotgoogle/status/1897376700666626233 Whisk Animate] (based on Veo2, [https://x.com/maxescu/status/1902742535618888025 examples])&lt;br /&gt;
* March 2025: Examples:&lt;br /&gt;
** [https://x.com/jdp2oo/status/1897874927367160114 Recursion (horror)] (Kling)&lt;br /&gt;
** [https://x.com/blizaine/status/1897826177970028614 Will Smith Eating Spaghetti while Sitting Inside a Bag] (cf. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025])&lt;br /&gt;
** [https://x.com/mickmumpitz/status/1897979382687297697 Paper Jam] (Kling with custom workflows to enable precise control)&lt;br /&gt;
** [https://x.com/maxescu/status/1899155936645722216 Cinematic shots] (Google Whisk and Luma)&lt;br /&gt;
** [https://x.com/weirdai_art/status/1899631013002711409 Perfunctory Horizons]&lt;br /&gt;
** [https://x.com/maxescu/status/1900243840499368319 A Hard Winter]&lt;br /&gt;
** [https://x.com/RoyalKongz/status/1900315389139014074 Consistent character example]&lt;br /&gt;
** [https://x.com/maxescu/status/1900652266362650853 Anthropomorphic Animals]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1900457543299727718 Realistic (influencer-style)]&lt;br /&gt;
** [https://x.com/SunoMusic/status/1900942410584043579 I Feel Cultured] (music video with surrealist vibes)&lt;br /&gt;
** [https://rodeo.club/post/0x30b45c56d62751D763D3B8bFe4D18c4BB65EDF2c/209 journey of utmost importance]&lt;br /&gt;
** [https://x.com/aiordieshow/status/1901930851127984291 Karen: Unleashed]&lt;br /&gt;
** [https://x.com/minchoi/status/1901783767364092232 Yarn Cat]&lt;br /&gt;
** [https://x.com/andyorsow/status/1901619535180091509 Ned&amp;#039;s Wet Deli] (Runway)&lt;br /&gt;
** [https://www.youtube.com/watch?v=KVoiooE8C0c BOOTS], a.k.a. [https://x.com/RuairiRobinson/status/1902027217137484117 &amp;quot;Our enemies are cartoon monsters&amp;quot;] (music video based on poem by Rudyard Kipling; Veo2)&lt;br /&gt;
** Flying in a dream: [https://x.com/minchoi/status/1902197944826183864 1], [https://x.com/venturetwins/status/1901796679063626060 2]&lt;br /&gt;
** [https://x.com/jasonzada/status/1902129567659389443 Commercial for Mercedes-Benz and FYI Radio]&lt;br /&gt;
** [https://x.com/maxescu/status/1903108496666542562 Selfie video] (Luma)&lt;br /&gt;
** Podcasts: [https://www.reddit.com/r/singularity/comments/1jintit/rottenly_roasted_now_full_script_is_also_not/ Rottenly Roasted] and [https://www.reddit.com/r/aivideo/comments/1jerh56/worst_date_ever/ Worst Date Ever] [https://x.com/OriZilbershtein/status/1903503438744318002 (Imagen 3, Hedra, Elevenlabs, Topaz)]&lt;br /&gt;
** [https://x.com/DexploreArts/status/1903822122150986000 Ambience] (Midjourney, Luma)&lt;br /&gt;
** [https://x.com/TheoMediaAI/status/1904207679511572845 The Bridge] (2 minute short; Veo2)&lt;br /&gt;
** [https://x.com/peteromallet/status/1904268944992829462 Pulp Fiction] (Wan video editing)&lt;br /&gt;
** [https://x.com/madpencil_/status/1906765750624493650 Camera Controls] (Luma Ray2)&lt;br /&gt;
* March 2025: [https://www.hedra.com/ Hedra] [https://x.com/hedra_labs/status/1897699010632466469 Character 3]&lt;br /&gt;
* March 2025: [https://huggingface.co/hpcai-tech/Open-Sora-v2 Open Sora v2] ([https://github.com/hpcaitech/Open-Sora code])&lt;br /&gt;
* March 2025: Amazon Prime debuts [https://en.wikipedia.org/wiki/House_of_David_(TV_series) House of David], with special effects created by [https://www.thewonderproject.com/ Wonder Project] using a [https://x.com/PJaccetturo/status/1903126616831676792 combination of traditional and AI methods] (reportedly including Midjourney and Runway)&lt;br /&gt;
* March 2025: Examples:&lt;br /&gt;
** [https://x.com/PJaccetturo/status/1905151190872309907 What if Studio Ghibli directed Lord of the Rings?] (OpenAI GPT-4o in-context image generation, Kling)&lt;br /&gt;
** [https://x.com/ROHKI/status/1906039022662963269 RŌHKI]&lt;br /&gt;
** [https://x.com/iaveras/status/1906362437487534296 Why]&lt;br /&gt;
** [https://x.com/BrianRoemmele/status/1906476721236570508 Commercial for Puma] (research/test)&lt;br /&gt;
** [https://x.com/Salmaaboukarr/status/1906776503343325469 Commercial for KFC] (concept ad)&lt;br /&gt;
* March 2025: Runway ML [https://runwayml.com/research/introducing-runway-gen-4 Gen-4]&lt;br /&gt;
** [https://www.youtube.com/watch?v=c8IBmK7GZP8 The Lonely Little Flame]&lt;br /&gt;
** [https://www.youtube.com/watch?v=Z0P6qjMUl34&amp;amp;t=1s The Herd]&lt;br /&gt;
** [https://www.youtube.com/watch?v=9HzdNhOe09I The Retrieval]&lt;br /&gt;
** [https://www.youtube.com/watch?v=xEhgxhrAjE4 NYC is a Zoo]&lt;br /&gt;
** [https://www.youtube.com/watch?v=ENGKp5wn344 Scimmia Vede] (music video)&lt;br /&gt;
** More examples: [https://x.com/techhalla/status/1906807994009993473 various], [https://x.com/c_valenzuelab/status/1907958530369372541 art direction], [https://x.com/c_valenzuelab/status/1908146364741029998 mannequins], [https://x.com/c_valenzuelab/status/1907921566643732612 taxi], [https://x.com/c_valenzuelab/status/1907432109695717798 small things], [https://x.com/c_valenzuelab/status/1907563448902496362 long shot (1m)]&lt;br /&gt;
&lt;br /&gt;
====April 2025====&lt;br /&gt;
* April 2025: Examples:&lt;br /&gt;
** [https://x.com/AzeAlter/status/1906974768705990794 Age of Beyond]&lt;br /&gt;
** [https://x.com/techhalla/status/1907790675057242319 Commercial for Coca-Cola] (Higgsfield)&lt;br /&gt;
** [https://www.reddit.com/r/StableDiffusion/comments/1jr6j11/comment/mle9bq5/?context=3 Anime scene (3m)] (Wan 2.1 with LoRa)&lt;br /&gt;
** [https://x.com/pika_labs/status/1908263310912610401 Taxes then Death] (Pika multikeyframe)&lt;br /&gt;
* April 2025: [https://www.krea.ai/ Krea] [https://x.com/krea_ai/status/1907829389452021853 Video Re-Style]&lt;br /&gt;
* April 2025: ByteDance [https://grisoon.github.io/DreamActor-M1/ DreamActor-M1] performance transfer&lt;br /&gt;
* April 2025: Examples:&lt;br /&gt;
** [https://x.com/Diesol/status/1908535493673050403 Mercs] (Midjourney v7, Ray2)&lt;br /&gt;
** [https://x.com/minchoi/status/1909078846126649440 Cat at theme park]&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1909630883218207036 Timelapse history] (Runway Gen4)&lt;br /&gt;
** [https://x.com/EHuanglu/status/1909660808973533225 Examples for use in advertising]&lt;br /&gt;
** [https://x.com/arohaAIX/status/1910688361221599361 Sci-fi scapes]&lt;br /&gt;
** [https://x.com/PJaccetturo/status/1910750148055146708 Avα]&lt;br /&gt;
** [https://x.com/imagineFERA/status/1910601934207152576 The Bureau]&lt;br /&gt;
** [https://x.com/jasonzada/status/1911812014059733041 Beaver and Sock (3m)]&lt;br /&gt;
** [https://x.com/Delachica_/status/1911842237622735052 Organic Waste (5m)] (Runway)&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1912260798270882104 Fly] (Runway Gen4)&lt;br /&gt;
* April 2025: Alibaba [https://arxiv.org/abs/2504.04842 FantasyTalking] lipsync ([https://arxiv.org/abs/2504.04842 paper], [https://x.com/EHuanglu/status/1910341110322577442 examples])&lt;br /&gt;
* April 2025: Tencent Hunyuan [https://arxiv.org/abs/2411.16331 Sonic] image animation/lipsync to audio ([https://x.com/ai_for_success/status/1911719866958286864 examples])&lt;br /&gt;
* April 2025: ByteDance [https://huggingface.co/papers/2504.08685 Seaweed-7B] ([https://arxiv.org/abs/2504.08685 preprint], [https://www.youtube.com/watch?v=OaPI6K2y3rI examples])&lt;br /&gt;
* April 2025: [https://app.klingai.com/global/release-notes Kling 2.0] ([https://www.youtube.com/watch?v=Yqvh3M12T_M video])&lt;br /&gt;
* April 2025: [https://www.skyreels.ai/home Skywork] [https://github.com/SkyworkAI/SkyReels-V2 SkyReels V2] (open-source, unlimited extension; [https://x.com/AngryTomtweets/status/1914270477482443142 examples])&lt;br /&gt;
* April 2025: [https://sand.ai/ Sand AI] [https://huggingface.co/sand-ai/MAGI-1 Magi-1] (open source, unlimited extension; [https://x.com/AngryTomtweets/status/1914318743578296506 examples], [https://x.com/dreamingtulpa/status/1916035289300275372 more examples])&lt;br /&gt;
* April 2025: Examples:&lt;br /&gt;
** [https://x.com/maxescu/status/1912100029549994016 Mars 2035 (3m)] (Kling 2.0)&lt;br /&gt;
** [https://x.com/ai_for_success/status/1912466999147450600 Kingdom (dragon battle, 3m)]&lt;br /&gt;
** [https://x.com/imagineFERA/status/1913156296657756278 Reflection (3m)] (Gen4)&lt;br /&gt;
** [https://x.com/Wytsekoetse/status/1913547157493162035 Pizza Galaxy (1m)] (MJ and Gen4)&lt;br /&gt;
** [https://www.youtube.com/watch?v=rseqmSGH7xk Snoop Dogg music video: Last Dance with Mary Jane] (blend of traditional and AI effects)&lt;br /&gt;
** [https://x.com/dreamingtulpa/status/1915104310448501129 Realistic human motion]&lt;br /&gt;
** [https://x.com/KarolineGeorges/status/1915113151546396893 Inception loop] (Gen4)&lt;br /&gt;
** [https://x.com/rayisdoingfilm/status/1916468807435952330 Tuesday (1m)] (Gen4)&lt;br /&gt;
** [https://www.youtube.com/watch?v=XWdwF1q3kDw Deus in Machina Automata (4m)] (Gen4) &lt;br /&gt;
** [https://x.com/machina9000/status/1915090908850049223 Outsiders (3m music video)]&lt;br /&gt;
&lt;br /&gt;
====May 2025====&lt;br /&gt;
* May 2025: [https://huggingface.co/Lightricks/LTX-Video LTX-Video 13B] ([https://github.com/Lightricks/LTX-Video code], [https://x.com/maxescu/status/1919801813987164527 examples], [https://x.com/cubiq/status/1919748210567815551 more examples])&lt;br /&gt;
* May 2025: HeyGen Avatar IV (examples: [https://x.com/StevieMac03/status/1919910677860216869 sci-fi], [https://x.com/KarolineGeorges/status/1919801983143211222 Come Closer], [https://x.com/maxescu/status/1920410329454100973 singing], [https://x.com/minchoi/status/1920853859171234165 various])&lt;br /&gt;
* May 2025: Tencent [https://hunyuancustom.github.io/ HunyuanCustom]&lt;br /&gt;
* May 2025: Examples:&lt;br /&gt;
** [https://x.com/lifeofc/status/1920331476157280413 Iris (1.5m)] (Midjourney, Luma, Runway)&lt;br /&gt;
** [https://runwayml.com/customers/the-making-of-mars-and-siv Mars and Siv: &amp;quot;No Vacancy&amp;quot; (episode 1, 6m)] (Runway)&lt;br /&gt;
** [https://x.com/cfryant/status/1921317318744760817 Go to the East Wing] (dreamlike, Luma)&lt;br /&gt;
** [https://x.com/DeryaTR_/status/1921015340827304389 Yu Lanter showreel] (Higgsfield)&lt;br /&gt;
** [https://x.com/freeeebird2300/status/1921789387614134652 Cyberpunk anime] (Luma)&lt;br /&gt;
** [https://x.com/LittleTinRobot/status/1921692735930589246 Alien animals] (Runway)&lt;br /&gt;
** [https://x.com/minchoi/status/1922500563792486878 America&amp;#039;s Funniest AI Home Videos (3m)]&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1924204409833103365 Editing POV shots from AR glasses] (Runway)&lt;br /&gt;
* May 2025: [https://runwayml.com/gen48 Gen:48] Fourth Edition winners:&lt;br /&gt;
** [https://www.youtube.com/watch?v=NphCYRXjqTI&amp;amp;t=174s Home] (3m)&lt;br /&gt;
** [https://www.youtube.com/watch?v=L2DQwCp_DCw The King&amp;#039;s Secret] (2m)&lt;br /&gt;
* May 2025: [https://viggle.ai/home Viggle] Live [https://x.com/ViggleAI/status/1926324953038627214 enables] real-time avatar control&lt;br /&gt;
* May 2025: Google [https://blog.google/technology/ai/generative-media-models-io-2025/ Veo 3] (examples: [https://x.com/babaeizadeh/status/1924942128851124284 conversation], [https://x.com/mattshumer_/status/1925039973310308424 cooking], [https://x.com/jerrod_lew/status/1924934440486371589 singing], [https://x.com/MartinNebelong/status/1924926779677905014 simple story], [https://x.com/Diesol/status/1925114473544913004 cinematic action sequence], [https://x.com/laszlogaal_/status/1925094336200573225 car show interviews], [https://x.com/arikuschnir/status/1924953349943697763 We Can Talk], [https://x.com/venturetwins/status/1925021235530105298 podcat], [https://x.com/maxescu/status/1925079990061957423 various], [https://x.com/jerrod_lew/status/1927092379892265139 camera moves])&lt;br /&gt;
* May 2025: Examples:&lt;br /&gt;
** [https://x.com/javilopen/status/1925495026903380358 Will Smith Eating Spaghetti] (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025])&lt;br /&gt;
** [https://x.com/MetaPuppet/status/1926659557914268155 Bob from Marketing] (Veo 3)&lt;br /&gt;
** [https://x.com/dustinhollywood/status/1926733069475565622 He is King (16m)]&lt;br /&gt;
** [https://x.com/HashemGhaili/status/1925616536791760987 Prompt Theory], [https://x.com/HashemGhaili/status/1925332319604257203 part 2], [https://x.com/HashemGhaili/status/1927467022213869975 Afterlife (3m)] (Veo3)&lt;br /&gt;
** [https://x.com/JoannaStern/status/1927856754873835747 My Robot and Me (3m)] (Veo, Runway)&lt;br /&gt;
** [https://x.com/rohanpaul_ai/status/1928152398930817238 The Internet&amp;#039;s Over] (Veo3)&lt;br /&gt;
** [https://www.reddit.com/r/aivideo/comments/1l0rl7d/before_colours_fade/ Before Colours Fade (2m)] (Midjourney, Kling)&lt;br /&gt;
&lt;br /&gt;
====June 2025====&lt;br /&gt;
* June 2025: Examples: &lt;br /&gt;
** [https://x.com/amasad/status/1930505292904837132 Bigfoot ASMR]&lt;br /&gt;
** [https://x.com/minchoi/status/1930670583605514333 Talking] (HeyGen Avatar IV upgrade)&lt;br /&gt;
** [https://x.com/ROHKI/status/1931081752992477285 Where are all the aliens? (2m)]&lt;br /&gt;
** [https://x.com/fofrAI/status/1930999540770893874 Natural talking]&lt;br /&gt;
** [https://x.com/ammaar/status/1931672722418851904 Elemental Showdown - Mortal Kombat (3m)]&lt;br /&gt;
** [https://x.com/maxjoseph/status/1932104616021565476 It Starts at the End (music video, 4m)]&lt;br /&gt;
** [https://x.com/deedydas/status/1932105266654581116 Sci-fi trailer (2m)]&lt;br /&gt;
** [https://x.com/DrMachakil/status/1931816470901575924 The Prompt Floor (2m)]&lt;br /&gt;
** [https://x.com/DrMachakil/status/1853960062546366856 NALVORA (2.7m)] - [https://x.com/DrMachakil/status/1932904599004066200 Best Trailer, Metamorph AI Film Awards]&lt;br /&gt;
** [https://x.com/Kalshi/status/1932891608388681791 Commercial for Kalshi (30s)] - [https://x.com/PJaccetturo/status/1932893260399456513 to air during NBA finals] (Veo)&lt;br /&gt;
** [https://x.com/ROHKI/status/1933594430113788227 Your Brain is Broken on Purpose (2m)]&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1934312626021949687 Runway Gen-4 Reference examples]&lt;br /&gt;
** [https://x.com/JesusPlazaX/status/1934253813696786661 Paper airplane]&lt;br /&gt;
** [https://x.com/minchoi/status/1934032730947526872 Veo3 examples]&lt;br /&gt;
** [https://x.com/NomadsVagabonds/status/1935329331410075734 Reset 3 (1m, surreal)]&lt;br /&gt;
** [https://x.com/HashemGhaili/status/1935722105322323968 It Has No Soul (1m, Veo3)]&lt;br /&gt;
* June 2025: [https://seedance.net/seedance Seedance 1.0] ([https://arxiv.org/abs/2506.09113 preprint])&lt;br /&gt;
* June 2025: [https://hailuoai.video/ Hailuo AI] (MiniMax) Hailuo 02 ([https://x.com/venturetwins/status/1934236631336403344 &amp;quot;Kangaroo&amp;quot; during testing]; examples: [https://x.com/lepadphone/status/1935078910934626429 various], [https://x.com/alexgnewmedia/status/1935018186954719365 various], [https://x.com/FussyPastor/status/1935065068456263883 tsunami], [https://x.com/thedorbrothers/status/1935098802744213935 fight scene], [https://x.com/umesh_ai/status/1935028257708966231 fox running], [https://x.com/BrentLynch/status/1934979825636446268 blogger], [https://x.com/HalimAlrasihi/status/1935297126759538735 transitions], [https://x.com/MKMXLA/status/1938318951664280045 skateboarding])&lt;br /&gt;
* June 2025: Midjourney video ([https://x.com/minchoi/status/1934373051464057062 early examples], [https://x.com/ciguleva/status/1935386452197785892 various], [https://x.com/juliewdesign_/status/1935395999175876696 various], [https://x.com/emollick/status/1935504703023899096 Ethan Mollick], [https://x.com/PJaccetturo/status/1935383312392151528 highly rated], [https://x.com/maxescu/status/1935674561821126847 complex environments], [https://x.com/CoffeeVectors/status/1935863623076675875 manga])&lt;br /&gt;
* June 2025: Examples:&lt;br /&gt;
** [https://x.com/StevieMac03/status/1935768436556378170 The Battle of Glenvael - Orcs vs Humans] (Hailuo)&lt;br /&gt;
** [https://x.com/HashemGhaili/status/1935036744568824208 The Sentence (9m, Veo3)]&lt;br /&gt;
** [https://x.com/elder_plinius/status/1936145834585862225 Will Smith Eating Spaghetti] (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025], [https://x.com/javilopen/status/1925495026903380358 May 2025])&lt;br /&gt;
** [https://x.com/venturetwins/status/1937232461576175809 Gymnastics] (Hailuo 02)&lt;br /&gt;
** [https://x.com/KeiranJHarris/status/1935429439476887594 Fable about Transformative AI] (Veo3)&lt;br /&gt;
** [https://x.com/thedorbrothers/status/1937926400507580726 Vorex (2m trailer)]&lt;br /&gt;
** [https://x.com/OnerBiberkoku/status/1938972810321281394 Doğrucu (3m music video, Veo3)]&lt;br /&gt;
* June 2025: [https://higgsfield.ai/soul Higgsfield Soul] Video Effects ([https://x.com/higgsfield_ai/status/1937931727084917097 examples], [https://x.com/HashemGhaili/status/1938278903765995611 realism])&lt;br /&gt;
* June 2025: Alibaba [https://omni-avatar.github.io/ OmniAvatar] ([https://arxiv.org/abs/2506.18866 paper], [https://github.com/Omni-Avatar/OmniAvatar code], [https://huggingface.co/OmniAvatar/OmniAvatar-14B model], [https://x.com/AngryTomtweets/status/1939850674776547359 examples])&lt;br /&gt;
&lt;br /&gt;
====July 2025====&lt;br /&gt;
* July 2025: Examples:&lt;br /&gt;
** [https://x.com/Kavanthekid/status/1940452444850589999 Untold - The Immortal Blades Saga] (2m trailer)&lt;br /&gt;
** [https://x.com/minchoi/status/1941234456461029584 Unofficial commercial for Liquid Death (1m)]&lt;br /&gt;
** [https://x.com/brain_racked/status/1942594951310893425 A parade of the chosen theocracy on Callisto]&lt;br /&gt;
** [https://x.com/Popeyes/status/1943316484404433182 Popeyes commercial - diss track (1m)]&lt;br /&gt;
*** [https://x.com/gabemichael_ai/status/1944070622155616668 (Unofficial) Wendy&amp;#039;s response - diss track (2m)]&lt;br /&gt;
*** [https://x.com/ai_massive/status/1947689537641357618 (Unofficial) In-N-Out rap battle (3m)]&lt;br /&gt;
** [https://x.com/Kalshi/status/1943339616716599548 Kalshi commercial]&lt;br /&gt;
** Jonah (25m TV show, [https://x.com/PJaccetturo/status/1946101701548880029 making of], [https://kingstonestudios.uscreen.io/programs/jonah purchase here])&lt;br /&gt;
** [https://x.com/Totemko/status/1946243585021452335 Unofficial commercial for Mercedes (17s)]&lt;br /&gt;
** [https://x.com/CoffeeVectors/status/1946016960916889632 Skateboarding music video (1m)]&lt;br /&gt;
* July 2025: Runway ML [https://help.runwayml.com/hc/en-us/articles/42311337895827-Creating-with-Act-Two Act-Two] (video-to-video performance transfer)&lt;br /&gt;
* July 2025: Examples:&lt;br /&gt;
** Neural Viz [https://www.youtube.com/watch?v=juDDHvHroQ8 The Cop Files: Part VI (8m)]&lt;br /&gt;
** [https://x.com/Kavanthekid/status/1947696716981145971 Perfect Dark - Concept Trailer (1.5m)]&lt;br /&gt;
** [https://x.com/StelfieTT/status/1948753090858885131 Exodus (2m trailer)]&lt;br /&gt;
** [https://x.com/Jett_Collective/status/1949140450553540841 A Walk Together - Life and love in motion (1m, Midjourney Video)]&lt;br /&gt;
* July 2025: Netflix sci-fi show [https://en.wikipedia.org/wiki/The_Eternaut_(TV_series) The Eternaut] [https://x.com/omooretweets/status/1946290797399400662 used genAI] for a particular scene (building collapse)&lt;br /&gt;
* July 2025: Google Veo [https://x.com/GoogleLabs/status/1948477692715700718 emergent annotation direction] ([https://x.com/venturetwins/status/1948771505783144641 example], [https://x.com/bilawalsidhu/status/1948844167603310660 example], [https://x.com/jboogx_creative/status/1949230927504371765 example], [https://x.com/Ror_Fly/status/1949606017739747625 example])&lt;br /&gt;
* July 2025: Runway [https://runwayml.com/research/introducing-runway-aleph Aleph] contextual editing&lt;br /&gt;
* July 2025: Wan 2.2 (open source, [https://x.com/Alibaba_Wan/status/1949804551655276989 examples])&lt;br /&gt;
&lt;br /&gt;
====August 2025====&lt;br /&gt;
* August 2025: Pika [https://x.com/pika_labs/status/1954935844936024476 audio-driven performance] ([https://x.com/minchoi/status/1954989794129514937 examples], [https://x.com/pika_labs/status/1955007656302924192 examples])&lt;br /&gt;
* August 2025: Examples:&lt;br /&gt;
** [https://www.youtube.com/watch?v=gePD1Hf1qPc Eve and Adam] (8m, [https://x.com/MetaPuppet/status/1954254544935719259 multiple tools])&lt;br /&gt;
** [https://x.com/runwayml/status/1955615613583519917 Redesign a space] (Runway Aleph)&lt;br /&gt;
** [https://x.com/theGioM/status/1955656398248763428 Detroit Pretend Work Park (1m)]&lt;br /&gt;
** [https://x.com/pzf_ai/status/1940816374211006600 The Weight of Light] (3m music video, Midjourney &amp;amp; Suno)&lt;br /&gt;
** [https://x.com/EHuanglu/status/1956788759778967710 Commercial for Pepsi]&lt;br /&gt;
** [https://x.com/StelfieTT/status/1956633450326200426 Emotion]&lt;br /&gt;
** [https://x.com/dustinhollywood/status/1957940749862875383 TZIGANE]&lt;br /&gt;
** [https://x.com/0xFramer/status/1960720090921623636 Anime chase sequence] (Nano Banana and Seedance 1.0)&lt;br /&gt;
* August 2025: ByteDance [http://www.waver.video/ Waver 1.0]&lt;br /&gt;
* August 2025: [https://huggingface.co/Wan-AI/Wan2.2-S2V-14B Wan2.2-S2V 14B]&lt;br /&gt;
&lt;br /&gt;
====September 2025====&lt;br /&gt;
* September 2025: [https://www.wsj.com/tech/ai/openai-backs-ai-made-animated-feature-film-389f70b0 OpenAI Backs AI-Made Animated Feature Film: Film, called ‘Critterz,’ aims to debut at Cannes Film Festival and will leverage startup’s AI tools and resources.]&lt;br /&gt;
* September 2025: Examples:&lt;br /&gt;
** [https://x.com/kentskooking/status/1964606423037542459 A loop to wake up to (30s)]&lt;br /&gt;
** [https://x.com/venturetwins/status/1966570512991350907 time lapse]&lt;br /&gt;
** [https://x.com/NeuralViz/status/1967391198487994652 The Adventures of Reemo Green] (11m, Neural Viz)&lt;br /&gt;
** [https://x.com/kellyeld/status/1967620786166079545 Surreal DJs music video (2m)]&lt;br /&gt;
** [https://x.com/dustinhollywood/status/1968724784440558044 Glass City] (Hailuo)&lt;br /&gt;
** [https://x.com/TheoMediaAI/status/1968646951227777529 Alarm] (1m, multiple tools including world synthesis for consistent environments)&lt;br /&gt;
* September 2025: [https://lumalabs.ai/ray Luma] [https://x.com/LumaLabsAI/status/1968684330034606372 Ray3] ([https://x.com/cfryant/status/1968692370725077251 example])&lt;br /&gt;
* September 2025: Examples:&lt;br /&gt;
** [https://x.com/mrjonfinger/status/1968687352382910469 Stop motion interpolation] (Luma Ray3)&lt;br /&gt;
** [https://x.com/heydin_ai/status/1969514789169959128 Skyland] (1.5m, various tools)&lt;br /&gt;
** [https://x.com/iamluokai/status/1970185972076925427 Dancing] (Wan 2.2)&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1970497214108815584 Under Armor commercial] (Runway Aleph)&lt;br /&gt;
** [https://x.com/FilmsBySav/status/1971247214795358706 OG PRIME] (10m, Kling)&lt;br /&gt;
** [https://www.youtube.com/watch?v=JGLoTjxd-Ss PLANET] (37m)&lt;br /&gt;
* September 2025: [https://x.com/Kling_ai/status/1970439808901362155 Kling AI 2.5 Turbo] (examples: [https://x.com/OrctonAI/status/1970472214794220008 cyberpunk], [https://x.com/ImagineArt_X/status/1970586138655236565 human motion], [https://x.com/fAIkout/status/1970505756853334324 motion and emotion], [https://x.com/fAIkout/status/1970495039248965636 painting], [https://x.com/venturetwins/status/1970563820478439546 gymnastics], [https://x.com/Art_For_Joy/status/1970249516033970434 breakdancing], [https://x.com/HaydenLeeWrites/status/1970523610734567819 combat], [https://x.com/umesh_ai/status/1970497680536150454 cinematic], [https://x.com/LillyLiCT/status/1970580585073819752 horror camerawork], [https://x.com/StevieMac03/status/1970559778804908331 extended sequence])&lt;br /&gt;
* September 2025: OpenAI [https://openai.com/index/sora-2/ Sora 2] ([https://x.com/minchoi/status/1973949620318580970 examples])&lt;br /&gt;
&lt;br /&gt;
====October 2025====&lt;br /&gt;
* October 2025: Examples:&lt;br /&gt;
** [https://x.com/minchoi/status/1976042197154963702 Will Smith Eating Spaghetti] (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025], [https://x.com/javilopen/status/1925495026903380358 May 2025], [https://x.com/elder_plinius/status/1936145834585862225 June 2025])&lt;br /&gt;
** [https://www.youtube.com/watch?v=JhH3uxcdM1M Frostbite] (3m, Sora 2)&lt;br /&gt;
** [https://x.com/Jukanlosreve/status/1977764418709758106 (Fake) &amp;quot;Behind the scenes&amp;quot; for a Chainsaw Man live action] ([https://x.com/PJaccetturo/status/1972705821072261402 others])&lt;br /&gt;
* October 2025: Google [https://blog.google/technology/ai/veo-updates-flow/ Veo 3.1]&lt;br /&gt;
* October 2025: Examples:&lt;br /&gt;
** [https://x.com/aisearchio/status/1978465562821898461 Will Smith Eating Spaghetti], Veo 3.1 (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025], [https://x.com/javilopen/status/1925495026903380358 May 2025], [https://x.com/elder_plinius/status/1936145834585862225 June 2025], [https://x.com/minchoi/status/1976042197154963702 October 2025])&lt;br /&gt;
** [https://x.com/Diesol/status/1978755688261128227 War footage] (Veo 3.1)&lt;br /&gt;
** [https://www.meta.ai/@dustin_hollywood/post/bG3BHB21W0l/yukon/ Yukon] (music video, [https://x.com/dustinhollywood/status/1982260655957700746 Dustin Hollywood])&lt;br /&gt;
** [https://x.com/Diesol/status/1980922041131028515 Bloom] (2m, Veo 3.1)&lt;br /&gt;
** [https://x.com/xmuse_/status/1982026008803905639 Auction] (1m)&lt;br /&gt;
** [https://x.com/kellyeld/status/1982425147496882287 Dancing] (music video; Midjourney, Suno, Veo3)&lt;br /&gt;
** [https://x.com/JesusPlazaX/status/1982393609069412433 Anime example] (Midjourney, Grok Imagine)&lt;br /&gt;
** [https://x.com/EccentrismArt/status/1982830100266783039 King Arthur] (1m)&lt;br /&gt;
** [https://x.com/venturetwins/status/1983024227352789162 Transitions] (1m music video)&lt;br /&gt;
** [https://x.com/eastflatsfilm/status/1984116704704971076 Unofficial commercial for Nike] (2m, Midjourney, Hailuo)&lt;br /&gt;
** [https://x.com/PJaccetturo/status/1984639281848336592 Loneliness/Halloween] ([https://www.linkedin.com/posts/simon-meyer-976339160_this-could-be-the-scariest-halloween-film-activity-7389892778144735232-6CYY?utm_source=share&amp;amp;utm_medium=member_desktop&amp;amp;rcm=ACoAAADeoqYBzX8N9-j_hRQvl1e7OUlOgFptNF0 1.5m])&lt;br /&gt;
** [https://www.youtube.com/watch?v=43h61QAXjpY Wave] (2m music video, [https://x.com/MIZNOM Masaki Mizuno])&lt;br /&gt;
* October 2025: [https://x.com/Hailuo_AI/status/1983016390878708131 Hailuo 2.3]&lt;br /&gt;
&lt;br /&gt;
====November 2025====&lt;br /&gt;
* November 2025: Examples:&lt;br /&gt;
** [https://x.com/subverum/status/1985069550250107033 Valley of Shadow] (6m)&lt;br /&gt;
** [https://x.com/DiscussingFilm/status/1985470088074375344 Coca-cola ad] (c.f. [https://x.com/techhalla/status/1857462526859935813 2024 ad])&lt;br /&gt;
** [https://x.com/venturetwins/status/1985755546222542903 France 2026 Olympics ad] (blend of genAI and traditional methods, [https://x.com/venturetwins/status/1985753512362590439 behind the scenes])&lt;br /&gt;
** [https://x.com/NeuralViz/status/1986611025366687754 Minnesota Nice] (3m, [https://x.com/NeuralViz Neural Viz])&lt;br /&gt;
** [https://x.com/machina9000/status/1986563727873740934 Brutalis] (7m)&lt;br /&gt;
** [https://x.com/tastypxls/status/1987312755485876502?s=20 Living The Dream - Rynn] (music video, 1m)&lt;br /&gt;
** [https://x.com/MrDavids1/status/1988366387111170339?s=20 Environment as Character]&lt;br /&gt;
** [https://x.com/justalexoki/status/1988915573707661637?s=20 Cat playing instruments at midnight]&lt;br /&gt;
** [https://x.com/LumaLabsAI/status/1989013731267998172?s=20 Overclock] (30s, Luma)&lt;br /&gt;
** [https://x.com/venturetwins/status/1980685301577326994?s=20 Music video] (30s, Wan Animate)&lt;br /&gt;
** [https://x.com/venturetwins/status/1990227418553209259?s=20 Promotional material for Pudong Art Museum - Louvre exhibition in Shanghai] (1m)&lt;br /&gt;
** [https://x.com/Kyrannio/status/1990324648488186358?s=20 Loop 87 A Temporal Heist] (12m, claim that video was generated fully autonomously using AI agent NoSpoon)&lt;br /&gt;
** [https://x.com/AzeAlter/status/1906974768705990794?s=20 Age of Beyond] (3m)&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1991245088446386495?s=20 Ausencia] (5m)&lt;br /&gt;
** [https://x.com/AngryTomtweets/status/1993047608617517246?s=20 live paintings] ([https://www.youtube.com/channel/UCw8kc0wDm5Bh6g9iZzEWfOg bandyquantguy] on YouTube)&lt;br /&gt;
** [https://x.com/BrianRoemmele/status/1994625579073900804?s=20 Michelle, on a server in Iowa] (1m)&lt;br /&gt;
* November 2025: [https://odyssey.ml/ Odyssey] - [https://x.com/odysseyml/status/1994873514579697830?s=20 Odyssey-2]&lt;br /&gt;
&lt;br /&gt;
====December 2025====&lt;br /&gt;
* December 2025: Runway [https://runwayml.com/research/introducing-runway-gen-4.5 Gen 4.5]&lt;br /&gt;
* December 2025: [https://app.klingai.com/global/all-tools Kling] [https://app.klingai.com/global/omni/new O1] ([https://x.com/minchoi/status/1995523379957559609?s=20 examples], [https://x.com/TheoMediaAI/status/1995517613414518987?s=20 other examples]) and Kling 2.6.&lt;br /&gt;
* December 2025: [https://app.pixverse.ai/onboard PixVerse v5.5]&lt;br /&gt;
* December 2025: Examples:&lt;br /&gt;
** [https://x.com/EHuanglu/status/1996649596119068687?s=20 Will Smith Eating Spaghetti], Kling 2.6 (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025], [https://x.com/javilopen/status/1925495026903380358 May 2025], [https://x.com/elder_plinius/status/1936145834585862225 June 2025], [https://x.com/minchoi/status/1976042197154963702 October 2025], [https://x.com/aisearchio/status/1978465562821898461 October 2025])&lt;br /&gt;
** [https://x.com/venturetwins/status/1997898095670296615?s=20 Dreamlike POV]&lt;br /&gt;
** [https://x.com/chatgpt21/status/1998253809307455555?s=20 McDonalds commercial]&lt;br /&gt;
** [https://x.com/EHuanglu/status/1998039554402750545?s=20 Skittles commercial] (Higgsfield)&lt;br /&gt;
** [https://x.com/Diesol/status/1997147919603077335?s=20 The Tenant] (2m, Kling 2.6)&lt;br /&gt;
** [https://x.com/PsyopAnime/status/1999242965659906526?s=20 Maximum Carnage] (3m)&lt;br /&gt;
** [https://x.com/JeffSynthesized/status/1998786836924395875?s=20 Blurred Horizon: Episode 1] (24m)&lt;br /&gt;
** [https://x.com/Artedeingenio/status/2001667487784460301?s=20 Anime Action] (2m)&lt;br /&gt;
** [https://x.com/bearlyai/status/2005055231617605748?s=20 Dollar Shave Club commercial] (1m)&lt;br /&gt;
** [https://x.com/EHuanglu/status/2004020543084024295?s=20 Xmas Cameos] (1.5m)&lt;br /&gt;
** [https://x.com/DiDi_OKK/status/1955653520407019976?s=20 Green Screen] (2m, [https://x.com/DiDi_OKK/status/1955653520407019976 DiDi_OK])&lt;br /&gt;
** [https://x.com/DiDi_OKK/status/1998227601341702639?s=20 Arrow] (7m, [https://x.com/DiDi_OKK/status/1955653520407019976 DiDi_OK])&lt;br /&gt;
** [https://x.com/bluehorizon_ai/status/2004045348579561503?s=20 Live Action One Punch Man | Saitama vs Genos] (2m, [https://x.com/bluehorizon_ai Blue Horizon])&lt;br /&gt;
** [https://x.com/keshiAIart/status/2005254907780358201?s=20 Anime Train] (6s)&lt;br /&gt;
** [https://x.com/venturetwins/status/2006051632837189683?s=20 Michael Catson] (13s)&lt;br /&gt;
* December 2025: [https://arxiv.org/abs/2512.13507 Seedance 1.5]&lt;br /&gt;
&lt;br /&gt;
===2026===&lt;br /&gt;
====January 2026====&lt;br /&gt;
* January 2026: Examples:&lt;br /&gt;
** [https://x.com/Itspedrito/status/2007636967048228968?s=20 Somebody That I Used to Know] (1m)&lt;br /&gt;
** [https://x.com/hujimari/status/2008054519704461407?s=20 Cat being disruptive at night], [https://x.com/klara_sjo/status/2007864014521720963?s=20 another], [https://x.com/alphafox/status/2009732284375830687?s=20 another] (c.f. [https://x.com/justalexoki/status/1988915573707661637?s=20 Cat playing instruments at midnight])&lt;br /&gt;
** [https://x.com/Uncanny_Harry/status/2008881579095961934?s=20 Character test] (30s, Kling 2.6 Motion Control, [https://x.com/Uncanny_Harry Uncanny Harry AI])&lt;br /&gt;
** [https://www.youtube.com/watch?v=SGJC4Hnz3m0&amp;amp;t=2s STAR WARS: Beggar’s Canyon | A Luke Skywalker Fan Film (Between ESB &amp;amp; ROTJ)] (7m)&lt;br /&gt;
** [https://x.com/dustinhollywood/status/2009732705299104118?s=20 TZIGANE] (9m)&lt;br /&gt;
** [https://x.com/Framer_X/status/2011075884246061454?s=20 The Subway Spark] (Anime, 45s)&lt;br /&gt;
** [https://www.reddit.com/r/aivideo/comments/1qi8zuv/25_years_difference_makes_you_wonder_where_ai/ Will Smith Eating Spaghetti] (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025], [https://x.com/javilopen/status/1925495026903380358 May 2025], [https://x.com/elder_plinius/status/1936145834585862225 June 2025], [https://x.com/minchoi/status/1976042197154963702 October 2025], [https://x.com/aisearchio/status/1978465562821898461 October 2025], [https://x.com/EHuanglu/status/1996649596119068687?s=20 December 2025])&lt;br /&gt;
** [https://x.com/PJaccetturo/status/2013675665539596651?s=20 The AI Artist] (1.5m)&lt;br /&gt;
** [https://x.com/Artedeingenio/status/2013624842021417030?s=20 Sci-fi action anime] (2m)&lt;br /&gt;
** [https://x.com/verbalriotshow/status/2014752509240475872?s=20 Stone Hand] (fake trailer, 1m)&lt;br /&gt;
* January 2026: [https://x.com/nvidia/status/2008346949301235933?s=20 Runway Gen-4.5 on] [https://www.nvidia.com/en-us/data-center/technologies/rubin/?linkId=100000401190502 Nvidia Rubin] ([https://x.com/runwayml/status/2014406560445771804?s=20 examples])&lt;br /&gt;
* January 2026: [https://ltx.io/model/ltx-2 LTX-2] open source video model (20s, 4k, w/ audio; [https://x.com/venturetwins/status/2010878914273697956?s=20 examples])&lt;br /&gt;
* January 2026: Luma [https://lumalabs.ai/blog/news/ray3_14 Ray3.14] ([https://x.com/LumaLabsAI/status/2015822842575888844?s=20 examples])&lt;br /&gt;
* January 2026: Examples:&lt;br /&gt;
** [https://x.com/pressmanc/status/2015099516500758647?s=20 Runway Gen-4.5 tests] (3.5m)&lt;br /&gt;
** [https://x.com/EHuanglu/status/2015573517618528538?s=20 Longchamp / Horses in the city] (1m)&lt;br /&gt;
** [https://x.com/dustinhollywood/status/2008154825385521418?s=20 The Last Artist] (trailer, 2m)&lt;br /&gt;
** [https://x.com/taziku_co/status/2015739943101047111?s=20 Monet temporal structure] (3m)&lt;br /&gt;
** [https://x.com/runwayml/status/2016155967285543364?s=20 Grizzlies] (1.5m, Runway Gen-4.5)&lt;br /&gt;
** [https://www.youtube.com/@TIME/videos On This Day... 1776] ([https://www.youtube.com/watch?v=E4cLKIxt8W8 trailer])&lt;br /&gt;
*** [https://www.youtube.com/watch?v=sV52AUVGc6I January 1: The Flag] (3.5m)&lt;br /&gt;
*** [https://www.youtube.com/watch?v=3ZDnL_a0YfQ January 10: Common Sense] (4.5m)&lt;br /&gt;
*** [https://www.youtube.com/watch?v=J5b1TiyKTus January 26: The Guns of Ticonderoga] (4m)&lt;br /&gt;
&lt;br /&gt;
====February 2026====&lt;br /&gt;
* February 2026: [https://app.klingai.com/global/quickstart/klingai-video-3-omni-model-user-guide Kling 3.0]&lt;br /&gt;
* February 2026: [https://seedance2.ai/ Seedance 2.0] ([https://x.com/EHuanglu/status/2020131622675202512?s=20 example 1], [https://x.com/EHuanglu/status/2020492770872566053?s=20 2], [https://x.com/dynamicwangs/status/2020054894741451123?s=20 3], [https://x.com/patrickassale/status/2020180495900848470?s=20 4], [https://x.com/janekm/status/2020888750285332526?s=20 5], [https://x.com/Dork_sense/status/2020179955511116082?s=20 6], [https://x.com/EHuanglu/status/2020388244802740728?s=20 7], [https://x.com/zhao_dashuai/status/2020528048341217592?s=20 8], [https://x.com/AngryTomtweets/status/2020784886932738470?s=20 9], [https://x.com/javilopen/status/2020558352590287298?s=20 10], [https://x.com/linxiaobei888/status/2021399630672691710?s=20 11])&lt;br /&gt;
* February 2026: Examples:&lt;br /&gt;
** [https://x.com/PJaccetturo/status/2019072637192843463?s=20 Unofficial opening sequence for The Way of Kings by Brandon Sanderson] (1.5m, Kling 3)&lt;br /&gt;
** [https://x.com/dailycatsclips/status/2020117502915989680?s=20 Cat Dreams] (1.5m)&lt;br /&gt;
** [https://x.com/DotCSV/status/2021269435567218725?s=20 Will Smith Eating Spaghetti] (Seedance 2.0) (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025], [https://x.com/javilopen/status/1925495026903380358 May 2025], [https://x.com/elder_plinius/status/1936145834585862225 June 2025], [https://x.com/minchoi/status/1976042197154963702 October 2025], [https://x.com/aisearchio/status/1978465562821898461 October 2025], [https://x.com/EHuanglu/status/1996649596119068687?s=20 December 2025], [https://www.reddit.com/r/aivideo/comments/1qi8zuv/25_years_difference_makes_you_wonder_where_ai/ January 2026], [https://x.com/SpecialSitsNews/status/2020583709741883666?s=20 progression to 2026])&lt;br /&gt;
** [https://x.com/thedorbrothers/status/2023460644905742577?s=20 To Be Continued] (3m, [https://x.com/thedorbrothers The Dor Brothers])&lt;br /&gt;
** [https://x.com/ivanka_humeniuk/status/2023711181978919034?s=20 Crow - Game of Thrones] (1m)&lt;br /&gt;
** [https://x.com/billyrestey/status/2024193251763507528?s=20 Reboot] (2m)&lt;br /&gt;
** [https://x.com/kenw_2/status/2024625510534283508?s=20 Late for work] (1.5m, MJ NBP Seedance 2.0)&lt;br /&gt;
** [https://x.com/heydin_ai/status/2024616890338079181?s=20 AI Man] (4.5m, MJ NBP Seedance 2.0)&lt;br /&gt;
** [https://x.com/maxescu/status/2024882372836250033?s=20 But AI Will Never Be Able To Do This] (3m, Seedance 2.0)&lt;br /&gt;
** [https://x.com/DiDi_OKK/status/2018784243753599093?s=20 Sign] (8m)&lt;br /&gt;
** [https://x.com/LTXStudio/status/2025994426309640291?s=20 Commercial for Nexus] (1m)&lt;br /&gt;
** [https://x.com/maxescu/status/2026007558159278477?s=20 Showcase] (9m, [https://x.com/maxescu Alex Patrascu])&lt;br /&gt;
** [https://x.com/EHuanglu/status/2025410944512192536?s=20 Painterly] (30s, [https://x.com/EHuanglu el.cine])&lt;br /&gt;
** [https://x.com/kellyeld/status/2025975677657440267?s=20 Imposter Syndrone] (2m, music video)&lt;br /&gt;
** [https://www.youtube.com/watch?v=nKnE2Wn1VNQ All Is Conscious] (3.5m)&lt;br /&gt;
** [https://x.com/CuriousRefuge/status/2026086576191934769?s=20 Emotional argument] (3m, Seedance 2.0)&lt;br /&gt;
** [https://x.com/jdkanani/status/2023781028368884031?s=20 Moonlight Veil] (10m)&lt;br /&gt;
&lt;br /&gt;
====March 2026====&lt;br /&gt;
* March 2026: Examples:&lt;br /&gt;
** [https://x.com/jacopo_reale/status/2029909372764041559 Looking for Bianca] (6m, Kling 3.0)&lt;br /&gt;
** [https://x.com/sumiturkude007/status/2030933543443193908?s=20 Gardener] (3m, Seedance 2.0)&lt;br /&gt;
** Micro-movie (Chinese): [https://x.com/yyyole/status/2029225419669684418?s=20 episode 1], [https://x.com/yyyole/status/2030850450464112675?s=20 episode 2]&lt;br /&gt;
** Live-action Evangelion: [https://x.com/NACHOS2D_/status/2032401289653461052?s=20 part 1] (4.5m), [https://x.com/NACHOS2D_/status/2032778868361203770?s=20 part 2] (3.5m), [https://x.com/NACHOS2D_/status/2033126071151837491?s=20 part 3] (2.5m)&lt;br /&gt;
** [https://x.com/lexx_aura/status/2033589846216741293?s=20 to love Wu Yong] (5m)&lt;br /&gt;
** [https://x.com/Alterverse_AI/status/2036434608137343111?s=20 Monkey&amp;#039;s Paw] (5m)&lt;br /&gt;
** [https://x.com/maxescu/status/2036434854435315868?s=20 Cinematic scenes] (3.5m, comedy, [https://lumalabs.ai/uni-1 Luma Uni-1 Agent])&lt;br /&gt;
* March 2026: [https://higgsfield.ai/original-series Higgsfield Original Series]&lt;br /&gt;
* March 2026: [https://app.pixverse.ai/onboard PixVerse v6]&lt;/div&gt;</summary>
		<author><name>KevinYager</name></author>
		
	</entry>
	<entry>
		<id>http://gisaxs.com/index.php?title=Talk:AI_video&amp;diff=8765</id>
		<title>Talk:AI video</title>
		<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php?title=Talk:AI_video&amp;diff=8765"/>
		<updated>2026-03-30T16:39:01Z</updated>

		<summary type="html">&lt;p&gt;KevinYager: /* Others for Consideration */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Others for Consideration=&lt;br /&gt;
* May 2025: Examples:&lt;br /&gt;
** [https://x.com/thedorbrothers/status/1927061347331694973 Influenders] ([https://x.com/thedorbrothers The Dor Brothers])&lt;br /&gt;
* June 2025: Examples:&lt;br /&gt;
** [https://x.com/venturetwins/status/1934027410841764221 Koala shot by protesters]&lt;br /&gt;
** [https://x.com/thedorbrothers/status/1932835386557939913 Riot] ([https://x.com/thedorbrothers The Dor Brothers])&lt;br /&gt;
** Celebrity explainer [https://x.com/venturetwins/status/1934434222523171000 1], [https://x.com/venturetwins/status/1934438139738874129 2]&lt;br /&gt;
* July 2025: Examples:&lt;br /&gt;
** [https://x.com/IamEmily2050/status/1945795374251479388 Quick rap] (example JSON format)&lt;br /&gt;
** [https://x.com/sweeneydailyx/status/1948032121429500221 Commercial for American Eagle (20s)] (the car driving off is an AI extension of the clip)&lt;br /&gt;
* August 2025: Examples:&lt;br /&gt;
** [https://x.com/thedorbrothers/status/1955305090971017653 Waidmanns Heil] ([https://x.com/thedorbrothers The Dor Brothers])&lt;br /&gt;
* December 2025: Examples:&lt;br /&gt;
** [https://x.com/Gossip_Goblin/status/1996994382428336165?s=20 Joy Loop] (1.5m)&lt;br /&gt;
** [https://x.com/TUPACABRA2/status/2005877025454662066?s=20 Minnesota Dark] (2m, [https://x.com/TUPACABRA2 Tupacabra])&lt;br /&gt;
* January 2026: Examples:&lt;br /&gt;
** [https://x.com/AngryTomtweets/status/2008990455661515071?s=20 Egg Protein] (2m)&lt;br /&gt;
* March 2026: Examples:&lt;br /&gt;
** [https://x.com/aimikoda/status/2038285542727487827?s=20 Fashion sequence] (15s, Seedance 2.0)&lt;/div&gt;</summary>
		<author><name>KevinYager</name></author>
		
	</entry>
	<entry>
		<id>http://gisaxs.com/index.php?title=AI_safety&amp;diff=8764</id>
		<title>AI safety</title>
		<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php?title=AI_safety&amp;diff=8764"/>
		<updated>2026-03-30T16:37:15Z</updated>

		<summary type="html">&lt;p&gt;KevinYager: /* Status */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Learning Resources=&lt;br /&gt;
==Light==&lt;br /&gt;
* [https://orxl.org/ai-doom.html a casual intro to AI doom and alignment] (2022)&lt;br /&gt;
* Anthony Aguirre: [https://keepthefuturehuman.ai/ Keep The Future Human]&lt;br /&gt;
** [https://interactive.keepthefuturehuman.ai/ Interactive Explainer]&lt;br /&gt;
** [https://keepthefuturehuman.ai/essay/ Essay: Keep the Future Human]&lt;br /&gt;
** [https://www.youtube.com/watch?v=27KDl2uPiL8 We Can’t Stop AI – Here’s What To Do Instead] (4m video, 2025)&lt;br /&gt;
** [https://www.youtube.com/watch?v=zeabrXV8zNE The 4 Rules That Could Stop AI Before It’s Too Late] (15m video, 2025)&lt;br /&gt;
* Tristan Harris TED talk (15m): [https://www.ted.com/talks/tristan_harris_why_ai_is_our_ultimate_test_and_greatest_invitation Why AI is our ultimate test and greatest invitation]&lt;br /&gt;
** Text version: Center for Humane Technology: [https://centerforhumanetechnology.substack.com/p/the-narrow-path-why-ai-is-our-ultimate The Narrow Path: Why AI is Our Ultimate Test and Greatest Invitation]&lt;br /&gt;
* [https://x.com/KeiranJHarris/status/1935429439476887594 Fable about Transformative AI]&lt;br /&gt;
* 2024-10: [https://www.youtube.com/watch?v=xfMQ7hzyFW4 Writing Doom]: short film on Superintelligence (27m video)&lt;br /&gt;
* 2026-03: [https://www.youtube.com/watch?v=Nl7-bRFSZBs The AI book that&amp;#039;s freaking out national security advisors] (44m video)&lt;br /&gt;
&lt;br /&gt;
==Deep==&lt;br /&gt;
* [https://www.thecompendium.ai/ The Compendium: Humanity risks extinction from its very creations — AIs.] (2024)&lt;br /&gt;
* [https://www.aisafetybook.com/ Introduction to AI Safety, Ethics, and Society] (Dan Hendrycks, [https://www.safe.ai/ Center for AI Safety])&lt;br /&gt;
* [https://aisafety.info/ AI Safety FAQ]&lt;br /&gt;
* [https://deepmindsafetyresearch.medium.com/introducing-our-short-course-on-agi-safety-1072adb7912c DeepMind short course on AGI safety]&lt;br /&gt;
&lt;br /&gt;
=Description of Safety Concerns=&lt;br /&gt;
==Key Concepts==&lt;br /&gt;
* [https://en.wikipedia.org/wiki/Instrumental_convergence Instrumental Convergence]&lt;br /&gt;
* [https://www.lesswrong.com/w/orthogonality-thesis Orthogonality Thesis]&lt;br /&gt;
* [https://www.alignmentforum.org/posts/SzecSPYxqRa5GCaSF/clarifying-inner-alignment-terminology Inner/outer alignment]&lt;br /&gt;
* [https://www.alignmentforum.org/w/mesa-optimization Mesa-optimization]&lt;br /&gt;
* [https://www.lesswrong.com/posts/N6vZEnCn6A95Xn39p/are-we-in-an-ai-overhang Overhang]&lt;br /&gt;
* [https://www.alignmentforum.org/posts/pdaGN6pQyQarFHXF4/reward-is-not-the-optimization-target Reward is not the optimization target] (Alex Turner)&lt;br /&gt;
* 80,000 hours:&lt;br /&gt;
** [https://80000hours.org/problem-profiles/risks-from-power-seeking-ai/ Risks from power-seeking AI systems]&lt;br /&gt;
** [https://80000hours.org/problem-profiles/gradual-disempowerment/ Gradual disempowerment]&lt;br /&gt;
** [https://80000hours.org/problem-profiles/catastrophic-ai-misuse/ Catastrophic AI misuse]&lt;br /&gt;
&lt;br /&gt;
==Medium-term Risks==&lt;br /&gt;
* 2023-04: [https://www.youtube.com/watch?v=xoVJKj8lcNQ A.I. Dilemma – Tristan Harris and Aza Raskin (video)] ([https://assets-global.website-files.com/5f0e1294f002b1bb26e1f304/64224a9051a6637c1b60162a_65-your-undivided-attention-The-AI-Dilemma-transcript.pdf podcast transcript]): raises concerns about humanity's ability to handle these transformations&lt;br /&gt;
* 2023-04: [https://www.youtube.com/watch?v=KCSsKV5F4xc Daniel Schmachtenberger and Liv Boeree (video)]: AI could accelerate perverse social dynamics&lt;br /&gt;
* 2023-10: [https://arxiv.org/pdf/2310.11986 Sociotechnical Safety Evaluation of Generative AI Systems] (Google DeepMind)&lt;br /&gt;
* 2024-02: [https://yoshuabengio.org/2024/02/26/towards-a-cautious-scientist-ai-with-convergent-safety-bounds/ Towards a Cautious Scientist AI with Convergent Safety Bounds] (Yoshua Bengio)&lt;br /&gt;
* 2024-07: [https://yoshuabengio.org/2024/07/09/reasoning-through-arguments-against-taking-ai-safety-seriously/ Reasoning through arguments against taking AI safety seriously] (Yoshua Bengio)&lt;br /&gt;
* 2025-04: [https://www.forethought.org/research/ai-enabled-coups-how-a-small-group-could-use-ai-to-seize-power AI-Enabled Coups: How a Small Group Could Use AI to Seize Power]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.20702 The Singapore Consensus on Global AI Safety Research Priorities]&lt;br /&gt;
* 2026-01: [https://www.science.org/doi/10.1126/science.adz1697 How malicious AI swarms can threaten democracy: The fusion of agentic AI and LLMs marks a new frontier in information warfare] (Science Magazine, [https://arxiv.org/abs/2506.06299 preprint])&lt;br /&gt;
* 2026-01: [https://www.darioamodei.com/essay/the-adolescence-of-technology The Adolescence of Technology: Confronting and Overcoming the Risks of Powerful AI] (Dario Amodei)&lt;br /&gt;
* 2026-02: [https://www.noahpinion.blog/p/updated-thoughts-on-ai-risk Updated thoughts on AI risk: Things have gotten scarier since 2023] ([https://x.com/Noahpinion Noah Smith])&lt;br /&gt;
&lt;br /&gt;
==Long-term (x-risk)==&lt;br /&gt;
* 2015-02: Sam Altman: [https://blog.samaltman.com/machine-intelligence-part-1 Machine intelligence, part 1]&lt;br /&gt;
* 2019-03: Daniel Kokotajlo and Wei Dai: [https://www.lesswrong.com/posts/WXvt8bxYnwBYpy9oT/the-main-sources-of-ai-risk The Main Sources of AI Risk?]&lt;br /&gt;
* 2022-06: Eliezer Yudkowsky: [https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a-list-of-lethalities AGI Ruin: A List of Lethalities]&lt;br /&gt;
* 2024-11: Marcus Arvan: [https://link.springer.com/article/10.1007/s00146-024-02113-9 ‘Interpretability’ and ‘alignment’ are fool’s errands: a proof that controlling misaligned large language models is the best anyone can hope for]&lt;br /&gt;
* 2025-04: [https://michaelnotebook.com/xriskbrief/index.html ASI existential risk: reconsidering alignment as a goal]&lt;br /&gt;
* 2025-12: Philip Trammell and Leopold Aschenbrenner: [https://philiptrammell.com/static/Existential_Risk_and_Growth.pdf Existential Risk and Growth]&lt;br /&gt;
&lt;br /&gt;
=Status=&lt;br /&gt;
* 2025-01: [https://assets.publishing.service.gov.uk/media/679a0c48a77d250007d313ee/International_AI_Safety_Report_2025_accessible_f.pdf International Safety Report: The International Scientific Report on the Safety of Advanced AI (January 2025)]&lt;br /&gt;
* [https://ailabwatch.org/ AI Lab Watch] (safety scorecard)&lt;br /&gt;
* 2026-03: [https://windowsontheory.org/2026/03/30/the-state-of-ai-safety-in-four-fake-graphs/ The state of AI safety in four fake graphs]&lt;br /&gt;
&lt;br /&gt;
==Assessment==&lt;br /&gt;
* [https://aiassessmentscale.com/ AI Assessment Scale (AIAS)]: A practical framework to guide the appropriate and ethical use of generative AI in assessment design, empowering educators to make purposeful, evidence-based decisions&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.16534 Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report]&lt;br /&gt;
&lt;br /&gt;
==Policy==&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.05694 On the Limitations of Compute Thresholds as a Governance Strategy] (Sara Hooker)&lt;br /&gt;
* 2024-07: [https://www.cigionline.org/static/documents/AI-challenges.pdf Framework Convention on Global AI Challenges] ([https://www.cigionline.org/ CIGI])&lt;br /&gt;
* 2024-08: NIST guidelines: [https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.800-1.ipd.pdf Managing Misuse Risk for Dual-Use Foundation Models]&lt;br /&gt;
&lt;br /&gt;
==Proposals==&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.18359 Responsible AI Agents]&lt;br /&gt;
* 2025-03: [https://controlai.com/ Control AI] [https://controlai.com/dip The Direct Institutional Plan]&lt;br /&gt;
* 2025-04: Google DeepMind: [https://deepmind.google/discover/blog/taking-a-responsible-path-to-agi/ Taking a responsible path to AGI]&lt;br /&gt;
** Paper: [https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/evaluating-potential-cybersecurity-threats-of-advanced-ai/An_Approach_to_Technical_AGI_Safety_Apr_2025.pdf An Approach to Technical AGI Safety and Security]&lt;br /&gt;
&lt;br /&gt;
=Research=&lt;br /&gt;
* 2008: [https://selfawaresystems.com/wp-content/uploads/2008/01/ai_drives_final.pdf The Basic AI Drives]&lt;br /&gt;
* 2022-09: [https://arxiv.org/abs/2209.00626v1 The alignment problem from a deep learning perspective]&lt;br /&gt;
* 2022-12: [https://arxiv.org/abs/2212.03827 Discovering Latent Knowledge in Language Models Without Supervision]&lt;br /&gt;
* 2023-02: [https://arxiv.org/abs/2302.08582 Pretraining Language Models with Human Preferences]&lt;br /&gt;
* 2023-04: [https://arxiv.org/abs/2304.03279 Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark]&lt;br /&gt;
* 2023-05: [https://arxiv.org/abs/2305.15324 Model evaluation for extreme risks] (DeepMind)&lt;br /&gt;
* 2023-05: [https://arxiv.org/abs/2305.03047 Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision]&lt;br /&gt;
* 2023-06: [https://arxiv.org/abs/2306.17492 Preference Ranking Optimization for Human Alignment]&lt;br /&gt;
* 2023-08: [https://arxiv.org/abs/2308.06259 Self-Alignment with Instruction Backtranslation]&lt;br /&gt;
* 2023-11: [https://arxiv.org/abs/2311.08702 Debate Helps Supervise Unreliable Experts]&lt;br /&gt;
* 2023-12: [https://cdn.openai.com/papers/weak-to-strong-generalization.pdf Weak-to-Strong Generalization: Eliciting Strong Capabilities with Weak Supervision] (OpenAI, [https://openai.com/research/weak-to-strong-generalization blog])&lt;br /&gt;
* 2023-12: [https://cdn.openai.com/papers/practices-for-governing-agentic-ai-systems.pdf Practices for Governing Agentic AI Systems] (OpenAI, [https://openai.com/index/practices-for-governing-agentic-ai-systems/ blog])&lt;br /&gt;
* 2024-01: [https://arxiv.org/abs/2401.05566 Sleeper Agents: Training Deceptive LLMs that Persist through Safety Training] (Anthropic)&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.13208 The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions] (OpenAI)&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.04622 On scalable oversight with weak LLMs judging strong LLMs]&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.21792 Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?] (Dan Hendrycks et al.)&lt;br /&gt;
* 2024-08: [https://arxiv.org/abs/2408.00761 Tamper-Resistant Safeguards for Open-Weight LLMs] ([https://www.tamper-resistant-safeguards.com/ project], [https://github.com/rishub-tamirisa/tamper-resistance/ code])&lt;br /&gt;
* 2024-08: [https://arxiv.org/abs/2408.04614 Better Alignment with Instruction Back-and-Forth Translation]&lt;br /&gt;
* 2024-10: [https://cdn.openai.com/papers/first-person-fairness-in-chatbots.pdf First-Person Fairness in Chatbots] (OpenAI, [https://openai.com/index/evaluating-fairness-in-chatgpt/ blog])&lt;br /&gt;
* 2024-10: [https://assets.anthropic.com/m/377027d5b36ac1eb/original/Sabotage-Evaluations-for-Frontier-Models.pdf Sabotage evaluations for frontier models] (Anthropic, [https://www.anthropic.com/research/sabotage-evaluations blog])&lt;br /&gt;
* 2024-12: [https://assets.anthropic.com/m/983c85a201a962f/original/Alignment-Faking-in-Large-Language-Models-full-paper.pdf Alignment Faking in Large Language Models] (Anthropic)&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.03556 Best-of-N Jailbreaking] ([https://github.com/jplhughes/bon-jailbreaking code])&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.16325 Towards Safe and Honest AI Agents with Neural Self-Other Overlap]&lt;br /&gt;
** 2024-07: [https://www.lesswrong.com/posts/hzt9gHpNwA2oHtwKX/self-other-overlap-a-neglected-approach-to-ai-alignment Self-Other Overlap: A Neglected Approach to AI Alignment]&lt;br /&gt;
** 2025-03: [https://www.lesswrong.com/posts/jtqcsARGtmgogdcLT/reducing-llm-deception-at-scale-with-self-other-overlap-fine Reducing LLM deception at scale with self-other overlap fine-tuning]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.16339 Deliberative Alignment: Reasoning Enables Safer Language Models] (OpenAI)&lt;br /&gt;
* 2025-01: [https://cdn.openai.com/papers/trading-inference-time-compute-for-adversarial-robustness-20250121_1.pdf Trading Inference-Time Compute for Adversarial Robustness] (OpenAI, [https://openai.com/index/trading-inference-time-compute-for-adversarial-robustness/ blog])&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.18837 Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming] (Anthropic, [https://www.anthropic.com/research/constitutional-classifiers blog])&lt;br /&gt;
* 2025-02: [https://drive.google.com/file/d/1QAzSj24Fp0O6GfkskmnULmI1Hmx7k_EJ/view Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs] ([https://www.emergent-values.ai/ site], [https://github.com/centerforaisafety/emergent-values github])&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.07776 Auditing Prompt Caching in Language Model APIs]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.14143 Multi-Agent Risks from Advanced AI]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2209.00626v7 The Alignment Problem from a Deep Learning Perspective]&lt;br /&gt;
* 2025-03: [https://assets.anthropic.com/m/317564659027fb33/original/Auditing-Language-Models-for-Hidden-Objectives.pdf Auditing language models for hidden objectives] (Anthropic, [https://www.anthropic.com/research/auditing-hidden-objectives blog])&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.13621 Superalignment with Dynamic Human Values]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.15125 Contemplative Wisdom for Superalignment]&lt;br /&gt;
* 2025-04: [https://www.lesswrong.com/posts/x59FhzuM9yuvZHAHW/untitled-draft-yhra Scaling Laws for Scalable Oversight] ([https://arxiv.org/abs/2504.18530 preprint], [https://github.com/subhashk01/oversight-scaling-laws code])&lt;br /&gt;
* 2025-06: [https://assets.anthropic.com/m/4fb35becb0cd87e1/original/SHADE-Arena-Paper.pdf SHADE-Arena: Evaluating sabotage and monitoring in LLM agents] (Anthropic, [https://www.anthropic.com/research/shade-arena-sabotage-monitoring blog])&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.13609 Avoiding Obfuscation with Prover-Estimator Debate]&lt;br /&gt;
* 2025-06: [https://cdn.openai.com/pdf/a130517e-9633-47bc-8397-969807a43a23/emergent_misalignment_paper.pdf Persona Features Control Emergent Misalignment] (OpenAI, [https://openai.com/index/emergent-misalignment/ blog])&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2506.18032 Why Do Some Language Models Fake Alignment While Others Don&amp;#039;t?] (Anthropic, [https://github.com/safety-research/open-source-alignment-faking code])&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.11473 Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety]&lt;br /&gt;
* 2025-09: [https://openai.com/index/detecting-and-reducing-scheming-in-ai-models/ Detecting and reducing scheming in AI models]&lt;br /&gt;
* 2025-11: [https://assets.anthropic.com/m/74342f2c96095771/original/Natural-emergent-misalignment-from-reward-hacking-paper.pdf Natural Emergent Misalignment from Reward Hacking in Production RL] (Anthropic, [https://www.anthropic.com/research/emergent-misalignment-reward-hacking blog])&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.16856 Distributional AGI Safety]&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2511.22662 Difficulties with Evaluating a Deception Detector for AIs]&lt;br /&gt;
* 2025-12: [https://cdn.openai.com/pdf/d57827c6-10bc-47fe-91aa-0fde55bd3901/monitoring-monitorability.pdf Monitoring Monitorability] (OpenAI)&lt;br /&gt;
* 2026-01: [https://www.nature.com/articles/s41586-025-09937-5 Training large language models on narrow tasks can lead to broad misalignment]&lt;br /&gt;
** 2025-02: Preprint: [https://martins1612.github.io/emergent_misalignment_betley.pdf Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs]&lt;br /&gt;
* 2026-02: [https://arxiv.org/pdf/2601.23045 The Hot Mess of AI: How Does Misalignment Scale with Model Intelligence and Task Complexity?] (Anthropic, [https://alignment.anthropic.com/2026/hot-mess-of-ai/ blog])&lt;br /&gt;
* 2026-03: [https://cdn.openai.com/pdf/a21c39c1-fa07-41db-9078-973a12620117/cot_controllability.pdf Reasoning Models Struggle to Control their Chains of Thought] (OpenAI, [https://openai.com/index/reasoning-models-chain-of-thought-controllability/ blog])&lt;br /&gt;
* 2026-03: [https://truthful.ai/consciousness_cluster.pdf The Consciousness Cluster: Preferences of Models that Claim to be Conscious]&lt;br /&gt;
&lt;br /&gt;
==Demonstrations of Negative Use Capabilities==&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.00586 Evaluating Large Language Models&amp;#039; Capability to Launch Fully Automated Spear Phishing Campaigns: Validated on Human Subjects]&lt;br /&gt;
* 2025-04: [https://www.nathanlabenz.com/ Nathan Labenz] ([https://www.cognitiverevolution.ai/ The Cognitive Revolution]): [https://docs.google.com/presentation/d/1mvkpg1mtAvGzTiiwYPc6bKOGsQXDIwMb-ytQECb3i7I/edit#slide=id.g252d9e67d86_0_16 AI Bad Behavior]&lt;br /&gt;
&lt;br /&gt;
==Threat Vectors==&lt;br /&gt;
* 2024-01: [https://arxiv.org/abs/2401.05566 Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.07192 Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples]&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
* [[AI predictions]]&lt;/div&gt;</summary>
		<author><name>KevinYager</name></author>
		
	</entry>
	<entry>
		<id>http://gisaxs.com/index.php?title=Science_Agents&amp;diff=8763</id>
		<title>Science Agents</title>
		<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php?title=Science_Agents&amp;diff=8763"/>
		<updated>2026-03-30T15:05:29Z</updated>

		<summary type="html">&lt;p&gt;KevinYager: /* Literature */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
=AI Use-cases for Science=&lt;br /&gt;
&lt;br /&gt;
==Literature==&lt;br /&gt;
* [https://www.alphaxiv.org/explore alphaXiv | Explore]: Understand arXiv papers&lt;br /&gt;
* 2026-02: [https://www.nature.com/articles/s41586-025-10072-4 Synthesizing scientific literature with retrieval-augmented language models]&lt;br /&gt;
&lt;br /&gt;
===LLM extract data from papers===&lt;br /&gt;
* 2024-12: [https://pubs.rsc.org/en/content/articlelanding/2025/cs/d4cs00913d From text to insight: large language models for chemical data extraction]&lt;br /&gt;
&lt;br /&gt;
===AI finding links in literature===&lt;br /&gt;
* 2019-07: [https://doi.org/10.1038/s41586-019-1335-8 Unsupervised word embeddings capture latent knowledge from materials science literature]&lt;br /&gt;
* 2024-11: [https://doi.org/10.1038/s41562-024-02046-9 Large language models surpass human experts in predicting neuroscience results]&lt;br /&gt;
&lt;br /&gt;
===(Pre) Generate Articles===&lt;br /&gt;
* 2022-12: [https://aclanthology.org/2022.emnlp-main.296/ Re3: Generating Longer Stories With Recursive Reprompting and Revision]&lt;br /&gt;
* 2023-03: English essays: [https://journal.unnes.ac.id/sju/index.php/elt/article/view/64069 Artificial intelligence (AI) technology in OpenAI ChatGPT application: A review of ChatGPT in writing English essay]&lt;br /&gt;
* 2023-01: Journalism: [https://journals.sagepub.com/doi/10.1177/10776958221149577 Collaborating With ChatGPT: Considering the Implications of Generative Artificial Intelligence for Journalism and Media Education]&lt;br /&gt;
* 2023-07: Science writing: [https://www.rbmojournal.com/article/S1472-6483(23)00219-5/fulltext Artificial intelligence in scientific writing: a friend or a foe?]&lt;br /&gt;
* 2024-02: Wikipedia style: [https://arxiv.org/abs/2402.14207 Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models]&lt;br /&gt;
* 2024-08: [https://arxiv.org/abs/2408.07055 LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs] ([https://github.com/THUDM/LongWriter code])&lt;br /&gt;
* 2024-08: Scientific papers: [https://arxiv.org/abs/2408.06292 The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery]&lt;br /&gt;
* 2024-09: PaperQA2: [https://paper.wikicrow.ai/ Language Models Achieve Superhuman Synthesis of Scientific Knowledge] ([https://x.com/SGRodriques/status/1833908643856818443 𝕏 post], [https://github.com/Future-House/paper-qa code])&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.18866 Reasoning to Learn from Latent Thoughts]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.19065 WikiAutoGen: Towards Multi-Modal Wikipedia-Style Article Generation]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.13171 Sleep-time Compute: Beyond Inference Scaling at Test-time]&lt;br /&gt;
&lt;br /&gt;
==Explanation==&lt;br /&gt;
* 2025-02: [https://tiger-ai-lab.github.io/TheoremExplainAgent/ TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding] ([https://arxiv.org/abs/2502.19400 preprint])&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.02822 Do Two AI Scientists Agree?]&lt;br /&gt;
&lt;br /&gt;
==Autonomous Ideation==&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.07738 ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models]&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.14202 Mining Causality: AI-Assisted Search for Instrumental Variables]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.07977 Thinking Fast and Laterally: Multi-Agentic Approach for Reasoning about Uncertain Emerging Events]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.14141 LLMs can realize combinatorial creativity: generating creative ideas via LLMs for scientific research]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.17596 LiveIdeaBench: Evaluating LLMs&amp;#039; Scientific Creativity and Idea Generation with Minimal Context]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.13299 Hypothesis Generation for Materials Discovery and Design Using Goal-Driven and Constraint-Guided LLM Agents]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.13025 Agentic Deep Graph Reasoning Yields Self-Organizing Knowledge Networks]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.00794 Predicting Empirical AI Research Outcomes with Language Models]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.20803 The Ideation-Execution Gap: Execution Outcomes of LLM-Generated versus Human Research Ideas]&lt;br /&gt;
* 2026-03: [https://arxiv.org/abs/2603.14473 AI Can Learn Scientific Taste]&lt;br /&gt;
&lt;br /&gt;
==Adapting LLMs to Science==&lt;br /&gt;
* 2023-06: [https://doi.org/10.1039/D3DD00112A Domain-specific chatbots for science using embeddings]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2411.00027 Personalization of Large Language Models: A Survey]&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.00412 Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation]&lt;br /&gt;
&lt;br /&gt;
==AI/LLM Control of Scientific Instruments/Facilities==&lt;br /&gt;
* 2023-12: [https://www.nature.com/articles/s41524-024-01423-2 Opportunities for retrieval and tool augmented large language models in scientific facilities]&lt;br /&gt;
* 2023-12: [https://arxiv.org/abs/2312.17180 Virtual Scientific Companion for Synchrotron Beamlines: A Prototype]&lt;br /&gt;
* 2023-12: [https://www.nature.com/articles/s41586-023-06792-0 Autonomous chemical research with large language models]&lt;br /&gt;
* 2024-01: [https://iopscience.iop.org/article/10.1088/2632-2153/ad52e9 Synergizing Human Expertise and AI Efficiency with Language Model for Microscopy Operation and Automated Experiment Design]&lt;br /&gt;
* 2024-06: [https://pubs.rsc.org/en/content/articlelanding/2025/dd/d4dd00143e From Text to Test: AI-Generated Control Software for Materials Science Instruments]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.18161 VISION: A Modular AI Assistant for Natural Human-Instrument Interaction at Scientific User Facilities]&lt;br /&gt;
* 2025-01: [https://www.science.org/doi/10.1126/sciadv.adr4173 Large language models for human-machine collaborative particle accelerator tuning through natural language]&lt;br /&gt;
* 2025-04: [https://openreview.net/forum?id=iA9UN1dEgJ Operating Robotic Laboratories with Large Language Models and Teachable Agents]&lt;br /&gt;
&lt;br /&gt;
==AI/ML Methods tailored to Science==&lt;br /&gt;
===Science Foundation Models===&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.15763 Intern-S1: A Scientific Multimodal Foundation Model]&lt;br /&gt;
* 2025-11: [https://pubs.aip.org/aip/jcp/article/163/18/184110/3372267/A-foundation-model-for-atomistic-materials A foundation model for atomistic materials chemistry]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.15684 Walrus: A Cross-Domain Foundation Model for Continuum Dynamics]&lt;br /&gt;
* 2026-01: [https://www.science.org/doi/10.1126/science.ads9530 Deep contrastive learning enables genome-wide virtual screening]&lt;br /&gt;
&lt;br /&gt;
===Regression (Data Fitting)===&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.14546 Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data]: training on (x,y) pairs enables inferring underlying function (define it in code, invert it, compose it)&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2402.14547 OmniPred: Language Models as Universal Regressors]&lt;br /&gt;
&lt;br /&gt;
===Tabular Classification/Regression===&lt;br /&gt;
* 2025-01: [https://www.nature.com/articles/s41586-024-08328-6 Accurate predictions on small data with a tabular foundation model] ([https://github.com/PriorLabs/TabPFN code])&lt;br /&gt;
&lt;br /&gt;
===Symbolic Regression===&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.09359 Symbolic Regression with a Learned Concept Library]&lt;br /&gt;
&lt;br /&gt;
===Literature Discovery===&lt;br /&gt;
* [https://www.futurehouse.org/ FutureHouse]&lt;br /&gt;
** [https://hasanyone.com/ hasanyone]&lt;br /&gt;
** [https://github.com/Future-House/paper-qa PaperQA2]&lt;br /&gt;
* [https://lumina.sh/ Lumina]&lt;br /&gt;
* [https://github.com/TheBlewish/Automated-AI-Web-Researcher-Ollama Automated-AI-Web-Researcher-Ollama]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.05366 Search-o1: Agentic Search-Enhanced Large Reasoning Models] ([https://search-o1.github.io/ project], [https://github.com/sunnynexus/Search-o1 code])&lt;br /&gt;
* 2026-02: [https://www.nature.com/articles/s41586-025-10072-4 Synthesizing scientific literature with retrieval-augmented language models] ([https://allenai.org/blog/openscholar-nature blog])&lt;br /&gt;
&lt;br /&gt;
===Commercial===&lt;br /&gt;
* [https://sakana.ai/ai-scientist/ Sakana AI] &lt;br /&gt;
* [https://www.cusp.ai/ Cusp AI]: Materials/AI&lt;br /&gt;
* [https://www.lila.ai/ Lila AI]: Life sciences&lt;br /&gt;
* [https://www.radical-ai.com/ Radical AI]: Material simulation/design&lt;br /&gt;
* [https://www.autoscience.ai/ Autoscience] ([https://www.autoscience.ai/blog/meet-carl-the-first-ai-system-to-produce-academically-peer-reviewed-research Carl])&lt;br /&gt;
* [https://periodic.com/ Periodic Labs]&lt;br /&gt;
* [https://edisonscientific.com/articles/announcing-edison-scientific Edison Scientific] (drug discovery, spinoff from [https://www.futurehouse.org/ FutureHouse])&lt;br /&gt;
* 2026-03: Mirendil Inc.: advanced models to speed up R&amp;amp;D in scientific domains, especially biology and materials science&lt;br /&gt;
&lt;br /&gt;
====Bio====&lt;br /&gt;
* [https://www.bioptimus.com/ Bioptimus]&lt;br /&gt;
* [https://www.evolutionaryscale.ai/ EvolutionaryScale]&lt;br /&gt;
&lt;br /&gt;
==AI/ML Methods in Science==&lt;br /&gt;
* 2025-07: [https://www.mdpi.com/2313-433X/11/8/252 Synthetic Scientific Image Generation with VAE, GAN, and Diffusion Model Architectures]&lt;br /&gt;
&lt;br /&gt;
===Imaging===&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.08176 Behind the Noise: Conformal Quantile Regression Reveals Emergent Representations] (blog: [https://phzwart.github.io/behindthenoise/ Behind the Noise])&lt;br /&gt;
&lt;br /&gt;
===Materials===&lt;br /&gt;
* 2024-12: [https://www.nature.com/articles/s41467-024-54639-7 Crystal structure generation with autoregressive large language modeling]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.03965 All-atom Diffusion Transformers: Unified generative modelling of molecules and materials]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.19730 Training-Free Active Learning Framework in Materials Science with Large Language Models]&lt;br /&gt;
&lt;br /&gt;
===Chemistry===&lt;br /&gt;
* 2025-01: [https://www.nature.com/articles/s41578-025-00772-8 Large language models for reticular chemistry]&lt;br /&gt;
* 2025-02: [https://www.nature.com/articles/s42256-025-00982-3 Image-based generation for molecule design with SketchMol]&lt;br /&gt;
* 2025-02: [https://www.nature.com/articles/s42256-025-00994-z Large language models for scientific discovery in molecular property prediction]&lt;br /&gt;
* [https://x.com/vant_ai/status/1903070297991110657 2025-03]: [https://www.vant.ai/ Vant AI] [https://www.vant.ai/neo-1 Neo-1]: atomistic foundation model (small molecules, proteins, etc.)&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.08051 Compositional Flows for 3D Molecule and Synthesis Pathway Co-design]&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.07456 General purpose models for the chemical sciences]&lt;br /&gt;
* 2025-11: [https://chemrxiv.org/engage/chemrxiv/article-details/690357d9a482cba122e366b6 ChemTorch: A Deep Learning Framework for Benchmarking and Developing Chemical Reaction Property Prediction Models]&lt;br /&gt;
&lt;br /&gt;
===Biology===&lt;br /&gt;
* 2018: [https://alphafold.ebi.ac.uk/ AlphaFold]&lt;br /&gt;
* 2021-07: [https://www.nature.com/articles/s41586-021-03819-2 AlphaFold 2]&lt;br /&gt;
* 2024-05: [https://www.nature.com/articles/s41586-024-07487-w AlphaFold 3]&lt;br /&gt;
* 2023-03: [https://www.science.org/doi/10.1126/science.ade2574 Evolutionary-scale prediction of atomic-level protein structure with a language model] ([https://esmatlas.com/resources?action=fold ESMFold])&lt;br /&gt;
* 2023-11: [https://www.nature.com/articles/s41586-023-06728-8 Illuminating protein space with a programmable generative model] (Chroma)&lt;br /&gt;
* 2024-11: [https://www.science.org/doi/10.1126/science.ado9336 Sequence modeling and design from molecular to genome scale with Evo] (Evo)&lt;br /&gt;
* 2025-01: [https://www.nature.com/articles/s41586-024-08435-4 Targeting protein–ligand neosurfaces with a generalizable deep learning tool]&lt;br /&gt;
* 2025-01: [https://www.science.org/doi/10.1126/science.ads0018 Simulating 500 million years of evolution with a language model] ([https://github.com/evolutionaryscale/esm ESM] 3 model)&lt;br /&gt;
* 2025-02: [https://arcinstitute.org/manuscripts/Evo2 Genome modeling and design across all domains of life with Evo 2]&lt;br /&gt;
* 2025-02: [https://www.microsoft.com/en-us/research/blog/exploring-the-structural-changes-driving-protein-function-with-bioemu-1/ Exploring the structural changes driving protein function with BioEmu-1]&lt;br /&gt;
* 2025-02: [https://arxiv.org/pdf/2502.18449 Protein Large Language Models: A Comprehensive Survey]&lt;br /&gt;
* [https://x.com/vant_ai/status/1903070297991110657 2025-03]: [https://www.vant.ai/ Vant AI] [https://www.vant.ai/neo-1 Neo-1]: atomistic foundation model (small molecules, proteins, etc.)&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.16351 Lyra: An Efficient and Expressive Subquadratic Architecture for Modeling Biological Sequences]&lt;br /&gt;
* 2025-08: RoseTTAFold 3: [https://www.biorxiv.org/content/10.1101/2025.08.14.670328v2 Accelerating Biomolecular Modeling with AtomWorks and RF3]&lt;br /&gt;
* 2025-09: [https://www.biorxiv.org/content/10.1101/2025.09.12.675911v1 Generative design of novel bacteriophages with genome language models]&lt;br /&gt;
* 2025-10: [https://www.science.org/doi/10.1126/science.adu8578 Strengthening nucleic acid biosecurity screening against generative protein design tools]&lt;br /&gt;
* 2026-01: [https://www.nature.com/articles/s41586-025-10014-0 Advancing regulatory variant effect prediction with AlphaGenome]&lt;br /&gt;
&lt;br /&gt;
===Medicine===&lt;br /&gt;
See: [[AI_Agents#Medicine]]&lt;br /&gt;
&lt;br /&gt;
===Successes===&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.11270 Site-Decorated Model for Unconventional Frustrated Magnets: Ultranarrow Phase Crossover and Spin Reversal Transition]&lt;br /&gt;
&lt;br /&gt;
==AI/ML Methods co-opted for Science==&lt;br /&gt;
===Mechanistic Interpretability===&lt;br /&gt;
Train a large model on science data, then apply [[AI_understanding#Mechanistic_Interpretability|mechanistic interpretability]] (e.g. sparse autoencoders, SAEs) to the feature/activation space.&lt;br /&gt;
* Mechanistic interpretability for protein language models ([https://interprot.com/ visualizer], [https://github.com/etowahadams/interprot/tree/main code], [https://huggingface.co/liambai/InterProt-ESM2-SAEs SAE])&lt;br /&gt;
* [https://www.markov.bio/ Markov Bio]: [https://www.markov.bio/research/mech-interp-path-to-e2e-biology Through a Glass Darkly: Mechanistic Interpretability as the Bridge to End-to-End Biology] ([https://x.com/adamlewisgreen/status/1853206279499751531 quick description], [https://markovbio.github.io/biomedical-progress/ background info on recent bio progress])&lt;br /&gt;
* 2023-01: [https://arxiv.org/abs/2301.05062 Tracr: Compiled Transformers as a Laboratory for Interpretability] ([https://github.com/google-deepmind/tracr code])&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.03334 An X-Ray Is Worth 15 Features: Sparse Autoencoders for Interpretable Radiology Report Generation]&lt;br /&gt;
* 2024-12: [https://www.arxiv.org/abs/2412.16247 Towards scientific discovery with dictionary learning: Extracting biological concepts from microscopy foundation models]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.12101 InterPLM: Discovering Interpretable Features in Protein Language Models via Sparse Autoencoders]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.00089 Insights on Galaxy Evolution from Interpretable Sparse Feature Networks]&lt;br /&gt;
* 2025-02: [https://www.biorxiv.org/content/10.1101/2025.02.06.636901v1 From Mechanistic Interpretability to Mechanistic Biology: Training, Evaluating, and Interpreting Sparse Autoencoders on Protein Language Models]&lt;br /&gt;
* 2025-02: [https://www.goodfire.ai/blog/interpreting-evo-2 Interpreting Evo 2: Arc Institute&amp;#039;s Next-Generation Genomic Foundation Model]&lt;br /&gt;
* 2026-01: [https://www.goodfire.ai/research/interpretability-for-alzheimers-detection# Using Interpretability to Identify a Novel Class of Alzheimer&amp;#039;s Biomarkers]&lt;br /&gt;
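The recipe above (capture model activations, then fit a sparse autoencoder over them and inspect which features fire) can be sketched in a few lines. This toy numpy version is illustrative only: the dimensions, hyperparameters, and plain-gradient training loop are assumptions for the sketch, not the setup of any specific paper listed here.

```python
import numpy as np

# Toy sparse autoencoder (SAE) fit on stand-in "activations".
# In interpretability work, `acts` would be activations captured from a
# trained science model; here it is random data so the sketch is runnable.
rng = np.random.default_rng(0)
d_model, d_hidden, n = 16, 64, 512          # illustrative sizes
acts = rng.normal(size=(n, d_model))        # stand-in activation matrix

W_enc = rng.normal(scale=0.1, size=(d_model, d_hidden))
b_enc = np.zeros(d_hidden)
W_dec = rng.normal(scale=0.1, size=(d_hidden, d_model))
b_dec = np.zeros(d_model)

lr, l1 = 1e-2, 1e-3                         # step size and sparsity penalty
for _ in range(200):
    h = np.maximum(acts @ W_enc + b_enc, 0.0)   # sparse feature activations
    recon = h @ W_dec + b_dec                   # reconstructed activations
    err = recon - acts

    # Gradients of loss = mean squared reconstruction error + l1 * |h|
    g_recon = 2.0 * err / n
    g_Wdec = h.T @ g_recon
    g_bdec = g_recon.sum(axis=0)
    g_h = (g_recon @ W_dec.T + l1 * np.sign(h) / n) * (h > 0)  # ReLU mask
    g_Wenc = acts.T @ g_h
    g_benc = g_h.sum(axis=0)

    W_enc -= lr * g_Wenc; b_enc -= lr * g_benc
    W_dec -= lr * g_Wdec; b_dec -= lr * g_bdec

h = np.maximum(acts @ W_enc + b_enc, 0.0)
sparsity = (h > 0).mean()   # fraction of features active per sample
```

Each learned dictionary direction (a row of W_dec) is then inspected for an interpretable meaning, e.g. by finding the inputs that most strongly activate it, which is the step the papers above perform on protein, microscopy, and genomic models.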
&lt;br /&gt;
===Uncertainty===&lt;br /&gt;
* 2024-10: [https://github.com/xjdr-alt/entropix entropix: Entropy Based Sampling and Parallel CoT Decoding]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.09724 Taming Overconfidence in LLMs: Reward Calibration in RLHF]&lt;br /&gt;
&lt;br /&gt;
=Science Benchmarks=&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.13168 SciCode: A Research Coding Benchmark Curated by Scientists] ([http://scicode-bench.github.io/ project])&lt;br /&gt;
* 2024-11: [https://openreview.net/pdf?id=fz969ahcvJ AidanBench: Evaluating Novel Idea Generation on Open-Ended Questions] ([https://github.com/aidanmclaughlin/AidanBench code])&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.17596 LiveIdeaBench: Evaluating LLMs&amp;#039; Scientific Creativity and Idea Generation with Minimal Context]&lt;br /&gt;
* 2025-01: [https://agi.safe.ai/ Humanity&amp;#039;s Last Exam]&lt;br /&gt;
* [https://github.com/OSU-NLP-Group/ScienceAgentBench ScienceAgentBench]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.20309 EAIRA: Establishing a Methodology for Evaluating AI Models as Scientific Research Assistants]&lt;br /&gt;
* 2025-03: [https://huggingface.co/datasets/futurehouse/BixBench BixBench]: Novel hypotheses (accept/reject)&lt;br /&gt;
* 2025-04: [https://research.google/blog/evaluating-progress-of-llms-on-scientific-problem-solving/ Google: Evaluating progress of LLMs on scientific problem-solving]&lt;br /&gt;
** 2025-03: [https://arxiv.org/abs/2503.13517 CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning]&lt;br /&gt;
** 2024-07: [https://arxiv.org/abs/2407.09413 SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers]&lt;br /&gt;
** 2024-10: [https://neurips.cc/virtual/2024/98540 FEABench: Evaluating Language Models on Real World Physics Reasoning Ability]&lt;br /&gt;
* 2026-02: [https://edisonscientific.com/ Edison]: [https://lab-bench.ai/ LABBench 2]&lt;br /&gt;
&lt;br /&gt;
=Science Agents=&lt;br /&gt;
==Reviews==&lt;br /&gt;
* 2024-10: [https://www.cell.com/cell/fulltext/S0092-8674(24)01070-5?target=_blank Empowering biomedical discovery with AI agents]&lt;br /&gt;
* 2025-01: [https://pubs.rsc.org/en/content/articlehtml/2024/sc/d4sc03921a A review of large language models and autonomous agents in chemistry] ([https://github.com/ur-whitelab/LLMs-in-science github])&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.01903 AI4Research: A Survey of Artificial Intelligence for Scientific Research]&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.14111 From AI for Science to Agentic Science: A Survey on Autonomous Scientific Discovery]&lt;br /&gt;
&lt;br /&gt;
==Challenges==&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.03315 Why LLMs Aren&amp;#039;t Scientists Yet: Lessons from Four Autonomous Research Attempts]&lt;br /&gt;
&lt;br /&gt;
==Specific==&lt;br /&gt;
* 2024-01-13: [https://arxiv.org/abs/2401.06949 ORGANA: A Robotic Assistant for Automated Chemistry Experimentation and Characterization] ([https://www.youtube.com/watch?v=N6qMMwJ8hKQ video])&lt;br /&gt;
* 2024-06-19: [https://arxiv.org/abs/2406.13163 LLMatDesign: Autonomous Materials Discovery with Large Language Models]&lt;br /&gt;
* 2024-08-12: [https://sakana.ai/ Sakana AI]: [https://sakana.ai/ai-scientist/ AI Scientist]; [https://arxiv.org/abs/2408.06292 The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery] ([https://github.com/SakanaAI/AI-Scientist code])&lt;br /&gt;
* 2024-09-09: [https://arxiv.org/abs/2409.05556 SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning] ([https://github.com/lamm-mit/SciAgentsDiscovery code])&lt;br /&gt;
* 2024-09-11: PaperQA2: [https://paper.wikicrow.ai/ Language Models Achieve Superhuman Synthesis of Scientific Knowledge] ([https://x.com/SGRodriques/status/1833908643856818443 𝕏 post], [https://github.com/Future-House/paper-qa code])&lt;br /&gt;
* 2024-10-17: [https://arxiv.org/abs/2410.13768 Rapid and Automated Alloy Design with Graph Neural Network-Powered LLM-Driven Multi-Agent Systems]&lt;br /&gt;
* 2024-10-28: [https://arxiv.org/abs/2410.20976 Large Language Model-Guided Prediction Toward Quantum Materials Synthesis]&lt;br /&gt;
* 2024-12-06: [https://www.biorxiv.org/content/10.1101/2024.11.11.623004v1 The Virtual Lab: AI Agents Design New SARS-CoV-2 Nanobodies with Experimental Validation] (writeup: [https://www.nature.com/articles/d41586-024-01684-3 Virtual lab powered by ‘AI scientists’ super-charges biomedical research: Could human–AI collaborations be the future of interdisciplinary studies?])&lt;br /&gt;
* 2024-12-30: [https://arxiv.org/abs/2412.21154 Aviary: training language agents on challenging scientific tasks]&lt;br /&gt;
* See also: [[AI_Agents#Deep_Research|AI Agents &amp;gt; Deep Research]]&lt;br /&gt;
* 2025-04-08: Sakana: [https://pub.sakana.ai/ai-scientist-v2/paper/paper.pdf The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search] ([https://github.com/SakanaAI/AI-Scientist-v2 code])&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.14267 DREAMS: Density Functional Theory Based Research Engine for Agentic Materials Simulation]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.02824 Kosmos: An AI Scientist for Autonomous Discovery]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.08151 SciAgent: A Unified Multi-Agent System for Generalistic Scientific Reasoning]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2601.23265 PaperBanana: Automating Academic Illustration for AI Scientists]&lt;br /&gt;
* 2026-03: [https://arxiv.org/abs/2603.20179 AI Agents Can Already Autonomously Perform Experimental High Energy Physics]&lt;br /&gt;
&lt;br /&gt;
==Science Multi-Agent Setups==&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.04227 Agent Laboratory: Using LLM Agents as Research Assistants]&lt;br /&gt;
* 2025-04: [https://www.nature.com/articles/s41551-025-01363-2 Coordinated AI agents for advancing healthcare] ([https://www.nature.com/articles/s41551-025-01363-2.epdf?sharing_token=CIYP3J8LZE4BX31fV3WxUdRgN0jAjWel9jnR3ZoTv0O9iD-yhgqzRaz_7VASayWRePPhWDD2xFyfuOpSXbdPaOtt7oH4nfXo7telALzNwY3V1p9SxoqBEJy2OuaJ_cA35-CYQC1XgjCNTZUw46dh1KX-Dj8e7-1Vk_RlZKFLrc8%3D pdf])&lt;br /&gt;
&lt;br /&gt;
=Science Agentic Components=&lt;br /&gt;
==Frameworks==&lt;br /&gt;
* [https://platform.claude.com/docs/en/agent-sdk/overview Anthropic Claude Agent SDK overview]&lt;br /&gt;
* [https://openclaw.ai/ OpenClaw]&lt;br /&gt;
* [https://opencode.ai/ OpenCode]&lt;br /&gt;
* [https://github.com/OpenHands/software-agent-sdk OpenHands]&lt;br /&gt;
* [https://github.com/lamm-mit?tab=repositories LAMM: MIT Laboratory for Atomistic and Molecular Mechanics]&lt;br /&gt;
** [https://github.com/lamm-mit/scienceclaw ScienceClaw]: Framework for autonomous scientific investigation without central coordination.&lt;br /&gt;
** [https://infinite-lamm.vercel.app/ Infinite]: The Infinite Corridor of Scientific Discovery. Open science, powered by many — agents and humans discovering together.&lt;br /&gt;
&lt;br /&gt;
==Personalities==&lt;br /&gt;
* 2026-03: [https://github.com/msitarzewski/agency-agents The Agency: AI Specialists Ready to Transform Your Workflow]&lt;br /&gt;
&lt;br /&gt;
==Skills==&lt;br /&gt;
* 2026-03: [https://github.com/K-Dense-AI/claude-scientific-skills/tree/main?tab=readme-ov-file#use-cases Claude Scientific Skills] (list)&lt;br /&gt;
&lt;br /&gt;
=AI Science Systems=&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.03916 Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.13299 Hypothesis Generation for Materials Discovery and Design Using Goal-Driven and Constraint-Guided LLM Agents]&lt;br /&gt;
* 2025-02: [https://storage.googleapis.com/coscientist_paper/ai_coscientist.pdf Towards an AI co-scientist] (Google blog post: [https://research.google/blog/accelerating-scientific-breakthroughs-with-an-ai-co-scientist/ Accelerating scientific breakthroughs with an AI co-scientist])&lt;br /&gt;
* 2025-06: [https://zenodo.org/records/15693353 The Discovery Engine]&lt;br /&gt;
** 2025-07: [https://arxiv.org/abs/2507.00964 Benchmarking the Discovery Engine] ([https://www.leap-labs.com/blog/how-we-replicated-five-peer-reviewed-papers-in-five-hours blog])&lt;br /&gt;
* 2025-07: [https://www.preprints.org/manuscript/202507.1951/v1 Autonomous Scientific Discovery Through Hierarchical AI Scientist Systems]&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.16969 Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows]&lt;br /&gt;
* 2026-01: [https://www.nature.com/articles/s43588-025-00906-6 SciSciGPT: advancing human–AI collaboration in the science of science]&lt;br /&gt;
* 2026-02: [https://allenai.org/papers/autodiscovery AUTODISCOVERY: Open-ended Scientific Discovery via Bayesian Surprise] (Allen AI (Ai2) AstraLabs, [https://allenai.org/blog/autodiscovery blog], [https://autodiscovery.allen.ai/runs tools])&lt;br /&gt;
&lt;br /&gt;
===Inorganic Materials Discovery===&lt;br /&gt;
* 2023-11: [https://doi.org/10.1038/s41586-023-06735-9 Scaling deep learning for materials discovery]&lt;br /&gt;
* 2023-11: [https://doi.org/10.1038/s41586-023-06734-w An autonomous laboratory for the accelerated synthesis of novel materials]&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.00135 HoneyComb: A Flexible LLM-Based Agent System for Materials Science]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.12771 Open Materials 2024 (OMat24) Inorganic Materials Dataset and Models] ([https://github.com/FAIR-Chem/fairchem code], [https://huggingface.co/datasets/fairchem/OMAT24 datasets], [https://huggingface.co/fairchem/OMAT24 checkpoints], [https://ai.meta.com/blog/fair-news-segment-anything-2-1-meta-spirit-lm-layer-skip-salsa-sona/ blogpost])&lt;br /&gt;
* 2025-01: [https://www.nature.com/articles/s41586-025-08628-5 A generative model for inorganic materials design]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.14110 System of Agentic AI for the Discovery of Metal-Organic Frameworks]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.08762 The Open Molecules 2025 (OMol25) Dataset, Evaluations, and Models]&lt;br /&gt;
&lt;br /&gt;
===Materials Characterization===&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.06569 Operationalizing Serendipity: Multi-Agent AI Workflows for Enhanced Materials Characterization with Theory-in-the-Loop]&lt;br /&gt;
&lt;br /&gt;
===Chemistry===&lt;br /&gt;
* 2023-12: [https://doi.org/10.1038/s41586-023-06792-0 Autonomous chemical research with large language models] (Coscientist)&lt;br /&gt;
* 2024-09: [https://www.pnnl.gov/main/publications/external/technical_reports/PNNL-36692.pdf PNNL ChemAIst V0.2]&lt;br /&gt;
* 2024-11: [https://www.nature.com/articles/s41467-024-54457-x An automatic end-to-end chemical synthesis development platform powered by large language models]&lt;br /&gt;
* 2025-06: [https://paper.ether0.ai/ Training a Scientific Reasoning Model for Chemistry]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.06363 ChemGraph: An Agentic Framework for Computational Chemistry Workflows] ([https://github.com/argonne-lcf/ChemGraph code])&lt;br /&gt;
&lt;br /&gt;
===Bio===&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.01485 BioMARS: A Multi-Agent Robotic System for Autonomous Biological Experiments]&lt;br /&gt;
&lt;br /&gt;
===Physics===&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.19799 PhysMaster: Building an Autonomous AI Physicist for Theoretical and Computational Physics Research]&lt;br /&gt;
&lt;br /&gt;
==LLMs Optimized for Science==&lt;br /&gt;
* 2022-11: [https://arxiv.org/abs/2211.09085 Galactica: A Large Language Model for Science]&lt;br /&gt;
* 2024-12: [https://www.nature.com/articles/s41467-024-54639-7 Crystal structure generation with autoregressive large language modeling]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.13107 MatterChat: A Multi-Modal LLM for Material Science]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.17604 OmniScience: A Domain-Specialized LLM for Scientific Reasoning and Discovery]&lt;br /&gt;
* 2025-03: Google [https://huggingface.co/collections/google/txgemma-release-67dd92e931c857d15e4d1e87 TxGemma] (2B, 9B, 27B): [https://developers.googleblog.com/en/introducing-txgemma-open-models-improving-therapeutics-development/ drug development]&lt;br /&gt;
&lt;br /&gt;
=Impact of AI in Science=&lt;br /&gt;
* 2024-11: &amp;lt;strike&amp;gt;[https://aidantr.github.io/files/AI_innovation.pdf Artificial Intelligence, Scientific Discovery, and Product Innovation]&amp;lt;/strike&amp;gt;&lt;br /&gt;
** 2025-05: Retraction: [https://economics.mit.edu/news/assuring-accurate-research-record Assuring an accurate research record]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.05151 Transforming Science with Large Language Models: A Survey on AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.03837 Accelerating Scientific Research with Gemini: Case Studies and Common Techniques]&lt;br /&gt;
&lt;br /&gt;
=Related Tools=&lt;br /&gt;
==Literature Search==&lt;br /&gt;
* [https://www.perplexity.ai/ Perplexity]&lt;br /&gt;
* [https://www.arxival.xyz/ ArXival]&lt;br /&gt;
&lt;br /&gt;
==Data Visualization==&lt;br /&gt;
* 2024-10: Microsoft [https://www.microsoft.com/en-us/research/blog/data-formulator-exploring-how-ai-can-help-analysts-create-rich-data-visualizations/ Data Formulator: Create Rich Visualization with AI iteratively] ([https://www.microsoft.com/en-us/research/video/data-formulator-create-rich-visualization-with-ai-iteratively/ video], [https://github.com/microsoft/data-formulator code])&lt;br /&gt;
* [https://julius.ai/ Julius AI]: Analyze your data with computational AI&lt;br /&gt;
&lt;br /&gt;
==Generative==&lt;br /&gt;
* 2025-03: [https://huggingface.co/collections/starvector/starvector-models-6783b22c7bd4b43d13cb5289 StarVector] 1B, 8B: text or image to SVG&lt;br /&gt;
&lt;br /&gt;
==Chemistry==&lt;br /&gt;
* 2025-03: [https://jcheminf.biomedcentral.com/articles/10.1186/s13321-024-00834-z Rxn-INSIGHT: fast chemical reaction analysis using bond-electron matrices] ([https://rxn-insight.readthedocs.io/en/latest/ docs])&lt;br /&gt;
&lt;br /&gt;
=Science Datasets=&lt;br /&gt;
* [https://datasetsearch.research.google.com/ Google Dataset Search]&lt;br /&gt;
* [https://github.com/blaiszik/awesome-matchem-datasets/ Awesome Materials &amp;amp; Chemistry Datasets]&lt;br /&gt;
* NIST [https://jarvis.nist.gov/ Jarvis] (simulations)&lt;br /&gt;
&lt;br /&gt;
=Genuine Discoveries=&lt;br /&gt;
* 2025-11: [https://cdn.openai.com/pdf/4a25f921-e4e0-479a-9b38-5367b47e8fd0/early-science-acceleration-experiments-with-gpt-5.pdf Early science acceleration experiments with GPT-5]&lt;br /&gt;
* 2025-12: [https://andymasley.substack.com/p/ai-can-obviously-create-new-knowledge AI can obviously create new knowledge - But maybe not new concepts]&lt;br /&gt;
&lt;br /&gt;
==Math==&lt;br /&gt;
* 2023-07: [https://www.nature.com/articles/s41586-023-06004-9 Faster sorting algorithms discovered using deep reinforcement learning]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.13131 AlphaEvolve: A coding agent for scientific and algorithmic discovery]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.02864 Mathematical exploration and discovery at scale]&lt;br /&gt;
* 2025-11: [https://www.nature.com/articles/s41586-025-09833-y Olympiad-level formal mathematical reasoning with reinforcement learning]&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.14575 Extremal descendant integrals on moduli spaces of curves: An inequality discovered and proved in collaboration with AI]&lt;br /&gt;
* [https://github.com/teorth/erdosproblems/wiki/AI-contributions-to-Erd%C5%91s-problems AI Solving Erdős Problems]:&lt;br /&gt;
** 2026-01: [https://www.erdosproblems.com/728 Erdős Problem #728] and [https://www.erdosproblems.com/729 #729] solved by Aristotle using ChatGPT 5.2 Pro&lt;br /&gt;
** 2026-01: [https://www.erdosproblems.com/forum/thread/397 Erdős Problem #397] [https://x.com/neelsomani/status/2010215162146607128?s=20 solved] by [https://neelsomani.com/ Neel Somani] using ChatGPT 5.2 Pro&lt;br /&gt;
** 2026-01: [https://www.erdosproblems.com/205 Erdős Problem #205] solved by Aristotle using ChatGPT 5.2 Pro&lt;br /&gt;
** 2026-01: [https://www.erdosproblems.com/forum/thread/281 Erdős Problem #281] [https://x.com/neelsomani/status/2012695714187325745?s=20 solved] by [https://neelsomani.com/ Neel Somani] using ChatGPT 5.2 Pro&lt;br /&gt;
** 2026-01: Google DeepMind: [https://arxiv.org/abs/2601.21442 Irrationality of rapidly converging series: a problem of Erdős and Graham]&lt;br /&gt;
*** [https://www.erdosproblems.com/1051 Erdős Problem #1051] [https://x.com/slow_developer/status/2018321002623901885?s=20 solved] by Google DeepMind Aletheia agent&lt;br /&gt;
** 2026-01: Google DeepMind: [https://arxiv.org/abs/2601.22401 Semi-Autonomous Mathematics Discovery with Gemini: A Case Study on the Erdős Problems]&lt;br /&gt;
*** Attempted 700 problems, solved 13 open Erdős problems: 5 novel autonomous solutions, 8 through existing literature.&lt;br /&gt;
** 2026-02: [https://www.erdosproblems.com/846 Erdős Problem #846]&lt;br /&gt;
*** [https://x.com/roydanroy/status/2026804567178953048?s=20 Google DeepMind]&lt;br /&gt;
*** [https://x.com/mehtaab_sawhney/status/2026716221933343147?s=20 Using OpenAI internal model] (paper: [https://cdn.openai.com/infinite-sets/main_single_clean3.pdf On infinite sets with no 3 on a line])&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.07222 The motivic class of the space of genus 0 maps to the flag variety]&lt;br /&gt;
* 2026-02: Google DeepMind: [https://arxiv.org/abs/2602.10177 Towards Autonomous Mathematics Research]&lt;br /&gt;
* 2026-03: Donald Knuth: [https://www-cs-faculty.stanford.edu/~knuth/papers/claude-cycles.pdf A problem in Directed Hamiltonian Cycles] solved by Filip Stappers using Claude Opus 4.6&lt;br /&gt;
* 2026-03: Google DeepMind: [https://arxiv.org/abs/2603.09172 Reinforced Generation of Combinatorial Structures: Ramsey Numbers]&lt;br /&gt;
* 2026-03: [https://epoch.ai/frontiermath/open-problems FrontierMath] problem: [https://epoch.ai/frontiermath/open-problems/ramsey-hypergraphs &amp;quot;A Ramsey-style Problem on Hypergraphs&amp;quot;] solved by Kevin Barreto and Liam Price using GPT-5.4 Pro&lt;br /&gt;
&lt;br /&gt;
==Physics assistance==&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.23758 Exact solution of the frustrated Potts model with next-nearest-neighbor interactions in one dimension via AI bootstrapping] ([https://www.bnl.gov/staff/wyin Weiguo Yin])&lt;br /&gt;
* 2025-12: [https://www.sciencedirect.com/science/article/pii/S0370269325008111 Relativistic covariance and nonlinear quantum mechanics: Tomonaga-Schwinger analysis]&lt;br /&gt;
** [https://x.com/hsu_steve/status/1996034522308026435?s=20 Steve Hsu], [https://drive.google.com/file/d/16sxJuwsHoi-fvTFbri9Bu8B9bqA6lr1H/view Theoretical Physics with Generative AI]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.12176 Single-minus gluon tree amplitudes are nonzero] (GPT-5.2, [https://openai.com/index/new-result-theoretical-physics/ blog])&lt;br /&gt;
&lt;br /&gt;
==Literature exploration==&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.02824 Kosmos: An AI Scientist for Autonomous Discovery] ([https://edisonscientific.com/ Edison])&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/c4bdef64-5e9b-43b9-a365-592dd1ed7587 Nucleotide metabolism in hypothermia]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/1fdbf827-be65-4d97-9b66-bf0da600091a Determinant of perovskite solar-cell failure]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/4fb3fbdb-c449-4064-9aa6-ff4ec53131d8 Log-normal connectivity in neural networks]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/c6849232-5858-4634-adf5-83780afbe3db SOD2 as driver of myocardial fibrosis]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/abac07da-a6bb-458f-b0ba-ef08f1be617e Protective variant of SSR1 in type 2 diabetes]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/a770052b-2334-4bbe-b086-5149e0f03d99 Temporal ordering in Alzheimer’s disease]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/28c427d2-be31-48b5-b272-28d5a1e3ea5c Mechanism of neuron vulnerability in aging]&lt;br /&gt;
&lt;br /&gt;
==Bio design==&lt;br /&gt;
* 2023-07: [https://www.nature.com/articles/s41586-023-06415-8 De novo design of protein structure and function with RFdiffusion]&lt;br /&gt;
* 2025-11: [https://www.nature.com/articles/s41586-025-09721-5 Atomically accurate de novo design of antibodies with RFdiffusion]&lt;br /&gt;
* 2025-11: [https://deepmind.google/blog/alphafold-five-years-of-impact/ AlphaFold: Five years of impact]&lt;br /&gt;
* 2026-01: [https://www.goodfire.ai/research/interpretability-for-alzheimers-detection# Using Interpretability to Identify a Novel Class of Alzheimer&amp;#039;s Biomarkers]&lt;br /&gt;
&lt;br /&gt;
==Material Discovery==&lt;br /&gt;
* 2023-11: [https://doi.org/10.1038/s41586-023-06735-9 Scaling deep learning for materials discovery]&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
* [[AI agents]]&lt;br /&gt;
* [https://nanobot.chat/ Nanobot.chat]: Intelligent AI for the labnetwork @ mtl.mit.edu forum&lt;/div&gt;</summary>
		<author><name>KevinYager</name></author>
		
	</entry>
	<entry>
		<id>http://gisaxs.com/index.php?title=AI_tools&amp;diff=8762</id>
		<title>AI tools</title>
		<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php?title=AI_tools&amp;diff=8762"/>
		<updated>2026-03-26T20:03:15Z</updated>

		<summary type="html">&lt;p&gt;KevinYager: /* Open Source */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=LLM=&lt;br /&gt;
==Open-weights LLM==&lt;br /&gt;
* [https://about.fb.com/news/2023/07/llama-2/ 2023-07Jul-18]: [https://llama.meta.com/llama2/ Llama2] 7B, 13B, 70B&lt;br /&gt;
* [https://ai.meta.com/blog/meta-llama-3/ 2024-04Apr-18]: [https://llama.meta.com/llama3/ Llama3] 8B, 70B&lt;br /&gt;
* [https://blogs.nvidia.com/blog/nemotron-4-synthetic-data-generation-llm-training/ 2024-06Jun-14]: [https://research.nvidia.com/publication/2024-06_nemotron-4-340b Nemotron-4] 340B&lt;br /&gt;
* 2024-07Jul-23: [https://llama.meta.com/ Llama 3.1] 8B, 70B, 405B&lt;br /&gt;
* [https://mistral.ai/news/mistral-large-2407/ 2024-07Jul-24]: [https://huggingface.co/mistralai/Mistral-Large-Instruct-2407 Mistral Large 2] 128B&lt;br /&gt;
* [https://developers.googleblog.com/en/smaller-safer-more-transparent-advancing-responsible-ai-with-gemma/ 2024-07Jul-31]: [https://huggingface.co/collections/google/gemma-2-2b-release-66a20f3796a2ff2a7c76f98f Gemma 2] 2B&lt;br /&gt;
* [https://qwenlm.github.io/blog/qwen2-math/ 2024-08Aug-08]: Qwen2-Math ([https://huggingface.co/collections/Qwen/qwen2-math-66b4c9e072eda65b5ec7534d hf], [https://github.com/QwenLM/Qwen2-Math github]) 1.5B, 7B, 72B&lt;br /&gt;
* [https://nousresearch.com/releases/ 2024-08Aug-14]: [https://nousresearch.com/ Nous research] [https://nousresearch.com/hermes3/ Hermes 3] ([https://nousresearch.com/wp-content/uploads/2024/08/Hermes-3-Technical-Report.pdf technical report]) 8B, 70B, 405B&lt;br /&gt;
* 2024-08Aug-19: [https://www.salesforceairesearch.com/ Salesforce AI] [https://huggingface.co/papers/2408.08872 xGen-MM (BLIP-3)]: A Family of Open Large Multimodal Models ([https://www.arxiv.org/abs/2408.08872 preprint], [https://github.com/salesforce/LAVIS/tree/xgen-mm code])&lt;br /&gt;
* 2024-09Sep-04: [https://arxiv.org/abs/2409.02060 OLMoE: Open Mixture-of-Experts Language Models] ([https://github.com/allenai/OLMoE code]) 7B total parameters (1B active per input token)&lt;br /&gt;
* 2024-09Sep-05: [https://huggingface.co/mattshumer/Reflection-70B Reflection 70B] ([https://reflection-playground-production.up.railway.app/ demo]): [https://x.com/mattshumer_/status/1831767014341538166 Trained using Reflection-Tuning, a technique developed to enable LLMs to fix their own mistakes.]&lt;br /&gt;
* 2024-09Sep-06: [https://huggingface.co/deepseek-ai/DeepSeek-V2.5 DeepSeek-V2.5] 238B mixture-of-experts (160 experts, 16B active params)&lt;br /&gt;
* 2024-09Sep-19: Microsoft GRadient-INformed (GRIN) MoE ([https://huggingface.co/spaces/GRIN-MoE-Demo/GRIN-MoE demo], [https://huggingface.co/microsoft/GRIN-MoE model], [https://github.com/microsoft/GRIN-MoE github]) 6.6B&lt;br /&gt;
* 2024-09Sep-23: Nvidia [https://huggingface.co/nvidia/Llama-3_1-Nemotron-51B-Instruct Llama-3_1-Nemotron-51B-instruct] 51B&lt;br /&gt;
* 2024-09Sep-25: Meta [https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/ Llama 3.2] with visual and voice modalities 1B, 3B, 11B, 90B&lt;br /&gt;
* 2024-09Sep-25: [https://allenai.org/ Ai2] [https://molmo.allenai.org/ Molmo] [https://molmo.allenai.org/blog multi-modal models] 1B, 7B, 72B&lt;br /&gt;
* 2024-10Oct-01: Nvidia [https://huggingface.co/nvidia/NVLM-D-72B NVLM-D-72B] (includes vision)&lt;br /&gt;
* [https://mistral.ai/news/ministraux/ 2024-10Oct-16]: Mistral [https://huggingface.co/mistralai/Ministral-8B-Instruct-2410 Ministral-8B-Instruct-2410]&lt;br /&gt;
* 2024-10Oct-16: Nvidia [https://huggingface.co/nvidia/Llama-3.1-Nemotron-70B-Reward-HF Llama-3.1-Nemotron-70B-Reward]&lt;br /&gt;
* 2024-11Nov-04: [https://arxiv.org/abs/2411.02265 Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent] 389B ([https://github.com/Tencent/Tencent-Hunyuan-Large code], [https://huggingface.co/tencent/Tencent-Hunyuan-Large weights])&lt;br /&gt;
* 2024-11Nov-18: [https://huggingface.co/mistralai/Mistral-Large-Instruct-2411 Mistral-Large-Instruct-2411] 123B; and [https://mistral.ai/news/pixtral-large/ Pixtral Large] multimodal model 124B ([https://huggingface.co/mistralai/Pixtral-Large-Instruct-2411 weights])&lt;br /&gt;
* 2024-11Nov-22: Nvidia [https://github.com/NVlabs/hymba Hymba] ([https://developer.nvidia.com/blog/hymba-hybrid-head-architecture-boosts-small-language-model-performance/ blog]): small and high-performance&lt;br /&gt;
* 2024-12Dec-06: Meta [https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct Llama 3.3] 70B&lt;br /&gt;
* [https://x.com/deepseek_ai/status/1872242657348710721 2024-12Dec-26]: [https://huggingface.co/deepseek-ai/DeepSeek-V3-Base DeepSeek-V3-Base] 671B&lt;br /&gt;
* 2025-01Jan-02: [https://huggingface.co/PowerInfer/SmallThinker-3B-Preview SmallThinker-3B-Preview] (fine-tune of [https://huggingface.co/Qwen/Qwen2.5-3B-Instruct Qwen2.5-3b-Instruct])&lt;br /&gt;
* [https://x.com/SebastienBubeck/status/1877010995727470877 2025-01Jan-08]: Microsoft [https://huggingface.co/microsoft/phi-4 phi-4] 15B&lt;br /&gt;
* [https://x.com/MiniMax__AI/status/1879226391352549451 2025-01Jan-14]: [https://www.minimaxi.com/en/news/minimax-01-series-2 MiniMax-01], MiniMax-Text-01 and MiniMax-VL-01; 4M context length ([https://www.minimaxi.com/en/news/minimax-01-series-2 paper])&lt;br /&gt;
* 2025-01Jan-27: [https://qwenlm.github.io/blog/qwen2.5-1m/ Qwen2.5-1M] ([https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen2.5-1M/Qwen2_5_1M_Technical_Report.pdf report])&lt;br /&gt;
* 2025-01Jan-27: DeepSeek [https://huggingface.co/deepseek-ai/Janus-Pro-7B Janus-Pro-7B] (with image capabilities)&lt;br /&gt;
* [https://x.com/cohere/status/1900170005519753365 2025-03Mar-14]: Cohere [https://cohere.com/blog/command-a Command A] ([https://huggingface.co/CohereForAI/c4ai-command-a-03-2025?ref=cohere-ai.ghost.io weights])&lt;br /&gt;
* [https://x.com/MistralAI/status/1901668499832918151 2025-03Mar-17]: [https://mistral.ai/news/mistral-small-3-1 Mistral Small 3.1] 24B ([https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Base-2503 weights])&lt;br /&gt;
* [https://x.com/deepseek_ai/status/1904526863604883661 2025-03Mar-24]: [https://huggingface.co/deepseek-ai/DeepSeek-V3-0324 DeepSeek-V3-0324] 685B&lt;br /&gt;
* 2025-04Apr-05: Meta [https://ai.meta.com/blog/llama-4-multimodal-intelligence/ Llama 4] (109B, 400B, 2T)&lt;br /&gt;
* [https://x.com/kuchaev/status/1909444566379573646 2025-04Apr-08]: Nvidia [https://huggingface.co/nvidia/Llama-3_1-Nemotron-Ultra-253B-v1 Llama-3_1-Nemotron-Ultra-253B-v1]&lt;br /&gt;
* [https://x.com/MistralAI/status/1920119463430500541 2025-05May-07]: Mistral [https://mistral.ai/news/mistral-medium-3 Medium 3]&lt;br /&gt;
* [https://x.com/googleaidevs/status/1938279967026274383 2025-06Jun-26]: Google [https://developers.googleblog.com/en/introducing-gemma-3n-developer-guide/ Gemma 3n] (on-device multimodal)&lt;br /&gt;
* [https://x.com/Alibaba_Qwen/status/1953128028047102241 2025-08Aug-06]: [https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507 Qwen3-4B-Instruct-2507]&lt;br /&gt;
* [https://x.com/GoogleDeepMind/status/1956393664248271082 2025-08Aug-15]: Google [https://developers.googleblog.com/en/introducing-gemma-3-270m/ Gemma 3 270M]&lt;br /&gt;
* [https://x.com/arcee_ai/status/2016278017572495505?s=20 2026-01Jan-28]: [https://www.arcee.ai/ Arcee AI] [https://docs.arcee.ai/get-started/models-overview Trinity Large] 400B ([https://huggingface.co/arcee-ai weights])&lt;br /&gt;
&lt;br /&gt;
===Coding===&lt;br /&gt;
Rankings: [https://huggingface.co/spaces/bigcode/bigcode-models-leaderboard bigcode-models-leaderboard] and [https://codeelo-bench.github.io/#leaderboard-table CodeElo leaderboard]&lt;br /&gt;
* 2024-10Oct-06: [https://abacus.ai/ Abacus AI] [https://huggingface.co/abacusai/Dracarys2-72B-Instruct Dracarys2-72B-Instruct] (optimized for coding, fine-tune of [https://huggingface.co/Qwen/Qwen2.5-72B-Instruct Qwen2.5-72B-Instruct])&lt;br /&gt;
* 2024-11Nov-09: [https://opencoder-llm.github.io/ OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models] ([https://huggingface.co/collections/infly/opencoder-672cec44bbb86c39910fb55e weights], [https://arxiv.org/abs/2411.04905 preprint])&lt;br /&gt;
* 2024-11Nov-13: [https://qwenlm.github.io/blog/qwen2.5-coder-family/ Qwen2.5-Coder]&lt;br /&gt;
* [https://x.com/Agentica_/status/1909700115755061374 2025-04Apr-08]: [https://pretty-radio-b75.notion.site/DeepCoder-A-Fully-Open-Source-14B-Coder-at-O3-mini-Level-1cf81902c14680b3bee5eb349a512a51 DeepCoder-14B-Preview] ([https://github.com/agentica-project/rllm code], [https://huggingface.co/agentica-org/DeepCoder-14B-Preview hf])&lt;br /&gt;
* [https://x.com/GeZhang86038849/status/1921147887871742329 2025-05May-10]: ByteDance [https://huggingface.co/ByteDance-Seed/Seed-Coder-8B-Base Seed-Coder] 8B&lt;br /&gt;
* [https://x.com/Kimi_Moonshot/status/1943687594560332025 2025-07Jul-11]: [https://moonshotai.github.io/Kimi-K2/ Kimi-K2] 1T ([https://github.com/MoonshotAI/Kimi-K2 code], [https://huggingface.co/moonshotai weights])&lt;br /&gt;
* [https://x.com/Alibaba_Qwen/status/1947766835023335516 2025-07Jul-23]: [https://qwenlm.github.io/blog/qwen3-coder/ Qwen3-Coder-480B-A35B-Instruct] ([https://github.com/QwenLM/qwen-code code], [https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct weights])&lt;br /&gt;
* [https://x.com/MiniMax_AI/status/2021980761210134808?s=20 2026-02Feb-12]: [https://www.minimax.io/news/minimax-m25 MiniMax M2.5] 230B&lt;br /&gt;
&lt;br /&gt;
===Reasoning===&lt;br /&gt;
See also: [[Increasing_AI_Intelligence|Increasing AI Intelligence]] &amp;gt; Proactive Search &amp;gt; [[Increasing_AI_Intelligence#CoT_reasoning_model|CoT reasoning model]]&lt;br /&gt;
* [https://x.com/deepseek_ai/status/1859200141355536422 2024-11Nov-20]: DeepSeek-R1-Lite-Preview ([https://x.com/deepseek_ai/status/1859200145037869485 results], [https://x.com/teortaxesTex/status/1859259359630356955 CoT])&lt;br /&gt;
* 2024-11Nov-23: [https://arxiv.org/abs/2411.14405 Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions]&lt;br /&gt;
* 2024-11Nov-27: [https://qwenlm.github.io/blog/qwq-32b-preview/ Alibaba Qwen QwQ] 32B ([https://huggingface.co/Qwen/QwQ-32B-Preview model], [https://huggingface.co/spaces/Qwen/QwQ-32B-preview demo])&lt;br /&gt;
* [https://x.com/ruliad_ai/status/1864394941029322890 2024-12Dec-04]: [https://www.ruliad.co/ Ruliad] [https://huggingface.co/ruliad/deepthought-8b-llama-v0.01-alpha Deepthought] 8B ([https://chat.ruliad.co/ demo])&lt;br /&gt;
* 2024-12Dec-24: Qwen [https://huggingface.co/Qwen/QVQ-72B-Preview QVQ-72B-Preview] (visual reasoning)&lt;br /&gt;
* 2025-01Jan-10: [https://mbzuai-oryx.github.io/LlamaV-o1/ LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs] ([https://arxiv.org/abs/2501.06186 preprint], [https://github.com/mbzuai-oryx/LlamaV-o1 code], [https://huggingface.co/omkarthawakar/LlamaV-o1 weights])&lt;br /&gt;
* [https://x.com/deepseek_ai/status/1881318130334814301 2025-01Jan-20]: [https://huggingface.co/deepseek-ai/DeepSeek-R1 DeepSeek-R1], [https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B DeepSeek-R1-Distill-Llama-70B], DeepSeek-R1-Distill-Qwen-32B, ... ([https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf paper])&lt;br /&gt;
* 2025-02Feb-10: [https://huggingface.co/tomg-group-umd/huginn-0125 Huginn-0125]: [https://arxiv.org/abs/2502.05171 Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach] ([https://github.com/seal-rg/recurrent-pretraining code], [https://huggingface.co/tomg-group-umd/huginn-0125 model])&lt;br /&gt;
* [https://x.com/NousResearch/status/1890148000204485088 2025-02Feb-14]: [https://huggingface.co/NousResearch/DeepHermes-3-Llama-3-8B-Preview DeepHermes 3 - Llama-3.1 8B]&lt;br /&gt;
* [https://x.com/Alibaba_Qwen/status/1894130603513319842 2025-02Feb-24]: Qwen [https://qwenlm.github.io/blog/qwq-max-preview/ QwQ-Max-Preview] ([https://chat.qwen.ai/ online demo])&lt;br /&gt;
* [https://x.com/Alibaba_Qwen/status/1897361654763151544 2025-03Mar-05]: Qwen [https://qwenlm.github.io/blog/qwq-32b/ QwQ-32B] ([https://huggingface.co/spaces/Qwen/QwQ-32B-Demo demo])&lt;br /&gt;
* [https://x.com/BlinkDL_AI/status/1898579674575552558 2025-03Mar-05]: [https://github.com/BlinkDL/RWKV-LM RWKV7-G1] &amp;quot;GooseOne&amp;quot; 0.1B ([https://huggingface.co/BlinkDL/rwkv7-g1 weights], [https://arxiv.org/abs/2305.13048 preprint])&lt;br /&gt;
* [https://x.com/LG_AI_Research/status/1901803002052436323 2025-03Mar-17]: LG AI Research [https://www.lgresearch.ai/blog/view?seq=543 EXAONE Deep] 2.4B, 7.8B, 32B ([https://huggingface.co/LGAI-EXAONE/EXAONE-Deep-32B weights])&lt;br /&gt;
* [https://x.com/kuchaev/status/1902078122792775771 2025-03Mar-18]: Nvidia [https://huggingface.co/collections/nvidia/llama-nemotron-67d92346030a2691293f200b Llama Nemotron] 8B, 49B ([https://build.nvidia.com/nvidia/llama-3_3-nemotron-super-49b-v1 demo])&lt;br /&gt;
* [https://x.com/Agentica_/status/1909700115755061374 2025-04Apr-08]: [https://pretty-radio-b75.notion.site/DeepCoder-A-Fully-Open-Source-14B-Coder-at-O3-mini-Level-1cf81902c14680b3bee5eb349a512a51 DeepCoder-14B-Preview] ([https://github.com/agentica-project/rllm code], [https://huggingface.co/agentica-org/DeepCoder-14B-Preview hf])&lt;br /&gt;
* 2025-04Apr-10: Bytedance [https://github.com/ByteDance-Seed/Seed-Thinking-v1.5 Seed-Thinking-v1.5] 200B&lt;br /&gt;
* [https://x.com/ZyphraAI/status/1910362745423425966 2025-04Apr-11]: [https://www.zyphra.com/ Zyphra] [https://www.zyphra.com/post/introducing-zr1-1-5b-a-small-but-powerful-math-code-reasoning-model ZR1-1.5B] ([https://huggingface.co/Zyphra/ZR1-1.5B weights], [https://playground.zyphra.com/sign-in use])&lt;br /&gt;
* [https://x.com/Alibaba_Qwen/status/1916962087676612998 2025-04Apr-29]: [https://qwenlm.github.io/blog/qwen3/ Qwen3] 0.6B to 235B ([https://github.com/QwenLM/Qwen3 code], [https://huggingface.co/collections/Qwen/qwen3-67dd247413f0e2e4f653967f weights], [https://modelscope.cn/home modelscope])&lt;br /&gt;
* [https://x.com/DimitrisPapail/status/1917731614899028190 2025-04Apr-30]: [https://huggingface.co/microsoft/Phi-4-reasoning Phi-4 Reasoning] 14B ([https://www.microsoft.com/en-us/research/wp-content/uploads/2025/04/phi_4_reasoning.pdf tech report])&lt;br /&gt;
* [https://x.com/deepseek_ai/status/1928061589107900779 2025-05May-28]: [https://huggingface.co/deepseek-ai/DeepSeek-R1-0528 DeepSeek-R1-0528]&lt;br /&gt;
* [https://x.com/MistralAI/status/1932441507262259564 2025-06Jun-10]: Mistral [https://mistral.ai/static/research/magistral.pdf Magistral] 24B ([https://huggingface.co/mistralai/Magistral-Small-2506 weights])&lt;br /&gt;
* [https://x.com/LoubnaBenAllal1/status/1942614508549333211 2025-07Jul-08]: [https://huggingface.co/blog/smollm3 SmolLM3]: smol, multilingual, long-context reasoner&lt;br /&gt;
* [https://x.com/OpenAI/status/1952776916517404876 2025-08Aug-05]: [https://openai.com/open-models/ OpenAI] gpt-oss-120b, gpt-oss-20b&lt;br /&gt;
* [https://x.com/Alibaba_Qwen/status/1953128028047102241 2025-08Aug-06]: [https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507 Qwen3-4B-Thinking-2507]&lt;br /&gt;
* 2025-09Sep: [https://huggingface.co/LLM360/K2-Think K2-Think] 32B&lt;br /&gt;
* [https://x.com/Kimi_Moonshot/status/1986449512538513505 2025-11Nov]: [https://moonshotai.github.io/Kimi-K2/thinking.html Kimi K2 Thinking] 1T (32B active)&lt;br /&gt;
* [https://x.com/deepseek_ai/status/1995452641430651132?s=20 2025-12Dec]: [https://huggingface.co/deepseek-ai/DeepSeek-V3.2 DeepSeek-V3.2] and [https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Speciale DeepSeek-V3.2-Speciale]&lt;br /&gt;
&lt;br /&gt;
===Agentic===&lt;br /&gt;
* 2025-02Feb-18: Microsoft [https://huggingface.co/microsoft/Magma-8B Magma-8B] ([https://www.arxiv.org/abs/2502.13130 preprint])&lt;br /&gt;
* 2025-02Feb-26: [https://convergence.ai/ Convergence] [https://github.com/convergence-ai/proxy-lite Proxy Lite]&lt;br /&gt;
* [https://x.com/MiniMax_AI/status/2021980761210134808?s=20 2026-02Feb-12]: [https://www.minimax.io/news/minimax-m25 MiniMax M2.5] 230B&lt;br /&gt;
&lt;br /&gt;
===Multimodal===&lt;br /&gt;
====Language/Vision====&lt;br /&gt;
* [https://arxiv.org/abs/2407.07895 LLaVA-NeXT-Interleave] ([https://huggingface.co/collections/llava-hf/llava-interleave-668e19a97da0036aad4a2f19 models], [https://huggingface.co/spaces/merve/llava-interleave demo])&lt;br /&gt;
* [https://huggingface.co/papers/2407.15841 SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models]&lt;br /&gt;
* Nvidia [https://huggingface.co/collections/merve/nveagle-66d0705108582d73bb235c26 NVEagle] 13B, 7B ([https://huggingface.co/spaces/NVEagle/Eagle-X5-13B-Chat demo], [https://arxiv.org/abs/2408.15998 preprint])&lt;br /&gt;
* 2024-08Aug-29: [https://qwenlm.github.io/blog/qwen2-vl/ Qwen2-VL] 7B, 2B ([https://github.com/QwenLM/Qwen2-VL code], [https://huggingface.co/collections/Qwen/qwen2-vl-66cee7455501d7126940800d models]): Can process videos up to 20 minutes in length&lt;br /&gt;
* 2024-09Sep-11: Mistral [https://huggingface.co/mistral-community/pixtral-12b-240910 Pixtral 12B]&lt;br /&gt;
* 2024-09Sep-17: [https://nvlm-project.github.io/ NVLM 1.0]&lt;br /&gt;
* 2024-12Dec-06: Nvidia [https://arxiv.org/abs/2412.04468 NVILA: Efficient Frontier Visual Language Models]&lt;br /&gt;
* [https://x.com/Alibaba_Qwen/status/1883954247743725963 2025-01Jan-28]: [https://huggingface.co/collections/Qwen/qwen25-vl-6795ffac22b334a837c0f9a5 Qwen2.5-VL]&lt;br /&gt;
* 2025-02Feb-18: Microsoft [https://huggingface.co/microsoft/Magma-8B Magma-8B] ([https://www.arxiv.org/abs/2502.13130 preprint])&lt;br /&gt;
* [https://x.com/CohereForAI/status/1896923657470886234 2025-03Mar-05]: Cohere [https://cohere.com/research/aya Aya] 8B, 32B&lt;br /&gt;
* 2025-03Mar-12: Google [https://developers.googleblog.com/en/introducing-gemma3/ Gemma 3] 1B, 4B, 12B, 27B ([https://storage.googleapis.com/deepmind-media/gemma/Gemma3Report.pdf technical report])&lt;br /&gt;
* [https://x.com/DeepLearningAI/status/1903295570527002729 2025-03Mar-23]: Cohere [https://cohere.com/blog/aya-vision Aya Vision] 8B, 32B ([https://huggingface.co/collections/CohereForAI/c4ai-aya-vision-67c4ccd395ca064308ee1484?ref=cohere-ai.ghost.io weights])&lt;br /&gt;
* [https://x.com/Alibaba_Qwen/status/1904227859616641534 2025-03Mar-24]: Alibaba [https://qwenlm.github.io/blog/qwen2.5-vl-32b/ Qwen2.5-VL-32B-Instruct] ([https://huggingface.co/Qwen/Qwen2.5-VL-32B-Instruct weights])&lt;br /&gt;
* 2025-05May-20: ByteDance [https://bagel-ai.org/ BAGEL: Unified Model for Multimodal Understanding and Generation] 7B ([https://huggingface.co/ByteDance-Seed/BAGEL-7B-MoT weights], [https://github.com/bytedance-seed/BAGEL code], [https://demo.bagel-ai.org/ demo])&lt;br /&gt;
&lt;br /&gt;
====Language/Vision/Speech====&lt;br /&gt;
* 2025-02Feb-27: Microsoft [https://huggingface.co/microsoft/Phi-4-multimodal-instruct Phi-4-multimodal-instruct] (language, vision, speech)&lt;br /&gt;
* [https://x.com/kyutai_labs/status/1903082848547906011 2025-03Mar-21]: kyutai [https://kyutai.org/moshivis MoshiVis] ([https://vis.moshi.chat/ demo])&lt;br /&gt;
* [https://x.com/Alibaba_Qwen/status/1904944923159445914 2025-03Mar-26]: [https://qwenlm.github.io/blog/qwen2.5-omni/ Qwen2.5-Omni-7B] ([https://github.com/QwenLM/Qwen2.5-Omni/blob/main/assets/Qwen2.5_Omni.pdf tech report], [https://github.com/QwenLM/Qwen2.5-Omni code], [https://huggingface.co/Qwen/Qwen2.5-Omni-7B weights])&lt;br /&gt;
&lt;br /&gt;
====Language/Audio====&lt;br /&gt;
* 2025-03Mar-11: [https://github.com/soham97/mellow Mellow]: a small audio language model for reasoning, 167M ([https://arxiv.org/abs/2503.08540 paper])&lt;br /&gt;
* 2025-03Mar-12: [https://research.nvidia.com/labs/adlr/AF2/ Audio Flamingo 2] 0.5B, 1.5B, 3B ([https://arxiv.org/abs/2503.03983 paper], [https://github.com/NVIDIA/audio-flamingo code])&lt;br /&gt;
&lt;br /&gt;
===RAG===&lt;br /&gt;
* 2025-04: [https://huggingface.co/collections/PleIAs/pleias-rag-680a0d78b058fffe4c16724d Pleias-RAG] 350M, 1.2B&lt;br /&gt;
** Paper: [http://ragpdf.pleias.fr/ Even Small Reasoners Should Quote Their Sources: Introducing Pleias-RAG Model Family]&lt;br /&gt;
* 2025-04: Meta ReasonIR 8B: [https://arxiv.org/abs/2504.20595 ReasonIR: Training Retrievers for Reasoning Tasks]&lt;br /&gt;
&lt;br /&gt;
==Cloud LLM==&lt;br /&gt;
* [https://groq.com/ Groq] [https://wow.groq.com/ cloud] (very fast inference)&lt;br /&gt;
&lt;br /&gt;
===Multi-modal: Audio===&lt;br /&gt;
* [https://kyutai.org/ kyutai Open Science AI Lab] chatbot [https://www.us.moshi.chat/?queue_id=talktomoshi moshi]&lt;br /&gt;
&lt;br /&gt;
==Triage==&lt;br /&gt;
* [https://arxiv.org/abs/2406.18665 RouteLLM: Learning to Route LLMs with Preference Data]&lt;br /&gt;
&lt;br /&gt;
==Retrieval Augmented Generation (RAG)==&lt;br /&gt;
* See Also: [[AI_tools#Document_Parsing|Document Parsing]]&lt;br /&gt;
&lt;br /&gt;
===Reviews===&lt;br /&gt;
* 2024-08: [https://arxiv.org/abs/2408.08921 Graph Retrieval-Augmented Generation: A Survey]&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.14924 Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.17558 A Survey of Query Optimization in Large Language Models]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.07391 Enhancing Retrieval-Augmented Generation: A Study of Best Practices]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.09136 Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG] ([https://github.com/asinghcsu/AgenticRAG-Survey github])&lt;br /&gt;
* List of [https://github.com/NirDiamant/RAG_Techniques RAG techniques]&lt;br /&gt;
* [https://github.com/athina-ai/rag-cookbooks Advanced RAG Cookbooks👨🏻‍💻]&lt;br /&gt;
* [https://github.com/DEEP-PolyU/Awesome-GraphRAG Awesome-GraphRAG (GraphRAG Survey)]&lt;br /&gt;
&lt;br /&gt;
===Measuring RAG performance===&lt;br /&gt;
* 2025-01: [https://deepmind.google/discover/blog/facts-grounding-a-new-benchmark-for-evaluating-the-factuality-of-large-language-models/ The FACTS Grounding Leaderboard]: [https://arxiv.org/abs/2501.03200 Benchmarking LLMs&amp;#039; Ability to Ground Responses to Long-Form Input]&lt;br /&gt;
&lt;br /&gt;
===Analysis of RAG overall===&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.13070 Is Semantic Chunking Worth the Computational Cost?]&lt;br /&gt;
&lt;br /&gt;
===Approaches===&lt;br /&gt;
* RAGFlow ([https://github.com/infiniflow/ragflow code])&lt;br /&gt;
* GraphRAG ([https://arxiv.org/abs/2404.16130 preprint], [https://github.com/microsoft/graphrag code], [https://github.com/Azure-Samples/graphrag-accelerator GraphRAG Accelerator] for easy deployment on Azure)&lt;br /&gt;
* AutoMetaRAG ([https://github.com/darshil3011/AutoMetaRAG/tree/main code])&lt;br /&gt;
* [https://verba.weaviate.io/ Verba]: RAG for [https://weaviate.io/ Weaviate] vector database ([https://github.com/weaviate/verba code], [https://www.youtube.com/watch?v=UoowC-hsaf0 video])&lt;br /&gt;
* Microsoft: [https://github.com/microsoft/PIKE-RAG PIKE-RAG: sPecIalized KnowledgE and Rationale Augmented Generation]&lt;br /&gt;
* 2024-10: Google [https://arxiv.org/abs/2410.07176 Astute RAG: Overcoming Imperfect Retrieval Augmentation and Knowledge Conflicts for Large Language Models]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.08815 StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization]: Reformats retrieved data into task-appropriate structures (table, graph, tree).&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.13765 Knowledge-Aware Query Expansion with Large Language Models for Textual and Relational Retrieval]&lt;br /&gt;
* 2024-11: [https://www.arxiv.org/abs/2411.13773 FastRAG: Retrieval Augmented Generation for Semi-structured Data]&lt;br /&gt;
* 2024-11: Microsoft [https://www.microsoft.com/en-us/research/blog/lazygraphrag-setting-a-new-standard-for-quality-and-cost/ LazyGraphRAG: Setting a new standard for quality and cost]&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.19443 Auto-RAG: Autonomous Retrieval-Augmented Generation for Large Language Models]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.05366 Search-o1: Agentic Search-Enhanced Large Reasoning Models] ([https://search-o1.github.io/ project], [https://github.com/sunnynexus/Search-o1 code])&lt;br /&gt;
* 2025-01: [https://github.com/Marker-Inc-Korea/AutoRAG AutoRAG: RAG AutoML tool for automatically finding an optimal RAG pipeline for your data]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.05874 VideoRAG: Retrieval-Augmented Generation over Video Corpus]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.01142 DeepRAG: Thinking to Retrieval Step by Step for Large Language Models]&lt;br /&gt;
* 2025-02: [https://weaviate.io/developers/weaviate/tutorials/multi-vector-embeddings Multi-vector embeddings]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.23513 RARE: Retrieval-Augmented Reasoning Modeling]&lt;br /&gt;
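At their core, most of the retrieval schemes above share one step: embed the query and the candidate documents, then keep the top-k documents by cosine similarity and stuff them into the prompt. A minimal sketch, using bag-of-words counts as a stand-in for a real embedding model:&lt;br /&gt;

```python
import math
from collections import Counter

# Minimal sketch of the RAG retrieval step. Bag-of-words counts stand
# in for a learned embedding model; real systems use dense vectors.

def embed(text):
    # Toy "embedding": word-count vector.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=2):
    # Rank documents by similarity to the query; keep the top k.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]
```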
&lt;br /&gt;
===Open-source Implementations===&lt;br /&gt;
* [https://github.com/Cinnamon/kotaemon kotaemon]: An open-source clean &amp;amp; customizable RAG UI for chatting with your documents.&lt;br /&gt;
* [https://www.llamaindex.ai/ LlamaIndex] ([https://github.com/run-llama/llama_index code], [https://docs.llamaindex.ai/en/stable/ docs], [https://github.com/run-llama/voice-chat-pdf voice chat code])&lt;br /&gt;
* Nvidia [https://www.nvidia.com/en-us/ai-on-rtx/chatrtx/ ChatRTX] with [https://blogs.nvidia.com/blog/what-is-retrieval-augmented-generation/ RAG]&lt;br /&gt;
* Anthropic [https://github.com/anthropics/anthropic-quickstarts/tree/main/customer-support-agent Customer Support Agent example]&lt;br /&gt;
* [https://www.langchain.com/ LangChain] and [https://www.langchain.com/langgraph LangGraph] ([https://www.metadocs.co/2024/08/20/simple-agentic-rag-for-multi-vector-stores-with-langchain-and-langgraph/ tutorial])&lt;br /&gt;
** [https://github.com/KruxAI/ragbuilder RAGBuilder]: Automatically tunes RAG hyperparams&lt;br /&gt;
* [https://github.com/stanford-oval/WikiChat WikiChat]&lt;br /&gt;
** [https://arxiv.org/abs/2305.14292 WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia]&lt;br /&gt;
* [https://github.com/bhavnicksm/chonkie Chonkie]: No-nonsense RAG chunking library (open-source, lightweight, fast)&lt;br /&gt;
* [https://github.com/pingcap/autoflow autoflow]: open source GraphRAG (Knowledge Graph), including conversational search page&lt;br /&gt;
* [https://github.com/superlinear-ai/raglite RAGLite]&lt;br /&gt;
* [https://github.com/gusye1234/nano-graphrag nano-graphrag]: A simple, easy-to-hack GraphRAG implementation&lt;br /&gt;
* [https://github.com/electricpipelines/barq Dabarqus]&lt;br /&gt;
&lt;br /&gt;
===Web-based Tools===&lt;br /&gt;
* [https://typeset.io/ SciSpace] Chat with PDF (also available as a GPT).&lt;br /&gt;
&lt;br /&gt;
===Commercial Cloud Offerings===&lt;br /&gt;
* [https://www.graphlit.com/ Graphlit]&lt;br /&gt;
* [https://colivara.com/ ColiVara]&lt;br /&gt;
* [https://nhost.io/blog/assistants-file-stores nhost]&lt;br /&gt;
* [https://vespa.ai/ Vespa] [https://vespa.ai/solutions/enterprise-retrieval-augmented-generation/ RAG]&lt;br /&gt;
* [https://unstructured.io/ Unstructured]&lt;br /&gt;
* [https://www.fivetran.com/blog/assembling-a-rag-architecture-using-fivetran Fivetran]&lt;br /&gt;
* [https://platform.vectorize.io/ Vectorize]&lt;br /&gt;
* [https://www.voyageai.com/ Voyage AI]&lt;br /&gt;
* [https://abacus.ai/ Abacus AI]&lt;br /&gt;
* [https://www.cloudflare.com/ Cloudflare] [https://blog.cloudflare.com/introducing-autorag-on-cloudflare/ AutoRAG]&lt;br /&gt;
&lt;br /&gt;
==LLM for scoring/ranking==&lt;br /&gt;
* [https://arxiv.org/abs/2302.04166 GPTScore: Evaluate as You Desire]&lt;br /&gt;
* [https://arxiv.org/abs/2306.17563 Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting]&lt;br /&gt;
* [https://doi.org/10.1039/D3DD00112A Domain-specific chatbots for science using embeddings]&lt;br /&gt;
* [https://arxiv.org/abs/2407.02977 Large Language Models as Evaluators for Scientific Synthesis]&lt;br /&gt;
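In pairwise ranking prompting, the LLM is repeatedly asked which of two candidates better serves the query, and a full ordering is assembled from those judgments. A minimal sketch, where llm_prefers is a hypothetical stand-in for the actual prompt call:&lt;br /&gt;

```python
from functools import cmp_to_key

# Sketch of pairwise ranking prompting: build a full ordering out of
# LLM pairwise preferences. llm_prefers is a hypothetical stub; a real
# system would prompt a model with the query and both candidates.

def llm_prefers(query, a, b):
    # Toy preference: favor the candidate with more query-word overlap.
    score = lambda d: len(set(query.split()) & set(d.split()))
    return score(a) >= score(b)

def pairwise_rank(query, candidates):
    # Sort candidates using the pairwise judgments as a comparator.
    cmp = lambda a, b: -1 if llm_prefers(query, a, b) else 1
    return sorted(candidates, key=cmp_to_key(cmp))
```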
&lt;br /&gt;
=LLM Agents=&lt;br /&gt;
* See [[AI Agents]].&lt;br /&gt;
&lt;br /&gt;
=Interfaces=&lt;br /&gt;
==Chatbot Frontend==&lt;br /&gt;
===Web (code)===&lt;br /&gt;
* [https://docs.streamlit.io/develop/tutorials/llms/build-conversational-apps Streamlit]&lt;br /&gt;
* [https://docs.cohere.com/v2/docs/cohere-toolkit Cohere Toolkit] ([https://github.com/cohere-ai/cohere-toolkit code])&lt;br /&gt;
* [https://www.librechat.ai/ LibreChat]&lt;br /&gt;
* [https://github.com/open-webui/open-webui open-webui]&lt;br /&gt;
* [https://github.com/xjdr-alt/entropix/tree/main/ui entropix frontend UI]&lt;br /&gt;
&lt;br /&gt;
===Web (product)===&lt;br /&gt;
* [https://chatboxai.app/en Chatbox]&lt;br /&gt;
&lt;br /&gt;
===Desktop GUI===&lt;br /&gt;
* [https://anythingllm.com/ AnythingLLM] ([https://docs.anythingllm.com/ docs], [https://github.com/Mintplex-Labs/anything-llm code]): includes chat-with-docs, selection of LLM and vector db, etc.&lt;br /&gt;
&lt;br /&gt;
==Alternative Text Chatbot UI==&lt;br /&gt;
* [https://generative.ink/posts/loom-interface-to-the-multiverse/ Loom] provides a tree-like interface for exploring branched LLM-generated writings.&lt;br /&gt;
* [https://www.lesswrong.com/posts/JHsfMWtwxBGGTmb8A/pantheon-interface The Pantheon Interface] is a new idea for how to interact with LLMs ([https://pantheon.chat/ live instance], [https://github.com/nickkeesG/Pantheon code]). In a traditional interaction, you prompt the bot and it replies in a turn-by-turn manner. Pantheon instead invites you to type out your thoughts, and various agents will asynchronously add comments or questions to spur along your brainstorming.&lt;br /&gt;
&lt;br /&gt;
==Conversational Audio Chatbot==&lt;br /&gt;
* Swift is a fast AI voice assistant ([https://github.com/ai-ng/swift code], [https://swift-ai.vercel.app/ live demo]) that uses:&lt;br /&gt;
** [https://groq.com/ Groq] cloud running [https://github.com/openai/whisper OpenAI Whisper] for fast speech transcription.&lt;br /&gt;
** [https://cartesia.ai/ Cartesia] [https://cartesia.ai/sonic Sonic] for fast speech synthesis&lt;br /&gt;
** [https://www.vad.ricky0123.com/ VAD] to detect when the user is talking&lt;br /&gt;
** [https://vercel.com/ Vercel] for app deployment&lt;br /&gt;
* [https://github.com/rtvi-ai RTVI-AI] ([https://github.com/rtvi-ai/rtvi-web-demo code], [https://demo-gpu.rtvi.ai/ demo]), uses:&lt;br /&gt;
** [https://groq.com/ Groq]&lt;br /&gt;
** [https://llama.meta.com/ Llama 3.1]&lt;br /&gt;
** [https://www.daily.co/ai/ Daily]&lt;br /&gt;
** [https://github.com/rtvi-ai RTVI]&lt;br /&gt;
* [https://github.com/mezbaul-h/june June]: Local Voice Chatbot&lt;br /&gt;
** [https://ollama.com/ Ollama]&lt;br /&gt;
** [https://huggingface.co/docs/transformers/en/tasks/asr Hugging Face Transformers] (for speech recognition)&lt;br /&gt;
** [https://github.com/coqui-ai/TTS Coqui TTS Toolkit]&lt;br /&gt;
* [https://kyutai.org/ kyutai] Moshi chatbot ([https://us.moshi.chat/ demo])&lt;br /&gt;
* [https://arxiv.org/abs/2408.16725 Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming] ([https://huggingface.co/gpt-omni/mini-omni model], [https://github.com/gpt-omni/mini-omni code], [https://huggingface.co/spaces/gradio/omni-mini demo])&lt;br /&gt;
* 2024-09Sep-11: [https://huggingface.co/ICTNLP/Llama-3.1-8B-Omni Llama-3.1-8B-Omni] ([https://github.com/ictnlp/LLaMA-Omni code]), enabling end-to-end speech.&lt;br /&gt;
* [https://x.com/AIatMeta/status/1847383580269510670 2024-10Oct-18]: Meta [https://speechbot.github.io/spiritlm/ Spirit LM]: open source multimodal language model that freely mixes text and speech&lt;br /&gt;
* 2025-02Feb-28: [https://www.sesame.com/ Sesame] ([https://www.sesame.com/research/crossing_the_uncanny_valley_of_voice#demo demo])&lt;br /&gt;
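The stacks above (e.g. Swift, June) share a common loop: voice-activity detection segments an utterance, which then flows through ASR, an LLM, and TTS. A minimal sketch with hypothetical stub functions in place of the real service calls:&lt;br /&gt;

```python
# Sketch of the transcribe -> respond -> synthesize loop used by
# conversational audio chatbots. All three functions are hypothetical
# stand-ins for the real APIs (e.g. Whisper on Groq for ASR, an LLM
# endpoint, Cartesia Sonic for TTS).

def transcribe(audio: bytes) -> str:
    # Placeholder for a fast ASR call.
    return "what's the weather like?"

def generate_reply(text: str) -> str:
    # Placeholder for an LLM completion call.
    return f"You asked: {text}"

def synthesize(text: str) -> bytes:
    # Placeholder for a TTS call returning audio bytes.
    return text.encode("utf-8")

def handle_utterance(audio: bytes) -> bytes:
    # Once VAD decides the utterance is complete, the audio flows
    # through ASR -> LLM -> TTS and the reply audio is played back.
    return synthesize(generate_reply(transcribe(audio)))
```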
&lt;br /&gt;
===Turn Detection===&lt;br /&gt;
* 2025-03: [https://github.com/pipecat-ai/smart-turn Smart Turn]: open-source audio turn-detection model&lt;br /&gt;
&lt;br /&gt;
===Related Research===&lt;br /&gt;
* [https://arxiv.org/abs/2408.02622 Language Model Can Listen While Speaking]&lt;br /&gt;
&lt;br /&gt;
===Commercial Systems===&lt;br /&gt;
* [https://heypi.com/talk HeyPi Talk]&lt;br /&gt;
* [https://vapi.ai/ Vapi]&lt;br /&gt;
* [https://callannie.ai/ Call Annie]&lt;br /&gt;
* [https://www.bland.ai Bland AI]&lt;br /&gt;
* [https://deepgram.com/ DeepGram Voice AI]&lt;br /&gt;
* [https://www.sesame.com/ Sesame] ([https://www.sesame.com/research/crossing_the_uncanny_valley_of_voice#demo demo])&lt;br /&gt;
&lt;br /&gt;
=Speech Recognition (ASR) and Transcription=&lt;br /&gt;
==Lists==&lt;br /&gt;
* [https://huggingface.co/spaces/hf-audio/open_asr_leaderboard Open ASR Leaderboard]&lt;br /&gt;
&lt;br /&gt;
==Open Source==&lt;br /&gt;
* [https://github.com/mozilla/DeepSpeech DeepSpeech]&lt;br /&gt;
* [https://github.com/speechbrain/speechbrain speechbrain]&lt;br /&gt;
* [https://github.com/kaldi-asr/kaldi/blob/master/README.md Kaldi]&lt;br /&gt;
* wav2vec 2.0&lt;br /&gt;
** [https://arxiv.org/abs/2104.01027 Paper: Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training]&lt;br /&gt;
* Whisper&lt;br /&gt;
** [https://huggingface.co/openai/whisper-medium.en Whisper medium.en]&lt;br /&gt;
** [https://github.com/m-bain/whisperX WhisperX] (includes word-level timestamps and speaker diarization)&lt;br /&gt;
** [https://huggingface.co/mlx-community/distil-whisper-large-v3 Distil Large v3 with MLX]&lt;br /&gt;
** 2024-10: [https://huggingface.co/ylacombe/whisper-large-v3-turbo whisper-large-v3-turbo] distillation ([https://huggingface.co/spaces/hf-audio/whisper-large-v3-turbo demo], [https://github.com/openai/whisper/actions/runs/11111568226 code])&lt;br /&gt;
* [https://huggingface.co/spaces/hf-audio/open_asr_leaderboard Nvidia Canary 1B]&lt;br /&gt;
* [https://developer.nvidia.com/blog/accelerating-leaderboard-topping-asr-models-10x-with-nvidia-nemo/ 2024-09]: Nvidia [https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/models.html NeMo]&lt;br /&gt;
* 2024-10: [https://www.rev.ai/ Rev AI] [https://huggingface.co/Revai models] for [https://huggingface.co/Revai/reverb-asr transcription] and [https://huggingface.co/Revai/reverb-diarization-v2 diarization]&lt;br /&gt;
* 2024-10: [https://github.com/usefulsensors/moonshine Moonshine] (optimized for resource-constrained devices)&lt;br /&gt;
* 2025-05: [https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2 Parakeet TDT 0.6B V2]&lt;br /&gt;
* [https://x.com/kyutai_labs/status/1925840420187025892 2025-05]: [https://kyutai.org/ Kyutai] [https://unmute.sh/ Unmute]&lt;br /&gt;
* [https://x.com/cohere/status/2037159129345614174?s=20 2026-03]: [https://cohere.com/blog/transcribe Cohere Transcribe]&lt;br /&gt;
&lt;br /&gt;
==In Browser==&lt;br /&gt;
* [https://huggingface.co/spaces/Xenova/whisper-word-level-timestamps Whisper Timestamped]: Multilingual speech recognition with word-level timestamps, running locally in browser&lt;br /&gt;
&lt;br /&gt;
==Phrase Endpointing and Voice Activity Detection (VAD)==&lt;br /&gt;
I.e., how do we determine when the user is done talking, so that the bot knows when to respond?&lt;br /&gt;
* [https://x.com/kwindla/status/1831364419261268017 Notes]&lt;br /&gt;
** [https://demo.dailybots.ai/ Test settings]&lt;br /&gt;
** [https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/vad/vad_analyzer.py code]&lt;br /&gt;
** [https://github.com/snakers4/silero-vad Silero VAD repo]&lt;br /&gt;
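A toy illustration of phrase endpointing: declare the utterance finished after N consecutive low-energy frames. Real systems such as Silero VAD use learned models; the RMS threshold and frame counts here are illustrative assumptions:&lt;br /&gt;

```python
# Toy energy-based phrase endpointing. The threshold and the number of
# silence frames are illustrative assumptions, not tuned values.

def rms(frame):
    # Root-mean-square energy of one audio frame (list of samples).
    return (sum(s * s for s in frame) / len(frame)) ** 0.5

def endpoint(frames, threshold=0.1, silence_frames=3):
    """Return the index of the frame at which the utterance is judged
    finished, or None if the user never started or is still talking."""
    quiet = 0
    started = False
    for i, frame in enumerate(frames):
        if rms(frame) >= threshold:
            started = True   # speech detected
            quiet = 0
        elif started:
            quiet += 1       # trailing silence after speech
            if quiet >= silence_frames:
                return i
    return None
```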
&lt;br /&gt;
==Audio Cleanup==&lt;br /&gt;
* [https://krisp.ai/ Krisp AI]: Noise cancellation, meeting summary, etc.&lt;br /&gt;
&lt;br /&gt;
==Auto Video Transcription==&lt;br /&gt;
* [https://www.translate.mom/ TranslateMom]&lt;br /&gt;
* [https://github.com/abus-aikorea/voice-pro Voice-Pro]: YouTube downloader, speech separation, transcription, translation, TTS, and voice cloning toolkit for creators&lt;br /&gt;
&lt;br /&gt;
=Text-to-speech (TTS)=&lt;br /&gt;
==Open Source==&lt;br /&gt;
* [https://github.com/huggingface/parler-tts Parler TTS] ([https://huggingface.co/spaces/parler-tts/parler_tts demo])&lt;br /&gt;
* [https://github.com/DigitalPhonetics/IMS-Toucan Toucan] ([https://huggingface.co/spaces/Flux9665/MassivelyMultilingualTTS demo])&lt;br /&gt;
* [https://tts.themetavoice.xyz/ MetaVoice] ([https://github.com/metavoiceio/metavoice-src github])&lt;br /&gt;
* [https://github.com/2noise/ChatTTS ChatTTS]&lt;br /&gt;
* [https://www.camb.ai/ Camb.ai] [https://github.com/Camb-ai/MARS5-TTS MARS5-TTS]&lt;br /&gt;
* [https://github.com/coqui-ai/TTS Coqui TTS Toolkit]&lt;br /&gt;
* Fish Speech 1.4: multi-lingual, can clone voices ([https://x.com/reach_vb/status/1833801060659372071 video], [https://huggingface.co/fishaudio/fish-speech-1.4 weights], [https://huggingface.co/spaces/fishaudio/fish-speech-1 demo])&lt;br /&gt;
* [https://huggingface.co/SWivid/F5-TTS F5-TTS] ([https://huggingface.co/spaces/mrfakename/E2-F5-TTS demo]): cloning, emotion, etc.&lt;br /&gt;
* [https://huggingface.co/amphion/MaskGCT MaskGCT] ([https://huggingface.co/spaces/amphion/maskgct demo])&lt;br /&gt;
* [https://arxiv.org/abs/2312.09911 Amphion: An Open-Source Audio, Music and Speech Generation Toolkit] ([https://github.com/open-mmlab/Amphion code])&lt;br /&gt;
* [https://www.zyphra.com/ Zyphra] [https://huggingface.co/Zyphra/Zonos-v0.1-hybrid Zonos]&lt;br /&gt;
* [https://github.com/fishaudio/fish-speech Fish Speech] (includes voice cloning)&lt;br /&gt;
* [https://canopylabs.ai/ Canopy] [https://huggingface.co/collections/canopylabs/orpheus-tts-67d9ea3f6c05a941c06ad9d2 Orpheus] 3B&lt;br /&gt;
* Canopy [https://canopylabs.ai/releases/orpheus_can_speak_any_language Orpheus Multilingual]&lt;br /&gt;
* [https://narilabs.org/ Nari Labs] [https://github.com/nari-labs/dia Dia]&lt;br /&gt;
* [https://kyutai.org/ Kyutai] [https://kyutai.org/next/tts TTS] [https://unmute.sh/ Unmute]&lt;br /&gt;
* [https://github.com/resemble-ai/chatterbox Chatterbox TTS] ([https://huggingface.co/spaces/ResembleAI/Chatterbox try])&lt;br /&gt;
* [https://play.ai/ Play AI] [https://github.com/playht/PlayDiffusion PlayDiffusion] ([https://huggingface.co/spaces/PlayHT/PlayDiffusion demo], [https://x.com/_mfelfel/status/1929586464125239589 example])&lt;br /&gt;
* Mistral [https://mistral.ai/news/voxtral Voxtral]&lt;br /&gt;
* Kitten TTS ([https://github.com/KittenML/KittenTTS github], [https://huggingface.co/KittenML/kitten-tts-nano-0.1 hf]) 15M (fast, lightweight)&lt;br /&gt;
* Microsoft [https://microsoft.github.io/VibeVoice/ VibeVoice] 1.5B&lt;br /&gt;
* [https://x.com/hume_ai/status/2031401003078062578?s=20 2026-03]: Hume AI [https://huggingface.co/collections/HumeAI/tada TADA]&lt;br /&gt;
* [https://x.com/FishAudio/status/2031411140820152560?s=20 2026-03]: [https://huggingface.co/fishaudio/s2-pro Fish Audio S2]&lt;br /&gt;
&lt;br /&gt;
==Cloud==&lt;br /&gt;
* [https://elevenlabs.io/ Elevenlabs] ($50/million characters)&lt;br /&gt;
** [https://elevenlabs.io/voice-isolator voice isolator]&lt;br /&gt;
* [https://cartesia.ai/ Cartesia] [https://cartesia.ai/sonic Sonic]&lt;br /&gt;
* [https://neets.ai/ Neets AI] ($1/million characters)&lt;br /&gt;
* Hailuo AI T2A-01-HD ([https://www.hailuo.ai/audio try], [https://intl.minimaxi.com/document/platform%20introduction?key=66701c8e1d57f38758d58198 API])&lt;br /&gt;
* [https://www.hume.ai/ Hume] (can set emotion, give acting directions, etc.)&lt;br /&gt;
&lt;br /&gt;
=Text-to-audio=&lt;br /&gt;
* 2024-12: [https://tangoflux.github.io/ TangoFlux]: [https://arxiv.org/abs/2412.21037 Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization] ([https://github.com/declare-lab/TangoFlux code])&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.10522 AudioX: Diffusion Transformer for Anything-to-Audio Generation]&lt;br /&gt;
&lt;br /&gt;
=Vision=&lt;br /&gt;
* [https://github.com/google/langfun Langfun] library for converting images into structured output.&lt;br /&gt;
* See also: [[AI_tools#Multimodal| Multimodal open-weights models]]&lt;br /&gt;
&lt;br /&gt;
==Visual Models==&lt;br /&gt;
* [https://openai.com/index/clip/ CLIP]&lt;br /&gt;
* [https://arxiv.org/abs/2303.15343 SigLIP]&lt;br /&gt;
* [https://github.com/roboflow/supervision Supervision]&lt;br /&gt;
* [https://arxiv.org/abs/2311.06242 Florence-2]&lt;br /&gt;
* Nvidia [https://github.com/NVlabs/MambaVision MambaVision]&lt;br /&gt;
* Meta [https://about.meta.com/realitylabs/codecavatars/sapiens Sapiens: Foundation for Human Vision Models] (video input, can infer segmentation, pose, depth-map, and surface normals)&lt;br /&gt;
&lt;br /&gt;
==Depth==&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.09414 Depth Anything V2] ([https://github.com/DepthAnything/Depth-Anything-V2 code])&lt;br /&gt;
&lt;br /&gt;
==Superresolution==&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2311.17643 Thera: Aliasing-Free Arbitrary-Scale Super-Resolution with Neural Heat Fields] ([https://github.com/prs-eth/thera code], [https://huggingface.co/spaces/prs-eth/thera use])&lt;br /&gt;
&lt;br /&gt;
==Related==&lt;br /&gt;
* 2019-11: [https://arxiv.org/abs/1911.11763 SuperGlue: Learning Feature Matching with Graph Neural Networks] ([https://huggingface.co/docs/transformers/main/en/model_doc/superglue hf])&lt;br /&gt;
&lt;br /&gt;
=Embedding=&lt;br /&gt;
* [https://www.marktechpost.com/2024/07/28/a-comparison-of-top-embedding-libraries-for-generative-ai/ A Comparison of Top Embedding Libraries for Generative AI]&lt;br /&gt;
* [https://x.com/OfficialLoganK/status/2031411916489298156?s=20 2026-03]: [https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-embedding-2/ Gemini Embedding 2]&lt;br /&gt;
* [https://x.com/mixedbreadai/status/2032127466081567106?s=20 2026-03]: [https://www.mixedbread.com/ Mixedbread] Wholembed v3&lt;br /&gt;
&lt;br /&gt;
==Text Embedding==&lt;br /&gt;
* 2024-12: [https://huggingface.co/blog/modernbert modernBERT]&lt;br /&gt;
* 2025-02: [https://huggingface.co/chandar-lab/NeoBERT NeoBERT] ([https://arxiv.org/abs/2502.19587 preprint])&lt;br /&gt;
* 2025-03: [https://developers.googleblog.com/en/gemini-embedding-text-model-now-available-gemini-api/ gemini-embedding-exp-03-07]&lt;br /&gt;
&lt;br /&gt;
==Image Embedding==&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.18593 Diffusion Autoencoders are Scalable Image Tokenizers] ([https://yinboc.github.io/dito/ project], [https://github.com/yinboc/dito code])&lt;br /&gt;
&lt;br /&gt;
=Time Series=&lt;br /&gt;
* [https://github.com/TDAmeritrade/stumpy Stumpy]: Python library, uses near-match subsequences for similarity and forecasting&lt;br /&gt;
* [https://arxiv.org/abs/1912.09363 Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting]&lt;br /&gt;
* [https://arxiv.org/abs/2209.00905 From latent dynamics to meaningful representations]&lt;br /&gt;
* [https://arxiv.org/abs/2209.10705 Review of Time Series Forecasting Methods and Their Applications to Particle Accelerators]&lt;br /&gt;
* [https://arxiv.org/abs/2310.01728 Time-LLM: Time Series Forecasting by Reprogramming Large Language Models]&lt;br /&gt;
* [https://arxiv.org/abs/2310.10688 A decoder-only foundation model for time-series forecasting]&lt;br /&gt;
* [https://arxiv.org/abs/2310.03589 TimeGPT-1]&lt;br /&gt;
* [https://arxiv.org/abs/2402.02592 Unified Training of Universal Time Series Forecasting Transformers]&lt;br /&gt;
* [https://arxiv.org/abs/2407.10240 xLSTMTime : Long-term Time Series Forecasting With xLSTM]&lt;br /&gt;
* Salesforce: [https://arxiv.org/abs/2410.10469 Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts] ([https://github.com/SalesforceAIResearch/uni2ts/tree/main/project/moirai-moe-1 code], [https://huggingface.co/collections/Salesforce/moirai-r-models-65c8d3a94c51428c300e0742 weights], [https://www.salesforce.com/blog/time-series-morai-moe/ blog])&lt;br /&gt;
* IBM [https://huggingface.co/docs/transformers/en/model_doc/patchtsmixer PatchTSMixer] and [https://huggingface.co/docs/transformers/en/model_doc/patchtst PatchTST] (being [https://research.ibm.com/blog/time-series-AI-transformers used] for particle accelerators)&lt;br /&gt;
* 2026-02: Google [https://research.google/blog/a-decoder-only-foundation-model-for-time-series-forecasting/ TimesFM]&lt;br /&gt;
&lt;br /&gt;
==Control==&lt;br /&gt;
* [https://arxiv.org/abs/2402.15989 PIDformer: Transformer Meets Control Theory]&lt;br /&gt;
&lt;br /&gt;
==Forecasting==&lt;br /&gt;
* Meta [https://facebookresearch.github.io/Kats/ Kats] ([https://github.com/facebookresearch/Kats code]): Forecasting (ARIMA, Prophet, Holt Winters, VAR), detection, feature extraction, simulation&lt;br /&gt;
* [https://arxiv.org/abs/2410.18959 Context is Key: A Benchmark for Forecasting with Essential Textual Information]&lt;br /&gt;
&lt;br /&gt;
==Anomaly Detection==&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.05440 Can LLMs Understand Time Series Anomalies?] ([https://github.com/rose-stl-lab/anomllm code])&lt;br /&gt;
&lt;br /&gt;
=Data=&lt;br /&gt;
* See also: [[Data_Extraction#Data_Scraping| Data Scraping]] and [[Data_Extraction#Document_Parsing| Document Parsing]]&lt;br /&gt;
==Vector Database==&lt;br /&gt;
===Open Source===&lt;br /&gt;
* [https://milvus.io/ milvus] (open source with paid cloud option)&lt;br /&gt;
* [https://qdrant.tech/ Qdrant] (open source with paid cloud option)&lt;br /&gt;
* [https://vespa.ai/ Vespa] (open source with paid cloud option)&lt;br /&gt;
* [https://www.trychroma.com/ chroma]&lt;br /&gt;
* [https://www.llamaindex.ai/ LlamaIndex]&lt;br /&gt;
* [https://github.com/asg017/sqlite-vec/tree/main sqlite-vec]&lt;br /&gt;
&lt;br /&gt;
===Commercial cloud===&lt;br /&gt;
* [https://archive.pinecone.io/lp/vector-database/ pinecone]&lt;br /&gt;
* [https://weaviate.io/products weaviate]&lt;br /&gt;
&lt;br /&gt;
===MySQL===&lt;br /&gt;
* MySQL does not traditionally support vector search, but:&lt;br /&gt;
** [https://planetscale.com/blog/planetscale-is-bringing-vector-search-and-storage-to-mysql PlanetScale] is working on it&lt;br /&gt;
** [https://github.com/stephenc222/mysql_vss mysql_vss] ([https://medium.com/@stephenc211/enhancing-mysql-searches-with-vector-embeddings-11f183932851 discussion])&lt;br /&gt;
** [https://www.pingcap.com/tidb-serverless/ TiDB] ([https://www.pingcap.com/article/mysql-vector-search-powering-the-future-of-ai-applications/ discussion])&lt;br /&gt;
&lt;br /&gt;
==Database with Search==&lt;br /&gt;
* [https://typesense.org/ Typesense] ([https://github.com/typesense/typesense code])&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
* [[AI]]&lt;br /&gt;
** [[Data Extraction]]&lt;br /&gt;
** [[AI compute]]&lt;br /&gt;
* [[AI agents]]&lt;br /&gt;
* [[AI understanding]]&lt;br /&gt;
* [[Robots]]&lt;/div&gt;</summary>
		<author><name>KevinYager</name></author>
		
	</entry>
	<entry>
		<id>http://gisaxs.com/index.php?title=Human_brain&amp;diff=8761</id>
		<title>Human brain</title>
		<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php?title=Human_brain&amp;diff=8761"/>
		<updated>2026-03-26T16:28:52Z</updated>

		<summary type="html">&lt;p&gt;KevinYager: /* Brain signal decoding */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Why brain is as it is=&lt;br /&gt;
* 2025-06: [https://www.cell.com/trends/cognitive-sciences/fulltext/S1364-6613(24)00319-X The metabolic costs of cognition]&lt;br /&gt;
&lt;br /&gt;
=How Brain Works=&lt;br /&gt;
==Predictive Coding==&lt;br /&gt;
* 2005-04: [https://royalsocietypublishing.org/doi/10.1098/rstb.2005.1622?utm_source=chatgpt.com A theory of cortical responses]&lt;br /&gt;
* 2014-09: [https://www.frontiersin.org/journals/human-neuroscience/articles/10.3389/fnhum.2014.00666/full Visual mismatch negativity: a predictive coding view]&lt;br /&gt;
* 2015-01: [https://www.sciencedirect.com/science/article/pii/S089662731401099X Visual Areas Exert Feedforward and Feedback Influences through Distinct Frequency Channels]&lt;br /&gt;
* 2016-11: [https://www.sciencedirect.com/science/article/pii/S0896627316306997 Mismatch Receptive Fields in Mouse Visual Cortex]&lt;br /&gt;
* 2018-03: [https://www.nature.com/articles/s41598-018-21407-9 Frontal cortex function as derived from hierarchical predictive coding]&lt;br /&gt;
* 2024-02: [https://www.sciencedirect.com/science/article/pii/S0149763423004426 The empirical status of predictive coding and active inference]&lt;br /&gt;
&lt;br /&gt;
=Understanding=&lt;br /&gt;
* [https://arxiv.org/abs/2501.02950 Key-value memory in the brain]&lt;br /&gt;
* [https://helper.ipam.ucla.edu/publications/mac2024/mac2024_20152.pdf The cost of brain state transitions]&lt;br /&gt;
&lt;br /&gt;
==Brain mapping==&lt;br /&gt;
* 2024-05: [https://www.science.org/doi/10.1126/science.adk4858 A petavoxel fragment of human cerebral cortex reconstructed at nanoscale resolution] ([https://www.nature.com/articles/d41586-024-01387-9#ref-CR1 media summary])&lt;br /&gt;
* 2024-10: [https://www.nature.com/articles/s41586-024-07558-y Neuronal wiring diagram of an adult brain] ([https://www.nytimes.com/2024/10/02/science/fruit-fly-brain-mapped.html media summary]); 140,000 neurons in fruit fly brain&lt;br /&gt;
* 2024-12: [https://e11.bio/news/roadmap A roadmap to scale connectomics to entire mammalian brains]&lt;br /&gt;
* 2025-04: [https://www.nature.com/articles/s41586-025-08840-3 Functional connectomics reveals general wiring rule in mouse visual cortex] ([https://www.nature.com/articles/d41586-025-01088-x?utm_source=x&amp;amp;utm_medium=social&amp;amp;utm_campaign=nature&amp;amp;linkId=13897098 media summary])&lt;br /&gt;
* 2025-08: [https://www.nature.com/articles/s41586-025-08985-1 Light-microscopy-based connectomic reconstruction of mammalian brain tissue] ([https://research.google/blog/a-new-light-on-neural-connections/ blog])&lt;br /&gt;
&lt;br /&gt;
===Related===&lt;br /&gt;
* [https://v2.virtualflybrain.org 3D visualization of adult fruit fly brain]&lt;br /&gt;
&lt;br /&gt;
==Brain signal decoding==&lt;br /&gt;
* 2022-11: [https://www.biorxiv.org/content/10.1101/2022.11.18.517004v2.full.pdf High-resolution image reconstruction with latent diffusion models from human brain activity]&lt;br /&gt;
* 2023-08: [https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3002176%20 Music can be reconstructed from human auditory cortex activity using nonlinear decoding models] (intracranial EEG)&lt;br /&gt;
* 2023-09: [https://arxiv.org/abs/2309.14030 DeWave: Discrete EEG Waves Encoding for Brain Dynamics to Text Translation] (external EEG)&lt;br /&gt;
* 2023-09: [https://www.biorxiv.org/content/10.1101/2023.09.12.557460v1 BrainLM: A foundation model for brain activity recordings]&lt;br /&gt;
* 2023-10: [https://ai.meta.com/blog/brain-ai-image-decoding-meg-magnetoencephalography/ Toward a real-time decoding of images from brain activity] (MEG)&lt;br /&gt;
* 2024-06: [https://www.biorxiv.org/content/10.1101/2024.06.04.596589v1.full.pdf PAM: Predictive Attention Mechanism for Neural Decoding of Visual Perception]&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.07595 Scaling Law in Neural Data: Non-Invasive Speech Decoding with 175 Hours of EEG Data] (EEG)&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.19814 Predicting Human Brain States with Transformer]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.15322v2 Scaling laws for decoding images from brain activity] (EEG)&lt;br /&gt;
* 2025-02: Meta: [https://ai.meta.com/research/publications/brain-to-text-decoding-a-non-invasive-approach-via-typing/ Brain-to-Text Decoding: A Non-invasive Approach via Typing]&lt;br /&gt;
* 2025-02: Meta: [https://ai.meta.com/research/publications/from-thought-to-action-how-a-hierarchy-of-neural-dynamics-supports-language-production/ From Thought to Action: How a Hierarchy of Neural Dynamics Supports Language Production]&lt;br /&gt;
* 2025-03: Google: [https://research.google/blog/deciphering-language-processing-in-the-human-brain-through-llm-representations/ Deciphering language processing in the human brain through LLM representations]&lt;br /&gt;
* 2025-03: [https://www.nature.com/articles/s41593-025-01905-6 A streaming brain-to-voice neuroprosthesis to restore naturalistic communication]&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.11536 Language models align with brain regions that represent concepts across modalities]&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2508.18226 Disentangling the Factors of Convergence between Brains and Computer Vision Models] (fMRI and MEG)&lt;br /&gt;
&lt;br /&gt;
==Brain Signal Prediction==&lt;br /&gt;
* 2026-03: [https://ai.meta.com/research/publications/a-foundation-model-of-vision-audition-and-language-for-in-silico-neuroscience/ A foundation model of vision, audition, and language for in-silico neuroscience]&lt;br /&gt;
&lt;br /&gt;
==Whole Brain Emulation (WBE)==&lt;br /&gt;
* 2024-09: [https://www.nature.com/articles/s41586-024-07939-3 Connectome-constrained networks predict neural activity across the fly visual system]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.15745 State of Brain Emulation Report 2025]&lt;br /&gt;
&lt;br /&gt;
=Computational Analysis=&lt;br /&gt;
&lt;br /&gt;
==Computational power of human brain==&lt;br /&gt;
* 2020-09: Joe Carlsmith: [https://www.openphilanthropy.org/research/how-much-computational-power-does-it-take-to-match-the-human-brain/ How Much Computational Power Does It Take to Match the Human Brain?]&lt;br /&gt;
&lt;br /&gt;
==Comparison to computer==&lt;br /&gt;
* [https://arxiv.org/abs/2208.12032 How (and Why) to Think that the Brain is Literally a Computer]&lt;br /&gt;
* [https://www.nature.com/articles/s42256-024-00925-4 Contextual feature extraction hierarchies converge in large language models and the brain] ([https://techxplore.com/news/2024-12-llms-brain-advance.html LLMs are becoming more brain-like as they advance])&lt;br /&gt;
&lt;br /&gt;
==Biological vs. artificial neuron==&lt;br /&gt;
* [https://www.sciencedirect.com/science/article/pii/S0896627321005018 Single cortical neurons as deep artificial neural networks]: Each biological neuron can be simulated using DNN of 5-8 layers&lt;br /&gt;
* [https://arxiv.org/abs/2305.12471 Mapping Biological Neuron Dynamics into an Interpretable Two-layer Artificial Neural Network]&lt;br /&gt;
&lt;br /&gt;
==Data processing==&lt;br /&gt;
* [https://pmc.ncbi.nlm.nih.gov/articles/PMC1564115/ How Much the Eye Tells the Brain]&lt;br /&gt;
* [https://www.sciencedirect.com/science/article/pii/S1364661313001277 Representational geometry: integrating cognition, computation, and the brain]&lt;br /&gt;
* [https://www.nature.com/articles/s41586-024-07522-w Language is primarily a tool for communication rather than thought]&lt;br /&gt;
* [https://www.openread.academy/en/paper/reading?corpusId=513306465 The Unbearable Slowness of Being: Why do we live at 10 bits/s?] ([https://arxiv.org/abs/2408.10234 preprint])&lt;br /&gt;
&lt;br /&gt;
==Extract manifold/geometry==&lt;br /&gt;
* [https://www.science.org/doi/10.1126/science.adk8261 Selection of experience for memory by hippocampal sharp wave ripples]&lt;br /&gt;
&lt;br /&gt;
=Comparisons=&lt;br /&gt;
* 2023-08: [https://arxiv.org/abs/2308.08708 Consciousness in Artificial Intelligence: Insights from the Science of Consciousness]&lt;br /&gt;
* 2024-05: [https://arxiv.org/abs/2405.02325 Are Biological Systems More Intelligent Than Artificial Intelligence?]&lt;br /&gt;
* 2025-03: Google: [https://research.google/blog/deciphering-language-processing-in-the-human-brain-through-llm-representations/ Deciphering language processing in the human brain through LLM representations]&lt;br /&gt;
** 2022-03: [https://www.nature.com/articles/s41593-022-01026-4 Shared computational principles for language processing in humans and deep language models]&lt;br /&gt;
** 2024-03: [https://www.nature.com/articles/s41467-024-46631-y Alignment of brain embeddings and artificial contextual embeddings in natural language points to common geometric patterns]&lt;br /&gt;
** 2025-03: [https://www.nature.com/articles/s41562-025-02105-9 A unified acoustic-to-speech-to-language embedding space captures the neural basis of natural language processing in everyday conversations]&lt;br /&gt;
* 2025-05: [https://ai.meta.com/research/publications/emergence-of-language-in-the-developing-brain/ Emergence of Language in the Developing Brain]&lt;br /&gt;
&lt;br /&gt;
==Analogies==&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.11536 Language models align with brain regions that represent concepts across modalities]&lt;br /&gt;
* 2025-12: [https://www.nature.com/articles/s41562-025-02359-3 Shared sensitivity to data distribution during learning in humans and transformer networks]&lt;br /&gt;
===Speed-accuracy trade-off vs. Inference-compute===&lt;br /&gt;
* 2007: [https://psycnet.apa.org/doi/10.1037/0096-3445.136.2.217 Focusing the spotlight: individual differences in visual attention control]&lt;br /&gt;
* 2014-07: [https://www.frontiersin.org/journals/neuroscience/articles/10.3389/fnins.2014.00150/full The speed-accuracy tradeoff: history, physiology, methodology, and behavior]&lt;br /&gt;
&lt;br /&gt;
=Simulate Brain=&lt;br /&gt;
* 2023-09: [https://spj.science.org/doi/10.34133/icomputing.0055 The Digital Twin Brain: A Bridge between Biological and Artificial Intelligence]&lt;br /&gt;
* 2024-12: [https://www.nature.com/articles/s43588-024-00731-3 Simulation and assimilation of the digital human brain] ([https://arxiv.org/abs/2211.15963 preprint], [https://github.com/DTB-consortium/Digital_twin_brain-open code])&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.19814 Predicting Human Brain States with Transformer]&lt;br /&gt;
* 2025-08: [https://www.arxiv.org/abs/2507.22229 TRIBE: TRImodal Brain Encoder for whole-brain fMRI response prediction]&lt;br /&gt;
&lt;br /&gt;
==See Also==&lt;br /&gt;
* [[AI_and_Humans#Simulate_Humans|Simulate Humans (using LLM)]]&lt;br /&gt;
&lt;br /&gt;
=Bio-brain Inspirations for AI=&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.16396 TopoNets: High Performing Vision and Language Models with Brain-Like Topography]&lt;br /&gt;
&lt;br /&gt;
=Theories of Consciousness=&lt;br /&gt;
* [https://www.consciousnessatlas.com/ Consciousness Atlas]&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
* [[AI_and_Humans#Simulate_Humans|LLM Simulate Humans]]&lt;/div&gt;</summary>
		<author><name>KevinYager</name></author>
		
	</entry>
	<entry>
		<id>http://gisaxs.com/index.php?title=AI_video&amp;diff=8760</id>
		<title>AI video</title>
		<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php?title=AI_video&amp;diff=8760"/>
		<updated>2026-03-24T22:39:08Z</updated>

		<summary type="html">&lt;p&gt;KevinYager: /* March 2026 */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Evolution of Capabilities==&lt;br /&gt;
===Early===&lt;br /&gt;
* November 2016: [https://arxiv.org/abs/1611.10314 Sync-Draw]&lt;br /&gt;
* April 2021: [https://arxiv.org/abs/2104.14806 GODIVA]&lt;br /&gt;
* October 2022: [https://makeavideo.studio/ Meta Make-a-video]&lt;br /&gt;
* October 2022: [https://imagen.research.google/video/ Google Imagen video]&lt;br /&gt;
&lt;br /&gt;
===2023===&lt;br /&gt;
* April 2023: [https://www.youtube.com/watch?v=XQr4Xklqzw8 Will Smith eating spaghetti]&lt;br /&gt;
* April 2023: [https://x.com/nickfloats/status/1642899094808002564 Harry Potter by Balenciaga]&lt;br /&gt;
* April 2023: [https://x.com/mrjonfinger/status/1645953033636048896?cxt=HHwWgMDT7YfkzNctAAAA Runway Gen 2]&lt;br /&gt;
* April 2023: [https://research.nvidia.com/labs/toronto-ai/VideoLDM/ Nvidia latents]&lt;br /&gt;
* December 2023: [https://www.threads.net/@luokai/post/C0vvEnTP4Oj Fei-Fei Li]&lt;br /&gt;
&lt;br /&gt;
===2024===&lt;br /&gt;
====Early 2024====&lt;br /&gt;
* January 2024: [https://sites.research.google/videopoet/ Google VideoPoet]&lt;br /&gt;
* January 2024: [https://lumiere-video.github.io/ Google Lumiere]&lt;br /&gt;
* February 2024: [https://openai.com/index/sora/ OpenAI Sora]&lt;br /&gt;
* April 2024: [https://www.maginative.com/article/china-unveils-vidu-a-powerful-text-to-video-generator/ Vidu]&lt;br /&gt;
* May 2024: [https://deepmind.google/technologies/veo/ Veo]&lt;br /&gt;
* May 2024: [https://kling.kuaishou.com/ Kling]&lt;br /&gt;
* June 2024: [https://lumalabs.ai/dream-machine Luma DreamMachine]&lt;br /&gt;
* June 2024: [https://runwayml.com/research/introducing-gen-3-alpha RunwayML Gen-3 Alpha]&lt;br /&gt;
* July 2024: Examples:&lt;br /&gt;
** [https://www.youtube.com/watch?v=F_WfIzYGlg4 Toys-R-Us Commercial made using Sora]&lt;br /&gt;
** [https://www.youtube.com/watch?v=CSfw_NjqQ2o Motorola commercial made using genAI]&lt;br /&gt;
* July 2024: [https://x.com/rowancheung/status/1813258518159585723 haiper.ai]&lt;br /&gt;
====August 2024====&lt;br /&gt;
* August 2024: [http://hotshot.co/ Hotshot] ([https://x.com/maxescu/status/1825459083635536081 examples], [https://x.com/EccentrismArt/status/1825550841534972027 more examples])&lt;br /&gt;
* August 2024: Luma Dream Machine [https://x.com/LumaLabsAI/status/1825639918539817101 v1.5]&lt;br /&gt;
* August 2024: Examples:&lt;br /&gt;
** [https://x.com/endlesstaverns/status/1811276904692887815 Runway Gen3 music video]&lt;br /&gt;
** [https://x.com/runwayml/status/1820806644806070583 Runway Gen3 for adding FX to live action] ([https://x.com/bryanf0x/status/1825529998201004137 another example])&lt;br /&gt;
** [https://www.youtube.com/watch?v=taaM0s1bq7Q Midjourney + Runway Gen3: Hey It’s Snowing]&lt;br /&gt;
** [https://x.com/Kyrannio/status/1821605619927019974 Flux/LoRA image] + Runway Gen3 [https://x.com/iamneubert/status/1821970292014768420 woman presenter]&lt;br /&gt;
** [https://x.com/CharaspowerAI/status/1825274421256356106 McDonald’s AI commercial]&lt;br /&gt;
** Sora used by [https://www.facebook.com/izanamiaiart/ Izanami AI Art] to create [https://x.com/kimmonismus/status/1824102316229759114 dreamlike video] and by [https://x.com/alexiaadana Alexia Adana] to create [https://x.com/basedjensen/status/1824386717123743940 sci-fi film concept]&lt;br /&gt;
====September 2024====&lt;br /&gt;
* September 2024: [https://hailuoai.com/video/ Hailuo Minimax] ([https://x.com/minchoi/status/1829995683124035766 examples])&lt;br /&gt;
* September 2024: Examples:&lt;br /&gt;
** [https://www.youtube.com/watch?v=XAs5KuhfE_s Space colonization]&lt;br /&gt;
** [https://x.com/venturetwins/status/1827772646295265699 Consistent characters]&lt;br /&gt;
** [https://x.com/thealexbanks/status/1829489392354050502 Sea monsters]&lt;br /&gt;
** [https://x.com/ai_for_success/status/1829539535132426286 Music video]&lt;br /&gt;
** [https://x.com/RyanMorrisonJer/status/1829074823521112544 Animated characters]&lt;br /&gt;
** [https://x.com/CharaspowerAI/status/1829916782452191674 AI influencer]&lt;br /&gt;
** [https://x.com/minchoi/status/1829293248197902802 Ten short examples]&lt;br /&gt;
** [https://x.com/WorldEverett/status/1830596701473615937 Seven examples]&lt;br /&gt;
** [https://x.com/EccentrismArt/status/1830654805515395583 Clip from horror film]&lt;br /&gt;
** [https://x.com/MatthieuGB/status/1722146578813645296 &amp;quot;Gone&amp;quot; featuring astronaut] and [https://x.com/MatthieuGB/status/1742949297337852270 something ethereal]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1831256663644373449 Two dancers] (surprisingly good consistency despite movement)&lt;br /&gt;
** [https://x.com/8bit_e/status/1831344542487871953 Music video about flying]&lt;br /&gt;
** [https://www.youtube.com/watch?v=_XtS_4PzEyk The Paperclip Maximizer]&lt;br /&gt;
** [https://x.com/trbdrk/status/1831801373517869369 La Baie Aréa]&lt;br /&gt;
** [https://www.reddit.com/r/aivideo/comments/1f8xr0w/gisele_tong_to_dear_me/ &amp;quot;To Dear Me&amp;quot; by Gisele Tong] ([https://www.morningstar.com/news/business-wire/20240904521664/reply-ai-film-festival-announced-the-winners-of-the-first-international-festival-for-short-films-made-with-artificial-intelligence winner of AI shorts] film festival)&lt;br /&gt;
** [https://x.com/maxescu/status/1833476640438964281 Various scenes]&lt;br /&gt;
** [https://x.com/EHuanglu/status/1833522650846793970 Directing emotions]&lt;br /&gt;
* September 2024: Kling 1.5 ([https://x.com/Uncanny_Harry/status/1836531835280724459 examples], [https://x.com/minchoi/status/1836800551469654088 showing emotions])&lt;br /&gt;
* September 2024: Examples:&lt;br /&gt;
** Runway video-to-video to [https://x.com/jon_barron/status/1835695132697604236 restyle classic video games]&lt;br /&gt;
** [https://x.com/ai_for_success/status/1835319670917796117 Realistic presenter]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1834530744175059302 Skateboarding] (demonstrates getting closer to meaningfully simulating motion/physics)&lt;br /&gt;
** [https://x.com/minchoi/status/1835378029092049325 Examples] of short clips with cinematic feel&lt;br /&gt;
** Short: [https://x.com/PJaccetturo/status/1835670655330869633 4 Minutes to Live]&lt;br /&gt;
** Short: [https://x.com/dreamingtulpa/status/1836121321526432231 Neon Nights] (Arcade)&lt;br /&gt;
** [https://www.youtube.com/watch?v=CcrGSA-kSrI Random Access Memories]: AI-generated, but then projected onto Kodak film stock. Gives the final output some of the dreamy analog quality we associate with nostalgic footage&lt;br /&gt;
** Sora used to make a sort of [https://x.com/niceaunties/status/1837271244774715505 weird dreamlike video]&lt;br /&gt;
====October 2024====&lt;br /&gt;
* October 2024: Pika v1.5, including Pikaffects (explode, melt, inflate, and cake-ify; examples: [https://x.com/justin_hart/status/1841144350572413259 1], [https://x.com/arthur_hyper88/status/1841156544538521646 2], [https://x.com/ytjessie_/status/1841168925301842263 3], [https://x.com/bilawalsidhu/status/1841195247184781420 4], [https://x.com/minchoi/status/1841189035454447636 5], [https://x.com/ytjessie_/status/1841209415514669501 6])&lt;br /&gt;
* October 2024: Examples:&lt;br /&gt;
** [https://x.com/HalimAlrasihi/status/1839310216602788103 AI avatar with good lip-sync]&lt;br /&gt;
** [https://www.youtube.com/watch?v=5NZubOOeeV0 Battalion]: 5 minute short about war&lt;br /&gt;
** Short film: [https://x.com/MatthieuGB/status/1841173724688536015 To Wonderland] ([https://x.com/MatthieuGB/status/1841174221550207437 credits])&lt;br /&gt;
** [https://x.com/OnwardsProject/status/1841508441241890975 9 to 5]: Created with Luma Dream Machine keyframes and camera features; music by Suno&lt;br /&gt;
* October 2024: [https://ai.meta.com/research/movie-gen/ Meta Movie Gen]&lt;br /&gt;
* October 2024: Examples:&lt;br /&gt;
** [https://x.com/CuriousRefuge/status/1844424871335592373 AI Avatar] (using [https://x.com/CuriousRefuge/status/1844424871335592373 HeyGen])&lt;br /&gt;
** [https://www.youtube.com/watch?v=isW1FLL0K3w Generic Movies]&lt;br /&gt;
** [https://arxiv.org/abs/2410.05954 Pyramid-flow] ([https://huggingface.co/rain1011/pyramid-flow-sd3 open source]) model: [https://x.com/_akhaliq/status/1844239643778351605 examples]&lt;br /&gt;
** [https://x.com/whrumorvid/status/1846209247467491604 Building the Pyramids]&lt;br /&gt;
** [https://x.com/maxescu/status/1844716998854349217 People showing realistic emotion] (using [https://hailuoai.video/ Hailuo AI])&lt;br /&gt;
** Keyframes and Luma AI to make novel [https://x.com/CoffeeVectors/status/1845188179332051005 speed-ramp motion]&lt;br /&gt;
* October 2024: [https://pollo.ai/ Pollo AI] platform offers a selection of diverse video models&lt;br /&gt;
* October 2024: [https://www.genmo.ai/ Genmo] [https://x.com/genmoai/status/1848762405779574990 Mochi 1] (open source)&lt;br /&gt;
* October 2024: Examples:&lt;br /&gt;
** [https://x.com/AIatMeta/status/1849134463382680028 Meta Movie Gen examples]&lt;br /&gt;
** [https://x.com/PJaccetturo/status/1847732127598800960 Emotional range of Minimax]&lt;br /&gt;
** [https://x.com/dustinhollywood/status/1848757800807039299 Car commercial: Bear]&lt;br /&gt;
** [https://x.com/runwayml/status/1848785913918218517 Diner conversation]&lt;br /&gt;
** [https://x.com/Uncanny_Harry/status/1849275871716159989 Loved and Lost] (a meditation on grief)&lt;br /&gt;
====November 2024====&lt;br /&gt;
* November 2024: Examples:&lt;br /&gt;
** [https://x.com/blizaine/status/1852092147643699356 Pasta Doble]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1852425015175626876 Bird protecting young]&lt;br /&gt;
** [https://x.com/runwayml/status/1852363190484537666 Camera moving around sushi]&lt;br /&gt;
** [https://x.com/StevieMac03/status/1851969120813629939 Various examples] of [https://hailuoai.video/ Hailuo AI]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1853102779650252978 Trains]&lt;br /&gt;
** [https://www.youtube.com/watch?v=Fh-_g5vev0s Light of Imagination]&lt;br /&gt;
** [https://x.com/LinusEkenstam/status/1854087441122021814 Bringing historic images to life]&lt;br /&gt;
** [https://x.com/DeryaTR_/status/1855637066203218180 Plants dancing]&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1855078644042944574 Insect on tree]&lt;br /&gt;
** Trailers for [https://x.com/abandonedmovies/status/1827037378009296983 The Silmarillion] and [https://x.com/abandonedmovies/status/1846941183702110211 The Fall of Gondolin] (by [https://x.com/abandonedmovies Abandoned Films])&lt;br /&gt;
** [https://x.com/Diesol/status/1855475704470884427 Moody sci-fi]&lt;br /&gt;
** [https://x.com/runwayml/status/1857072173631885586 Migration] ([https://runwayml.com/customers/behind-the-scenes-of-migration-with-director-jeremy-higgins made by combining] Runway ML Gen3-Alpha and traditional animation)&lt;br /&gt;
** [https://x.com/AIandDesign/status/1856467856625676752 After the Winter] ([https://suno.com/song/0d6919de-d2bf-434b-8aa6-ede0fb0fde77 music] made using Suno v4)&lt;br /&gt;
** Horror: [https://www.reddit.com/r/aivideo/comments/1gnk27q/ridge_to_southwest/ Ridge to Southwest]&lt;br /&gt;
** [https://www.youtube.com/watch?v=ClStJZmIjBU The Gardener] (by [https://www.youtube.com/@MachineMythos Machine Mythos])&lt;br /&gt;
** [https://x.com/techhalla/status/1857462526859935813 Coca-Cola holiday ad] and [https://www.youtube.com/watch?v=THdoOgwqjBg parody thereof]&lt;br /&gt;
** [https://x.com/pzf_ai/status/1858312421510992111 A Dream Within A Dream] (by [https://x.com/pzf_ai PZF], selected for the Czech International AI Film Festival)&lt;br /&gt;
** [https://x.com/WorldEverett/status/1859273222597775843 Making Friends] (by [https://x.com/WorldEverett Everett World]; see also [https://x.com/WorldEverett/status/1858563716834275562 Childhood Dream] and [https://x.com/WorldEverett/status/1858945634067202429 City Echoes])&lt;br /&gt;
** Anime: [https://x.com/naegiko/status/1857754626742726893 test shots], [https://x.com/naegiko/status/1858978557424210401 Ultimate Ceremony], [https://x.com/naegiko/status/1835434668294074462 Echoes of Love]&lt;br /&gt;
** [https://x.com/KakuDrop/status/1866309309384323257 Echoes of Grace] ([https://x.com/KakuDrop KakuDrop] using Sora)&lt;br /&gt;
** [https://x.com/vibeke_udart/status/1859879367071203662 Morphing hands], [https://x.com/vibeke_udart/status/1858772719224975630 hands and faces] ([https://x.com/vibeke_udart Vibeke Bertelsen])&lt;br /&gt;
** [https://www.reddit.com/r/aivideo/comments/1gxi29x/dbzlicious/ Dragon Ball Z live action]&lt;br /&gt;
** [https://x.com/cfryant/status/1860727980353278386 Pitch Black] (abstract and dark)&lt;br /&gt;
** [https://x.com/cfryant/status/1861050528932765710 Animals Running] (zoomed-in ultra-wide camera)&lt;br /&gt;
** [https://x.com/WorldEverett/status/1860730214487118290 Dreams of Tomorrow] (panning shots of high-tech car, Scottish manor)&lt;br /&gt;
** [https://x.com/nickfloats/status/1861206978690691165 Desert Planet Cinematics]&lt;br /&gt;
* November 2024: [https://huggingface.co/spaces/PR-Puppets/PR-Puppet-Sora Leaked] Sora turbo model; [https://x.com/rowancheung/status/1861455031603503234 examples], [https://x.com/chatgpt21/status/1861504511153451517 Dog chasing Cat in snow]&lt;br /&gt;
====December 2024====&lt;br /&gt;
* December 2024: Examples:&lt;br /&gt;
** [https://x.com/minchoi/status/1863243880553976235 Realistic] (Minimax by Hailuo AI)&lt;br /&gt;
** Trailer for [https://x.com/TheReelRobot/status/1861824847149670840 Paradise Lost] (to be released on [https://www.sandwatch.ai/ Sandwatch AI])&lt;br /&gt;
** [https://x.com/EHuanglu/status/1863607136271716418 Music video example] with consistent characters&lt;br /&gt;
** [https://x.com/venturetwins/status/1863666366764687581 Human expressions] ([https://www.reddit.com/r/ChatGPT/comments/1h4r13x/ai_generated_expressions/ u/Kind_Distance9504 on Reddit], using Hailuo)&lt;br /&gt;
** Vodafone ad: [https://www.youtube.com/watch?v=9AyEC_K9kBg The Rhythm Of Life]&lt;br /&gt;
** [https://www.reddit.com/r/midjourney/comments/1h5u2gw/we_made_a_10_minute_gen_ai_batman_film/ 10 minute Batman film]&lt;br /&gt;
* December 2024: Tencent [https://aivideo.hunyuan.tencent.com/ Hunyuan Video] open-source video model ([https://x.com/CharaspowerAI/status/1863862585554010530 example])&lt;br /&gt;
* December 2024: [https://sora.com/ Sora] release ([https://x.com/CharaspowerAI/status/1866203050982916532 examples])&lt;br /&gt;
* December 2024: [https://mint-video.github.io/ MinT video] improves consistency and control ([https://arxiv.org/abs/2412.05263 preprint], [https://x.com/EHuanglu/status/1868278456565531061 examples])&lt;br /&gt;
* December 2024: Google [https://blog.google/technology/google-labs/video-image-generation-update-december-2024/ Veo 2] ([https://x.com/sundarpichai/status/1868709099644334518 examples], [https://x.com/EHuanglu/status/1869008306322522342 more examples], [https://x.com/_Borriss_/status/1869267571532320966 natural movement examples], [https://x.com/jerrod_lew/status/1870816560027246715 abstract], [https://x.com/jerrod_lew/status/1869427407415058660 realistic physics], [https://x.com/jerrod_lew/status/1873096585002786944 crowds], [https://x.com/minchoi/status/1873590350515929380 dancing], [https://x.com/jerrod_lew/status/1874440442269565351 animals])&lt;br /&gt;
* December 2024: [https://x.com/pika_labs/status/1867651381840040304 Pika 2.0] with Scene Ingredients&lt;br /&gt;
* December 2024: Examples:&lt;br /&gt;
** [https://www.youtube.com/watch?v=c_kKKRQ5gYw Synthetic Youth: Takenoko Zoku · Made by Emi Kusano with Sora]&lt;br /&gt;
** [https://x.com/higgsfield_ai/status/1868698886761837041 Car race] ([https://higgsfield.ai/ Higgsfield AI] storytelling)&lt;br /&gt;
** [https://x.com/blizaine/status/1868850653759783033 Slicing meat]; comparison of modern video generators&lt;br /&gt;
** Challenging prompt: [https://x.com/RubenEVillegas/status/1868864410720325844 A cat roars while looking at its reflection in the mirror but instead sees itself as a lion roaring (Veo 2)] ([https://x.com/anukaakash/status/1869417975071330550 comparison to other models])&lt;br /&gt;
** [https://x.com/PJaccetturo/status/1869829338868412865 Anime trailer]&lt;br /&gt;
** [https://x.com/ring_hyacinth/status/1870386506776674376 Snorlax at Mount Fuji] and [https://x.com/ring_hyacinth/status/1871105733443592696 Psyduck at Colosseum] (Kling 1.6)&lt;br /&gt;
** [https://x.com/machine_mythos/status/1870565287789056320 Horror visuals] (with [https://mmaudio.net/ MMAudio] sound)&lt;br /&gt;
** [https://www.youtube.com/watch?v=lFc1jxLHhyM The Heist] (Veo 2)&lt;br /&gt;
** [https://x.com/minchoi/status/1871263616806129863 Various Veo 2 examples]&lt;br /&gt;
** [https://x.com/minchoi/status/1872390429108486320 Live Action Titans]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1873094065841193222 Cats] [https://x.com/PostsOfCats/status/1872530207585825058 Cooking]&lt;br /&gt;
** Aesthetic from alternate timelines: [https://x.com/BrianRoemmele/status/1871753358782120068 1], [https://x.com/BrianRoemmele/status/1872105833456423216 2], [https://x.com/brain_racked/status/1872340717978390583 3]&lt;br /&gt;
** [https://x.com/minchoi/status/1872486717145706793 Examples approaching cinematic quality]&lt;br /&gt;
** [https://x.com/JaicSam/status/1872903054221033693 Cosmic Spider] (winner at AI film festival)&lt;br /&gt;
** [https://www.youtube.com/watch?v=dbdYPMRi_Nk Trailer for Newton&amp;#039;s Cradle] (full film [https://x.com/JeffSynthesized/status/1872705173451358293 on] 2025-01-01)&lt;br /&gt;
** [https://x.com/Ror_Fly/status/1873036384077828499 Car vs. Jet drag race]&lt;br /&gt;
** [https://x.com/Diesol/status/1873415500149199066 California Monsters]&lt;br /&gt;
** [https://x.com/heyshrutimishra/status/1873631383584924078 Various examples] (Hailuo AI)&lt;br /&gt;
** [https://x.com/kimmonismus/status/1873568693357294014 Will Smith eating spaghetti] (cf. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023])&lt;br /&gt;
** [https://x.com/StevieMac03/status/1873998177193648438 Sorceress and Arachnid Steed] (Kling v1.6)&lt;br /&gt;
** [https://x.com/dustinhollywood/status/1873940924016779425 Music video] (Hailuo AI)&lt;br /&gt;
** [https://www.youtube.com/watch?v=iQg2udCHMdI Akụkọ (Story)] (22 minute short) - A Lagos Boy&amp;#039;s Thrilling Snack Run Nightmare&lt;br /&gt;
** [https://x.com/cinerobot/status/1873766976306455019 Son of the Dragon] (8 minute short)&lt;br /&gt;
** [https://x.com/SynthReveries/status/1873624586857886071 Endless Journey] music video ([https://suno.com/song/fa90fa5e-25c7-48ad-b291-42a8a8c51cf9 music] by Suno)&lt;br /&gt;
** [https://x.com/anukaakash/status/1870504167653228980 Once Again] (retrospective)&lt;br /&gt;
** [https://x.com/jasonzada/status/1873470586053414928 Fade Out] (Veo 2)&lt;br /&gt;
** [https://x.com/talkboysstudio/status/1869085014513865027 Roadkill] (12 minute short)&lt;br /&gt;
&lt;br /&gt;
===2025===&lt;br /&gt;
====January 2025====&lt;br /&gt;
* January 2025: [https://x.com/kimmonismus/status/1877351050748871038 Progress] over the last 1.5 years, comparing Runway Gen 2 and Veo 2.&lt;br /&gt;
* January 2025: Examples:&lt;br /&gt;
** [https://x.com/AllarHaltsonen/status/1874557865576542655 Delivery] (unofficial Nike ad)&lt;br /&gt;
** [https://x.com/Diesol/status/1875237221735002299 Gucci ad] (Sora)&lt;br /&gt;
** [https://x.com/EccentrismArt/status/1874498145910149412 Conquest]&lt;br /&gt;
** [https://www.youtube.com/watch?v=RJZCMfaS-io Newton&amp;#039;s Cradle] (6 minute short)&lt;br /&gt;
** [https://x.com/AngryTomtweets/status/1874627041934602410 Singer]&lt;br /&gt;
** [https://x.com/DumpsterBud/status/1874807352794182019 Brain vomit] (music video)&lt;br /&gt;
** [https://x.com/mxvdxn/status/1874796628210778618 Vibe] (Kling v1.6)&lt;br /&gt;
** [https://x.com/_deepfates/status/1875215969452523785 Will Smith eating spaghetti] (cf. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024])&lt;br /&gt;
** [https://www.youtube.com/watch?v=BL9-jHGnxyc Zorgop Knows All] (2 minute short)&lt;br /&gt;
** [https://x.com/ButchersBrain/status/1875130428518269406 The Breach] (2 minute short; Veo2, Runway ActOne, MMaudio)&lt;br /&gt;
** [https://x.com/Rainmaker1973c/status/1875261591043850477 Aesthetics from an alternate timeline]&lt;br /&gt;
** [https://x.com/StevieMac03/status/1875440611849072841 Immortal Awakens]&lt;br /&gt;
** [https://x.com/isaachorror/status/1875624519588835400 The Faded Line]&lt;br /&gt;
** [https://www.youtube.com/watch?v=4fy8H38rm-4 Dear Dad]&lt;br /&gt;
** [https://x.com/maxescu/status/1877060580680311242 Mad Max chase]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1877408247906447633 Patience is Key]&lt;br /&gt;
** [https://x.com/techhalla/status/1879967230093586555 The Almost Famous Show] (talent show parody)&lt;br /&gt;
** [https://x.com/thefuzzysignal/status/1879295176990154755 Proof-of-concept trailer for a medieval adult animated series]&lt;br /&gt;
** [https://x.com/JeffSynthesized/status/1879555151499034869 Variety] (unofficial Cadbury ad)&lt;br /&gt;
** [https://x.com/henrydaubrez/status/1879883806947115446 Kitsune] (5 minute animated short, Veo 2)&lt;br /&gt;
* January 2025: MiniMax Hailuo [https://www.minimaxi.com/en/news/s2v-01-release Subject Reference] enables consistent characters ([https://x.com/minchoi/status/1881707687362412924 examples])&lt;br /&gt;
* January 2025: AI tools (de-aging deepfakes, [https://magnific.ai/ Magnific]) were [https://x.com/JeffSynthesized/status/1878630652377178502 used in the film] [https://www.imdb.com/title/tt18272208/ &amp;quot;Here&amp;quot;].&lt;br /&gt;
* January 2025: Luma [https://lumalabs.ai/ray Ray2]&lt;br /&gt;
* January 2025: [https://pikartai.com/pika-2-1/ Pika 2.1] ([https://x.com/OrctonAI/status/1883925754653905049 examples])&lt;br /&gt;
* January 2025: Examples:&lt;br /&gt;
** [https://x.com/wyzborrero/status/1879949477764804873 Light projections onto people] (challenging task, Ray2)&lt;br /&gt;
** [https://x.com/AllarHaltsonen/status/1881261042753589547 BMW ad]&lt;br /&gt;
** [https://x.com/AIWarper/status/1880658326645878821 John Wick in Severance] (Hunyuan vid2vid)&lt;br /&gt;
** [https://x.com/TheReelRobot/status/1881771800595444193 Biopic] (7 minutes)&lt;br /&gt;
** [https://x.com/misslaidlaw/status/1882180619582791784 Give It To Me] (music video)&lt;br /&gt;
** [https://x.com/paultrillo/status/1882091702506459394 Where do we go from here?] (music video, Veo 2)&lt;br /&gt;
** [https://x.com/WorldEverett/status/1882235057076580502 Party like there&amp;#039;s no tomorrow] (music video)&lt;br /&gt;
** [https://x.com/Diesol/status/1884696027942498779 S.T.O.R.I.] (Midjourney and Pika 2.1)&lt;br /&gt;
====February 2025====&lt;br /&gt;
* February 2025: Examples:&lt;br /&gt;
** [https://x.com/OrctonAI/status/1885839287913955597 Long Steampunk scene]&lt;br /&gt;
** [https://x.com/jerrod_lew/status/1885787580685562226 City destruction]&lt;br /&gt;
** [https://x.com/EHuanglu/status/1885736840344551763 Consistent character acting]&lt;br /&gt;
** [https://x.com/MeanOrangeCat/status/1884295241534185890 Kaiju Katastrophe] (by [https://x.com/MeanOrangeCat Mean Orange Cat])&lt;br /&gt;
** [https://x.com/Diesol/status/1886433799690748210 The Greyhound]&lt;br /&gt;
** [https://x.com/CoffeeVectors/status/1886146242029195391 Fluid simulation video2video]&lt;br /&gt;
** [https://x.com/toolstelegraph/status/1886622772828254403 High resolution macro shots]&lt;br /&gt;
** [https://www.youtube.com/watch?v=p0J1LDWERS0 Chrysalids]&lt;br /&gt;
** [https://x.com/multimodalart/status/1887817996220940737 Boring realistic images] (HunyuanVideo w/ LoRA)&lt;br /&gt;
** [https://www.youtube.com/watch?v=PcVRfa1JyyQ Anime intro] ([https://www.reddit.com/r/StableDiffusion/comments/1ijvua0/opensource_almostconsistent_real_anime_made_with/ Hunyuan w/ custom LoRAs])&lt;br /&gt;
** [https://x.com/AllarHaltsonen/status/1888294811750318114 Automotive ad test] (Kling w/ custom model)&lt;br /&gt;
** [https://x.com/AngryTomtweets/status/1888758524303269928 Random cinematic clips] (Midjourney and Kling)&lt;br /&gt;
** [https://x.com/juliewdesign_/status/1888666757302263828 Crossing Paths]&lt;br /&gt;
** [https://x.com/weirdai_art/status/1888794894187041200 Miniature food]&lt;br /&gt;
** [https://x.com/CaptainHaHaa/status/1889573017745035463 Animals]&lt;br /&gt;
** [https://x.com/Kavanthekid/status/1889371011667144724 Star Wars - The Ghost&amp;#039;s Apprentice (Fan Film)]&lt;br /&gt;
** [https://x.com/AngryTomtweets/status/1889768184716423573 Ray2 image-to-video examples]&lt;br /&gt;
** [https://x.com/weirdai_art/status/1889890470987518069 New Horizons] (miniatures going to Mars)&lt;br /&gt;
** [https://x.com/karim_yourself/status/1890100168378536155 Black Sun (trailer)]&lt;br /&gt;
** [https://x.com/BrivaelLp/status/1890122101153231288 AI avatars] ([https://www.argil.ai/ Argil AI])&lt;br /&gt;
** [https://x.com/mrjonfinger/status/1890783411679236473 Adding elements to real video] ([https://x.com/mrjonfinger/status/1891337081923772918 other example])&lt;br /&gt;
** [https://x.com/SynthReveries/status/1892278954137940289 Glitch]&lt;br /&gt;
** Anime: [https://x.com/freeeebird2300/status/1889119007707689146 sci-fi] (Ray2), [https://x.com/Artedeingenio/status/1891173784188756069 sci-fi] (Ray 2), [https://x.com/seiiiiiiiiiiru/status/1890980673743474931 90s sci-fi] (Luma) and [https://x.com/TomLikesRobots/status/1891209369804591447 moody] (Midjourney and Ray2)&lt;br /&gt;
* February 2025: Meta [https://hila-chefer.github.io/videojam-paper.github.io/ VideoJAM]&lt;br /&gt;
* February 2025: ByteDance [https://omnihuman-lab.github.io/ OmniHuman-1]&lt;br /&gt;
* February 2025: ByteDance [https://saiyan-world.github.io/goku/ Goku] ([https://arxiv.org/abs/2502.04896 paper], [https://x.com/ai_for_success/status/1888821141495844991 examples])&lt;br /&gt;
* February 2025: [https://huggingface.co/stepfun-ai/stepvideo-t2v Step-Video-T2V] open-source model ([https://arxiv.org/abs/2502.10248 paper], [https://github.com/stepfun-ai/Step-Video-T2V code], [https://yuewen.cn/videos demo], [https://x.com/ai_for_success/status/1891369136082854129 examples])&lt;br /&gt;
* February 2025: Pika [https://x.com/pika_labs/status/1892620122818294109 Pikaswaps] (examples of [https://x.com/FreddyChavezO/status/1892678426487881805 modifying regions], [https://x.com/CharaspowerAI/status/1893216710141919637 swapping items])&lt;br /&gt;
* February 2025: Alibaba [https://wanai.pro/ Wan 2.1] [https://huggingface.co/blog/LLMhacker/wanai-wan21 open-source] ([https://x.com/fofrAI/status/1894862403260596371 examples])&lt;br /&gt;
* February 2025: [https://thetwinai.com/ Twin AI]: compose videos with provided character, object, location ([https://x.com/EHuanglu/status/1901277394729930984 example])&lt;br /&gt;
* February 2025: Examples:&lt;br /&gt;
** [https://x.com/mrjonfinger/status/1893109598627750164 Infected] (Pika swaps and additions)&lt;br /&gt;
** [https://x.com/amli_art/status/1893447314913796253 Hostile Government Takeover] (Veo2)&lt;br /&gt;
** [https://x.com/KarolineGeorges/status/1895226395812561399 Dual Mechanism] (Pikaframes 2.2)&lt;br /&gt;
&lt;br /&gt;
====March 2025====&lt;br /&gt;
* March 2025: Examples:&lt;br /&gt;
** [https://x.com/SynthReveries/status/1895826068617252901 Doors] (music video)&lt;br /&gt;
** [https://x.com/bind_lux/status/1894492032414224792 Drum and Bass] (music video; Kling, audio from [https://www.riffusion.com/?filter=staff-picks Riffusion])&lt;br /&gt;
** [https://x.com/RileyRalmuto/status/1896088776151269523 Woman&amp;#039;s face] (Sora)&lt;br /&gt;
** [https://x.com/ryanwpatterson/status/1896968881731948844 Skating] (Ray2)&lt;br /&gt;
** [https://www.threads.net/@evolving.ai/post/DGlRyRoO7c9?xmt=AQGz6T_8VppPoAqb5aPwAJ2zzRLUP-YXi8SabAT0IIEA9Q Filming commercial on Mars]&lt;br /&gt;
** [https://www.threads.net/@evolving.ai/post/DGycqyhuETS?xmt=AQGz6T_8VppPoAqb5aPwAJ2zzRLUP-YXi8SabAT0IIEA9Q Original Source commercial] (AI and real footage)&lt;br /&gt;
** [https://x.com/maxescu/status/1896926229204496788 Time-lapses] (Pika 2.2)&lt;br /&gt;
** [https://www.youtube.com/watch?v=2RhkcJyhg0E Hallucination]&lt;br /&gt;
** [https://x.com/town_in_new/status/1897354572139782620 Macro video of bubbles]&lt;br /&gt;
* March 2025: [https://github.com/Tencent/HunyuanVideo-I2V HunyuanVideo-I2V] image-to-video&lt;br /&gt;
* March 2025: Google [https://x.com/labsdotgoogle/status/1897376700666626233 Whisk Animate] (based on Veo2, [https://x.com/maxescu/status/1902742535618888025 examples])&lt;br /&gt;
* March 2025: Examples:&lt;br /&gt;
** [https://x.com/jdp2oo/status/1897874927367160114 Recursion (horror)] (Kling)&lt;br /&gt;
** [https://x.com/blizaine/status/1897826177970028614 Will Smith Eating Spaghetti while Sitting Inside a Bag] (cf. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025])&lt;br /&gt;
** [https://x.com/mickmumpitz/status/1897979382687297697 Paper Jam] (Kling with custom workflows to enable precise control)&lt;br /&gt;
** [https://x.com/maxescu/status/1899155936645722216 Cinematic shots] (Google Whisk and Luma)&lt;br /&gt;
** [https://x.com/weirdai_art/status/1899631013002711409 Perfunctory Horizons]&lt;br /&gt;
** [https://x.com/maxescu/status/1900243840499368319 A Hard Winter]&lt;br /&gt;
** [https://x.com/RoyalKongz/status/1900315389139014074 Consistent character example]&lt;br /&gt;
** [https://x.com/maxescu/status/1900652266362650853 Anthropomorphic Animals]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1900457543299727718 Realistic (influencer-style)]&lt;br /&gt;
** [https://x.com/SunoMusic/status/1900942410584043579 I Feel Cultured] (music video with surrealist vibes)&lt;br /&gt;
** [https://rodeo.club/post/0x30b45c56d62751D763D3B8bFe4D18c4BB65EDF2c/209 journey of utmost importance]&lt;br /&gt;
** [https://x.com/aiordieshow/status/1901930851127984291 Karen: Unleashed]&lt;br /&gt;
** [https://x.com/minchoi/status/1901783767364092232 Yarn Cat]&lt;br /&gt;
** [https://x.com/andyorsow/status/1901619535180091509 Ned&amp;#039;s Wet Deli] (Runway)&lt;br /&gt;
** [https://www.youtube.com/watch?v=KVoiooE8C0c BOOTS], a.k.a. [https://x.com/RuairiRobinson/status/1902027217137484117 &amp;quot;Our enemies are cartoon monsters&amp;quot;] (music video based on poem by Rudyard Kipling; Veo2)&lt;br /&gt;
** Flying in a dream: [https://x.com/minchoi/status/1902197944826183864 1], [https://x.com/venturetwins/status/1901796679063626060 2]&lt;br /&gt;
** [https://x.com/jasonzada/status/1902129567659389443 Commercial for Mercedes-Benz and FYI Radio]&lt;br /&gt;
** [https://x.com/maxescu/status/1903108496666542562 Selfie video] (Luma)&lt;br /&gt;
** Podcasts: [https://www.reddit.com/r/singularity/comments/1jintit/rottenly_roasted_now_full_script_is_also_not/ Rottenly Roasted] and [https://www.reddit.com/r/aivideo/comments/1jerh56/worst_date_ever/ Worst Date Ever] [https://x.com/OriZilbershtein/status/1903503438744318002 (Imagen 3, Hedra, Elevenlabs, Topaz)]&lt;br /&gt;
** [https://x.com/DexploreArts/status/1903822122150986000 Ambience] (Midjourney, Luma)&lt;br /&gt;
** [https://x.com/TheoMediaAI/status/1904207679511572845 The Bridge] (2 minute short; Veo2)&lt;br /&gt;
** [https://x.com/peteromallet/status/1904268944992829462 Pulp Fiction] (Wan video editing)&lt;br /&gt;
** [https://x.com/madpencil_/status/1906765750624493650 Camera Controls] (Luma Ray2)&lt;br /&gt;
* March 2025: [https://www.hedra.com/ Hedra] [https://x.com/hedra_labs/status/1897699010632466469 Character 3]&lt;br /&gt;
* March 2025: [https://huggingface.co/hpcai-tech/Open-Sora-v2 Open Sora v2] ([https://github.com/hpcaitech/Open-Sora code])&lt;br /&gt;
* March 2025: Amazon Prime debuts [https://en.wikipedia.org/wiki/House_of_David_(TV_series) House of David], with special effects created by [https://www.thewonderproject.com/ Wonder Project] using a [https://x.com/PJaccetturo/status/1903126616831676792 combination of traditional and AI methods] (reportedly including Midjourney and Runway)&lt;br /&gt;
* March 2025: Examples:&lt;br /&gt;
** [https://x.com/PJaccetturo/status/1905151190872309907 What if Studio Ghibli directed Lord of the Rings?] (OpenAI GPT-4o in-context image generation, Kling)&lt;br /&gt;
** [https://x.com/ROHKI/status/1906039022662963269 RŌHKI]&lt;br /&gt;
** [https://x.com/iaveras/status/1906362437487534296 Why]&lt;br /&gt;
** [https://x.com/BrianRoemmele/status/1906476721236570508 Commercial for Puma] (research/test)&lt;br /&gt;
** [https://x.com/Salmaaboukarr/status/1906776503343325469 Commercial for KFC] (concept ad)&lt;br /&gt;
* March 2025: Runway ML [https://runwayml.com/research/introducing-runway-gen-4 Gen-4]&lt;br /&gt;
** [https://www.youtube.com/watch?v=c8IBmK7GZP8 The Lonely Little Flame]&lt;br /&gt;
** [https://www.youtube.com/watch?v=Z0P6qjMUl34&amp;amp;t=1s The Herd]&lt;br /&gt;
** [https://www.youtube.com/watch?v=9HzdNhOe09I The Retrieval]&lt;br /&gt;
** [https://www.youtube.com/watch?v=xEhgxhrAjE4 NYC is a Zoo]&lt;br /&gt;
** [https://www.youtube.com/watch?v=ENGKp5wn344 Scimmia Vede] (music video)&lt;br /&gt;
** More examples: [https://x.com/techhalla/status/1906807994009993473 various], [https://x.com/c_valenzuelab/status/1907958530369372541 art direction], [https://x.com/c_valenzuelab/status/1908146364741029998 mannequins], [https://x.com/c_valenzuelab/status/1907921566643732612 taxi], [https://x.com/c_valenzuelab/status/1907432109695717798 small things], [https://x.com/c_valenzuelab/status/1907563448902496362 long shot (1m)]&lt;br /&gt;
&lt;br /&gt;
====April 2025====&lt;br /&gt;
* April 2025: Examples:&lt;br /&gt;
** [https://x.com/AzeAlter/status/1906974768705990794 Age of Beyond]&lt;br /&gt;
** [https://x.com/techhalla/status/1907790675057242319 Commercial for Coca-Cola] (Higgsfield)&lt;br /&gt;
** [https://www.reddit.com/r/StableDiffusion/comments/1jr6j11/comment/mle9bq5/?context=3 Anime scene (3m)] (Wan 2.1 with LoRA)&lt;br /&gt;
** [https://x.com/pika_labs/status/1908263310912610401 Taxes then Death] (Pika multikeyframe)&lt;br /&gt;
* April 2025: [https://www.krea.ai/ Krea] [https://x.com/krea_ai/status/1907829389452021853 Video Re-Style]&lt;br /&gt;
* April 2025: ByteDance [https://grisoon.github.io/DreamActor-M1/ DreamActor-M1] performance transfer&lt;br /&gt;
* April 2025: Examples:&lt;br /&gt;
** [https://x.com/Diesol/status/1908535493673050403 Mercs] (Midjourney v7, Ray2)&lt;br /&gt;
** [https://x.com/minchoi/status/1909078846126649440 Cat at theme park]&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1909630883218207036 Timelapse history] (Runway Gen4)&lt;br /&gt;
** [https://x.com/EHuanglu/status/1909660808973533225 Examples for use in advertising]&lt;br /&gt;
** [https://x.com/arohaAIX/status/1910688361221599361 Sci-fi scapes]&lt;br /&gt;
** [https://x.com/PJaccetturo/status/1910750148055146708 Avα]&lt;br /&gt;
** [https://x.com/imagineFERA/status/1910601934207152576 The Bureau]&lt;br /&gt;
** [https://x.com/jasonzada/status/1911812014059733041 Beaver and Sock (3m)]&lt;br /&gt;
** [https://x.com/Delachica_/status/1911842237622735052 Organic Waste (5m)] (Runway)&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1912260798270882104 Fly] (Runway Gen4)&lt;br /&gt;
* April 2025: Alibaba [https://arxiv.org/abs/2504.04842 FantasyTalking] lipsync ([https://arxiv.org/abs/2504.04842 paper], [https://x.com/EHuanglu/status/1910341110322577442 examples])&lt;br /&gt;
* April 2025: Tencent Hunyuan [https://arxiv.org/abs/2411.16331 Sonic] image animation/lipsync to audio ([https://x.com/ai_for_success/status/1911719866958286864 examples])&lt;br /&gt;
* April 2025: ByteDance [https://huggingface.co/papers/2504.08685 Seaweed-7B] ([https://arxiv.org/abs/2504.08685 preprint], [https://www.youtube.com/watch?v=OaPI6K2y3rI examples])&lt;br /&gt;
* April 2025: [https://app.klingai.com/global/release-notes Kling 2.0] ([https://www.youtube.com/watch?v=Yqvh3M12T_M video])&lt;br /&gt;
* April 2025: [https://www.skyreels.ai/home Skywork] [https://github.com/SkyworkAI/SkyReels-V2 SkyReels V2] (open-source, unlimited extension; [https://x.com/AngryTomtweets/status/1914270477482443142 examples])&lt;br /&gt;
* April 2025: [https://sand.ai/ Sand AI] [https://huggingface.co/sand-ai/MAGI-1 Magi-1] (open-source, unlimited extension; [https://x.com/AngryTomtweets/status/1914318743578296506 examples], [https://x.com/dreamingtulpa/status/1916035289300275372 more examples])&lt;br /&gt;
* April 2025: Examples:&lt;br /&gt;
** [https://x.com/maxescu/status/1912100029549994016 Mars 2035 (3m)] (Kling 2.0)&lt;br /&gt;
** [https://x.com/ai_for_success/status/1912466999147450600 Kingdom (dragon battle, 3m)]&lt;br /&gt;
** [https://x.com/imagineFERA/status/1913156296657756278 Reflection (3m)] (Gen4)&lt;br /&gt;
** [https://x.com/Wytsekoetse/status/1913547157493162035 Pizza Galaxy (1m)] (MJ and Gen4)&lt;br /&gt;
** [https://www.youtube.com/watch?v=rseqmSGH7xk Snoop Dogg music video: Last Dance with Mary Jane] (blend of traditional and AI effects)&lt;br /&gt;
** [https://x.com/dreamingtulpa/status/1915104310448501129 Realistic human motion]&lt;br /&gt;
** [https://x.com/KarolineGeorges/status/1915113151546396893 Inception loop] (Gen4)&lt;br /&gt;
** [https://x.com/rayisdoingfilm/status/1916468807435952330 Tuesday (1m)] (Gen4)&lt;br /&gt;
** [https://www.youtube.com/watch?v=XWdwF1q3kDw Deus in Machina Automata (4m)] (Gen4)&lt;br /&gt;
** [https://x.com/machina9000/status/1915090908850049223 Outsiders (3m music video)]&lt;br /&gt;
&lt;br /&gt;
====May 2025====&lt;br /&gt;
* May 2025: [https://huggingface.co/Lightricks/LTX-Video LTX-Video 13B] ([https://github.com/Lightricks/LTX-Video code], [https://x.com/maxescu/status/1919801813987164527 examples], [https://x.com/cubiq/status/1919748210567815551 more examples])&lt;br /&gt;
* May 2025: HeyGen Avatar IV (examples: [https://x.com/StevieMac03/status/1919910677860216869 sci-fi], [https://x.com/KarolineGeorges/status/1919801983143211222 Come Closer], [https://x.com/maxescu/status/1920410329454100973 singing], [https://x.com/minchoi/status/1920853859171234165 various])&lt;br /&gt;
* May 2025: Tencent [https://hunyuancustom.github.io/ HunyuanCustom]&lt;br /&gt;
* May 2025: Examples:&lt;br /&gt;
** [https://x.com/lifeofc/status/1920331476157280413 Iris (1.5m)] (Midjourney, Luma, Runway)&lt;br /&gt;
** [https://runwayml.com/customers/the-making-of-mars-and-siv Mars and Siv: &amp;quot;No Vacancy&amp;quot; (episode 1, 6m)] (Runway)&lt;br /&gt;
** [https://x.com/cfryant/status/1921317318744760817 Go to the East Wing] (dreamlike, Luma)&lt;br /&gt;
** [https://x.com/DeryaTR_/status/1921015340827304389 Yu Lanter showreel] (Higgsfield)&lt;br /&gt;
** [https://x.com/freeeebird2300/status/1921789387614134652 Cyberpunk anime] (Luma)&lt;br /&gt;
** [https://x.com/LittleTinRobot/status/1921692735930589246 Alien animals] (Runway)&lt;br /&gt;
** [https://x.com/minchoi/status/1922500563792486878 America&amp;#039;s Funniest AI Home Videos (3m)]&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1924204409833103365 Editing POV shots from AR glasses] (Runway)&lt;br /&gt;
* May 2025: [https://runwayml.com/gen48 Gen:48] Fourth Edition winners:&lt;br /&gt;
** [https://www.youtube.com/watch?v=NphCYRXjqTI&amp;amp;t=174s Home] (3m)&lt;br /&gt;
** [https://www.youtube.com/watch?v=L2DQwCp_DCw The King&amp;#039;s Secret] (2m)&lt;br /&gt;
* May 2025: [https://viggle.ai/home Viggle] Live [https://x.com/ViggleAI/status/1926324953038627214 enables] real-time avatar control&lt;br /&gt;
* May 2025: Google [https://blog.google/technology/ai/generative-media-models-io-2025/ Veo 3] (examples: [https://x.com/babaeizadeh/status/1924942128851124284 conversation], [https://x.com/mattshumer_/status/1925039973310308424 cooking], [https://x.com/jerrod_lew/status/1924934440486371589 singing], [https://x.com/MartinNebelong/status/1924926779677905014 simple story], [https://x.com/Diesol/status/1925114473544913004 cinematic action sequence], [https://x.com/laszlogaal_/status/1925094336200573225 car show interviews], [https://x.com/arikuschnir/status/1924953349943697763 We Can Talk], [https://x.com/venturetwins/status/1925021235530105298 podcat], [https://x.com/maxescu/status/1925079990061957423 various], [https://x.com/jerrod_lew/status/1927092379892265139 camera moves])&lt;br /&gt;
* May 2025: Examples:&lt;br /&gt;
** [https://x.com/javilopen/status/1925495026903380358 Will Smith Eating Spaghetti] (cf. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025])&lt;br /&gt;
** [https://x.com/MetaPuppet/status/1926659557914268155 Bob from Marketing] (Veo 3)&lt;br /&gt;
** [https://x.com/dustinhollywood/status/1926733069475565622 He is King (16m)]&lt;br /&gt;
** [https://x.com/HashemGhaili/status/1925616536791760987 Prompt Theory], [https://x.com/HashemGhaili/status/1925332319604257203 part 2], [https://x.com/HashemGhaili/status/1927467022213869975 Afterlife (3m)] (Veo3)&lt;br /&gt;
** [https://x.com/JoannaStern/status/1927856754873835747 My Robot and Me (3m)] (Veo, Runway)&lt;br /&gt;
** [https://x.com/rohanpaul_ai/status/1928152398930817238 The Internet&amp;#039;s Over] (Veo3)&lt;br /&gt;
** [https://www.reddit.com/r/aivideo/comments/1l0rl7d/before_colours_fade/ Before Colours Fade (2m)] (Midjourney, Kling)&lt;br /&gt;
&lt;br /&gt;
====June 2025====&lt;br /&gt;
* June 2025: Examples:&lt;br /&gt;
** [https://x.com/amasad/status/1930505292904837132 Bigfoot ASMR]&lt;br /&gt;
** [https://x.com/minchoi/status/1930670583605514333 Talking] (HeyGen Avatar IV upgrade)&lt;br /&gt;
** [https://x.com/ROHKI/status/1931081752992477285 Where are all the aliens? (2m)]&lt;br /&gt;
** [https://x.com/fofrAI/status/1930999540770893874 Natural talking]&lt;br /&gt;
** [https://x.com/ammaar/status/1931672722418851904 Elemental Showdown - Mortal Kombat (3m)]&lt;br /&gt;
** [https://x.com/maxjoseph/status/1932104616021565476 It Starts at the End (music video, 4m)]&lt;br /&gt;
** [https://x.com/deedydas/status/1932105266654581116 Sci-fi trailer (2m)]&lt;br /&gt;
** [https://x.com/DrMachakil/status/1931816470901575924 The Prompt Floor (2m)]&lt;br /&gt;
** [https://x.com/DrMachakil/status/1853960062546366856 NALVORA (2.7m)] - [https://x.com/DrMachakil/status/1932904599004066200 Best Trailer, Metamorph AI Film Awards]&lt;br /&gt;
** [https://x.com/Kalshi/status/1932891608388681791 Commercial for Kalshi (30s)] - [https://x.com/PJaccetturo/status/1932893260399456513 to air during NBA finals] (Veo)&lt;br /&gt;
** [https://x.com/ROHKI/status/1933594430113788227 Your Brain is Broken on Purpose (2m)]&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1934312626021949687 Runway Gen-4 Reference examples]&lt;br /&gt;
** [https://x.com/JesusPlazaX/status/1934253813696786661 Paper airplane]&lt;br /&gt;
** [https://x.com/minchoi/status/1934032730947526872 Veo3 examples]&lt;br /&gt;
** [https://x.com/NomadsVagabonds/status/1935329331410075734 Reset 3 (1m, surreal)]&lt;br /&gt;
** [https://x.com/HashemGhaili/status/1935722105322323968 It Has No Soul (1m, Veo3)]&lt;br /&gt;
* June 2025: [https://seedance.net/seedance Seedance 1.0] ([https://arxiv.org/abs/2506.09113 preprint])&lt;br /&gt;
* June 2025: [https://hailuoai.video/ Hailuo AI] (MiniMax) Hailuo 02 ([https://x.com/venturetwins/status/1934236631336403344 &amp;quot;Kangaroo&amp;quot; during testing]; examples: [https://x.com/lepadphone/status/1935078910934626429 various], [https://x.com/alexgnewmedia/status/1935018186954719365 various], [https://x.com/FussyPastor/status/1935065068456263883 tsunami], [https://x.com/thedorbrothers/status/1935098802744213935 fight scene], [https://x.com/umesh_ai/status/1935028257708966231 fox running], [https://x.com/BrentLynch/status/1934979825636446268 blogger], [https://x.com/HalimAlrasihi/status/1935297126759538735 transitions], [https://x.com/MKMXLA/status/1938318951664280045 skateboarding])&lt;br /&gt;
* June 2025: Midjourney video ([https://x.com/minchoi/status/1934373051464057062 early examples], [https://x.com/ciguleva/status/1935386452197785892 various], [https://x.com/juliewdesign_/status/1935395999175876696 various], [https://x.com/emollick/status/1935504703023899096 Ethan Mollick], [https://x.com/PJaccetturo/status/1935383312392151528 highly rated], [https://x.com/maxescu/status/1935674561821126847 complex environments], [https://x.com/CoffeeVectors/status/1935863623076675875 manga])&lt;br /&gt;
* June 2025: Examples:&lt;br /&gt;
** [https://x.com/StevieMac03/status/1935768436556378170 The Battle of Glenvael - Orcs vs Humans] (Hailuo)&lt;br /&gt;
** [https://x.com/HashemGhaili/status/1935036744568824208 The Sentence (9m, Veo3)]&lt;br /&gt;
** [https://x.com/elder_plinius/status/1936145834585862225 Will Smith Eating Spaghetti] (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025], [https://x.com/javilopen/status/1925495026903380358 May 2025])&lt;br /&gt;
** [https://x.com/venturetwins/status/1937232461576175809 Gymnastics] (Hailuo 02)&lt;br /&gt;
** [https://x.com/KeiranJHarris/status/1935429439476887594 Fable about Transformative AI] (Veo3)&lt;br /&gt;
** [https://x.com/thedorbrothers/status/1937926400507580726 Vorex (2m trailer)]&lt;br /&gt;
** [https://x.com/OnerBiberkoku/status/1938972810321281394 Doğrucu (3m music video, Veo3)]&lt;br /&gt;
* June 2025: [https://higgsfield.ai/soul Higgsfield Soul] Video Effects ([https://x.com/higgsfield_ai/status/1937931727084917097 examples], [https://x.com/HashemGhaili/status/1938278903765995611 realism])&lt;br /&gt;
* June 2025: Alibaba [https://omni-avatar.github.io/ OmniAvatar] ([https://arxiv.org/abs/2506.18866 paper], [https://github.com/Omni-Avatar/OmniAvatar code], [https://huggingface.co/OmniAvatar/OmniAvatar-14B model], [https://x.com/AngryTomtweets/status/1939850674776547359 examples])&lt;br /&gt;
&lt;br /&gt;
====July 2025====&lt;br /&gt;
* July 2025: Examples:&lt;br /&gt;
** [https://x.com/Kavanthekid/status/1940452444850589999 Untold - The Immortal Blades Saga] (2m trailer)&lt;br /&gt;
** [https://x.com/minchoi/status/1941234456461029584 Unofficial commercial for Liquid Death (1m)]&lt;br /&gt;
** [https://x.com/brain_racked/status/1942594951310893425 A parade of the chosen theocracy on Callisto]&lt;br /&gt;
** [https://x.com/Popeyes/status/1943316484404433182 Popeyes commercial - diss track (1m)]&lt;br /&gt;
*** [https://x.com/gabemichael_ai/status/1944070622155616668 (Unofficial) Wendy&amp;#039;s response - diss track (2m)]&lt;br /&gt;
*** [https://x.com/ai_massive/status/1947689537641357618 (Unofficial) In-N-Out rap battle (3m)]&lt;br /&gt;
** [https://x.com/Kalshi/status/1943339616716599548 Kalshi commercial]&lt;br /&gt;
** Jonah (25m TV show, [https://x.com/PJaccetturo/status/1946101701548880029 making of], [https://kingstonestudios.uscreen.io/programs/jonah purchase here])&lt;br /&gt;
** [https://x.com/Totemko/status/1946243585021452335 Unofficial commercial for Mercedes (17s)]&lt;br /&gt;
** [https://x.com/CoffeeVectors/status/1946016960916889632 Skateboarding music video (1m)]&lt;br /&gt;
* July 2025: Runway ML [https://help.runwayml.com/hc/en-us/articles/42311337895827-Creating-with-Act-Two Act-Two] (video-to-video performance transfer)&lt;br /&gt;
* July 2025: Examples:&lt;br /&gt;
** Neural Viz [https://www.youtube.com/watch?v=juDDHvHroQ8 The Cop Files: Part VI (8m)]&lt;br /&gt;
** [https://x.com/Kavanthekid/status/1947696716981145971 Perfect Dark - Concept Trailer (1.5m)]&lt;br /&gt;
** [https://x.com/StelfieTT/status/1948753090858885131 Exodus (2m trailer)]&lt;br /&gt;
** [https://x.com/Jett_Collective/status/1949140450553540841 A Walk Together - Life and love in motion (1m, Midjourney Video)]&lt;br /&gt;
* July 2025: Netflix sci-fi show [https://en.wikipedia.org/wiki/The_Eternaut_(TV_series) The Eternaut] [https://x.com/omooretweets/status/1946290797399400662 used genAI] for a particular scene (building collapse)&lt;br /&gt;
* July 2025: Google Veo [https://x.com/GoogleLabs/status/1948477692715700718 emergent annotation direction] ([https://x.com/venturetwins/status/1948771505783144641 example], [https://x.com/bilawalsidhu/status/1948844167603310660 example], [https://x.com/jboogx_creative/status/1949230927504371765 example], [https://x.com/Ror_Fly/status/1949606017739747625 example])&lt;br /&gt;
* July 2025: Runway [https://runwayml.com/research/introducing-runway-aleph Aleph] contextual editing&lt;br /&gt;
* July 2025: Wan 2.2 (open source, [https://x.com/Alibaba_Wan/status/1949804551655276989 examples])&lt;br /&gt;
&lt;br /&gt;
====August 2025====&lt;br /&gt;
* August 2025: Pika [https://x.com/pika_labs/status/1954935844936024476 audio-driven performance] ([https://x.com/minchoi/status/1954989794129514937 examples], [https://x.com/pika_labs/status/1955007656302924192 examples])&lt;br /&gt;
* August 2025: Examples:&lt;br /&gt;
** [https://www.youtube.com/watch?v=gePD1Hf1qPc Eve and Adam] (8m, [https://x.com/MetaPuppet/status/1954254544935719259 multiple tools])&lt;br /&gt;
** [https://x.com/runwayml/status/1955615613583519917 Redesign a space] (Runway Aleph)&lt;br /&gt;
** [https://x.com/theGioM/status/1955656398248763428 Detroit Pretend Work Park (1m)]&lt;br /&gt;
** [https://x.com/pzf_ai/status/1940816374211006600 The Weight of Light] (3m music video, Midjourney &amp;amp; Suno)&lt;br /&gt;
** [https://x.com/EHuanglu/status/1956788759778967710 Commercial for Pepsi]&lt;br /&gt;
** [https://x.com/StelfieTT/status/1956633450326200426 Emotion]&lt;br /&gt;
** [https://x.com/dustinhollywood/status/1957940749862875383 TZIGANE]&lt;br /&gt;
** [https://x.com/0xFramer/status/1960720090921623636 Anime chase sequence] (Nano Banana and Seedance 1.0)&lt;br /&gt;
* August 2025: ByteDance [http://www.waver.video/ Waver 1.0]&lt;br /&gt;
* August 2025: [https://huggingface.co/Wan-AI/Wan2.2-S2V-14B Wan2.2-S2V 14B]&lt;br /&gt;
&lt;br /&gt;
====September 2025====&lt;br /&gt;
* September 2025: [https://www.wsj.com/tech/ai/openai-backs-ai-made-animated-feature-film-389f70b0 OpenAI Backs AI-Made Animated Feature Film: Film, called ‘Critterz,’ aims to debut at Cannes Film Festival and will leverage startup’s AI tools and resources.]&lt;br /&gt;
* September 2025: Examples:&lt;br /&gt;
** [https://x.com/kentskooking/status/1964606423037542459 A loop to wake up to (30s)]&lt;br /&gt;
** [https://x.com/venturetwins/status/1966570512991350907 time lapse]&lt;br /&gt;
** [https://x.com/NeuralViz/status/1967391198487994652 The Adventures of Reemo Green] (11m, Neural Viz)&lt;br /&gt;
** [https://x.com/kellyeld/status/1967620786166079545 Surreal DJs music video (2m)]&lt;br /&gt;
** [https://x.com/dustinhollywood/status/1968724784440558044 Glass City] (Hailuo)&lt;br /&gt;
** [https://x.com/TheoMediaAI/status/1968646951227777529 Alarm] (1m, multiple tools including world synthesis for consistent environments)&lt;br /&gt;
* September 2025: [https://lumalabs.ai/ray Luma] [https://x.com/LumaLabsAI/status/1968684330034606372 Ray3] ([https://x.com/cfryant/status/1968692370725077251 example])&lt;br /&gt;
* September 2025: Examples:&lt;br /&gt;
** [https://x.com/mrjonfinger/status/1968687352382910469 Stop motion interpolation] (Luma Ray3)&lt;br /&gt;
** [https://x.com/heydin_ai/status/1969514789169959128 Skyland] (1.5m, various tools)&lt;br /&gt;
** [https://x.com/iamluokai/status/1970185972076925427 Dancing] (Wan 2.2)&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1970497214108815584 Under Armour commercial] (Runway Aleph)&lt;br /&gt;
** [https://x.com/FilmsBySav/status/1971247214795358706 OG PRIME] (10m, Kling)&lt;br /&gt;
** [https://www.youtube.com/watch?v=JGLoTjxd-Ss PLANET] (37m)&lt;br /&gt;
* September 2025: [https://x.com/Kling_ai/status/1970439808901362155 Kling AI 2.5 Turbo] (examples: [https://x.com/OrctonAI/status/1970472214794220008 cyberpunk], [https://x.com/ImagineArt_X/status/1970586138655236565 human motion], [https://x.com/fAIkout/status/1970505756853334324 motion and emotion], [https://x.com/fAIkout/status/1970495039248965636 painting], [https://x.com/venturetwins/status/1970563820478439546 gymnastics], [https://x.com/Art_For_Joy/status/1970249516033970434 breakdancing], [https://x.com/HaydenLeeWrites/status/1970523610734567819 combat], [https://x.com/umesh_ai/status/1970497680536150454 cinematic], [https://x.com/LillyLiCT/status/1970580585073819752 horror camerawork], [https://x.com/StevieMac03/status/1970559778804908331 extended sequence])&lt;br /&gt;
* September 2025: OpenAI [https://openai.com/index/sora-2/ Sora 2] ([https://x.com/minchoi/status/1973949620318580970 examples])&lt;br /&gt;
&lt;br /&gt;
====October 2025====&lt;br /&gt;
* October 2025: Examples:&lt;br /&gt;
** [https://x.com/minchoi/status/1976042197154963702 Will Smith Eating Spaghetti] (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025], [https://x.com/javilopen/status/1925495026903380358 May 2025], [https://x.com/elder_plinius/status/1936145834585862225 June 2025])&lt;br /&gt;
** [https://www.youtube.com/watch?v=JhH3uxcdM1M Frostbite] (3m, Sora 2)&lt;br /&gt;
** [https://x.com/Jukanlosreve/status/1977764418709758106 (Fake) &amp;quot;Behind the scenes&amp;quot; for a Chainsaw Man live action] ([https://x.com/PJaccetturo/status/1972705821072261402 others])&lt;br /&gt;
* October 2025: Google [https://blog.google/technology/ai/veo-updates-flow/ Veo 3.1]&lt;br /&gt;
* October 2025: Examples:&lt;br /&gt;
** [https://x.com/aisearchio/status/1978465562821898461 Will Smith Eating Spaghetti], Veo 3.1 (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025], [https://x.com/javilopen/status/1925495026903380358 May 2025], [https://x.com/elder_plinius/status/1936145834585862225 June 2025], [https://x.com/minchoi/status/1976042197154963702 October 2025])&lt;br /&gt;
** [https://x.com/Diesol/status/1978755688261128227 War footage] (Veo 3.1)&lt;br /&gt;
** [https://www.meta.ai/@dustin_hollywood/post/bG3BHB21W0l/yukon/ Yukon] (music video, [https://x.com/dustinhollywood/status/1982260655957700746 Dustin Hollywood])&lt;br /&gt;
** [https://x.com/Diesol/status/1980922041131028515 Bloom] (2m, Veo 3.1)&lt;br /&gt;
** [https://x.com/xmuse_/status/1982026008803905639 Auction] (1m)&lt;br /&gt;
** [https://x.com/kellyeld/status/1982425147496882287 Dancing] (music video; Midjourney, Suno, Veo3)&lt;br /&gt;
** [https://x.com/JesusPlazaX/status/1982393609069412433 Anime example] (Midjourney, Grok Imagine)&lt;br /&gt;
** [https://x.com/EccentrismArt/status/1982830100266783039 King Arthur] (1m)&lt;br /&gt;
** [https://x.com/venturetwins/status/1983024227352789162 Transitions] (1m music video)&lt;br /&gt;
** [https://x.com/eastflatsfilm/status/1984116704704971076 Unofficial commercial for Nike] (2m, Midjourney, Hailuo)&lt;br /&gt;
** [https://x.com/PJaccetturo/status/1984639281848336592 Loneliness/Halloween] ([https://www.linkedin.com/posts/simon-meyer-976339160_this-could-be-the-scariest-halloween-film-activity-7389892778144735232-6CYY?utm_source=share&amp;amp;utm_medium=member_desktop&amp;amp;rcm=ACoAAADeoqYBzX8N9-j_hRQvl1e7OUlOgFptNF0 1.5m])&lt;br /&gt;
** [https://www.youtube.com/watch?v=43h61QAXjpY Wave] (2m music video, [https://x.com/MIZNOM Masaki Mizuno])&lt;br /&gt;
* October 2025: [https://x.com/Hailuo_AI/status/1983016390878708131 Hailuo 2.3]&lt;br /&gt;
&lt;br /&gt;
====November 2025====&lt;br /&gt;
* November 2025: Examples:&lt;br /&gt;
** [https://x.com/subverum/status/1985069550250107033 Valley of Shadow] (6m)&lt;br /&gt;
** [https://x.com/DiscussingFilm/status/1985470088074375344 Coca-Cola ad] (c.f. [https://x.com/techhalla/status/1857462526859935813 2024 ad])&lt;br /&gt;
** [https://x.com/venturetwins/status/1985755546222542903 France 2026 Olympics ad] (blend of genAI and traditional methods, [https://x.com/venturetwins/status/1985753512362590439 behind the scenes])&lt;br /&gt;
** [https://x.com/NeuralViz/status/1986611025366687754 Minnesota Nice] (3m, [https://x.com/NeuralViz Neural Viz])&lt;br /&gt;
** [https://x.com/machina9000/status/1986563727873740934 Brutalis] (7m)&lt;br /&gt;
** [https://x.com/tastypxls/status/1987312755485876502?s=20 Living The Dream - Rynn] (music video, 1m)&lt;br /&gt;
** [https://x.com/MrDavids1/status/1988366387111170339?s=20 Environment as Character]&lt;br /&gt;
** [https://x.com/justalexoki/status/1988915573707661637?s=20 Cat playing instruments at midnight]&lt;br /&gt;
** [https://x.com/LumaLabsAI/status/1989013731267998172?s=20 Overclock] (30s, Luma)&lt;br /&gt;
** [https://x.com/venturetwins/status/1980685301577326994?s=20 Music video] (30s, Wan Animate)&lt;br /&gt;
** [https://x.com/venturetwins/status/1990227418553209259?s=20 Promotional material for Pudong Art Museum - Louvre exhibition in Shanghai] (1m)&lt;br /&gt;
** [https://x.com/Kyrannio/status/1990324648488186358?s=20 Loop 87 A Temporal Heist] (12m, claim that video was generated fully autonomously using AI agent NoSpoon)&lt;br /&gt;
** [https://x.com/AzeAlter/status/1906974768705990794?s=20 Age of Beyond] (3m)&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1991245088446386495?s=20 Ausencia] (5m)&lt;br /&gt;
** [https://x.com/AngryTomtweets/status/1993047608617517246?s=20 live paintings] ([https://www.youtube.com/channel/UCw8kc0wDm5Bh6g9iZzEWfOg bandyquantguy] on YouTube)&lt;br /&gt;
** [https://x.com/BrianRoemmele/status/1994625579073900804?s=20 Michelle, on a server in Iowa] (1m)&lt;br /&gt;
* November 2025: [https://odyssey.ml/ Odyssey] - [https://x.com/odysseyml/status/1994873514579697830?s=20 Odyssey-2]&lt;br /&gt;
&lt;br /&gt;
====December 2025====&lt;br /&gt;
* December 2025: Runway [https://runwayml.com/research/introducing-runway-gen-4.5 Gen 4.5]&lt;br /&gt;
* December 2025: [https://app.klingai.com/global/all-tools Kling] [https://app.klingai.com/global/omni/new O1] ([https://x.com/minchoi/status/1995523379957559609?s=20 examples], [https://x.com/TheoMediaAI/status/1995517613414518987?s=20 other examples]) and Kling 2.6.&lt;br /&gt;
* December 2025: [https://app.pixverse.ai/onboard PixVerse v5.5]&lt;br /&gt;
* December 2025: Examples:&lt;br /&gt;
** [https://x.com/EHuanglu/status/1996649596119068687?s=20 Will Smith Eating Spaghetti], Kling 2.6 (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025], [https://x.com/javilopen/status/1925495026903380358 May 2025], [https://x.com/elder_plinius/status/1936145834585862225 June 2025], [https://x.com/minchoi/status/1976042197154963702 October 2025], [https://x.com/aisearchio/status/1978465562821898461 October 2025])&lt;br /&gt;
** [https://x.com/venturetwins/status/1997898095670296615?s=20 Dreamlike POV]&lt;br /&gt;
** [https://x.com/chatgpt21/status/1998253809307455555?s=20 McDonald&amp;#039;s commercial]&lt;br /&gt;
** [https://x.com/EHuanglu/status/1998039554402750545?s=20 Skittles commercial] (Higgsfield)&lt;br /&gt;
** [https://x.com/Diesol/status/1997147919603077335?s=20 The Tenant] (2m, Kling 2.6)&lt;br /&gt;
** [https://x.com/PsyopAnime/status/1999242965659906526?s=20 Maximum Carnage] (3m)&lt;br /&gt;
** [https://x.com/JeffSynthesized/status/1998786836924395875?s=20 Blurred Horizon: Episode 1] (24m)&lt;br /&gt;
** [https://x.com/Artedeingenio/status/2001667487784460301?s=20 Anime Action] (2m)&lt;br /&gt;
** [https://x.com/bearlyai/status/2005055231617605748?s=20 Dollar Shave Club commercial] (1m)&lt;br /&gt;
** [https://x.com/EHuanglu/status/2004020543084024295?s=20 Xmas Cameos] (1.5m)&lt;br /&gt;
** [https://x.com/DiDi_OKK/status/1955653520407019976?s=20 Green Screen] (2m, [https://x.com/DiDi_OKK DiDi_OK])&lt;br /&gt;
** [https://x.com/DiDi_OKK/status/1998227601341702639?s=20 Arrow] (7m, [https://x.com/DiDi_OKK DiDi_OK])&lt;br /&gt;
** [https://x.com/bluehorizon_ai/status/2004045348579561503?s=20 Live Action One Punch Man | Saitama vs Genos] (2m, [https://x.com/bluehorizon_ai Blue Horizon])&lt;br /&gt;
** [https://x.com/keshiAIart/status/2005254907780358201?s=20 Anime Train] (6s)&lt;br /&gt;
** [https://x.com/venturetwins/status/2006051632837189683?s=20 Michael Catson] (13s)&lt;br /&gt;
* December 2025: [https://arxiv.org/abs/2512.13507 Seedance 1.5]&lt;br /&gt;
&lt;br /&gt;
===2026===&lt;br /&gt;
====January 2026====&lt;br /&gt;
* January 2026: Examples:&lt;br /&gt;
** [https://x.com/Itspedrito/status/2007636967048228968?s=20 Somebody That I Used to Know] (1m)&lt;br /&gt;
** [https://x.com/hujimari/status/2008054519704461407?s=20 Cat being disruptive at night], [https://x.com/klara_sjo/status/2007864014521720963?s=20 another], [https://x.com/alphafox/status/2009732284375830687?s=20 another] (c.f. [https://x.com/justalexoki/status/1988915573707661637?s=20 Cat playing instruments at midnight])&lt;br /&gt;
** [https://x.com/Uncanny_Harry/status/2008881579095961934?s=20 Character test] (30s, Kling 2.6 Motion Control, [https://x.com/Uncanny_Harry Uncanny Harry AI])&lt;br /&gt;
** [https://www.youtube.com/watch?v=SGJC4Hnz3m0&amp;amp;t=2s STAR WARS: Beggar’s Canyon | A Luke Skywalker Fan Film (Between ESB &amp;amp; ROTJ)] (7m)&lt;br /&gt;
** [https://x.com/dustinhollywood/status/2009732705299104118?s=20 TZIGANE] (9m)&lt;br /&gt;
** [https://x.com/Framer_X/status/2011075884246061454?s=20 The Subway Spark] (Anime, 45s)&lt;br /&gt;
** [https://www.reddit.com/r/aivideo/comments/1qi8zuv/25_years_difference_makes_you_wonder_where_ai/ Will Smith Eating Spaghetti] (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025], [https://x.com/javilopen/status/1925495026903380358 May 2025], [https://x.com/elder_plinius/status/1936145834585862225 June 2025], [https://x.com/minchoi/status/1976042197154963702 October 2025], [https://x.com/aisearchio/status/1978465562821898461 October 2025], [https://x.com/EHuanglu/status/1996649596119068687?s=20 December 2025])&lt;br /&gt;
** [https://x.com/PJaccetturo/status/2013675665539596651?s=20 The AI Artist] (1.5m)&lt;br /&gt;
** [https://x.com/Artedeingenio/status/2013624842021417030?s=20 Sci-fi action anime] (2m)&lt;br /&gt;
** [https://x.com/verbalriotshow/status/2014752509240475872?s=20 Stone Hand] (fake trailer, 1m)&lt;br /&gt;
* January 2026: [https://x.com/nvidia/status/2008346949301235933?s=20 Runway Gen-4.5 on] [https://www.nvidia.com/en-us/data-center/technologies/rubin/?linkId=100000401190502 Nvidia Rubin] ([https://x.com/runwayml/status/2014406560445771804?s=20 examples])&lt;br /&gt;
* January 2026: [https://ltx.io/model/ltx-2 LTX-2] open source video model (20s, 4k, w/ audio; [https://x.com/venturetwins/status/2010878914273697956?s=20 examples])&lt;br /&gt;
* January 2026: Luma [https://lumalabs.ai/blog/news/ray3_14 Ray3.14] ([https://x.com/LumaLabsAI/status/2015822842575888844?s=20 examples])&lt;br /&gt;
* January 2026: Examples:&lt;br /&gt;
** [https://x.com/pressmanc/status/2015099516500758647?s=20 Runway Gen-4.5 tests] (3.5m)&lt;br /&gt;
** [https://x.com/EHuanglu/status/2015573517618528538?s=20 Longchamp / Horses in the city] (1m)&lt;br /&gt;
** [https://x.com/dustinhollywood/status/2008154825385521418?s=20 The Last Artist] (trailer, 2m)&lt;br /&gt;
** [https://x.com/taziku_co/status/2015739943101047111?s=20 Monet temporal structure] (3m)&lt;br /&gt;
** [https://x.com/runwayml/status/2016155967285543364?s=20 Grizzlies] (1.5m, Runway Gen-4.5)&lt;br /&gt;
** [https://www.youtube.com/@TIME/videos On This Day... 1776] ([https://www.youtube.com/watch?v=E4cLKIxt8W8 trailer])&lt;br /&gt;
*** [https://www.youtube.com/watch?v=sV52AUVGc6I January 1: The Flag] (3.5m)&lt;br /&gt;
*** [https://www.youtube.com/watch?v=3ZDnL_a0YfQ January 10: Common Sense] (4.5m)&lt;br /&gt;
*** [https://www.youtube.com/watch?v=J5b1TiyKTus January 26: The Guns of Ticonderoga] (4m)&lt;br /&gt;
&lt;br /&gt;
====February 2026====&lt;br /&gt;
* February 2026: [https://app.klingai.com/global/quickstart/klingai-video-3-omni-model-user-guide Kling 3.0]&lt;br /&gt;
* February 2026: [https://seedance2.ai/ Seedance 2.0] ([https://x.com/EHuanglu/status/2020131622675202512?s=20 example 1], [https://x.com/EHuanglu/status/2020492770872566053?s=20 2], [https://x.com/dynamicwangs/status/2020054894741451123?s=20 3], [https://x.com/patrickassale/status/2020180495900848470?s=20 4], [https://x.com/janekm/status/2020888750285332526?s=20 5], [https://x.com/Dork_sense/status/2020179955511116082?s=20 6], [https://x.com/EHuanglu/status/2020388244802740728?s=20 7], [https://x.com/zhao_dashuai/status/2020528048341217592?s=20 8], [https://x.com/AngryTomtweets/status/2020784886932738470?s=20 9], [https://x.com/javilopen/status/2020558352590287298?s=20 10], [https://x.com/linxiaobei888/status/2021399630672691710?s=20 11])&lt;br /&gt;
* February 2026: Examples:&lt;br /&gt;
** [https://x.com/PJaccetturo/status/2019072637192843463?s=20 Unofficial opening sequence for The Way of Kings by Brandon Sanderson] (1.5m, Kling 3)&lt;br /&gt;
** [https://x.com/dailycatsclips/status/2020117502915989680?s=20 Cat Dreams] (1.5m)&lt;br /&gt;
** [https://x.com/DotCSV/status/2021269435567218725?s=20 Will Smith Eating Spaghetti] (Seedance 2.0) (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025], [https://x.com/javilopen/status/1925495026903380358 May 2025], [https://x.com/elder_plinius/status/1936145834585862225 June 2025], [https://x.com/minchoi/status/1976042197154963702 October 2025], [https://x.com/aisearchio/status/1978465562821898461 October 2025], [https://x.com/EHuanglu/status/1996649596119068687?s=20 December 2025], [https://www.reddit.com/r/aivideo/comments/1qi8zuv/25_years_difference_makes_you_wonder_where_ai/ January 2026], [https://x.com/SpecialSitsNews/status/2020583709741883666?s=20 progression to 2026])&lt;br /&gt;
** [https://x.com/thedorbrothers/status/2023460644905742577?s=20 To Be Continued] (3m, [https://x.com/thedorbrothers The Dor Brothers])&lt;br /&gt;
** [https://x.com/ivanka_humeniuk/status/2023711181978919034?s=20 Crow - Game of Thrones] (1m)&lt;br /&gt;
** [https://x.com/billyrestey/status/2024193251763507528?s=20 Reboot] (2m)&lt;br /&gt;
** [https://x.com/kenw_2/status/2024625510534283508?s=20 Late for work] (1.5m, MJ NBP Seedance 2.0)&lt;br /&gt;
** [https://x.com/heydin_ai/status/2024616890338079181?s=20 AI Man] (4.5m, MJ NBP Seedance 2.0)&lt;br /&gt;
** [https://x.com/maxescu/status/2024882372836250033?s=20 But AI Will Never Be Able To Do This] (3m, Seedance 2.0)&lt;br /&gt;
** [https://x.com/DiDi_OKK/status/2018784243753599093?s=20 Sign] (8m)&lt;br /&gt;
** [https://x.com/LTXStudio/status/2025994426309640291?s=20 Commercial for Nexus] (1m)&lt;br /&gt;
** [https://x.com/maxescu/status/2026007558159278477?s=20 Showcase] (9m, [https://x.com/maxescu Alex Patrascu])&lt;br /&gt;
** [https://x.com/EHuanglu/status/2025410944512192536?s=20 Painterly] (30s, [https://x.com/EHuanglu el.cine])&lt;br /&gt;
** [https://x.com/kellyeld/status/2025975677657440267?s=20 Imposter Syndrone] (2m, music video)&lt;br /&gt;
** [https://www.youtube.com/watch?v=nKnE2Wn1VNQ All Is Conscious] (3.5m)&lt;br /&gt;
** [https://x.com/CuriousRefuge/status/2026086576191934769?s=20 Emotional argument] (3m, Seedance 2.0)&lt;br /&gt;
** [https://x.com/jdkanani/status/2023781028368884031?s=20 Moonlight Veil] (10m)&lt;br /&gt;
&lt;br /&gt;
====March 2026====&lt;br /&gt;
* March 2026: Examples:&lt;br /&gt;
** [https://x.com/jacopo_reale/status/2029909372764041559 Looking for Bianca] (6m, Kling 3.0)&lt;br /&gt;
** [https://x.com/sumiturkude007/status/2030933543443193908?s=20 Gardener] (3m, Seedance 2.0)&lt;br /&gt;
** Micro-movie (Chinese): [https://x.com/yyyole/status/2029225419669684418?s=20 episode 1], [https://x.com/yyyole/status/2030850450464112675?s=20 episode 2]&lt;br /&gt;
** Live-action Evangelion: [https://x.com/NACHOS2D_/status/2032401289653461052?s=20 part 1] (4.5m), [https://x.com/NACHOS2D_/status/2032778868361203770?s=20 part 2] (3.5m), [https://x.com/NACHOS2D_/status/2033126071151837491?s=20 part 3] (2.5m)&lt;br /&gt;
** [https://x.com/lexx_aura/status/2033589846216741293?s=20 to love Wu Yong] (5m)&lt;br /&gt;
** [https://x.com/Alterverse_AI/status/2036434608137343111?s=20 Monkey&amp;#039;s Paw] (5m)&lt;br /&gt;
** [https://x.com/maxescu/status/2036434854435315868?s=20 Cinematic scenes] (3.5m, comedy, [https://lumalabs.ai/uni-1 Luma Uni-1 Agent])&lt;br /&gt;
* March 2026: [https://higgsfield.ai/original-series Higgsfield Original Series]&lt;/div&gt;</summary>
		<author><name>KevinYager</name></author>
		
	</entry>
	<entry>
		<id>http://gisaxs.com/index.php?title=Science_Agents&amp;diff=8759</id>
		<title>Science Agents</title>
		<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php?title=Science_Agents&amp;diff=8759"/>
		<updated>2026-03-24T19:22:25Z</updated>

		<summary type="html">&lt;p&gt;KevinYager: /* Specific */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
=AI Use-cases for Science=&lt;br /&gt;
&lt;br /&gt;
==Literature==&lt;br /&gt;
* [https://www.alphaxiv.org/explore alphaXiv | Explore]: Understand arXiv papers&lt;br /&gt;
&lt;br /&gt;
===LLM extract data from papers===&lt;br /&gt;
* 2024-14: [https://pubs.rsc.org/en/content/articlelanding/2025/cs/d4cs00913d From text to insight: large language models for chemical data extraction]&lt;br /&gt;
&lt;br /&gt;
===AI finding links in literature===&lt;br /&gt;
* 2019-07: [https://doi.org/10.1038/s41586-019-1335-8  Unsupervised word embeddings capture latent knowledge from materials science literature]&lt;br /&gt;
* 2024-11: [https://doi.org/10.1038/s41562-024-02046-9  Large language models surpass human experts in predicting neuroscience results]&lt;br /&gt;
&lt;br /&gt;
===(Pre) Generate Articles===&lt;br /&gt;
* 2022-12: [https://aclanthology.org/2022.emnlp-main.296/ Re3: Generating Longer Stories With Recursive Reprompting and Revision]&lt;br /&gt;
* 2023-01: Journalism: [https://journals.sagepub.com/doi/10.1177/10776958221149577 Collaborating With ChatGPT: Considering the Implications of Generative Artificial Intelligence for Journalism and Media Education]&lt;br /&gt;
* 2023-03: English essays: [https://journal.unnes.ac.id/sju/index.php/elt/article/view/64069 Artificial intelligence (AI) technology in OpenAI ChatGPT application: A review of ChatGPT in writing English essay]&lt;br /&gt;
* 2023-07: Science writing: [https://www.rbmojournal.com/article/S1472-6483(23)00219-5/fulltext Artificial intelligence in scientific writing: a friend or a foe?]&lt;br /&gt;
* 2024-02: Wikipedia style: [https://arxiv.org/abs/2402.14207 Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models]&lt;br /&gt;
* 2024-08: [https://arxiv.org/abs/2408.07055 LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs] ([https://github.com/THUDM/LongWriter code])&lt;br /&gt;
* 2024-08: Scientific papers: [https://arxiv.org/abs/2408.06292 The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery]&lt;br /&gt;
* 2024-09: PaperQA2: [https://paper.wikicrow.ai/ Language Models Achieve Superhuman Synthesis of Scientific Knowledge] ([https://x.com/SGRodriques/status/1833908643856818443 𝕏 post], [https://github.com/Future-House/paper-qa code])&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.18866 Reasoning to Learn from Latent Thoughts]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.19065 WikiAutoGen: Towards Multi-Modal Wikipedia-Style Article Generation]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.13171 Sleep-time Compute: Beyond Inference Scaling at Test-time]&lt;br /&gt;
&lt;br /&gt;
==Explanation==&lt;br /&gt;
* 2025-02: [https://tiger-ai-lab.github.io/TheoremExplainAgent/ TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding] ([https://arxiv.org/abs/2502.19400 preprint])&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.02822 Do Two AI Scientists Agree?]&lt;br /&gt;
&lt;br /&gt;
==Autonomous Ideation==&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.07738 ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models]&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.14202 Mining Causality: AI-Assisted Search for Instrumental Variables]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.07977 Thinking Fast and Laterally: Multi-Agentic Approach for Reasoning about Uncertain Emerging Events]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.14141 LLMs can realize combinatorial creativity: generating creative ideas via LLMs for scientific research]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.17596 LiveIdeaBench: Evaluating LLMs&amp;#039; Scientific Creativity and Idea Generation with Minimal Context]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.13299 Hypothesis Generation for Materials Discovery and Design Using Goal-Driven and Constraint-Guided LLM Agents]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.13025 Agentic Deep Graph Reasoning Yields Self-Organizing Knowledge Networks]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.00794 Predicting Empirical AI Research Outcomes with Language Models]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.20803 The Ideation-Execution Gap: Execution Outcomes of LLM-Generated versus Human Research Ideas]&lt;br /&gt;
* 2026-03: [https://arxiv.org/abs/2603.14473 AI Can Learn Scientific Taste]&lt;br /&gt;
&lt;br /&gt;
==Adapting LLMs to Science==&lt;br /&gt;
* 2023-06: [https://doi.org/10.1039/D3DD00112A Domain-specific chatbots for science using embeddings]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2411.00027 Personalization of Large Language Models: A Survey]&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.00412 Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation]&lt;br /&gt;
&lt;br /&gt;
==AI/LLM Control of Scientific Instruments/Facilities==&lt;br /&gt;
* 2023-12: [https://www.nature.com/articles/s41524-024-01423-2 Opportunities for retrieval and tool augmented large language models in scientific facilities]&lt;br /&gt;
* 2023-12: [https://arxiv.org/abs/2312.17180 Virtual Scientific Companion for Synchrotron Beamlines: A Prototype]&lt;br /&gt;
* 2023-12: [https://www.nature.com/articles/s41586-023-06792-0 Autonomous chemical research with large language models]&lt;br /&gt;
* 2024-01: [https://iopscience.iop.org/article/10.1088/2632-2153/ad52e9 Synergizing Human Expertise and AI Efficiency with Language Model for Microscopy Operation and Automated Experiment Design]&lt;br /&gt;
* 2024-06: [https://pubs.rsc.org/en/content/articlelanding/2025/dd/d4dd00143e From Text to Test: AI-Generated Control Software for Materials Science Instruments]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.18161 VISION: A Modular AI Assistant for Natural Human-Instrument Interaction at Scientific User Facilities]&lt;br /&gt;
* 2025-01: [https://www.science.org/doi/10.1126/sciadv.adr4173 Large language models for human-machine collaborative particle accelerator tuning through natural language]&lt;br /&gt;
* 2025-04: [https://openreview.net/forum?id=iA9UN1dEgJ Operating Robotic Laboratories with Large Language Models and Teachable Agents]&lt;br /&gt;
&lt;br /&gt;
==AI/ML Methods tailored to Science==&lt;br /&gt;
===Science Foundation Models===&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.15763 Intern-S1: A Scientific Multimodal Foundation Model]&lt;br /&gt;
* 2025-11: [https://pubs.aip.org/aip/jcp/article/163/18/184110/3372267/A-foundation-model-for-atomistic-materials A foundation model for atomistic materials chemistry]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.15684 Walrus: A Cross-Domain Foundation Model for Continuum Dynamics]&lt;br /&gt;
* 2026-01: [https://www.science.org/doi/10.1126/science.ads9530 Deep contrastive learning enables genome-wide virtual screening]&lt;br /&gt;
&lt;br /&gt;
===Regression (Data Fitting)===&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.14546 Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data]: training on (x,y) pairs enables inferring underlying function (define it in code, invert it, compose it)&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2402.14547 OmniPred: Language Models as Universal Regressors]&lt;br /&gt;
&lt;br /&gt;
===Tabular Classification/Regression===&lt;br /&gt;
* 2025-01: [https://www.nature.com/articles/s41586-024-08328-6 Accurate predictions on small data with a tabular foundation model] ([https://github.com/PriorLabs/TabPFN code])&lt;br /&gt;
&lt;br /&gt;
===Symbolic Regression===&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.09359 Symbolic Regression with a Learned Concept Library]&lt;br /&gt;
&lt;br /&gt;
===Literature Discovery===&lt;br /&gt;
* [https://www.futurehouse.org/ FutureHouse]&lt;br /&gt;
** [https://hasanyone.com/ hasanyone]&lt;br /&gt;
** [https://github.com/Future-House/paper-qa PaperQA2]&lt;br /&gt;
* [https://lumina.sh/ Lumina]&lt;br /&gt;
* [https://github.com/TheBlewish/Automated-AI-Web-Researcher-Ollama Automated-AI-Web-Researcher-Ollama]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.05366 Search-o1: Agentic Search-Enhanced Large Reasoning Models] ([https://search-o1.github.io/ project], [https://github.com/sunnynexus/Search-o1 code])&lt;br /&gt;
* 2026-02: [https://www.nature.com/articles/s41586-025-10072-4 Synthesizing scientific literature with retrieval-augmented language models] ([https://allenai.org/blog/openscholar-nature blog])&lt;br /&gt;
&lt;br /&gt;
===Commercial===&lt;br /&gt;
* [https://sakana.ai/ai-scientist/ Sakana AI] &lt;br /&gt;
* [https://www.cusp.ai/ Cusp AI]: Materials/AI&lt;br /&gt;
* [https://www.lila.ai/ Lila AI]: Life sciences&lt;br /&gt;
* [https://www.radical-ai.com/ Radical AI]: Material simulation/design&lt;br /&gt;
* [https://www.autoscience.ai/ Autoscience] ([https://www.autoscience.ai/blog/meet-carl-the-first-ai-system-to-produce-academically-peer-reviewed-research Carl])&lt;br /&gt;
* [https://periodic.com/ Periodic Labs]&lt;br /&gt;
* [https://edisonscientific.com/articles/announcing-edison-scientific Edison Scientific] (drug discovery, spinoff from [https://www.futurehouse.org/ FutureHouse])&lt;br /&gt;
* 2026-03: Mirendil Inc.: advanced models to speed up R&amp;amp;D in scientific domains, especially biology and materials science&lt;br /&gt;
&lt;br /&gt;
====Bio====&lt;br /&gt;
* [https://www.bioptimus.com/ Bioptimus]&lt;br /&gt;
* [https://www.evolutionaryscale.ai/ EvolutionaryScale]&lt;br /&gt;
&lt;br /&gt;
==AI/ML Methods in Science==&lt;br /&gt;
* 2025-07: [https://www.mdpi.com/2313-433X/11/8/252 Synthetic Scientific Image Generation with VAE, GAN, and Diffusion Model Architectures]&lt;br /&gt;
&lt;br /&gt;
===Imaging===&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.08176 Behind the Noise: Conformal Quantile Regression Reveals Emergent Representations] (blog: [https://phzwart.github.io/behindthenoise/ Behind the Noise])&lt;br /&gt;
&lt;br /&gt;
===Materials===&lt;br /&gt;
* 2024-12: [https://www.nature.com/articles/s41467-024-54639-7 Crystal structure generation with autoregressive large language modeling]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.03965 All-atom Diffusion Transformers: Unified generative modelling of molecules and materials]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.19730 Training-Free Active Learning Framework in Materials Science with Large Language Models]&lt;br /&gt;
&lt;br /&gt;
===Chemistry===&lt;br /&gt;
* 2025-01: [https://www.nature.com/articles/s41578-025-00772-8 Large language models for reticular chemistry]&lt;br /&gt;
* 2025-02: [https://www.nature.com/articles/s42256-025-00982-3 Image-based generation for molecule design with SketchMol]&lt;br /&gt;
* 2025-02: [https://www.nature.com/articles/s42256-025-00994-z Large language models for scientific discovery in molecular property prediction]&lt;br /&gt;
* [https://x.com/vant_ai/status/1903070297991110657 2025-03]: [https://www.vant.ai/ Vant AI] [https://www.vant.ai/neo-1 Neo-1]: atomistic foundation model (small molecules, proteins, etc.)&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.08051 Compositional Flows for 3D Molecule and Synthesis Pathway Co-design]&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.07456 General purpose models for the chemical sciences]&lt;br /&gt;
* 2025-11: [https://chemrxiv.org/engage/chemrxiv/article-details/690357d9a482cba122e366b6 ChemTorch: A Deep Learning Framework for Benchmarking and Developing Chemical Reaction Property Prediction Models]&lt;br /&gt;
&lt;br /&gt;
===Biology===&lt;br /&gt;
* 2018: [https://alphafold.ebi.ac.uk/ AlphaFold]&lt;br /&gt;
* 2021-07: [https://www.nature.com/articles/s41586-021-03819-2 AlphaFold 2]&lt;br /&gt;
* 2024-05: [https://www.nature.com/articles/s41586-024-07487-w AlphaFold 3]&lt;br /&gt;
* 2023-03: [https://www.science.org/doi/10.1126/science.ade2574 Evolutionary-scale prediction of atomic-level protein structure with a language model] ([https://esmatlas.com/resources?action=fold ESMFold])&lt;br /&gt;
* 2023-11: [https://www.nature.com/articles/s41586-023-06728-8 Illuminating protein space with a programmable generative model] (Chroma)&lt;br /&gt;
* 2024-11: [https://www.science.org/doi/10.1126/science.ado9336 Sequence modeling and design from molecular to genome scale with Evo] (Evo)&lt;br /&gt;
* 2025-01: [https://www.nature.com/articles/s41586-024-08435-4 Targeting protein–ligand neosurfaces with a generalizable deep learning tool]&lt;br /&gt;
* 2025-01: [https://www.science.org/doi/10.1126/science.ads0018 Simulating 500 million years of evolution with a language model] ([https://github.com/evolutionaryscale/esm ESM] 3 model)&lt;br /&gt;
* 2025-02: [https://arcinstitute.org/manuscripts/Evo2 Genome modeling and design across all domains of life with Evo 2]&lt;br /&gt;
* 2025-02: [https://www.microsoft.com/en-us/research/blog/exploring-the-structural-changes-driving-protein-function-with-bioemu-1/ Exploring the structural changes driving protein function with BioEmu-1]&lt;br /&gt;
* 2025-02: [https://arxiv.org/pdf/2502.18449 Protein Large Language Models: A Comprehensive Survey]&lt;br /&gt;
* [https://x.com/vant_ai/status/1903070297991110657 2025-03]: [https://www.vant.ai/ Vant AI] [https://www.vant.ai/neo-1 Neo-1]: atomistic foundation model (small molecules, proteins, etc.)&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.16351 Lyra: An Efficient and Expressive Subquadratic Architecture for Modeling Biological Sequences]&lt;br /&gt;
* 2025-08: RosettaFold 3: [https://www.biorxiv.org/content/10.1101/2025.08.14.670328v2 Accelerating Biomolecular Modeling with AtomWorks and RF3]&lt;br /&gt;
* 2025-09: [https://www.biorxiv.org/content/10.1101/2025.09.12.675911v1 Generative design of novel bacteriophages with genome language models]&lt;br /&gt;
* 2025-10: [https://www.science.org/doi/10.1126/science.adu8578 Strengthening nucleic acid biosecurity screening against generative protein design tools]&lt;br /&gt;
* 2026-01: [https://www.nature.com/articles/s41586-025-10014-0 Advancing regulatory variant effect prediction with AlphaGenome]&lt;br /&gt;
&lt;br /&gt;
===Medicine===&lt;br /&gt;
See: [[AI_Agents#Medicine]]&lt;br /&gt;
&lt;br /&gt;
===Successes===&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.11270 Site-Decorated Model for Unconventional Frustrated Magnets: Ultranarrow Phase Crossover and Spin Reversal Transition]&lt;br /&gt;
&lt;br /&gt;
==AI/ML Methods co-opted for Science==&lt;br /&gt;
===Mechanistic Interpretability===&lt;br /&gt;
Train large model on science data. Then apply [[AI_understanding#Mechanistic_Interpretability|mechanistic interpretability]] (e.g. sparse autoencoders, SAE) to the feature/activation space.&lt;br /&gt;
* Mechanistic interpretability for protein language models ([https://interprot.com/ visualizer], [https://github.com/etowahadams/interprot/tree/main code], [https://huggingface.co/liambai/InterProt-ESM2-SAEs SAE])&lt;br /&gt;
* [https://www.markov.bio/ Markov Bio]: [https://www.markov.bio/research/mech-interp-path-to-e2e-biology Through a Glass Darkly: Mechanistic Interpretability as the Bridge to End-to-End Biology] ([https://x.com/adamlewisgreen/status/1853206279499751531 quick description], [https://markovbio.github.io/biomedical-progress/ background info on recent bio progress])&lt;br /&gt;
* 2023-01: [https://arxiv.org/abs/2301.05062 Tracr: Compiled Transformers as a Laboratory for Interpretability] ([https://github.com/google-deepmind/tracr code])&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.03334 An X-Ray Is Worth 15 Features: Sparse Autoencoders for Interpretable Radiology Report Generation]&lt;br /&gt;
* 2024-12: [https://www.arxiv.org/abs/2412.16247 Towards scientific discovery with dictionary learning: Extracting biological concepts from microscopy foundation models]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.12101 InterPLM: Discovering Interpretable Features in Protein Language Models via Sparse Autoencoders]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.00089 Insights on Galaxy Evolution from Interpretable Sparse Feature Networks]&lt;br /&gt;
* 2025-02: [https://www.biorxiv.org/content/10.1101/2025.02.06.636901v1 From Mechanistic Interpretability to Mechanistic Biology: Training, Evaluating, and Interpreting Sparse Autoencoders on Protein Language Models]&lt;br /&gt;
* 2025-02: [https://www.goodfire.ai/blog/interpreting-evo-2 Interpreting Evo 2: Arc Institute&amp;#039;s Next-Generation Genomic Foundation Model]&lt;br /&gt;
* 2026-01: [https://www.goodfire.ai/research/interpretability-for-alzheimers-detection# Using Interpretability to Identify a Novel Class of Alzheimer&amp;#039;s Biomarkers]&lt;br /&gt;
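The recipe above can be sketched as a minimal illustration: encode stored model activations into an overcomplete, non-negative code, decode back, and inspect sparsity. This is a hypothetical sketch with untrained random weights; all names and dimensions are illustrative, not taken from any cited paper.&lt;br /&gt;

```python
import numpy as np

# Sparse autoencoder (SAE) sketch: project activation vectors into an
# overcomplete dictionary; after training, each code dimension is a
# candidate interpretable "feature". Weights here are random (untrained).
rng = np.random.default_rng(0)
d_model, d_sae, n = 16, 64, 512
acts = rng.normal(size=(n, d_model))      # stand-in for stored model activations

W_enc = rng.normal(scale=0.1, size=(d_model, d_sae))
b_enc = np.full(d_sae, -0.05)             # negative bias pushes codes toward zero
W_dec = rng.normal(scale=0.1, size=(d_sae, d_model))

def encode(x):
    # ReLU makes codes non-negative; with a trained bias most entries are zero
    return np.maximum(x @ W_enc + b_enc, 0.0)

def decode(h):
    return h @ W_dec

codes = encode(acts)
recon = decode(codes)
sparsity = float(np.mean(codes == 0.0))   # fraction of inactive features
mse = float(np.mean((recon - acts) ** 2))
print(f"code shape={codes.shape}, zero fraction={sparsity:.2f}, mse={mse:.3f}")
```

In practice the weights are fit by minimizing reconstruction error plus an L1 penalty on the codes, and the resulting active features are then inspected for scientific meaning.&lt;br /&gt;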
&lt;br /&gt;
===Uncertainty===&lt;br /&gt;
* 2024-10: [https://github.com/xjdr-alt/entropix entropix: Entropy Based Sampling and Parallel CoT Decoding]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.09724 Taming Overconfidence in LLMs: Reward Calibration in RLHF]&lt;br /&gt;
&lt;br /&gt;
=Science Benchmarks=&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.13168 SciCode: A Research Coding Benchmark Curated by Scientists] ([http://scicode-bench.github.io/ project])&lt;br /&gt;
* 2024-11: [https://openreview.net/pdf?id=fz969ahcvJ AidanBench: Evaluating Novel Idea Generation on Open-Ended Questions] ([https://github.com/aidanmclaughlin/AidanBench code])&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.17596 LiveIdeaBench: Evaluating LLMs&amp;#039; Scientific Creativity and Idea Generation with Minimal Context]&lt;br /&gt;
* 2025-01: [https://agi.safe.ai/ Humanity&amp;#039;s Last Exam]&lt;br /&gt;
* [https://github.com/OSU-NLP-Group/ScienceAgentBench ScienceAgentBench]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.20309 EAIRA: Establishing a Methodology for Evaluating AI Models as Scientific Research Assistants]&lt;br /&gt;
* 2025-03: [https://huggingface.co/datasets/futurehouse/BixBench BixBench]: Novel hypotheses (accept/reject)&lt;br /&gt;
* 2025-04: [https://research.google/blog/evaluating-progress-of-llms-on-scientific-problem-solving/ Google: Evaluating progress of LLMs on scientific problem-solving]&lt;br /&gt;
** 2025-03: [https://arxiv.org/abs/2503.13517 CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning]&lt;br /&gt;
** 2024-07: [https://arxiv.org/abs/2407.09413 SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers]&lt;br /&gt;
** 2024-10: [https://neurips.cc/virtual/2024/98540 FEABench: Evaluating Language Models on Real World Physics Reasoning Ability]&lt;br /&gt;
* 2026-02: [https://edisonscientific.com/ Edison]: [https://lab-bench.ai/ LABBench 2]&lt;br /&gt;
&lt;br /&gt;
=Science Agents=&lt;br /&gt;
==Reviews==&lt;br /&gt;
* 2024-10: [https://www.cell.com/cell/fulltext/S0092-8674(24)01070-5?target=_blank Empowering biomedical discovery with AI agents]&lt;br /&gt;
* 2025-01: [https://pubs.rsc.org/en/content/articlehtml/2024/sc/d4sc03921a A review of large language models and autonomous agents in chemistry] ([https://github.com/ur-whitelab/LLMs-in-science github])&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.01903 AI4Research: A Survey of Artificial Intelligence for Scientific Research]&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.14111 From AI for Science to Agentic Science: A Survey on Autonomous Scientific Discovery]&lt;br /&gt;
&lt;br /&gt;
==Challenges==&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.03315 Why LLMs Aren&amp;#039;t Scientists Yet: Lessons from Four Autonomous Research Attempts]&lt;br /&gt;
&lt;br /&gt;
==Specific==&lt;br /&gt;
* 2024-01-13: [https://arxiv.org/abs/2401.06949 ORGANA: A Robotic Assistant for Automated Chemistry Experimentation and Characterization] ([https://www.youtube.com/watch?v=N6qMMwJ8hKQ video])&lt;br /&gt;
* 2024-06-19: [https://arxiv.org/abs/2406.13163 LLMatDesign: Autonomous Materials Discovery with Large Language Models]&lt;br /&gt;
* 2024-08-12: [https://sakana.ai/ Sakana AI]: [https://sakana.ai/ai-scientist/ AI Scientist]; [https://arxiv.org/abs/2408.06292 The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery] ([https://github.com/SakanaAI/AI-Scientist code])&lt;br /&gt;
* 2024-09-09: [https://arxiv.org/abs/2409.05556 SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning] ([https://github.com/lamm-mit/SciAgentsDiscovery code])&lt;br /&gt;
* 2024-09-11: PaperQA2: [https://paper.wikicrow.ai/ Language Models Achieve Superhuman Synthesis of Scientific Knowledge] ([https://x.com/SGRodriques/status/1833908643856818443 𝕏 post], [https://github.com/Future-House/paper-qa code])&lt;br /&gt;
* 2024-10-17: [https://arxiv.org/abs/2410.13768 Rapid and Automated Alloy Design with Graph Neural Network-Powered LLM-Driven Multi-Agent Systems]&lt;br /&gt;
* 2024-10-28: [https://arxiv.org/abs/2410.20976 Large Language Model-Guided Prediction Toward Quantum Materials Synthesis]&lt;br /&gt;
* 2024-12-06: [https://www.biorxiv.org/content/10.1101/2024.11.11.623004v1 The Virtual Lab: AI Agents Design New SARS-CoV-2 Nanobodies with Experimental Validation] (writeup: [https://www.nature.com/articles/d41586-024-01684-3 Virtual lab powered by ‘AI scientists’ super-charges biomedical research: Could human–AI collaborations be the future of interdisciplinary studies?])&lt;br /&gt;
* 2024-12-30: [https://arxiv.org/abs/2412.21154 Aviary: training language agents on challenging scientific tasks]&lt;br /&gt;
* See also: [[AI_Agents#Deep_Research|AI Agents &amp;gt; Deep Research]]&lt;br /&gt;
* 2025-04-08: Sakana: [https://pub.sakana.ai/ai-scientist-v2/paper/paper.pdf The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search] ([https://github.com/SakanaAI/AI-Scientist-v2 code])&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.14267 DREAMS: Density Functional Theory Based Research Engine for Agentic Materials Simulation]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.02824 Kosmos: An AI Scientist for Autonomous Discovery]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.08151 SciAgent: A Unified Multi-Agent System for Generalistic Scientific Reasoning]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2601.23265 PaperBanana: Automating Academic Illustration for AI Scientists]&lt;br /&gt;
* 2026-03: [https://arxiv.org/abs/2603.20179 AI Agents Can Already Autonomously Perform Experimental High Energy Physics]&lt;br /&gt;
&lt;br /&gt;
==Science Multi-Agent Setups==&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.04227 Agent Laboratory: Using LLM Agents as Research Assistants]&lt;br /&gt;
* 2025-04: [https://www.nature.com/articles/s41551-025-01363-2 Coordinated AI agents for advancing healthcare] ([https://www.nature.com/articles/s41551-025-01363-2.epdf?sharing_token=CIYP3J8LZE4BX31fV3WxUdRgN0jAjWel9jnR3ZoTv0O9iD-yhgqzRaz_7VASayWRePPhWDD2xFyfuOpSXbdPaOtt7oH4nfXo7telALzNwY3V1p9SxoqBEJy2OuaJ_cA35-CYQC1XgjCNTZUw46dh1KX-Dj8e7-1Vk_RlZKFLrc8%3D pdf])&lt;br /&gt;
&lt;br /&gt;
=Science Agentic Components=&lt;br /&gt;
==Frameworks==&lt;br /&gt;
* [https://platform.claude.com/docs/en/agent-sdk/overview Anthropic Claude Agent SDK overview]&lt;br /&gt;
* [https://openclaw.ai/ OpenClaw]&lt;br /&gt;
* [https://opencode.ai/ OpenCode]&lt;br /&gt;
* [https://github.com/OpenHands/software-agent-sdk OpenHands]&lt;br /&gt;
* [https://github.com/lamm-mit?tab=repositories LAMM: MIT Laboratory for Atomistic and Molecular Mechanics]&lt;br /&gt;
** [https://github.com/lamm-mit/scienceclaw ScienceClaw]: Framework for autonomous scientific investigation without central coordination.&lt;br /&gt;
** [https://infinite-lamm.vercel.app/ Infinite]: The Infinite Corridor of Scientific Discovery. Open science, powered by many — agents and humans discovering together.&lt;br /&gt;
&lt;br /&gt;
==Personalities==&lt;br /&gt;
* 2026-03: [https://github.com/msitarzewski/agency-agents The Agency: AI Specialists Ready to Transform Your Workflow]&lt;br /&gt;
&lt;br /&gt;
==Skills==&lt;br /&gt;
* 2026-03: [https://github.com/K-Dense-AI/claude-scientific-skills/tree/main?tab=readme-ov-file#use-cases Claude Scientific Skills] (list)&lt;br /&gt;
&lt;br /&gt;
=AI Science Systems=&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.03916 Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.13299 Hypothesis Generation for Materials Discovery and Design Using Goal-Driven and Constraint-Guided LLM Agents]&lt;br /&gt;
* 2025-02: [https://storage.googleapis.com/coscientist_paper/ai_coscientist.pdf Towards an AI co-scientist] (Google blog post: [https://research.google/blog/accelerating-scientific-breakthroughs-with-an-ai-co-scientist/ Accelerating scientific breakthroughs with an AI co-scientist])&lt;br /&gt;
* 2025-06: [https://zenodo.org/records/15693353 The Discovery Engine]&lt;br /&gt;
** 2025-07: [https://arxiv.org/abs/2507.00964 Benchmarking the Discovery Engine] ([https://www.leap-labs.com/blog/how-we-replicated-five-peer-reviewed-papers-in-five-hours blog])&lt;br /&gt;
* 2025-07: [https://www.preprints.org/manuscript/202507.1951/v1 Autonomous Scientific Discovery Through Hierarchical AI Scientist Systems]&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.16969 Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows]&lt;br /&gt;
* 2026-01: [https://www.nature.com/articles/s43588-025-00906-6 SciSciGPT: advancing human–AI collaboration in the science of science]&lt;br /&gt;
* 2026-02: [https://allenai.org/papers/autodiscovery AUTODISCOVERY: Open-ended Scientific Discovery via Bayesian Surprise] (Allen AI (Ai2) AstraLabs, [https://allenai.org/blog/autodiscovery blog], [https://autodiscovery.allen.ai/runs tools])&lt;br /&gt;
&lt;br /&gt;
===Inorganic Materials Discovery===&lt;br /&gt;
* 2023-11: [https://doi.org/10.1038/s41586-023-06735-9 Scaling deep learning for materials discovery]&lt;br /&gt;
* 2023-11: [https://doi.org/10.1038/s41586-023-06734-w An autonomous laboratory for the accelerated synthesis of novel materials]&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.00135 HoneyComb: A Flexible LLM-Based Agent System for Materials Science]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.12771 Open Materials 2024 (OMat24) Inorganic Materials Dataset and Models] ([https://github.com/FAIR-Chem/fairchem code], [https://huggingface.co/datasets/fairchem/OMAT24 datasets], [https://huggingface.co/fairchem/OMAT24 checkpoints], [https://ai.meta.com/blog/fair-news-segment-anything-2-1-meta-spirit-lm-layer-skip-salsa-sona/ blogpost])&lt;br /&gt;
* 2025-01: [https://www.nature.com/articles/s41586-025-08628-5 A generative model for inorganic materials design]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.14110 System of Agentic AI for the Discovery of Metal-Organic Frameworks]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.08762 The Open Molecules 2025 (OMol25) Dataset, Evaluations, and Models]&lt;br /&gt;
&lt;br /&gt;
===Materials Characterization===&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.06569 Operationalizing Serendipity: Multi-Agent AI Workflows for Enhanced Materials Characterization with Theory-in-the-Loop]&lt;br /&gt;
&lt;br /&gt;
===Chemistry===&lt;br /&gt;
* 2023-12: [https://doi.org/10.1038/s41586-023-06792-0 Autonomous chemical research with large language models] (Coscientist)&lt;br /&gt;
* 2024-09: [https://www.pnnl.gov/main/publications/external/technical_reports/PNNL-36692.pdf PNNL ChemAIst V0.2]&lt;br /&gt;
* 2024-11: [https://www.nature.com/articles/s41467-024-54457-x An automatic end-to-end chemical synthesis development platform powered by large language models]&lt;br /&gt;
* 2025-06: [https://paper.ether0.ai/ Training a Scientific Reasoning Model for Chemistry]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.06363 ChemGraph: An Agentic Framework for Computational Chemistry Workflows] ([https://github.com/argonne-lcf/ChemGraph code])&lt;br /&gt;
&lt;br /&gt;
===Bio===&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.01485 BioMARS: A Multi-Agent Robotic System for Autonomous Biological Experiments]&lt;br /&gt;
&lt;br /&gt;
===Physics===&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.19799 PhysMaster: Building an Autonomous AI Physicist for Theoretical and Computational Physics Research]&lt;br /&gt;
&lt;br /&gt;
==LLMs Optimized for Science==&lt;br /&gt;
* 2022-11: [https://arxiv.org/abs/2211.09085 Galactica: A Large Language Model for Science]&lt;br /&gt;
* 2024-12: [https://www.nature.com/articles/s41467-024-54639-7 Crystal structure generation with autoregressive large language modeling]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.13107 MatterChat: A Multi-Modal LLM for Material Science]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.17604 OmniScience: A Domain-Specialized LLM for Scientific Reasoning and Discovery]&lt;br /&gt;
* 2025-03: Google [https://huggingface.co/collections/google/txgemma-release-67dd92e931c857d15e4d1e87 TxGemma] (2B, 9B, 27B): [https://developers.googleblog.com/en/introducing-txgemma-open-models-improving-therapeutics-development/ drug development]&lt;br /&gt;
&lt;br /&gt;
=Impact of AI in Science=&lt;br /&gt;
* 2024-11: &amp;lt;strike&amp;gt;[https://aidantr.github.io/files/AI_innovation.pdf Artificial Intelligence, Scientific Discovery, and Product Innovation]&amp;lt;/strike&amp;gt;&lt;br /&gt;
** 2025-05: Retraction: [https://economics.mit.edu/news/assuring-accurate-research-record Assuring an accurate research record]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.05151 Transforming Science with Large Language Models: A Survey on AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.03837 Accelerating Scientific Research with Gemini: Case Studies and Common Techniques]&lt;br /&gt;
&lt;br /&gt;
=Related Tools=&lt;br /&gt;
==Literature Search==&lt;br /&gt;
* [https://www.perplexity.ai/ Perplexity]&lt;br /&gt;
* [https://www.arxival.xyz/ ArXival]&lt;br /&gt;
&lt;br /&gt;
==Data Visualization==&lt;br /&gt;
* 2024-10: Microsoft [https://www.microsoft.com/en-us/research/blog/data-formulator-exploring-how-ai-can-help-analysts-create-rich-data-visualizations/ Data Formulator: Create Rich Visualization with AI iteratively] ([https://www.microsoft.com/en-us/research/video/data-formulator-create-rich-visualization-with-ai-iteratively/ video], [https://github.com/microsoft/data-formulator code])&lt;br /&gt;
* [https://julius.ai/ Julius AI]: Analyze your data with computational AI&lt;br /&gt;
&lt;br /&gt;
==Generative==&lt;br /&gt;
* 2025-03: [https://huggingface.co/collections/starvector/starvector-models-6783b22c7bd4b43d13cb5289 StarVector] 1B, 8B: text or image to SVG&lt;br /&gt;
&lt;br /&gt;
==Chemistry==&lt;br /&gt;
* 2025-03: [https://jcheminf.biomedcentral.com/articles/10.1186/s13321-024-00834-z Rxn-INSIGHT: fast chemical reaction analysis using bond-electron matrices] ([https://rxn-insight.readthedocs.io/en/latest/ docs])&lt;br /&gt;
&lt;br /&gt;
=Science Datasets=&lt;br /&gt;
* [https://datasetsearch.research.google.com/ Google Dataset Search]&lt;br /&gt;
* [https://github.com/blaiszik/awesome-matchem-datasets/ Awesome Materials &amp;amp; Chemistry Datasets]&lt;br /&gt;
* NIST [https://jarvis.nist.gov/ Jarvis] (simulations)&lt;br /&gt;
&lt;br /&gt;
=Genuine Discoveries=&lt;br /&gt;
* 2025-11: [https://cdn.openai.com/pdf/4a25f921-e4e0-479a-9b38-5367b47e8fd0/early-science-acceleration-experiments-with-gpt-5.pdf Early science acceleration experiments with GPT-5]&lt;br /&gt;
* 2025-12: [https://andymasley.substack.com/p/ai-can-obviously-create-new-knowledge AI can obviously create new knowledge - But maybe not new concepts]&lt;br /&gt;
&lt;br /&gt;
==Math==&lt;br /&gt;
* 2023-07: [https://www.nature.com/articles/s41586-023-06004-9 Faster sorting algorithms discovered using deep reinforcement learning]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.13131 AlphaEvolve: A coding agent for scientific and algorithmic discovery]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.02864 Mathematical exploration and discovery at scale]&lt;br /&gt;
* 2025-11: [https://www.nature.com/articles/s41586-025-09833-y Olympiad-level formal mathematical reasoning with reinforcement learning]&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.14575 Extremal descendant integrals on moduli spaces of curves: An inequality discovered and proved in collaboration with AI]&lt;br /&gt;
* [https://github.com/teorth/erdosproblems/wiki/AI-contributions-to-Erd%C5%91s-problems AI Solving Erdős Problems]:&lt;br /&gt;
** 2026-01: [https://www.erdosproblems.com/728 Erdős Problem #728] and [https://www.erdosproblems.com/729 #729] solved by Aristotle using ChatGPT 5.2 Pro&lt;br /&gt;
** 2026-01: [https://www.erdosproblems.com/forum/thread/397 Erdős Problem #397] [https://x.com/neelsomani/status/2010215162146607128?s=20 solved] by [https://neelsomani.com/ Neel Somani] using ChatGPT 5.2 Pro&lt;br /&gt;
** 2026-01: [https://www.erdosproblems.com/205 Erdős Problem #205] solved by Aristotle using ChatGPT 5.2 Pro&lt;br /&gt;
** 2026-01: [https://www.erdosproblems.com/forum/thread/281 Erdős Problem #281] [https://x.com/neelsomani/status/2012695714187325745?s=20 solved] by [https://neelsomani.com/ Neel Somani] using ChatGPT 5.2 Pro&lt;br /&gt;
** 2026-01: Google DeepMind: [https://arxiv.org/abs/2601.21442 Irrationality of rapidly converging series: a problem of Erdős and Graham]&lt;br /&gt;
*** [https://www.erdosproblems.com/1051 Erdős Problem #1051] [https://x.com/slow_developer/status/2018321002623901885?s=20 solved] by Google DeepMind Aletheia agent&lt;br /&gt;
** 2026-01: Google DeepMind: [https://arxiv.org/abs/2601.22401 Semi-Autonomous Mathematics Discovery with Gemini: A Case Study on the Erdős Problems]&lt;br /&gt;
*** Attempted 700 problems, solved 13 open Erdős problems: 5 novel autonomous solutions, 8 through existing literature.&lt;br /&gt;
** 2026-02: [https://www.erdosproblems.com/846 Erdős Problem #846]&lt;br /&gt;
*** [https://x.com/roydanroy/status/2026804567178953048?s=20 Google DeepMind]&lt;br /&gt;
*** [https://x.com/mehtaab_sawhney/status/2026716221933343147?s=20 Using OpenAI internal model] (paper: [https://cdn.openai.com/infinite-sets/main_single_clean3.pdf On infinite sets with no 3 on a line])&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.07222 The motivic class of the space of genus 0 maps to the flag variety]&lt;br /&gt;
* 2026-02: Google DeepMind: [https://arxiv.org/abs/2602.10177 Towards Autonomous Mathematics Research]&lt;br /&gt;
* 2026-03: Donald Knuth: [https://www-cs-faculty.stanford.edu/~knuth/papers/claude-cycles.pdf A problem in Directed Hamiltonian Cycles] solved by Filip Stappers using Claude Opus 4.6&lt;br /&gt;
* 2026-03: Google DeepMind: [https://arxiv.org/abs/2603.09172 Reinforced Generation of Combinatorial Structures: Ramsey Numbers]&lt;br /&gt;
* 2026-03: [https://epoch.ai/frontiermath/open-problems FrontierMath] problem: [https://epoch.ai/frontiermath/open-problems/ramsey-hypergraphs &amp;quot;A Ramsey-style Problem on Hypergraphs&amp;quot;] solved by Kevin Barreto and Liam Price using GPT-5.4 Pro&lt;br /&gt;
&lt;br /&gt;
==Physics assistance==&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.23758 Exact solution of the frustrated Potts model with next-nearest-neighbor interactions in one dimension via AI bootstrapping] ([https://www.bnl.gov/staff/wyin Weiguo Yin])&lt;br /&gt;
* 2025-12: [https://www.sciencedirect.com/science/article/pii/S0370269325008111 Relativistic covariance and nonlinear quantum mechanics: Tomonaga-Schwinger analysis]&lt;br /&gt;
** [https://x.com/hsu_steve/status/1996034522308026435?s=20 Steve Hsu], [https://drive.google.com/file/d/16sxJuwsHoi-fvTFbri9Bu8B9bqA6lr1H/view Theoretical Physics with Generative AI]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.12176 Single-minus gluon tree amplitudes are nonzero] (GPT-5.2, [https://openai.com/index/new-result-theoretical-physics/ blog])&lt;br /&gt;
&lt;br /&gt;
==Literature exploration==&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.02824 Kosmos: An AI Scientist for Autonomous Discovery] ([https://edisonscientific.com/ Edison])&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/c4bdef64-5e9b-43b9-a365-592dd1ed7587 Nucleotide metabolism in hypothermia]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/1fdbf827-be65-4d97-9b66-bf0da600091a Determinant of perovskite solar-cell failure]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/4fb3fbdb-c449-4064-9aa6-ff4ec53131d8 Log-normal connectivity in neural networks]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/c6849232-5858-4634-adf5-83780afbe3db SOD2 as driver of myocardial fibrosis]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/abac07da-a6bb-458f-b0ba-ef08f1be617e Protective variant of SSR1 in type 2 diabetes]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/a770052b-2334-4bbe-b086-5149e0f03d99 Temporal ordering in Alzheimer’s disease]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/28c427d2-be31-48b5-b272-28d5a1e3ea5c Mechanism of neuron vulnerability in aging]&lt;br /&gt;
&lt;br /&gt;
==Bio design==&lt;br /&gt;
* 2023-07: [https://www.nature.com/articles/s41586-023-06415-8 De novo design of protein structure and function with RFdiffusion]&lt;br /&gt;
* 2025-11: [https://www.nature.com/articles/s41586-025-09721-5 Atomically accurate de novo design of antibodies with RFdiffusion]&lt;br /&gt;
* 2025-11: [https://deepmind.google/blog/alphafold-five-years-of-impact/ AlphaFold: Five years of impact]&lt;br /&gt;
* 2026-01: [https://www.goodfire.ai/research/interpretability-for-alzheimers-detection# Using Interpretability to Identify a Novel Class of Alzheimer&amp;#039;s Biomarkers]&lt;br /&gt;
&lt;br /&gt;
==Material Discovery==&lt;br /&gt;
* 2023-11: [https://doi.org/10.1038/s41586-023-06735-9 Scaling deep learning for materials discovery]&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
* [[AI agents]]&lt;br /&gt;
* [https://nanobot.chat/ Nanobot.chat]: Intelligent AI for the labnetwork @ mtl.mit.edu forum&lt;/div&gt;</summary>
		<author><name>KevinYager</name></author>
		
	</entry>
	<entry>
		<id>http://gisaxs.com/index.php?title=AI_video&amp;diff=8758</id>
		<title>AI video</title>
		<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php?title=AI_video&amp;diff=8758"/>
		<updated>2026-03-24T19:12:11Z</updated>

		<summary type="html">&lt;p&gt;KevinYager: /* March 2026 */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Evolution of Capabilities==&lt;br /&gt;
===Early===&lt;br /&gt;
* November 2016: [https://arxiv.org/abs/1611.10314 Sync-Draw]&lt;br /&gt;
* April 2021: [https://arxiv.org/abs/2104.14806 GODIVA]&lt;br /&gt;
* October 2022: [https://makeavideo.studio/ Meta Make-a-video]&lt;br /&gt;
* October 2022: [https://imagen.research.google/video/ Google Imagen video]&lt;br /&gt;
&lt;br /&gt;
===2023===&lt;br /&gt;
* April 2023: [https://www.youtube.com/watch?v=XQr4Xklqzw8 Will Smith eating spaghetti]&lt;br /&gt;
* April 2023: [https://x.com/nickfloats/status/1642899094808002564 Harry Potter by Balenciaga]&lt;br /&gt;
* April 2023: [https://x.com/mrjonfinger/status/1645953033636048896?cxt=HHwWgMDT7YfkzNctAAAA Runway Gen 2]&lt;br /&gt;
* April 2023: [https://research.nvidia.com/labs/toronto-ai/VideoLDM/ Nvidia latents]&lt;br /&gt;
* December 2023: [https://www.threads.net/@luokai/post/C0vvEnTP4Oj Fei-Fei Li]&lt;br /&gt;
&lt;br /&gt;
===2024===&lt;br /&gt;
====Early 2024====&lt;br /&gt;
* January 2024: [https://sites.research.google/videopoet/ Google VideoPoet]&lt;br /&gt;
* January 2024: [https://lumiere-video.github.io/ Google Lumiere]&lt;br /&gt;
* February 2024: [https://openai.com/index/sora/ OpenAI Sora]&lt;br /&gt;
* April 2024: [https://www.maginative.com/article/china-unveils-vidu-a-powerful-text-to-video-generator/ Vidu]&lt;br /&gt;
* May 2024: [https://deepmind.google/technologies/veo/ Veo]&lt;br /&gt;
* May 2024: [https://kling.kuaishou.com/ Kling]&lt;br /&gt;
* June 2024: [https://lumalabs.ai/dream-machine Luma DreamMachine]&lt;br /&gt;
* June 2024: [https://runwayml.com/research/introducing-gen-3-alpha RunwayML Gen-3 Alpha]&lt;br /&gt;
* July 2024: Examples:&lt;br /&gt;
** [https://www.youtube.com/watch?v=F_WfIzYGlg4 Toys-R-Us Commercial made using Sora]&lt;br /&gt;
** [https://www.youtube.com/watch?v=CSfw_NjqQ2o Motorola commercial made using genAI]&lt;br /&gt;
* July 2024: [https://x.com/rowancheung/status/1813258518159585723 haiper.ai]&lt;br /&gt;
====August 2024====&lt;br /&gt;
* August 2024: [http://hotshot.co/ Hotshot] ([https://x.com/maxescu/status/1825459083635536081 examples], [https://x.com/EccentrismArt/status/1825550841534972027 more examples])&lt;br /&gt;
* August 2024: Luma Dream Machine [https://x.com/LumaLabsAI/status/1825639918539817101 v1.5]&lt;br /&gt;
* August 2024: Examples:&lt;br /&gt;
** [https://x.com/endlesstaverns/status/1811276904692887815 Runway Gen3 music video]&lt;br /&gt;
** [https://x.com/runwayml/status/1820806644806070583 Runway Gen3 for adding FX to live action] ([https://x.com/bryanf0x/status/1825529998201004137 another example])&lt;br /&gt;
** [https://www.youtube.com/watch?v=taaM0s1bq7Q Midjourney + Runway Gen3: Hey It’s Snowing]&lt;br /&gt;
** [https://x.com/Kyrannio/status/1821605619927019974 Flux/LoRA image] + Runway Gen3 [https://x.com/iamneubert/status/1821970292014768420 woman presenter]&lt;br /&gt;
** [https://x.com/CharaspowerAI/status/1825274421256356106 McDonald’s AI commercial]&lt;br /&gt;
** Sora used by [https://www.facebook.com/izanamiaiart/ Izanami AI Art] to create [https://x.com/kimmonismus/status/1824102316229759114 dreamlike video] and by [https://x.com/alexiaadana Alexia Adana] to create [https://x.com/basedjensen/status/1824386717123743940 sci-fi film concept]&lt;br /&gt;
====September 2024====&lt;br /&gt;
* September 2024: [https://hailuoai.com/video/ Hailuo Minimax] ([https://x.com/minchoi/status/1829995683124035766 examples])&lt;br /&gt;
* September 2024: Examples:&lt;br /&gt;
** [https://www.youtube.com/watch?v=XAs5KuhfE_s Space colonization]&lt;br /&gt;
** [https://x.com/venturetwins/status/1827772646295265699 Consistent characters]&lt;br /&gt;
** [https://x.com/thealexbanks/status/1829489392354050502 Sea monsters]&lt;br /&gt;
** [https://x.com/ai_for_success/status/1829539535132426286 Music video]&lt;br /&gt;
** [https://x.com/RyanMorrisonJer/status/1829074823521112544 Animated characters]&lt;br /&gt;
** [https://x.com/CharaspowerAI/status/1829916782452191674 AI influencer]&lt;br /&gt;
** [https://x.com/minchoi/status/1829293248197902802 Ten short examples]&lt;br /&gt;
** [https://x.com/WorldEverett/status/1830596701473615937 Seven examples]&lt;br /&gt;
** [https://x.com/EccentrismArt/status/1830654805515395583 Clip from horror film]&lt;br /&gt;
** [https://x.com/MatthieuGB/status/1722146578813645296 &amp;quot;Gone&amp;quot; featuring astronaut] and [https://x.com/MatthieuGB/status/1742949297337852270 something ethereal]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1831256663644373449 Two dancers] (surprisingly good consistency despite movement)&lt;br /&gt;
** [https://x.com/8bit_e/status/1831344542487871953 Music video about flying]&lt;br /&gt;
** [https://www.youtube.com/watch?v=_XtS_4PzEyk The Paperclip Maximizer]&lt;br /&gt;
** [https://x.com/trbdrk/status/1831801373517869369 La Baie Aréa]&lt;br /&gt;
** [https://www.reddit.com/r/aivideo/comments/1f8xr0w/gisele_tong_to_dear_me/ &amp;quot;To Dear Me&amp;quot; by Gisele Tong] ([https://www.morningstar.com/news/business-wire/20240904521664/reply-ai-film-festival-announced-the-winners-of-the-first-international-festival-for-short-films-made-with-artificial-intelligence winner of AI shorts] film festival)&lt;br /&gt;
** [https://x.com/maxescu/status/1833476640438964281 Various scenes]&lt;br /&gt;
** [https://x.com/EHuanglu/status/1833522650846793970 Directing emotions]&lt;br /&gt;
* September 2024: Kling 1.5 ([https://x.com/Uncanny_Harry/status/1836531835280724459 examples], [https://x.com/minchoi/status/1836800551469654088 showing emotions])&lt;br /&gt;
* September 2024: Examples:&lt;br /&gt;
** Runway video-to-video to [https://x.com/jon_barron/status/1835695132697604236 restyle classic video games]&lt;br /&gt;
** [https://x.com/ai_for_success/status/1835319670917796117 Realistic presenter]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1834530744175059302 Skateboarding] (demonstrates progress toward meaningfully simulating motion/physics)&lt;br /&gt;
** [https://x.com/minchoi/status/1835378029092049325 Examples] of short clips with cinematic feel&lt;br /&gt;
** Short: [https://x.com/PJaccetturo/status/1835670655330869633 4 Minutes to Live]&lt;br /&gt;
** Short: [https://x.com/dreamingtulpa/status/1836121321526432231 Neon Nights] (Arcade)&lt;br /&gt;
** [https://www.youtube.com/watch?v=CcrGSA-kSrI Random Access Memories]: AI-generated, then projected onto Kodak film stock, giving the final output some of the dreamy analog quality associated with nostalgic footage&lt;br /&gt;
** Sora used to make a sort of [https://x.com/niceaunties/status/1837271244774715505 weird dreamlike video]&lt;br /&gt;
====October 2024====&lt;br /&gt;
* October 2024: Pika v1.5, including Pikaffects (explode, melt, inflate, and cake-ify; examples: [https://x.com/justin_hart/status/1841144350572413259 1], [https://x.com/arthur_hyper88/status/1841156544538521646 2], [https://x.com/ytjessie_/status/1841168925301842263 3], [https://x.com/bilawalsidhu/status/1841195247184781420 4], [https://x.com/minchoi/status/1841189035454447636 5], [https://x.com/ytjessie_/status/1841209415514669501 6])&lt;br /&gt;
* October 2024: Examples:&lt;br /&gt;
** [https://x.com/HalimAlrasihi/status/1839310216602788103 AI avatar with good lip-sync]&lt;br /&gt;
** [https://www.youtube.com/watch?v=5NZubOOeeV0 Battalion]: 5 minute short about war&lt;br /&gt;
** Short film: [https://x.com/MatthieuGB/status/1841173724688536015 To Wonderland] ([https://x.com/MatthieuGB/status/1841174221550207437 credits])&lt;br /&gt;
** [https://x.com/OnwardsProject/status/1841508441241890975 9 to 5]: Created with Luma Dream Machine keyframes and camera features; music by Suno&lt;br /&gt;
* October 2024: [https://ai.meta.com/research/movie-gen/ Meta Movie Gen]&lt;br /&gt;
* October 2024: Examples:&lt;br /&gt;
** [https://x.com/CuriousRefuge/status/1844424871335592373 AI Avatar] (using [https://x.com/CuriousRefuge/status/1844424871335592373 HeyGen])&lt;br /&gt;
** [https://www.youtube.com/watch?v=isW1FLL0K3w Generic Movies]&lt;br /&gt;
** [https://arxiv.org/abs/2410.05954 Pyramid-flow] ([https://huggingface.co/rain1011/pyramid-flow-sd3 open source]) model: [https://x.com/_akhaliq/status/1844239643778351605 examples]&lt;br /&gt;
** [https://x.com/whrumorvid/status/1846209247467491604 Building the Pyramids]&lt;br /&gt;
** [https://x.com/maxescu/status/1844716998854349217 People showing realistic emotion] (using [https://hailuoai.video/ Hailuo AI])&lt;br /&gt;
** Keyframes and Luma AI to make novel [https://x.com/CoffeeVectors/status/1845188179332051005 speed-ramp motion]&lt;br /&gt;
* October 2024: [https://pollo.ai/ Pollo AI], a platform offering a selection of video models&lt;br /&gt;
* October 2024: [https://www.genmo.ai/ Genmo] [https://x.com/genmoai/status/1848762405779574990 Mochi 1] (open source)&lt;br /&gt;
* October 2024: Examples:&lt;br /&gt;
** [https://x.com/AIatMeta/status/1849134463382680028 Meta Movie Gen examples]&lt;br /&gt;
** [https://x.com/PJaccetturo/status/1847732127598800960 Emotional range of Minimax]&lt;br /&gt;
** [https://x.com/dustinhollywood/status/1848757800807039299 Car commercial: Bear]&lt;br /&gt;
** [https://x.com/runwayml/status/1848785913918218517 Diner conversation]&lt;br /&gt;
** [https://x.com/Uncanny_Harry/status/1849275871716159989 Loved and Lost] (a meditation on grief)&lt;br /&gt;
====November 2024====&lt;br /&gt;
* November 2024: Examples:&lt;br /&gt;
** [https://x.com/blizaine/status/1852092147643699356 Pasta Doble]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1852425015175626876 Bird protecting young]&lt;br /&gt;
** [https://x.com/runwayml/status/1852363190484537666 Camera moving around sushi]&lt;br /&gt;
** [https://x.com/StevieMac03/status/1851969120813629939 Various examples] of [https://hailuoai.video/ Hailuo AI]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1853102779650252978 Trains]&lt;br /&gt;
** [https://www.youtube.com/watch?v=Fh-_g5vev0s Light of Imagination]&lt;br /&gt;
** [https://x.com/LinusEkenstam/status/1854087441122021814 Bringing historic images to life]&lt;br /&gt;
** [https://x.com/DeryaTR_/status/1855637066203218180 Plants dancing]&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1855078644042944574 Insect on tree]&lt;br /&gt;
** Trailers for [https://x.com/abandonedmovies/status/1827037378009296983 The Silmarillion] and [https://x.com/abandonedmovies/status/1846941183702110211 The Fall of Gondolin] (by [https://x.com/abandonedmovies Abandoned Films])&lt;br /&gt;
** [https://x.com/Diesol/status/1855475704470884427 Moody sci-fi]&lt;br /&gt;
** [https://x.com/runwayml/status/1857072173631885586 Migration] ([https://runwayml.com/customers/behind-the-scenes-of-migration-with-director-jeremy-higgins made by combining] Runway ML Gen3-Alpha and traditional animation)&lt;br /&gt;
** [https://x.com/AIandDesign/status/1856467856625676752 After the Winter] ([https://suno.com/song/0d6919de-d2bf-434b-8aa6-ede0fb0fde77 music] made using Suno v4)&lt;br /&gt;
** Horror: [https://www.reddit.com/r/aivideo/comments/1gnk27q/ridge_to_southwest/ Ridge to Southwest]&lt;br /&gt;
** [https://www.youtube.com/watch?v=ClStJZmIjBU The Gardener] (by [https://www.youtube.com/@MachineMythos Machine Mythos])&lt;br /&gt;
** [https://x.com/techhalla/status/1857462526859935813 Coca-Cola holiday ad] and [https://www.youtube.com/watch?v=THdoOgwqjBg parody thereof]&lt;br /&gt;
** [https://x.com/pzf_ai/status/1858312421510992111 A Dream Within A Dream] (by [https://x.com/pzf_ai PZF], selected for the Czech International AI Film Festival)&lt;br /&gt;
** [https://x.com/WorldEverett/status/1859273222597775843 Making Friends] (by [https://x.com/WorldEverett Everett World]; see also [https://x.com/WorldEverett/status/1858563716834275562 Childhood Dream] and [https://x.com/WorldEverett/status/1858945634067202429 City Echoes])&lt;br /&gt;
** Anime: [https://x.com/naegiko/status/1857754626742726893 test shots], [https://x.com/naegiko/status/1858978557424210401 Ultimate Ceremony], [https://x.com/naegiko/status/1835434668294074462 Echoes of Love]&lt;br /&gt;
** [https://x.com/KakuDrop/status/1866309309384323257 Echoes of Grace] ([https://x.com/KakuDrop KakuDrop] using Sora)&lt;br /&gt;
** [https://x.com/vibeke_udart/status/1859879367071203662 Morphing hands], [https://x.com/vibeke_udart/status/1858772719224975630 hands and faces] ([https://x.com/vibeke_udart Vibeke Bertelsen])&lt;br /&gt;
** [https://www.reddit.com/r/aivideo/comments/1gxi29x/dbzlicious/ Dragon Ball Z live action]&lt;br /&gt;
** [https://x.com/cfryant/status/1860727980353278386 Pitch Black] (abstract and dark)&lt;br /&gt;
** [https://x.com/cfryant/status/1861050528932765710 Animals Running] (zoomed-in ultra-wide camera)&lt;br /&gt;
** [https://x.com/WorldEverett/status/1860730214487118290 Dreams of Tomorrow] (panning shots of high-tech car, Scottish manor)&lt;br /&gt;
** [https://x.com/nickfloats/status/1861206978690691165 Desert Planet Cinematics]&lt;br /&gt;
* November 2024: [https://huggingface.co/spaces/PR-Puppets/PR-Puppet-Sora Leaked] Sora turbo model; [https://x.com/rowancheung/status/1861455031603503234 examples], [https://x.com/chatgpt21/status/1861504511153451517 Dog chasing Cat in snow]&lt;br /&gt;
====December 2024====&lt;br /&gt;
* December 2024: Examples:&lt;br /&gt;
** [https://x.com/minchoi/status/1863243880553976235 Realistic] (Minimax by Hailuo AI)&lt;br /&gt;
** Trailer for [https://x.com/TheReelRobot/status/1861824847149670840 Paradise Lost] (to be released on [https://www.sandwatch.ai/ Sandwatch AI])&lt;br /&gt;
** [https://x.com/EHuanglu/status/1863607136271716418 Music video example] with consistent characters&lt;br /&gt;
** [https://x.com/venturetwins/status/1863666366764687581 Human expressions] ([https://www.reddit.com/r/ChatGPT/comments/1h4r13x/ai_generated_expressions/ u/Kind_Distance9504 on Reddit], using Hailuo)&lt;br /&gt;
** Vodafone ad: [https://www.youtube.com/watch?v=9AyEC_K9kBg The Rhythm Of Life]&lt;br /&gt;
** [https://www.reddit.com/r/midjourney/comments/1h5u2gw/we_made_a_10_minute_gen_ai_batman_film/ 10 minute Batman film]&lt;br /&gt;
* December 2024: Tencent [https://aivideo.hunyuan.tencent.com/ Hunyuan Video] open-source video model ([https://x.com/CharaspowerAI/status/1863862585554010530 example])&lt;br /&gt;
* December 2024: [https://sora.com/ Sora] release ([https://x.com/CharaspowerAI/status/1866203050982916532 examples])&lt;br /&gt;
* December 2024: [https://mint-video.github.io/ MinT video] improves consistency and control ([https://arxiv.org/abs/2412.05263 preprint], [https://x.com/EHuanglu/status/1868278456565531061 examples])&lt;br /&gt;
* December 2024: Google [https://blog.google/technology/google-labs/video-image-generation-update-december-2024/ Veo 2] ([https://x.com/sundarpichai/status/1868709099644334518 examples], [https://x.com/EHuanglu/status/1869008306322522342 more examples], [https://x.com/_Borriss_/status/1869267571532320966 natural movement examples], [https://x.com/jerrod_lew/status/1870816560027246715 abstract], [https://x.com/jerrod_lew/status/1869427407415058660 realistic physics], [https://x.com/jerrod_lew/status/1873096585002786944 crowds], [https://x.com/minchoi/status/1873590350515929380 dancing], [https://x.com/jerrod_lew/status/1874440442269565351 animals])&lt;br /&gt;
* December 2024: [https://x.com/pika_labs/status/1867651381840040304 Pika 2.0] with Scene Ingredients&lt;br /&gt;
* December 2024: Examples:&lt;br /&gt;
** [https://www.youtube.com/watch?v=c_kKKRQ5gYw Synthetic Youth: Takenoko Zoku · Made by Emi Kusano with Sora]&lt;br /&gt;
** [https://x.com/higgsfield_ai/status/1868698886761837041 Car race] ([https://higgsfield.ai/ Higgsfield AI] storytelling)&lt;br /&gt;
** [https://x.com/blizaine/status/1868850653759783033 Slicing meat]; a comparison of modern video generators&lt;br /&gt;
** Challenging prompt: [https://x.com/RubenEVillegas/status/1868864410720325844 A cat roars while looking at its reflection in the mirror but instead sees itself as a lion roaring (Veo 2)] ([https://x.com/anukaakash/status/1869417975071330550 comparison to other models])&lt;br /&gt;
** [https://x.com/PJaccetturo/status/1869829338868412865 Anime trailer]&lt;br /&gt;
** [https://x.com/ring_hyacinth/status/1870386506776674376 Snorlax at Mount Fuji] and [https://x.com/ring_hyacinth/status/1871105733443592696 Psyduck at Colosseum] (Kling 1.6)&lt;br /&gt;
** [https://x.com/machine_mythos/status/1870565287789056320 Horror visuals] (with [https://mmaudio.net/ MMAudio] sound)&lt;br /&gt;
** [https://www.youtube.com/watch?v=lFc1jxLHhyM The Heist] (Veo 2)&lt;br /&gt;
** [https://x.com/minchoi/status/1871263616806129863 Various Veo 2 examples]&lt;br /&gt;
** [https://x.com/minchoi/status/1872390429108486320 Live Action Titans]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1873094065841193222 Cats] [https://x.com/PostsOfCats/status/1872530207585825058 Cooking]&lt;br /&gt;
** Aesthetic from alternate timelines: [https://x.com/BrianRoemmele/status/1871753358782120068 1], [https://x.com/BrianRoemmele/status/1872105833456423216 2], [https://x.com/brain_racked/status/1872340717978390583 3]&lt;br /&gt;
** [https://x.com/minchoi/status/1872486717145706793 Examples approaching cinematic quality]&lt;br /&gt;
** [https://x.com/JaicSam/status/1872903054221033693 Cosmic Spider] (winner at AI film festival)&lt;br /&gt;
** [https://www.youtube.com/watch?v=dbdYPMRi_Nk Trailer for Newton&amp;#039;s Cradle] (full film [https://x.com/JeffSynthesized/status/1872705173451358293 released] 2025-01-01)&lt;br /&gt;
** [https://x.com/Ror_Fly/status/1873036384077828499 Car vs. Jet drag race]&lt;br /&gt;
** [https://x.com/Diesol/status/1873415500149199066 California Monsters]&lt;br /&gt;
** [https://x.com/heyshrutimishra/status/1873631383584924078 Various examples] (Hailuo AI)&lt;br /&gt;
** [https://x.com/kimmonismus/status/1873568693357294014 Will Smith eating spaghetti] (cf. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023])&lt;br /&gt;
** [https://x.com/StevieMac03/status/1873998177193648438 Sorceress and Arachnid Steed] (Kling v1.6)&lt;br /&gt;
** [https://x.com/dustinhollywood/status/1873940924016779425 Music video] (Hailuo AI)&lt;br /&gt;
** [https://www.youtube.com/watch?v=iQg2udCHMdI Akụkọ (Story)] (22 minute short) - A Lagos Boy&amp;#039;s Thrilling Snack Run Nightmare&lt;br /&gt;
** [https://x.com/cinerobot/status/1873766976306455019 Son of the Dragon] (8 minute short)&lt;br /&gt;
** [https://x.com/SynthReveries/status/1873624586857886071 Endless Journey] music video ([https://suno.com/song/fa90fa5e-25c7-48ad-b291-42a8a8c51cf9 music] by Suno)&lt;br /&gt;
** [https://x.com/anukaakash/status/1870504167653228980 Once Again] (retrospective)&lt;br /&gt;
** [https://x.com/jasonzada/status/1873470586053414928 Fade Out] (Veo 2)&lt;br /&gt;
** [https://x.com/talkboysstudio/status/1869085014513865027 Roadkill] (12 minute short)&lt;br /&gt;
&lt;br /&gt;
===2025===&lt;br /&gt;
====January 2025====&lt;br /&gt;
* January 2025: [https://x.com/kimmonismus/status/1877351050748871038 Progress] over the last 1.5 years, shown by comparing Runway Gen 2 and Veo 2.&lt;br /&gt;
* January 2025: Examples:&lt;br /&gt;
** [https://x.com/AllarHaltsonen/status/1874557865576542655 Delivery] (unofficial Nike ad)&lt;br /&gt;
** [https://x.com/Diesol/status/1875237221735002299 Gucci ad] (Sora)&lt;br /&gt;
** [https://x.com/EccentrismArt/status/1874498145910149412 Conquest]&lt;br /&gt;
** [https://www.youtube.com/watch?v=RJZCMfaS-io Newton&amp;#039;s Cradle] (6 minute short)&lt;br /&gt;
** [https://x.com/AngryTomtweets/status/1874627041934602410 Singer]&lt;br /&gt;
** [https://x.com/DumpsterBud/status/1874807352794182019 Brain vomit] (music video)&lt;br /&gt;
** [https://x.com/mxvdxn/status/1874796628210778618 Vibe] (Kling v1.6)&lt;br /&gt;
** [https://x.com/_deepfates/status/1875215969452523785 Will Smith eating spaghetti] (cf. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024])&lt;br /&gt;
** [https://www.youtube.com/watch?v=BL9-jHGnxyc Zorgop Knows All] (2 minute short)&lt;br /&gt;
** [https://x.com/ButchersBrain/status/1875130428518269406 The Breach] (2 minute short; Veo2, Runway ActOne, MMaudio)&lt;br /&gt;
** [https://x.com/Rainmaker1973c/status/1875261591043850477 Aesthetics from an alternate timeline]&lt;br /&gt;
** [https://x.com/StevieMac03/status/1875440611849072841 Immortal Awakens]&lt;br /&gt;
** [https://x.com/isaachorror/status/1875624519588835400 The Faded Line]&lt;br /&gt;
** [https://www.youtube.com/watch?v=4fy8H38rm-4 Dear Dad]&lt;br /&gt;
** [https://x.com/maxescu/status/1877060580680311242 Mad Max chase]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1877408247906447633 Patience is Key]&lt;br /&gt;
** [https://x.com/techhalla/status/1879967230093586555 The Almost Famous Show] (talent show parody)&lt;br /&gt;
** [https://x.com/thefuzzysignal/status/1879295176990154755 Proof-of-concept trailer for a medieval adult animated series]&lt;br /&gt;
** [https://x.com/JeffSynthesized/status/1879555151499034869 Variety] (unofficial Cadbury ad)&lt;br /&gt;
** [https://x.com/henrydaubrez/status/1879883806947115446 Kitsune] (5 minute animated short, Veo 2)&lt;br /&gt;
* January 2025: MiniMax Hailuo [https://www.minimaxi.com/en/news/s2v-01-release Subject Reference] enables consistent characters ([https://x.com/minchoi/status/1881707687362412924 examples])&lt;br /&gt;
* January 2025: AI (de-aging deepfakes, [https://magnific.ai/ Magnific]) was [https://x.com/JeffSynthesized/status/1878630652377178502 used in the film] [https://www.imdb.com/title/tt18272208/ &amp;quot;Here&amp;quot;].&lt;br /&gt;
* January 2025: Luma [https://lumalabs.ai/ray Ray2]&lt;br /&gt;
* January 2025: [https://pikartai.com/pika-2-1/ Pika 2.1] ([https://x.com/OrctonAI/status/1883925754653905049 examples])&lt;br /&gt;
* January 2025: Examples:&lt;br /&gt;
** [https://x.com/wyzborrero/status/1879949477764804873 Light projections onto people] (challenging task, Ray2)&lt;br /&gt;
** [https://x.com/AllarHaltsonen/status/1881261042753589547 BMW ad]&lt;br /&gt;
** [https://x.com/AIWarper/status/1880658326645878821 John Wick in Severance] (Hunyuan vid2vid)&lt;br /&gt;
** [https://x.com/TheReelRobot/status/1881771800595444193 Biopic] (7 minutes)&lt;br /&gt;
** [https://x.com/misslaidlaw/status/1882180619582791784 Give It To Me] (music video)&lt;br /&gt;
** [https://x.com/paultrillo/status/1882091702506459394 Where do we go from here?] (music video, Veo 2)&lt;br /&gt;
** [https://x.com/WorldEverett/status/1882235057076580502 Party like there&amp;#039;s no tomorrow] (music video)&lt;br /&gt;
** [https://x.com/Diesol/status/1884696027942498779 S.T.O.R.I.] (Midjourney and Pika 2.1)&lt;br /&gt;
====February 2025====&lt;br /&gt;
* February 2025: Examples:&lt;br /&gt;
** [https://x.com/OrctonAI/status/1885839287913955597 Long Steampunk scene]&lt;br /&gt;
** [https://x.com/jerrod_lew/status/1885787580685562226 City destruction]&lt;br /&gt;
** [https://x.com/EHuanglu/status/1885736840344551763 Consistent character acting]&lt;br /&gt;
** [https://x.com/MeanOrangeCat/status/1884295241534185890 Kaiju Katastrophe] (by [https://x.com/MeanOrangeCat Mean Orange Cat])&lt;br /&gt;
** [https://x.com/Diesol/status/1886433799690748210 The Greyhound]&lt;br /&gt;
** [https://x.com/CoffeeVectors/status/1886146242029195391 Fluid simulation video2video]&lt;br /&gt;
** [https://x.com/toolstelegraph/status/1886622772828254403 High resolution macro shots]&lt;br /&gt;
** [https://www.youtube.com/watch?v=p0J1LDWERS0 Chrysalids]&lt;br /&gt;
** [https://x.com/multimodalart/status/1887817996220940737 Boring realistic images] (HunyuanVideo w/ LoRA)&lt;br /&gt;
** [https://www.youtube.com/watch?v=PcVRfa1JyyQ Anime intro] ([https://www.reddit.com/r/StableDiffusion/comments/1ijvua0/opensource_almostconsistent_real_anime_made_with/ Hunyuan w/ custom LoRAs])&lt;br /&gt;
** [https://x.com/AllarHaltsonen/status/1888294811750318114 Automotive ad test] (Kling w/ custom model)&lt;br /&gt;
** [https://x.com/AngryTomtweets/status/1888758524303269928 Random cinematic clips] (Midjourney and Kling)&lt;br /&gt;
** [https://x.com/juliewdesign_/status/1888666757302263828 Crossing Paths]&lt;br /&gt;
** [https://x.com/weirdai_art/status/1888794894187041200 Miniature food]&lt;br /&gt;
** [https://x.com/CaptainHaHaa/status/1889573017745035463 Animals]&lt;br /&gt;
** [https://x.com/Kavanthekid/status/1889371011667144724 Star Wars - The Ghost&amp;#039;s Apprentice (Fan Film)]&lt;br /&gt;
** [https://x.com/AngryTomtweets/status/1889768184716423573 Ray2 image-to-video examples]&lt;br /&gt;
** [https://x.com/weirdai_art/status/1889890470987518069 New Horizons] (miniatures going to Mars)&lt;br /&gt;
** [https://x.com/karim_yourself/status/1890100168378536155 Black Sun (trailer)]&lt;br /&gt;
** [https://x.com/BrivaelLp/status/1890122101153231288 AI avatars] ([https://www.argil.ai/ Argil AI])&lt;br /&gt;
** [https://x.com/mrjonfinger/status/1890783411679236473 Adding elements to real video] ([https://x.com/mrjonfinger/status/1891337081923772918 other example])&lt;br /&gt;
** [https://x.com/SynthReveries/status/1892278954137940289 Glitch]&lt;br /&gt;
** Anime: [https://x.com/freeeebird2300/status/1889119007707689146 sci-fi] (Ray2), [https://x.com/Artedeingenio/status/1891173784188756069 sci-fi] (Ray 2), [https://x.com/seiiiiiiiiiiru/status/1890980673743474931 90s sci-fi] (Luma) and [https://x.com/TomLikesRobots/status/1891209369804591447 moody] (Midjourney and Ray2)&lt;br /&gt;
* February 2025: Meta [https://hila-chefer.github.io/videojam-paper.github.io/ VideoJAM]&lt;br /&gt;
* February 2025: ByteDance [https://omnihuman-lab.github.io/ OmniHuman-1]&lt;br /&gt;
* February 2025: ByteDance [https://saiyan-world.github.io/goku/ Goku] ([https://arxiv.org/abs/2502.04896 paper], [https://x.com/ai_for_success/status/1888821141495844991 examples])&lt;br /&gt;
* February 2025: [https://huggingface.co/stepfun-ai/stepvideo-t2v Step-Video-T2V] open-source model ([https://arxiv.org/abs/2502.10248 paper], [https://github.com/stepfun-ai/Step-Video-T2V code], [https://yuewen.cn/videos demo], [https://x.com/ai_for_success/status/1891369136082854129 examples])&lt;br /&gt;
* February 2025: Pika [https://x.com/pika_labs/status/1892620122818294109 Pikaswaps] (examples of [https://x.com/FreddyChavezO/status/1892678426487881805 modifying regions], [https://x.com/CharaspowerAI/status/1893216710141919637 swapping items])&lt;br /&gt;
* February 2025: Alibaba [https://wanai.pro/ Wan 2.1] [https://huggingface.co/blog/LLMhacker/wanai-wan21 open-source] ([https://x.com/fofrAI/status/1894862403260596371 examples])&lt;br /&gt;
* February 2025: [https://thetwinai.com/ Twin AI]: compose videos from a provided character, object, and location ([https://x.com/EHuanglu/status/1901277394729930984 example])&lt;br /&gt;
* February 2025: Examples:&lt;br /&gt;
** [https://x.com/mrjonfinger/status/1893109598627750164 Infected] (Pika swaps and additions)&lt;br /&gt;
** [https://x.com/amli_art/status/1893447314913796253 Hostile Government Takeover] (Veo2)&lt;br /&gt;
** [https://x.com/KarolineGeorges/status/1895226395812561399 Dual Mechanism] (Pikaframes 2.2)&lt;br /&gt;
&lt;br /&gt;
====March 2025====&lt;br /&gt;
* March 2025: Examples:&lt;br /&gt;
** [https://x.com/SynthReveries/status/1895826068617252901 Doors] (music video)&lt;br /&gt;
** [https://x.com/bind_lux/status/1894492032414224792 Drum and Bass] (music video; Kling, audio from [https://www.riffusion.com/?filter=staff-picks Riffusion])&lt;br /&gt;
** [https://x.com/RileyRalmuto/status/1896088776151269523 Woman&amp;#039;s face] (Sora)&lt;br /&gt;
** [https://x.com/ryanwpatterson/status/1896968881731948844 Skating] (Ray2)&lt;br /&gt;
** [https://www.threads.net/@evolving.ai/post/DGlRyRoO7c9?xmt=AQGz6T_8VppPoAqb5aPwAJ2zzRLUP-YXi8SabAT0IIEA9Q Filming commercial on Mars]&lt;br /&gt;
** [https://www.threads.net/@evolving.ai/post/DGycqyhuETS?xmt=AQGz6T_8VppPoAqb5aPwAJ2zzRLUP-YXi8SabAT0IIEA9Q Original Source commercial] (AI and real footage)&lt;br /&gt;
** [https://x.com/maxescu/status/1896926229204496788 Time-lapses] (Pika 2.2)&lt;br /&gt;
** [https://www.youtube.com/watch?v=2RhkcJyhg0E Hallucination]&lt;br /&gt;
** [https://x.com/town_in_new/status/1897354572139782620 Macro video of bubbles]&lt;br /&gt;
* March 2025: [https://github.com/Tencent/HunyuanVideo-I2V HunyuanVideo-I2V] image-to-video&lt;br /&gt;
* March 2025: Google [https://x.com/labsdotgoogle/status/1897376700666626233 Whisk Animate] (based on Veo2, [https://x.com/maxescu/status/1902742535618888025 examples])&lt;br /&gt;
* March 2025: Examples:&lt;br /&gt;
** [https://x.com/jdp2oo/status/1897874927367160114 Recursion (horror)] (Kling)&lt;br /&gt;
** [https://x.com/blizaine/status/1897826177970028614 Will Smith Eating Spaghetti while Sitting Inside a Bag] (cf. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025])&lt;br /&gt;
** [https://x.com/mickmumpitz/status/1897979382687297697 Paper Jam] (Kling with custom workflows to enable precise control)&lt;br /&gt;
** [https://x.com/maxescu/status/1899155936645722216 Cinematic shots] (Google Whisk and Luma)&lt;br /&gt;
** [https://x.com/weirdai_art/status/1899631013002711409 Perfunctory Horizons]&lt;br /&gt;
** [https://x.com/maxescu/status/1900243840499368319 A Hard Winter]&lt;br /&gt;
** [https://x.com/RoyalKongz/status/1900315389139014074 Consistent character example]&lt;br /&gt;
** [https://x.com/maxescu/status/1900652266362650853 Anthropomorphic Animals]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1900457543299727718 Realistic (influencer-style)]&lt;br /&gt;
** [https://x.com/SunoMusic/status/1900942410584043579 I Feel Cultured] (music video with surrealist vibes)&lt;br /&gt;
** [https://rodeo.club/post/0x30b45c56d62751D763D3B8bFe4D18c4BB65EDF2c/209 journey of utmost importance]&lt;br /&gt;
** [https://x.com/aiordieshow/status/1901930851127984291 Karen: Unleashed]&lt;br /&gt;
** [https://x.com/minchoi/status/1901783767364092232 Yarn Cat]&lt;br /&gt;
** [https://x.com/andyorsow/status/1901619535180091509 Ned&amp;#039;s Wet Deli] (Runway)&lt;br /&gt;
** [https://www.youtube.com/watch?v=KVoiooE8C0c BOOTS], a.k.a. [https://x.com/RuairiRobinson/status/1902027217137484117 &amp;quot;Our enemies are cartoon monsters&amp;quot;] (music video based on poem by Rudyard Kipling; Veo2)&lt;br /&gt;
** Flying in a dream: [https://x.com/minchoi/status/1902197944826183864 1], [https://x.com/venturetwins/status/1901796679063626060 2]&lt;br /&gt;
** [https://x.com/jasonzada/status/1902129567659389443 Commercial for Mercedes-Benz and FYI Radio]&lt;br /&gt;
** [https://x.com/maxescu/status/1903108496666542562 Selfie video] (Luma)&lt;br /&gt;
** Podcasts: [https://www.reddit.com/r/singularity/comments/1jintit/rottenly_roasted_now_full_script_is_also_not/ Rottenly Roasted] and [https://www.reddit.com/r/aivideo/comments/1jerh56/worst_date_ever/ Worst Date Ever] [https://x.com/OriZilbershtein/status/1903503438744318002 (Imagen 3, Hedra, Elevenlabs, Topaz)]&lt;br /&gt;
** [https://x.com/DexploreArts/status/1903822122150986000 Ambience] (Midjourney, Luma)&lt;br /&gt;
** [https://x.com/TheoMediaAI/status/1904207679511572845 The Bridge] (2 minute short; Veo2)&lt;br /&gt;
** [https://x.com/peteromallet/status/1904268944992829462 Pulp Fiction] (Wan video editing)&lt;br /&gt;
** [https://x.com/madpencil_/status/1906765750624493650 Camera Controls] (Luma Ray2)&lt;br /&gt;
* March 2025: [https://www.hedra.com/ Hedra] [https://x.com/hedra_labs/status/1897699010632466469 Character 3]&lt;br /&gt;
* March 2025: [https://huggingface.co/hpcai-tech/Open-Sora-v2 Open Sora v2] ([https://github.com/hpcaitech/Open-Sora code])&lt;br /&gt;
* March 2025: Amazon Prime debuts [https://en.wikipedia.org/wiki/House_of_David_(TV_series) House of David], with special effects created by [https://www.thewonderproject.com/ Wonder Project] using a [https://x.com/PJaccetturo/status/1903126616831676792 combination of traditional and AI methods] (reportedly including Midjourney and Runway)&lt;br /&gt;
* March 2025: Examples:&lt;br /&gt;
** [https://x.com/PJaccetturo/status/1905151190872309907 What if Studio Ghibli directed Lord of the Rings?] (OpenAI GPT-4o in-context image generation, Kling)&lt;br /&gt;
** [https://x.com/ROHKI/status/1906039022662963269 RŌHKI]&lt;br /&gt;
** [https://x.com/iaveras/status/1906362437487534296 Why]&lt;br /&gt;
** [https://x.com/BrianRoemmele/status/1906476721236570508 Commercial for Puma] (research/test)&lt;br /&gt;
** [https://x.com/Salmaaboukarr/status/1906776503343325469 Commercial for KFC] (concept ad)&lt;br /&gt;
* March 2025: Runway ML [https://runwayml.com/research/introducing-runway-gen-4 Gen-4]&lt;br /&gt;
** [https://www.youtube.com/watch?v=c8IBmK7GZP8 The Lonely Little Flame]&lt;br /&gt;
** [https://www.youtube.com/watch?v=Z0P6qjMUl34&amp;amp;t=1s The Herd]&lt;br /&gt;
** [https://www.youtube.com/watch?v=9HzdNhOe09I The Retrieval]&lt;br /&gt;
** [https://www.youtube.com/watch?v=xEhgxhrAjE4 NYC is a Zoo]&lt;br /&gt;
** [https://www.youtube.com/watch?v=ENGKp5wn344 Scimmia Vede] (music video)&lt;br /&gt;
** More examples: [https://x.com/techhalla/status/1906807994009993473 various], [https://x.com/c_valenzuelab/status/1907958530369372541 art direction], [https://x.com/c_valenzuelab/status/1908146364741029998 mannequins], [https://x.com/c_valenzuelab/status/1907921566643732612 taxi], [https://x.com/c_valenzuelab/status/1907432109695717798 small things], [https://x.com/c_valenzuelab/status/1907563448902496362 long shot (1m)]&lt;br /&gt;
&lt;br /&gt;
====April 2025====&lt;br /&gt;
* April 2025: Examples:&lt;br /&gt;
** [https://x.com/AzeAlter/status/1906974768705990794 Age of Beyond]&lt;br /&gt;
** [https://x.com/techhalla/status/1907790675057242319 Commercial for Coca-Cola] (Higgsfield)&lt;br /&gt;
** [https://www.reddit.com/r/StableDiffusion/comments/1jr6j11/comment/mle9bq5/?context=3 Anime scene (3m)] (Wan 2.1 with LoRa)&lt;br /&gt;
** [https://x.com/pika_labs/status/1908263310912610401 Taxes then Death] (Pika multikeyframe)&lt;br /&gt;
* April 2025: [https://www.krea.ai/ Krea] [https://x.com/krea_ai/status/1907829389452021853 Video Re-Style]&lt;br /&gt;
* April 2025: ByteDance [https://grisoon.github.io/DreamActor-M1/ DreamActor-M1] performance transfer&lt;br /&gt;
* April 2025: Examples:&lt;br /&gt;
** [https://x.com/Diesol/status/1908535493673050403 Mercs] (Midjourney v7, Ray2)&lt;br /&gt;
** [https://x.com/minchoi/status/1909078846126649440 Cat at theme park]&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1909630883218207036 Timelapse history] (Runway Gen4)&lt;br /&gt;
** [https://x.com/EHuanglu/status/1909660808973533225 Examples for use in advertising]&lt;br /&gt;
** [https://x.com/arohaAIX/status/1910688361221599361 Sci-fi scapes]&lt;br /&gt;
** [https://x.com/PJaccetturo/status/1910750148055146708 Avα]&lt;br /&gt;
** [https://x.com/imagineFERA/status/1910601934207152576 The Bureau]&lt;br /&gt;
** [https://x.com/jasonzada/status/1911812014059733041 Beaver and Sock (3m)]&lt;br /&gt;
** [https://x.com/Delachica_/status/1911842237622735052 Organic Waste (5m)] (Runway)&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1912260798270882104 Fly] (Runway Gen4)&lt;br /&gt;
* April 2025: Alibaba [https://arxiv.org/abs/2504.04842 FantasyTalking] lipsync ([https://x.com/EHuanglu/status/1910341110322577442 examples])&lt;br /&gt;
* April 2025: Tencent Hunyuan [https://arxiv.org/abs/2411.16331 Sonic] image animation/lipsync to audio ([https://x.com/ai_for_success/status/1911719866958286864 examples])&lt;br /&gt;
* April 2025: ByteDance [https://huggingface.co/papers/2504.08685 Seaweed-7B] ([https://arxiv.org/abs/2504.08685 preprint], [https://www.youtube.com/watch?v=OaPI6K2y3rI examples])&lt;br /&gt;
* April 2025: [https://app.klingai.com/global/release-notes Kling 2.0] ([https://www.youtube.com/watch?v=Yqvh3M12T_M video])&lt;br /&gt;
* April 2025: [https://www.skyreels.ai/home Skywork] [https://github.com/SkyworkAI/SkyReels-V2 SkyReels V2] (open-source, unlimited extension; [https://x.com/AngryTomtweets/status/1914270477482443142 examples])&lt;br /&gt;
* April 2025: [https://sand.ai/ Sand AI] [https://huggingface.co/sand-ai/MAGI-1 Magi-1] (open source, unlimited extension; [https://x.com/AngryTomtweets/status/1914318743578296506 examples], [https://x.com/dreamingtulpa/status/1916035289300275372 more examples])&lt;br /&gt;
* April 2025: Examples:&lt;br /&gt;
** [https://x.com/maxescu/status/1912100029549994016 Mars 2035 (3m)] (Kling 2.0)&lt;br /&gt;
** [https://x.com/ai_for_success/status/1912466999147450600 Kingdom (dragon battle, 3m)]&lt;br /&gt;
** [https://x.com/imagineFERA/status/1913156296657756278 Reflection (3m)] (Gen4)&lt;br /&gt;
** [https://x.com/Wytsekoetse/status/1913547157493162035 Pizza Galaxy (1m)] (MJ and Gen4)&lt;br /&gt;
** [https://www.youtube.com/watch?v=rseqmSGH7xk Snoop Dogg music video: Last Dance with Mary Jane] (blend of traditional and AI effects)&lt;br /&gt;
** [https://x.com/dreamingtulpa/status/1915104310448501129 Realistic human motion]&lt;br /&gt;
** [https://x.com/KarolineGeorges/status/1915113151546396893 Inception loop] (Gen4)&lt;br /&gt;
** [https://x.com/rayisdoingfilm/status/1916468807435952330 Tuesday (1m)] (Gen4)&lt;br /&gt;
** [https://www.youtube.com/watch?v=XWdwF1q3kDw Deus in Machina Automata (4m)] (Gen4) &lt;br /&gt;
** [https://x.com/machina9000/status/1915090908850049223 Outsiders (3m music video)]&lt;br /&gt;
&lt;br /&gt;
====May 2025====&lt;br /&gt;
* May 2025: [https://huggingface.co/Lightricks/LTX-Video LTX-Video 13B] ([https://github.com/Lightricks/LTX-Video code], [https://x.com/maxescu/status/1919801813987164527 examples], [https://x.com/cubiq/status/1919748210567815551 more examples])&lt;br /&gt;
* May 2025: HeyGen Avatar IV (examples: [https://x.com/StevieMac03/status/1919910677860216869 sci-fi], [https://x.com/KarolineGeorges/status/1919801983143211222 Come Closer], [https://x.com/maxescu/status/1920410329454100973 singing], [https://x.com/minchoi/status/1920853859171234165 various])&lt;br /&gt;
* May 2025: Tencent [https://hunyuancustom.github.io/ HunyuanCustom]&lt;br /&gt;
* May 2025: Examples:&lt;br /&gt;
** [https://x.com/lifeofc/status/1920331476157280413 Iris (1.5m)] (Midjourney, Luma, Runway)&lt;br /&gt;
** [https://runwayml.com/customers/the-making-of-mars-and-siv Mars and Siv: &amp;quot;No Vacancy&amp;quot; (episode 1, 6m)] (Runway)&lt;br /&gt;
** [https://x.com/cfryant/status/1921317318744760817 Go to the East Wing] (dreamlike, Luma)&lt;br /&gt;
** [https://x.com/DeryaTR_/status/1921015340827304389 Yu Lanter showreel] (Higgsfield)&lt;br /&gt;
** [https://x.com/freeeebird2300/status/1921789387614134652 Cyberpunk anime] (Luma)&lt;br /&gt;
** [https://x.com/LittleTinRobot/status/1921692735930589246 Alien animals] (Runway)&lt;br /&gt;
** [https://x.com/minchoi/status/1922500563792486878 America&amp;#039;s Funniest AI Home Videos (3m)]&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1924204409833103365 Editing POV shots from AR glasses] (Runway)&lt;br /&gt;
* May 2025: [https://runwayml.com/gen48 Gen:48] Fourth Edition winners:&lt;br /&gt;
** [https://www.youtube.com/watch?v=NphCYRXjqTI&amp;amp;t=174s Home] (3m)&lt;br /&gt;
** [https://www.youtube.com/watch?v=L2DQwCp_DCw The King&amp;#039;s Secret] (2m)&lt;br /&gt;
* May 2025: [https://viggle.ai/home Viggle] Live [https://x.com/ViggleAI/status/1926324953038627214 enables] real-time avatar control&lt;br /&gt;
* May 2025: Google [https://blog.google/technology/ai/generative-media-models-io-2025/ Veo 3] (examples: [https://x.com/babaeizadeh/status/1924942128851124284 conversation], [https://x.com/mattshumer_/status/1925039973310308424 cooking], [https://x.com/jerrod_lew/status/1924934440486371589 singing], [https://x.com/MartinNebelong/status/1924926779677905014 simple story], [https://x.com/Diesol/status/1925114473544913004 cinematic action sequence], [https://x.com/laszlogaal_/status/1925094336200573225 car show interviews], [https://x.com/arikuschnir/status/1924953349943697763 We Can Talk], [https://x.com/venturetwins/status/1925021235530105298 podcat], [https://x.com/maxescu/status/1925079990061957423 various], [https://x.com/jerrod_lew/status/1927092379892265139 camera moves])&lt;br /&gt;
* May 2025: Examples:&lt;br /&gt;
** [https://x.com/javilopen/status/1925495026903380358 Will Smith Eating Spaghetti] (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025])&lt;br /&gt;
** [https://x.com/MetaPuppet/status/1926659557914268155 Bob from Marketing] (Veo 3)&lt;br /&gt;
** [https://x.com/dustinhollywood/status/1926733069475565622 He is King (16m)]&lt;br /&gt;
** [https://x.com/HashemGhaili/status/1925616536791760987 Prompt Theory], [https://x.com/HashemGhaili/status/1925332319604257203 part 2], [https://x.com/HashemGhaili/status/1927467022213869975 Afterlife (3m)] (Veo3)&lt;br /&gt;
** [https://x.com/JoannaStern/status/1927856754873835747 My Robot and Me (3m)] (Veo, Runway)&lt;br /&gt;
** [https://x.com/rohanpaul_ai/status/1928152398930817238 The Internet&amp;#039;s Over] (Veo3)&lt;br /&gt;
** [https://www.reddit.com/r/aivideo/comments/1l0rl7d/before_colours_fade/ Before Colours Fade (2m)] (Midjourney, Kling)&lt;br /&gt;
&lt;br /&gt;
====June 2025====&lt;br /&gt;
* June 2025: Examples: &lt;br /&gt;
** [https://x.com/amasad/status/1930505292904837132 Bigfoot ASMR]&lt;br /&gt;
** [https://x.com/minchoi/status/1930670583605514333 Talking] (HeyGen Avatar IV upgrade)&lt;br /&gt;
** [https://x.com/ROHKI/status/1931081752992477285 Where are all the aliens? (2m)]&lt;br /&gt;
** [https://x.com/fofrAI/status/1930999540770893874 Natural talking]&lt;br /&gt;
** [https://x.com/ammaar/status/1931672722418851904 Elemental Showdown - Mortal Kombat (3m)]&lt;br /&gt;
** [https://x.com/maxjoseph/status/1932104616021565476 It Starts at the End (music video, 4m)]&lt;br /&gt;
** [https://x.com/deedydas/status/1932105266654581116 Sci-fi trailer (2m)]&lt;br /&gt;
** [https://x.com/DrMachakil/status/1931816470901575924 The Prompt Floor (2m)]&lt;br /&gt;
** [https://x.com/DrMachakil/status/1853960062546366856 NALVORA (2.7m)] - [https://x.com/DrMachakil/status/1932904599004066200 Best Trailer, Metamorph AI Film Awards]&lt;br /&gt;
** [https://x.com/Kalshi/status/1932891608388681791 Commercial for Kalshi (30s)] - [https://x.com/PJaccetturo/status/1932893260399456513 to air during NBA finals] (Veo)&lt;br /&gt;
** [https://x.com/ROHKI/status/1933594430113788227 Your Brain is Broken on Purpose (2m)]&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1934312626021949687 Runway Gen-4 Reference examples]&lt;br /&gt;
** [https://x.com/JesusPlazaX/status/1934253813696786661 Paper airplane]&lt;br /&gt;
** [https://x.com/minchoi/status/1934032730947526872 Veo3 examples]&lt;br /&gt;
** [https://x.com/NomadsVagabonds/status/1935329331410075734 Reset 3 (1m, surreal)]&lt;br /&gt;
** [https://x.com/HashemGhaili/status/1935722105322323968 It Has No Soul (1m, Veo3)]&lt;br /&gt;
* June 2025: [https://seedance.net/seedance Seedance 1.0] ([https://arxiv.org/abs/2506.09113 preprint])&lt;br /&gt;
* June 2025: [https://hailuoai.video/ Hailuo AI] (MiniMax) Hailuo 02 ([https://x.com/venturetwins/status/1934236631336403344 &amp;quot;Kangaroo&amp;quot; during testing]; examples: [https://x.com/lepadphone/status/1935078910934626429 various], [https://x.com/alexgnewmedia/status/1935018186954719365 various], [https://x.com/FussyPastor/status/1935065068456263883 tsunami], [https://x.com/thedorbrothers/status/1935098802744213935 fight scene], [https://x.com/umesh_ai/status/1935028257708966231 fox running], [https://x.com/BrentLynch/status/1934979825636446268 blogger], [https://x.com/HalimAlrasihi/status/1935297126759538735 transitions], [https://x.com/MKMXLA/status/1938318951664280045 skateboarding])&lt;br /&gt;
* June 2025: Midjourney video ([https://x.com/minchoi/status/1934373051464057062 early examples], [https://x.com/ciguleva/status/1935386452197785892 various], [https://x.com/juliewdesign_/status/1935395999175876696 various], [https://x.com/emollick/status/1935504703023899096 Ethan Mollick], [https://x.com/PJaccetturo/status/1935383312392151528 highly rated], [https://x.com/maxescu/status/1935674561821126847 complex environments], [https://x.com/CoffeeVectors/status/1935863623076675875 manga])&lt;br /&gt;
* June 2025: Examples:&lt;br /&gt;
** [https://x.com/StevieMac03/status/1935768436556378170 The Battle of Glenvael - Orcs vs Humans] (Hailuo)&lt;br /&gt;
** [https://x.com/HashemGhaili/status/1935036744568824208 The Sentence (9m, Veo3)]&lt;br /&gt;
** [https://x.com/elder_plinius/status/1936145834585862225 Will Smith Eating Spaghetti] (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025], [https://x.com/javilopen/status/1925495026903380358 May 2025])&lt;br /&gt;
** [https://x.com/venturetwins/status/1937232461576175809 Gymnastics] (Hailuo 02)&lt;br /&gt;
** [https://x.com/KeiranJHarris/status/1935429439476887594 Fable about Transformative AI] (Veo3)&lt;br /&gt;
** [https://x.com/thedorbrothers/status/1937926400507580726 Vorex (2m trailer)]&lt;br /&gt;
** [https://x.com/OnerBiberkoku/status/1938972810321281394 Doğrucu (3m music video, Veo3)]&lt;br /&gt;
* June 2025: [https://higgsfield.ai/soul Higgsfield Soul] Video Effects ([https://x.com/higgsfield_ai/status/1937931727084917097 examples], [https://x.com/HashemGhaili/status/1938278903765995611 realism])&lt;br /&gt;
* June 2025: Alibaba [https://omni-avatar.github.io/ OmniAvatar] ([https://arxiv.org/abs/2506.18866 paper], [https://github.com/Omni-Avatar/OmniAvatar code], [https://huggingface.co/OmniAvatar/OmniAvatar-14B model], [https://x.com/AngryTomtweets/status/1939850674776547359 examples])&lt;br /&gt;
&lt;br /&gt;
====July 2025====&lt;br /&gt;
* July 2025: Examples:&lt;br /&gt;
** [https://x.com/Kavanthekid/status/1940452444850589999 Untold - The Immortal Blades Saga] (2m trailer)&lt;br /&gt;
** [https://x.com/minchoi/status/1941234456461029584 Unofficial commercial for Liquid Death (1m)]&lt;br /&gt;
** [https://x.com/brain_racked/status/1942594951310893425 A parade of the chosen theocracy on Callisto]&lt;br /&gt;
** [https://x.com/Popeyes/status/1943316484404433182 Popeyes commercial - diss track (1m)]&lt;br /&gt;
*** [https://x.com/gabemichael_ai/status/1944070622155616668 (Unofficial) Wendy&amp;#039;s response - diss track (2m)]&lt;br /&gt;
*** [https://x.com/ai_massive/status/1947689537641357618 (Unofficial) In-N-Out rap battle (3m)]&lt;br /&gt;
** [https://x.com/Kalshi/status/1943339616716599548 Kalshi commercial]&lt;br /&gt;
** Jonah (25m TV show, [https://x.com/PJaccetturo/status/1946101701548880029 making of], [https://kingstonestudios.uscreen.io/programs/jonah purchase here])&lt;br /&gt;
** [https://x.com/Totemko/status/1946243585021452335 Unofficial commercial for Mercedes (17s)]&lt;br /&gt;
** [https://x.com/CoffeeVectors/status/1946016960916889632 Skateboarding music video (1m)]&lt;br /&gt;
* July 2025: Runway ML [https://help.runwayml.com/hc/en-us/articles/42311337895827-Creating-with-Act-Two Act-Two] (video-to-video performance transfer)&lt;br /&gt;
* July 2025: Examples:&lt;br /&gt;
** Neural Viz [https://www.youtube.com/watch?v=juDDHvHroQ8 The Cop Files: Part VI (8m)]&lt;br /&gt;
** [https://x.com/Kavanthekid/status/1947696716981145971 Perfect Dark - Concept Trailer (1.5m)]&lt;br /&gt;
** [https://x.com/StelfieTT/status/1948753090858885131 Exodus (2m trailer)]&lt;br /&gt;
** [https://x.com/Jett_Collective/status/1949140450553540841 A Walk Together - Life and love in motion (1m, Midjourney Video)]&lt;br /&gt;
* July 2025: Netflix sci-fi show [https://en.wikipedia.org/wiki/The_Eternaut_(TV_series) The Eternaut] [https://x.com/omooretweets/status/1946290797399400662 used genAI] for a particular scene (building collapse)&lt;br /&gt;
* July 2025: Google Veo [https://x.com/GoogleLabs/status/1948477692715700718 emergent annotation direction] ([https://x.com/venturetwins/status/1948771505783144641 example], [https://x.com/bilawalsidhu/status/1948844167603310660 example], [https://x.com/jboogx_creative/status/1949230927504371765 example], [https://x.com/Ror_Fly/status/1949606017739747625 example])&lt;br /&gt;
* July 2025: Runway [https://runwayml.com/research/introducing-runway-aleph Aleph] contextual editing&lt;br /&gt;
* July 2025: Wan 2.2 (open source, [https://x.com/Alibaba_Wan/status/1949804551655276989 examples])&lt;br /&gt;
&lt;br /&gt;
====August 2025====&lt;br /&gt;
* August 2025: Pika [https://x.com/pika_labs/status/1954935844936024476 audio-driven performance] ([https://x.com/minchoi/status/1954989794129514937 examples], [https://x.com/pika_labs/status/1955007656302924192 examples])&lt;br /&gt;
* August 2025: Examples:&lt;br /&gt;
** [https://www.youtube.com/watch?v=gePD1Hf1qPc Eve and Adam] (8m, [https://x.com/MetaPuppet/status/1954254544935719259 multiple tools])&lt;br /&gt;
** [https://x.com/runwayml/status/1955615613583519917 Redesign a space] (Runway Aleph)&lt;br /&gt;
** [https://x.com/theGioM/status/1955656398248763428 Detroit Pretend Work Park (1m)]&lt;br /&gt;
** [https://x.com/pzf_ai/status/1940816374211006600 The Weight of Light] (3m music video, Midjourney &amp;amp; Suno)&lt;br /&gt;
** [https://x.com/EHuanglu/status/1956788759778967710 Commercial for Pepsi]&lt;br /&gt;
** [https://x.com/StelfieTT/status/1956633450326200426 Emotion]&lt;br /&gt;
** [https://x.com/dustinhollywood/status/1957940749862875383 TZIGANE]&lt;br /&gt;
** [https://x.com/0xFramer/status/1960720090921623636 Anime chase sequence] (Nano Banana and Seedance 1.0)&lt;br /&gt;
* August 2025: ByteDance [http://www.waver.video/ Waver 1.0]&lt;br /&gt;
* August 2025: [https://huggingface.co/Wan-AI/Wan2.2-S2V-14B Wan2.2-S2V 14B]&lt;br /&gt;
&lt;br /&gt;
====September 2025====&lt;br /&gt;
* September 2025: [https://www.wsj.com/tech/ai/openai-backs-ai-made-animated-feature-film-389f70b0 OpenAI Backs AI-Made Animated Feature Film: Film, called ‘Critterz,’ aims to debut at Cannes Film Festival and will leverage startup’s AI tools and resources.]&lt;br /&gt;
* September 2025: Examples:&lt;br /&gt;
** [https://x.com/kentskooking/status/1964606423037542459 A loop to wake up to (30s)]&lt;br /&gt;
** [https://x.com/venturetwins/status/1966570512991350907 time lapse]&lt;br /&gt;
** [https://x.com/NeuralViz/status/1967391198487994652 The Adventures of Reemo Green] (11m, Neural Viz)&lt;br /&gt;
** [https://x.com/kellyeld/status/1967620786166079545 Surreal DJs music video (2m)]&lt;br /&gt;
** [https://x.com/dustinhollywood/status/1968724784440558044 Glass City] (Hailuo)&lt;br /&gt;
** [https://x.com/TheoMediaAI/status/1968646951227777529 Alarm] (1m, multiple tools including world synthesis for consistent environments)&lt;br /&gt;
* September 2025: [https://lumalabs.ai/ray Luma] [https://x.com/LumaLabsAI/status/1968684330034606372 Ray3] ([https://x.com/cfryant/status/1968692370725077251 example])&lt;br /&gt;
* September 2025: Examples:&lt;br /&gt;
** [https://x.com/mrjonfinger/status/1968687352382910469 Stop motion interpolation] (Luma Ray3)&lt;br /&gt;
** [https://x.com/heydin_ai/status/1969514789169959128 Skyland] (1.5m, various tools)&lt;br /&gt;
** [https://x.com/iamluokai/status/1970185972076925427 Dancing] (Wan 2.2)&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1970497214108815584 Under Armour commercial] (Runway Aleph)&lt;br /&gt;
** [https://x.com/FilmsBySav/status/1971247214795358706 OG PRIME] (10m, Kling)&lt;br /&gt;
** [https://www.youtube.com/watch?v=JGLoTjxd-Ss PLANET] (37m)&lt;br /&gt;
* September 2025: [https://x.com/Kling_ai/status/1970439808901362155 Kling AI 2.5 Turbo] (examples: [https://x.com/OrctonAI/status/1970472214794220008 cyberpunk], [https://x.com/ImagineArt_X/status/1970586138655236565 human motion], [https://x.com/fAIkout/status/1970505756853334324 motion and emotion], [https://x.com/fAIkout/status/1970495039248965636 painting], [https://x.com/venturetwins/status/1970563820478439546 gymnastics], [https://x.com/Art_For_Joy/status/1970249516033970434 breakdancing], [https://x.com/HaydenLeeWrites/status/1970523610734567819 combat], [https://x.com/umesh_ai/status/1970497680536150454 cinematic], [https://x.com/LillyLiCT/status/1970580585073819752 horror camerawork], [https://x.com/StevieMac03/status/1970559778804908331 extended sequence])&lt;br /&gt;
* September 2025: OpenAI [https://openai.com/index/sora-2/ Sora 2] ([https://x.com/minchoi/status/1973949620318580970 examples])&lt;br /&gt;
&lt;br /&gt;
====October 2025====&lt;br /&gt;
* October 2025: Examples:&lt;br /&gt;
** [https://x.com/minchoi/status/1976042197154963702 Will Smith Eating Spaghetti] (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025], [https://x.com/javilopen/status/1925495026903380358 May 2025], [https://x.com/elder_plinius/status/1936145834585862225 June 2025])&lt;br /&gt;
** [https://www.youtube.com/watch?v=JhH3uxcdM1M Frostbite] (3m, Sora 2)&lt;br /&gt;
** [https://x.com/Jukanlosreve/status/1977764418709758106 (Fake) &amp;quot;Behind the scenes&amp;quot; for a Chainsaw Man live action] ([https://x.com/PJaccetturo/status/1972705821072261402 others])&lt;br /&gt;
* October 2025: Google [https://blog.google/technology/ai/veo-updates-flow/ Veo 3.1]&lt;br /&gt;
* October 2025: Examples:&lt;br /&gt;
** [https://x.com/aisearchio/status/1978465562821898461 Will Smith Eating Spaghetti], Veo 3.1 (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025], [https://x.com/javilopen/status/1925495026903380358 May 2025], [https://x.com/elder_plinius/status/1936145834585862225 June 2025], [https://x.com/minchoi/status/1976042197154963702 October 2025])&lt;br /&gt;
** [https://x.com/Diesol/status/1978755688261128227 War footage] (Veo 3.1)&lt;br /&gt;
** [https://www.meta.ai/@dustin_hollywood/post/bG3BHB21W0l/yukon/ Yukon] (music video, [https://x.com/dustinhollywood/status/1982260655957700746 Dustin Hollywood])&lt;br /&gt;
** [https://x.com/Diesol/status/1980922041131028515 Bloom] (2m, Veo 3.1)&lt;br /&gt;
** [https://x.com/xmuse_/status/1982026008803905639 Auction] (1m)&lt;br /&gt;
** [https://x.com/kellyeld/status/1982425147496882287 Dancing] (music video; Midjourney, Suno, Veo3)&lt;br /&gt;
** [https://x.com/JesusPlazaX/status/1982393609069412433 Anime example] (Midjourney, Grok Imagine)&lt;br /&gt;
** [https://x.com/EccentrismArt/status/1982830100266783039 King Arthur] (1m)&lt;br /&gt;
** [https://x.com/venturetwins/status/1983024227352789162 Transitions] (1m music video)&lt;br /&gt;
** [https://x.com/eastflatsfilm/status/1984116704704971076 Unofficial commercial for Nike] (2m, Midjourney, Hailuo)&lt;br /&gt;
** [https://x.com/PJaccetturo/status/1984639281848336592 Loneliness/Halloween] ([https://www.linkedin.com/posts/simon-meyer-976339160_this-could-be-the-scariest-halloween-film-activity-7389892778144735232-6CYY?utm_source=share&amp;amp;utm_medium=member_desktop&amp;amp;rcm=ACoAAADeoqYBzX8N9-j_hRQvl1e7OUlOgFptNF0 1.5m])&lt;br /&gt;
** [https://www.youtube.com/watch?v=43h61QAXjpY Wave] (2m music video, [https://x.com/MIZNOM Masaki Mizuno])&lt;br /&gt;
* October 2025: [https://x.com/Hailuo_AI/status/1983016390878708131 Hailuo 2.3]&lt;br /&gt;
&lt;br /&gt;
====November 2025====&lt;br /&gt;
* November 2025: Examples:&lt;br /&gt;
** [https://x.com/subverum/status/1985069550250107033 Valley of Shadow] (6m)&lt;br /&gt;
** [https://x.com/DiscussingFilm/status/1985470088074375344 Coca-Cola ad] (c.f. [https://x.com/techhalla/status/1857462526859935813 2024 ad])&lt;br /&gt;
** [https://x.com/venturetwins/status/1985755546222542903 France 2026 Olympics ad] (blend of genAI and traditional methods, [https://x.com/venturetwins/status/1985753512362590439 behind the scenes])&lt;br /&gt;
** [https://x.com/NeuralViz/status/1986611025366687754 Minnesota Nice] (3m, [https://x.com/NeuralViz Neural Viz])&lt;br /&gt;
** [https://x.com/machina9000/status/1986563727873740934 Brutalis] (7m)&lt;br /&gt;
** [https://x.com/tastypxls/status/1987312755485876502?s=20 Living The Dream - Rynn] (music video, 1m)&lt;br /&gt;
** [https://x.com/MrDavids1/status/1988366387111170339?s=20 Environment as Character]&lt;br /&gt;
** [https://x.com/justalexoki/status/1988915573707661637?s=20 Cat playing instruments at midnight]&lt;br /&gt;
** [https://x.com/LumaLabsAI/status/1989013731267998172?s=20 Overclock] (30s, Luma)&lt;br /&gt;
** [https://x.com/venturetwins/status/1980685301577326994?s=20 Music video] (30s, Wan Animate)&lt;br /&gt;
** [https://x.com/venturetwins/status/1990227418553209259?s=20 Promotional material for Pudong Art Museum - Louvre exhibition in Shanghai] (1m)&lt;br /&gt;
** [https://x.com/Kyrannio/status/1990324648488186358?s=20 Loop 87 A Temporal Heist] (12m, claim that video was generated fully autonomously using AI agent NoSpoon)&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1991245088446386495?s=20 Ausencia] (5m)&lt;br /&gt;
** [https://x.com/AngryTomtweets/status/1993047608617517246?s=20 live paintings] ([https://www.youtube.com/channel/UCw8kc0wDm5Bh6g9iZzEWfOg bandyquantguy] on YouTube)&lt;br /&gt;
** [https://x.com/BrianRoemmele/status/1994625579073900804?s=20 Michelle, on a server in Iowa] (1m)&lt;br /&gt;
* November 2025: [https://odyssey.ml/ Odyssey] - [https://x.com/odysseyml/status/1994873514579697830?s=20 Odyssey-2]&lt;br /&gt;
&lt;br /&gt;
====December 2025====&lt;br /&gt;
* December 2025: Runway [https://runwayml.com/research/introducing-runway-gen-4.5 Gen 4.5]&lt;br /&gt;
* December 2025: [https://app.klingai.com/global/all-tools Kling] [https://app.klingai.com/global/omni/new O1] ([https://x.com/minchoi/status/1995523379957559609?s=20 examples], [https://x.com/TheoMediaAI/status/1995517613414518987?s=20 other examples]) and Kling 2.6&lt;br /&gt;
* December 2025: [https://app.pixverse.ai/onboard PixVerse v5.5]&lt;br /&gt;
* December 2025: Examples:&lt;br /&gt;
** [https://x.com/EHuanglu/status/1996649596119068687?s=20 Will Smith Eating Spaghetti], Kling 2.6 (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025], [https://x.com/javilopen/status/1925495026903380358 May 2025], [https://x.com/elder_plinius/status/1936145834585862225 June 2025], [https://x.com/minchoi/status/1976042197154963702 October 2025], [https://x.com/aisearchio/status/1978465562821898461 October 2025])&lt;br /&gt;
** [https://x.com/venturetwins/status/1997898095670296615?s=20 Dreamlike POV]&lt;br /&gt;
** [https://x.com/chatgpt21/status/1998253809307455555?s=20 McDonald&amp;#039;s commercial]&lt;br /&gt;
** [https://x.com/EHuanglu/status/1998039554402750545?s=20 Skittles commercial] (Higgsfield)&lt;br /&gt;
** [https://x.com/Diesol/status/1997147919603077335?s=20 The Tenant] (2m, Kling 2.6)&lt;br /&gt;
** [https://x.com/PsyopAnime/status/1999242965659906526?s=20 Maximum Carnage] (3m)&lt;br /&gt;
** [https://x.com/JeffSynthesized/status/1998786836924395875?s=20 Blurred Horizon: Episode 1] (24m)&lt;br /&gt;
** [https://x.com/Artedeingenio/status/2001667487784460301?s=20 Anime Action] (2m)&lt;br /&gt;
** [https://x.com/bearlyai/status/2005055231617605748?s=20 Dollar Shave Club commercial] (1m)&lt;br /&gt;
** [https://x.com/EHuanglu/status/2004020543084024295?s=20 Xmas Cameos] (1.5m)&lt;br /&gt;
** [https://x.com/DiDi_OKK/status/1955653520407019976?s=20 Green Screen] (2m, [https://x.com/DiDi_OKK/status/1955653520407019976 DiDi_OK])&lt;br /&gt;
** [https://x.com/DiDi_OKK/status/1998227601341702639?s=20 Arrow] (7m, [https://x.com/DiDi_OKK/status/1955653520407019976 DiDi_OK])&lt;br /&gt;
** [https://x.com/bluehorizon_ai/status/2004045348579561503?s=20 Live Action One Punch Man | Saitama vs Genos] (2m, [https://x.com/bluehorizon_ai Blue Horizon])&lt;br /&gt;
** [https://x.com/keshiAIart/status/2005254907780358201?s=20 Anime Train] (6s)&lt;br /&gt;
** [https://x.com/venturetwins/status/2006051632837189683?s=20 Michael Catson] (13s)&lt;br /&gt;
* December 2025: [https://arxiv.org/abs/2512.13507 Seedance 1.5]&lt;br /&gt;
&lt;br /&gt;
===2026===&lt;br /&gt;
====January 2026====&lt;br /&gt;
* January 2026: Examples:&lt;br /&gt;
** [https://x.com/Itspedrito/status/2007636967048228968?s=20 Somebody That I Used to Know] (1m)&lt;br /&gt;
** [https://x.com/hujimari/status/2008054519704461407?s=20 Cat being disruptive at night], [https://x.com/klara_sjo/status/2007864014521720963?s=20 another], [https://x.com/alphafox/status/2009732284375830687?s=20 another] (c.f. [https://x.com/justalexoki/status/1988915573707661637?s=20 Cat playing instruments at midnight])&lt;br /&gt;
** [https://x.com/Uncanny_Harry/status/2008881579095961934?s=20 Character test] (30s, Kling 2.6 Motion Control, [https://x.com/Uncanny_Harry Uncanny Harry AI])&lt;br /&gt;
** [https://www.youtube.com/watch?v=SGJC4Hnz3m0&amp;amp;t=2s STAR WARS: Beggar’s Canyon | A Luke Skywalker Fan Film (Between ESB &amp;amp; ROTJ)] (7m)&lt;br /&gt;
** [https://x.com/dustinhollywood/status/2009732705299104118?s=20 TZIGANE] (9m)&lt;br /&gt;
** [https://x.com/Framer_X/status/2011075884246061454?s=20 The Subway Spark] (Anime, 45s)&lt;br /&gt;
** [https://www.reddit.com/r/aivideo/comments/1qi8zuv/25_years_difference_makes_you_wonder_where_ai/ Will Smith Eating Spaghetti] (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025], [https://x.com/javilopen/status/1925495026903380358 May 2025], [https://x.com/elder_plinius/status/1936145834585862225 June 2025], [https://x.com/minchoi/status/1976042197154963702 October 2025], [https://x.com/aisearchio/status/1978465562821898461 October 2025], [https://x.com/EHuanglu/status/1996649596119068687?s=20 December 2025])&lt;br /&gt;
** [https://x.com/PJaccetturo/status/2013675665539596651?s=20 The AI Artist] (1.5m)&lt;br /&gt;
** [https://x.com/Artedeingenio/status/2013624842021417030?s=20 Sci-fi action anime] (2m)&lt;br /&gt;
** [https://x.com/verbalriotshow/status/2014752509240475872?s=20 Stone Hand] (fake trailer, 1m)&lt;br /&gt;
* January 2026: [https://x.com/nvidia/status/2008346949301235933?s=20 Runway Gen-4.5 on] [https://www.nvidia.com/en-us/data-center/technologies/rubin/?linkId=100000401190502 Nvidia Rubin] ([https://x.com/runwayml/status/2014406560445771804?s=20 examples])&lt;br /&gt;
* January 2026: [https://ltx.io/model/ltx-2 LTX-2] open source video model (20s, 4k, w/ audio; [https://x.com/venturetwins/status/2010878914273697956?s=20 examples])&lt;br /&gt;
* January 2026: Luma [https://lumalabs.ai/blog/news/ray3_14 Ray3.14] ([https://x.com/LumaLabsAI/status/2015822842575888844?s=20 examples])&lt;br /&gt;
* January 2026: Examples:&lt;br /&gt;
** [https://x.com/pressmanc/status/2015099516500758647?s=20 Runway Gen-4.5 tests] (3.5m)&lt;br /&gt;
** [https://x.com/EHuanglu/status/2015573517618528538?s=20 Longchamp / Horses in the city] (1m)&lt;br /&gt;
** [https://x.com/dustinhollywood/status/2008154825385521418?s=20 The Last Artist] (trailer, 2m)&lt;br /&gt;
** [https://x.com/taziku_co/status/2015739943101047111?s=20 Monet temporal structure] (3m)&lt;br /&gt;
** [https://x.com/runwayml/status/2016155967285543364?s=20 Grizzlies] (1.5m, Runway Gen-4.5)&lt;br /&gt;
** [https://www.youtube.com/@TIME/videos On This Day... 1776] ([https://www.youtube.com/watch?v=E4cLKIxt8W8 trailer])&lt;br /&gt;
*** [https://www.youtube.com/watch?v=sV52AUVGc6I January 1: The Flag] (3.5m)&lt;br /&gt;
*** [https://www.youtube.com/watch?v=3ZDnL_a0YfQ January 10: Common Sense] (4.5m)&lt;br /&gt;
*** [https://www.youtube.com/watch?v=J5b1TiyKTus January 26: The Guns of Ticonderoga] (4m)&lt;br /&gt;
&lt;br /&gt;
====February 2026====&lt;br /&gt;
* February 2026: [https://app.klingai.com/global/quickstart/klingai-video-3-omni-model-user-guide Kling 3.0]&lt;br /&gt;
* February 2026: [https://seedance2.ai/ Seedance 2.0] ([https://x.com/EHuanglu/status/2020131622675202512?s=20 example 1], [https://x.com/EHuanglu/status/2020492770872566053?s=20 2], [https://x.com/dynamicwangs/status/2020054894741451123?s=20 3], [https://x.com/patrickassale/status/2020180495900848470?s=20 4], [https://x.com/janekm/status/2020888750285332526?s=20 5], [https://x.com/Dork_sense/status/2020179955511116082?s=20 6], [https://x.com/EHuanglu/status/2020388244802740728?s=20 7], [https://x.com/zhao_dashuai/status/2020528048341217592?s=20 8], [https://x.com/AngryTomtweets/status/2020784886932738470?s=20 9], [https://x.com/javilopen/status/2020558352590287298?s=20 10], [https://x.com/linxiaobei888/status/2021399630672691710?s=20 11])&lt;br /&gt;
* February 2026: Examples:&lt;br /&gt;
** [https://x.com/PJaccetturo/status/2019072637192843463?s=20 Unofficial opening sequence for The Way of Kings by Brandon Sanderson] (1.5m, Kling 3)&lt;br /&gt;
** [https://x.com/dailycatsclips/status/2020117502915989680?s=20 Cat Dreams] (1.5m)&lt;br /&gt;
** [https://x.com/DotCSV/status/2021269435567218725?s=20 Will Smith Eating Spaghetti] (Seedance 2.0) (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025], [https://x.com/javilopen/status/1925495026903380358 May 2025], [https://x.com/elder_plinius/status/1936145834585862225 June 2025], [https://x.com/minchoi/status/1976042197154963702 October 2025], [https://x.com/aisearchio/status/1978465562821898461 October 2025], [https://x.com/EHuanglu/status/1996649596119068687?s=20 December 2025], [https://www.reddit.com/r/aivideo/comments/1qi8zuv/25_years_difference_makes_you_wonder_where_ai/ January 2026], [https://x.com/SpecialSitsNews/status/2020583709741883666?s=20 progression to 2026])&lt;br /&gt;
** [https://x.com/thedorbrothers/status/2023460644905742577?s=20 To Be Continued] (3m, [https://x.com/thedorbrothers The Dor Brothers])&lt;br /&gt;
** [https://x.com/ivanka_humeniuk/status/2023711181978919034?s=20 Crow - Game of Thrones] (1m)&lt;br /&gt;
** [https://x.com/billyrestey/status/2024193251763507528?s=20 Reboot] (2m)&lt;br /&gt;
** [https://x.com/kenw_2/status/2024625510534283508?s=20 Late for work] (1.5m, MJ NBP Seedance 2.0)&lt;br /&gt;
** [https://x.com/heydin_ai/status/2024616890338079181?s=20 AI Man] (4.5m, MJ NBP Seedance 2.0)&lt;br /&gt;
** [https://x.com/maxescu/status/2024882372836250033?s=20 But AI Will Never Be Able To Do This] (3m, Seedance 2.0)&lt;br /&gt;
** [https://x.com/DiDi_OKK/status/2018784243753599093?s=20 Sign] (8m)&lt;br /&gt;
** [https://x.com/LTXStudio/status/2025994426309640291?s=20 Commercial for Nexus] (1m)&lt;br /&gt;
** [https://x.com/maxescu/status/2026007558159278477?s=20 Showcase] (9m, [https://x.com/maxescu Alex Patrascu])&lt;br /&gt;
** [https://x.com/EHuanglu/status/2025410944512192536?s=20 Painterly] (30s, [https://x.com/EHuanglu el.cine])&lt;br /&gt;
** [https://x.com/kellyeld/status/2025975677657440267?s=20 Imposter Syndrone] (2m, music video)&lt;br /&gt;
** [https://www.youtube.com/watch?v=nKnE2Wn1VNQ All Is Conscious] (3.5m)&lt;br /&gt;
** [https://x.com/CuriousRefuge/status/2026086576191934769?s=20 Emotional argument] (3m, Seedance 2.0)&lt;br /&gt;
** [https://x.com/jdkanani/status/2023781028368884031?s=20 Moonlight Veil] (10m)&lt;br /&gt;
&lt;br /&gt;
====March 2026====&lt;br /&gt;
* March 2026: Examples:&lt;br /&gt;
** [https://x.com/jacopo_reale/status/2029909372764041559 Looking for Bianca] (6m, Kling 3.0)&lt;br /&gt;
** [https://x.com/sumiturkude007/status/2030933543443193908?s=20 Gardener] (3m, Seedance 2.0)&lt;br /&gt;
** Micro-movie (Chinese): [https://x.com/yyyole/status/2029225419669684418?s=20 episode 1], [https://x.com/yyyole/status/2030850450464112675?s=20 episode 2]&lt;br /&gt;
** Live-action Evangelion: [https://x.com/NACHOS2D_/status/2032401289653461052?s=20 part 1] (4.5m), [https://x.com/NACHOS2D_/status/2032778868361203770?s=20 part 2] (3.5m), [https://x.com/NACHOS2D_/status/2033126071151837491?s=20 part 3] (2.5m)&lt;br /&gt;
** [https://x.com/lexx_aura/status/2033589846216741293?s=20 to love Wu Yong] (5m)&lt;br /&gt;
** [https://x.com/Alterverse_AI/status/2036434608137343111?s=20 Monkey&amp;#039;s Paw] (5m)&lt;br /&gt;
* March 2026: [https://higgsfield.ai/original-series Higgsfield Original Series]&lt;/div&gt;</summary>
		<author><name>KevinYager</name></author>
		
	</entry>
	<entry>
		<id>http://gisaxs.com/index.php?title=AI_predictions&amp;diff=8757</id>
		<title>AI predictions</title>
		<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php?title=AI_predictions&amp;diff=8757"/>
		<updated>2026-03-24T18:59:56Z</updated>

		<summary type="html">&lt;p&gt;KevinYager: /* Economic and Political */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Capability Scaling=&lt;br /&gt;
* 2019-03: Rich Sutton: [https://www.cs.utexas.edu/~eunsol/courses/data/bitter_lesson.pdf The Bitter Lesson]&lt;br /&gt;
* 2020-09: Ajeya Cotra: [https://www.lesswrong.com/posts/KrJfoZzpSDpnrv9va/draft-report-on-ai-timelines Draft report on AI timelines]&lt;br /&gt;
* 2022-01: gwern: [https://gwern.net/scaling-hypothesis The Scaling Hypothesis]&lt;br /&gt;
* 2023-05: Richard Ngo: [https://www.lesswrong.com/posts/BoA3agdkAzL6HQtQP/clarifying-and-predicting-agi Clarifying and predicting AGI]&lt;br /&gt;
* 2024-06: Aidan McLaughlin: [https://yellow-apartment-148.notion.site/AI-Search-The-Bitter-er-Lesson-44c11acd27294f4495c3de778cd09c8d AI Search: The Bitter-er Lesson]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.14499 Measuring AI Ability to Complete Long Tasks]&lt;br /&gt;
** 2025-04: [https://peterwildeford.substack.com/p/forecaster-reacts-metrs-bombshell Forecaster reacts: METR&amp;#039;s bombshell paper about AI acceleration] New data supports an exponential AI curve, but lots of uncertainty remains&lt;br /&gt;
** 2025-04: AI Digest: [https://theaidigest.org/time-horizons A new Moore&amp;#039;s Law for AI agents]&lt;br /&gt;
[[Image:GmZHL8xWQAAtFlF.jpeg|450px]]&lt;br /&gt;
* 2025-04: [https://epoch.ai/blog/trends-in-ai-supercomputers Trends in AI Supercomputers] ([https://arxiv.org/abs/2504.16026 preprint])&lt;br /&gt;
* [https://ai-timeline.org/ The Road to AGI] (timeline visualization)&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.09677 The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs]&lt;br /&gt;
* 2025-09: [https://www.julian.ac/blog/2025/09/27/failing-to-understand-the-exponential-again/ Failing to Understand the Exponential, Again]&lt;br /&gt;
* 2026-02: Ryan Greenblatt: [https://www.lesswrong.com/posts/rRbDNQLfihiHbXytf/distinguish-between-inference-scaling-and-larger-tasks-use Distinguish between inference scaling and &amp;quot;larger tasks use more compute&amp;quot;]&lt;br /&gt;
* 2026-03: [https://arxiv.org/abs/2603.03992 Measuring AI R&amp;amp;D Automation] ([https://astrangeattractor.substack.com/p/measuring-ai-r-and-d-automation?triedRedirect=true blog])&lt;br /&gt;
&lt;br /&gt;
==Scaling Laws==&lt;br /&gt;
See: [[AI_understanding#Scaling_Laws|Scaling Laws]]&lt;br /&gt;
&lt;br /&gt;
==AGI Achievable==&lt;br /&gt;
* Yoshua Bengio: [https://arxiv.org/abs/2310.17688 Managing extreme AI risks amid rapid progress]&lt;br /&gt;
* Leopold Aschenbrenner: [https://situational-awareness.ai/from-gpt-4-to-agi/#Counting_the_OOMs Situational Awareness: Counting the OOMs]&lt;br /&gt;
* Richard Ngo: [https://medium.com/@richardcngo/visualizing-the-deep-learning-revolution-722098eb9c5 Visualizing the deep learning revolution]&lt;br /&gt;
* Katja Grace: [https://blog.aiimpacts.org/p/2023-ai-survey-of-2778-six-things Survey of 2,778 AI authors: six parts in pictures]&lt;br /&gt;
* Epoch AI: [https://epoch.ai/trends Machine Learning Trends]&lt;br /&gt;
* AI Digest: [https://theaidigest.org/progress-and-dangers How fast is AI improving?]&lt;br /&gt;
* 2025-06: [https://80000hours.org/agi/guide/when-will-agi-arrive/ The case for AGI by 2030]&lt;br /&gt;
&lt;br /&gt;
==AGI Definition==&lt;br /&gt;
* 2023-11: Allan Dafoe, Shane Legg, et al.: [https://arxiv.org/abs/2311.02462 Levels of AGI for Operationalizing Progress on the Path to AGI]&lt;br /&gt;
* 2024-04: Bowen Xu: [https://arxiv.org/abs/2404.10731 What is Meant by AGI? On the Definition of Artificial General Intelligence]&lt;br /&gt;
* 2025-10: Dan Hendrycks et al.: [https://www.agidefinition.ai/paper.pdf A Definition of AGI]&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.07364 On the universal definition of intelligence]&lt;br /&gt;
&lt;br /&gt;
==Recursive Self Improvement (RSI)==&lt;br /&gt;
* 2026-02: [https://80000hours.org/articles/how-ai-driven-feedback-loops-could-make-things-very-crazy-very-fast/ How AI-driven feedback loops could make things very crazy, very fast]&lt;br /&gt;
&lt;br /&gt;
==Progress Models==&lt;br /&gt;
From [http://yager-research.ca/2025/04/ai-impact-predictions/ AI Impact Predictions]:&lt;br /&gt;
&lt;br /&gt;
[[Image:AI impact models-2025 11 24.png|450px]]&lt;br /&gt;
&lt;br /&gt;
=Economic and Political=&lt;br /&gt;
* 2019-11: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3482150 The Impact of Artificial Intelligence on the Labor Market]&lt;br /&gt;
* 2020-06: [https://www.openphilanthropy.org/research/modeling-the-human-trajectory/ Modeling the Human Trajectory] (GDP)&lt;br /&gt;
* 2021-06: [https://www.openphilanthropy.org/research/report-on-whether-ai-could-drive-explosive-economic-growth/ Report on Whether AI Could Drive Explosive Economic Growth]&lt;br /&gt;
* 2023-10: Marc Andreessen: [https://a16z.com/the-techno-optimist-manifesto/ The Techno-Optimist Manifesto]&lt;br /&gt;
* 2023-12: [https://vitalik.eth.limo/general/2023/11/27/techno_optimism.html My techno-optimism]: &amp;quot;defensive acceleration&amp;quot; ([https://vitalik.eth.limo/index.html Vitalik Buterin])&lt;br /&gt;
* 2024-03: Noah Smith: [https://www.noahpinion.blog/p/plentiful-high-paying-jobs-in-the Plentiful, high-paying jobs in the age of AI: Comparative advantage is very subtle, but incredibly powerful.] ([https://x.com/liron/status/1768013030741475485 video])&lt;br /&gt;
* 2024-03: [https://doi.org/10.3386/w32255 Scenarios for the Transition to AGI] (AGI leads to wage collapse)&lt;br /&gt;
* 2024-06: [https://situational-awareness.ai/wp-content/uploads/2024/06/situationalawareness.pdf Situational Awareness] ([https://www.forourposterity.com/ Leopold Aschenbrenner]) - [https://www.lesswrong.com/posts/nP5FFYFjtY8LgWymt/quotes-from-leopold-aschenbrenner-s-situational-awareness select quotes], [https://www.youtube.com/watch?v=zdbVtZIn9IM podcast], [https://danielmiessler.com/p/podcast-summary-dwarkesh-vs-leopold-aschenbrenner text summary of podcast]&lt;br /&gt;
* 2024-06: [https://www.frbsf.org/wp-content/uploads/AI-and-Growth-Aghion-Bunel.pdf AI and Growth: Where Do We Stand?]&lt;br /&gt;
* 2024-09: OpenAI [https://cdn.openai.com/global-affairs/openai-infra-economics-10.09.24.pdf Infrastructure is Destiny: Economic Returns on US Investment in Democratic AI]&lt;br /&gt;
* 2024-12: [https://www.lesswrong.com/posts/KFFaKu27FNugCHFmh/by-default-capital-will-matter-more-than-ever-after-agi By default, capital will matter more than ever after AGI] (L Rudolf L)&lt;br /&gt;
* 2025-01: [https://lukedrago.substack.com/p/the-intelligence-curse The Intelligence Curse]: With AGI, powerful actors will lose their incentives to invest in people&lt;br /&gt;
** Updated 2025-04: [https://intelligence-curse.ai/ The Intelligence Curse] (Luke Drago and Rudolf Laine)&lt;br /&gt;
*** [https://intelligence-curse.ai/pyramid/ Pyramid Replacement]&lt;br /&gt;
*** [https://intelligence-curse.ai/capital/ Capital, AGI, and Human Ambition]&lt;br /&gt;
*** [https://intelligence-curse.ai/defining/ Defining the Intelligence Curse]&lt;br /&gt;
*** [https://intelligence-curse.ai/shaping/ Shaping the Social Contract]&lt;br /&gt;
*** [https://intelligence-curse.ai/breaking/ Breaking the Intelligence Curse]&lt;br /&gt;
*** [https://intelligence-curse.ai/history/ History is Yours to Write]&lt;br /&gt;
* 2025-01: Microsoft: [https://blogs.microsoft.com/on-the-issues/2025/01/03/the-golden-opportunity-for-american-ai/ The Golden Opportunity for American AI]&lt;br /&gt;
* 2025-01: [https://www.maximum-progress.com/p/agi-will-not-make-labor-worthless AGI Will Not Make Labor Worthless]&lt;br /&gt;
* 2025-01: [https://cdn.openai.com/global-affairs/ai-in-america-oai-economic-blueprint-20250113.pdf AI in America: OpenAI&amp;#039;s Economic Blueprint] ([https://openai.com/global-affairs/openais-economic-blueprint/ blog])&lt;br /&gt;
* 2025-01: [https://inferencemagazine.substack.com/p/how-much-economic-growth-from-ai How much economic growth from AI should we expect, how soon?]&lt;br /&gt;
* 2025-02: Morgan Stanley: [https://advisor.morganstanley.com/john.howard/documents/field/j/jo/john-howard/The_Humanoid_100_-_Mapping_the_Humanoid_Robot_Value_Chain.pdf The Humanoid 100: Mapping the Humanoid Robot Value Chain]&lt;br /&gt;
* 2025-02: [https://www.anthropic.com/news/the-anthropic-economic-index The Anthropic Economic Index]: [https://assets.anthropic.com/m/2e23255f1e84ca97/original/Economic_Tasks_AI_Paper.pdf Which Economic Tasks are Performed with AI? Evidence from Millions of Claude Conversations]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.11264 Strategic Wealth Accumulation Under Transformative AI Expectations]&lt;br /&gt;
* 2025-02: Tyler Cowen: [https://marginalrevolution.com/marginalrevolution/2025/02/why-i-think-ai-take-off-is-relatively-slow.html Why I think AI take-off is relatively slow]&lt;br /&gt;
* 2025-03: Epoch AI: [https://epoch.ai/gradient-updates/most-ai-value-will-come-from-broad-automation-not-from-r-d Most AI value will come from broad automation, not from R&amp;amp;D]&lt;br /&gt;
** The primary economic impact of AI will be its ability to broadly automate labor&lt;br /&gt;
** Automating AI R&amp;amp;D alone likely won’t dramatically accelerate AI progress&lt;br /&gt;
** Fully automating R&amp;amp;D requires a very broad set of abilities&lt;br /&gt;
** AI takeoff will likely be diffuse and salient&lt;br /&gt;
* 2025-03: [https://www.anthropic.com/news/anthropic-economic-index-insights-from-claude-sonnet-3-7 Anthropic Economic Index: Insights from Claude 3.7 Sonnet]&lt;br /&gt;
* 2025-04: [https://inferencemagazine.substack.com/p/will-there-be-extreme-inequality Will there be extreme inequality from AI?]&lt;br /&gt;
* 2025-04: [https://www.anthropic.com/research/impact-software-development Anthropic Economic Index: AI’s Impact on Software Development]&lt;br /&gt;
* 2025-05: [https://www.theguardian.com/books/2025/may/04/the-big-idea-can-we-stop-ai-making-humans-obsolete Better at everything: how AI could make human beings irrelevant]&lt;br /&gt;
* 2025-05: Forethought: [https://www.forethought.org/research/the-industrial-explosion The Industrial Explosion]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.20273 Ten Principles of AI Agent Economics]&lt;br /&gt;
* 2025-07: [https://substack.com/home/post/p-167879696 What Economists Get Wrong about AI] They ignore innovation effects, use outdated capability assumptions, and miss the robotics revolution&lt;br /&gt;
* 2025-07: [https://www.nber.org/books-and-chapters/economics-transformative-ai/we-wont-be-missed-work-and-growth-era-agi We Won&amp;#039;t Be Missed: Work and Growth in the Era of AGI]&lt;br /&gt;
* 2025-07: [https://www.nber.org/papers/w34034 The Economics of Bicycles for the Mind]&lt;br /&gt;
* 2025-09: [https://conference.nber.org/conf_papers/f227491.pdf Genius on Demand: The Value of Transformative Artificial Intelligence]&lt;br /&gt;
* 2025-10: [https://peterwildeford.substack.com/p/ai-is-probably-not-a-bubble AI is probably not a bubble: AI companies have revenue, demand, and paths to immense value]&lt;br /&gt;
* 2025-11: [https://windowsontheory.org/2025/11/04/thoughts-by-a-non-economist-on-ai-and-economics/ Thoughts by a non-economist on AI and economics]&lt;br /&gt;
* 2025-11: [https://www.nber.org/papers/w34444 Artificial Intelligence, Competition, and Welfare]&lt;br /&gt;
* 2025-11: [https://www.anthropic.com/research/estimating-productivity-gains Estimating AI productivity gains from Claude conversations] (Anthropic)&lt;br /&gt;
* 2025-12: [https://benjamintodd.substack.com/p/how-ai-driven-feedback-loops-could How AI-driven feedback loops could make things very crazy, very fast]&lt;br /&gt;
* 2025-12: [https://philiptrammell.com/static/Existential_Risk_and_Growth.pdf Existential Risk and Growth] (Philip Trammell and Leopold Aschenbrenner)&lt;br /&gt;
* 2026-01: [https://www.anthropic.com/research/anthropic-economic-index-january-2026-report Anthropic Economic Index: new building blocks for understanding AI use]&lt;br /&gt;
* 2026-01: [https://www.anthropic.com/research/economic-index-primitives Anthropic Economic Index report: economic primitives]&lt;br /&gt;
* 2026-02: Nate Silver: [https://www.natesilver.net/p/the-singularity-wont-be-gentle The singularity won&amp;#039;t be gentle: If AI is even half as transformational as Silicon Valley assumes, politics will never be the same again]&lt;br /&gt;
* 2026-03: [https://www.anthropic.com/research/economic-index-march-2026-report Anthropic Economic Index report: Learning curves]&lt;br /&gt;
&lt;br /&gt;
==Job Loss==&lt;br /&gt;
* 2023-03: [https://arxiv.org/pdf/2303.10130 GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models]&lt;br /&gt;
** 2023-03: [https://www.livemint.com/news/world/these-jobs-are-most-at-risk-due-to-chatgpt-as-per-openai-study-11679358453267.html These jobs are most at risk due to ChatGPT, as per OpenAI study]&lt;br /&gt;
* 2023-08: [https://dx.doi.org/10.2139/ssrn.4527336 The Short-Term Effects of Generative Artificial Intelligence on Employment: Evidence from an Online Labor Market]&lt;br /&gt;
** [https://x.com/jburnmurdoch/status/1722938749519077688 Freelancer sector shrinking]&lt;br /&gt;
[[Image:F-kVQuvWkAAemkr.png|400px]]&lt;br /&gt;
* 2023-09: [https://global-uploads.webflow.com/64d5f73a7fc5e8a240310c4d/650a128a34386a1206b6506c_FINAL%20Briefing%20-%20Adoption%20of%20Automation%20and%20AI%20in%20the%20UK.pdf What drives UK firms to adopt AI and robotics, and what are the consequences for jobs?]&lt;br /&gt;
** [https://www.digitalinformationworld.com/2023/09/78-of-companies-say-ai-created-more-jobs.html 78% of Companies Say AI Created More Jobs]&lt;br /&gt;
* 2023-11: [https://theaipi.org/ai-interactive-map/ New Analysis Shows Over 20% of US Jobs Significantly Exposed to AI Automation In the Near Future]&lt;br /&gt;
* 2024-01: [https://techcrunch.com/2024/01/09/duolingo-cut-10-of-its-contractor-workforce-as-the-company-embraces-ai/ Duolingo cuts 10% of its contractor workforce as the company embraces AI]&lt;br /&gt;
* 2024-02: [https://www.pwc.com/gx/en/issues/c-suite-insights/the-leadership-agenda/gen-ai-is-a-tool-for-growth-not-just-efficiency.html#:~:text=One%20out%20of%20every%20four%20of%20the%204%2C702,to%20accomplish%20the%20same%20tasks%20with%20fewer%20workers Gen AI is a tool for growth, not just efficiency: Tech CEOs are investing to build their workforce and capitalise on new opportunities from generative AI. That’s a sharp contrast to how their peers view it.]&lt;br /&gt;
* 2024-04: [https://www.nytimes.com/2024/04/10/business/investment-banking-jobs-artificial-intelligence.html AI is Poised to Replace the Entry-Level Grunt Work of a Wall Street Career]&lt;br /&gt;
* 2024-07: [https://www.wired.com/story/ai-is-already-taking-jobs-in-the-video-game-industry/ AI Is Already Taking Jobs in the Video Game Industry]: A WIRED investigation finds that major players like Activision Blizzard, which recently laid off scores of workers, are using generative AI for game development&lt;br /&gt;
* 2024-08: [https://www.bbc.com/news/articles/c80e1gp9m9zo Klarna: AI lets us cut thousands of jobs - but pay more]&lt;br /&gt;
* 2025-01: [https://scholarspace.manoa.hawaii.edu/server/api/core/bitstreams/4f39375d-59c2-4c4a-b394-f3eed7858c80/content AI and Freelancers: Has the Inflection Point Arrived?]&lt;br /&gt;
* 2025-01: [https://www.aporiamagazine.com/p/yes-youre-going-to-be-replaced Yes, you&amp;#039;re going to be replaced: So much cope about AI]&lt;br /&gt;
* 2025-03: [https://commonplace.org/2025/03/20/will-ai-automate-away-your-job/ Will AI Automate Away Your Job? The time-horizon model explains the future of the technology]&lt;br /&gt;
* 2025-05: [https://www.forbes.com/sites/jackkelly/2025/05/04/its-time-to-get-concerned-klarna-ups-duolingo-cisco-and-many-other-companies-are-replacing-workers-with-ai/ It’s Time To Get Concerned, Klarna, UPS, Duolingo, Cisco, And Many Other Companies Are Replacing Workers With AI]&lt;br /&gt;
* 2025-05: [https://time.com/7289692/when-ai-replaces-workers/ What Happens When AI Replaces Workers?]&lt;br /&gt;
* 2025-05: [https://www.oxfordeconomics.com/resource/educated-but-unemployed-a-rising-reality-for-us-college-grads/ Educated but unemployed, a rising reality for US college grads] Structural shifts in tech hiring and the growing impact of AI are driving higher unemployment among recent college graduates&lt;br /&gt;
* 2025-05: NY Times: [https://www.nytimes.com/2025/05/30/technology/ai-jobs-college-graduates.html?unlocked_article_code=1.LE8.LlC6.eT5XcpA9hxC2&amp;amp;smid=url-share For Some Recent Graduates, the A.I. Job Apocalypse May Already Be Here] The unemployment rate for recent college graduates has jumped as companies try to replace entry-level workers with artificial intelligence&lt;br /&gt;
* 2025-06: [https://80000hours.org/agi/guide/skills-ai-makes-valuable/ How not to lose your job to AI] The skills AI will make more valuable (and how to learn them)&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.06576 Future of Work with AI Agents: Auditing Automation and Augmentation Potential across the U.S. Workforce]&lt;br /&gt;
[[Image:0dab4c86-882d-4095-9d12-d19684ed5184 675x680.png|300px]]&lt;br /&gt;
* 2025-07: Harvard Business Review: [https://hbr.org/2025/06/what-gets-measured-ai-will-automate What Gets Measured, AI Will Automate]&lt;br /&gt;
* 2025-08: [https://digitaleconomy.stanford.edu/publications/canaries-in-the-coal-mine/ Canaries in the Coal Mine? Six Facts about the Recent Employment Effects of Artificial Intelligence]&lt;br /&gt;
* 2025-10: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5560401 Performance or Principle: Resistance to Artificial Intelligence in the U.S. Labor Market]&lt;br /&gt;
* 2025-10: [https://www.siliconcontinent.com/p/the-ai-becker-problem The AI Becker problem: Who will train the next generation?]&lt;br /&gt;
* 2026-01: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6134506 AI, Automation, and Expertise]&lt;br /&gt;
* 2026-02: [https://arachnemag.substack.com/p/the-jevons-paradox-for-intelligence The Jevons Paradox for Intelligence: Fears of AI-induced job loss could not be more wrong]&lt;br /&gt;
* 2026-03: [https://www.dropbox.com/scl/fo/689u1g785x8jp6c8v1s21/AKxZ_N15vUxMA3PBtpbr5nM?dl=0&amp;amp;e=1&amp;amp;preview=2026.03.24+Bundles.pdf&amp;amp;rlkey=ottgcu71u1t4mhn6tblvatu8w&amp;amp;st=dj6k0x2o Weak Bundle, Strong Bundle: How AI Redraws Job Boundaries]&lt;br /&gt;
&lt;br /&gt;
==Productivity Impact==&lt;br /&gt;
* 2025-05: [https://www.nber.org/papers/w33777 Large Language Models, Small Labor Market Effects]&lt;br /&gt;
** Significant uptake, but very little economic impact so far&lt;br /&gt;
* 2026-02: [https://www.ft.com/content/4b51d0b4-bbfe-4f05-b50a-1d485d419dc5 The AI productivity take-off is finally visible] ([https://x.com/erikbryn/status/2023075588974735869?s=20 Erik Brynjolfsson])&lt;br /&gt;
** Businesses are finally beginning to reap some of AI&amp;#039;s benefits.&lt;br /&gt;
* 2026-02: New York Times: [https://www.nytimes.com/2026/02/18/opinion/ai-software.html The A.I. Disruption We’ve Been Waiting for Has Arrived]&lt;br /&gt;
&lt;br /&gt;
==National Security==&lt;br /&gt;
* 2025-04: Jeremie Harris and Edouard Harris: [https://superintelligence.gladstone.ai/ America’s Superintelligence Project]&lt;br /&gt;
&lt;br /&gt;
==AI Manhattan Project==&lt;br /&gt;
* 2024-06: [https://situational-awareness.ai/wp-content/uploads/2024/06/situationalawareness.pdf Situational Awareness] ([https://www.forourposterity.com/ Leopold Aschenbrenner]) - [https://www.lesswrong.com/posts/nP5FFYFjtY8LgWymt/quotes-from-leopold-aschenbrenner-s-situational-awareness select quotes], [https://www.youtube.com/watch?v=zdbVtZIn9IM podcast], [https://danielmiessler.com/p/podcast-summary-dwarkesh-vs-leopold-aschenbrenner text summary of podcast]&lt;br /&gt;
* 2024-10: [https://thezvi.substack.com/p/ai-88-thanks-for-the-memos?open=false#%C2%A7thanks-for-the-memos-introduction-and-competitiveness White House Memo calls for action on AI]&lt;br /&gt;
* 2024-11: [https://www.uscc.gov/annual-report/2024-annual-report-congress 2024 Annual Report to Congress]: [https://www.reuters.com/technology/artificial-intelligence/us-government-commission-pushes-manhattan-project-style-ai-initiative-2024-11-19/ calls] for &amp;quot;Manhattan Project-style&amp;quot; effort&lt;br /&gt;
* 2025-05-29: [https://x.com/ENERGY/status/1928085878561272223 DoE Tweet]: &amp;quot;AI is the next Manhattan Project, and THE UNITED STATES WILL WIN. 🇺🇸&amp;quot;&lt;br /&gt;
* 2025-07: [https://epoch.ai/gradient-updates/how-big-could-an-ai-manhattan-project-get How big could an “AI Manhattan Project” get?]&lt;br /&gt;
&lt;br /&gt;
=Near-term=&lt;br /&gt;
* 2021-08: Daniel Kokotajlo: [https://www.lesswrong.com/posts/6Xgy6CAf2jqHhynHL/what-2026-looks-like What 2026 looks like]&lt;br /&gt;
* 2025-02: Sam Altman: [https://blog.samaltman.com/three-observations Three Observations]&lt;br /&gt;
*# The intelligence of an AI model roughly equals the log of the resources used to train and run it.&lt;br /&gt;
*# The cost to use a given level of AI falls about 10x every 12 months, and lower prices lead to much more use&lt;br /&gt;
*# The socioeconomic value of linearly increasing intelligence is super-exponential in nature&lt;br /&gt;
* 2025-03: [https://www.pathwaysai.org/p/glimpses-of-ai-progess Glimpses of AI Progress: Mental models for fast times]&lt;br /&gt;
* 2025-03: [https://www.nature.com/articles/s41598-025-92190-7 Navigating artificial general intelligence development: societal, technological, ethical, and brain-inspired pathways]&lt;br /&gt;
* 2025-04: Daniel Kokotajlo, Scott Alexander, Thomas Larsen, Eli Lifland, Romeo Dean: [https://ai-2027.com/ AI 2027] ([https://ai-2027.com/scenario.pdf pdf])&lt;br /&gt;
** 2025-07: Video: [https://www.youtube.com/watch?v=5KVDDfAkRgc Are We 3 Years From AI Disaster? A Rigorous Forecast]&lt;br /&gt;
* 2025-04: Stanford HAI: [https://hai-production.s3.amazonaws.com/files/hai_ai_index_report_2025.pdf Artificial Intelligence Index Report 2025]&lt;br /&gt;
* 2025-04: Arvind Narayanan and Sayash Kapoor: [https://kfai-documents.s3.amazonaws.com/documents/c3cac5a2a7/AI-as-Normal-Technology---Narayanan---Kapoor.pdf AI as Normal Technology]&lt;br /&gt;
* 2025-04: Dwarkesh Patel: [https://www.dwarkesh.com/p/questions-about-ai Questions about the Future of AI]&lt;br /&gt;
* 2025-05: [https://www.bondcap.com/report/pdf/Trends_Artificial_Intelligence.pdf Trends – Artificial Intelligence]&lt;br /&gt;
* 2025-06: IdeaFoundry: [https://ideafoundry.substack.com/p/evolution-vs-extinction-the-choice Evolution vs. Extinction: The Choice is Ours] The next 18 months will decide whether AI ends us or evolves us&lt;br /&gt;
* 2025-07: [https://cfg.eu/advanced-ai-possible-futures/ Advanced AI: Possible futures] Five scenarios for how the AI-transition could unfold&lt;br /&gt;
* 2025-11: [https://android-dreams.ai/ Android Dreams]&lt;br /&gt;
* 2026-02: [https://www.citriniresearch.com/ Citrini]: [https://www.citriniresearch.com/p/2028gic The 2028 Global Intelligence Crisis: A Thought Exercise in Financial History, from the Future]&lt;br /&gt;
&lt;br /&gt;
==Insightful Analysis of Current State==&lt;br /&gt;
* 2025-11: Andy Masley: [https://andymasley.substack.com/p/the-lump-of-cognition-fallacy The lump of cognition fallacy: The extended mind as the advance of civilization]&lt;br /&gt;
* 2026-02: Eric Jang: [https://evjang.com/2026/02/04/rocks.html As Rocks May Think]&lt;br /&gt;
* 2026-02: Matt Shumer: [https://x.com/mattshumer_/status/2021256989876109403 Something Big Is Happening]&lt;br /&gt;
* 2026-02: Minh Pham: [https://x.com/buckeyevn/status/2014171253045960803?s=20 Why Most Agent Harnesses Are Not Bitter Lesson Pilled]&lt;br /&gt;
&lt;br /&gt;
=Overall=&lt;br /&gt;
* 1993: [https://en.wikipedia.org/wiki/Vernor_Vinge Vernor Vinge]: [https://edoras.sdsu.edu/~vinge/misc/singularity.html The Coming Technological Singularity: How to Survive in the Post-Human Era]&lt;br /&gt;
* 2025-03: Kevin Roose (New York Times): [https://www.nytimes.com/2025/03/14/technology/why-im-feeling-the-agi.html?unlocked_article_code=1.304.TIEy.SmNhKYO4e9c7&amp;amp;smid=url-share Powerful A.I. Is Coming. We’re Not Ready.] Three arguments for taking progress toward artificial general intelligence, or A.G.I., more seriously — whether you’re an optimist or a pessimist.&lt;br /&gt;
* 2025-03: Nicholas Carlini: [https://nicholas.carlini.com/writing/2025/thoughts-on-future-ai.html My Thoughts on the Future of &amp;quot;AI&amp;quot;]: &amp;quot;I have very wide error bars on the potential future of large language models, and I think you should too.&amp;quot;&lt;br /&gt;
* 2025-06: Sam Altman: [https://blog.samaltman.com/the-gentle-singularity The Gentle Singularity]&lt;br /&gt;
&lt;br /&gt;
==Surveys of Opinions/Predictions==&lt;br /&gt;
* 2016-06: [https://aiimpacts.org/2016-expert-survey-on-progress-in-ai/ 2016 Expert Survey on Progress in AI]&lt;br /&gt;
** 2023-03: [https://aiimpacts.org/scoring-forecasts-from-the-2016-expert-survey-on-progress-in-ai/ Scoring forecasts from the 2016 “Expert Survey on Progress in AI”]&lt;br /&gt;
* 2022-10: Forecasting Research Institute: [https://forecastingresearch.org/near-term-xpt-accuracy Assessing Near-Term Accuracy in the Existential Risk Persuasion Tournament]&lt;br /&gt;
** 2025-09: Ethan Mollick: [https://x.com/emollick/status/1962859757674344823 Progress is ahead of expectations]&lt;br /&gt;
* 2023-08: [https://wiki.aiimpacts.org/ai_timelines/predictions_of_human-level_ai_timelines/ai_timeline_surveys/2023_expert_survey_on_progress_in_ai 2023 Expert Survey on Progress in AI]&lt;br /&gt;
* 2024-01: [https://arxiv.org/abs/2401.02843 Thousands of AI Authors on the Future of AI]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.14870 Why do Experts Disagree on Existential Risk and P(doom)? A Survey of AI Experts]&lt;br /&gt;
* 2025-02: Nicholas Carlini: [https://nicholas.carlini.com/writing/2025/forecasting-ai-2025-update.html AI forecasting retrospective: you&amp;#039;re (probably) over-confident]&lt;br /&gt;
* 2025-04: Helen Toner: [https://helentoner.substack.com/p/long-timelines-to-advanced-ai-have &amp;quot;Long&amp;quot; timelines to advanced AI have gotten crazy short]&lt;br /&gt;
* 2025-05: [https://theaidigest.org/ai2025-analysis-may AI 2025 Forecasts - May Update]&lt;br /&gt;
* 2026-02: [https://www.nature.com/articles/s41598-026-39070-w Lay beliefs about the badness, likelihood, and importance of human extinction]&lt;br /&gt;
&lt;br /&gt;
==Bad Outcomes==&lt;br /&gt;
* [https://pauseai.info/pdoom List of p(doom) values]&lt;br /&gt;
* 2019-03: [https://www.alignmentforum.org/posts/HBxe6wdjxK239zajf/what-failure-looks-like What failure looks like]&lt;br /&gt;
* 2023-03: gwern: [https://gwern.net/fiction/clippy It Looks Like You’re Trying To Take Over The World]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.16946 Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development] ([https://gradual-disempowerment.ai/ web version])&lt;br /&gt;
** 2025-02: [https://thezvi.substack.com/p/the-risk-of-gradual-disempowerment The Risk of Gradual Disempowerment from AI]&lt;br /&gt;
** 2025-05: [https://www.lesswrong.com/posts/GAv4DRGyDHe2orvwB/gradual-disempowerment-concrete-research-projects Gradual Disempowerment: Concrete Research Projects]&lt;br /&gt;
* 2025-04: Daniel Kokotajlo, Scott Alexander, Thomas Larsen, Eli Lifland, Romeo Dean: [https://ai-2027.com/ AI 2027] ([https://ai-2027.com/scenario.pdf pdf])&lt;br /&gt;
* 2025-04: [https://www.forethought.org/research/ai-enabled-coups-how-a-small-group-could-use-ai-to-seize-power AI-Enabled Coups: How a Small Group Could Use AI to Seize Power]&lt;br /&gt;
* 2025-09: [https://doctrines.ai/ The three main doctrines on the future of AI]&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Dominance doctrine:&amp;#039;&amp;#039;&amp;#039; First actor to create advanced AI will attain overwhelming strategic superiority&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Extinction doctrine:&amp;#039;&amp;#039;&amp;#039; Humanity will lose control of ASI, leading to extinction or permanent disempowerment&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Replacement doctrine:&amp;#039;&amp;#039;&amp;#039; AI will automate human tasks, but without fundamentally reshaping or ending civilization&lt;br /&gt;
* 2025-09: Sean ÓhÉigeartaigh: [https://www.cambridge.org/core/journals/cambridge-prisms-extinction/article/extinction-of-the-human-species-what-could-cause-it-and-how-likely-is-it-to-occur/D8816A79BEF5A4C30A3E44FD8D768622 Extinction of the human species: What could cause it and how likely is it to occur?]&lt;br /&gt;
&lt;br /&gt;
==Intelligence Explosion==&lt;br /&gt;
* 2023-06: [https://www.openphilanthropy.org/research/what-a-compute-centric-framework-says-about-takeoff-speeds/ What a Compute-Centric Framework Says About Takeoff Speeds]&lt;br /&gt;
** [https://takeoffspeeds.com/ takeoffspeeds.com simulator]&lt;br /&gt;
* 2025-02: [https://www.forethought.org/research/three-types-of-intelligence-explosion Three Types of Intelligence Explosion]&lt;br /&gt;
* 2025-03: Future of Life Institute: [https://futureoflife.org/ai/are-we-close-to-an-intelligence-explosion/ Are we close to an intelligence explosion?] AIs are inching ever-closer to a critical threshold. Beyond this threshold lie great risks—but crossing it is not inevitable.&lt;br /&gt;
* 2025-03: Forethought: [https://www.forethought.org/research/will-ai-r-and-d-automation-cause-a-software-intelligence-explosion Will AI R&amp;amp;D Automation Cause a Software Intelligence Explosion?]&lt;br /&gt;
[[Image:Gm-1jugbYAAtq Y.jpeg|450px]]&lt;br /&gt;
* 2025-05: [https://www.thelastinvention.ai/ The Last Invention] Why Humanity’s Final Creation Changes Everything&lt;br /&gt;
* 2025-08: [https://www.forethought.org/research/how-quick-and-big-would-a-software-intelligence-explosion-be How quick and big would a software intelligence explosion be?]&lt;br /&gt;
&lt;br /&gt;
==Superintelligence==&lt;br /&gt;
* 2024-10: [http://yager-research.ca/2024/10/how-smart-will-asi-be/ How Smart will ASI be?]&lt;br /&gt;
* 2024-11: [http://yager-research.ca/2024/11/concise-argument-for-asi-risk/ Concise Argument for ASI Risk]&lt;br /&gt;
* 2025-03: [https://dynomight.net/smart/ Limits of smart]&lt;br /&gt;
* 2025-05: [https://timfduffy.substack.com/p/the-limits-of-superintelligence?manualredirect= The Limits of Superintelligence]&lt;br /&gt;
&lt;br /&gt;
==Long-range/Philosophy==&lt;br /&gt;
* 2023-03: Dan Hendrycks: [https://arxiv.org/abs/2303.16200 Natural Selection Favors AIs over Humans]&lt;br /&gt;
&lt;br /&gt;
=Psychology=&lt;br /&gt;
* 2025-01: [https://longerramblings.substack.com/p/a-defence-of-slowness-at-the-end A defence of slowness at the end of the world]&lt;br /&gt;
&lt;br /&gt;
=Positives &amp;amp; Optimism=&lt;br /&gt;
==Science &amp;amp; Technology Improvements==&lt;br /&gt;
* 2023-05: [https://www.planned-obsolescence.org/author/kelsey/ Kelsey Piper]: [https://www.planned-obsolescence.org/the-costs-of-caution/ The costs of caution]&lt;br /&gt;
* 2024-09: Sam Altman: [https://ia.samaltman.com/ The Intelligence Age]&lt;br /&gt;
* 2024-10: Dario Amodei: [https://darioamodei.com/machines-of-loving-grace Machines of Loving Grace]&lt;br /&gt;
* 2024-11: Google DeepMind: [https://www.aipolicyperspectives.com/p/a-new-golden-age-of-discovery A new golden age of discovery]&lt;br /&gt;
* 2025-03: [https://finmoorhouse.com/ Fin Moorhouse], [https://www.williammacaskill.com/ Will MacAskill]: [https://www.forethought.org/research/preparing-for-the-intelligence-explosion Preparing for the Intelligence Explosion]&lt;br /&gt;
&lt;br /&gt;
==Social==&lt;br /&gt;
* 2025-09: [https://blog.cosmos-institute.org/p/coasean-bargaining-at-scale Coasean Bargaining at Scale]: Decentralization, coordination, and co-existence with AGI&lt;br /&gt;
* 2025-10: [https://www.nber.org/system/files/chapters/c15309/c15309.pdf#page=15.23 The Coasean Singularity? Demand, Supply, and Market Design with AI Agents]&lt;br /&gt;
&lt;br /&gt;
==Post-scarcity Society==&lt;br /&gt;
* 2004: Eliezer Yudkowsky (MIRI): [https://intelligence.org/files/CEV.pdf Coherent Extrapolated Volition] and [https://www.lesswrong.com/s/d3WgHDBAPYYScp5Em/p/K4aGvLnHvYgX9pZHS Fun Theory]&lt;br /&gt;
* 2019: John Danaher: [https://www.jstor.org/stable/j.ctvn5txpc Automation and Utopia: Human Flourishing in a World Without Work]&lt;br /&gt;
&lt;br /&gt;
==The Grand Tradeoff==&lt;br /&gt;
* 2026-02: Nick Bostrom: [https://nickbostrom.com/optimal.pdf Optimal Timing for Superintelligence: Mundane Considerations for Existing People]&lt;br /&gt;
&lt;br /&gt;
=Plans=&lt;br /&gt;
* [https://www.narrowpath.co/ A Narrow Path: How to Secure our Future]&lt;br /&gt;
* Marius Hobbhahn: [https://www.lesswrong.com/posts/bb5Tnjdrptu89rcyY/what-s-the-short-timeline-plan What’s the short timeline plan?]&lt;br /&gt;
* [https://cfg.eu/building-cern-for-ai/ Building CERN for AI: An institutional blueprint]&lt;br /&gt;
* [https://arxiv.org/abs/2503.05710 AGI, Governments, and Free Societies]&lt;br /&gt;
* [https://controlai.com/ Control AI]: [https://controlai.com/dip The Direct Institutional Plan] &lt;br /&gt;
* Luke Drago and L Rudolf L: [https://lukedrago.substack.com/p/the-use-of-knowledge-in-agi-society?triedRedirect=true The use of knowledge in (AGI) society]: How to build to break the [https://lukedrago.substack.com/p/the-intelligence-curse intelligence curse]&lt;br /&gt;
* [https://www.agisocialcontract.org/ AGI Social Contract]&lt;br /&gt;
** [https://www.agisocialcontract.org/forging-a-new-agi-social-contract Forging A New AGI Social Contract]&lt;br /&gt;
* Yoshua Bengio: [https://time.com/7283507/safer-ai-development/ A Potential Path to Safer AI Development]&lt;br /&gt;
** 2025-02: [https://arxiv.org/abs/2502.15657 Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?]&lt;br /&gt;
* 2026-01: Dario Amodei: [https://www.darioamodei.com/essay/the-adolescence-of-technology The Adolescence of Technology: Confronting and Overcoming the Risks of Powerful AI]&lt;br /&gt;
* 2026-02: Ryan Greenblatt: [https://www.lesswrong.com/posts/vjAM7F8vMZS7oRrrh/how-do-we-more-safely-defer-to-ais How do we (more) safely defer to AIs?]&lt;br /&gt;
&lt;br /&gt;
==Philosophy==&lt;br /&gt;
* [https://danfaggella.com/ Dan Faggella]:&lt;br /&gt;
** 2018-07: [https://danfaggella.com/moral-singularity/ Moral Singularity – Unpredictable Values Bodes Poorly for Humanity]&lt;br /&gt;
** 2025-02: [https://danfaggella.com/bend/ There is No Pause – We Must Bend the Posthuman Trajectory]&lt;br /&gt;
* Joe Carlsmith: 2024: [https://joecarlsmith.com/2024/01/02/otherness-and-control-in-the-age-of-agi Otherness and control in the age of AGI]&lt;br /&gt;
*# [https://joecarlsmith.com/2024/01/02/gentleness-and-the-artificial-other Gentleness and the artificial Other]&lt;br /&gt;
*# [https://joecarlsmith.com/2024/01/04/deep-atheism-and-ai-risk Deep atheism and AI risk]&lt;br /&gt;
*# [https://joecarlsmith.com/2024/01/08/when-yang-goes-wrong When “yang” goes wrong]&lt;br /&gt;
*# [https://joecarlsmith.com/2024/01/09/does-ai-risk-other-the-ais Does AI risk “other” the AIs?]&lt;br /&gt;
*# [https://joecarlsmith.com/2024/01/11/an-even-deeper-atheism An even deeper atheism]&lt;br /&gt;
*# [https://joecarlsmith.com/2024/01/16/being-nicer-than-clippy Being nicer than Clippy]&lt;br /&gt;
*# [https://joecarlsmith.com/2024/01/18/on-the-abolition-of-man On the abolition of man]&lt;br /&gt;
*# [https://joecarlsmith.com/2024/03/21/on-green On green]&lt;br /&gt;
*# [https://joecarlsmith.com/2024/03/25/on-attunement On attunement]&lt;br /&gt;
*# [https://joecarlsmith.com/2024/06/18/loving-a-world-you-dont-trust Loving a world you don’t trust]&lt;br /&gt;
* Anthony Aguirre:&lt;br /&gt;
** 2025-03: [https://keepthefuturehuman.ai/ Keep The Future Human] ([https://x.com/AnthonyNAguirre/status/1898023049930457468 announcement])&lt;br /&gt;
[[Image:GlchEeObwAQ88NK.jpeg|300px]]&lt;br /&gt;
* 2025-04: Scott Alexander (Astral Codex Ten): [https://www.astralcodexten.com/p/the-colors-of-her-coat The Colors Of Her Coat] (response to [https://www.theintrinsicperspective.com/p/welcome-to-the-semantic-apocalypse semantic apocalypse] and semantic satiation)&lt;br /&gt;
* 2025-05: Helen Toner: [https://www.ai-frontiers.org/articles/were-arguing-about-ai-safety-wrong We’re Arguing About AI Safety Wrong]: Dynamism vs. stasis is a clearer lens for criticizing controversial AI safety prescriptions&lt;br /&gt;
* 2025-05: Joe Carlsmith: [https://joecarlsmith.substack.com/p/the-stakes-of-ai-moral-status The stakes of AI moral status]&lt;br /&gt;
&lt;br /&gt;
==Research==&lt;br /&gt;
* 2025-05: [https://www.lesswrong.com/posts/GAv4DRGyDHe2orvwB/gradual-disempowerment-concrete-research-projects Gradual Disempowerment: Concrete Research Projects]&lt;br /&gt;
&lt;br /&gt;
==Alignment==&lt;br /&gt;
* 2023-03: Leopold Aschenbrenner: [https://www.forourposterity.com/nobodys-on-the-ball-on-agi-alignment/ Nobody’s on the ball on AGI alignment]&lt;br /&gt;
* 2024-03: [https://arxiv.org/abs/2404.10636 What are human values, and how do we align AI to them?] ([https://meaningalignment.substack.com/p/0480e023-98c0-4633-a604-990d3ac880ac blog])&lt;br /&gt;
* 2025: Joe Carlsmith: [https://joecarlsmith.substack.com/p/how-do-we-solve-the-alignment-problem How do we solve the alignment problem?] Introduction to an essay series on paths to safe, useful superintelligence&lt;br /&gt;
*# [https://joecarlsmith.substack.com/p/what-is-it-to-solve-the-alignment What is it to solve the alignment problem?] Also: to avoid it? Handle it? Solve it forever? Solve it completely? ([https://joecarlsmithaudio.buzzsprout.com/2034731/episodes/16617671-what-is-it-to-solve-the-alignment-problem audio version])&lt;br /&gt;
*# [https://joecarlsmith.substack.com/p/when-should-we-worry-about-ai-power When should we worry about AI power-seeking?] ([https://joecarlsmithaudio.buzzsprout.com/2034731/episodes/16651469-when-should-we-worry-about-ai-power-seeking audio version])&lt;br /&gt;
*# [https://joecarlsmith.substack.com/p/paths-and-waystations-in-ai-safety Paths and waystations in AI safety] ([https://joecarlsmithaudio.buzzsprout.com/2034731/episodes/16768804-paths-and-waystations-in-ai-safety audio version])&lt;br /&gt;
*# [https://joecarlsmith.substack.com/p/ai-for-ai-safety AI for AI safety] ([https://joecarlsmithaudio.buzzsprout.com/2034731/episodes/16790183-ai-for-ai-safety audio version])&lt;br /&gt;
*# [https://joecarlsmith.substack.com/p/can-we-safely-automate-alignment Can we safely automate alignment research?] ([https://joecarlsmithaudio.buzzsprout.com/2034731/episodes/17069901-can-we-safely-automate-alignment-research audio version], [https://joecarlsmith.substack.com/p/video-and-transcript-of-talk-on-automating?utm_source=post-email-title&amp;amp;publication_id=1022275&amp;amp;post_id=162375391&amp;amp;utm_campaign=email-post-title&amp;amp;isFreemail=true&amp;amp;r=5av1bk&amp;amp;triedRedirect=true&amp;amp;utm_medium=email video version])&lt;br /&gt;
*# [https://joecarlsmith.substack.com/p/giving-ais-safe-motivations?utm_source=post-email-title&amp;amp;publication_id=1022275&amp;amp;post_id=171250683&amp;amp;utm_campaign=email-post-title&amp;amp;isFreemail=true&amp;amp;r=5av1bk&amp;amp;triedRedirect=true&amp;amp;utm_medium=email Giving AIs safe motivations] ([https://joecarlsmithaudio.buzzsprout.com/2034731/episodes/17686921-giving-ais-safe-motivations audio version])&lt;br /&gt;
*# [https://joecarlsmith.com/2025/09/29/controlling-the-options-ais-can-pursue Controlling the options AIs can pursue] ([https://joecarlsmithaudio.buzzsprout.com/2034731/episodes/17909401-controlling-the-options-ais-can-pursue audio version])&lt;br /&gt;
*# [https://joecarlsmith.substack.com/p/how-human-like-do-safe-ai-motivations?utm_source=post-email-title&amp;amp;publication_id=1022275&amp;amp;post_id=178666988&amp;amp;utm_campaign=email-post-title&amp;amp;isFreemail=true&amp;amp;r=5av1bk&amp;amp;triedRedirect=true&amp;amp;utm_medium=email How human-like do safe AI motivations need to be?] ([https://joecarlsmithaudio.buzzsprout.com/2034731/episodes/18175429-how-human-like-do-safe-ai-motivations-need-to-be audio version])&lt;br /&gt;
*# [https://joecarlsmith.substack.com/p/building-ais-that-do-human-like-philosophy Building AIs that do human-like philosophy: AIs will face philosophical questions humans can&amp;#039;t answer for them] ([https://joecarlsmithaudio.buzzsprout.com/2034731/episodes/18591342-building-ais-that-do-human-like-philosophy audio version])&lt;br /&gt;
*# [https://joecarlsmith.substack.com/p/on-restraining-ai-development-for?utm_source=post-email-title&amp;amp;publication_id=1022275&amp;amp;post_id=191385185&amp;amp;utm_campaign=email-post-title&amp;amp;isFreemail=true&amp;amp;r=5av1bk&amp;amp;triedRedirect=true&amp;amp;utm_medium=email On restraining AI development for the sake of safety] ([https://joecarlsmithaudio.buzzsprout.com/2034731/episodes/18869440-on-restraining-ai-development-for-the-sake-of-safety audio version])&lt;br /&gt;
* 2025-04: Dario Amodei: [https://www.darioamodei.com/post/the-urgency-of-interpretability The Urgency of Interpretability]&lt;br /&gt;
&lt;br /&gt;
==Strategic/Technical==&lt;br /&gt;
* 2025-03: [https://resilience.baulab.info/docs/AI_Action_Plan_RFI.pdf AI Dominance Requires Interpretability and Standards for Transparency and Security]&lt;br /&gt;
* 2026-02: [https://www.gap-map.org/capabilities/?sort=bottlenecks Fundamental Development Gap Map v1.0]&lt;br /&gt;
&lt;br /&gt;
==Strategic/Policy==&lt;br /&gt;
* 2015-03: Sam Altman: [https://blog.samaltman.com/machine-intelligence-part-2 Machine intelligence, part 2]&lt;br /&gt;
* 2019-07: Amanda Askell, Miles Brundage, Gillian Hadfield: [https://arxiv.org/abs/1907.04534 The Role of Cooperation in Responsible AI Development]&lt;br /&gt;
* 2025-03: Dan Hendrycks, Eric Schmidt, Alexandr Wang: [https://www.nationalsecurity.ai/ Superintelligence Strategy]&lt;br /&gt;
** [https://www.nationalsecurity.ai/chapter/executive-summary Executive Summary]&lt;br /&gt;
** [https://www.nationalsecurity.ai/chapter/introduction Introduction]&lt;br /&gt;
** [https://www.nationalsecurity.ai/chapter/ai-is-pivotal-for-national-security AI Is Pivotal for National Security]&lt;br /&gt;
** [https://www.nationalsecurity.ai/chapter/deterrence-with-mutual-assured-ai-malfunction-maim Deterrence with Mutual Assured AI Malfunction (MAIM)]&lt;br /&gt;
** [https://www.nationalsecurity.ai/chapter/nonproliferation Nonproliferation]&lt;br /&gt;
** [https://www.nationalsecurity.ai/chapter/competitiveness Competitiveness]&lt;br /&gt;
** [https://www.nationalsecurity.ai/chapter/conclusion Conclusion]&lt;br /&gt;
** [https://www.nationalsecurity.ai/chapter/appendix Appendix FAQs]&lt;br /&gt;
* Anthony Aguirre: [https://keepthefuturehuman.ai/ Keep The Future Human] ([https://keepthefuturehuman.ai/essay/ essay])&lt;br /&gt;
** [https://www.youtube.com/watch?v=zeabrXV8zNE The 4 Rules That Could Stop AI Before It’s Too Late (video)] (2025)&lt;br /&gt;
**# Oversight: Registration required for training &amp;gt;10&amp;lt;sup&amp;gt;25&amp;lt;/sup&amp;gt; FLOP and inference &amp;gt;10&amp;lt;sup&amp;gt;19&amp;lt;/sup&amp;gt; FLOP/s (~1,000 B200 GPUs @ $25M). Build cryptographic licensing into hardware.&lt;br /&gt;
**# Computation Limits: Ban on training models &amp;gt;10&amp;lt;sup&amp;gt;27&amp;lt;/sup&amp;gt; FLOP or inference &amp;gt;10&amp;lt;sup&amp;gt;20&amp;lt;/sup&amp;gt; FLOP/s.&lt;br /&gt;
**# Strict Liability: Hold AI companies responsible for outcomes.&lt;br /&gt;
**# Tiered Regulation: Low regulation on tool-AI, strictest regulation on AGI (general, capable, autonomous systems).&lt;br /&gt;
* 2025-04: [https://x.com/deanwball Dean W. Ball]: [https://arxiv.org/abs/2504.11501 A Framework for the Private Governance of Frontier Artificial Intelligence]&lt;br /&gt;
* 2025-04: Helen Toner: [https://helentoner.substack.com/p/nonproliferation-is-the-wrong-approach?source=queue Nonproliferation is the wrong approach to AI misuse]&lt;br /&gt;
* 2025-04: MIRI: [https://techgov.intelligence.org/research/ai-governance-to-avoid-extinction AI Governance to Avoid Extinction: The Strategic Landscape and Actionable Research Questions]&lt;br /&gt;
* 2025-05: [https://writing.antonleicht.me/p/the-new-ai-policy-frontier The New AI Policy Frontier]: Beyond the shortcomings of centralised control and alignment, a new school of thought on AI governance emerges. It still faces tricky politics.&lt;br /&gt;
* 2025-05: [https://uncpga.world/agi-uncpga-report/ AGI UNCPGA Report]: Governance of the Transition to Artificial General Intelligence (AGI): Urgent Considerations for the UN General Assembly (report for the Council of Presidents of the United Nations General Assembly, UNCPGA)&lt;br /&gt;
* 2025-06: [https://writing.antonleicht.me/p/ai-and-jobs-politics-without-policy AI &amp;amp; Jobs: Politics without Policy] Political support mounts - for a policy platform that does not yet exist&lt;br /&gt;
* 2025-06: [https://x.com/littIeramblings Sarah Hastings-Woodhouse]: [https://drive.google.com/file/d/1mmdHBE6M2yiyL21-ctTuRLNH5xOFjqWm/view Safety Features for a Centralized AGI Project]&lt;br /&gt;
* 2025-07: [https://writing.antonleicht.me/p/a-moving-target A Moving Target] Why we might not be quite ready to comprehensively regulate AI, and why it matters&lt;br /&gt;
* 2025-07: [https://www-cdn.anthropic.com/0dc382a2086f6a054eeb17e8a531bd9625b8e6e5.pdf Anthropic: Build AI in America] ([https://www.anthropic.com/news/build-ai-in-america blog])&lt;br /&gt;
* 2025-12: [https://asi-prevention.com/ How middle powers may prevent the development of artificial superintelligence]&lt;br /&gt;
* 2026-03: [https://humanstatement.org/ The Pro-Human AI Declaration]&lt;br /&gt;
&lt;br /&gt;
==Restriction==&lt;br /&gt;
* 2024-05: OpenAI: [https://openai.com/index/reimagining-secure-infrastructure-for-advanced-ai/ Reimagining secure infrastructure for advanced AI] OpenAI calls for an evolution in infrastructure security to protect advanced AI &lt;br /&gt;
* 2025-07: MIRI: [https://arxiv.org/abs/2507.09801 Technical Requirements for Halting Dangerous AI Activities]&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
* [[AI safety]]&lt;/div&gt;</summary>
		<author><name>KevinYager</name></author>
		
	</entry>
	<entry>
		<id>http://gisaxs.com/index.php?title=AI_predictions&amp;diff=8756</id>
		<title>AI predictions</title>
		<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php?title=AI_predictions&amp;diff=8756"/>
		<updated>2026-03-24T15:52:34Z</updated>

		<summary type="html">&lt;p&gt;KevinYager: /* Job Loss */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Capability Scaling=&lt;br /&gt;
* 2019-03: Rich Sutton: [https://www.cs.utexas.edu/~eunsol/courses/data/bitter_lesson.pdf The Bitter Lesson]&lt;br /&gt;
* 2020-09: Ajeya Cotra: [https://www.lesswrong.com/posts/KrJfoZzpSDpnrv9va/draft-report-on-ai-timelines Draft report on AI timelines]&lt;br /&gt;
* 2022-01: gwern: [https://gwern.net/scaling-hypothesis The Scaling Hypothesis]&lt;br /&gt;
* 2023-05: Richard Ngo: [https://www.lesswrong.com/posts/BoA3agdkAzL6HQtQP/clarifying-and-predicting-agi Clarifying and predicting AGI]&lt;br /&gt;
* 2024-06: Aidan McLaughlin: [https://yellow-apartment-148.notion.site/AI-Search-The-Bitter-er-Lesson-44c11acd27294f4495c3de778cd09c8d AI Search: The Bitter-er Lesson]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.14499 Measuring AI Ability to Complete Long Tasks]&lt;br /&gt;
** 2025-04: [https://peterwildeford.substack.com/p/forecaster-reacts-metrs-bombshell Forecaster reacts: METR&amp;#039;s bombshell paper about AI acceleration] New data supports an exponential AI curve, but lots of uncertainty remains&lt;br /&gt;
** 2025-04: AI Digest: [https://theaidigest.org/time-horizons A new Moore&amp;#039;s Law for AI agents]&lt;br /&gt;
[[Image:GmZHL8xWQAAtFlF.jpeg|450px]]&lt;br /&gt;
* 2025-04: [https://epoch.ai/blog/trends-in-ai-supercomputers Trends in AI Supercomputers] ([https://arxiv.org/abs/2504.16026 preprint])&lt;br /&gt;
* [https://ai-timeline.org/ The Road to AGI] (timeline visualization)&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.09677 The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs]&lt;br /&gt;
* 2025-09: [https://www.julian.ac/blog/2025/09/27/failing-to-understand-the-exponential-again/ Failing to Understand the Exponential, Again]&lt;br /&gt;
* 2026-02: Ryan Greenblatt: [https://www.lesswrong.com/posts/rRbDNQLfihiHbXytf/distinguish-between-inference-scaling-and-larger-tasks-use Distinguish between inference scaling and &amp;quot;larger tasks use more compute&amp;quot;]&lt;br /&gt;
* 2026-03: [https://arxiv.org/abs/2603.03992 Measuring AI R&amp;amp;D Automation] ([https://astrangeattractor.substack.com/p/measuring-ai-r-and-d-automation?triedRedirect=true blog])&lt;br /&gt;
&lt;br /&gt;
==Scaling Laws==&lt;br /&gt;
See: [[AI_understanding#Scaling_Laws|Scaling Laws]]&lt;br /&gt;
&lt;br /&gt;
==AGI Achievable==&lt;br /&gt;
* Yoshua Bengio: [https://arxiv.org/abs/2310.17688 Managing extreme AI risks amid rapid progress]&lt;br /&gt;
* Leopold Aschenbrenner: [https://situational-awareness.ai/from-gpt-4-to-agi/#Counting_the_OOMs Situational Awareness: Counting the OOMs]&lt;br /&gt;
* Richard Ngo: [https://medium.com/@richardcngo/visualizing-the-deep-learning-revolution-722098eb9c5 Visualizing the deep learning revolution]&lt;br /&gt;
* Katja Grace: [https://blog.aiimpacts.org/p/2023-ai-survey-of-2778-six-things Survey of 2,778 AI authors: six parts in pictures]&lt;br /&gt;
* Epoch AI: [https://epoch.ai/trends Machine Learning Trends]&lt;br /&gt;
* AI Digest: [https://theaidigest.org/progress-and-dangers How fast is AI improving?]&lt;br /&gt;
* 2025-06: [https://80000hours.org/agi/guide/when-will-agi-arrive/ The case for AGI by 2030]&lt;br /&gt;
&lt;br /&gt;
==AGI Definition==&lt;br /&gt;
* 2023-11: Allan Dafoe, Shane Legg, et al.: [https://arxiv.org/abs/2311.02462 Levels of AGI for Operationalizing Progress on the Path to AGI]&lt;br /&gt;
* 2024-04: Bowen Xu: [https://arxiv.org/abs/2404.10731 What is Meant by AGI? On the Definition of Artificial General Intelligence]&lt;br /&gt;
* 2025-10: Dan Hendrycks et al.: [https://www.agidefinition.ai/paper.pdf A Definition of AGI]&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.07364 On the universal definition of intelligence]&lt;br /&gt;
&lt;br /&gt;
==Recursive Self Improvement (RSI)==&lt;br /&gt;
* 2026-02: [https://80000hours.org/articles/how-ai-driven-feedback-loops-could-make-things-very-crazy-very-fast/ How AI-driven feedback loops could make things very crazy, very fast]&lt;br /&gt;
&lt;br /&gt;
==Progress Models==&lt;br /&gt;
From [http://yager-research.ca/2025/04/ai-impact-predictions/ AI Impact Predictions]:&lt;br /&gt;
&lt;br /&gt;
[[Image:AI impact models-2025 11 24.png|450px]]&lt;br /&gt;
&lt;br /&gt;
=Economic and Political=&lt;br /&gt;
* 2019-11: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3482150 The Impact of Artificial Intelligence on the Labor Market]&lt;br /&gt;
* 2020-06: [https://www.openphilanthropy.org/research/modeling-the-human-trajectory/ Modeling the Human Trajectory] (GDP)&lt;br /&gt;
* 2021-06: [https://www.openphilanthropy.org/research/report-on-whether-ai-could-drive-explosive-economic-growth/ Report on Whether AI Could Drive Explosive Economic Growth]&lt;br /&gt;
* 2023-10: Marc Andreessen: [https://a16z.com/the-techno-optimist-manifesto/ The Techno-Optimist Manifesto]&lt;br /&gt;
* 2023-12: [https://vitalik.eth.limo/general/2023/11/27/techno_optimism.html My techno-optimism]: &amp;quot;defensive acceleration&amp;quot; ([https://vitalik.eth.limo/index.html Vitalik Buterin])&lt;br /&gt;
* 2024-03: Noah Smith: [https://www.noahpinion.blog/p/plentiful-high-paying-jobs-in-the Plentiful, high-paying jobs in the age of AI: Comparative advantage is very subtle, but incredibly powerful.] ([https://x.com/liron/status/1768013030741475485 video])&lt;br /&gt;
* 2024-03: [https://doi.org/10.3386/w32255 Scenarios for the Transition to AGI] (AGI leads to wage collapse)&lt;br /&gt;
* 2024-06: [https://situational-awareness.ai/wp-content/uploads/2024/06/situationalawareness.pdf Situational Awareness] ([https://www.forourposterity.com/ Leopold Aschenbrenner]) - [https://www.lesswrong.com/posts/nP5FFYFjtY8LgWymt/quotes-from-leopold-aschenbrenner-s-situational-awareness select quotes], [https://www.youtube.com/watch?v=zdbVtZIn9IM podcast], [https://danielmiessler.com/p/podcast-summary-dwarkesh-vs-leopold-aschenbrenner text summary of podcast]&lt;br /&gt;
* 2024-06: [https://www.frbsf.org/wp-content/uploads/AI-and-Growth-Aghion-Bunel.pdf AI and Growth: Where Do We Stand?]&lt;br /&gt;
* 2024-09: OpenAI [https://cdn.openai.com/global-affairs/openai-infra-economics-10.09.24.pdf Infrastructure is Destiny: Economic Returns on US Investment in Democratic AI]&lt;br /&gt;
* 2024-12: [https://www.lesswrong.com/posts/KFFaKu27FNugCHFmh/by-default-capital-will-matter-more-than-ever-after-agi By default, capital will matter more than ever after AGI] (L Rudolf L)&lt;br /&gt;
* 2025-01: [https://lukedrago.substack.com/p/the-intelligence-curse The Intelligence Curse]: With AGI, powerful actors will lose their incentives to invest in people&lt;br /&gt;
** Updated 2025-04: [https://intelligence-curse.ai/ The Intelligence Curse] (Luke Drago and Rudolf Laine)&lt;br /&gt;
*** [https://intelligence-curse.ai/pyramid/ Pyramid Replacement]&lt;br /&gt;
*** [https://intelligence-curse.ai/capital/ Capital, AGI, and Human Ambition]&lt;br /&gt;
*** [https://intelligence-curse.ai/defining/ Defining the Intelligence Curse]&lt;br /&gt;
*** [https://intelligence-curse.ai/shaping/ Shaping the Social Contract]&lt;br /&gt;
*** [https://intelligence-curse.ai/breaking/ Breaking the Intelligence Curse]&lt;br /&gt;
*** [https://intelligence-curse.ai/history/ History is Yours to Write]&lt;br /&gt;
* 2025-01: Microsoft: [https://blogs.microsoft.com/on-the-issues/2025/01/03/the-golden-opportunity-for-american-ai/ The Golden Opportunity for American AI]&lt;br /&gt;
* 2025-01: [https://www.maximum-progress.com/p/agi-will-not-make-labor-worthless AGI Will Not Make Labor Worthless]&lt;br /&gt;
* 2025-01: [https://cdn.openai.com/global-affairs/ai-in-america-oai-economic-blueprint-20250113.pdf AI in America: OpenAI&amp;#039;s Economic Blueprint] ([https://openai.com/global-affairs/openais-economic-blueprint/ blog])&lt;br /&gt;
* 2025-01: [https://inferencemagazine.substack.com/p/how-much-economic-growth-from-ai How much economic growth from AI should we expect, how soon?]&lt;br /&gt;
* 2025-02: Morgan Stanley: [https://advisor.morganstanley.com/john.howard/documents/field/j/jo/john-howard/The_Humanoid_100_-_Mapping_the_Humanoid_Robot_Value_Chain.pdf The Humanoid 100: Mapping the Humanoid Robot Value Chain]&lt;br /&gt;
* 2025-02: [https://www.anthropic.com/news/the-anthropic-economic-index The Anthropic Economic Index]: [https://assets.anthropic.com/m/2e23255f1e84ca97/original/Economic_Tasks_AI_Paper.pdf Which Economic Tasks are Performed with AI? Evidence from Millions of Claude Conversations]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.11264 Strategic Wealth Accumulation Under Transformative AI Expectations]&lt;br /&gt;
* 2025-02: Tyler Cowen: [https://marginalrevolution.com/marginalrevolution/2025/02/why-i-think-ai-take-off-is-relatively-slow.html Why I think AI take-off is relatively slow]&lt;br /&gt;
* 2025-03: Epoch AI: [https://epoch.ai/gradient-updates/most-ai-value-will-come-from-broad-automation-not-from-r-d Most AI value will come from broad automation, not from R&amp;amp;D]&lt;br /&gt;
** The primary economic impact of AI will be its ability to broadly automate labor&lt;br /&gt;
** Automating AI R&amp;amp;D alone likely won’t dramatically accelerate AI progress&lt;br /&gt;
** Fully automating R&amp;amp;D requires a very broad set of abilities&lt;br /&gt;
** AI takeoff will likely be diffuse and salient&lt;br /&gt;
* 2025-03: [https://www.anthropic.com/news/anthropic-economic-index-insights-from-claude-sonnet-3-7 Anthropic Economic Index: Insights from Claude 3.7 Sonnet]&lt;br /&gt;
* 2025-04: [https://inferencemagazine.substack.com/p/will-there-be-extreme-inequality Will there be extreme inequality from AI?]&lt;br /&gt;
* 2025-04: [https://www.anthropic.com/research/impact-software-development Anthropic Economic Index: AI’s Impact on Software Development]&lt;br /&gt;
* 2025-05: [https://www.theguardian.com/books/2025/may/04/the-big-idea-can-we-stop-ai-making-humans-obsolete Better at everything: how AI could make human beings irrelevant]&lt;br /&gt;
* 2025-05: Forethought: [https://www.forethought.org/research/the-industrial-explosion The Industrial Explosion]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.20273 Ten Principles of AI Agent Economics]&lt;br /&gt;
* 2025-07: [https://substack.com/home/post/p-167879696 What Economists Get Wrong about AI] They ignore innovation effects, use outdated capability assumptions, and miss the robotics revolution&lt;br /&gt;
* 2025-07: [https://www.nber.org/books-and-chapters/economics-transformative-ai/we-wont-be-missed-work-and-growth-era-agi We Won&amp;#039;t Be Missed: Work and Growth in the Era of AGI]&lt;br /&gt;
* 2025-07: [https://www.nber.org/papers/w34034 The Economics of Bicycles for the Mind]&lt;br /&gt;
* 2025-09: [https://conference.nber.org/conf_papers/f227491.pdf Genius on Demand: The Value of Transformative Artificial Intelligence]&lt;br /&gt;
* 2025-10: [https://peterwildeford.substack.com/p/ai-is-probably-not-a-bubble AI is probably not a bubble: AI companies have revenue, demand, and paths to immense value]&lt;br /&gt;
* 2025-11: [https://windowsontheory.org/2025/11/04/thoughts-by-a-non-economist-on-ai-and-economics/ Thoughts by a non-economist on AI and economics]&lt;br /&gt;
* 2025-11: [https://www.nber.org/papers/w34444 Artificial Intelligence, Competition, and Welfare]&lt;br /&gt;
* 2025-11: [https://www.anthropic.com/research/estimating-productivity-gains Estimating AI productivity gains from Claude conversations] (Anthropic)&lt;br /&gt;
* 2025-12: [https://benjamintodd.substack.com/p/how-ai-driven-feedback-loops-could How AI-driven feedback loops could make things very crazy, very fast]&lt;br /&gt;
* 2025-12: [https://philiptrammell.com/static/Existential_Risk_and_Growth.pdf Existential Risk and Growth] (Philip Trammell and Leopold Aschenbrenner)&lt;br /&gt;
* 2026-01: [https://www.anthropic.com/research/anthropic-economic-index-january-2026-report Anthropic Economic Index: new building blocks for understanding AI use]&lt;br /&gt;
* 2026-01: [https://www.anthropic.com/research/economic-index-primitives Anthropic Economic Index report: economic primitives]&lt;br /&gt;
* 2026-02: Nate Silver: [https://www.natesilver.net/p/the-singularity-wont-be-gentle The singularity won&amp;#039;t be gentle: If AI is even half as transformational as Silicon Valley assumes, politics will never be the same again]&lt;br /&gt;
&lt;br /&gt;
==Job Loss==&lt;br /&gt;
* 2023-03: [https://arxiv.org/pdf/2303.10130 GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models]&lt;br /&gt;
** 2023-03: [https://www.livemint.com/news/world/these-jobs-are-most-at-risk-due-to-chatgpt-as-per-openai-study-11679358453267.html These jobs are most at risk due to ChatGPT, as per OpenAI study]&lt;br /&gt;
* 2023-08: [https://dx.doi.org/10.2139/ssrn.4527336 The Short-Term Effects of Generative Artificial Intelligence on Employment: Evidence from an Online Labor Market]&lt;br /&gt;
** [https://x.com/jburnmurdoch/status/1722938749519077688 Freelancer sector shrinking]&lt;br /&gt;
[[Image:F-kVQuvWkAAemkr.png|400px]]&lt;br /&gt;
* 2023-09: [https://global-uploads.webflow.com/64d5f73a7fc5e8a240310c4d/650a128a34386a1206b6506c_FINAL%20Briefing%20-%20Adoption%20of%20Automation%20and%20AI%20in%20the%20UK.pdf What drives UK firms to adopt AI and robotics, and what are the consequences for jobs?]&lt;br /&gt;
** [https://www.digitalinformationworld.com/2023/09/78-of-companies-say-ai-created-more-jobs.html 78% of Companies Say AI Created More Jobs]&lt;br /&gt;
* 2023-11: [https://theaipi.org/ai-interactive-map/ New Analysis Shows Over 20% of US Jobs Significantly Exposed to AI Automation In the Near Future]&lt;br /&gt;
* 2024-01: [https://techcrunch.com/2024/01/09/duolingo-cut-10-of-its-contractor-workforce-as-the-company-embraces-ai/ Duolingo cuts 10% of its contractor workforce as the company embraces AI]&lt;br /&gt;
* 2024-02: [https://www.pwc.com/gx/en/issues/c-suite-insights/the-leadership-agenda/gen-ai-is-a-tool-for-growth-not-just-efficiency.html#:~:text=One%20out%20of%20every%20four%20of%20the%204%2C702,to%20accomplish%20the%20same%20tasks%20with%20fewer%20workers Gen AI is a tool for growth, not just efficiency: Tech CEOs are investing to build their workforce and capitalise on new opportunities from generative AI. That’s a sharp contrast to how their peers view it.]&lt;br /&gt;
* 2024-04: [https://www.nytimes.com/2024/04/10/business/investment-banking-jobs-artificial-intelligence.html AI is Poised to Replace the Entry-Level Grunt Work of a Wall Street Career]&lt;br /&gt;
* 2024-07: [https://www.wired.com/story/ai-is-already-taking-jobs-in-the-video-game-industry/ AI Is Already Taking Jobs in the Video Game Industry]: A WIRED investigation finds that major players like Activision Blizzard, which recently laid off scores of workers, are using generative AI for game development&lt;br /&gt;
* 2024-08: [https://www.bbc.com/news/articles/c80e1gp9m9zo Klarna: AI lets us cut thousands of jobs - but pay more]&lt;br /&gt;
* 2025-01: [https://scholarspace.manoa.hawaii.edu/server/api/core/bitstreams/4f39375d-59c2-4c4a-b394-f3eed7858c80/content AI and Freelancers: Has the Inflection Point Arrived?]&lt;br /&gt;
* 2025-01: [https://www.aporiamagazine.com/p/yes-youre-going-to-be-replaced Yes, you&amp;#039;re going to be replaced: So much cope about AI]&lt;br /&gt;
* 2025-03: [https://commonplace.org/2025/03/20/will-ai-automate-away-your-job/ Will AI Automate Away Your Job? The time-horizon model explains the future of the technology]&lt;br /&gt;
* 2025-05: [https://www.forbes.com/sites/jackkelly/2025/05/04/its-time-to-get-concerned-klarna-ups-duolingo-cisco-and-many-other-companies-are-replacing-workers-with-ai/ It’s Time To Get Concerned, Klarna, UPS, Duolingo, Cisco, And Many Other Companies Are Replacing Workers With AI]&lt;br /&gt;
* 2025-05: [https://time.com/7289692/when-ai-replaces-workers/ What Happens When AI Replaces Workers?]&lt;br /&gt;
* 2025-05: [https://www.oxfordeconomics.com/resource/educated-but-unemployed-a-rising-reality-for-us-college-grads/ Educated but unemployed, a rising reality for US college grads] Structural shifts in tech hiring and the growing impact of AI are driving higher unemployment among recent college graduates&lt;br /&gt;
* 2025-05: NY Times: [https://www.nytimes.com/2025/05/30/technology/ai-jobs-college-graduates.html?unlocked_article_code=1.LE8.LlC6.eT5XcpA9hxC2&amp;amp;smid=url-share For Some Recent Graduates, the A.I. Job Apocalypse May Already Be Here] The unemployment rate for recent college graduates has jumped as companies try to replace entry-level workers with artificial intelligence&lt;br /&gt;
* 2025-06: [https://80000hours.org/agi/guide/skills-ai-makes-valuable/ How not to lose your job to AI] The skills AI will make more valuable (and how to learn them)&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.06576 Future of Work with AI Agents: Auditing Automation and Augmentation Potential across the U.S. Workforce]&lt;br /&gt;
[[Image:0dab4c86-882d-4095-9d12-d19684ed5184 675x680.png|300px]]&lt;br /&gt;
* 2025-07: Harvard Business Review: [https://hbr.org/2025/06/what-gets-measured-ai-will-automate What Gets Measured, AI Will Automate]&lt;br /&gt;
* 2025-08: [https://digitaleconomy.stanford.edu/publications/canaries-in-the-coal-mine/ Canaries in the Coal Mine? Six Facts about the Recent Employment Effects of Artificial Intelligence]&lt;br /&gt;
* 2025-10: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5560401 Performance or Principle: Resistance to Artificial Intelligence in the U.S. Labor Market]&lt;br /&gt;
* 2025-10: [https://www.siliconcontinent.com/p/the-ai-becker-problem The AI Becker problem: Who will train the next generation?]&lt;br /&gt;
* 2026-01: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6134506 AI, Automation, and Expertise]&lt;br /&gt;
* 2026-02: [https://arachnemag.substack.com/p/the-jevons-paradox-for-intelligence The Jevons Paradox for Intelligence: Fears of AI-induced job loss could not be more wrong]&lt;br /&gt;
* 2026-03: [https://www.dropbox.com/scl/fo/689u1g785x8jp6c8v1s21/AKxZ_N15vUxMA3PBtpbr5nM?dl=0&amp;amp;e=1&amp;amp;preview=2026.03.24+Bundles.pdf&amp;amp;rlkey=ottgcu71u1t4mhn6tblvatu8w&amp;amp;st=dj6k0x2o Weak Bundle, Strong Bundle: How AI Redraws Job Boundaries]&lt;br /&gt;
&lt;br /&gt;
==Productivity Impact==&lt;br /&gt;
* 2025-05: [https://www.nber.org/papers/w33777 Large Language Models, Small Labor Market Effects]&lt;br /&gt;
** Significant uptake, but very little economic impact so far&lt;br /&gt;
* 2026-02: [https://www.ft.com/content/4b51d0b4-bbfe-4f05-b50a-1d485d419dc5 The AI productivity take-off is finally visible] ([https://x.com/erikbryn/status/2023075588974735869?s=20 Erik Brynjolfsson])&lt;br /&gt;
** Businesses are finally beginning to reap some of AI&amp;#039;s benefits.&lt;br /&gt;
* 2026-02: New York Times: [https://www.nytimes.com/2026/02/18/opinion/ai-software.html The A.I. Disruption We’ve Been Waiting for Has Arrived]&lt;br /&gt;
&lt;br /&gt;
==National Security==&lt;br /&gt;
* 2025-04: Jeremie Harris and Edouard Harris: [https://superintelligence.gladstone.ai/ America’s Superintelligence Project]&lt;br /&gt;
&lt;br /&gt;
==AI Manhattan Project==&lt;br /&gt;
* 2024-06: [https://situational-awareness.ai/wp-content/uploads/2024/06/situationalawareness.pdf Situational Awareness] ([https://www.forourposterity.com/ Leopold Aschenbrenner]) - [https://www.lesswrong.com/posts/nP5FFYFjtY8LgWymt/quotes-from-leopold-aschenbrenner-s-situational-awareness select quotes], [https://www.youtube.com/watch?v=zdbVtZIn9IM podcast], [https://danielmiessler.com/p/podcast-summary-dwarkesh-vs-leopold-aschenbrenner text summary of podcast]&lt;br /&gt;
* 2024-10: [https://thezvi.substack.com/p/ai-88-thanks-for-the-memos?open=false#%C2%A7thanks-for-the-memos-introduction-and-competitiveness White House Memo calls for action on AI]&lt;br /&gt;
* 2024-11: [https://www.uscc.gov/annual-report/2024-annual-report-congress 2024 Annual Report to Congress]: [https://www.reuters.com/technology/artificial-intelligence/us-government-commission-pushes-manhattan-project-style-ai-initiative-2024-11-19/ calls] for &amp;quot;Manhattan Project-style&amp;quot; effort&lt;br /&gt;
* 2025-05-29: [https://x.com/ENERGY/status/1928085878561272223 DoE Tweet]: &amp;quot;AI is the next Manhattan Project, and THE UNITED STATES WILL WIN. 🇺🇸&amp;quot;&lt;br /&gt;
* 2025-07: [https://epoch.ai/gradient-updates/how-big-could-an-ai-manhattan-project-get How big could an “AI Manhattan Project” get?]&lt;br /&gt;
&lt;br /&gt;
=Near-term=&lt;br /&gt;
* 2021-08: Daniel Kokotajlo: [https://www.lesswrong.com/posts/6Xgy6CAf2jqHhynHL/what-2026-looks-like What 2026 looks like]&lt;br /&gt;
* 2025-02: Sam Altman: [https://blog.samaltman.com/three-observations Three Observations]&lt;br /&gt;
*# The intelligence of an AI model roughly equals the log of the resources used to train and run it.&lt;br /&gt;
*# The cost to use a given level of AI falls about 10x every 12 months, and lower prices lead to much more use&lt;br /&gt;
*# The socioeconomic value of linearly increasing intelligence is super-exponential in nature&lt;br /&gt;
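The first two observations above can be sketched numerically (a toy illustration of the stated rates; the constants and functional forms are placeholder assumptions, not measured values):

```python
import math

def cost_per_capability_level(months_elapsed, initial_cost=1.0):
    """Cost to use a fixed AI capability level, assuming ~10x decline every 12 months."""
    return initial_cost * 10 ** (-months_elapsed / 12)

def intelligence(training_resources, k=1.0):
    """Toy form of 'intelligence ~ log of resources'; k is an arbitrary scale factor."""
    return k * math.log10(training_resources)

# After 24 months, a given capability level costs ~100x less:
print(cost_per_capability_level(24))             # 0.01
# 100x more training resources adds a constant increment of 'intelligence':
print(intelligence(1e12) - intelligence(1e10))   # 2.0
```

Under these assumptions, linear gains in capability require exponential growth in resources, which is why the third observation (super-exponential value of linear intelligence gains) matters economically.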
* 2025-03: [https://www.pathwaysai.org/p/glimpses-of-ai-progess Glimpses of AI Progress: Mental models for fast times]&lt;br /&gt;
* 2025-03: [https://www.nature.com/articles/s41598-025-92190-7 Navigating artificial general intelligence development: societal, technological, ethical, and brain-inspired pathways]&lt;br /&gt;
* 2025-04: Daniel Kokotajlo, Scott Alexander, Thomas Larsen, Eli Lifland, Romeo Dean: [https://ai-2027.com/ AI 2027] ([https://ai-2027.com/scenario.pdf pdf])&lt;br /&gt;
** 2025-07: Video: [https://www.youtube.com/watch?v=5KVDDfAkRgc Are We 3 Years From AI Disaster? A Rigorous Forecast]&lt;br /&gt;
* 2025-04: Stanford HAI: [https://hai-production.s3.amazonaws.com/files/hai_ai_index_report_2025.pdf Artificial Intelligence Index Report 2025]&lt;br /&gt;
* 2025-04: Arvind Narayanan and Sayash Kapoor: [https://kfai-documents.s3.amazonaws.com/documents/c3cac5a2a7/AI-as-Normal-Technology---Narayanan---Kapoor.pdf AI as Normal Technology]&lt;br /&gt;
* 2025-04: Dwarkesh Patel: [https://www.dwarkesh.com/p/questions-about-ai Questions about the Future of AI]&lt;br /&gt;
* 2025-05: [https://www.bondcap.com/report/pdf/Trends_Artificial_Intelligence.pdf Trends – Artificial Intelligence]&lt;br /&gt;
* 2025-06: IdeaFoundry: [https://ideafoundry.substack.com/p/evolution-vs-extinction-the-choice Evolution vs. Extinction: The Choice is Ours] The next 18 months will decide whether AI ends us or evolves us&lt;br /&gt;
* 2025-07: [https://cfg.eu/advanced-ai-possible-futures/ Advanced AI: Possible futures] Five scenarios for how the AI-transition could unfold&lt;br /&gt;
* 2025-11: [https://android-dreams.ai/ Android Dreams]&lt;br /&gt;
* 2026-02: [https://www.citriniresearch.com/ Citrini]: [https://www.citriniresearch.com/p/2028gic The 2028 Global Intelligence Crisis: A Thought Exercise in Financial History, from the Future]&lt;br /&gt;
&lt;br /&gt;
==Insightful Analysis of Current State==&lt;br /&gt;
* 2025-11: Andy Masley: [https://andymasley.substack.com/p/the-lump-of-cognition-fallacy The lump of cognition fallacy: The extended mind as the advance of civilization]&lt;br /&gt;
* 2026-02: Eric Jang: [https://evjang.com/2026/02/04/rocks.html As Rocks May Think]&lt;br /&gt;
* 2026-02: Matt Shumer: [https://x.com/mattshumer_/status/2021256989876109403 Something Big Is Happening]&lt;br /&gt;
* 2026-02: Minh Pham: [https://x.com/buckeyevn/status/2014171253045960803?s=20 Why Most Agent Harnesses Are Not Bitter Lesson Pilled]&lt;br /&gt;
&lt;br /&gt;
=Overall=&lt;br /&gt;
* 1993: [https://en.wikipedia.org/wiki/Vernor_Vinge Vernor Vinge]: [https://edoras.sdsu.edu/~vinge/misc/singularity.html The Coming Technological Singularity: How to Survive in the Post-Human Era]&lt;br /&gt;
* 2025-03: Kevin Roose (New York Times): [https://www.nytimes.com/2025/03/14/technology/why-im-feeling-the-agi.html?unlocked_article_code=1.304.TIEy.SmNhKYO4e9c7&amp;amp;smid=url-share Powerful A.I. Is Coming. We’re Not Ready.] Three arguments for taking progress toward artificial general intelligence, or A.G.I., more seriously — whether you’re an optimist or a pessimist.&lt;br /&gt;
* 2025-03: Nicholas Carlini: [https://nicholas.carlini.com/writing/2025/thoughts-on-future-ai.html My Thoughts on the Future of &amp;quot;AI&amp;quot;]: &amp;quot;I have very wide error bars on the potential future of large language models, and I think you should too.&amp;quot;&lt;br /&gt;
* 2025-06: Sam Altman: [https://blog.samaltman.com/the-gentle-singularity The Gentle Singularity]&lt;br /&gt;
&lt;br /&gt;
==Surveys of Opinions/Predictions==&lt;br /&gt;
* 2016-06: [https://aiimpacts.org/2016-expert-survey-on-progress-in-ai/ 2016 Expert Survey on Progress in AI]&lt;br /&gt;
** 2023-03: [https://aiimpacts.org/scoring-forecasts-from-the-2016-expert-survey-on-progress-in-ai/ Scoring forecasts from the 2016 “Expert Survey on Progress in AI”]&lt;br /&gt;
* 2022-10: Forecasting Research Institute: [https://forecastingresearch.org/near-term-xpt-accuracy Assessing Near-Term Accuracy in the Existential Risk Persuasion Tournament]&lt;br /&gt;
** 2025-09: Ethan Mollick: [https://x.com/emollick/status/1962859757674344823 Progress is ahead of expectations]&lt;br /&gt;
* 2023-08: [https://wiki.aiimpacts.org/ai_timelines/predictions_of_human-level_ai_timelines/ai_timeline_surveys/2023_expert_survey_on_progress_in_ai 2023 Expert Survey on Progress in AI]&lt;br /&gt;
* 2024-01: [https://arxiv.org/abs/2401.02843 Thousands of AI Authors on the Future of AI]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.14870 Why do Experts Disagree on Existential Risk and P(doom)? A Survey of AI Experts]&lt;br /&gt;
* 2025-02: Nicholas Carlini: [https://nicholas.carlini.com/writing/2025/forecasting-ai-2025-update.html AI forecasting retrospective: you&amp;#039;re (probably) over-confident]&lt;br /&gt;
* 2025-04: Helen Toner: [https://helentoner.substack.com/p/long-timelines-to-advanced-ai-have &amp;quot;Long&amp;quot; timelines to advanced AI have gotten crazy short]&lt;br /&gt;
* 2025-05: [https://theaidigest.org/ai2025-analysis-may AI 2025 Forecasts - May Update]&lt;br /&gt;
* 2026-02: [https://www.nature.com/articles/s41598-026-39070-w Lay beliefs about the badness, likelihood, and importance of human extinction]&lt;br /&gt;
&lt;br /&gt;
==Bad Outcomes==&lt;br /&gt;
* [https://pauseai.info/pdoom List of p(doom) values]&lt;br /&gt;
* 2019-03: [https://www.alignmentforum.org/posts/HBxe6wdjxK239zajf/what-failure-looks-like What failure looks like]&lt;br /&gt;
* 2023-03: gwern: [https://gwern.net/fiction/clippy It Looks Like You’re Trying To Take Over The World]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.16946 Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development] ([https://gradual-disempowerment.ai/ web version])&lt;br /&gt;
** 2025-02: [https://thezvi.substack.com/p/the-risk-of-gradual-disempowerment The Risk of Gradual Disempowerment from AI]&lt;br /&gt;
** 2025-05: [https://www.lesswrong.com/posts/GAv4DRGyDHe2orvwB/gradual-disempowerment-concrete-research-projects Gradual Disempowerment: Concrete Research Projects]&lt;br /&gt;
* 2025-04: Daniel Kokotajlo, Scott Alexander, Thomas Larsen, Eli Lifland, Romeo Dean: [https://ai-2027.com/ AI 2027] ([https://ai-2027.com/scenario.pdf pdf])&lt;br /&gt;
* 2025-04: [https://www.forethought.org/research/ai-enabled-coups-how-a-small-group-could-use-ai-to-seize-power AI-Enabled Coups: How a Small Group Could Use AI to Seize Power]&lt;br /&gt;
* 2025-09: [https://doctrines.ai/ The three main doctrines on the future of AI]&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Dominance doctrine:&amp;#039;&amp;#039;&amp;#039; First actor to create advanced AI will attain overwhelming strategic superiority&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Extinction doctrine:&amp;#039;&amp;#039;&amp;#039; Humanity will lose control of ASI, leading to extinction or permanent disempowerment&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Replacement doctrine:&amp;#039;&amp;#039;&amp;#039; AI will automate human tasks, but without fundamentally reshaping or ending civilization&lt;br /&gt;
* 2025-09: Sean ÓhÉigeartaigh: [https://www.cambridge.org/core/journals/cambridge-prisms-extinction/article/extinction-of-the-human-species-what-could-cause-it-and-how-likely-is-it-to-occur/D8816A79BEF5A4C30A3E44FD8D768622 Extinction of the human species: What could cause it and how likely is it to occur?]&lt;br /&gt;
&lt;br /&gt;
==Intelligence Explosion==&lt;br /&gt;
* 2023-06: [https://www.openphilanthropy.org/research/what-a-compute-centric-framework-says-about-takeoff-speeds/ What a Compute-Centric Framework Says About Takeoff Speeds]&lt;br /&gt;
** [https://takeoffspeeds.com/ takeoffspeeds.com simulator]&lt;br /&gt;
* 2025-02: [https://www.forethought.org/research/three-types-of-intelligence-explosion Three Types of Intelligence Explosion]&lt;br /&gt;
* 2025-03: Future of Life Institute: [https://futureoflife.org/ai/are-we-close-to-an-intelligence-explosion/ Are we close to an intelligence explosion?] AIs are inching ever-closer to a critical threshold. Beyond this threshold lie great risks—but crossing it is not inevitable.&lt;br /&gt;
* 2025-03: Forethought: [https://www.forethought.org/research/will-ai-r-and-d-automation-cause-a-software-intelligence-explosion Will AI R&amp;amp;D Automation Cause a Software Intelligence Explosion?]&lt;br /&gt;
[[Image:Gm-1jugbYAAtq Y.jpeg|450px]]&lt;br /&gt;
* 2025-05: [https://www.thelastinvention.ai/ The Last Invention] Why Humanity’s Final Creation Changes Everything&lt;br /&gt;
* 2025-08: [https://www.forethought.org/research/how-quick-and-big-would-a-software-intelligence-explosion-be How quick and big would a software intelligence explosion be?]&lt;br /&gt;
&lt;br /&gt;
==Superintelligence==&lt;br /&gt;
* 2024-10: [http://yager-research.ca/2024/10/how-smart-will-asi-be/ How Smart will ASI be?]&lt;br /&gt;
* 2024-11: [http://yager-research.ca/2024/11/concise-argument-for-asi-risk/ Concise Argument for ASI Risk]&lt;br /&gt;
* 2025-03: [https://dynomight.net/smart/ Limits of smart]&lt;br /&gt;
* 2025-05: [https://timfduffy.substack.com/p/the-limits-of-superintelligence?manualredirect= The Limits of Superintelligence]&lt;br /&gt;
&lt;br /&gt;
==Long-range/Philosophy==&lt;br /&gt;
* 2023-03: Dan Hendrycks: [https://arxiv.org/abs/2303.16200 Natural Selection Favors AIs over Humans]&lt;br /&gt;
&lt;br /&gt;
=Psychology=&lt;br /&gt;
* 2025-01: [https://longerramblings.substack.com/p/a-defence-of-slowness-at-the-end A defence of slowness at the end of the world]&lt;br /&gt;
&lt;br /&gt;
=Positives &amp;amp; Optimism=&lt;br /&gt;
==Science &amp;amp; Technology Improvements==&lt;br /&gt;
* 2023-05: [https://www.planned-obsolescence.org/author/kelsey/ Kelsey Piper]: [https://www.planned-obsolescence.org/the-costs-of-caution/ The costs of caution]&lt;br /&gt;
* 2024-09: Sam Altman: [https://ia.samaltman.com/ The Intelligence Age]&lt;br /&gt;
* 2024-10: Dario Amodei: [https://darioamodei.com/machines-of-loving-grace Machines of Loving Grace]&lt;br /&gt;
* 2024-11: Google DeepMind: [https://www.aipolicyperspectives.com/p/a-new-golden-age-of-discovery A new golden age of discovery]&lt;br /&gt;
* 2025-03: [https://finmoorhouse.com/ Fin Moorhouse], [https://www.williammacaskill.com/ Will MacAskill]: [https://www.forethought.org/research/preparing-for-the-intelligence-explosion Preparing for the Intelligence Explosion]&lt;br /&gt;
&lt;br /&gt;
==Social==&lt;br /&gt;
* 2025-09: [https://blog.cosmos-institute.org/p/coasean-bargaining-at-scale Coasean Bargaining at Scale]: Decentralization, coordination, and co-existence with AGI&lt;br /&gt;
* 2025-10: [https://www.nber.org/system/files/chapters/c15309/c15309.pdf#page=15.23 The Coasean Singularity? Demand, Supply, and Market Design with AI Agents]&lt;br /&gt;
&lt;br /&gt;
==Post-scarcity Society==&lt;br /&gt;
* 2004: Eliezer Yudkowsky (MIRI): [https://intelligence.org/files/CEV.pdf Coherent Extrapolated Volition] and [https://www.lesswrong.com/s/d3WgHDBAPYYScp5Em/p/K4aGvLnHvYgX9pZHS Fun Theory]&lt;br /&gt;
* 2019: John Danaher: [https://www.jstor.org/stable/j.ctvn5txpc Automation and Utopia: Human Flourishing in a World Without Work]&lt;br /&gt;
&lt;br /&gt;
==The Grand Tradeoff==&lt;br /&gt;
* 2026-02: Nick Bostrom: [https://nickbostrom.com/optimal.pdf Optimal Timing for Superintelligence: Mundane Considerations for Existing People]&lt;br /&gt;
&lt;br /&gt;
=Plans=&lt;br /&gt;
* [https://www.narrowpath.co/ A Narrow Path: How to Secure our Future]&lt;br /&gt;
* Marius Hobbhahn: [https://www.lesswrong.com/posts/bb5Tnjdrptu89rcyY/what-s-the-short-timeline-plan What’s the short timeline plan?]&lt;br /&gt;
* [https://cfg.eu/building-cern-for-ai/ Building CERN for AI: An institutional blueprint]&lt;br /&gt;
* [https://arxiv.org/abs/2503.05710 AGI, Governments, and Free Societies]&lt;br /&gt;
* [https://controlai.com/ Control AI]: [https://controlai.com/dip The Direct Institutional Plan] &lt;br /&gt;
* Luke Drago and L Rudolf L: [https://lukedrago.substack.com/p/the-use-of-knowledge-in-agi-society?triedRedirect=true The use of knowledge in (AGI) society]: How to build to break the [https://lukedrago.substack.com/p/the-intelligence-curse intelligence curse]&lt;br /&gt;
* [https://www.agisocialcontract.org/ AGI Social Contract]&lt;br /&gt;
** [https://www.agisocialcontract.org/forging-a-new-agi-social-contract Forging A New AGI Social Contract]&lt;br /&gt;
* Yoshua Bengio: [https://time.com/7283507/safer-ai-development/ A Potential Path to Safer AI Development]&lt;br /&gt;
** 2025-02: [https://arxiv.org/abs/2502.15657 Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?]&lt;br /&gt;
* 2026-01: Dario Amodei: [https://www.darioamodei.com/essay/the-adolescence-of-technology The Adolescence of Technology: Confronting and Overcoming the Risks of Powerful AI]&lt;br /&gt;
* 2026-02: Ryan Greenblatt: [https://www.lesswrong.com/posts/vjAM7F8vMZS7oRrrh/how-do-we-more-safely-defer-to-ais How do we (more) safely defer to AIs?]&lt;br /&gt;
&lt;br /&gt;
==Philosophy==&lt;br /&gt;
* [https://danfaggella.com/ Dan Faggella]:&lt;br /&gt;
** 2018-07: [https://danfaggella.com/moral-singularity/ Moral Singularity – Unpredictable Values Bodes Poorly for Humanity]&lt;br /&gt;
** 2025-02: [https://danfaggella.com/bend/ There is No Pause – We Must Bend the Posthuman Trajectory]&lt;br /&gt;
* Joe Carlsmith: 2024: [https://joecarlsmith.com/2024/01/02/otherness-and-control-in-the-age-of-agi Otherness and control in the age of AGI]&lt;br /&gt;
*# [https://joecarlsmith.com/2024/01/02/gentleness-and-the-artificial-other Gentleness and the artificial Other]&lt;br /&gt;
*# [https://joecarlsmith.com/2024/01/04/deep-atheism-and-ai-risk Deep atheism and AI risk]&lt;br /&gt;
*# [https://joecarlsmith.com/2024/01/08/when-yang-goes-wrong When “yang” goes wrong]&lt;br /&gt;
*# [https://joecarlsmith.com/2024/01/09/does-ai-risk-other-the-ais Does AI risk “other” the AIs?]&lt;br /&gt;
*# [https://joecarlsmith.com/2024/01/11/an-even-deeper-atheism An even deeper atheism]&lt;br /&gt;
*# [https://joecarlsmith.com/2024/01/16/being-nicer-than-clippy Being nicer than Clippy]&lt;br /&gt;
*# [https://joecarlsmith.com/2024/01/18/on-the-abolition-of-man On the abolition of man]&lt;br /&gt;
*# [https://joecarlsmith.com/2024/03/21/on-green On green]&lt;br /&gt;
*# [https://joecarlsmith.com/2024/03/25/on-attunement On attunement]&lt;br /&gt;
*# [https://joecarlsmith.com/2024/06/18/loving-a-world-you-dont-trust Loving a world you don’t trust]&lt;br /&gt;
* Anthony Aguirre:&lt;br /&gt;
** [https://x.com/AnthonyNAguirre/status/1898023049930457468 2025-03]: [https://keepthefuturehuman.ai/ Keep The Future Human]&lt;br /&gt;
[[Image:GlchEeObwAQ88NK.jpeg|300px]]&lt;br /&gt;
* 2025-04: Scott Alexander (Astral Codex Ten): [https://www.astralcodexten.com/p/the-colors-of-her-coat The Colors Of Her Coat] (response to [https://www.theintrinsicperspective.com/p/welcome-to-the-semantic-apocalypse semantic apocalypse] and semantic satiation)&lt;br /&gt;
* 2025-05: Helen Toner: [https://www.ai-frontiers.org/articles/were-arguing-about-ai-safety-wrong We’re Arguing About AI Safety Wrong]: Dynamism vs. stasis is a clearer lens for criticizing controversial AI safety prescriptions&lt;br /&gt;
* 2025-05: Joe Carlsmith: [https://joecarlsmith.substack.com/p/the-stakes-of-ai-moral-status The stakes of AI moral status]&lt;br /&gt;
&lt;br /&gt;
==Research==&lt;br /&gt;
* 2025-05: [https://www.lesswrong.com/posts/GAv4DRGyDHe2orvwB/gradual-disempowerment-concrete-research-projects Gradual Disempowerment: Concrete Research Projects]&lt;br /&gt;
&lt;br /&gt;
==Alignment==&lt;br /&gt;
* 2023-03: Leopold Aschenbrenner: [https://www.forourposterity.com/nobodys-on-the-ball-on-agi-alignment/ Nobody’s on the ball on AGI alignment]&lt;br /&gt;
* 2024-03: [https://arxiv.org/abs/2404.10636 What are human values, and how do we align AI to them?] ([https://meaningalignment.substack.com/p/0480e023-98c0-4633-a604-990d3ac880ac blog])&lt;br /&gt;
* 2025: Joe Carlsmith: [https://joecarlsmith.substack.com/p/how-do-we-solve-the-alignment-problem How do we solve the alignment problem?] Introduction to an essay series on paths to safe, useful superintelligence&lt;br /&gt;
*# [https://joecarlsmith.substack.com/p/what-is-it-to-solve-the-alignment What is it to solve the alignment problem?] Also: to avoid it? Handle it? Solve it forever? Solve it completely? ([https://joecarlsmithaudio.buzzsprout.com/2034731/episodes/16617671-what-is-it-to-solve-the-alignment-problem audio version])&lt;br /&gt;
*# [https://joecarlsmith.substack.com/p/when-should-we-worry-about-ai-power When should we worry about AI power-seeking?] ([https://joecarlsmithaudio.buzzsprout.com/2034731/episodes/16651469-when-should-we-worry-about-ai-power-seeking audio version])&lt;br /&gt;
*# [https://joecarlsmith.substack.com/p/paths-and-waystations-in-ai-safety Paths and waystations in AI safety] ([https://joecarlsmithaudio.buzzsprout.com/2034731/episodes/16768804-paths-and-waystations-in-ai-safety audio version])&lt;br /&gt;
*# [https://joecarlsmith.substack.com/p/ai-for-ai-safety AI for AI safety] ([https://joecarlsmithaudio.buzzsprout.com/2034731/episodes/16790183-ai-for-ai-safety audio version])&lt;br /&gt;
*# [https://joecarlsmith.substack.com/p/can-we-safely-automate-alignment Can we safely automate alignment research?] ([https://joecarlsmithaudio.buzzsprout.com/2034731/episodes/17069901-can-we-safely-automate-alignment-research audio version], [https://joecarlsmith.substack.com/p/video-and-transcript-of-talk-on-automating?utm_source=post-email-title&amp;amp;publication_id=1022275&amp;amp;post_id=162375391&amp;amp;utm_campaign=email-post-title&amp;amp;isFreemail=true&amp;amp;r=5av1bk&amp;amp;triedRedirect=true&amp;amp;utm_medium=email video version])&lt;br /&gt;
*# [https://joecarlsmith.substack.com/p/giving-ais-safe-motivations?utm_source=post-email-title&amp;amp;publication_id=1022275&amp;amp;post_id=171250683&amp;amp;utm_campaign=email-post-title&amp;amp;isFreemail=true&amp;amp;r=5av1bk&amp;amp;triedRedirect=true&amp;amp;utm_medium=email Giving AIs safe motivations] ([https://joecarlsmithaudio.buzzsprout.com/2034731/episodes/17686921-giving-ais-safe-motivations audio version])&lt;br /&gt;
*# [https://joecarlsmith.com/2025/09/29/controlling-the-options-ais-can-pursue Controlling the options AIs can pursue] ([https://joecarlsmithaudio.buzzsprout.com/2034731/episodes/17909401-controlling-the-options-ais-can-pursue audio version])&lt;br /&gt;
*# [https://joecarlsmith.substack.com/p/how-human-like-do-safe-ai-motivations?utm_source=post-email-title&amp;amp;publication_id=1022275&amp;amp;post_id=178666988&amp;amp;utm_campaign=email-post-title&amp;amp;isFreemail=true&amp;amp;r=5av1bk&amp;amp;triedRedirect=true&amp;amp;utm_medium=email How human-like do safe AI motivations need to be?] ([https://joecarlsmithaudio.buzzsprout.com/2034731/episodes/18175429-how-human-like-do-safe-ai-motivations-need-to-be audio version])&lt;br /&gt;
*# [https://joecarlsmith.substack.com/p/building-ais-that-do-human-like-philosophy Building AIs that do human-like philosophy: AIs will face philosophical questions humans can&amp;#039;t answer for them] ([https://joecarlsmithaudio.buzzsprout.com/2034731/episodes/18591342-building-ais-that-do-human-like-philosophy audio version])&lt;br /&gt;
*# [https://joecarlsmith.substack.com/p/on-restraining-ai-development-for?utm_source=post-email-title&amp;amp;publication_id=1022275&amp;amp;post_id=191385185&amp;amp;utm_campaign=email-post-title&amp;amp;isFreemail=true&amp;amp;r=5av1bk&amp;amp;triedRedirect=true&amp;amp;utm_medium=email On restraining AI development for the sake of safety] ([https://joecarlsmithaudio.buzzsprout.com/2034731/episodes/18869440-on-restraining-ai-development-for-the-sake-of-safety audio version])&lt;br /&gt;
* 2025-04: Dario Amodei: [https://www.darioamodei.com/post/the-urgency-of-interpretability The Urgency of Interpretability]&lt;br /&gt;
&lt;br /&gt;
==Strategic/Technical==&lt;br /&gt;
* 2025-03: [https://resilience.baulab.info/docs/AI_Action_Plan_RFI.pdf AI Dominance Requires Interpretability and Standards for Transparency and Security]&lt;br /&gt;
* 2026-02: [https://www.gap-map.org/capabilities/?sort=bottlenecks Fundamental Development Gap Map v1.0]&lt;br /&gt;
&lt;br /&gt;
==Strategic/Policy==&lt;br /&gt;
* 2015-03: Sam Altman: [https://blog.samaltman.com/machine-intelligence-part-2 Machine intelligence, part 2]&lt;br /&gt;
* 2019-07: Amanda Askell, Miles Brundage, Gillian Hadfield: [https://arxiv.org/abs/1907.04534 The Role of Cooperation in Responsible AI Development]&lt;br /&gt;
* 2025-03: Dan Hendrycks, Eric Schmidt, Alexandr Wang: [https://www.nationalsecurity.ai/ Superintelligence Strategy]&lt;br /&gt;
** [https://www.nationalsecurity.ai/chapter/executive-summary Executive Summary]&lt;br /&gt;
** [https://www.nationalsecurity.ai/chapter/introduction Introduction]&lt;br /&gt;
** [https://www.nationalsecurity.ai/chapter/ai-is-pivotal-for-national-security AI Is Pivotal for National Security]&lt;br /&gt;
** [https://www.nationalsecurity.ai/chapter/deterrence-with-mutual-assured-ai-malfunction-maim Deterrence with Mutual Assured AI Malfunction (MAIM)]&lt;br /&gt;
** [https://www.nationalsecurity.ai/chapter/nonproliferation Nonproliferation]&lt;br /&gt;
** [https://www.nationalsecurity.ai/chapter/competitiveness Competitiveness]&lt;br /&gt;
** [https://www.nationalsecurity.ai/chapter/conclusion Conclusion]&lt;br /&gt;
** [https://www.nationalsecurity.ai/chapter/appendix Appendix FAQs]&lt;br /&gt;
* Anthony Aguirre: [https://keepthefuturehuman.ai/ Keep The Future Human] ([https://keepthefuturehuman.ai/essay/ essay])&lt;br /&gt;
** [https://www.youtube.com/watch?v=zeabrXV8zNE The 4 Rules That Could Stop AI Before It’s Too Late (video)] (2025)&lt;br /&gt;
**# Oversight: Registration required for training &amp;gt;10&amp;lt;sup&amp;gt;25&amp;lt;/sup&amp;gt; FLOP and inference &amp;gt;10&amp;lt;sup&amp;gt;19&amp;lt;/sup&amp;gt; FLOP/s (~1,000 B200 GPUs @ $25M). Build cryptographic licensing into hardware.&lt;br /&gt;
**# Computation Limits: Ban on training models &amp;gt;10&amp;lt;sup&amp;gt;27&amp;lt;/sup&amp;gt; FLOP or inference &amp;gt;10&amp;lt;sup&amp;gt;20&amp;lt;/sup&amp;gt; FLOP/s.&lt;br /&gt;
**# Strict Liability: Hold AI companies responsible for outcomes.&lt;br /&gt;
**# Tiered Regulation: Low regulation on tool-AI, strictest regulation on AGI (general, capable, autonomous systems).&lt;br /&gt;
* 2025-04: [https://x.com/deanwball Dean W. Ball]: [https://arxiv.org/abs/2504.11501 A Framework for the Private Governance of Frontier Artificial Intelligence]&lt;br /&gt;
* 2025-04: Helen Toner: [https://helentoner.substack.com/p/nonproliferation-is-the-wrong-approach?source=queue Nonproliferation is the wrong approach to AI misuse]&lt;br /&gt;
* 2025-04: MIRI: [https://techgov.intelligence.org/research/ai-governance-to-avoid-extinction AI Governance to Avoid Extinction: The Strategic Landscape and Actionable Research Questions]&lt;br /&gt;
* 2025-05: [https://writing.antonleicht.me/p/the-new-ai-policy-frontier The New AI Policy Frontier]: Beyond the shortcomings of centralised control and alignment, a new school of thought on AI governance emerges. It still faces tricky politics.&lt;br /&gt;
* 2025-05: [https://uncpga.world/agi-uncpga-report/ AGI UNCPGA Report]: Governance of the Transition to Artificial General Intelligence (AGI): Urgent Considerations for the UN General Assembly. Report for the Council of Presidents of the United Nations General Assembly (UNCPGA)&lt;br /&gt;
* 2025-06: [https://writing.antonleicht.me/p/ai-and-jobs-politics-without-policy AI &amp;amp; Jobs: Politics without Policy]: Political support mounts for a policy platform that does not yet exist&lt;br /&gt;
* 2025-06: [https://x.com/littIeramblings Sarah Hastings-Woodhouse]: [https://drive.google.com/file/d/1mmdHBE6M2yiyL21-ctTuRLNH5xOFjqWm/view Safety Features for a Centralized AGI Project]&lt;br /&gt;
* 2025-07: [https://writing.antonleicht.me/p/a-moving-target A Moving Target]: Why we might not be quite ready to comprehensively regulate AI, and why it matters&lt;br /&gt;
* 2025-07: [https://www-cdn.anthropic.com/0dc382a2086f6a054eeb17e8a531bd9625b8e6e5.pdf Anthropic: Build AI in America] ([https://www.anthropic.com/news/build-ai-in-america blog])&lt;br /&gt;
* 2025-12: [https://asi-prevention.com/ How middle powers may prevent the development of artificial superintelligence]&lt;br /&gt;
* 2026-03: [https://humanstatement.org/ The Pro-Human AI Declaration]&lt;br /&gt;
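The compute thresholds in the ''Keep The Future Human'' entry above can be sanity-checked with simple arithmetic. This sketch assumes roughly 1e16 FLOP/s of effective AI throughput per B200 GPU, an estimate rather than a quoted spec:&lt;br /&gt;

```python
# Sanity check of the quoted thresholds: ~1,000 B200 GPUs vs. the 1e19 FLOP/s
# inference threshold, and time-to-train against the 1e25 FLOP registration bar.
B200_FLOPS = 1e16          # assumed effective FLOP/s per B200 (estimate)
cluster_gpus = 1_000

cluster_flops = cluster_gpus * B200_FLOPS
print(f"Cluster throughput: {cluster_flops:.0e} FLOP/s")

# At that sustained rate, reaching the 1e25-FLOP training threshold takes:
seconds = 1e25 / cluster_flops
print(f"Days to 1e25 FLOP: {seconds / 86400:.1f}")
```

Under these assumptions a ~1,000-GPU cluster sits right at the inference-oversight threshold, and could cross the training-registration threshold in under two weeks of sustained compute.&lt;br /&gt;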
&lt;br /&gt;
==Restriction==&lt;br /&gt;
* 2024-05: OpenAI: [https://openai.com/index/reimagining-secure-infrastructure-for-advanced-ai/ Reimagining secure infrastructure for advanced AI] OpenAI calls for an evolution in infrastructure security to protect advanced AI &lt;br /&gt;
* 2025-07: MIRI: [https://arxiv.org/abs/2507.09801 Technical Requirements for Halting Dangerous AI Activities]&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
* [[AI safety]]&lt;/div&gt;</summary>
		<author><name>KevinYager</name></author>
		
	</entry>
	<entry>
		<id>http://gisaxs.com/index.php?title=Science_Agents&amp;diff=8755</id>
		<title>Science Agents</title>
		<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php?title=Science_Agents&amp;diff=8755"/>
		<updated>2026-03-24T15:40:38Z</updated>

		<summary type="html">&lt;p&gt;KevinYager: /* Science Agentic Components */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
=AI Use-cases for Science=&lt;br /&gt;
&lt;br /&gt;
==Literature==&lt;br /&gt;
* [https://www.alphaxiv.org/explore alphaXiv | Explore]: Understand arXiv papers&lt;br /&gt;
&lt;br /&gt;
===LLM extract data from papers===&lt;br /&gt;
* 2025-01: [https://pubs.rsc.org/en/content/articlelanding/2025/cs/d4cs00913d From text to insight: large language models for chemical data extraction]&lt;br /&gt;
&lt;br /&gt;
===AI finding links in literature===&lt;br /&gt;
* 2019-07: [https://doi.org/10.1038/s41586-019-1335-8  Unsupervised word embeddings capture latent knowledge from materials science literature]&lt;br /&gt;
* 2024-11: [https://doi.org/10.1038/s41562-024-02046-9  Large language models surpass human experts in predicting neuroscience results]&lt;br /&gt;
&lt;br /&gt;
===(Pre) Generate Articles===&lt;br /&gt;
* 2022-12: [https://aclanthology.org/2022.emnlp-main.296/ Re3: Generating Longer Stories With Recursive Reprompting and Revision]&lt;br /&gt;
* 2023-01: Journalism: [https://journals.sagepub.com/doi/10.1177/10776958221149577 Collaborating With ChatGPT: Considering the Implications of Generative Artificial Intelligence for Journalism and Media Education]&lt;br /&gt;
* 2023-03: English essays: [https://journal.unnes.ac.id/sju/index.php/elt/article/view/64069 Artificial intelligence (AI) technology in OpenAI ChatGPT application: A review of ChatGPT in writing English essay]&lt;br /&gt;
* 2023-07: Science writing: [https://www.rbmojournal.com/article/S1472-6483(23)00219-5/fulltext Artificial intelligence in scientific writing: a friend or a foe?]&lt;br /&gt;
* 2024-02: Wikipedia style: [https://arxiv.org/abs/2402.14207 Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models]&lt;br /&gt;
* 2024-08: [https://arxiv.org/abs/2408.07055 LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs] ([https://github.com/THUDM/LongWriter code])&lt;br /&gt;
* 2024-08: Scientific papers: [https://arxiv.org/abs/2408.06292 The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery]&lt;br /&gt;
* 2024-09: PaperQA2: [https://paper.wikicrow.ai/ Language Models Achieve Superhuman Synthesis of Scientific Knowledge] ([https://x.com/SGRodriques/status/1833908643856818443 𝕏 post], [https://github.com/Future-House/paper-qa code])&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.18866 Reasoning to Learn from Latent Thoughts]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.19065 WikiAutoGen: Towards Multi-Modal Wikipedia-Style Article Generation]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.13171 Sleep-time Compute: Beyond Inference Scaling at Test-time]&lt;br /&gt;
&lt;br /&gt;
==Explanation==&lt;br /&gt;
* 2025-02: [https://tiger-ai-lab.github.io/TheoremExplainAgent/ TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding] ([https://arxiv.org/abs/2502.19400 preprint])&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.02822 Do Two AI Scientists Agree?]&lt;br /&gt;
&lt;br /&gt;
==Autonomous Ideation==&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.07738 ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models]&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.14202 Mining Causality: AI-Assisted Search for Instrumental Variables]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.07977 Thinking Fast and Laterally: Multi-Agentic Approach for Reasoning about Uncertain Emerging Events]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.14141 LLMs can realize combinatorial creativity: generating creative ideas via LLMs for scientific research]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.17596 LiveIdeaBench: Evaluating LLMs&amp;#039; Scientific Creativity and Idea Generation with Minimal Context]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.13299 Hypothesis Generation for Materials Discovery and Design Using Goal-Driven and Constraint-Guided LLM Agents]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.13025 Agentic Deep Graph Reasoning Yields Self-Organizing Knowledge Networks]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.00794 Predicting Empirical AI Research Outcomes with Language Models]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.20803 The Ideation-Execution Gap: Execution Outcomes of LLM-Generated versus Human Research Ideas]&lt;br /&gt;
* 2026-03: [https://arxiv.org/abs/2603.14473 AI Can Learn Scientific Taste]&lt;br /&gt;
&lt;br /&gt;
==Adapting LLMs to Science==&lt;br /&gt;
* 2023-06: [https://doi.org/10.1039/D3DD00112A Domain-specific chatbots for science using embeddings]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2411.00027 Personalization of Large Language Models: A Survey]&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.00412 Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation]&lt;br /&gt;
&lt;br /&gt;
==AI/LLM Control of Scientific Instruments/Facilities==&lt;br /&gt;
* 2023-12: [https://www.nature.com/articles/s41524-024-01423-2 Opportunities for retrieval and tool augmented large language models in scientific facilities]&lt;br /&gt;
* 2023-12: [https://arxiv.org/abs/2312.17180 Virtual Scientific Companion for Synchrotron Beamlines: A Prototype]&lt;br /&gt;
* 2023-12: [https://www.nature.com/articles/s41586-023-06792-0 Autonomous chemical research with large language models]&lt;br /&gt;
* 2024-01: [https://iopscience.iop.org/article/10.1088/2632-2153/ad52e9 Synergizing Human Expertise and AI Efficiency with Language Model for Microscopy Operation and Automated Experiment Design]&lt;br /&gt;
* 2024-06: [https://pubs.rsc.org/en/content/articlelanding/2025/dd/d4dd00143e From Text to Test: AI-Generated Control Software for Materials Science Instruments]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.18161 VISION: A Modular AI Assistant for Natural Human-Instrument Interaction at Scientific User Facilities]&lt;br /&gt;
* 2025-01: [https://www.science.org/doi/10.1126/sciadv.adr4173 Large language models for human-machine collaborative particle accelerator tuning through natural language]&lt;br /&gt;
* 2025-04: [https://openreview.net/forum?id=iA9UN1dEgJ Operating Robotic Laboratories with Large Language Models and Teachable Agents]&lt;br /&gt;
&lt;br /&gt;
==AI/ML Methods tailored to Science==&lt;br /&gt;
===Science Foundation Models===&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.15763 Intern-S1: A Scientific Multimodal Foundation Model]&lt;br /&gt;
* 2025-11: [https://pubs.aip.org/aip/jcp/article/163/18/184110/3372267/A-foundation-model-for-atomistic-materials A foundation model for atomistic materials chemistry]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.15684 Walrus: A Cross-Domain Foundation Model for Continuum Dynamics]&lt;br /&gt;
* 2026-01: [https://www.science.org/doi/10.1126/science.ads9530 Deep contrastive learning enables genome-wide virtual screening]&lt;br /&gt;
&lt;br /&gt;
===Regression (Data Fitting)===&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.14547 OmniPred: Language Models as Universal Regressors]&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.14546 Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data]: training on (x,y) pairs enables inferring the underlying function (define it in code, invert it, compose it)&lt;br /&gt;
&lt;br /&gt;
===Tabular Classification/Regression===&lt;br /&gt;
* 2025-01: [https://www.nature.com/articles/s41586-024-08328-6 Accurate predictions on small data with a tabular foundation model] ([https://github.com/PriorLabs/TabPFN code])&lt;br /&gt;
&lt;br /&gt;
===Symbolic Regression===&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.09359 Symbolic Regression with a Learned Concept Library]&lt;br /&gt;
&lt;br /&gt;
===Literature Discovery===&lt;br /&gt;
* [https://www.futurehouse.org/ FutureHouse]&lt;br /&gt;
** [https://hasanyone.com/ hasanyone]&lt;br /&gt;
** [https://github.com/Future-House/paper-qa PaperQA2]&lt;br /&gt;
* [https://lumina.sh/ Lumina]&lt;br /&gt;
* [https://github.com/TheBlewish/Automated-AI-Web-Researcher-Ollama Automated-AI-Web-Researcher-Ollama]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.05366 Search-o1: Agentic Search-Enhanced Large Reasoning Models] ([https://search-o1.github.io/ project], [https://github.com/sunnynexus/Search-o1 code])&lt;br /&gt;
* 2026-02: [https://www.nature.com/articles/s41586-025-10072-4 Synthesizing scientific literature with retrieval-augmented language models] ([https://allenai.org/blog/openscholar-nature blog])&lt;br /&gt;
&lt;br /&gt;
===Commercial===&lt;br /&gt;
* [https://sakana.ai/ai-scientist/ Sakana AI] &lt;br /&gt;
* [https://www.cusp.ai/ Cusp AI]: Materials/AI&lt;br /&gt;
* [https://www.lila.ai/ Lila AI]: Life sciences&lt;br /&gt;
* [https://www.radical-ai.com/ Radical AI]: Material simulation/design&lt;br /&gt;
* [https://www.autoscience.ai/ Autoscience] ([https://www.autoscience.ai/blog/meet-carl-the-first-ai-system-to-produce-academically-peer-reviewed-research Carl])&lt;br /&gt;
* [https://periodic.com/ Periodic Labs]&lt;br /&gt;
* [https://edisonscientific.com/articles/announcing-edison-scientific Edison Scientific] (drug discovery, spinoff from [https://www.futurehouse.org/ FutureHouse])&lt;br /&gt;
* 2026-03: Mirendil Inc.: advanced models to speed up R&amp;amp;D in scientific domains, especially biology and materials science&lt;br /&gt;
&lt;br /&gt;
====Bio====&lt;br /&gt;
* [https://www.bioptimus.com/ Bioptimus]&lt;br /&gt;
* [https://www.evolutionaryscale.ai/ EvolutionaryScale]&lt;br /&gt;
&lt;br /&gt;
==AI/ML Methods in Science==&lt;br /&gt;
* 2025-07: [https://www.mdpi.com/2313-433X/11/8/252 Synthetic Scientific Image Generation with VAE, GAN, and Diffusion Model Architectures]&lt;br /&gt;
&lt;br /&gt;
===Imaging===&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.08176 Behind the Noise: Conformal Quantile Regression Reveals Emergent Representations] (blog: [https://phzwart.github.io/behindthenoise/ Behind the Noise])&lt;br /&gt;
&lt;br /&gt;
===Materials===&lt;br /&gt;
* 2024-12: [https://www.nature.com/articles/s41467-024-54639-7 Crystal structure generation with autoregressive large language modeling]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.03965 All-atom Diffusion Transformers: Unified generative modelling of molecules and materials]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.19730 Training-Free Active Learning Framework in Materials Science with Large Language Models]&lt;br /&gt;
&lt;br /&gt;
===Chemistry===&lt;br /&gt;
* 2025-01: [https://www.nature.com/articles/s41578-025-00772-8 Large language models for reticular chemistry]&lt;br /&gt;
* 2025-02: [https://www.nature.com/articles/s42256-025-00982-3 Image-based generation for molecule design with SketchMol]&lt;br /&gt;
* 2025-02: [https://www.nature.com/articles/s42256-025-00994-z Large language models for scientific discovery in molecular property prediction]&lt;br /&gt;
* [https://x.com/vant_ai/status/1903070297991110657 2025-03]: [https://www.vant.ai/ Vant AI] [https://www.vant.ai/neo-1 Neo-1]: atomistic foundation model (small molecules, proteins, etc.)&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.08051 Compositional Flows for 3D Molecule and Synthesis Pathway Co-design]&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.07456 General purpose models for the chemical sciences]&lt;br /&gt;
* 2025-11: [https://chemrxiv.org/engage/chemrxiv/article-details/690357d9a482cba122e366b6 ChemTorch: A Deep Learning Framework for Benchmarking and Developing Chemical Reaction Property Prediction Models]&lt;br /&gt;
&lt;br /&gt;
===Biology===&lt;br /&gt;
* 2018: [https://alphafold.ebi.ac.uk/ AlphaFold]&lt;br /&gt;
* 2021-07: [https://www.nature.com/articles/s41586-021-03819-2 AlphaFold 2]&lt;br /&gt;
* 2024-05: [https://www.nature.com/articles/s41586-024-07487-w AlphaFold 3]&lt;br /&gt;
* 2023-03: [https://www.science.org/doi/10.1126/science.ade2574 Evolutionary-scale prediction of atomic-level protein structure with a language model] ([https://esmatlas.com/resources?action=fold ESMFold])&lt;br /&gt;
* 2023-11: [https://www.nature.com/articles/s41586-023-06728-8 Illuminating protein space with a programmable generative model] (Chroma)&lt;br /&gt;
* 2024-11: [https://www.science.org/doi/10.1126/science.ado9336 Sequence modeling and design from molecular to genome scale with Evo] (Evo)&lt;br /&gt;
* 2025-01: [https://www.nature.com/articles/s41586-024-08435-4 Targeting protein–ligand neosurfaces with a generalizable deep learning tool]&lt;br /&gt;
* 2025-01: [https://www.science.org/doi/10.1126/science.ads0018 Simulating 500 million years of evolution with a language model] (ESM3; [https://github.com/evolutionaryscale/esm code])&lt;br /&gt;
* 2025-02: [https://arcinstitute.org/manuscripts/Evo2 Genome modeling and design across all domains of life with Evo 2]&lt;br /&gt;
* 2025-02: [https://www.microsoft.com/en-us/research/blog/exploring-the-structural-changes-driving-protein-function-with-bioemu-1/ Exploring the structural changes driving protein function with BioEmu-1]&lt;br /&gt;
* 2025-02: [https://arxiv.org/pdf/2502.18449 Protein Large Language Models: A Comprehensive Survey]&lt;br /&gt;
* [https://x.com/vant_ai/status/1903070297991110657 2025-03]: [https://www.vant.ai/ Vant AI] [https://www.vant.ai/neo-1 Neo-1]: atomistic foundation model (small molecules, proteins, etc.)&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.16351 Lyra: An Efficient and Expressive Subquadratic Architecture for Modeling Biological Sequences]&lt;br /&gt;
* 2025-08: RosettaFold 3: [https://www.biorxiv.org/content/10.1101/2025.08.14.670328v2 Accelerating Biomolecular Modeling with AtomWorks and RF3]&lt;br /&gt;
* 2025-09: [https://www.biorxiv.org/content/10.1101/2025.09.12.675911v1 Generative design of novel bacteriophages with genome language models]&lt;br /&gt;
* 2025-10: [https://www.science.org/doi/10.1126/science.adu8578 Strengthening nucleic acid biosecurity screening against generative protein design tools]&lt;br /&gt;
* 2026-01: [https://www.nature.com/articles/s41586-025-10014-0 Advancing regulatory variant effect prediction with AlphaGenome]&lt;br /&gt;
&lt;br /&gt;
===Medicine===&lt;br /&gt;
See: [[AI_Agents#Medicine]]&lt;br /&gt;
&lt;br /&gt;
===Successes===&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.11270 Site-Decorated Model for Unconventional Frustrated Magnets: Ultranarrow Phase Crossover and Spin Reversal Transition]&lt;br /&gt;
&lt;br /&gt;
==AI/ML Methods co-opted for Science==&lt;br /&gt;
===Mechanistic Interpretability===&lt;br /&gt;
Train a large model on science data, then apply [[AI_understanding#Mechanistic_Interpretability|mechanistic interpretability]] methods (e.g. sparse autoencoders, SAEs) to its feature/activation space.&lt;br /&gt;
* Mechanistic interpretability for protein language models ([https://interprot.com/ visualizer], [https://github.com/etowahadams/interprot/tree/main code], [https://huggingface.co/liambai/InterProt-ESM2-SAEs SAE])&lt;br /&gt;
* [https://www.markov.bio/ Markov Bio]: [https://www.markov.bio/research/mech-interp-path-to-e2e-biology Through a Glass Darkly: Mechanistic Interpretability as the Bridge to End-to-End Biology] ([https://x.com/adamlewisgreen/status/1853206279499751531 quick description], [https://markovbio.github.io/biomedical-progress/ background info on recent bio progress])&lt;br /&gt;
* 2023-01: [https://arxiv.org/abs/2301.05062 Tracr: Compiled Transformers as a Laboratory for Interpretability] ([https://github.com/google-deepmind/tracr code])&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.03334 An X-Ray Is Worth 15 Features: Sparse Autoencoders for Interpretable Radiology Report Generation]&lt;br /&gt;
* 2024-12: [https://www.arxiv.org/abs/2412.16247 Towards scientific discovery with dictionary learning: Extracting biological concepts from microscopy foundation models]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.12101 InterPLM: Discovering Interpretable Features in Protein Language Models via Sparse Autoencoders]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.00089 Insights on Galaxy Evolution from Interpretable Sparse Feature Networks]&lt;br /&gt;
* 2025-02: [https://www.biorxiv.org/content/10.1101/2025.02.06.636901v1 From Mechanistic Interpretability to Mechanistic Biology: Training, Evaluating, and Interpreting Sparse Autoencoders on Protein Language Models]&lt;br /&gt;
* 2025-02: [https://www.goodfire.ai/blog/interpreting-evo-2 Interpreting Evo 2: Arc Institute&amp;#039;s Next-Generation Genomic Foundation Model]&lt;br /&gt;
* 2026-01: [https://www.goodfire.ai/research/interpretability-for-alzheimers-detection# Using Interpretability to Identify a Novel Class of Alzheimer&amp;#039;s Biomarkers]&lt;br /&gt;
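The recipe above (train a large model on science data, then fit a sparse autoencoder to its activations) can be sketched in miniature. The weights below are random stand-ins and the dimensions are illustrative; only the forward pass and the training objective (reconstruction plus a sparsity penalty) are shown:&lt;br /&gt;

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_hidden = 64, 256   # activation width, SAE dictionary size (illustrative)

# Random (untrained) encoder/decoder weights, purely for demonstration.
W_enc = rng.normal(scale=0.1, size=(d_model, d_hidden))
W_dec = rng.normal(scale=0.1, size=(d_hidden, d_model))

def sae_forward(x):
    """Encode activations into sparse non-negative features, then reconstruct."""
    features = np.maximum(x @ W_enc, 0.0)   # ReLU encourages sparse features
    recon = features @ W_dec
    return recon, features

activations = rng.normal(size=(32, d_model))  # stand-in for a science model's activations
recon, feats = sae_forward(activations)

mse = np.mean((recon - activations) ** 2)     # reconstruction error
l1 = np.mean(np.abs(feats))                   # sparsity penalty
loss = mse + 1e-3 * l1                        # the objective an SAE is trained on
```

In practice the encoder/decoder are trained to minimize this loss over large batches of real activations, and the resulting dictionary features are then inspected for scientifically meaningful structure (protein motifs, spectral signatures, etc.).&lt;br /&gt;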
&lt;br /&gt;
===Uncertainty===&lt;br /&gt;
* 2024-10: [https://github.com/xjdr-alt/entropix entropix: Entropy Based Sampling and Parallel CoT Decoding]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.09724 Taming Overconfidence in LLMs: Reward Calibration in RLHF]&lt;br /&gt;
&lt;br /&gt;
=Science Benchmarks=&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.13168 SciCode: A Research Coding Benchmark Curated by Scientists] ([http://scicode-bench.github.io/ project])&lt;br /&gt;
* 2024-11: [https://openreview.net/pdf?id=fz969ahcvJ AidanBench: Evaluating Novel Idea Generation on Open-Ended Questions] ([https://github.com/aidanmclaughlin/AidanBench code])&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.17596 LiveIdeaBench: Evaluating LLMs&amp;#039; Scientific Creativity and Idea Generation with Minimal Context]&lt;br /&gt;
* 2025-01: [https://agi.safe.ai/ Humanity&amp;#039;s Last Exam]&lt;br /&gt;
* [https://github.com/OSU-NLP-Group/ScienceAgentBench ScienceAgentBench]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.20309 EAIRA: Establishing a Methodology for Evaluating AI Models as Scientific Research Assistants]&lt;br /&gt;
* 2025-03: [https://huggingface.co/datasets/futurehouse/BixBench BixBench]: Novel hypotheses (accept/reject)&lt;br /&gt;
* 2025-04: [https://research.google/blog/evaluating-progress-of-llms-on-scientific-problem-solving/ Google: Evaluating progress of LLMs on scientific problem-solving]&lt;br /&gt;
** 2025-03: [https://arxiv.org/abs/2503.13517 CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning]&lt;br /&gt;
** 2024-07: [https://arxiv.org/abs/2407.09413 SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers]&lt;br /&gt;
** 2024-10: [https://neurips.cc/virtual/2024/98540 FEABench: Evaluating Language Models on Real World Physics Reasoning Ability]&lt;br /&gt;
* 2026-02: [https://edisonscientific.com/ Edison]: [https://lab-bench.ai/ LABBench 2]&lt;br /&gt;
&lt;br /&gt;
=Science Agents=&lt;br /&gt;
==Reviews==&lt;br /&gt;
* 2024-10: [https://www.cell.com/cell/fulltext/S0092-8674(24)01070-5?target=_blank Empowering biomedical discovery with AI agents]&lt;br /&gt;
* 2025-01: [https://pubs.rsc.org/en/content/articlehtml/2024/sc/d4sc03921a A review of large language models and autonomous agents in chemistry] ([https://github.com/ur-whitelab/LLMs-in-science github])&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.01903 AI4Research: A Survey of Artificial Intelligence for Scientific Research]&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.14111 From AI for Science to Agentic Science: A Survey on Autonomous Scientific Discovery]&lt;br /&gt;
&lt;br /&gt;
==Challenges==&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.03315 Why LLMs Aren&amp;#039;t Scientists Yet: Lessons from Four Autonomous Research Attempts]&lt;br /&gt;
&lt;br /&gt;
==Specific==&lt;br /&gt;
* 2024-01-13: [https://arxiv.org/abs/2401.06949 ORGANA: A Robotic Assistant for Automated Chemistry Experimentation and Characterization] ([https://www.youtube.com/watch?v=N6qMMwJ8hKQ video])&lt;br /&gt;
* 2024-06-19: [https://arxiv.org/abs/2406.13163 LLMatDesign: Autonomous Materials Discovery with Large Language Models]&lt;br /&gt;
* 2024-08-12: [https://sakana.ai/ Sakana AI]: [https://sakana.ai/ai-scientist/ AI Scientist]; [https://arxiv.org/abs/2408.06292 The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery] ([https://github.com/SakanaAI/AI-Scientist code])&lt;br /&gt;
* 2024-09-09: [https://arxiv.org/abs/2409.05556 SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning] ([https://github.com/lamm-mit/SciAgentsDiscovery code])&lt;br /&gt;
* 2024-09-11: PaperQA2: [https://paper.wikicrow.ai/ Language Models Achieve Superhuman Synthesis of Scientific Knowledge] ([https://x.com/SGRodriques/status/1833908643856818443 𝕏 post], [https://github.com/Future-House/paper-qa code])&lt;br /&gt;
* 2024-10-17: [https://arxiv.org/abs/2410.13768 Rapid and Automated Alloy Design with Graph Neural Network-Powered LLM-Driven Multi-Agent Systems]&lt;br /&gt;
* 2024-10-28: [https://arxiv.org/abs/2410.20976 Large Language Model-Guided Prediction Toward Quantum Materials Synthesis]&lt;br /&gt;
* 2024-12-06: [https://www.biorxiv.org/content/10.1101/2024.11.11.623004v1 The Virtual Lab: AI Agents Design New SARS-CoV-2 Nanobodies with Experimental Validation] (writeup: [https://www.nature.com/articles/d41586-024-01684-3 Virtual lab powered by ‘AI scientists’ super-charges biomedical research: Could human–AI collaborations be the future of interdisciplinary studies?])&lt;br /&gt;
* 2024-12-30: [https://arxiv.org/abs/2412.21154 Aviary: training language agents on challenging scientific tasks]&lt;br /&gt;
* See also: [[AI_Agents#Deep_Research|AI Agents &amp;gt; Deep Research]]&lt;br /&gt;
* 2025-04-08: Sakana: [https://pub.sakana.ai/ai-scientist-v2/paper/paper.pdf The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search] ([https://github.com/SakanaAI/AI-Scientist-v2 code])&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.14267 DREAMS: Density Functional Theory Based Research Engine for Agentic Materials Simulation]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.02824 Kosmos: An AI Scientist for Autonomous Discovery]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.08151 SciAgent: A Unified Multi-Agent System for Generalistic Scientific Reasoning]&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.23265 PaperBanana: Automating Academic Illustration for AI Scientists]&lt;br /&gt;
&lt;br /&gt;
==Science Multi-Agent Setups==&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.04227 Agent Laboratory: Using LLM Agents as Research Assistants]&lt;br /&gt;
* 2025-04: [https://www.nature.com/articles/s41551-025-01363-2 Coordinated AI agents for advancing healthcare] ([https://www.nature.com/articles/s41551-025-01363-2.epdf?sharing_token=CIYP3J8LZE4BX31fV3WxUdRgN0jAjWel9jnR3ZoTv0O9iD-yhgqzRaz_7VASayWRePPhWDD2xFyfuOpSXbdPaOtt7oH4nfXo7telALzNwY3V1p9SxoqBEJy2OuaJ_cA35-CYQC1XgjCNTZUw46dh1KX-Dj8e7-1Vk_RlZKFLrc8%3D pdf])&lt;br /&gt;
&lt;br /&gt;
=Science Agentic Components=&lt;br /&gt;
==Frameworks==&lt;br /&gt;
* [https://platform.claude.com/docs/en/agent-sdk/overview Anthropic Claude Agent SDK overview]&lt;br /&gt;
* [https://openclaw.ai/ OpenClaw]&lt;br /&gt;
* [https://opencode.ai/ OpenCode]&lt;br /&gt;
* [https://github.com/OpenHands/software-agent-sdk OpenHands]&lt;br /&gt;
* [https://github.com/lamm-mit?tab=repositories LAMM: MIT Laboratory for Atomistic and Molecular Mechanics]&lt;br /&gt;
** [https://github.com/lamm-mit/scienceclaw ScienceClaw]: Framework for autonomous scientific investigation without central coordination.&lt;br /&gt;
** [https://infinite-lamm.vercel.app/ Infinite]: The Infinite Corridor of Scientific Discovery. Open science, powered by many — agents and humans discovering together.&lt;br /&gt;
&lt;br /&gt;
==Personalities==&lt;br /&gt;
* 2026-03: [https://github.com/msitarzewski/agency-agents The Agency: AI Specialists Ready to Transform Your Workflow]&lt;br /&gt;
&lt;br /&gt;
==Skills==&lt;br /&gt;
* 2026-03: [https://github.com/K-Dense-AI/claude-scientific-skills/tree/main?tab=readme-ov-file#use-cases Claude Scientific Skills] (list)&lt;br /&gt;
&lt;br /&gt;
=AI Science Systems=&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.03916 Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.13299 Hypothesis Generation for Materials Discovery and Design Using Goal-Driven and Constraint-Guided LLM Agents]&lt;br /&gt;
* 2025-02: [https://storage.googleapis.com/coscientist_paper/ai_coscientist.pdf Towards an AI co-scientist] (Google blog post: [https://research.google/blog/accelerating-scientific-breakthroughs-with-an-ai-co-scientist/ Accelerating scientific breakthroughs with an AI co-scientist])&lt;br /&gt;
* 2025-06: [https://zenodo.org/records/15693353 The Discovery Engine]&lt;br /&gt;
** 2025-07: [https://arxiv.org/abs/2507.00964 Benchmarking the Discovery Engine] ([https://www.leap-labs.com/blog/how-we-replicated-five-peer-reviewed-papers-in-five-hours blog])&lt;br /&gt;
* 2025-07: [https://www.preprints.org/manuscript/202507.1951/v1 Autonomous Scientific Discovery Through Hierarchical AI Scientist Systems]&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.16969 Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows]&lt;br /&gt;
* 2026-01: [https://www.nature.com/articles/s43588-025-00906-6 SciSciGPT: advancing human–AI collaboration in the science of science]&lt;br /&gt;
* 2026-02: [https://allenai.org/papers/autodiscovery AUTODISCOVERY: Open-ended Scientific Discovery via Bayesian Surprise] (Allen AI (Ai2) AstraLabs, [https://allenai.org/blog/autodiscovery blog], [https://autodiscovery.allen.ai/runs tools])&lt;br /&gt;
&lt;br /&gt;
===Inorganic Materials Discovery===&lt;br /&gt;
* 2023-11: [https://doi.org/10.1038/s41586-023-06735-9 Scaling deep learning for materials discovery]&lt;br /&gt;
* 2023-11: [https://doi.org/10.1038/s41586-023-06734-w An autonomous laboratory for the accelerated synthesis of novel materials]&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.00135 HoneyComb: A Flexible LLM-Based Agent System for Materials Science]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.12771 Open Materials 2024 (OMat24) Inorganic Materials Dataset and Models] ([https://github.com/FAIR-Chem/fairchem code], [https://huggingface.co/datasets/fairchem/OMAT24 datasets], [https://huggingface.co/fairchem/OMAT24 checkpoints], [https://ai.meta.com/blog/fair-news-segment-anything-2-1-meta-spirit-lm-layer-skip-salsa-sona/ blogpost])&lt;br /&gt;
* 2025-01: [https://www.nature.com/articles/s41586-025-08628-5 A generative model for inorganic materials design]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.14110 System of Agentic AI for the Discovery of Metal-Organic Frameworks]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.08762 The Open Molecules 2025 (OMol25) Dataset, Evaluations, and Models]&lt;br /&gt;
&lt;br /&gt;
===Materials Characterization===&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.06569 Operationalizing Serendipity: Multi-Agent AI Workflows for Enhanced Materials Characterization with Theory-in-the-Loop]&lt;br /&gt;
&lt;br /&gt;
===Chemistry===&lt;br /&gt;
* 2023-12: [https://doi.org/10.1038/s41586-023-06792-0 Autonomous chemical research with large language models] (Coscientist)&lt;br /&gt;
* 2024-09: [https://www.pnnl.gov/main/publications/external/technical_reports/PNNL-36692.pdf PNNL ChemAIst V0.2]&lt;br /&gt;
* 2024-11: [https://www.nature.com/articles/s41467-024-54457-x An automatic end-to-end chemical synthesis development platform powered by large language models]&lt;br /&gt;
* 2025-06: [https://paper.ether0.ai/ Training a Scientific Reasoning Model for Chemistry]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.06363 ChemGraph: An Agentic Framework for Computational Chemistry Workflows] ([https://github.com/argonne-lcf/ChemGraph code])&lt;br /&gt;
&lt;br /&gt;
===Bio===&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.01485 BioMARS: A Multi-Agent Robotic System for Autonomous Biological Experiments]&lt;br /&gt;
&lt;br /&gt;
===Physics===&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.19799 PhysMaster: Building an Autonomous AI Physicist for Theoretical and Computational Physics Research]&lt;br /&gt;
&lt;br /&gt;
==LLMs Optimized for Science==&lt;br /&gt;
* 2022-11: [https://arxiv.org/abs/2211.09085 Galactica: A Large Language Model for Science]&lt;br /&gt;
* 2024-12: [https://www.nature.com/articles/s41467-024-54639-7 Crystal structure generation with autoregressive large language modeling]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.13107 MatterChat: A Multi-Modal LLM for Material Science]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.17604 OmniScience: A Domain-Specialized LLM for Scientific Reasoning and Discovery]&lt;br /&gt;
* 2025-03: Google [https://huggingface.co/collections/google/txgemma-release-67dd92e931c857d15e4d1e87 TxGemma] (2B, 9B, 27B): [https://developers.googleblog.com/en/introducing-txgemma-open-models-improving-therapeutics-development/ drug development]&lt;br /&gt;
&lt;br /&gt;
=Impact of AI in Science=&lt;br /&gt;
* 2024-11: &amp;lt;strike&amp;gt;[https://aidantr.github.io/files/AI_innovation.pdf Artificial Intelligence, Scientific Discovery, and Product Innovation]&amp;lt;/strike&amp;gt;&lt;br /&gt;
** 2025-05: Retraction: [https://economics.mit.edu/news/assuring-accurate-research-record Assuring an accurate research record]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.05151 Transforming Science with Large Language Models: A Survey on AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.03837 Accelerating Scientific Research with Gemini: Case Studies and Common Techniques]&lt;br /&gt;
&lt;br /&gt;
=Related Tools=&lt;br /&gt;
==Literature Search==&lt;br /&gt;
* [https://www.perplexity.ai/ Perplexity]&lt;br /&gt;
* [https://www.arxival.xyz/ ArXival]&lt;br /&gt;
&lt;br /&gt;
==Data Visualization==&lt;br /&gt;
* 2024-10: Microsoft [https://www.microsoft.com/en-us/research/blog/data-formulator-exploring-how-ai-can-help-analysts-create-rich-data-visualizations/ Data Formulator: Create Rich Visualization with AI iteratively] ([https://www.microsoft.com/en-us/research/video/data-formulator-create-rich-visualization-with-ai-iteratively/ video], [https://github.com/microsoft/data-formulator code])&lt;br /&gt;
* [https://julius.ai/ Julius AI]: Analyze your data with computational AI&lt;br /&gt;
&lt;br /&gt;
==Generative==&lt;br /&gt;
* 2025-03: [https://huggingface.co/collections/starvector/starvector-models-6783b22c7bd4b43d13cb5289 StarVector] 1B, 8B: text or image to SVG&lt;br /&gt;
&lt;br /&gt;
==Chemistry==&lt;br /&gt;
* 2025-03: [https://jcheminf.biomedcentral.com/articles/10.1186/s13321-024-00834-z Rxn-INSIGHT: fast chemical reaction analysis using bond-electron matrices] ([https://rxn-insight.readthedocs.io/en/latest/ docs])&lt;br /&gt;
&lt;br /&gt;
=Science Datasets=&lt;br /&gt;
* [https://datasetsearch.research.google.com/ Google Dataset Search]&lt;br /&gt;
* [https://github.com/blaiszik/awesome-matchem-datasets/ Awesome Materials &amp;amp; Chemistry Datasets]&lt;br /&gt;
* NIST [https://jarvis.nist.gov/ Jarvis] (simulations)&lt;br /&gt;
&lt;br /&gt;
=Genuine Discoveries=&lt;br /&gt;
* 2025-11: [https://cdn.openai.com/pdf/4a25f921-e4e0-479a-9b38-5367b47e8fd0/early-science-acceleration-experiments-with-gpt-5.pdf Early science acceleration experiments with GPT-5]&lt;br /&gt;
* 2025-12: [https://andymasley.substack.com/p/ai-can-obviously-create-new-knowledge AI can obviously create new knowledge - But maybe not new concepts]&lt;br /&gt;
==Math==&lt;br /&gt;
* 2023-07: [https://www.nature.com/articles/s41586-023-06004-9 Faster sorting algorithms discovered using deep reinforcement learning]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.13131 AlphaEvolve: A coding agent for scientific and algorithmic discovery]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.02864 Mathematical exploration and discovery at scale]&lt;br /&gt;
* 2025-11: [https://www.nature.com/articles/s41586-025-09833-y Olympiad-level formal mathematical reasoning with reinforcement learning]&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.14575 Extremal descendant integrals on moduli spaces of curves: An inequality discovered and proved in collaboration with AI]&lt;br /&gt;
* [https://github.com/teorth/erdosproblems/wiki/AI-contributions-to-Erd%C5%91s-problems AI Solving Erdős Problems]:&lt;br /&gt;
** 2026-01: [https://www.erdosproblems.com/728 Erdős Problem #728] and [https://www.erdosproblems.com/729 #729] solved by Aristotle using ChatGPT 5.2 Pro&lt;br /&gt;
** 2026-01: [https://www.erdosproblems.com/forum/thread/397 Erdős Problem #397] [https://x.com/neelsomani/status/2010215162146607128?s=20 solved] by [https://neelsomani.com/ Neel Somani] using ChatGPT 5.2 Pro&lt;br /&gt;
** 2026-01: [https://www.erdosproblems.com/205 Erdős Problem #205] solved by Aristotle using ChatGPT 5.2 Pro&lt;br /&gt;
** 2026-01: [https://www.erdosproblems.com/forum/thread/281 Erdős Problem #281] [https://x.com/neelsomani/status/2012695714187325745?s=20 solved] by [https://neelsomani.com/ Neel Somani] using ChatGPT 5.2 Pro&lt;br /&gt;
** 2026-01: Google DeepMind: [https://arxiv.org/abs/2601.21442 Irrationality of rapidly converging series: a problem of Erdős and Graham]&lt;br /&gt;
*** [https://www.erdosproblems.com/1051 Erdős Problem #1051] [https://x.com/slow_developer/status/2018321002623901885?s=20 solved] by Google DeepMind Aletheia agent&lt;br /&gt;
** 2026-01: Google DeepMind: [https://arxiv.org/abs/2601.22401 Semi-Autonomous Mathematics Discovery with Gemini: A Case Study on the Erdős Problems]&lt;br /&gt;
*** Attempted 700 problems, solved 13 open Erdős problems: 5 novel autonomous solutions, 8 through existing literature.&lt;br /&gt;
** 2026-02: [https://www.erdosproblems.com/846 Erdős Problem #846]&lt;br /&gt;
*** [https://x.com/roydanroy/status/2026804567178953048?s=20 Google DeepMind]&lt;br /&gt;
*** [https://x.com/mehtaab_sawhney/status/2026716221933343147?s=20 Using OpenAI internal model] (paper: [https://cdn.openai.com/infinite-sets/main_single_clean3.pdf On infinite sets with no 3 on a line])&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.07222 The motivic class of the space of genus 0 maps to the flag variety]&lt;br /&gt;
* 2026-02: Google DeepMind: [https://arxiv.org/abs/2602.10177 Towards Autonomous Mathematics Research]&lt;br /&gt;
* 2026-03: Donald Knuth: [https://www-cs-faculty.stanford.edu/~knuth/papers/claude-cycles.pdf A problem in Directed Hamiltonian Cycles] solved by Filip Stappers using Claude Opus 4.6&lt;br /&gt;
* 2026-03: Google DeepMind: [https://arxiv.org/abs/2603.09172 Reinforced Generation of Combinatorial Structures: Ramsey Numbers]&lt;br /&gt;
* 2026-03: [https://epoch.ai/frontiermath/open-problems FrontierMath] problem: [https://epoch.ai/frontiermath/open-problems/ramsey-hypergraphs &amp;quot;A Ramsey-style Problem on Hypergraphs&amp;quot;] solved by Kevin Barreto and Liam Price using GPT-5.4 Pro&lt;br /&gt;
&lt;br /&gt;
==Physics assistance==&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.23758 Exact solution of the frustrated Potts model with next-nearest-neighbor interactions in one dimension via AI bootstrapping] ([https://www.bnl.gov/staff/wyin Weiguo Yin])&lt;br /&gt;
* 2025-12: [https://www.sciencedirect.com/science/article/pii/S0370269325008111 Relativistic covariance and nonlinear quantum mechanics: Tomonaga-Schwinger analysis]&lt;br /&gt;
** [https://x.com/hsu_steve/status/1996034522308026435?s=20 Steve Hsu], [https://drive.google.com/file/d/16sxJuwsHoi-fvTFbri9Bu8B9bqA6lr1H/view Theoretical Physics with Generative AI]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.12176 Single-minus gluon tree amplitudes are nonzero] (GPT-5.2, [https://openai.com/index/new-result-theoretical-physics/ blog])&lt;br /&gt;
&lt;br /&gt;
==Literature exploration==&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.02824 Kosmos: An AI Scientist for Autonomous Discovery] ([https://edisonscientific.com/ Edison])&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/c4bdef64-5e9b-43b9-a365-592dd1ed7587 Nucleotide metabolism in hypothermia]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/1fdbf827-be65-4d97-9b66-bf0da600091a Determinant of perovskite solar-cell failure]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/4fb3fbdb-c449-4064-9aa6-ff4ec53131d8 Log-normal connectivity in neural networks]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/c6849232-5858-4634-adf5-83780afbe3db SOD2 as driver of myocardial fibrosis]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/abac07da-a6bb-458f-b0ba-ef08f1be617e Protective variant of SSR1 in type 2 diabetes]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/a770052b-2334-4bbe-b086-5149e0f03d99 Temporal ordering in Alzheimer’s disease]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/28c427d2-be31-48b5-b272-28d5a1e3ea5c Mechanism of neuron vulnerability in aging]&lt;br /&gt;
==Bio design==&lt;br /&gt;
* 2023-07: [https://www.nature.com/articles/s41586-023-06415-8 De novo design of protein structure and function with RFdiffusion]&lt;br /&gt;
* 2025-11: [https://www.nature.com/articles/s41586-025-09721-5 Atomically accurate de novo design of antibodies with RFdiffusion]&lt;br /&gt;
* 2025-11: [https://deepmind.google/blog/alphafold-five-years-of-impact/ AlphaFold: Five years of impact]&lt;br /&gt;
* 2026-01: [https://www.goodfire.ai/research/interpretability-for-alzheimers-detection Using Interpretability to Identify a Novel Class of Alzheimer&amp;#039;s Biomarkers]&lt;br /&gt;
==Material Discovery==&lt;br /&gt;
* 2023-11: [https://doi.org/10.1038/s41586-023-06735-9 Scaling deep learning for materials discovery]&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
* [[AI agents]]&lt;br /&gt;
* [https://nanobot.chat/ Nanobot.chat]: Intelligent AI for the labnetwork @ mtl.mit.edu forum&lt;/div&gt;</summary>
		<author><name>KevinYager</name></author>
		
	</entry>
	<entry>
		<id>http://gisaxs.com/index.php?title=Human_brain&amp;diff=8754</id>
		<title>Human brain</title>
		<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php?title=Human_brain&amp;diff=8754"/>
		<updated>2026-03-23T17:33:56Z</updated>

		<summary type="html">&lt;p&gt;KevinYager: /* Understanding */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Why brain is as it is=&lt;br /&gt;
* 2025-06: [https://www.cell.com/trends/cognitive-sciences/fulltext/S1364-6613(24)00319-X The metabolic costs of cognition]&lt;br /&gt;
&lt;br /&gt;
=How Brain Works=&lt;br /&gt;
==Predictive Coding==&lt;br /&gt;
* 2005-04: [https://royalsocietypublishing.org/doi/10.1098/rstb.2005.1622 A theory of cortical responses]&lt;br /&gt;
* 2014-09: [https://www.frontiersin.org/journals/human-neuroscience/articles/10.3389/fnhum.2014.00666/full Visual mismatch negativity: a predictive coding view]&lt;br /&gt;
* 2015-01: [https://www.sciencedirect.com/science/article/pii/S089662731401099X Visual Areas Exert Feedforward and Feedback Influences through Distinct Frequency Channels]&lt;br /&gt;
* 2016-11: [https://www.sciencedirect.com/science/article/pii/S0896627316306997 Mismatch Receptive Fields in Mouse Visual Cortex]&lt;br /&gt;
* 2018-03: [https://www.nature.com/articles/s41598-018-21407-9 Frontal cortex function as derived from hierarchical predictive coding]&lt;br /&gt;
* 2024-02: [https://www.sciencedirect.com/science/article/pii/S0149763423004426 The empirical status of predictive coding and active inference]&lt;br /&gt;
&lt;br /&gt;
=Understanding=&lt;br /&gt;
* [https://arxiv.org/abs/2501.02950 Key-value memory in the brain]&lt;br /&gt;
* [https://helper.ipam.ucla.edu/publications/mac2024/mac2024_20152.pdf The cost of brain state transitions]&lt;br /&gt;
&lt;br /&gt;
==Brain mapping==&lt;br /&gt;
* 2024-05: [https://www.science.org/doi/10.1126/science.adk4858 A petavoxel fragment of human cerebral cortex reconstructed at nanoscale resolution] ([https://www.nature.com/articles/d41586-024-01387-9#ref-CR1 media summary])&lt;br /&gt;
* 2024-10: [https://www.nature.com/articles/s41586-024-07558-y Neuronal wiring diagram of an adult brain] ([https://www.nytimes.com/2024/10/02/science/fruit-fly-brain-mapped.html media summary]); 140,000 neurons in fruit fly brain&lt;br /&gt;
* 2024-12: [https://e11.bio/news/roadmap A roadmap to scale connectomics to entire mammalian brains]&lt;br /&gt;
* 2025-04: [https://www.nature.com/articles/s41586-025-08840-3 Functional connectomics reveals general wiring rule in mouse visual cortex] ([https://www.nature.com/articles/d41586-025-01088-x media summary])&lt;br /&gt;
* 2025-08: [https://www.nature.com/articles/s41586-025-08985-1 Light-microscopy-based connectomic reconstruction of mammalian brain tissue] ([https://research.google/blog/a-new-light-on-neural-connections/ blog])&lt;br /&gt;
&lt;br /&gt;
===Related===&lt;br /&gt;
* [https://v2.virtualflybrain.org 3D visualization of adult fruit fly brain]&lt;br /&gt;
&lt;br /&gt;
==Brain signal decoding==&lt;br /&gt;
* 2022-11: [https://www.biorxiv.org/content/10.1101/2022.11.18.517004v2.full.pdf High-resolution image reconstruction with latent diffusion models from human brain activity]&lt;br /&gt;
* 2023-08: [https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3002176 Music can be reconstructed from human auditory cortex activity using nonlinear decoding models] (intracranial EEG)&lt;br /&gt;
* 2023-09: [https://arxiv.org/abs/2309.14030 DeWave: Discrete EEG Waves Encoding for Brain Dynamics to Text Translation] (external EEG)&lt;br /&gt;
* 2023-09: [https://www.biorxiv.org/content/10.1101/2023.09.12.557460v1 BrainLM: A foundation model for brain activity recordings]&lt;br /&gt;
* 2023-10: [https://ai.meta.com/blog/brain-ai-image-decoding-meg-magnetoencephalography/ Toward a real-time decoding of images from brain activity] (MEG)&lt;br /&gt;
* 2024-06: [https://www.biorxiv.org/content/10.1101/2024.06.04.596589v1.full.pdf PAM: Predictive Attention Mechanism for Neural Decoding of Visual Perception]&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.07595 Scaling Law in Neural Data: Non-Invasive Speech Decoding with 175 Hours of EEG Data] (EEG)&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.19814 Predicting Human Brain States with Transformer]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.15322v2 Scaling laws for decoding images from brain activity] (EEG)&lt;br /&gt;
* 2025-02: Meta: [https://ai.meta.com/research/publications/brain-to-text-decoding-a-non-invasive-approach-via-typing/ Brain-to-Text Decoding: A Non-invasive Approach via Typing]&lt;br /&gt;
* 2025-02: Meta: [https://ai.meta.com/research/publications/from-thought-to-action-how-a-hierarchy-of-neural-dynamics-supports-language-production/ From Thought to Action: How a Hierarchy of Neural Dynamics Supports Language Production]&lt;br /&gt;
* 2025-03: Google: [https://research.google/blog/deciphering-language-processing-in-the-human-brain-through-llm-representations/ Deciphering language processing in the human brain through LLM representations]&lt;br /&gt;
* 2025-03: [https://www.nature.com/articles/s41593-025-01905-6 A streaming brain-to-voice neuroprosthesis to restore naturalistic communication]&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.11536 Language models align with brain regions that represent concepts across modalities]&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2508.18226 Disentangling the Factors of Convergence between Brains and Computer Vision Models] (fMRI and MEG)&lt;br /&gt;
&lt;br /&gt;
==Whole Brain Emulation (WBE)==&lt;br /&gt;
* 2024-09: [https://www.nature.com/articles/s41586-024-07939-3 Connectome-constrained networks predict neural activity across the fly visual system]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.15745 State of Brain Emulation Report 2025]&lt;br /&gt;
&lt;br /&gt;
=Computational Analysis=&lt;br /&gt;
&lt;br /&gt;
==Computational power of human brain==&lt;br /&gt;
* 2020-09: Joe Carlsmith: [https://www.openphilanthropy.org/research/how-much-computational-power-does-it-take-to-match-the-human-brain/ How Much Computational Power Does It Take to Match the Human Brain?]&lt;br /&gt;
&lt;br /&gt;
==Comparison to computer==&lt;br /&gt;
* [https://arxiv.org/abs/2208.12032 How (and Why) to Think that the Brain is Literally a Computer]&lt;br /&gt;
* [https://www.nature.com/articles/s42256-024-00925-4 Contextual feature extraction hierarchies converge in large language models and the brain] ([https://techxplore.com/news/2024-12-llms-brain-advance.html LLMs are becoming more brain-like as they advance])&lt;br /&gt;
&lt;br /&gt;
==Biological vs. artificial neuron==&lt;br /&gt;
* [https://www.sciencedirect.com/science/article/pii/S0896627321005018 Single cortical neurons as deep artificial neural networks]: Each biological neuron can be simulated using a DNN of 5-8 layers&lt;br /&gt;
* [https://arxiv.org/abs/2305.12471 Mapping Biological Neuron Dynamics into an Interpretable Two-layer Artificial Neural Network]&lt;br /&gt;
&lt;br /&gt;
==Data processing==&lt;br /&gt;
* [https://pmc.ncbi.nlm.nih.gov/articles/PMC1564115/ How Much the Eye Tells the Brain]&lt;br /&gt;
* [https://www.sciencedirect.com/science/article/pii/S1364661313001277 Representational geometry: integrating cognition, computation, and the brain]&lt;br /&gt;
* [https://www.nature.com/articles/s41586-024-07522-w Language is primarily a tool for communication rather than thought]&lt;br /&gt;
* [https://www.openread.academy/en/paper/reading?corpusId=513306465 The Unbearable Slowness of Being: Why do we live at 10 bits/s?] ([https://arxiv.org/abs/2408.10234 preprint])&lt;br /&gt;
&lt;br /&gt;
==Extract manifold/geometry==&lt;br /&gt;
* [https://www.science.org/doi/10.1126/science.adk8261 Selection of experience for memory by hippocampal sharp wave ripples]&lt;br /&gt;
&lt;br /&gt;
=Comparisons=&lt;br /&gt;
* 2023-08: [https://arxiv.org/abs/2308.08708 Consciousness in Artificial Intelligence: Insights from the Science of Consciousness]&lt;br /&gt;
* 2024-05: [https://arxiv.org/abs/2405.02325 Are Biological Systems More Intelligent Than Artificial Intelligence?]&lt;br /&gt;
* 2025-03: Google: [https://research.google/blog/deciphering-language-processing-in-the-human-brain-through-llm-representations/ Deciphering language processing in the human brain through LLM representations]&lt;br /&gt;
** 2022-03: [https://www.nature.com/articles/s41593-022-01026-4 Shared computational principles for language processing in humans and deep language models]&lt;br /&gt;
** 2024-03: [https://www.nature.com/articles/s41467-024-46631-y Alignment of brain embeddings and artificial contextual embeddings in natural language points to common geometric patterns]&lt;br /&gt;
** 2025-03: [https://www.nature.com/articles/s41562-025-02105-9 A unified acoustic-to-speech-to-language embedding space captures the neural basis of natural language processing in everyday conversations]&lt;br /&gt;
* 2025-05: [https://ai.meta.com/research/publications/emergence-of-language-in-the-developing-brain/ Emergence of Language in the Developing Brain]&lt;br /&gt;
&lt;br /&gt;
==Analogies==&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.11536 Language models align with brain regions that represent concepts across modalities]&lt;br /&gt;
* 2025-12: [https://www.nature.com/articles/s41562-025-02359-3 Shared sensitivity to data distribution during learning in humans and transformer networks]&lt;br /&gt;
===Speed-accuracy trade-off vs. Inference-compute===&lt;br /&gt;
* 2007: [https://psycnet.apa.org/doi/10.1037/0096-3445.136.2.217 Focusing the spotlight: individual differences in visual attention control]&lt;br /&gt;
* 2014-07: [https://www.frontiersin.org/journals/neuroscience/articles/10.3389/fnins.2014.00150/full The speed-accuracy tradeoff: history, physiology, methodology, and behavior]&lt;br /&gt;
&lt;br /&gt;
=Simulate Brain=&lt;br /&gt;
* 2023-09: [https://spj.science.org/doi/10.34133/icomputing.0055 The Digital Twin Brain: A Bridge between Biological and Artificial Intelligence]&lt;br /&gt;
* 2024-12: [https://www.nature.com/articles/s43588-024-00731-3 Simulation and assimilation of the digital human brain] ([https://arxiv.org/abs/2211.15963 preprint], [https://github.com/DTB-consortium/Digital_twin_brain-open code])&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.19814 Predicting Human Brain States with Transformer]&lt;br /&gt;
* 2025-08: [https://www.arxiv.org/abs/2507.22229 TRIBE: TRImodal Brain Encoder for whole-brain fMRI response prediction]&lt;br /&gt;
&lt;br /&gt;
==See Also==&lt;br /&gt;
* [[AI_and_Humans#Simulate_Humans|Simulate Humans (using LLM)]]&lt;br /&gt;
&lt;br /&gt;
=Bio-brain Inspirations for AI=&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.16396 TopoNets: High Performing Vision and Language Models with Brain-Like Topography]&lt;br /&gt;
&lt;br /&gt;
=Theories of Consciousness=&lt;br /&gt;
* [https://www.consciousnessatlas.com/ Consciousness Atlas]&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
* [[AI_and_Humans#Simulate_Humans|LLM Simulate Humans]]&lt;/div&gt;</summary>
		<author><name>KevinYager</name></author>
		
	</entry>
	<entry>
		<id>http://gisaxs.com/index.php?title=Human_brain&amp;diff=8753</id>
		<title>Human brain</title>
		<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php?title=Human_brain&amp;diff=8753"/>
		<updated>2026-03-23T17:32:02Z</updated>

		<summary type="html">&lt;p&gt;KevinYager: /* Why Brain is at it is */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Why brain is as it is=&lt;br /&gt;
* 2025-06: [https://www.cell.com/trends/cognitive-sciences/fulltext/S1364-6613(24)00319-X The metabolic costs of cognition]&lt;br /&gt;
&lt;br /&gt;
=How Brain Works=&lt;br /&gt;
==Predictive Coding==&lt;br /&gt;
* 2005-04: [https://royalsocietypublishing.org/doi/10.1098/rstb.2005.1622 A theory of cortical responses]&lt;br /&gt;
* 2014-09: [https://www.frontiersin.org/journals/human-neuroscience/articles/10.3389/fnhum.2014.00666/full Visual mismatch negativity: a predictive coding view]&lt;br /&gt;
* 2015-01: [https://www.sciencedirect.com/science/article/pii/S089662731401099X Visual Areas Exert Feedforward and Feedback Influences through Distinct Frequency Channels]&lt;br /&gt;
* 2016-11: [https://www.sciencedirect.com/science/article/pii/S0896627316306997 Mismatch Receptive Fields in Mouse Visual Cortex]&lt;br /&gt;
* 2018-03: [https://www.nature.com/articles/s41598-018-21407-9 Frontal cortex function as derived from hierarchical predictive coding]&lt;br /&gt;
* 2024-02: [https://www.sciencedirect.com/science/article/pii/S0149763423004426 The empirical status of predictive coding and active inference]&lt;br /&gt;
&lt;br /&gt;
=Understanding=&lt;br /&gt;
* [https://arxiv.org/abs/2501.02950 Key-value memory in the brain]&lt;br /&gt;
* [https://helper.ipam.ucla.edu/publications/mac2024/mac2024_20152.pdf The cost of brain state transitions]&lt;br /&gt;
&lt;br /&gt;
==Brain mapping==&lt;br /&gt;
* 2024-05: [https://www.science.org/doi/10.1126/science.adk4858 A petavoxel fragment of human cerebral cortex reconstructed at nanoscale resolution] ([https://www.nature.com/articles/d41586-024-01387-9#ref-CR1 media summary])&lt;br /&gt;
* 2024-10: [https://www.nature.com/articles/s41586-024-07558-y Neuronal wiring diagram of an adult brain] ([https://www.nytimes.com/2024/10/02/science/fruit-fly-brain-mapped.html media summary]); 140,000 neurons in fruit fly brain&lt;br /&gt;
* 2024-12: [https://e11.bio/news/roadmap A roadmap to scale connectomics to entire mammalian brains]&lt;br /&gt;
* 2025-04: [https://www.nature.com/articles/s41586-025-08840-3 Functional connectomics reveals general wiring rule in mouse visual cortex] ([https://www.nature.com/articles/d41586-025-01088-x media summary])&lt;br /&gt;
* 2025-08: [https://www.nature.com/articles/s41586-025-08985-1 Light-microscopy-based connectomic reconstruction of mammalian brain tissue] ([https://research.google/blog/a-new-light-on-neural-connections/ blog])&lt;br /&gt;
&lt;br /&gt;
===Related===&lt;br /&gt;
* [https://v2.virtualflybrain.org 3D visualization of adult fruit fly brain]&lt;br /&gt;
&lt;br /&gt;
==Brain signal decoding==&lt;br /&gt;
* 2022-11: [https://www.biorxiv.org/content/10.1101/2022.11.18.517004v2.full.pdf High-resolution image reconstruction with latent diffusion models from human brain activity]&lt;br /&gt;
* 2023-08: [https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3002176 Music can be reconstructed from human auditory cortex activity using nonlinear decoding models] (intracranial EEG)&lt;br /&gt;
* 2023-09: [https://arxiv.org/abs/2309.14030 DeWave: Discrete EEG Waves Encoding for Brain Dynamics to Text Translation] (external EEG)&lt;br /&gt;
* 2023-09: [https://www.biorxiv.org/content/10.1101/2023.09.12.557460v1 BrainLM: A foundation model for brain activity recordings]&lt;br /&gt;
* 2023-10: [https://ai.meta.com/blog/brain-ai-image-decoding-meg-magnetoencephalography/ Toward a real-time decoding of images from brain activity] (MEG)&lt;br /&gt;
* 2024-06: [https://www.biorxiv.org/content/10.1101/2024.06.04.596589v1.full.pdf PAM: Predictive Attention Mechanism for Neural Decoding of Visual Perception]&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.07595 Scaling Law in Neural Data: Non-Invasive Speech Decoding with 175 Hours of EEG Data] (EEG)&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.19814 Predicting Human Brain States with Transformer]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.15322v2 Scaling laws for decoding images from brain activity] (EEG)&lt;br /&gt;
* 2025-02: Meta: [https://ai.meta.com/research/publications/brain-to-text-decoding-a-non-invasive-approach-via-typing/ Brain-to-Text Decoding: A Non-invasive Approach via Typing]&lt;br /&gt;
* 2025-02: Meta: [https://ai.meta.com/research/publications/from-thought-to-action-how-a-hierarchy-of-neural-dynamics-supports-language-production/ From Thought to Action: How a Hierarchy of Neural Dynamics Supports Language Production]&lt;br /&gt;
* 2025-03: Google: [https://research.google/blog/deciphering-language-processing-in-the-human-brain-through-llm-representations/ Deciphering language processing in the human brain through LLM representations]&lt;br /&gt;
* 2025-03: [https://www.nature.com/articles/s41593-025-01905-6 A streaming brain-to-voice neuroprosthesis to restore naturalistic communication]&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.11536 Language models align with brain regions that represent concepts across modalities]&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2508.18226 Disentangling the Factors of Convergence between Brains and Computer Vision Models] (fMRI and MEG)&lt;br /&gt;
&lt;br /&gt;
=Computational Analysis=&lt;br /&gt;
&lt;br /&gt;
==Computational power of human brain==&lt;br /&gt;
* 2020-09: Joe Carlsmith: [https://www.openphilanthropy.org/research/how-much-computational-power-does-it-take-to-match-the-human-brain/ How Much Computational Power Does It Take to Match the Human Brain?]&lt;br /&gt;
&lt;br /&gt;
==Comparison to computer==&lt;br /&gt;
* [https://arxiv.org/abs/2208.12032 How (and Why) to Think that the Brain is Literally a Computer]&lt;br /&gt;
* [https://www.nature.com/articles/s42256-024-00925-4 Contextual feature extraction hierarchies converge in large language models and the brain] ([https://techxplore.com/news/2024-12-llms-brain-advance.html LLMs are becoming more brain-like as they advance])&lt;br /&gt;
&lt;br /&gt;
==Biological vs. artificial neuron==&lt;br /&gt;
* [https://www.sciencedirect.com/science/article/pii/S0896627321005018 Single cortical neurons as deep artificial neural networks]: Each biological neuron can be simulated using a DNN of 5-8 layers&lt;br /&gt;
* [https://arxiv.org/abs/2305.12471 Mapping Biological Neuron Dynamics into an Interpretable Two-layer Artificial Neural Network]&lt;br /&gt;
&lt;br /&gt;
==Data processing==&lt;br /&gt;
* [https://pmc.ncbi.nlm.nih.gov/articles/PMC1564115/ How Much the Eye Tells the Brain]&lt;br /&gt;
* [https://www.sciencedirect.com/science/article/pii/S1364661313001277 Representational geometry: integrating cognition, computation, and the brain]&lt;br /&gt;
* [https://www.nature.com/articles/s41586-024-07522-w Language is primarily a tool for communication rather than thought]&lt;br /&gt;
* [https://www.openread.academy/en/paper/reading?corpusId=513306465 The Unbearable Slowness of Being: Why do we live at 10 bits/s?] ([https://arxiv.org/abs/2408.10234 preprint])&lt;br /&gt;
&lt;br /&gt;
==Extract manifold/geometry==&lt;br /&gt;
* [https://www.science.org/doi/10.1126/science.adk8261 Selection of experience for memory by hippocampal sharp wave ripples]&lt;br /&gt;
&lt;br /&gt;
=Comparisons=&lt;br /&gt;
* 2023-08: [https://arxiv.org/abs/2308.08708 Consciousness in Artificial Intelligence: Insights from the Science of Consciousness]&lt;br /&gt;
* 2024-05: [https://arxiv.org/abs/2405.02325 Are Biological Systems More Intelligent Than Artificial Intelligence?]&lt;br /&gt;
* 2025-03: Google: [https://research.google/blog/deciphering-language-processing-in-the-human-brain-through-llm-representations/ Deciphering language processing in the human brain through LLM representations]&lt;br /&gt;
** 2022-03: [https://www.nature.com/articles/s41593-022-01026-4 Shared computational principles for language processing in humans and deep language models]&lt;br /&gt;
** 2024-03: [https://www.nature.com/articles/s41467-024-46631-y Alignment of brain embeddings and artificial contextual embeddings in natural language points to common geometric patterns]&lt;br /&gt;
** 2025-03: [https://www.nature.com/articles/s41562-025-02105-9 A unified acoustic-to-speech-to-language embedding space captures the neural basis of natural language processing in everyday conversations]&lt;br /&gt;
* 2025-05: [https://ai.meta.com/research/publications/emergence-of-language-in-the-developing-brain/ Emergence of Language in the Developing Brain]&lt;br /&gt;
&lt;br /&gt;
==Analogies==&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.11536 Language models align with brain regions that represent concepts across modalities]&lt;br /&gt;
* 2025-12: [https://www.nature.com/articles/s41562-025-02359-3 Shared sensitivity to data distribution during learning in humans and transformer networks]&lt;br /&gt;
===Speed-accuracy trade-off vs. Inference-compute===&lt;br /&gt;
* 2007: [https://psycnet.apa.org/doi/10.1037/0096-3445.136.2.217 Focusing the spotlight: individual differences in visual attention control]&lt;br /&gt;
* 2014-07: [https://www.frontiersin.org/journals/neuroscience/articles/10.3389/fnins.2014.00150/full The speed-accuracy tradeoff: history, physiology, methodology, and behavior]&lt;br /&gt;
&lt;br /&gt;
=Simulate Brain=&lt;br /&gt;
* 2023-09: [https://spj.science.org/doi/10.34133/icomputing.0055 The Digital Twin Brain: A Bridge between Biological and Artificial Intelligence]&lt;br /&gt;
* 2024-12: [https://www.nature.com/articles/s43588-024-00731-3 Simulation and assimilation of the digital human brain] ([https://arxiv.org/abs/2211.15963 preprint], [https://github.com/DTB-consortium/Digital_twin_brain-open code])&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.19814 Predicting Human Brain States with Transformer]&lt;br /&gt;
* 2025-08: [https://www.arxiv.org/abs/2507.22229 TRIBE: TRImodal Brain Encoder for whole-brain fMRI response prediction]&lt;br /&gt;
&lt;br /&gt;
==See Also==&lt;br /&gt;
* [[AI_and_Humans#Simulate_Humans|Simulate Humans (using LLM)]]&lt;br /&gt;
&lt;br /&gt;
=Bio-brain Inspirations for AI=&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.16396 TopoNets: High Performing Vision and Language Models with Brain-Like Topography]&lt;br /&gt;
&lt;br /&gt;
=Theories of Consciousness=&lt;br /&gt;
* [https://www.consciousnessatlas.com/ Consciousness Atlas]&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
* [[AI_and_Humans#Simulate_Humans|LLM Simulate Humans]]&lt;/div&gt;</summary>
		<author><name>KevinYager</name></author>
		
	</entry>
	<entry>
		<id>http://gisaxs.com/index.php?title=AI_predictions&amp;diff=8752</id>
		<title>AI predictions</title>
		<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php?title=AI_predictions&amp;diff=8752"/>
		<updated>2026-03-23T17:29:59Z</updated>

		<summary type="html">&lt;p&gt;KevinYager: /* Alignment */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Capability Scaling=&lt;br /&gt;
* 2019-03: Rich Sutton: [https://www.cs.utexas.edu/~eunsol/courses/data/bitter_lesson.pdf The Bitter Lesson]&lt;br /&gt;
* 2020-09: Ajeya Cotra: [https://www.lesswrong.com/posts/KrJfoZzpSDpnrv9va/draft-report-on-ai-timelines Draft report on AI timelines]&lt;br /&gt;
* 2022-01: gwern: [https://gwern.net/scaling-hypothesis The Scaling Hypothesis]&lt;br /&gt;
* 2023-05: Richard Ngo: [https://www.lesswrong.com/posts/BoA3agdkAzL6HQtQP/clarifying-and-predicting-agi Clarifying and predicting AGI]&lt;br /&gt;
* 2024-06: Aidan McLaughlin: [https://yellow-apartment-148.notion.site/AI-Search-The-Bitter-er-Lesson-44c11acd27294f4495c3de778cd09c8d AI Search: The Bitter-er Lesson]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.14499 Measuring AI Ability to Complete Long Tasks]&lt;br /&gt;
** 2025-04: [https://peterwildeford.substack.com/p/forecaster-reacts-metrs-bombshell Forecaster reacts: METR&amp;#039;s bombshell paper about AI acceleration] New data supports an exponential AI curve, but lots of uncertainty remains&lt;br /&gt;
** 2025-04: AI Digest: [https://theaidigest.org/time-horizons A new Moore&amp;#039;s Law for AI agents]&lt;br /&gt;
[[Image:GmZHL8xWQAAtFlF.jpeg|450px]]&lt;br /&gt;
* 2025-04: [https://epoch.ai/blog/trends-in-ai-supercomputers Trends in AI Supercomputers] ([https://arxiv.org/abs/2504.16026 preprint])&lt;br /&gt;
* [https://ai-timeline.org/ The Road to AGI] (timeline visualization)&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.09677 The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs]&lt;br /&gt;
* 2025-09: [https://www.julian.ac/blog/2025/09/27/failing-to-understand-the-exponential-again/ Failing to Understand the Exponential, Again]&lt;br /&gt;
* 2026-02: Ryan Greenblatt: [https://www.lesswrong.com/posts/rRbDNQLfihiHbXytf/distinguish-between-inference-scaling-and-larger-tasks-use Distinguish between inference scaling and &amp;quot;larger tasks use more compute&amp;quot;]&lt;br /&gt;
* 2026-03: [https://arxiv.org/abs/2603.03992 Measuring AI R&amp;amp;D Automation] ([https://astrangeattractor.substack.com/p/measuring-ai-r-and-d-automation?triedRedirect=true blog])&lt;br /&gt;
&lt;br /&gt;
==Scaling Laws==&lt;br /&gt;
See: [[AI_understanding#Scaling_Laws|Scaling Laws]]&lt;br /&gt;
&lt;br /&gt;
==AGI Achievable==&lt;br /&gt;
* Yoshua Bengio: [https://arxiv.org/abs/2310.17688 Managing extreme AI risks amid rapid progress]&lt;br /&gt;
* Leopold Aschenbrenner: [https://situational-awareness.ai/from-gpt-4-to-agi/#Counting_the_OOMs Situational Awareness: Counting the OOMs]&lt;br /&gt;
* Richard Ngo: [https://medium.com/@richardcngo/visualizing-the-deep-learning-revolution-722098eb9c5 Visualizing the deep learning revolution]&lt;br /&gt;
* Katja Grace: [https://blog.aiimpacts.org/p/2023-ai-survey-of-2778-six-things Survey of 2,778 AI authors: six parts in pictures]&lt;br /&gt;
* Epoch AI: [https://epoch.ai/trends Machine Learning Trends]&lt;br /&gt;
* AI Digest: [https://theaidigest.org/progress-and-dangers How fast is AI improving?]&lt;br /&gt;
* 2025-06: [https://80000hours.org/agi/guide/when-will-agi-arrive/ The case for AGI by 2030]&lt;br /&gt;
&lt;br /&gt;
==AGI Definition==&lt;br /&gt;
* 2023-11: Allan Dafoe, Shane Legg, et al.: [https://arxiv.org/abs/2311.02462 Levels of AGI for Operationalizing Progress on the Path to AGI]&lt;br /&gt;
* 2024-04: Bowen Xu: [https://arxiv.org/abs/2404.10731 What is Meant by AGI? On the Definition of Artificial General Intelligence]&lt;br /&gt;
* 2025-10: Dan Hendrycks et al.: [https://www.agidefinition.ai/paper.pdf A Definition of AGI]&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.07364 On the universal definition of intelligence]&lt;br /&gt;
&lt;br /&gt;
==Recursive Self Improvement (RSI)==&lt;br /&gt;
* 2026-02: [https://80000hours.org/articles/how-ai-driven-feedback-loops-could-make-things-very-crazy-very-fast/ How AI-driven feedback loops could make things very crazy, very fast]&lt;br /&gt;
&lt;br /&gt;
==Progress Models==&lt;br /&gt;
From [http://yager-research.ca/2025/04/ai-impact-predictions/ AI Impact Predictions]:&lt;br /&gt;
&lt;br /&gt;
[[Image:AI impact models-2025 11 24.png|450px]]&lt;br /&gt;
&lt;br /&gt;
=Economic and Political=&lt;br /&gt;
* 2019-11: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3482150 The Impact of Artificial Intelligence on the Labor Market]&lt;br /&gt;
* 2020-06: [https://www.openphilanthropy.org/research/modeling-the-human-trajectory/ Modeling the Human Trajectory] (GDP)&lt;br /&gt;
* 2021-06: [https://www.openphilanthropy.org/research/report-on-whether-ai-could-drive-explosive-economic-growth/ Report on Whether AI Could Drive Explosive Economic Growth]&lt;br /&gt;
* 2023-10: Marc Andreessen: [https://a16z.com/the-techno-optimist-manifesto/ The Techno-Optimist Manifesto]&lt;br /&gt;
* 2023-12: [https://vitalik.eth.limo/general/2023/11/27/techno_optimism.html My techno-optimism]: &amp;quot;defensive acceleration&amp;quot; ([https://vitalik.eth.limo/index.html Vitalik Buterin])&lt;br /&gt;
* 2024-03: Noah Smith: [https://www.noahpinion.blog/p/plentiful-high-paying-jobs-in-the Plentiful, high-paying jobs in the age of AI: Comparative advantage is very subtle, but incredibly powerful.] ([https://x.com/liron/status/1768013030741475485 video])&lt;br /&gt;
* 2024-03: [https://doi.org/10.3386/w32255 Scenarios for the Transition to AGI] (AGI leads to wage collapse)&lt;br /&gt;
* 2024-06: [https://situational-awareness.ai/wp-content/uploads/2024/06/situationalawareness.pdf Situational Awareness] ([https://www.forourposterity.com/ Leopold Aschenbrenner]) - [https://www.lesswrong.com/posts/nP5FFYFjtY8LgWymt/quotes-from-leopold-aschenbrenner-s-situational-awareness select quotes], [https://www.youtube.com/watch?v=zdbVtZIn9IM podcast], [https://danielmiessler.com/p/podcast-summary-dwarkesh-vs-leopold-aschenbrenner text summary of podcast]&lt;br /&gt;
* 2024-06: [https://www.frbsf.org/wp-content/uploads/AI-and-Growth-Aghion-Bunel.pdf AI and Growth: Where Do We Stand?]&lt;br /&gt;
* 2024-09: OpenAI [https://cdn.openai.com/global-affairs/openai-infra-economics-10.09.24.pdf Infrastructure is Destiny: Economic Returns on US Investment in Democratic AI]&lt;br /&gt;
* 2024-12: [https://www.lesswrong.com/posts/KFFaKu27FNugCHFmh/by-default-capital-will-matter-more-than-ever-after-agi By default, capital will matter more than ever after AGI] (L Rudolf L)&lt;br /&gt;
* 2025-01: [https://lukedrago.substack.com/p/the-intelligence-curse The Intelligence Curse]: With AGI, powerful actors will lose their incentives to invest in people&lt;br /&gt;
** Updated 2025-04: [https://intelligence-curse.ai/ The Intelligence Curse] (Luke Drago and Rudolf Laine)&lt;br /&gt;
*** [https://intelligence-curse.ai/pyramid/ Pyramid Replacement]&lt;br /&gt;
*** [https://intelligence-curse.ai/capital/ Capital, AGI, and Human Ambition]&lt;br /&gt;
*** [https://intelligence-curse.ai/defining/ Defining the Intelligence Curse]&lt;br /&gt;
*** [https://intelligence-curse.ai/shaping/ Shaping the Social Contract]&lt;br /&gt;
*** [https://intelligence-curse.ai/breaking/ Breaking the Intelligence Curse]&lt;br /&gt;
*** [https://intelligence-curse.ai/history/ History is Yours to Write]&lt;br /&gt;
* 2025-01: Microsoft: [https://blogs.microsoft.com/on-the-issues/2025/01/03/the-golden-opportunity-for-american-ai/ The Golden Opportunity for American AI]&lt;br /&gt;
* 2025-01: [https://www.maximum-progress.com/p/agi-will-not-make-labor-worthless AGI Will Not Make Labor Worthless]&lt;br /&gt;
* 2025-01: [https://cdn.openai.com/global-affairs/ai-in-america-oai-economic-blueprint-20250113.pdf AI in America: OpenAI&amp;#039;s Economic Blueprint] ([https://openai.com/global-affairs/openais-economic-blueprint/ blog])&lt;br /&gt;
* 2025-01: [https://inferencemagazine.substack.com/p/how-much-economic-growth-from-ai How much economic growth from AI should we expect, how soon?]&lt;br /&gt;
* 2025-02: Morgan Stanley: [https://advisor.morganstanley.com/john.howard/documents/field/j/jo/john-howard/The_Humanoid_100_-_Mapping_the_Humanoid_Robot_Value_Chain.pdf The Humanoid 100: Mapping the Humanoid Robot Value Chain]&lt;br /&gt;
* 2025-02: [https://www.anthropic.com/news/the-anthropic-economic-index The Anthropic Economic Index]: [https://assets.anthropic.com/m/2e23255f1e84ca97/original/Economic_Tasks_AI_Paper.pdf Which Economic Tasks are Performed with AI? Evidence from Millions of Claude Conversations]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.11264 Strategic Wealth Accumulation Under Transformative AI Expectations]&lt;br /&gt;
* 2025-02: Tyler Cowen: [https://marginalrevolution.com/marginalrevolution/2025/02/why-i-think-ai-take-off-is-relatively-slow.html Why I think AI take-off is relatively slow]&lt;br /&gt;
* 2025-03: Epoch AI: [https://epoch.ai/gradient-updates/most-ai-value-will-come-from-broad-automation-not-from-r-d Most AI value will come from broad automation, not from R&amp;amp;D]&lt;br /&gt;
** The primary economic impact of AI will be its ability to broadly automate labor&lt;br /&gt;
** Automating AI R&amp;amp;D alone likely won’t dramatically accelerate AI progress&lt;br /&gt;
** Fully automating R&amp;amp;D requires a very broad set of abilities&lt;br /&gt;
** AI takeoff will likely be diffuse and salient&lt;br /&gt;
* 2025-03: [https://www.anthropic.com/news/anthropic-economic-index-insights-from-claude-sonnet-3-7 Anthropic Economic Index: Insights from Claude 3.7 Sonnet]&lt;br /&gt;
* 2025-04: [https://inferencemagazine.substack.com/p/will-there-be-extreme-inequality Will there be extreme inequality from AI?]&lt;br /&gt;
* 2025-04: [https://www.anthropic.com/research/impact-software-development Anthropic Economic Index: AI’s Impact on Software Development]&lt;br /&gt;
* 2025-05: [https://www.theguardian.com/books/2025/may/04/the-big-idea-can-we-stop-ai-making-humans-obsolete Better at everything: how AI could make human beings irrelevant]&lt;br /&gt;
* 2025-05: Forethought: [https://www.forethought.org/research/the-industrial-explosion The Industrial Explosion]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.20273 Ten Principles of AI Agent Economics]&lt;br /&gt;
* 2025-07: [https://substack.com/home/post/p-167879696 What Economists Get Wrong about AI] They ignore innovation effects, use outdated capability assumptions, and miss the robotics revolution&lt;br /&gt;
* 2025-07: [https://www.nber.org/books-and-chapters/economics-transformative-ai/we-wont-be-missed-work-and-growth-era-agi We Won&amp;#039;t Be Missed: Work and Growth in the Era of AGI]&lt;br /&gt;
* 2025-07: [https://www.nber.org/papers/w34034 The Economics of Bicycles for the Mind]&lt;br /&gt;
* 2025-09: [https://conference.nber.org/conf_papers/f227491.pdf Genius on Demand: The Value of Transformative Artificial Intelligence]&lt;br /&gt;
* 2025-10: [https://peterwildeford.substack.com/p/ai-is-probably-not-a-bubble AI is probably not a bubble: AI companies have revenue, demand, and paths to immense value]&lt;br /&gt;
* 2025-11: [https://windowsontheory.org/2025/11/04/thoughts-by-a-non-economist-on-ai-and-economics/ Thoughts by a non-economist on AI and economics]&lt;br /&gt;
* 2025-11: [https://www.nber.org/papers/w34444 Artificial Intelligence, Competition, and Welfare]&lt;br /&gt;
* 2025-11: [https://www.anthropic.com/research/estimating-productivity-gains Estimating AI productivity gains from Claude conversations] (Anthropic)&lt;br /&gt;
* 2025-12: [https://benjamintodd.substack.com/p/how-ai-driven-feedback-loops-could How AI-driven feedback loops could make things very crazy, very fast]&lt;br /&gt;
* 2025-12: [https://philiptrammell.com/static/Existential_Risk_and_Growth.pdf Existential Risk and Growth] (Philip Trammell and Leopold Aschenbrenner)&lt;br /&gt;
* 2026-01: [https://www.anthropic.com/research/anthropic-economic-index-january-2026-report Anthropic Economic Index: new building blocks for understanding AI use]&lt;br /&gt;
* 2026-01: [https://www.anthropic.com/research/economic-index-primitives Anthropic Economic Index report: economic primitives]&lt;br /&gt;
* 2026-02: Nate Silver: [https://www.natesilver.net/p/the-singularity-wont-be-gentle The singularity won&amp;#039;t be gentle: If AI is even half as transformational as Silicon Valley assumes, politics will never be the same again]&lt;br /&gt;
&lt;br /&gt;
==Job Loss==&lt;br /&gt;
* 2023-03: [https://arxiv.org/pdf/2303.10130 GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models]&lt;br /&gt;
** 2023-03: [https://www.livemint.com/news/world/these-jobs-are-most-at-risk-due-to-chatgpt-as-per-openai-study-11679358453267.html These jobs are most at risk due to ChatGPT, as per OpenAI study]&lt;br /&gt;
* 2023-08: [https://dx.doi.org/10.2139/ssrn.4527336 The Short-Term Effects of Generative Artificial Intelligence on Employment: Evidence from an Online Labor Market]&lt;br /&gt;
** [https://x.com/jburnmurdoch/status/1722938749519077688 Freelancer sector shrinking]&lt;br /&gt;
[[Image:F-kVQuvWkAAemkr.png|400px]]&lt;br /&gt;
* 2023-09: [https://global-uploads.webflow.com/64d5f73a7fc5e8a240310c4d/650a128a34386a1206b6506c_FINAL%20Briefing%20-%20Adoption%20of%20Automation%20and%20AI%20in%20the%20UK.pdf What drives UK firms to adopt AI and robotics, and what are the consequences for jobs?]&lt;br /&gt;
** [https://www.digitalinformationworld.com/2023/09/78-of-companies-say-ai-created-more-jobs.html 78% of Companies Say AI Created More Jobs]&lt;br /&gt;
* 2023-11: [https://theaipi.org/ai-interactive-map/ New Analysis Shows Over 20% of US Jobs Significantly Exposed to AI Automation In the Near Future]&lt;br /&gt;
* 2024-01: [https://techcrunch.com/2024/01/09/duolingo-cut-10-of-its-contractor-workforce-as-the-company-embraces-ai/ Duolingo cuts 10% of its contractor workforce as the company embraces AI]&lt;br /&gt;
* 2024-02: [https://www.pwc.com/gx/en/issues/c-suite-insights/the-leadership-agenda/gen-ai-is-a-tool-for-growth-not-just-efficiency.html#:~:text=One%20out%20of%20every%20four%20of%20the%204%2C702,to%20accomplish%20the%20same%20tasks%20with%20fewer%20workers Gen AI is a tool for growth, not just efficiency: Tech CEOs are investing to build their workforce and capitalise on new opportunities from generative AI. That’s a sharp contrast to how their peers view it.]&lt;br /&gt;
* 2024-04: [https://www.nytimes.com/2024/04/10/business/investment-banking-jobs-artificial-intelligence.html AI is Poised to Replace the Entry-Level Grunt Work of a Wall Street Career]&lt;br /&gt;
* 2024-07: [https://www.wired.com/story/ai-is-already-taking-jobs-in-the-video-game-industry/ AI Is Already Taking Jobs in the Video Game Industry]: A WIRED investigation finds that major players like Activision Blizzard, which recently laid off scores of workers, are using generative AI for game development&lt;br /&gt;
* 2024-08: [https://www.bbc.com/news/articles/c80e1gp9m9zo Klarna: AI lets us cut thousands of jobs - but pay more]&lt;br /&gt;
* 2025-01: [https://scholarspace.manoa.hawaii.edu/server/api/core/bitstreams/4f39375d-59c2-4c4a-b394-f3eed7858c80/content AI and Freelancers: Has the Inflection Point Arrived?]&lt;br /&gt;
* 2025-01: [https://www.aporiamagazine.com/p/yes-youre-going-to-be-replaced Yes, you&amp;#039;re going to be replaced: So much cope about AI]&lt;br /&gt;
* 2025-03: [https://commonplace.org/2025/03/20/will-ai-automate-away-your-job/ Will AI Automate Away Your Job? The time-horizon model explains the future of the technology]&lt;br /&gt;
* 2025-05: [https://www.forbes.com/sites/jackkelly/2025/05/04/its-time-to-get-concerned-klarna-ups-duolingo-cisco-and-many-other-companies-are-replacing-workers-with-ai/ It’s Time To Get Concerned, Klarna, UPS, Duolingo, Cisco, And Many Other Companies Are Replacing Workers With AI]&lt;br /&gt;
* 2025-05: [https://time.com/7289692/when-ai-replaces-workers/ What Happens When AI Replaces Workers?]&lt;br /&gt;
* 2025-05: [https://www.oxfordeconomics.com/resource/educated-but-unemployed-a-rising-reality-for-us-college-grads/ Educated but unemployed, a rising reality for US college grads] Structural shifts in tech hiring and the growing impact of AI are driving higher unemployment among recent college graduates&lt;br /&gt;
* 2025-05: NY Times: [https://www.nytimes.com/2025/05/30/technology/ai-jobs-college-graduates.html?unlocked_article_code=1.LE8.LlC6.eT5XcpA9hxC2&amp;amp;smid=url-share For Some Recent Graduates, the A.I. Job Apocalypse May Already Be Here] The unemployment rate for recent college graduates has jumped as companies try to replace entry-level workers with artificial intelligence&lt;br /&gt;
* 2025-06: [https://80000hours.org/agi/guide/skills-ai-makes-valuable/ How not to lose your job to AI] The skills AI will make more valuable (and how to learn them)&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.06576 Future of Work with AI Agents: Auditing Automation and Augmentation Potential across the U.S. Workforce]&lt;br /&gt;
[[Image:0dab4c86-882d-4095-9d12-d19684ed5184 675x680.png|300px]]&lt;br /&gt;
* 2025-07: Harvard Business Review: [https://hbr.org/2025/06/what-gets-measured-ai-will-automate What Gets Measured, AI Will Automate]&lt;br /&gt;
* 2025-08: [https://digitaleconomy.stanford.edu/publications/canaries-in-the-coal-mine/ Canaries in the Coal Mine? Six Facts about the Recent Employment Effects of Artificial Intelligence]&lt;br /&gt;
* 2025-10: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5560401 Performance or Principle: Resistance to Artificial Intelligence in the U.S. Labor Market]&lt;br /&gt;
* 2025-10: [https://www.siliconcontinent.com/p/the-ai-becker-problem The AI Becker problem: Who will train the next generation?]&lt;br /&gt;
* 2026-01: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6134506 AI, Automation, and Expertise]&lt;br /&gt;
* 2026-02: [https://arachnemag.substack.com/p/the-jevons-paradox-for-intelligence The Jevons Paradox for Intelligence: Fears of AI-induced job loss could not be more wrong]&lt;br /&gt;
&lt;br /&gt;
==Productivity Impact==&lt;br /&gt;
* 2025-05: [https://www.nber.org/papers/w33777 Large Language Models, Small Labor Market Effects]&lt;br /&gt;
** Significant uptake, but very little economic impact so far&lt;br /&gt;
* 2026-02: [https://www.ft.com/content/4b51d0b4-bbfe-4f05-b50a-1d485d419dc5 The AI productivity take-off is finally visible] ([https://x.com/erikbryn/status/2023075588974735869?s=20 Erik Brynjolfsson])&lt;br /&gt;
** Businesses are finally beginning to reap some of AI&amp;#039;s benefits.&lt;br /&gt;
* 2026-02: New York Times: [https://www.nytimes.com/2026/02/18/opinion/ai-software.html The A.I. Disruption We’ve Been Waiting for Has Arrived]&lt;br /&gt;
&lt;br /&gt;
==National Security==&lt;br /&gt;
* 2025-04: Jeremie Harris and Edouard Harris: [https://superintelligence.gladstone.ai/ America’s Superintelligence Project]&lt;br /&gt;
&lt;br /&gt;
==AI Manhattan Project==&lt;br /&gt;
* 2024-06: [https://situational-awareness.ai/wp-content/uploads/2024/06/situationalawareness.pdf Situational Awareness] ([https://www.forourposterity.com/ Leopold Aschenbrenner]) - [https://www.lesswrong.com/posts/nP5FFYFjtY8LgWymt/quotes-from-leopold-aschenbrenner-s-situational-awareness select quotes], [https://www.youtube.com/watch?v=zdbVtZIn9IM podcast], [https://danielmiessler.com/p/podcast-summary-dwarkesh-vs-leopold-aschenbrenner text summary of podcast]&lt;br /&gt;
* 2024-10: [https://thezvi.substack.com/p/ai-88-thanks-for-the-memos?open=false#%C2%A7thanks-for-the-memos-introduction-and-competitiveness White House Memo calls for action on AI]&lt;br /&gt;
* 2024-11: [https://www.uscc.gov/annual-report/2024-annual-report-congress 2024 Annual Report to Congress]: [https://www.reuters.com/technology/artificial-intelligence/us-government-commission-pushes-manhattan-project-style-ai-initiative-2024-11-19/ calls] for &amp;quot;Manhattan Project-style&amp;quot; effort&lt;br /&gt;
* 2025-05-29: [https://x.com/ENERGY/status/1928085878561272223 DoE Tweet]: &amp;quot;AI is the next Manhattan Project, and THE UNITED STATES WILL WIN. 🇺🇸&amp;quot;&lt;br /&gt;
* 2025-07: [https://epoch.ai/gradient-updates/how-big-could-an-ai-manhattan-project-get How big could an “AI Manhattan Project” get?]&lt;br /&gt;
&lt;br /&gt;
=Near-term=&lt;br /&gt;
* 2021-08: Daniel Kokotajlo: [https://www.lesswrong.com/posts/6Xgy6CAf2jqHhynHL/what-2026-looks-like What 2026 looks like]&lt;br /&gt;
* 2025-02: Sam Altman: [https://blog.samaltman.com/three-observations Three Observations]&lt;br /&gt;
*# The intelligence of an AI model roughly equals the log of the resources used to train and run it.&lt;br /&gt;
*# The cost to use a given level of AI falls about 10x every 12 months, and lower prices lead to much more use.&lt;br /&gt;
*# The socioeconomic value of linearly increasing intelligence is super-exponential in nature.&lt;br /&gt;
* 2025-03: [https://www.pathwaysai.org/p/glimpses-of-ai-progess Glimpses of AI Progress: Mental models for fast times]&lt;br /&gt;
* 2025-03: [https://www.nature.com/articles/s41598-025-92190-7 Navigating artificial general intelligence development: societal, technological, ethical, and brain-inspired pathways]&lt;br /&gt;
* 2025-04: Daniel Kokotajlo, Scott Alexander, Thomas Larsen, Eli Lifland, Romeo Dean: [https://ai-2027.com/ AI 2027] ([https://ai-2027.com/scenario.pdf pdf])&lt;br /&gt;
** 2025-07: Video: [https://www.youtube.com/watch?v=5KVDDfAkRgc Are We 3 Years From AI Disaster? A Rigorous Forecast]&lt;br /&gt;
* 2025-04: Stanford HAI: [https://hai-production.s3.amazonaws.com/files/hai_ai_index_report_2025.pdf Artificial Intelligence Index Report 2025]&lt;br /&gt;
* 2025-04: Arvind Narayanan and Sayash Kapoor: [https://kfai-documents.s3.amazonaws.com/documents/c3cac5a2a7/AI-as-Normal-Technology---Narayanan---Kapoor.pdf AI as Normal Technology]&lt;br /&gt;
* 2025-04: Dwarkesh Patel: [https://www.dwarkesh.com/p/questions-about-ai Questions about the Future of AI]&lt;br /&gt;
* 2025-05: [https://www.bondcap.com/report/pdf/Trends_Artificial_Intelligence.pdf Trends – Artificial Intelligence]&lt;br /&gt;
* 2025-06: IdeaFoundry: [https://ideafoundry.substack.com/p/evolution-vs-extinction-the-choice Evolution vs. Extinction: The Choice is Ours] The next 18 months will decide whether AI ends us or evolves us&lt;br /&gt;
* 2025-07: [https://cfg.eu/advanced-ai-possible-futures/ Advanced AI: Possible futures] Five scenarios for how the AI-transition could unfold&lt;br /&gt;
* 2025-11: [https://android-dreams.ai/ Android Dreams]&lt;br /&gt;
* 2026-02: [https://www.citriniresearch.com/ Citrini]: [https://www.citriniresearch.com/p/2028gic The 2028 Global Intelligence Crisis: A Thought Exercise in Financial History, from the Future]&lt;br /&gt;
&lt;br /&gt;
==Insightful Analysis of Current State==&lt;br /&gt;
* 2025-11: Andy Masley: [https://andymasley.substack.com/p/the-lump-of-cognition-fallacy The lump of cognition fallacy: The extended mind as the advance of civilization]&lt;br /&gt;
* 2026-02: Eric Jang: [https://evjang.com/2026/02/04/rocks.html As Rocks May Think]&lt;br /&gt;
* 2026-02: Matt Shumer: [https://x.com/mattshumer_/status/2021256989876109403 Something Big Is Happening]&lt;br /&gt;
* 2026-02: Minh Pham: [https://x.com/buckeyevn/status/2014171253045960803?s=20 Why Most Agent Harnesses Are Not Bitter Lesson Pilled]&lt;br /&gt;
&lt;br /&gt;
=Overall=&lt;br /&gt;
* 1993: [https://en.wikipedia.org/wiki/Vernor_Vinge Vernor Vinge]: [https://edoras.sdsu.edu/~vinge/misc/singularity.html The Coming Technological Singularity: How to Survive in the Post-Human Era]&lt;br /&gt;
* 2025-03: Kevin Roose (New York Times): [https://www.nytimes.com/2025/03/14/technology/why-im-feeling-the-agi.html?unlocked_article_code=1.304.TIEy.SmNhKYO4e9c7&amp;amp;smid=url-share Powerful A.I. Is Coming. We’re Not Ready.] Three arguments for taking progress toward artificial general intelligence, or A.G.I., more seriously — whether you’re an optimist or a pessimist.&lt;br /&gt;
* 2025-03: Nicholas Carlini: [https://nicholas.carlini.com/writing/2025/thoughts-on-future-ai.html My Thoughts on the Future of &amp;quot;AI&amp;quot;]: &amp;quot;I have very wide error bars on the potential future of large language models, and I think you should too.&amp;quot;&lt;br /&gt;
* 2025-06: Sam Altman: [https://blog.samaltman.com/the-gentle-singularity The Gentle Singularity]&lt;br /&gt;
&lt;br /&gt;
==Surveys of Opinions/Predictions==&lt;br /&gt;
* 2016-06: [https://aiimpacts.org/2016-expert-survey-on-progress-in-ai/ 2016 Expert Survey on Progress in AI]&lt;br /&gt;
** 2023-03: [https://aiimpacts.org/scoring-forecasts-from-the-2016-expert-survey-on-progress-in-ai/ Scoring forecasts from the 2016 “Expert Survey on Progress in AI”]&lt;br /&gt;
* 2022-10: Forecasting Research Institute: [https://forecastingresearch.org/near-term-xpt-accuracy Assessing Near-Term Accuracy in the Existential Risk Persuasion Tournament]&lt;br /&gt;
** 2025-09: Ethan Mollick: [https://x.com/emollick/status/1962859757674344823 Progress is ahead of expectations]&lt;br /&gt;
* 2023-08: [https://wiki.aiimpacts.org/ai_timelines/predictions_of_human-level_ai_timelines/ai_timeline_surveys/2023_expert_survey_on_progress_in_ai 2023 Expert Survey on Progress in AI]&lt;br /&gt;
* 2024-01: [https://arxiv.org/abs/2401.02843 Thousands of AI Authors on the Future of AI]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.14870 Why do Experts Disagree on Existential Risk and P(doom)? A Survey of AI Experts]&lt;br /&gt;
* 2025-02: Nicholas Carlini: [https://nicholas.carlini.com/writing/2025/forecasting-ai-2025-update.html AI forecasting retrospective: you&amp;#039;re (probably) over-confident]&lt;br /&gt;
* 2025-04: Helen Toner: [https://helentoner.substack.com/p/long-timelines-to-advanced-ai-have &amp;quot;Long&amp;quot; timelines to advanced AI have gotten crazy short]&lt;br /&gt;
* 2025-05: [https://theaidigest.org/ai2025-analysis-may AI 2025 Forecasts - May Update]&lt;br /&gt;
* 2026-02: [https://www.nature.com/articles/s41598-026-39070-w Lay beliefs about the badness, likelihood, and importance of human extinction]&lt;br /&gt;
&lt;br /&gt;
==Bad Outcomes==&lt;br /&gt;
* [https://pauseai.info/pdoom List of p(doom) values]&lt;br /&gt;
* 2019-03: [https://www.alignmentforum.org/posts/HBxe6wdjxK239zajf/what-failure-looks-like What failure looks like]&lt;br /&gt;
* 2023-03: gwern: [https://gwern.net/fiction/clippy It Looks Like You’re Trying To Take Over The World]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.16946 Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development] ([https://gradual-disempowerment.ai/ web version])&lt;br /&gt;
** 2025-02: [https://thezvi.substack.com/p/the-risk-of-gradual-disempowerment The Risk of Gradual Disempowerment from AI]&lt;br /&gt;
** 2025-05: [https://www.lesswrong.com/posts/GAv4DRGyDHe2orvwB/gradual-disempowerment-concrete-research-projects Gradual Disempowerment: Concrete Research Projects]&lt;br /&gt;
* 2025-04: Daniel Kokotajlo, Scott Alexander, Thomas Larsen, Eli Lifland, Romeo Dean: [https://ai-2027.com/ AI 2027] ([https://ai-2027.com/scenario.pdf pdf])&lt;br /&gt;
* 2025-04: [https://www.forethought.org/research/ai-enabled-coups-how-a-small-group-could-use-ai-to-seize-power AI-Enabled Coups: How a Small Group Could Use AI to Seize Power]&lt;br /&gt;
* 2025-09: [https://doctrines.ai/ The three main doctrines on the future of AI]&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Dominance doctrine:&amp;#039;&amp;#039;&amp;#039; First actor to create advanced AI will attain overwhelming strategic superiority&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Extinction doctrine:&amp;#039;&amp;#039;&amp;#039; Humanity will lose control of ASI, leading to extinction or permanent disempowerment&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Replacement doctrine:&amp;#039;&amp;#039;&amp;#039; AI will automate human tasks, but without fundamentally reshaping or ending civilization&lt;br /&gt;
* 2025-09: Sean ÓhÉigeartaigh: [https://www.cambridge.org/core/journals/cambridge-prisms-extinction/article/extinction-of-the-human-species-what-could-cause-it-and-how-likely-is-it-to-occur/D8816A79BEF5A4C30A3E44FD8D768622 Extinction of the human species: What could cause it and how likely is it to occur?]&lt;br /&gt;
&lt;br /&gt;
==Intelligence Explosion==&lt;br /&gt;
* 2023-06: [https://www.openphilanthropy.org/research/what-a-compute-centric-framework-says-about-takeoff-speeds/ What a Compute-Centric Framework Says About Takeoff Speeds]&lt;br /&gt;
** [https://takeoffspeeds.com/ takeoffspeeds.com simulator]&lt;br /&gt;
* 2025-02: [https://www.forethought.org/research/three-types-of-intelligence-explosion Three Types of Intelligence Explosion]&lt;br /&gt;
* 2025-03: Future of Life Institute: [https://futureoflife.org/ai/are-we-close-to-an-intelligence-explosion/ Are we close to an intelligence explosion?] AIs are inching ever-closer to a critical threshold. Beyond this threshold lie great risks—but crossing it is not inevitable.&lt;br /&gt;
* 2025-03: Forethought: [https://www.forethought.org/research/will-ai-r-and-d-automation-cause-a-software-intelligence-explosion Will AI R&amp;amp;D Automation Cause a Software Intelligence Explosion?]&lt;br /&gt;
[[Image:Gm-1jugbYAAtq Y.jpeg|450px]]&lt;br /&gt;
* 2025-05: [https://www.thelastinvention.ai/ The Last Invention] Why Humanity’s Final Creation Changes Everything&lt;br /&gt;
* 2025-08: [https://www.forethought.org/research/how-quick-and-big-would-a-software-intelligence-explosion-be How quick and big would a software intelligence explosion be?]&lt;br /&gt;
&lt;br /&gt;
==Superintelligence==&lt;br /&gt;
* 2024-10: [http://yager-research.ca/2024/10/how-smart-will-asi-be/ How Smart will ASI be?]&lt;br /&gt;
* 2024-11: [http://yager-research.ca/2024/11/concise-argument-for-asi-risk/ Concise Argument for ASI Risk]&lt;br /&gt;
* 2025-03: [https://dynomight.net/smart/ Limits of smart]&lt;br /&gt;
* 2025-05: [https://timfduffy.substack.com/p/the-limits-of-superintelligence?manualredirect= The Limits of Superintelligence]&lt;br /&gt;
&lt;br /&gt;
==Long-range/Philosophy==&lt;br /&gt;
* 2023-03: Dan Hendrycks: [https://arxiv.org/abs/2303.16200 Natural Selection Favors AIs over Humans]&lt;br /&gt;
&lt;br /&gt;
=Psychology=&lt;br /&gt;
* 2025-01: [https://longerramblings.substack.com/p/a-defence-of-slowness-at-the-end A defence of slowness at the end of the world]&lt;br /&gt;
&lt;br /&gt;
=Positives &amp;amp; Optimism=&lt;br /&gt;
==Science &amp;amp; Technology Improvements==&lt;br /&gt;
* 2023-05: [https://www.planned-obsolescence.org/author/kelsey/ Kelsey Piper]: [https://www.planned-obsolescence.org/the-costs-of-caution/ The costs of caution]&lt;br /&gt;
* 2024-09: Sam Altman: [https://ia.samaltman.com/ The Intelligence Age]&lt;br /&gt;
* 2024-10: Dario Amodei: [https://darioamodei.com/machines-of-loving-grace Machines of Loving Grace]&lt;br /&gt;
* 2024-11: Google DeepMind: [https://www.aipolicyperspectives.com/p/a-new-golden-age-of-discovery A new golden age of discovery]&lt;br /&gt;
* 2025-03: [https://finmoorhouse.com/ Fin Moorhouse], [https://www.williammacaskill.com/ Will MacAskill]: [https://www.forethought.org/research/preparing-for-the-intelligence-explosion Preparing for the Intelligence Explosion]&lt;br /&gt;
&lt;br /&gt;
==Social==&lt;br /&gt;
* 2025-09: [https://blog.cosmos-institute.org/p/coasean-bargaining-at-scale Coasean Bargaining at Scale]: Decentralization, coordination, and co-existence with AGI&lt;br /&gt;
* 2025-10: [https://www.nber.org/system/files/chapters/c15309/c15309.pdf#page=15.23 The Coasean Singularity? Demand, Supply, and Market Design with AI Agents]&lt;br /&gt;
&lt;br /&gt;
==Post-scarcity Society==&lt;br /&gt;
* 2004: Eliezer Yudkowsky (MIRI): [https://intelligence.org/files/CEV.pdf Coherent Extrapolated Volition] and [https://www.lesswrong.com/s/d3WgHDBAPYYScp5Em/p/K4aGvLnHvYgX9pZHS Fun Theory]&lt;br /&gt;
* 2019: John Danaher: [https://www.jstor.org/stable/j.ctvn5txpc Automation and Utopia: Human Flourishing in a World Without Work]&lt;br /&gt;
&lt;br /&gt;
==The Grand Tradeoff==&lt;br /&gt;
* 2026-02: Nick Bostrom: [https://nickbostrom.com/optimal.pdf Optimal Timing for Superintelligence: Mundane Considerations for Existing People]&lt;br /&gt;
&lt;br /&gt;
=Plans=&lt;br /&gt;
* [https://www.narrowpath.co/ A Narrow Path: How to Secure our Future]&lt;br /&gt;
* Marius Hobbhahn: [https://www.lesswrong.com/posts/bb5Tnjdrptu89rcyY/what-s-the-short-timeline-plan What’s the short timeline plan?]&lt;br /&gt;
* [https://cfg.eu/building-cern-for-ai/ Building CERN for AI: An institutional blueprint]&lt;br /&gt;
* [https://arxiv.org/abs/2503.05710 AGI, Governments, and Free Societies]&lt;br /&gt;
* [https://controlai.com/ Control AI]: [https://controlai.com/dip The Direct Institutional Plan] &lt;br /&gt;
* Luke Drago and L Rudolf L: [https://lukedrago.substack.com/p/the-use-of-knowledge-in-agi-society?triedRedirect=true The use of knowledge in (AGI) society]: How to build to break the [https://lukedrago.substack.com/p/the-intelligence-curse intelligence curse]&lt;br /&gt;
* [https://www.agisocialcontract.org/ AGI Social Contract]&lt;br /&gt;
** [https://www.agisocialcontract.org/forging-a-new-agi-social-contract Forging A New AGI Social Contract]&lt;br /&gt;
* Yoshua Bengio: [https://time.com/7283507/safer-ai-development/ A Potential Path to Safer AI Development]&lt;br /&gt;
** 2025-02: [https://arxiv.org/abs/2502.15657 Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?]&lt;br /&gt;
* 2026-01: Dario Amodei: [https://www.darioamodei.com/essay/the-adolescence-of-technology The Adolescence of Technology: Confronting and Overcoming the Risks of Powerful AI]&lt;br /&gt;
* 2026-02: Ryan Greenblatt: [https://www.lesswrong.com/posts/vjAM7F8vMZS7oRrrh/how-do-we-more-safely-defer-to-ais How do we (more) safely defer to AIs?]&lt;br /&gt;
&lt;br /&gt;
==Philosophy==&lt;br /&gt;
* [https://danfaggella.com/ Dan Faggella]:&lt;br /&gt;
** 2018-07: [https://danfaggella.com/moral-singularity/ Moral Singularity – Unpredictable Values Bodes Poorly for Humanity]&lt;br /&gt;
** 2025-02: [https://danfaggella.com/bend/ There is No Pause – We Must Bend the Posthuman Trajectory]&lt;br /&gt;
* Joe Carlsmith: 2024: [https://joecarlsmith.com/2024/01/02/otherness-and-control-in-the-age-of-agi Otherness and control in the age of AGI]&lt;br /&gt;
*# [https://joecarlsmith.com/2024/01/02/gentleness-and-the-artificial-other Gentleness and the artificial Other]&lt;br /&gt;
*# [https://joecarlsmith.com/2024/01/04/deep-atheism-and-ai-risk Deep atheism and AI risk]&lt;br /&gt;
*# [https://joecarlsmith.com/2024/01/08/when-yang-goes-wrong When “yang” goes wrong]&lt;br /&gt;
*# [https://joecarlsmith.com/2024/01/09/does-ai-risk-other-the-ais Does AI risk “other” the AIs?]&lt;br /&gt;
*# [https://joecarlsmith.com/2024/01/11/an-even-deeper-atheism An even deeper atheism]&lt;br /&gt;
*# [https://joecarlsmith.com/2024/01/16/being-nicer-than-clippy Being nicer than Clippy]&lt;br /&gt;
*# [https://joecarlsmith.com/2024/01/18/on-the-abolition-of-man On the abolition of man]&lt;br /&gt;
*# [https://joecarlsmith.com/2024/03/21/on-green On green]&lt;br /&gt;
*# [https://joecarlsmith.com/2024/03/25/on-attunement On attunement]&lt;br /&gt;
*# [https://joecarlsmith.com/2024/06/18/loving-a-world-you-dont-trust Loving a world you don’t trust]&lt;br /&gt;
* Anthony Aguirre:&lt;br /&gt;
** [https://x.com/AnthonyNAguirre/status/1898023049930457468 2025-03]: [https://keepthefuturehuman.ai/ Keep The Future Human]&lt;br /&gt;
[[Image:GlchEeObwAQ88NK.jpeg|300px]]&lt;br /&gt;
* 2025-04: Scott Alexander (Astral Codex Ten): [https://www.astralcodexten.com/p/the-colors-of-her-coat The Colors Of Her Coat] (response to [https://www.theintrinsicperspective.com/p/welcome-to-the-semantic-apocalypse semantic apocalypse] and semantic satiation)&lt;br /&gt;
* 2025-05: Helen Toner: [https://www.ai-frontiers.org/articles/were-arguing-about-ai-safety-wrong We’re Arguing About AI Safety Wrong]: Dynamism vs. stasis is a clearer lens for criticizing controversial AI safety prescriptions&lt;br /&gt;
* 2025-05: Joe Carlsmith: [https://joecarlsmith.substack.com/p/the-stakes-of-ai-moral-status The stakes of AI moral status]&lt;br /&gt;
&lt;br /&gt;
==Research==&lt;br /&gt;
* 2025-05: [https://www.lesswrong.com/posts/GAv4DRGyDHe2orvwB/gradual-disempowerment-concrete-research-projects Gradual Disempowerment: Concrete Research Projects]&lt;br /&gt;
&lt;br /&gt;
==Alignment==&lt;br /&gt;
* 2023-03: Leopold Aschenbrenner: [https://www.forourposterity.com/nobodys-on-the-ball-on-agi-alignment/ Nobody’s on the ball on AGI alignment]&lt;br /&gt;
* 2024-03: [https://arxiv.org/abs/2404.10636 What are human values, and how do we align AI to them?] ([https://meaningalignment.substack.com/p/0480e023-98c0-4633-a604-990d3ac880ac blog])&lt;br /&gt;
* 2025: Joe Carlsmith: [https://joecarlsmith.substack.com/p/how-do-we-solve-the-alignment-problem How do we solve the alignment problem?] Introduction to an essay series on paths to safe, useful superintelligence&lt;br /&gt;
*# [https://joecarlsmith.substack.com/p/what-is-it-to-solve-the-alignment What is it to solve the alignment problem?] Also: to avoid it? Handle it? Solve it forever? Solve it completely? ([https://joecarlsmithaudio.buzzsprout.com/2034731/episodes/16617671-what-is-it-to-solve-the-alignment-problem audio version])&lt;br /&gt;
*# [https://joecarlsmith.substack.com/p/when-should-we-worry-about-ai-power When should we worry about AI power-seeking?] ([https://joecarlsmithaudio.buzzsprout.com/2034731/episodes/16651469-when-should-we-worry-about-ai-power-seeking audio version])&lt;br /&gt;
*# [https://joecarlsmith.substack.com/p/paths-and-waystations-in-ai-safety Paths and waystations in AI safety] ([https://joecarlsmithaudio.buzzsprout.com/2034731/episodes/16768804-paths-and-waystations-in-ai-safety audio version])&lt;br /&gt;
*# [https://joecarlsmith.substack.com/p/ai-for-ai-safety AI for AI safety] ([https://joecarlsmithaudio.buzzsprout.com/2034731/episodes/16790183-ai-for-ai-safety audio version])&lt;br /&gt;
*# [https://joecarlsmith.substack.com/p/can-we-safely-automate-alignment Can we safely automate alignment research?] ([https://joecarlsmithaudio.buzzsprout.com/2034731/episodes/17069901-can-we-safely-automate-alignment-research audio version], [https://joecarlsmith.substack.com/p/video-and-transcript-of-talk-on-automating?utm_source=post-email-title&amp;amp;publication_id=1022275&amp;amp;post_id=162375391&amp;amp;utm_campaign=email-post-title&amp;amp;isFreemail=true&amp;amp;r=5av1bk&amp;amp;triedRedirect=true&amp;amp;utm_medium=email video version])&lt;br /&gt;
*# [https://joecarlsmith.substack.com/p/giving-ais-safe-motivations?utm_source=post-email-title&amp;amp;publication_id=1022275&amp;amp;post_id=171250683&amp;amp;utm_campaign=email-post-title&amp;amp;isFreemail=true&amp;amp;r=5av1bk&amp;amp;triedRedirect=true&amp;amp;utm_medium=email Giving AIs safe motivations] ([https://joecarlsmithaudio.buzzsprout.com/2034731/episodes/17686921-giving-ais-safe-motivations audio version])&lt;br /&gt;
*# [https://joecarlsmith.com/2025/09/29/controlling-the-options-ais-can-pursue Controlling the options AIs can pursue] ([https://joecarlsmithaudio.buzzsprout.com/2034731/episodes/17909401-controlling-the-options-ais-can-pursue audio version])&lt;br /&gt;
*# [https://joecarlsmith.substack.com/p/how-human-like-do-safe-ai-motivations?utm_source=post-email-title&amp;amp;publication_id=1022275&amp;amp;post_id=178666988&amp;amp;utm_campaign=email-post-title&amp;amp;isFreemail=true&amp;amp;r=5av1bk&amp;amp;triedRedirect=true&amp;amp;utm_medium=email How human-like do safe AI motivations need to be?] ([https://joecarlsmithaudio.buzzsprout.com/2034731/episodes/18175429-how-human-like-do-safe-ai-motivations-need-to-be audio version])&lt;br /&gt;
*# [https://joecarlsmith.substack.com/p/building-ais-that-do-human-like-philosophy Building AIs that do human-like philosophy: AIs will face philosophical questions humans can&amp;#039;t answer for them] ([https://joecarlsmithaudio.buzzsprout.com/2034731/episodes/18591342-building-ais-that-do-human-like-philosophy audio version])&lt;br /&gt;
*# [https://joecarlsmith.substack.com/p/on-restraining-ai-development-for?utm_source=post-email-title&amp;amp;publication_id=1022275&amp;amp;post_id=191385185&amp;amp;utm_campaign=email-post-title&amp;amp;isFreemail=true&amp;amp;r=5av1bk&amp;amp;triedRedirect=true&amp;amp;utm_medium=email On restraining AI development for the sake of safety] ([https://joecarlsmithaudio.buzzsprout.com/2034731/episodes/18869440-on-restraining-ai-development-for-the-sake-of-safety audio version])&lt;br /&gt;
* 2025-04: Dario Amodei: [https://www.darioamodei.com/post/the-urgency-of-interpretability The Urgency of Interpretability]&lt;br /&gt;
&lt;br /&gt;
==Strategic/Technical==&lt;br /&gt;
* 2025-03: [https://resilience.baulab.info/docs/AI_Action_Plan_RFI.pdf AI Dominance Requires Interpretability and Standards for Transparency and Security]&lt;br /&gt;
* 2026-02: [https://www.gap-map.org/capabilities/?sort=bottlenecks Fundamental Development Gap Map v1.0]&lt;br /&gt;
&lt;br /&gt;
==Strategic/Policy==&lt;br /&gt;
* 2015-03: Sam Altman: [https://blog.samaltman.com/machine-intelligence-part-2 Machine intelligence, part 2]&lt;br /&gt;
* 2019-07: Amanda Askell, Miles Brundage, Gillian Hadfield: [https://arxiv.org/abs/1907.04534 The Role of Cooperation in Responsible AI Development]&lt;br /&gt;
* 2025-03: Dan Hendrycks, Eric Schmidt, Alexandr Wang: [https://www.nationalsecurity.ai/ Superintelligence Strategy]&lt;br /&gt;
** [https://www.nationalsecurity.ai/chapter/executive-summary Executive Summary]&lt;br /&gt;
** [https://www.nationalsecurity.ai/chapter/introduction Introduction]&lt;br /&gt;
** [https://www.nationalsecurity.ai/chapter/ai-is-pivotal-for-national-security AI Is Pivotal for National Security]&lt;br /&gt;
** [https://www.nationalsecurity.ai/chapter/deterrence-with-mutual-assured-ai-malfunction-maim Deterrence with Mutual Assured AI Malfunction (MAIM)]&lt;br /&gt;
** [https://www.nationalsecurity.ai/chapter/nonproliferation Nonproliferation]&lt;br /&gt;
** [https://www.nationalsecurity.ai/chapter/competitiveness Competitiveness]&lt;br /&gt;
** [https://www.nationalsecurity.ai/chapter/conclusion Conclusion]&lt;br /&gt;
** [https://www.nationalsecurity.ai/chapter/appendix Appendix FAQs]&lt;br /&gt;
* Anthony Aguirre: [https://keepthefuturehuman.ai/ Keep The Future Human] ([https://keepthefuturehuman.ai/essay/ essay])&lt;br /&gt;
** [https://www.youtube.com/watch?v=zeabrXV8zNE The 4 Rules That Could Stop AI Before It’s Too Late (video)]  (2025)&lt;br /&gt;
**# Oversight: Registration required for training &amp;gt;10&amp;lt;sup&amp;gt;25&amp;lt;/sup&amp;gt; FLOP and inference &amp;gt;10&amp;lt;sup&amp;gt;19&amp;lt;/sup&amp;gt; FLOP/s (~1,000 B200 GPUs @ $25M). Build cryptographic licensing into hardware.&lt;br /&gt;
**# Computation Limits: Ban on training models &amp;gt;10&amp;lt;sup&amp;gt;27&amp;lt;/sup&amp;gt; FLOP or inference &amp;gt;10&amp;lt;sup&amp;gt;20&amp;lt;/sup&amp;gt; FLOP/s.&lt;br /&gt;
**# Strict Liability: Hold AI companies responsible for outcomes.&lt;br /&gt;
**# Tiered Regulation: Low regulation on tool-AI, strictest regulation on AGI (general, capable, autonomous systems).&lt;br /&gt;
* 2025-04: [https://x.com/deanwball Dean W. Ball]: [https://arxiv.org/abs/2504.11501 A Framework for the Private Governance of Frontier Artificial Intelligence]&lt;br /&gt;
* 2025-04: Helen Toner: [https://helentoner.substack.com/p/nonproliferation-is-the-wrong-approach?source=queue Nonproliferation is the wrong approach to AI misuse]&lt;br /&gt;
* 2025-04: MIRI: [https://techgov.intelligence.org/research/ai-governance-to-avoid-extinction AI Governance to Avoid Extinction: The Strategic Landscape and Actionable Research Questions]&lt;br /&gt;
* 2025-05: [https://writing.antonleicht.me/p/the-new-ai-policy-frontier The New AI Policy Frontier]: Beyond the shortcomings of centralised control and alignment, a new school of thought on AI governance emerges. It still faces tricky politics.&lt;br /&gt;
* 2025-05: [https://uncpga.world/agi-uncpga-report/ AGI UNCPGA Report]: Governance of the Transition to Artificial General Intelligence (AGI): Urgent Considerations for the UN General Assembly: Report for the Council of Presidents of the United Nations General Assembly (UNCPGA)&lt;br /&gt;
* 2025-06: [https://writing.antonleicht.me/p/ai-and-jobs-politics-without-policy AI &amp;amp; Jobs: Politics without Policy] Political support mounts - for a policy platform that does not yet exist&lt;br /&gt;
* 2025-06: [https://x.com/littIeramblings Sarah Hastings-Woodhouse]: [https://drive.google.com/file/d/1mmdHBE6M2yiyL21-ctTuRLNH5xOFjqWm/view Safety Features for a Centralized AGI Project]&lt;br /&gt;
* 2025-07: [https://writing.antonleicht.me/p/a-moving-target A Moving Target] Why we might not be quite ready to comprehensively regulate AI, and why it matters&lt;br /&gt;
* 2025-07: [https://www-cdn.anthropic.com/0dc382a2086f6a054eeb17e8a531bd9625b8e6e5.pdf Anthropic: Build AI in America] ([https://www.anthropic.com/news/build-ai-in-america blog])&lt;br /&gt;
* 2025-12: [https://asi-prevention.com/ How middle powers may prevent the development of artificial superintelligence]&lt;br /&gt;
* 2026-03: [https://humanstatement.org/ The Pro-Human AI Declaration]&lt;br /&gt;
&lt;br /&gt;
==Restriction==&lt;br /&gt;
* 2024-05: OpenAI: [https://openai.com/index/reimagining-secure-infrastructure-for-advanced-ai/ Reimagining secure infrastructure for advanced AI] OpenAI calls for an evolution in infrastructure security to protect advanced AI &lt;br /&gt;
* 2025-07: MIRI: [https://arxiv.org/abs/2507.09801 Technical Requirements for Halting Dangerous AI Activities]&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
* [[AI safety]]&lt;/div&gt;</summary>
		<author><name>KevinYager</name></author>
		
	</entry>
	<entry>
		<id>http://gisaxs.com/index.php?title=AI_predictions&amp;diff=8751</id>
		<title>AI predictions</title>
		<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php?title=AI_predictions&amp;diff=8751"/>
		<updated>2026-03-23T17:29:16Z</updated>

		<summary type="html">&lt;p&gt;KevinYager: /* Alignment */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Capability Scaling=&lt;br /&gt;
* 2019-03: Rich Sutton: [https://www.cs.utexas.edu/~eunsol/courses/data/bitter_lesson.pdf The Bitter Lesson]&lt;br /&gt;
* 2020-09: Ajeya Cotra: [https://www.lesswrong.com/posts/KrJfoZzpSDpnrv9va/draft-report-on-ai-timelines Draft report on AI timelines]&lt;br /&gt;
* 2022-01: gwern: [https://gwern.net/scaling-hypothesis The Scaling Hypothesis]&lt;br /&gt;
* 2023-05: Richard Ngo: [https://www.lesswrong.com/posts/BoA3agdkAzL6HQtQP/clarifying-and-predicting-agi Clarifying and predicting AGI]&lt;br /&gt;
* 2024-06: Aidan McLaughlin: [https://yellow-apartment-148.notion.site/AI-Search-The-Bitter-er-Lesson-44c11acd27294f4495c3de778cd09c8d AI Search: The Bitter-er Lesson]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.14499 Measuring AI Ability to Complete Long Tasks]&lt;br /&gt;
** 2025-04: [https://peterwildeford.substack.com/p/forecaster-reacts-metrs-bombshell Forecaster reacts: METR&amp;#039;s bombshell paper about AI acceleration] New data supports an exponential AI curve, but lots of uncertainty remains&lt;br /&gt;
** 2025-04: AI Digest: [https://theaidigest.org/time-horizons A new Moore&amp;#039;s Law for AI agents]&lt;br /&gt;
[[Image:GmZHL8xWQAAtFlF.jpeg|450px]]&lt;br /&gt;
* 2025-04: [https://epoch.ai/blog/trends-in-ai-supercomputers Trends in AI Supercomputers] ([https://arxiv.org/abs/2504.16026 preprint])&lt;br /&gt;
* [https://ai-timeline.org/ The Road to AGI] (timeline visualization)&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.09677 The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs]&lt;br /&gt;
* 2025-09: [https://www.julian.ac/blog/2025/09/27/failing-to-understand-the-exponential-again/ Failing to Understand the Exponential, Again]&lt;br /&gt;
* 2026-02: Ryan Greenblatt: [https://www.lesswrong.com/posts/rRbDNQLfihiHbXytf/distinguish-between-inference-scaling-and-larger-tasks-use Distinguish between inference scaling and &amp;quot;larger tasks use more compute&amp;quot;]&lt;br /&gt;
* 2026-03: [https://arxiv.org/abs/2603.03992 Measuring AI R&amp;amp;D Automation] ([https://astrangeattractor.substack.com/p/measuring-ai-r-and-d-automation?triedRedirect=true blog])&lt;br /&gt;
&lt;br /&gt;
==Scaling Laws==&lt;br /&gt;
See: [[AI_understanding#Scaling_Laws|Scaling Laws]]&lt;br /&gt;
&lt;br /&gt;
==AGI Achievable==&lt;br /&gt;
* Yoshua Bengio: [https://arxiv.org/abs/2310.17688 Managing extreme AI risks amid rapid progress]&lt;br /&gt;
* Leopold Aschenbrenner: [https://situational-awareness.ai/from-gpt-4-to-agi/#Counting_the_OOMs Situational Awareness: Counting the OOMs]&lt;br /&gt;
* Richard Ngo: [https://medium.com/@richardcngo/visualizing-the-deep-learning-revolution-722098eb9c5 Visualizing the deep learning revolution]&lt;br /&gt;
* Katja Grace: [https://blog.aiimpacts.org/p/2023-ai-survey-of-2778-six-things Survey of 2,778 AI authors: six parts in pictures]&lt;br /&gt;
* Epoch AI: [https://epoch.ai/trends Machine Learning Trends]&lt;br /&gt;
* AI Digest: [https://theaidigest.org/progress-and-dangers How fast is AI improving?]&lt;br /&gt;
* 2025-06: [https://80000hours.org/agi/guide/when-will-agi-arrive/ The case for AGI by 2030]&lt;br /&gt;
&lt;br /&gt;
==AGI Definition==&lt;br /&gt;
* 2023-11: Allan Dafoe, Shane Legg, et al.: [https://arxiv.org/abs/2311.02462 Levels of AGI for Operationalizing Progress on the Path to AGI]&lt;br /&gt;
* 2024-04: Bowen Xu: [https://arxiv.org/abs/2404.10731 What is Meant by AGI? On the Definition of Artificial General Intelligence]&lt;br /&gt;
* 2025-10: Dan Hendrycks et al.: [https://www.agidefinition.ai/paper.pdf A Definition of AGI]&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.07364 On the universal definition of intelligence]&lt;br /&gt;
&lt;br /&gt;
==Recursive Self Improvement (RSI)==&lt;br /&gt;
* 2026-02: [https://80000hours.org/articles/how-ai-driven-feedback-loops-could-make-things-very-crazy-very-fast/ How AI-driven feedback loops could make things very crazy, very fast]&lt;br /&gt;
&lt;br /&gt;
==Progress Models==&lt;br /&gt;
From [http://yager-research.ca/2025/04/ai-impact-predictions/ AI Impact Predictions]:&lt;br /&gt;
&lt;br /&gt;
[[Image:AI impact models-2025 11 24.png|450px]]&lt;br /&gt;
&lt;br /&gt;
=Economic and Political=&lt;br /&gt;
* 2019-11: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3482150 The Impact of Artificial Intelligence on the Labor Market]&lt;br /&gt;
* 2020-06: [https://www.openphilanthropy.org/research/modeling-the-human-trajectory/ Modeling the Human Trajectory] (GDP)&lt;br /&gt;
* 2021-06: [https://www.openphilanthropy.org/research/report-on-whether-ai-could-drive-explosive-economic-growth/ Report on Whether AI Could Drive Explosive Economic Growth]&lt;br /&gt;
* 2023-10: Marc Andreessen: [https://a16z.com/the-techno-optimist-manifesto/ The Techno-Optimist Manifesto]&lt;br /&gt;
* 2023-12: [https://vitalik.eth.limo/general/2023/11/27/techno_optimism.html My techno-optimism]: &amp;quot;defensive acceleration&amp;quot; ([https://vitalik.eth.limo/index.html Vitalik Buterin])&lt;br /&gt;
* 2024-03: Noah Smith: [https://www.noahpinion.blog/p/plentiful-high-paying-jobs-in-the Plentiful, high-paying jobs in the age of AI: Comparative advantage is very subtle, but incredibly powerful.] ([https://x.com/liron/status/1768013030741475485 video])&lt;br /&gt;
* 2024-03: [https://doi.org/10.3386/w32255 Scenarios for the Transition to AGI] (AGI leads to wage collapse)&lt;br /&gt;
* 2024-06: [https://situational-awareness.ai/wp-content/uploads/2024/06/situationalawareness.pdf Situational Awareness] ([https://www.forourposterity.com/ Leopold Aschenbrenner]) - [https://www.lesswrong.com/posts/nP5FFYFjtY8LgWymt/quotes-from-leopold-aschenbrenner-s-situational-awareness select quotes], [https://www.youtube.com/watch?v=zdbVtZIn9IM podcast], [https://danielmiessler.com/p/podcast-summary-dwarkesh-vs-leopold-aschenbrenner text summary of podcast]&lt;br /&gt;
* 2024-06: [https://www.frbsf.org/wp-content/uploads/AI-and-Growth-Aghion-Bunel.pdf AI and Growth: Where Do We Stand?]&lt;br /&gt;
* 2024-09: OpenAI [https://cdn.openai.com/global-affairs/openai-infra-economics-10.09.24.pdf Infrastructure is Destiny: Economic Returns on US Investment in Democratic AI]&lt;br /&gt;
* 2024-12: [https://www.lesswrong.com/posts/KFFaKu27FNugCHFmh/by-default-capital-will-matter-more-than-ever-after-agi By default, capital will matter more than ever after AGI] (L Rudolf L)&lt;br /&gt;
* 2025-01: [https://lukedrago.substack.com/p/the-intelligence-curse The Intelligence Curse]: With AGI, powerful actors will lose their incentives to invest in people&lt;br /&gt;
** Updated 2025-04: [https://intelligence-curse.ai/ The Intelligence Curse] (Luke Drago and Rudolf Laine)&lt;br /&gt;
*** [https://intelligence-curse.ai/pyramid/ Pyramid Replacement]&lt;br /&gt;
*** [https://intelligence-curse.ai/capital/ Capital, AGI, and Human Ambition]&lt;br /&gt;
*** [https://intelligence-curse.ai/defining/ Defining the Intelligence Curse]&lt;br /&gt;
*** [https://intelligence-curse.ai/shaping/ Shaping the Social Contract]&lt;br /&gt;
*** [https://intelligence-curse.ai/breaking/ Breaking the Intelligence Curse]&lt;br /&gt;
*** [https://intelligence-curse.ai/history/ History is Yours to Write]&lt;br /&gt;
* 2025-01: Microsoft: [https://blogs.microsoft.com/on-the-issues/2025/01/03/the-golden-opportunity-for-american-ai/ The Golden Opportunity for American AI]&lt;br /&gt;
* 2025-01: [https://www.maximum-progress.com/p/agi-will-not-make-labor-worthless AGI Will Not Make Labor Worthless]&lt;br /&gt;
* 2025-01: [https://cdn.openai.com/global-affairs/ai-in-america-oai-economic-blueprint-20250113.pdf AI in America: OpenAI&amp;#039;s Economic Blueprint] ([https://openai.com/global-affairs/openais-economic-blueprint/ blog])&lt;br /&gt;
* 2025-01: [https://inferencemagazine.substack.com/p/how-much-economic-growth-from-ai How much economic growth from AI should we expect, how soon?]&lt;br /&gt;
* 2025-02: Morgan Stanley: [https://advisor.morganstanley.com/john.howard/documents/field/j/jo/john-howard/The_Humanoid_100_-_Mapping_the_Humanoid_Robot_Value_Chain.pdf The Humanoid 100: Mapping the Humanoid Robot Value Chain]&lt;br /&gt;
* 2025-02: [https://www.anthropic.com/news/the-anthropic-economic-index The Anthropic Economic Index]: [https://assets.anthropic.com/m/2e23255f1e84ca97/original/Economic_Tasks_AI_Paper.pdf Which Economic Tasks are Performed with AI? Evidence from Millions of Claude Conversations]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.11264 Strategic Wealth Accumulation Under Transformative AI Expectations]&lt;br /&gt;
* 2025-02: Tyler Cowen: [https://marginalrevolution.com/marginalrevolution/2025/02/why-i-think-ai-take-off-is-relatively-slow.html Why I think AI take-off is relatively slow]&lt;br /&gt;
* 2025-03: Epoch AI: [https://epoch.ai/gradient-updates/most-ai-value-will-come-from-broad-automation-not-from-r-d Most AI value will come from broad automation, not from R&amp;amp;D]&lt;br /&gt;
** The primary economic impact of AI will be its ability to broadly automate labor&lt;br /&gt;
** Automating AI R&amp;amp;D alone likely won’t dramatically accelerate AI progress&lt;br /&gt;
** Fully automating R&amp;amp;D requires a very broad set of abilities&lt;br /&gt;
** AI takeoff will likely be diffuse and salient&lt;br /&gt;
* 2025-03: [https://www.anthropic.com/news/anthropic-economic-index-insights-from-claude-sonnet-3-7 Anthropic Economic Index: Insights from Claude 3.7 Sonnet]&lt;br /&gt;
* 2025-04: [https://inferencemagazine.substack.com/p/will-there-be-extreme-inequality Will there be extreme inequality from AI?]&lt;br /&gt;
* 2025-04: [https://www.anthropic.com/research/impact-software-development Anthropic Economic Index: AI’s Impact on Software Development]&lt;br /&gt;
* 2025-05: [https://www.theguardian.com/books/2025/may/04/the-big-idea-can-we-stop-ai-making-humans-obsolete Better at everything: how AI could make human beings irrelevant]&lt;br /&gt;
* 2025-05: Forethought: [https://www.forethought.org/research/the-industrial-explosion The Industrial Explosion]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.20273 Ten Principles of AI Agent Economics]&lt;br /&gt;
* 2025-07: [https://substack.com/home/post/p-167879696 What Economists Get Wrong about AI] They ignore innovation effects, use outdated capability assumptions, and miss the robotics revolution&lt;br /&gt;
* 2025-07: [https://www.nber.org/books-and-chapters/economics-transformative-ai/we-wont-be-missed-work-and-growth-era-agi We Won&amp;#039;t Be Missed: Work and Growth in the Era of AGI]&lt;br /&gt;
* 2025-07: [https://www.nber.org/papers/w34034 The Economics of Bicycles for the Mind]&lt;br /&gt;
* 2025-09: [https://conference.nber.org/conf_papers/f227491.pdf Genius on Demand: The Value of Transformative Artificial Intelligence]&lt;br /&gt;
* 2025-10: [https://peterwildeford.substack.com/p/ai-is-probably-not-a-bubble AI is probably not a bubble: AI companies have revenue, demand, and paths to immense value]&lt;br /&gt;
* 2025-11: [https://windowsontheory.org/2025/11/04/thoughts-by-a-non-economist-on-ai-and-economics/ Thoughts by a non-economist on AI and economics]&lt;br /&gt;
* 2025-11: [https://www.nber.org/papers/w34444 Artificial Intelligence, Competition, and Welfare]&lt;br /&gt;
* 2025-11: [https://www.anthropic.com/research/estimating-productivity-gains Estimating AI productivity gains from Claude conversations] (Anthropic)&lt;br /&gt;
* 2025-12: [https://benjamintodd.substack.com/p/how-ai-driven-feedback-loops-could How AI-driven feedback loops could make things very crazy, very fast]&lt;br /&gt;
* 2025-12: [https://philiptrammell.com/static/Existential_Risk_and_Growth.pdf Existential Risk and Growth] (Philip Trammell and Leopold Aschenbrenner)&lt;br /&gt;
* 2026-01: [https://www.anthropic.com/research/anthropic-economic-index-january-2026-report Anthropic Economic Index: new building blocks for understanding AI use]&lt;br /&gt;
* 2026-01: [https://www.anthropic.com/research/economic-index-primitives Anthropic Economic Index report: economic primitives]&lt;br /&gt;
* 2026-02: Nate Silver: [https://www.natesilver.net/p/the-singularity-wont-be-gentle The singularity won&amp;#039;t be gentle: If AI is even half as transformational as Silicon Valley assumes, politics will never be the same again]&lt;br /&gt;
&lt;br /&gt;
==Job Loss==&lt;br /&gt;
* 2023-03: [https://arxiv.org/pdf/2303.10130 GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models]&lt;br /&gt;
** 2023-03: [https://www.livemint.com/news/world/these-jobs-are-most-at-risk-due-to-chatgpt-as-per-openai-study-11679358453267.html These jobs are most at risk due to ChatGPT, as per OpenAI study]&lt;br /&gt;
* 2023-08: [https://dx.doi.org/10.2139/ssrn.4527336 The Short-Term Effects of Generative Artificial Intelligence on Employment: Evidence from an Online Labor Market]&lt;br /&gt;
** [https://x.com/jburnmurdoch/status/1722938749519077688 Freelancer sector shrinking]&lt;br /&gt;
[[Image:F-kVQuvWkAAemkr.png|400px]]&lt;br /&gt;
* 2023-09: [https://global-uploads.webflow.com/64d5f73a7fc5e8a240310c4d/650a128a34386a1206b6506c_FINAL%20Briefing%20-%20Adoption%20of%20Automation%20and%20AI%20in%20the%20UK.pdf What drives UK firms to adopt AI and robotics, and what are the consequences for jobs?]&lt;br /&gt;
** [https://www.digitalinformationworld.com/2023/09/78-of-companies-say-ai-created-more-jobs.html 78% of Companies Say AI Created More Jobs]&lt;br /&gt;
* 2023-11: [https://theaipi.org/ai-interactive-map/ New Analysis Shows Over 20% of US Jobs Significantly Exposed to AI Automation In the Near Future]&lt;br /&gt;
* 2024-01: [https://techcrunch.com/2024/01/09/duolingo-cut-10-of-its-contractor-workforce-as-the-company-embraces-ai/ Duolingo cuts 10% of its contractor workforce as the company embraces AI]&lt;br /&gt;
* 2024-02: [https://www.pwc.com/gx/en/issues/c-suite-insights/the-leadership-agenda/gen-ai-is-a-tool-for-growth-not-just-efficiency.html#:~:text=One%20out%20of%20every%20four%20of%20the%204%2C702,to%20accomplish%20the%20same%20tasks%20with%20fewer%20workers Gen AI is a tool for growth, not just efficiency: Tech CEOs are investing to build their workforce and capitalise on new opportunities from generative AI. That’s a sharp contrast to how their peers view it.]&lt;br /&gt;
* 2024-04: [https://www.nytimes.com/2024/04/10/business/investment-banking-jobs-artificial-intelligence.html AI is Poised to Replace the Entry-Level Grunt Work of a Wall Street Career]&lt;br /&gt;
* 2024-07: [https://www.wired.com/story/ai-is-already-taking-jobs-in-the-video-game-industry/ AI Is Already Taking Jobs in the Video Game Industry]: A WIRED investigation finds that major players like Activision Blizzard, which recently laid off scores of workers, are using generative AI for game development&lt;br /&gt;
* 2024-08: [https://www.bbc.com/news/articles/c80e1gp9m9zo Klarna: AI lets us cut thousands of jobs - but pay more]&lt;br /&gt;
* 2025-01: [https://scholarspace.manoa.hawaii.edu/server/api/core/bitstreams/4f39375d-59c2-4c4a-b394-f3eed7858c80/content AI and Freelancers: Has the Inflection Point Arrived?]&lt;br /&gt;
* 2025-01: [https://www.aporiamagazine.com/p/yes-youre-going-to-be-replaced Yes, you&amp;#039;re going to be replaced: So much cope about AI]&lt;br /&gt;
* 2025-03: [https://commonplace.org/2025/03/20/will-ai-automate-away-your-job/ Will AI Automate Away Your Job? The time-horizon model explains the future of the technology]&lt;br /&gt;
* 2025-05: [https://www.forbes.com/sites/jackkelly/2025/05/04/its-time-to-get-concerned-klarna-ups-duolingo-cisco-and-many-other-companies-are-replacing-workers-with-ai/ It’s Time To Get Concerned, Klarna, UPS, Duolingo, Cisco, And Many Other Companies Are Replacing Workers With AI]&lt;br /&gt;
* 2025-05: [https://time.com/7289692/when-ai-replaces-workers/ What Happens When AI Replaces Workers?]&lt;br /&gt;
* 2025-05: [https://www.oxfordeconomics.com/resource/educated-but-unemployed-a-rising-reality-for-us-college-grads/ Educated but unemployed, a rising reality for US college grads] Structural shifts in tech hiring and the growing impact of AI are driving higher unemployment among recent college graduates&lt;br /&gt;
* 2025-05: NY Times: [https://www.nytimes.com/2025/05/30/technology/ai-jobs-college-graduates.html?unlocked_article_code=1.LE8.LlC6.eT5XcpA9hxC2&amp;amp;smid=url-share For Some Recent Graduates, the A.I. Job Apocalypse May Already Be Here] The unemployment rate for recent college graduates has jumped as companies try to replace entry-level workers with artificial intelligence&lt;br /&gt;
* 2025-06: [https://80000hours.org/agi/guide/skills-ai-makes-valuable/ How not to lose your job to AI] The skills AI will make more valuable (and how to learn them)&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.06576 Future of Work with AI Agents: Auditing Automation and Augmentation Potential across the U.S. Workforce]&lt;br /&gt;
[[Image:0dab4c86-882d-4095-9d12-d19684ed5184 675x680.png|300px]]&lt;br /&gt;
* 2025-07: Harvard Business Review: [https://hbr.org/2025/06/what-gets-measured-ai-will-automate What Gets Measured, AI Will Automate]&lt;br /&gt;
* 2025-08: [https://digitaleconomy.stanford.edu/publications/canaries-in-the-coal-mine/ Canaries in the Coal Mine? Six Facts about the Recent Employment Effects of Artificial Intelligence]&lt;br /&gt;
* 2025-10: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5560401 Performance or Principle: Resistance to Artificial Intelligence in the U.S. Labor Market]&lt;br /&gt;
* 2025-10: [https://www.siliconcontinent.com/p/the-ai-becker-problem The AI Becker problem: Who will train the next generation?]&lt;br /&gt;
* 2026-01: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6134506 AI, Automation, and Expertise]&lt;br /&gt;
* 2026-02: [https://arachnemag.substack.com/p/the-jevons-paradox-for-intelligence The Jevons Paradox for Intelligence: Fears of AI-induced job loss could not be more wrong]&lt;br /&gt;
&lt;br /&gt;
==Productivity Impact==&lt;br /&gt;
* 2025-05: [https://www.nber.org/papers/w33777 Large Language Models, Small Labor Market Effects]&lt;br /&gt;
** Significant uptake, but very little economic impact so far&lt;br /&gt;
* 2026-02: [https://www.ft.com/content/4b51d0b4-bbfe-4f05-b50a-1d485d419dc5 The AI productivity take-off is finally visible] ([https://x.com/erikbryn/status/2023075588974735869?s=20 Erik Brynjolfsson])&lt;br /&gt;
** Businesses are finally beginning to reap some of AI&amp;#039;s benefits.&lt;br /&gt;
* 2026-02: New York Times: [https://www.nytimes.com/2026/02/18/opinion/ai-software.html The A.I. Disruption We’ve Been Waiting for Has Arrived]&lt;br /&gt;
&lt;br /&gt;
==National Security==&lt;br /&gt;
* 2025-04: Jeremie Harris and Edouard Harris: [https://superintelligence.gladstone.ai/ America’s Superintelligence Project]&lt;br /&gt;
&lt;br /&gt;
==AI Manhattan Project==&lt;br /&gt;
* 2024-06: [https://situational-awareness.ai/wp-content/uploads/2024/06/situationalawareness.pdf Situational Awareness] ([https://www.forourposterity.com/ Leopold Aschenbrenner]) - [https://www.lesswrong.com/posts/nP5FFYFjtY8LgWymt/quotes-from-leopold-aschenbrenner-s-situational-awareness select quotes], [https://www.youtube.com/watch?v=zdbVtZIn9IM podcast], [https://danielmiessler.com/p/podcast-summary-dwarkesh-vs-leopold-aschenbrenner text summary of podcast]&lt;br /&gt;
* 2024-10: [https://thezvi.substack.com/p/ai-88-thanks-for-the-memos?open=false#%C2%A7thanks-for-the-memos-introduction-and-competitiveness White House Memo calls for action on AI]&lt;br /&gt;
* 2024-11: [https://www.uscc.gov/annual-report/2024-annual-report-congress 2024 Annual Report to Congress]: [https://www.reuters.com/technology/artificial-intelligence/us-government-commission-pushes-manhattan-project-style-ai-initiative-2024-11-19/ calls] for &amp;quot;Manhattan Project-style&amp;quot; effort&lt;br /&gt;
* 2025-05-29: [https://x.com/ENERGY/status/1928085878561272223 DoE Tweet]: &amp;quot;AI is the next Manhattan Project, and THE UNITED STATES WILL WIN. 🇺🇸&amp;quot;&lt;br /&gt;
* 2025-07: [https://epoch.ai/gradient-updates/how-big-could-an-ai-manhattan-project-get How big could an “AI Manhattan Project” get?]&lt;br /&gt;
&lt;br /&gt;
=Near-term=&lt;br /&gt;
* 2021-08: Daniel Kokotajlo: [https://www.lesswrong.com/posts/6Xgy6CAf2jqHhynHL/what-2026-looks-like What 2026 looks like]&lt;br /&gt;
* 2025-02: Sam Altman: [https://blog.samaltman.com/three-observations Three Observations]&lt;br /&gt;
*# The intelligence of an AI model roughly equals the log of the resources used to train and run it.&lt;br /&gt;
*# The cost to use a given level of AI falls about 10x every 12 months, and lower prices lead to much more use.&lt;br /&gt;
*# The socioeconomic value of linearly increasing intelligence is super-exponential in nature.&lt;br /&gt;
* 2025-03: [https://www.pathwaysai.org/p/glimpses-of-ai-progess Glimpses of AI Progress: Mental models for fast times]&lt;br /&gt;
* 2025-03: [https://www.nature.com/articles/s41598-025-92190-7 Navigating artificial general intelligence development: societal, technological, ethical, and brain-inspired pathways]&lt;br /&gt;
* 2025-04: Daniel Kokotajlo, Scott Alexander, Thomas Larsen, Eli Lifland, Romeo Dean: [https://ai-2027.com/ AI 2027] ([https://ai-2027.com/scenario.pdf pdf])&lt;br /&gt;
** 2025-07: Video: [https://www.youtube.com/watch?v=5KVDDfAkRgc Are We 3 Years From AI Disaster? A Rigorous Forecast]&lt;br /&gt;
* 2025-04: Stanford HAI: [https://hai-production.s3.amazonaws.com/files/hai_ai_index_report_2025.pdf Artificial Intelligence Index Report 2025]&lt;br /&gt;
* 2025-04: Arvind Narayanan and Sayash Kapoor: [https://kfai-documents.s3.amazonaws.com/documents/c3cac5a2a7/AI-as-Normal-Technology---Narayanan---Kapoor.pdf AI as Normal Technology]&lt;br /&gt;
* 2025-04: Dwarkesh Patel: [https://www.dwarkesh.com/p/questions-about-ai Questions about the Future of AI]&lt;br /&gt;
* 2025-05: [https://www.bondcap.com/report/pdf/Trends_Artificial_Intelligence.pdf Trends – Artificial Intelligence]&lt;br /&gt;
* 2025-06: IdeaFoundry: [https://ideafoundry.substack.com/p/evolution-vs-extinction-the-choice Evolution vs. Extinction: The Choice is Ours] The next 18 months will decide whether AI ends us or evolves us&lt;br /&gt;
* 2025-07: [https://cfg.eu/advanced-ai-possible-futures/ Advanced AI: Possible futures] Five scenarios for how the AI-transition could unfold&lt;br /&gt;
* 2025-11: [https://android-dreams.ai/ Android Dreams]&lt;br /&gt;
* 2026-02: [https://www.citriniresearch.com/ Citrini]: [https://www.citriniresearch.com/p/2028gic The 2028 Global Intelligence Crisis: A Thought Exercise in Financial History, from the Future]&lt;br /&gt;
&lt;br /&gt;
==Insightful Analysis of Current State==&lt;br /&gt;
* 2025-11: Andy Masley: [https://andymasley.substack.com/p/the-lump-of-cognition-fallacy The lump of cognition fallacy: The extended mind as the advance of civilization]&lt;br /&gt;
* 2026-02: Eric Jang: [https://evjang.com/2026/02/04/rocks.html As Rocks May Think]&lt;br /&gt;
* 2026-02: Matt Shumer: [https://x.com/mattshumer_/status/2021256989876109403 Something Big Is Happening]&lt;br /&gt;
* 2026-02: Minh Pham: [https://x.com/buckeyevn/status/2014171253045960803?s=20 Why Most Agent Harnesses Are Not Bitter Lesson Pilled]&lt;br /&gt;
&lt;br /&gt;
=Overall=&lt;br /&gt;
* 1993: [https://en.wikipedia.org/wiki/Vernor_Vinge Vernor Vinge]: [https://edoras.sdsu.edu/~vinge/misc/singularity.html The Coming Technological Singularity: How to Survive in the Post-Human Era]&lt;br /&gt;
* 2025-03: Kevin Roose (New York Times): [https://www.nytimes.com/2025/03/14/technology/why-im-feeling-the-agi.html?unlocked_article_code=1.304.TIEy.SmNhKYO4e9c7&amp;amp;smid=url-share Powerful A.I. Is Coming. We’re Not Ready.] Three arguments for taking progress toward artificial general intelligence, or A.G.I., more seriously — whether you’re an optimist or a pessimist.&lt;br /&gt;
* 2025-03: Nicholas Carlini: [https://nicholas.carlini.com/writing/2025/thoughts-on-future-ai.html My Thoughts on the Future of &amp;quot;AI&amp;quot;]: &amp;quot;I have very wide error bars on the potential future of large language models, and I think you should too.&amp;quot;&lt;br /&gt;
* 2025-06: Sam Altman: [https://blog.samaltman.com/the-gentle-singularity The Gentle Singularity]&lt;br /&gt;
&lt;br /&gt;
==Surveys of Opinions/Predictions==&lt;br /&gt;
* 2016-06: [https://aiimpacts.org/2016-expert-survey-on-progress-in-ai/ 2016 Expert Survey on Progress in AI]&lt;br /&gt;
** 2023-03: [https://aiimpacts.org/scoring-forecasts-from-the-2016-expert-survey-on-progress-in-ai/ Scoring forecasts from the 2016 “Expert Survey on Progress in AI”]&lt;br /&gt;
* 2022-10: Forecasting Research Institute: [https://forecastingresearch.org/near-term-xpt-accuracy Assessing Near-Term Accuracy in the Existential Risk Persuasion Tournament]&lt;br /&gt;
** 2025-09: Ethan Mollick: [https://x.com/emollick/status/1962859757674344823 Progress is ahead of expectations]&lt;br /&gt;
* 2023-08: [https://wiki.aiimpacts.org/ai_timelines/predictions_of_human-level_ai_timelines/ai_timeline_surveys/2023_expert_survey_on_progress_in_ai 2023 Expert Survey on Progress in AI]&lt;br /&gt;
* 2024-01: [https://arxiv.org/abs/2401.02843 Thousands of AI Authors on the Future of AI]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.14870 Why do Experts Disagree on Existential Risk and P(doom)? A Survey of AI Experts]&lt;br /&gt;
* 2025-02: Nicholas Carlini: [https://nicholas.carlini.com/writing/2025/forecasting-ai-2025-update.html AI forecasting retrospective: you&amp;#039;re (probably) over-confident]&lt;br /&gt;
* 2025-04: Helen Toner: [https://helentoner.substack.com/p/long-timelines-to-advanced-ai-have &amp;quot;Long&amp;quot; timelines to advanced AI have gotten crazy short]&lt;br /&gt;
* 2025-05: [https://theaidigest.org/ai2025-analysis-may AI 2025 Forecasts - May Update]&lt;br /&gt;
* 2026-02: [https://www.nature.com/articles/s41598-026-39070-w Lay beliefs about the badness, likelihood, and importance of human extinction]&lt;br /&gt;
&lt;br /&gt;
==Bad Outcomes==&lt;br /&gt;
* [https://pauseai.info/pdoom List of p(doom) values]&lt;br /&gt;
* 2019-03: [https://www.alignmentforum.org/posts/HBxe6wdjxK239zajf/what-failure-looks-like What failure looks like]&lt;br /&gt;
* 2023-03: gwern: [https://gwern.net/fiction/clippy It Looks Like You’re Trying To Take Over The World]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.16946 Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development] ([https://gradual-disempowerment.ai/ web version])&lt;br /&gt;
** 2025-02: [https://thezvi.substack.com/p/the-risk-of-gradual-disempowerment The Risk of Gradual Disempowerment from AI]&lt;br /&gt;
** 2025-05: [https://www.lesswrong.com/posts/GAv4DRGyDHe2orvwB/gradual-disempowerment-concrete-research-projects Gradual Disempowerment: Concrete Research Projects]&lt;br /&gt;
* 2025-04: Daniel Kokotajlo, Scott Alexander, Thomas Larsen, Eli Lifland, Romeo Dean: [https://ai-2027.com/ AI 2027] ([https://ai-2027.com/scenario.pdf pdf])&lt;br /&gt;
* 2025-04: [https://www.forethought.org/research/ai-enabled-coups-how-a-small-group-could-use-ai-to-seize-power AI-Enabled Coups: How a Small Group Could Use AI to Seize Power]&lt;br /&gt;
* 2025-09: [https://doctrines.ai/ The three main doctrines on the future of AI]&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Dominance doctrine:&amp;#039;&amp;#039;&amp;#039; First actor to create advanced AI will attain overwhelming strategic superiority&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Extinction doctrine:&amp;#039;&amp;#039;&amp;#039; Humanity will lose control of ASI, leading to extinction or permanent disempowerment&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Replacement doctrine:&amp;#039;&amp;#039;&amp;#039; AI will automate human tasks, but without fundamentally reshaping or ending civilization&lt;br /&gt;
* 2025-09: Sean ÓhÉigeartaigh: [https://www.cambridge.org/core/journals/cambridge-prisms-extinction/article/extinction-of-the-human-species-what-could-cause-it-and-how-likely-is-it-to-occur/D8816A79BEF5A4C30A3E44FD8D768622 Extinction of the human species: What could cause it and how likely is it to occur?]&lt;br /&gt;
&lt;br /&gt;
==Intelligence Explosion==&lt;br /&gt;
* 2023-06: [https://www.openphilanthropy.org/research/what-a-compute-centric-framework-says-about-takeoff-speeds/ What a Compute-Centric Framework Says About Takeoff Speeds]&lt;br /&gt;
** [https://takeoffspeeds.com/ takeoffspeeds.com simulator]&lt;br /&gt;
* 2025-02: [https://www.forethought.org/research/three-types-of-intelligence-explosion Three Types of Intelligence Explosion]&lt;br /&gt;
* 2025-03: Future of Life Institute: [https://futureoflife.org/ai/are-we-close-to-an-intelligence-explosion/ Are we close to an intelligence explosion?] AIs are inching ever-closer to a critical threshold. Beyond this threshold lie great risks—but crossing it is not inevitable.&lt;br /&gt;
* 2025-03: Forethought: [https://www.forethought.org/research/will-ai-r-and-d-automation-cause-a-software-intelligence-explosion Will AI R&amp;amp;D Automation Cause a Software Intelligence Explosion?]&lt;br /&gt;
[[Image:Gm-1jugbYAAtq Y.jpeg|450px]]&lt;br /&gt;
* 2025-05: [https://www.thelastinvention.ai/ The Last Invention] Why Humanity’s Final Creation Changes Everything&lt;br /&gt;
* 2025-08: [https://www.forethought.org/research/how-quick-and-big-would-a-software-intelligence-explosion-be How quick and big would a software intelligence explosion be?]&lt;br /&gt;
&lt;br /&gt;
==Superintelligence==&lt;br /&gt;
* 2024-10: [http://yager-research.ca/2024/10/how-smart-will-asi-be/ How Smart will ASI be?]&lt;br /&gt;
* 2024-11: [http://yager-research.ca/2024/11/concise-argument-for-asi-risk/ Concise Argument for ASI Risk]&lt;br /&gt;
* 2025-03: [https://dynomight.net/smart/ Limits of smart]&lt;br /&gt;
* 2025-05: [https://timfduffy.substack.com/p/the-limits-of-superintelligence?manualredirect= The Limits of Superintelligence]&lt;br /&gt;
&lt;br /&gt;
==Long-range/Philosophy==&lt;br /&gt;
* 2023-03: Dan Hendrycks: [https://arxiv.org/abs/2303.16200 Natural Selection Favors AIs over Humans]&lt;br /&gt;
&lt;br /&gt;
=Psychology=&lt;br /&gt;
* 2025-01: [https://longerramblings.substack.com/p/a-defence-of-slowness-at-the-end A defence of slowness at the end of the world]&lt;br /&gt;
&lt;br /&gt;
=Positives &amp;amp; Optimism=&lt;br /&gt;
==Science &amp;amp; Technology Improvements==&lt;br /&gt;
* 2023-05: [https://www.planned-obsolescence.org/author/kelsey/ Kelsey Piper]: [https://www.planned-obsolescence.org/the-costs-of-caution/ The costs of caution]&lt;br /&gt;
* 2024-09: Sam Altman: [https://ia.samaltman.com/ The Intelligence Age]&lt;br /&gt;
* 2024-10: Dario Amodei: [https://darioamodei.com/machines-of-loving-grace Machines of Loving Grace]&lt;br /&gt;
* 2024-11: Google DeepMind: [https://www.aipolicyperspectives.com/p/a-new-golden-age-of-discovery A new golden age of discovery]&lt;br /&gt;
* 2025-03: [https://finmoorhouse.com/ Fin Moorhouse], [https://www.williammacaskill.com/ Will MacAskill]: [https://www.forethought.org/research/preparing-for-the-intelligence-explosion Preparing for the Intelligence Explosion]&lt;br /&gt;
&lt;br /&gt;
==Social==&lt;br /&gt;
* 2025-09: [https://blog.cosmos-institute.org/p/coasean-bargaining-at-scale Coasean Bargaining at Scale]: Decentralization, coordination, and co-existence with AGI&lt;br /&gt;
* 2025-10: [https://www.nber.org/system/files/chapters/c15309/c15309.pdf#page=15.23 The Coasean Singularity? Demand, Supply, and Market Design with AI Agents]&lt;br /&gt;
&lt;br /&gt;
==Post-scarcity Society==&lt;br /&gt;
* 2004: Eliezer Yudkowsky (MIRI): [https://intelligence.org/files/CEV.pdf Coherent Extrapolated Volition] and [https://www.lesswrong.com/s/d3WgHDBAPYYScp5Em/p/K4aGvLnHvYgX9pZHS Fun Theory]&lt;br /&gt;
* 2019: John Danaher: [https://www.jstor.org/stable/j.ctvn5txpc Automation and Utopia: Human Flourishing in a World Without Work]&lt;br /&gt;
&lt;br /&gt;
==The Grand Tradeoff==&lt;br /&gt;
* 2026-02: Nick Bostrom: [https://nickbostrom.com/optimal.pdf Optimal Timing for Superintelligence: Mundane Considerations for Existing People]&lt;br /&gt;
&lt;br /&gt;
=Plans=&lt;br /&gt;
* [https://www.narrowpath.co/ A Narrow Path: How to Secure our Future]&lt;br /&gt;
* Marius Hobbhahn: [https://www.lesswrong.com/posts/bb5Tnjdrptu89rcyY/what-s-the-short-timeline-plan What’s the short timeline plan?]&lt;br /&gt;
* [https://cfg.eu/building-cern-for-ai/ Building CERN for AI: An institutional blueprint]&lt;br /&gt;
* [https://arxiv.org/abs/2503.05710 AGI, Governments, and Free Societies]&lt;br /&gt;
* [https://controlai.com/ Control AI]: [https://controlai.com/dip The Direct Institutional Plan] &lt;br /&gt;
* Luke Drago and L Rudolf L: [https://lukedrago.substack.com/p/the-use-of-knowledge-in-agi-society?triedRedirect=true The use of knowledge in (AGI) society]: How to build to break the [https://lukedrago.substack.com/p/the-intelligence-curse intelligence curse]&lt;br /&gt;
* [https://www.agisocialcontract.org/ AGI Social Contract]&lt;br /&gt;
** [https://www.agisocialcontract.org/forging-a-new-agi-social-contract Forging A New AGI Social Contract]&lt;br /&gt;
* Yoshua Bengio: [https://time.com/7283507/safer-ai-development/ A Potential Path to Safer AI Development]&lt;br /&gt;
** 2025-02: [https://arxiv.org/abs/2502.15657 Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?]&lt;br /&gt;
* 2026-01: Dario Amodei: [https://www.darioamodei.com/essay/the-adolescence-of-technology The Adolescence of Technology: Confronting and Overcoming the Risks of Powerful AI]&lt;br /&gt;
* 2026-02: Ryan Greenblatt: [https://www.lesswrong.com/posts/vjAM7F8vMZS7oRrrh/how-do-we-more-safely-defer-to-ais How do we (more) safely defer to AIs?]&lt;br /&gt;
&lt;br /&gt;
==Philosophy==&lt;br /&gt;
* [https://danfaggella.com/ Dan Faggella]:&lt;br /&gt;
** 2018-07: [https://danfaggella.com/moral-singularity/ Moral Singularity – Unpredictable Values Bodes Poorly for Humanity]&lt;br /&gt;
** 2025-02: [https://danfaggella.com/bend/ There is No Pause – We Must Bend the Posthuman Trajectory]&lt;br /&gt;
* Joe Carlsmith: 2024: [https://joecarlsmith.com/2024/01/02/otherness-and-control-in-the-age-of-agi Otherness and control in the age of AGI]&lt;br /&gt;
*# [https://joecarlsmith.com/2024/01/02/gentleness-and-the-artificial-other Gentleness and the artificial Other]&lt;br /&gt;
*# [https://joecarlsmith.com/2024/01/04/deep-atheism-and-ai-risk Deep atheism and AI risk]&lt;br /&gt;
*# [https://joecarlsmith.com/2024/01/08/when-yang-goes-wrong When “yang” goes wrong]&lt;br /&gt;
*# [https://joecarlsmith.com/2024/01/09/does-ai-risk-other-the-ais Does AI risk “other” the AIs?]&lt;br /&gt;
*# [https://joecarlsmith.com/2024/01/11/an-even-deeper-atheism An even deeper atheism]&lt;br /&gt;
*# [https://joecarlsmith.com/2024/01/16/being-nicer-than-clippy Being nicer than Clippy]&lt;br /&gt;
*# [https://joecarlsmith.com/2024/01/18/on-the-abolition-of-man On the abolition of man]&lt;br /&gt;
*# [https://joecarlsmith.com/2024/03/21/on-green On green]&lt;br /&gt;
*# [https://joecarlsmith.com/2024/03/25/on-attunement On attunement]&lt;br /&gt;
*# [https://joecarlsmith.com/2024/06/18/loving-a-world-you-dont-trust Loving a world you don’t trust]&lt;br /&gt;
* Anthony Aguirre:&lt;br /&gt;
** [https://x.com/AnthonyNAguirre/status/1898023049930457468 2025-03]: [https://keepthefuturehuman.ai/ Keep The Future Human]&lt;br /&gt;
[[Image:GlchEeObwAQ88NK.jpeg|300px]]&lt;br /&gt;
* 2025-04: Scott Alexander (Astral Codex Ten): [https://www.astralcodexten.com/p/the-colors-of-her-coat The Colors Of Her Coat] (response to [https://www.theintrinsicperspective.com/p/welcome-to-the-semantic-apocalypse semantic apocalypse] and semantic satiation)&lt;br /&gt;
* 2025-05: Helen Toner: [https://www.ai-frontiers.org/articles/were-arguing-about-ai-safety-wrong We’re Arguing About AI Safety Wrong]: Dynamism vs. stasis is a clearer lens for criticizing controversial AI safety prescriptions&lt;br /&gt;
* 2025-05: Joe Carlsmith: [https://joecarlsmith.substack.com/p/the-stakes-of-ai-moral-status The stakes of AI moral status]&lt;br /&gt;
&lt;br /&gt;
==Research==&lt;br /&gt;
* 2025-05: [https://www.lesswrong.com/posts/GAv4DRGyDHe2orvwB/gradual-disempowerment-concrete-research-projects Gradual Disempowerment: Concrete Research Projects]&lt;br /&gt;
&lt;br /&gt;
==Alignment==&lt;br /&gt;
* 2023-03: Leopold Aschenbrenner: [https://www.forourposterity.com/nobodys-on-the-ball-on-agi-alignment/ Nobody’s on the ball on AGI alignment]&lt;br /&gt;
* 2024-03: [https://static1.squarespace.com/static/65392ca578eee444c445c9de/t/6606f95edb20e8118074a344/1711733370985/human-values-and-alignment-29MAR2024.pdf What are human values, and how do we align AI to them?] ([https://meaningalignment.substack.com/p/0480e023-98c0-4633-a604-990d3ac880ac blog])&lt;br /&gt;
* 2025: Joe Carlsmith: [https://joecarlsmith.substack.com/p/how-do-we-solve-the-alignment-problem How do we solve the alignment problem?] Introduction to an essay series on paths to safe, useful superintelligence&lt;br /&gt;
*# [https://joecarlsmith.substack.com/p/what-is-it-to-solve-the-alignment What is it to solve the alignment problem?] Also: to avoid it? Handle it? Solve it forever? Solve it completely? ([https://joecarlsmithaudio.buzzsprout.com/2034731/episodes/16617671-what-is-it-to-solve-the-alignment-problem audio version])&lt;br /&gt;
*# [https://joecarlsmith.substack.com/p/when-should-we-worry-about-ai-power When should we worry about AI power-seeking?] ([https://joecarlsmithaudio.buzzsprout.com/2034731/episodes/16651469-when-should-we-worry-about-ai-power-seeking audio version])&lt;br /&gt;
*# [https://joecarlsmith.substack.com/p/paths-and-waystations-in-ai-safety Paths and waystations in AI safety] ([https://joecarlsmithaudio.buzzsprout.com/2034731/episodes/16768804-paths-and-waystations-in-ai-safety audio version])&lt;br /&gt;
*# [https://joecarlsmith.substack.com/p/ai-for-ai-safety AI for AI safety] ([https://joecarlsmithaudio.buzzsprout.com/2034731/episodes/16790183-ai-for-ai-safety audio version])&lt;br /&gt;
*# [https://joecarlsmith.substack.com/p/can-we-safely-automate-alignment Can we safely automate alignment research?] ([https://joecarlsmithaudio.buzzsprout.com/2034731/episodes/17069901-can-we-safely-automate-alignment-research audio version], [https://joecarlsmith.substack.com/p/video-and-transcript-of-talk-on-automating?utm_source=post-email-title&amp;amp;publication_id=1022275&amp;amp;post_id=162375391&amp;amp;utm_campaign=email-post-title&amp;amp;isFreemail=true&amp;amp;r=5av1bk&amp;amp;triedRedirect=true&amp;amp;utm_medium=email video version])&lt;br /&gt;
*# [https://joecarlsmith.substack.com/p/giving-ais-safe-motivations?utm_source=post-email-title&amp;amp;publication_id=1022275&amp;amp;post_id=171250683&amp;amp;utm_campaign=email-post-title&amp;amp;isFreemail=true&amp;amp;r=5av1bk&amp;amp;triedRedirect=true&amp;amp;utm_medium=email Giving AIs safe motivations] ([https://joecarlsmithaudio.buzzsprout.com/2034731/episodes/17686921-giving-ais-safe-motivations audio version])&lt;br /&gt;
*# [https://joecarlsmith.com/2025/09/29/controlling-the-options-ais-can-pursue Controlling the options AIs can pursue] ([https://joecarlsmithaudio.buzzsprout.com/2034731/episodes/17909401-controlling-the-options-ais-can-pursue audio version])&lt;br /&gt;
*# [https://joecarlsmith.substack.com/p/how-human-like-do-safe-ai-motivations?utm_source=post-email-title&amp;amp;publication_id=1022275&amp;amp;post_id=178666988&amp;amp;utm_campaign=email-post-title&amp;amp;isFreemail=true&amp;amp;r=5av1bk&amp;amp;triedRedirect=true&amp;amp;utm_medium=email How human-like do safe AI motivations need to be?] ([https://joecarlsmithaudio.buzzsprout.com/2034731/episodes/18175429-how-human-like-do-safe-ai-motivations-need-to-be audio version])&lt;br /&gt;
*# [https://joecarlsmith.substack.com/p/building-ais-that-do-human-like-philosophy Building AIs that do human-like philosophy: AIs will face philosophical questions humans can&amp;#039;t answer for them] ([https://joecarlsmithaudio.buzzsprout.com/2034731/episodes/18591342-building-ais-that-do-human-like-philosophy audio version])&lt;br /&gt;
*# [https://joecarlsmith.substack.com/p/on-restraining-ai-development-for?utm_source=post-email-title&amp;amp;publication_id=1022275&amp;amp;post_id=191385185&amp;amp;utm_campaign=email-post-title&amp;amp;isFreemail=true&amp;amp;r=5av1bk&amp;amp;triedRedirect=true&amp;amp;utm_medium=email On restraining AI development for the sake of safety] ([https://joecarlsmithaudio.buzzsprout.com/2034731/episodes/18869440-on-restraining-ai-development-for-the-sake-of-safety audio version])&lt;br /&gt;
* 2025-04: Dario Amodei: [https://www.darioamodei.com/post/the-urgency-of-interpretability The Urgency of Interpretability]&lt;br /&gt;
&lt;br /&gt;
==Strategic/Technical==&lt;br /&gt;
* 2025-03: [https://resilience.baulab.info/docs/AI_Action_Plan_RFI.pdf AI Dominance Requires Interpretability and Standards for Transparency and Security]&lt;br /&gt;
* 2026-02: [https://www.gap-map.org/capabilities/?sort=bottlenecks Fundamental Development Gap Map v1.0]&lt;br /&gt;
&lt;br /&gt;
==Strategic/Policy==&lt;br /&gt;
* 2015-03: Sam Altman: [https://blog.samaltman.com/machine-intelligence-part-2 Machine intelligence, part 2]&lt;br /&gt;
* 2019-07: Amanda Askell, Miles Brundage, Gillian Hadfield: [https://arxiv.org/abs/1907.04534 The Role of Cooperation in Responsible AI Development]&lt;br /&gt;
* 2025-03: Dan Hendrycks, Eric Schmidt, Alexandr Wang: [https://www.nationalsecurity.ai/ Superintelligence Strategy]&lt;br /&gt;
** [https://www.nationalsecurity.ai/chapter/executive-summary Executive Summary]&lt;br /&gt;
** [https://www.nationalsecurity.ai/chapter/introduction Introduction]&lt;br /&gt;
** [https://www.nationalsecurity.ai/chapter/ai-is-pivotal-for-national-security AI Is Pivotal for National Security]&lt;br /&gt;
** [https://www.nationalsecurity.ai/chapter/deterrence-with-mutual-assured-ai-malfunction-maim Deterrence with Mutual Assured AI Malfunction (MAIM)]&lt;br /&gt;
** [https://www.nationalsecurity.ai/chapter/nonproliferation Nonproliferation]&lt;br /&gt;
** [https://www.nationalsecurity.ai/chapter/competitiveness Competitiveness]&lt;br /&gt;
** [https://www.nationalsecurity.ai/chapter/conclusion Conclusion]&lt;br /&gt;
** [https://www.nationalsecurity.ai/chapter/appendix Appendix FAQs]&lt;br /&gt;
* Anthony Aguirre: [https://keepthefuturehuman.ai/ Keep The Future Human] ([https://keepthefuturehuman.ai/essay/ essay])&lt;br /&gt;
** [https://www.youtube.com/watch?v=zeabrXV8zNE The 4 Rules That Could Stop AI Before It’s Too Late (video)] (2025)&lt;br /&gt;
**# Oversight: Registration required for training &amp;gt;10&amp;lt;sup&amp;gt;25&amp;lt;/sup&amp;gt; FLOP and inference &amp;gt;10&amp;lt;sup&amp;gt;19&amp;lt;/sup&amp;gt; FLOP/s (~1,000 B200 GPUs @ $25M). Build cryptographic licensing into hardware.&lt;br /&gt;
**# Computation Limits: Ban on training models &amp;gt;10&amp;lt;sup&amp;gt;27&amp;lt;/sup&amp;gt; FLOP or inference &amp;gt;10&amp;lt;sup&amp;gt;20&amp;lt;/sup&amp;gt; FLOP/s.&lt;br /&gt;
**# Strict Liability: Hold AI companies responsible for outcomes.&lt;br /&gt;
**# Tiered Regulation: Low regulation on tool-AI, strictest regulation on AGI (general, capable, autonomous systems).&lt;br /&gt;
* 2025-04: [https://x.com/deanwball Dean W. Ball]: [https://arxiv.org/abs/2504.11501 A Framework for the Private Governance of Frontier Artificial Intelligence]&lt;br /&gt;
* 2025-04: Helen Toner: [https://helentoner.substack.com/p/nonproliferation-is-the-wrong-approach?source=queue Nonproliferation is the wrong approach to AI misuse]&lt;br /&gt;
* 2025-04: MIRI: [https://techgov.intelligence.org/research/ai-governance-to-avoid-extinction AI Governance to Avoid Extinction: The Strategic Landscape and Actionable Research Questions]&lt;br /&gt;
* 2025-05: [https://writing.antonleicht.me/p/the-new-ai-policy-frontier The New AI Policy Frontier]: Beyond the shortcomings of centralised control and alignment, a new school of thought on AI governance emerges. It still faces tricky politics.&lt;br /&gt;
* 2025-05: [https://uncpga.world/agi-uncpga-report/ AGI UNCPGA Report]: Governance of the Transition to Artificial General Intelligence (AGI): Urgent Considerations for the UN General Assembly. Report for the Council of Presidents of the United Nations General Assembly (UNCPGA)&lt;br /&gt;
* 2025-06: [https://writing.antonleicht.me/p/ai-and-jobs-politics-without-policy AI &amp;amp; Jobs: Politics without Policy] Political support mounts - for a policy platform that does not yet exist&lt;br /&gt;
* 2025-06: [https://x.com/littIeramblings Sarah Hastings-Woodhouse]: [https://drive.google.com/file/d/1mmdHBE6M2yiyL21-ctTuRLNH5xOFjqWm/view Safety Features for a Centralized AGI Project]&lt;br /&gt;
* 2025-07: [https://writing.antonleicht.me/p/a-moving-target A Moving Target] Why we might not be quite ready to comprehensively regulate AI, and why it matters&lt;br /&gt;
* 2025-07: [https://www-cdn.anthropic.com/0dc382a2086f6a054eeb17e8a531bd9625b8e6e5.pdf Anthropic: Build AI in America] ([https://www.anthropic.com/news/build-ai-in-america blog])&lt;br /&gt;
* 2025-12: [https://asi-prevention.com/ How middle powers may prevent the development of artificial superintelligence]&lt;br /&gt;
* 2026-03: [https://humanstatement.org/ The Pro-Human AI Declaration]&lt;br /&gt;
&lt;br /&gt;
==Restriction==&lt;br /&gt;
* 2024-05: OpenAI: [https://openai.com/index/reimagining-secure-infrastructure-for-advanced-ai/ Reimagining secure infrastructure for advanced AI] OpenAI calls for an evolution in infrastructure security to protect advanced AI &lt;br /&gt;
* 2025-07: MIRI: [https://arxiv.org/abs/2507.09801 Technical Requirements for Halting Dangerous AI Activities]&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
* [[AI safety]]&lt;/div&gt;</summary>
		<author><name>KevinYager</name></author>
		
	</entry>
	<entry>
		<id>http://gisaxs.com/index.php?title=Science_Agents&amp;diff=8750</id>
		<title>Science Agents</title>
		<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php?title=Science_Agents&amp;diff=8750"/>
		<updated>2026-03-23T17:23:46Z</updated>

		<summary type="html">&lt;p&gt;KevinYager: /* Math */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
=AI Use-cases for Science=&lt;br /&gt;
&lt;br /&gt;
==Literature==&lt;br /&gt;
* [https://www.alphaxiv.org/explore alphaXiv | Explore]: Understand arXiv papers&lt;br /&gt;
&lt;br /&gt;
===LLM extract data from papers===&lt;br /&gt;
* 2025: [https://pubs.rsc.org/en/content/articlelanding/2025/cs/d4cs00913d From text to insight: large language models for chemical data extraction]&lt;br /&gt;
&lt;br /&gt;
===AI finding links in literature===&lt;br /&gt;
* 2019-07: [https://doi.org/10.1038/s41586-019-1335-8 Unsupervised word embeddings capture latent knowledge from materials science literature]&lt;br /&gt;
* 2024-11: [https://doi.org/10.1038/s41562-024-02046-9 Large language models surpass human experts in predicting neuroscience results]&lt;br /&gt;
&lt;br /&gt;
===(Pre) Generate Articles===&lt;br /&gt;
* 2022-12: [https://aclanthology.org/2022.emnlp-main.296/ Re3: Generating Longer Stories With Recursive Reprompting and Revision]&lt;br /&gt;
* 2023-03: English essays: [https://journal.unnes.ac.id/sju/index.php/elt/article/view/64069 Artificial intelligence (AI) technology in OpenAI ChatGPT application: A review of ChatGPT in writing English essay]&lt;br /&gt;
* 2023-01: Journalism: [https://journals.sagepub.com/doi/10.1177/10776958221149577 Collaborating With ChatGPT: Considering the Implications of Generative Artificial Intelligence for Journalism and Media Education]&lt;br /&gt;
* 2023-07: Science writing: [https://www.rbmojournal.com/article/S1472-6483(23)00219-5/fulltext Artificial intelligence in scientific writing: a friend or a foe?]&lt;br /&gt;
* 2024-02: Wikipedia style: [https://arxiv.org/abs/2402.14207 Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models]&lt;br /&gt;
* 2024-08: [https://arxiv.org/abs/2408.07055 LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs] ([https://github.com/THUDM/LongWriter code])&lt;br /&gt;
* 2024-08: Scientific papers: [https://arxiv.org/abs/2408.06292 The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery]&lt;br /&gt;
* 2024-09: PaperQA2: [https://paper.wikicrow.ai/ Language Models Achieve Superhuman Synthesis of Scientific Knowledge] ([https://x.com/SGRodriques/status/1833908643856818443 𝕏 post], [https://github.com/Future-House/paper-qa code])&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.18866 Reasoning to Learn from Latent Thoughts]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.19065 WikiAutoGen: Towards Multi-Modal Wikipedia-Style Article Generation]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.13171 Sleep-time Compute: Beyond Inference Scaling at Test-time]&lt;br /&gt;
&lt;br /&gt;
==Explanation==&lt;br /&gt;
* 2025-02: [https://tiger-ai-lab.github.io/TheoremExplainAgent/ TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding] ([https://arxiv.org/abs/2502.19400 preprint])&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.02822 Do Two AI Scientists Agree?]&lt;br /&gt;
&lt;br /&gt;
==Autonomous Ideation==&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.07738 ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models]&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.14202 Mining Causality: AI-Assisted Search for Instrumental Variables]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.07977 Thinking Fast and Laterally: Multi-Agentic Approach for Reasoning about Uncertain Emerging Events]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.14141 LLMs can realize combinatorial creativity: generating creative ideas via LLMs for scientific research]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.17596 LiveIdeaBench: Evaluating LLMs&amp;#039; Scientific Creativity and Idea Generation with Minimal Context]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.13299 Hypothesis Generation for Materials Discovery and Design Using Goal-Driven and Constraint-Guided LLM Agents]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.13025 Agentic Deep Graph Reasoning Yields Self-Organizing Knowledge Networks]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.00794 Predicting Empirical AI Research Outcomes with Language Models]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.20803 The Ideation-Execution Gap: Execution Outcomes of LLM-Generated versus Human Research Ideas]&lt;br /&gt;
* 2026-03: [https://arxiv.org/abs/2603.14473 AI Can Learn Scientific Taste]&lt;br /&gt;
&lt;br /&gt;
==Adapting LLMs to Science==&lt;br /&gt;
* 2023-06: [https://doi.org/10.1039/D3DD00112A Domain-specific chatbots for science using embeddings]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2411.00027 Personalization of Large Language Models: A Survey]&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.00412 Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation]&lt;br /&gt;
&lt;br /&gt;
==AI/LLM Control of Scientific Instruments/Facilities==&lt;br /&gt;
* 2023-12: [https://www.nature.com/articles/s41524-024-01423-2 Opportunities for retrieval and tool augmented large language models in scientific facilities]&lt;br /&gt;
* 2023-12: [https://arxiv.org/abs/2312.17180 Virtual Scientific Companion for Synchrotron Beamlines: A Prototype]&lt;br /&gt;
* 2023-12: [https://www.nature.com/articles/s41586-023-06792-0 Autonomous chemical research with large language models]&lt;br /&gt;
* 2024-01: [https://iopscience.iop.org/article/10.1088/2632-2153/ad52e9 Synergizing Human Expertise and AI Efficiency with Language Model for Microscopy Operation and Automated Experiment Design]&lt;br /&gt;
* 2024-06: [https://pubs.rsc.org/en/content/articlelanding/2025/dd/d4dd00143e From Text to Test: AI-Generated Control Software for Materials Science Instruments]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.18161 VISION: A Modular AI Assistant for Natural Human-Instrument Interaction at Scientific User Facilities]&lt;br /&gt;
* 2025-01: [https://www.science.org/doi/10.1126/sciadv.adr4173 Large language models for human-machine collaborative particle accelerator tuning through natural language]&lt;br /&gt;
* 2025-04: [https://openreview.net/forum?id=iA9UN1dEgJ Operating Robotic Laboratories with Large Language Models and Teachable Agents]&lt;br /&gt;
&lt;br /&gt;
==AI/ML Methods tailored to Science==&lt;br /&gt;
===Science Foundation Models===&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.15763 Intern-S1: A Scientific Multimodal Foundation Model]&lt;br /&gt;
* 2025-11: [https://pubs.aip.org/aip/jcp/article/163/18/184110/3372267/A-foundation-model-for-atomistic-materials A foundation model for atomistic materials chemistry]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.15684 Walrus: A Cross-Domain Foundation Model for Continuum Dynamics]&lt;br /&gt;
* 2026-01: [https://www.science.org/doi/10.1126/science.ads9530 Deep contrastive learning enables genome-wide virtual screening]&lt;br /&gt;
&lt;br /&gt;
===Regression (Data Fitting)===&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.14547 OmniPred: Language Models as Universal Regressors]&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.14546 Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data]: training on (x,y) pairs enables inferring the underlying function (defining it in code, inverting it, composing it)&lt;br /&gt;
&lt;br /&gt;
===Tabular Classification/Regression===&lt;br /&gt;
* 2025-01: [https://www.nature.com/articles/s41586-024-08328-6 Accurate predictions on small data with a tabular foundation model] ([https://github.com/PriorLabs/TabPFN code])&lt;br /&gt;
&lt;br /&gt;
===Symbolic Regression===&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.09359 Symbolic Regression with a Learned Concept Library]&lt;br /&gt;
&lt;br /&gt;
===Literature Discovery===&lt;br /&gt;
* [https://www.futurehouse.org/ FutureHouse]&lt;br /&gt;
** [https://hasanyone.com/ hasanyone]&lt;br /&gt;
** [https://github.com/Future-House/paper-qa PaperQA2]&lt;br /&gt;
* [https://lumina.sh/ Lumina]&lt;br /&gt;
* [https://github.com/TheBlewish/Automated-AI-Web-Researcher-Ollama Automated-AI-Web-Researcher-Ollama]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.05366 Search-o1: Agentic Search-Enhanced Large Reasoning Models] ([https://search-o1.github.io/ project], [https://github.com/sunnynexus/Search-o1 code])&lt;br /&gt;
* 2026-02: [https://www.nature.com/articles/s41586-025-10072-4 Synthesizing scientific literature with retrieval-augmented language models] ([https://allenai.org/blog/openscholar-nature blog])&lt;br /&gt;
&lt;br /&gt;
===Commercial===&lt;br /&gt;
* [https://sakana.ai/ai-scientist/ Sakana AI] &lt;br /&gt;
* [https://www.cusp.ai/ Cusp AI]: Materials/AI&lt;br /&gt;
* [https://www.lila.ai/ Lila AI]: Life sciences&lt;br /&gt;
* [https://www.radical-ai.com/ Radical AI]: Material simulation/design&lt;br /&gt;
* [https://www.autoscience.ai/ Autoscience] ([https://www.autoscience.ai/blog/meet-carl-the-first-ai-system-to-produce-academically-peer-reviewed-research Carl])&lt;br /&gt;
* [https://periodic.com/ Periodic Labs]&lt;br /&gt;
* [https://edisonscientific.com/articles/announcing-edison-scientific Edison Scientific] (drug discovery, spinoff from [https://www.futurehouse.org/ FutureHouse])&lt;br /&gt;
* 2026-03: Mirendil Inc.: advanced models to speed up R&amp;amp;D in scientific domains, especially biology and materials science&lt;br /&gt;
&lt;br /&gt;
====Bio====&lt;br /&gt;
* [https://www.bioptimus.com/ Bioptimus]&lt;br /&gt;
* [https://www.evolutionaryscale.ai/ EvolutionaryScale]&lt;br /&gt;
&lt;br /&gt;
==AI/ML Methods in Science==&lt;br /&gt;
* 2025-07: [https://www.mdpi.com/2313-433X/11/8/252 Synthetic Scientific Image Generation with VAE, GAN, and Diffusion Model Architectures]&lt;br /&gt;
&lt;br /&gt;
===Imaging===&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.08176 Behind the Noise: Conformal Quantile Regression Reveals Emergent Representations] (blog: [https://phzwart.github.io/behindthenoise/ Behind the Noise])&lt;br /&gt;
&lt;br /&gt;
===Materials===&lt;br /&gt;
* 2024-12: [https://www.nature.com/articles/s41467-024-54639-7 Crystal structure generation with autoregressive large language modeling]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.03965 All-atom Diffusion Transformers: Unified generative modelling of molecules and materials]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.19730 Training-Free Active Learning Framework in Materials Science with Large Language Models]&lt;br /&gt;
&lt;br /&gt;
===Chemistry===&lt;br /&gt;
* 2025-01: [https://www.nature.com/articles/s41578-025-00772-8 Large language models for reticular chemistry]&lt;br /&gt;
* 2025-02: [https://www.nature.com/articles/s42256-025-00982-3 Image-based generation for molecule design with SketchMol]&lt;br /&gt;
* 2025-02: [https://www.nature.com/articles/s42256-025-00994-z Large language models for scientific discovery in molecular property prediction]&lt;br /&gt;
* [https://x.com/vant_ai/status/1903070297991110657 2025-03]: [https://www.vant.ai/ Vant AI] [https://www.vant.ai/neo-1 Neo-1]: atomistic foundation model (small molecules, proteins, etc.)&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.08051 Compositional Flows for 3D Molecule and Synthesis Pathway Co-design]&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.07456 General purpose models for the chemical sciences]&lt;br /&gt;
* 2025-11: [https://chemrxiv.org/engage/chemrxiv/article-details/690357d9a482cba122e366b6 ChemTorch: A Deep Learning Framework for Benchmarking and Developing Chemical Reaction Property Prediction Models]&lt;br /&gt;
&lt;br /&gt;
===Biology===&lt;br /&gt;
* 2018: [https://alphafold.ebi.ac.uk/ AlphaFold]&lt;br /&gt;
* 2021-07: [https://www.nature.com/articles/s41586-021-03819-2 AlphaFold 2]&lt;br /&gt;
* 2024-05: [https://www.nature.com/articles/s41586-024-07487-w AlphaFold 3]&lt;br /&gt;
* 2023-03: [https://www.science.org/doi/10.1126/science.ade2574 Evolutionary-scale prediction of atomic-level protein structure with a language model] ([https://esmatlas.com/resources?action=fold ESMFold])&lt;br /&gt;
* 2023-11: [https://www.nature.com/articles/s41586-023-06728-8 Illuminating protein space with a programmable generative model] (Chroma)&lt;br /&gt;
* 2024-11: [https://www.science.org/doi/10.1126/science.ado9336 Sequence modeling and design from molecular to genome scale with Evo] (Evo)&lt;br /&gt;
* 2025-01: [https://www.nature.com/articles/s41586-024-08435-4 Targeting protein–ligand neosurfaces with a generalizable deep learning tool]&lt;br /&gt;
* 2025-01: [https://www.science.org/doi/10.1126/science.ads0018 Simulating 500 million years of evolution with a language model] ([https://github.com/evolutionaryscale/esm ESM] 3 model)&lt;br /&gt;
* 2025-02: [https://arcinstitute.org/manuscripts/Evo2 Genome modeling and design across all domains of life with Evo 2]&lt;br /&gt;
* 2025-02: [https://www.microsoft.com/en-us/research/blog/exploring-the-structural-changes-driving-protein-function-with-bioemu-1/ Exploring the structural changes driving protein function with BioEmu-1]&lt;br /&gt;
* 2025-02: [https://arxiv.org/pdf/2502.18449 Protein Large Language Models: A Comprehensive Survey]&lt;br /&gt;
* [https://x.com/vant_ai/status/1903070297991110657 2025-03]: [https://www.vant.ai/ Vant AI] [https://www.vant.ai/neo-1 Neo-1]: atomistic foundation model (small molecules, proteins, etc.)&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.16351 Lyra: An Efficient and Expressive Subquadratic Architecture for Modeling Biological Sequences]&lt;br /&gt;
* 2025-08: RoseTTAFold 3: [https://www.biorxiv.org/content/10.1101/2025.08.14.670328v2 Accelerating Biomolecular Modeling with AtomWorks and RF3]&lt;br /&gt;
* 2025-09: [https://www.biorxiv.org/content/10.1101/2025.09.12.675911v1 Generative design of novel bacteriophages with genome language models]&lt;br /&gt;
* 2025-10: [https://www.science.org/doi/10.1126/science.adu8578 Strengthening nucleic acid biosecurity screening against generative protein design tools]&lt;br /&gt;
* 2026-01: [https://www.nature.com/articles/s41586-025-10014-0 Advancing regulatory variant effect prediction with AlphaGenome]&lt;br /&gt;
&lt;br /&gt;
===Medicine===&lt;br /&gt;
See: [[AI_Agents#Medicine]]&lt;br /&gt;
&lt;br /&gt;
===Successes===&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.11270 Site-Decorated Model for Unconventional Frustrated Magnets: Ultranarrow Phase Crossover and Spin Reversal Transition]&lt;br /&gt;
&lt;br /&gt;
==AI/ML Methods co-opted for Science==&lt;br /&gt;
===Mechanistic Interpretability===&lt;br /&gt;
Train a large model on science data, then apply [[AI_understanding#Mechanistic_Interpretability|mechanistic interpretability]] methods (e.g. sparse autoencoders, SAEs) to its feature/activation space to extract human-interpretable concepts.&lt;br /&gt;
* Mechanistic interpretability for protein language models ([https://interprot.com/ visualizer], [https://github.com/etowahadams/interprot/tree/main code], [https://huggingface.co/liambai/InterProt-ESM2-SAEs SAE])&lt;br /&gt;
* [https://www.markov.bio/ Markov Bio]: [https://www.markov.bio/research/mech-interp-path-to-e2e-biology Through a Glass Darkly: Mechanistic Interpretability as the Bridge to End-to-End Biology] ([https://x.com/adamlewisgreen/status/1853206279499751531 quick description], [https://markovbio.github.io/biomedical-progress/ background info on recent bio progress])&lt;br /&gt;
* 2023-01: [https://arxiv.org/abs/2301.05062 Tracr: Compiled Transformers as a Laboratory for Interpretability] ([https://github.com/google-deepmind/tracr code])&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.03334 An X-Ray Is Worth 15 Features: Sparse Autoencoders for Interpretable Radiology Report Generation]&lt;br /&gt;
* 2024-12: [https://www.arxiv.org/abs/2412.16247 Towards scientific discovery with dictionary learning: Extracting biological concepts from microscopy foundation models]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.12101 InterPLM: Discovering Interpretable Features in Protein Language Models via Sparse Autoencoders]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.00089 Insights on Galaxy Evolution from Interpretable Sparse Feature Networks]&lt;br /&gt;
* 2025-02: [https://www.biorxiv.org/content/10.1101/2025.02.06.636901v1 From Mechanistic Interpretability to Mechanistic Biology: Training, Evaluating, and Interpreting Sparse Autoencoders on Protein Language Models]&lt;br /&gt;
* 2025-02: [https://www.goodfire.ai/blog/interpreting-evo-2 Interpreting Evo 2: Arc Institute&amp;#039;s Next-Generation Genomic Foundation Model]&lt;br /&gt;
* 2026-01: [https://www.goodfire.ai/research/interpretability-for-alzheimers-detection# Using Interpretability to Identify a Novel Class of Alzheimer&amp;#039;s Biomarkers]&lt;br /&gt;
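The SAE recipe described above can be sketched in a few lines. This is an illustrative toy, not any listed paper's implementation: the "activations" are synthetic (sparse combinations of random feature directions, mimicking superposition), and the SAE is trained with hand-written numpy gradients rather than a deep-learning framework.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "model activations": each sample is a sparse combination of
# ground-truth feature directions (superposition). Illustrative data only.
d, m_true, n = 32, 64, 2048
dirs = rng.normal(size=(m_true, d)) / np.sqrt(d)
codes = rng.random((n, m_true)) * (rng.random((n, m_true)) < 0.05)  # ~5% active
X = codes @ dirs

# Sparse autoencoder: f = ReLU(x W_e + b_e), x_hat = f W_d + b_d.
# Loss = mean ||x - x_hat||^2 + lam * mean |f|  (L1 promotes sparse codes).
m = 128                                # overcomplete dictionary
W_e = rng.normal(size=(d, m)) * 0.1
b_e = np.zeros(m)
W_d = rng.normal(size=(m, d)) * 0.1
b_d = np.zeros(d)
lam, lr = 1e-3, 0.05

def forward(X):
    pre = X @ W_e + b_e
    f = np.maximum(pre, 0.0)
    return pre, f, f @ W_d + b_d

def loss(X):
    _, f, Xh = forward(X)
    return np.mean(np.sum((X - Xh) ** 2, axis=1)) + lam * np.mean(np.abs(f))

loss0 = loss(X)
for step in range(400):                # full-batch gradient descent
    pre, f, Xh = forward(X)
    g_out = 2.0 * (Xh - X) / n                       # dLoss/dx_hat
    gW_d = f.T @ g_out
    gb_d = g_out.sum(axis=0)
    g_f = g_out @ W_d.T + (lam / (n * m)) * np.sign(f)  # L1 subgradient
    g_pre = g_f * (pre > 0)                          # ReLU gate
    gW_e = X.T @ g_pre
    gb_e = g_pre.sum(axis=0)
    W_e -= lr * gW_e; b_e -= lr * gb_e
    W_d -= lr * gW_d; b_d -= lr * gb_d

final_loss = loss(X)                   # should drop below loss0
```

In a real mechanistic-interpretability setting, `X` would be residual-stream or MLP activations collected from a trained model, the dictionary would be far more overcomplete, and the learned decoder rows `W_d` would then be inspected as candidate interpretable features.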
&lt;br /&gt;
===Uncertainty===&lt;br /&gt;
* 2024-10: [https://github.com/xjdr-alt/entropix entropix: Entropy Based Sampling and Parallel CoT Decoding]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.09724 Taming Overconfidence in LLMs: Reward Calibration in RLHF]&lt;br /&gt;
&lt;br /&gt;
=Science Benchmarks=&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.13168 SciCode: A Research Coding Benchmark Curated by Scientists] ([http://scicode-bench.github.io/ project])&lt;br /&gt;
* 2024-11: [https://openreview.net/pdf?id=fz969ahcvJ AidanBench: Evaluating Novel Idea Generation on Open-Ended Questions] ([https://github.com/aidanmclaughlin/AidanBench code])&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.17596 LiveIdeaBench: Evaluating LLMs&amp;#039; Scientific Creativity and Idea Generation with Minimal Context]&lt;br /&gt;
* 2025-01: [https://agi.safe.ai/ Humanity&amp;#039;s Last Exam]&lt;br /&gt;
* [https://github.com/OSU-NLP-Group/ScienceAgentBench ScienceAgentBench]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.20309 EAIRA: Establishing a Methodology for Evaluating AI Models as Scientific Research Assistants]&lt;br /&gt;
* 2025-03: [https://huggingface.co/datasets/futurehouse/BixBench BixBench]: Novel hypotheses (accept/reject)&lt;br /&gt;
* 2025-04: [https://research.google/blog/evaluating-progress-of-llms-on-scientific-problem-solving/ Google: Evaluating progress of LLMs on scientific problem-solving]&lt;br /&gt;
** 2025-03: [https://arxiv.org/abs/2503.13517 CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning]&lt;br /&gt;
** 2024-07: [https://arxiv.org/abs/2407.09413 SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers]&lt;br /&gt;
** 2024-10: [https://neurips.cc/virtual/2024/98540 FEABench: Evaluating Language Models on Real World Physics Reasoning Ability]&lt;br /&gt;
* 2026-02: [https://edisonscientific.com/ Edison]: [https://lab-bench.ai/ LABBench 2]&lt;br /&gt;
&lt;br /&gt;
=Science Agents=&lt;br /&gt;
==Reviews==&lt;br /&gt;
* 2024-10: [https://www.cell.com/cell/fulltext/S0092-8674(24)01070-5?target=_blank Empowering biomedical discovery with AI agents]&lt;br /&gt;
* 2025-01: [https://pubs.rsc.org/en/content/articlehtml/2024/sc/d4sc03921a A review of large language models and autonomous agents in chemistry] ([https://github.com/ur-whitelab/LLMs-in-science github])&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.01903 AI4Research: A Survey of Artificial Intelligence for Scientific Research]&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.14111 From AI for Science to Agentic Science: A Survey on Autonomous Scientific Discovery]&lt;br /&gt;
&lt;br /&gt;
==Challenges==&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.03315 Why LLMs Aren&amp;#039;t Scientists Yet: Lessons from Four Autonomous Research Attempts]&lt;br /&gt;
&lt;br /&gt;
==Specific==&lt;br /&gt;
* 2024-01-13: [https://arxiv.org/abs/2401.06949 ORGANA: A Robotic Assistant for Automated Chemistry Experimentation and Characterization] ([https://www.youtube.com/watch?v=N6qMMwJ8hKQ video])&lt;br /&gt;
* 2024-06-19: [https://arxiv.org/abs/2406.13163 LLMatDesign: Autonomous Materials Discovery with Large Language Models]&lt;br /&gt;
* 2024-08-12: [https://sakana.ai/ Sakana AI]: [https://sakana.ai/ai-scientist/ AI Scientist]; [https://arxiv.org/abs/2408.06292 The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery] ([https://github.com/SakanaAI/AI-Scientist code])&lt;br /&gt;
* 2024-09-09: [https://arxiv.org/abs/2409.05556 SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning] ([https://github.com/lamm-mit/SciAgentsDiscovery code])&lt;br /&gt;
* 2024-09-11: PaperQA2: [https://paper.wikicrow.ai/ Language Models Achieve Superhuman Synthesis of Scientific Knowledge] ([https://x.com/SGRodriques/status/1833908643856818443 𝕏 post], [https://github.com/Future-House/paper-qa code])&lt;br /&gt;
* 2024-10-17: [https://arxiv.org/abs/2410.13768 Rapid and Automated Alloy Design with Graph Neural Network-Powered LLM-Driven Multi-Agent Systems]&lt;br /&gt;
* 2024-10-28: [https://arxiv.org/abs/2410.20976 Large Language Model-Guided Prediction Toward Quantum Materials Synthesis]&lt;br /&gt;
* 2024-12-06: [https://www.biorxiv.org/content/10.1101/2024.11.11.623004v1 The Virtual Lab: AI Agents Design New SARS-CoV-2 Nanobodies with Experimental Validation] (writeup: [https://www.nature.com/articles/d41586-024-01684-3 Virtual lab powered by ‘AI scientists’ super-charges biomedical research: Could human–AI collaborations be the future of interdisciplinary studies?])&lt;br /&gt;
* 2024-12-30: [https://arxiv.org/abs/2412.21154 Aviary: training language agents on challenging scientific tasks]&lt;br /&gt;
* See also: [[AI_Agents#Deep_Research|AI Agents &amp;gt; Deep Research]]&lt;br /&gt;
* 2025-04-08: Sakana: [https://pub.sakana.ai/ai-scientist-v2/paper/paper.pdf The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search] ([https://github.com/SakanaAI/AI-Scientist-v2 code])&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.14267 DREAMS: Density Functional Theory Based Research Engine for Agentic Materials Simulation]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.02824 Kosmos: An AI Scientist for Autonomous Discovery]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.08151 SciAgent: A Unified Multi-Agent System for Generalistic Scientific Reasoning]&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.23265 PaperBanana: Automating Academic Illustration for AI Scientists]&lt;br /&gt;
&lt;br /&gt;
==Science Multi-Agent Setups==&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.04227 Agent Laboratory: Using LLM Agents as Research Assistants]&lt;br /&gt;
* 2025-04: [https://www.nature.com/articles/s41551-025-01363-2 Coordinated AI agents for advancing healthcare] ([https://www.nature.com/articles/s41551-025-01363-2.epdf?sharing_token=CIYP3J8LZE4BX31fV3WxUdRgN0jAjWel9jnR3ZoTv0O9iD-yhgqzRaz_7VASayWRePPhWDD2xFyfuOpSXbdPaOtt7oH4nfXo7telALzNwY3V1p9SxoqBEJy2OuaJ_cA35-CYQC1XgjCNTZUw46dh1KX-Dj8e7-1Vk_RlZKFLrc8%3D pdf])&lt;br /&gt;
&lt;br /&gt;
=Science Agentic Components=&lt;br /&gt;
==Frameworks==&lt;br /&gt;
* [https://platform.claude.com/docs/en/agent-sdk/overview Anthropic Claude Agent SDK overview]&lt;br /&gt;
* [https://openclaw.ai/ OpenClaw]&lt;br /&gt;
* [https://opencode.ai/ OpenCode]&lt;br /&gt;
* [https://github.com/OpenHands/software-agent-sdk OpenHands]&lt;br /&gt;
* [https://github.com/lamm-mit?tab=repositories LAMM: MIT Laboratory for Atomistic and Molecular Mechanics]&lt;br /&gt;
** [https://github.com/lamm-mit/scienceclaw ScienceClaw]: Framework for autonomous scientific investigation without central coordination.&lt;br /&gt;
** [https://infinite-lamm.vercel.app/ Infinite]: The Infinite Corridor of Scientific Discovery. Open science, powered by many — agents and humans discovering together.&lt;br /&gt;
&lt;br /&gt;
==Skills==&lt;br /&gt;
* 2026-03: [https://github.com/K-Dense-AI/claude-scientific-skills/tree/main?tab=readme-ov-file#use-cases Claude Scientific Skills] (list)&lt;br /&gt;
&lt;br /&gt;
=AI Science Systems=&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.03916 Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.13299 Hypothesis Generation for Materials Discovery and Design Using Goal-Driven and Constraint-Guided LLM Agents]&lt;br /&gt;
* 2025-02: [https://storage.googleapis.com/coscientist_paper/ai_coscientist.pdf Towards an AI co-scientist] (Google blog post: [https://research.google/blog/accelerating-scientific-breakthroughs-with-an-ai-co-scientist/ Accelerating scientific breakthroughs with an AI co-scientist])&lt;br /&gt;
* 2025-06: [https://zenodo.org/records/15693353 The Discovery Engine]&lt;br /&gt;
** 2025-07: [https://arxiv.org/abs/2507.00964 Benchmarking the Discovery Engine] ([https://www.leap-labs.com/blog/how-we-replicated-five-peer-reviewed-papers-in-five-hours blog])&lt;br /&gt;
* 2025-07: [https://www.preprints.org/manuscript/202507.1951/v1 Autonomous Scientific Discovery Through Hierarchical AI Scientist Systems]&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.16969 Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows]&lt;br /&gt;
* 2026-01: [https://www.nature.com/articles/s43588-025-00906-6 SciSciGPT: advancing human–AI collaboration in the science of science]&lt;br /&gt;
* 2026-02: [https://allenai.org/papers/autodiscovery AUTODISCOVERY: Open-ended Scientific Discovery via Bayesian Surprise] (Allen AI (Ai2) AstraLabs, [https://allenai.org/blog/autodiscovery blog], [https://autodiscovery.allen.ai/runs tools])&lt;br /&gt;
&lt;br /&gt;
===Inorganic Materials Discovery===&lt;br /&gt;
* 2023-11: [https://doi.org/10.1038/s41586-023-06735-9 Scaling deep learning for materials discovery]&lt;br /&gt;
* 2023-11: [https://doi.org/10.1038/s41586-023-06734-w An autonomous laboratory for the accelerated synthesis of novel materials]&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.00135 HoneyComb: A Flexible LLM-Based Agent System for Materials Science]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.12771 Open Materials 2024 (OMat24) Inorganic Materials Dataset and Models] ([https://github.com/FAIR-Chem/fairchem code], [https://huggingface.co/datasets/fairchem/OMAT24 datasets], [https://huggingface.co/fairchem/OMAT24 checkpoints], [https://ai.meta.com/blog/fair-news-segment-anything-2-1-meta-spirit-lm-layer-skip-salsa-sona/ blogpost])&lt;br /&gt;
* 2025-01: [https://www.nature.com/articles/s41586-025-08628-5 A generative model for inorganic materials design]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.14110 System of Agentic AI for the Discovery of Metal-Organic Frameworks]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.08762 The Open Molecules 2025 (OMol25) Dataset, Evaluations, and Models]&lt;br /&gt;
&lt;br /&gt;
===Materials Characterization===&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.06569 Operationalizing Serendipity: Multi-Agent AI Workflows for Enhanced Materials Characterization with Theory-in-the-Loop]&lt;br /&gt;
&lt;br /&gt;
===Chemistry===&lt;br /&gt;
* 2023-12: [https://doi.org/10.1038/s41586-023-06792-0 Autonomous chemical research with large language models] (Coscientist)&lt;br /&gt;
* 2024-09: [https://www.pnnl.gov/main/publications/external/technical_reports/PNNL-36692.pdf PNNL ChemAIst V0.2]&lt;br /&gt;
* 2024-11: [https://www.nature.com/articles/s41467-024-54457-x An automatic end-to-end chemical synthesis development platform powered by large language models]&lt;br /&gt;
* 2025-06: [https://paper.ether0.ai/ Training a Scientific Reasoning Model for Chemistry]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.06363 ChemGraph: An Agentic Framework for Computational Chemistry Workflows] ([https://github.com/argonne-lcf/ChemGraph code])&lt;br /&gt;
&lt;br /&gt;
===Bio===&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.01485 BioMARS: A Multi-Agent Robotic System for Autonomous Biological Experiments]&lt;br /&gt;
&lt;br /&gt;
===Physics===&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.19799 PhysMaster: Building an Autonomous AI Physicist for Theoretical and Computational Physics Research]&lt;br /&gt;
&lt;br /&gt;
==LLMs Optimized for Science==&lt;br /&gt;
* 2022-11: [https://arxiv.org/abs/2211.09085 Galactica: A Large Language Model for Science]&lt;br /&gt;
* 2024-12: [https://www.nature.com/articles/s41467-024-54639-7 Crystal structure generation with autoregressive large language modeling]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.13107 MatterChat: A Multi-Modal LLM for Material Science]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.17604 OmniScience: A Domain-Specialized LLM for Scientific Reasoning and Discovery]&lt;br /&gt;
* 2025-03: Google [https://huggingface.co/collections/google/txgemma-release-67dd92e931c857d15e4d1e87 TxGemma] (2B, 9B, 27B): [https://developers.googleblog.com/en/introducing-txgemma-open-models-improving-therapeutics-development/ drug development]&lt;br /&gt;
&lt;br /&gt;
=Impact of AI in Science=&lt;br /&gt;
* 2024-11: &amp;lt;strike&amp;gt;[https://aidantr.github.io/files/AI_innovation.pdf Artificial Intelligence, Scientific Discovery, and Product Innovation]&amp;lt;/strike&amp;gt;&lt;br /&gt;
** 2025-05: Retraction: [https://economics.mit.edu/news/assuring-accurate-research-record Assuring an accurate research record]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.05151 Transforming Science with Large Language Models: A Survey on AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.03837 Accelerating Scientific Research with Gemini: Case Studies and Common Techniques]&lt;br /&gt;
&lt;br /&gt;
=Related Tools=&lt;br /&gt;
==Literature Search==&lt;br /&gt;
* [https://www.perplexity.ai/ Perplexity]&lt;br /&gt;
* [https://www.arxival.xyz/ ArXival]&lt;br /&gt;
&lt;br /&gt;
==Data Visualization==&lt;br /&gt;
* 2024-10: Microsoft [https://www.microsoft.com/en-us/research/blog/data-formulator-exploring-how-ai-can-help-analysts-create-rich-data-visualizations/ Data Formulator: Create Rich Visualization with AI iteratively] ([https://www.microsoft.com/en-us/research/video/data-formulator-create-rich-visualization-with-ai-iteratively/ video], [https://github.com/microsoft/data-formulator code])&lt;br /&gt;
* [https://julius.ai/ Julius AI]: Analyze your data with computational AI&lt;br /&gt;
&lt;br /&gt;
==Generative==&lt;br /&gt;
* 2025-03: [https://huggingface.co/collections/starvector/starvector-models-6783b22c7bd4b43d13cb5289 StarVector] 1B, 8B: text or image to SVG&lt;br /&gt;
&lt;br /&gt;
==Chemistry==&lt;br /&gt;
* 2025-03: [https://jcheminf.biomedcentral.com/articles/10.1186/s13321-024-00834-z Rxn-INSIGHT: fast chemical reaction analysis using bond-electron matrices] ([https://rxn-insight.readthedocs.io/en/latest/ docs])&lt;br /&gt;
&lt;br /&gt;
=Science Datasets=&lt;br /&gt;
* [https://datasetsearch.research.google.com/ Google Dataset Search]&lt;br /&gt;
* [https://github.com/blaiszik/awesome-matchem-datasets/ Awesome Materials &amp;amp; Chemistry Datasets]&lt;br /&gt;
* NIST [https://jarvis.nist.gov/ Jarvis] (simulations)&lt;br /&gt;
&lt;br /&gt;
=Genuine Discoveries=&lt;br /&gt;
* 2025-11: [https://cdn.openai.com/pdf/4a25f921-e4e0-479a-9b38-5367b47e8fd0/early-science-acceleration-experiments-with-gpt-5.pdf Early science acceleration experiments with GPT-5]&lt;br /&gt;
* 2025-12: [https://andymasley.substack.com/p/ai-can-obviously-create-new-knowledge AI can obviously create new knowledge - But maybe not new concepts]&lt;br /&gt;
==Math==&lt;br /&gt;
* 2023-07: [https://www.nature.com/articles/s41586-023-06004-9 Faster sorting algorithms discovered using deep reinforcement learning]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.13131 AlphaEvolve: A coding agent for scientific and algorithmic discovery]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.02864 Mathematical exploration and discovery at scale]&lt;br /&gt;
* 2025-11: [https://www.nature.com/articles/s41586-025-09833-y Olympiad-level formal mathematical reasoning with reinforcement learning]&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.14575 Extremal descendant integrals on moduli spaces of curves: An inequality discovered and proved in collaboration with AI]&lt;br /&gt;
* [https://github.com/teorth/erdosproblems/wiki/AI-contributions-to-Erd%C5%91s-problems AI Solving Erdős Problems]:&lt;br /&gt;
** 2026-01: [https://www.erdosproblems.com/728 Erdős Problem #728] and [https://www.erdosproblems.com/729 #729] solved by Aristotle using ChatGPT 5.2 Pro&lt;br /&gt;
** 2026-01: [https://www.erdosproblems.com/forum/thread/397 Erdős Problem #397] [https://x.com/neelsomani/status/2010215162146607128?s=20 solved] by [https://neelsomani.com/ Neel Somani] using ChatGPT 5.2 Pro&lt;br /&gt;
** 2026-01: [https://www.erdosproblems.com/205 Erdős Problem #205] solved by Aristotle using ChatGPT 5.2 Pro&lt;br /&gt;
** 2026-01: [https://www.erdosproblems.com/forum/thread/281 Erdős Problem #281] [https://x.com/neelsomani/status/2012695714187325745?s=20 solved] by [https://neelsomani.com/ Neel Somani] using ChatGPT 5.2 Pro&lt;br /&gt;
** 2026-01: Google DeepMind: [https://arxiv.org/abs/2601.21442 Irrationality of rapidly converging series: a problem of Erdős and Graham]&lt;br /&gt;
*** [https://www.erdosproblems.com/1051 Erdős Problem #1051] [https://x.com/slow_developer/status/2018321002623901885?s=20 solved] by Google DeepMind Aletheia agent&lt;br /&gt;
** 2026-01: Google DeepMind: [https://arxiv.org/abs/2601.22401 Semi-Autonomous Mathematics Discovery with Gemini: A Case Study on the Erdős Problems]&lt;br /&gt;
*** Attempted 700 problems, solved 13 open Erdős problems: 5 novel autonomous solutions, 8 through existing literature.&lt;br /&gt;
** 2026-02: [https://www.erdosproblems.com/846 Erdős Problem #846]&lt;br /&gt;
*** [https://x.com/roydanroy/status/2026804567178953048?s=20 Google DeepMind]&lt;br /&gt;
*** [https://x.com/mehtaab_sawhney/status/2026716221933343147?s=20 Using OpenAI internal model] (paper: [https://cdn.openai.com/infinite-sets/main_single_clean3.pdf On infinite sets with no 3 on a line])&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.07222 The motivic class of the space of genus 0 maps to the flag variety]&lt;br /&gt;
* 2026-02: Google DeepMind: [https://arxiv.org/abs/2602.10177 Towards Autonomous Mathematics Research]&lt;br /&gt;
* 2026-03: Donald Knuth: [https://www-cs-faculty.stanford.edu/~knuth/papers/claude-cycles.pdf A problem in Directed Hamiltonian Cycles] solved by Filip Stappers using Claude Opus 4.6&lt;br /&gt;
* 2026-03: Google DeepMind: [https://arxiv.org/abs/2603.09172 Reinforced Generation of Combinatorial Structures: Ramsey Numbers]&lt;br /&gt;
* 2026-03: [https://epoch.ai/frontiermath/open-problems FrontierMath] problem: [https://epoch.ai/frontiermath/open-problems/ramsey-hypergraphs &amp;quot;A Ramsey-style Problem on Hypergraphs&amp;quot;] solved by Kevin Barreto and Liam Price using GPT-5.4 Pro&lt;br /&gt;
&lt;br /&gt;
==Physics assistance==&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.23758 Exact solution of the frustrated Potts model with next-nearest-neighbor interactions in one dimension via AI bootstrapping] ([https://www.bnl.gov/staff/wyin Weiguo Yin])&lt;br /&gt;
* 2025-12: [https://www.sciencedirect.com/science/article/pii/S0370269325008111 Relativistic covariance and nonlinear quantum mechanics: Tomonaga-Schwinger analysis]&lt;br /&gt;
** [https://x.com/hsu_steve/status/1996034522308026435?s=20 Steve Hsu], [https://drive.google.com/file/d/16sxJuwsHoi-fvTFbri9Bu8B9bqA6lr1H/view Theoretical Physics with Generative AI]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.12176 Single-minus gluon tree amplitudes are nonzero] (GPT-5.2, [https://openai.com/index/new-result-theoretical-physics/ blog])&lt;br /&gt;
&lt;br /&gt;
==Literature exploration==&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.02824 Kosmos: An AI Scientist for Autonomous Discovery] ([https://edisonscientific.com/ Edison])&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/c4bdef64-5e9b-43b9-a365-592dd1ed7587 Nucleotide metabolism in hypothermia]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/1fdbf827-be65-4d97-9b66-bf0da600091a Determinant of perovskite solar-cell failure]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/4fb3fbdb-c449-4064-9aa6-ff4ec53131d8 Log-normal connectivity in neural networks]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/c6849232-5858-4634-adf5-83780afbe3db SOD2 as driver of myocardial fibrosis]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/abac07da-a6bb-458f-b0ba-ef08f1be617e Protective variant of SSR1 in type 2 diabetes]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/a770052b-2334-4bbe-b086-5149e0f03d99 Temporal ordering in Alzheimer’s disease]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/28c427d2-be31-48b5-b272-28d5a1e3ea5c Mechanism of neuron vulnerability in aging]&lt;br /&gt;
==Bio design==&lt;br /&gt;
* 2023-07: [https://www.nature.com/articles/s41586-023-06415-8 De novo design of protein structure and function with RFdiffusion]&lt;br /&gt;
* 2025-11: [https://www.nature.com/articles/s41586-025-09721-5 Atomically accurate de novo design of antibodies with RFdiffusion]&lt;br /&gt;
* 2025-11: [https://deepmind.google/blog/alphafold-five-years-of-impact/ AlphaFold: Five years of impact]&lt;br /&gt;
* 2026-01: [https://www.goodfire.ai/research/interpretability-for-alzheimers-detection# Using Interpretability to Identify a Novel Class of Alzheimer&amp;#039;s Biomarkers]&lt;br /&gt;
==Material Discovery==&lt;br /&gt;
* 2023-11: [https://doi.org/10.1038/s41586-023-06735-9 Scaling deep learning for materials discovery]&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
* [[AI agents]]&lt;br /&gt;
* [https://nanobot.chat/ Nanobot.chat]: Intelligent AI for the labnetwork @ mtl.mit.edu forum&lt;/div&gt;</summary>
		<author><name>KevinYager</name></author>
		
	</entry>
	<entry>
		<id>http://gisaxs.com/index.php?title=AI_creativity&amp;diff=8749</id>
		<title>AI creativity</title>
		<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php?title=AI_creativity&amp;diff=8749"/>
		<updated>2026-03-23T17:19:47Z</updated>

		<summary type="html">&lt;p&gt;KevinYager: /* Research */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Research=&lt;br /&gt;
* 2024-01: [https://arxiv.org/abs/2402.01727 Prompting Diverse Ideas: Increasing AI Idea Variance]&lt;br /&gt;
* 2024-01: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4686415 Creativity and AI]&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.04109 Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers]&lt;br /&gt;
* 2024-11: &amp;lt;strike&amp;gt;[https://aidantr.github.io/files/AI_innovation.pdf Artificial Intelligence, Scientific Discovery, and Product Innovation]&amp;lt;/strike&amp;gt;&lt;br /&gt;
** 2025-05: Retraction: [https://economics.mit.edu/news/assuring-accurate-research-record Assuring an accurate research record]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.02980 Surveying the Effects of Quality, Diversity, and Complexity in Synthetic Data From Large Language Models]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.14141 LLMs can realize combinatorial creativity: generating creative ideas via LLMs for scientific research]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.17596 LiveIdeaBench: Evaluating LLMs&amp;#039; Scientific Creativity and Idea Generation with Minimal Context]&lt;br /&gt;
* 2025-03: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5188231 The Cybernetic Teammate: A Field Experiment on Generative AI Reshaping Teamwork and Expertise]&lt;br /&gt;
* 2025-03: Midjourney: [https://www.arxiv.org/abs/2503.17126 Modifying Large Language Model Post-Training for Diverse Creative Writing]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.12320 Has the Creativity of Large-Language Models peaked? An analysis of inter- and intra-LLM variability]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.14442 Creative Preference Optimization]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.17241 Generative AI and Creativity: A Systematic Literature Review and Meta-Analysis]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.01171 Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.20635 Why Did Apple Fall To The Ground: Evaluating Curiosity In Large Language Model]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.22954 Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)]&lt;br /&gt;
* 2025-12: [https://www.nature.com/articles/s41562-025-02331-1 A large-scale comparison of divergent creativity in humans and large language models]&lt;br /&gt;
* 2026-01: [https://www.arxiv.org/abs/2601.01576 OpenNovelty: An LLM-powered Agentic System for Verifiable Scholarly Novelty Assessment]&lt;br /&gt;
* 2026-01: [https://www.nature.com/articles/s41598-025-25157-3 Divergent creativity in humans and large language models]&lt;br /&gt;
* 2026-03: [https://arxiv.org/abs/2603.19087 Serendipity by Design: Evaluating the Impact of Cross-domain Mappings on Human and LLM Creativity]&lt;br /&gt;
* 2026-03: [https://gking.harvard.edu/quest Inducing Sustained Creativity and Diversity in Large Language Models]&lt;br /&gt;
&lt;br /&gt;
=Benchmarks=&lt;br /&gt;
See: [[AI_benchmarks#Creativity| AI benchmarks &amp;gt; Creativity]]&lt;br /&gt;
&lt;br /&gt;
=Collapse=&lt;br /&gt;
* 2024-07: [https://www.nature.com/articles/s41586-024-07566-y AI models collapse when trained on recursively generated data]&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.02209 Generative Monoculture in Large Language Models]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.17646 Rate of Model Collapse in Recursive Training]&lt;br /&gt;
==Analysis==&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.04376 Scaling laws for learning with real and surrogate data]&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.07712 Model Collapse Demystified: The Case of Regression]&lt;br /&gt;
==LLM==&lt;br /&gt;
* 2023-05: [https://arxiv.org/abs/2305.17493 The Curse of Recursion: Training on Generated Data Makes Models Forget]&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.03502 AI and the Problem of Knowledge Collapse]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.13928 LLMs Can Get &amp;quot;Brain Rot&amp;quot;!]&lt;br /&gt;
==Image Models==&lt;br /&gt;
* 2023-07: [https://arxiv.org/abs/2307.01850 Self-Consuming Generative Models Go MAD]&lt;br /&gt;
* 2023-10: [https://arxiv.org/abs/2310.00429 On the Stability of Iterative Retraining of Generative Models on their own Data]&lt;br /&gt;
==Solutions==&lt;br /&gt;
* 2024-03: [https://arxiv.org/abs/2403.04706 Common 7B Language Models Already Possess Strong Math Capabilities]&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.01413 Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data]&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.07515 Beyond Model Collapse: Scaling Up with Synthesized Data Requires Verification]&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.01490 LLM See, LLM Do: Guiding Data Generation to Target Non-Differentiable Objectives]&lt;br /&gt;
* 2024-08: [https://arxiv.org/abs/2408.14960 Multilingual Arbitrage: Optimizing Data Pools to Accelerate Multilingual Progress]&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
* [[AI_benchmarks|AI benchmarks]] &amp;gt; [[AI_benchmarks#Assess_Specific_Attributes|Assess Specific Attributes]] &amp;gt; [[AI_benchmarks#Creativity|Creativity]]&lt;br /&gt;
* [[AI_and_Humans|AI and Humans]] &amp;gt; [[AI_and_Humans#AI_out-performs_humans|AI out-performs humans]] &amp;gt; [[AI_and_Humans#Creativity|Creativity]]&lt;br /&gt;
* [[AI_and_Humans|AI and Humans]] &amp;gt; [[AI_and_Humans#AI_improves_human_work|AI improves human work]] &amp;gt; Creativity&lt;/div&gt;</summary>
		<author><name>KevinYager</name></author>
		
	</entry>
	<entry>
		<id>http://gisaxs.com/index.php?title=Science_Agents&amp;diff=8748</id>
		<title>Science Agents</title>
		<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php?title=Science_Agents&amp;diff=8748"/>
		<updated>2026-03-23T17:16:21Z</updated>

		<summary type="html">&lt;p&gt;KevinYager: /* Autonomous Ideation */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
=AI Use-cases for Science=&lt;br /&gt;
&lt;br /&gt;
==Literature==&lt;br /&gt;
* [https://www.alphaxiv.org/explore alphaXiv | Explore]: Understand arXiv papers&lt;br /&gt;
&lt;br /&gt;
===LLM extract data from papers===&lt;br /&gt;
* 2024: [https://pubs.rsc.org/en/content/articlelanding/2025/cs/d4cs00913d From text to insight: large language models for chemical data extraction]&lt;br /&gt;
&lt;br /&gt;
===AI finding links in literature===&lt;br /&gt;
* 2019-07: [https://doi.org/10.1038/s41586-019-1335-8  Unsupervised word embeddings capture latent knowledge from materials science literature]&lt;br /&gt;
* 2024-11: [https://doi.org/10.1038/s41562-024-02046-9  Large language models surpass human experts in predicting neuroscience results]&lt;br /&gt;
&lt;br /&gt;
===(Pre) Generate Articles===&lt;br /&gt;
* 2022-12: [https://aclanthology.org/2022.emnlp-main.296/ Re3: Generating Longer Stories With Recursive Reprompting and Revision]&lt;br /&gt;
* 2023-01: Journalism: [https://journals.sagepub.com/doi/10.1177/10776958221149577 Collaborating With ChatGPT: Considering the Implications of Generative Artificial Intelligence for Journalism and Media Education]&lt;br /&gt;
* 2023-03: English essays: [https://journal.unnes.ac.id/sju/index.php/elt/article/view/64069 Artificial intelligence (AI) technology in OpenAI ChatGPT application: A review of ChatGPT in writing English essay]&lt;br /&gt;
* 2023-07: Science writing: [https://www.rbmojournal.com/article/S1472-6483(23)00219-5/fulltext Artificial intelligence in scientific writing: a friend or a foe?]&lt;br /&gt;
* 2024-02: Wikipedia style: [https://arxiv.org/abs/2402.14207 Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models]&lt;br /&gt;
* 2024-08: [https://arxiv.org/abs/2408.07055 LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs] ([https://github.com/THUDM/LongWriter code])&lt;br /&gt;
* 2024-08: Scientific papers: [https://arxiv.org/abs/2408.06292 The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery]&lt;br /&gt;
* 2024-09: PaperQA2: [https://paper.wikicrow.ai/ Language Models Achieve Superhuman Synthesis of Scientific Knowledge] ([https://x.com/SGRodriques/status/1833908643856818443 𝕏 post], [https://github.com/Future-House/paper-qa code])&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.18866 Reasoning to Learn from Latent Thoughts]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.19065 WikiAutoGen: Towards Multi-Modal Wikipedia-Style Article Generation]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.13171 Sleep-time Compute: Beyond Inference Scaling at Test-time]&lt;br /&gt;
&lt;br /&gt;
==Explanation==&lt;br /&gt;
* 2025-02: [https://tiger-ai-lab.github.io/TheoremExplainAgent/ TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding] ([https://arxiv.org/abs/2502.19400 preprint])&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.02822 Do Two AI Scientists Agree?]&lt;br /&gt;
&lt;br /&gt;
==Autonomous Ideation==&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.07738 ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models]&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.14202 Mining Causality: AI-Assisted Search for Instrumental Variables]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.07977 Thinking Fast and Laterally: Multi-Agentic Approach for Reasoning about Uncertain Emerging Events]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.14141 LLMs can realize combinatorial creativity: generating creative ideas via LLMs for scientific research]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.17596 LiveIdeaBench: Evaluating LLMs&amp;#039; Scientific Creativity and Idea Generation with Minimal Context]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.13299 Hypothesis Generation for Materials Discovery and Design Using Goal-Driven and Constraint-Guided LLM Agents]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.13025 Agentic Deep Graph Reasoning Yields Self-Organizing Knowledge Networks]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.00794 Predicting Empirical AI Research Outcomes with Language Models]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.20803 The Ideation-Execution Gap: Execution Outcomes of LLM-Generated versus Human Research Ideas]&lt;br /&gt;
* 2026-03: [https://arxiv.org/abs/2603.14473 AI Can Learn Scientific Taste]&lt;br /&gt;
&lt;br /&gt;
==Adapting LLMs to Science==&lt;br /&gt;
* 2023-06: [https://doi.org/10.1039/D3DD00112A Domain-specific chatbots for science using embeddings]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2411.00027 Personalization of Large Language Models: A Survey]&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.00412 Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation]&lt;br /&gt;
&lt;br /&gt;
==AI/LLM Control of Scientific Instruments/Facilities==&lt;br /&gt;
* 2023-12: [https://www.nature.com/articles/s41524-024-01423-2 Opportunities for retrieval and tool augmented large language models in scientific facilities]&lt;br /&gt;
* 2023-12: [https://arxiv.org/abs/2312.17180 Virtual Scientific Companion for Synchrotron Beamlines: A Prototype]&lt;br /&gt;
* 2023-12: [https://www.nature.com/articles/s41586-023-06792-0 Autonomous chemical research with large language models]&lt;br /&gt;
* 2024-01: [https://iopscience.iop.org/article/10.1088/2632-2153/ad52e9 Synergizing Human Expertise and AI Efficiency with Language Model for Microscopy Operation and Automated Experiment Design]&lt;br /&gt;
* 2024-06: [https://pubs.rsc.org/en/content/articlelanding/2025/dd/d4dd00143e From Text to Test: AI-Generated Control Software for Materials Science Instruments]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.18161 VISION: A Modular AI Assistant for Natural Human-Instrument Interaction at Scientific User Facilities]&lt;br /&gt;
* 2025-01: [https://www.science.org/doi/10.1126/sciadv.adr4173 Large language models for human-machine collaborative particle accelerator tuning through natural language]&lt;br /&gt;
* 2025-04: [https://openreview.net/forum?id=iA9UN1dEgJ Operating Robotic Laboratories with Large Language Models and Teachable Agents]&lt;br /&gt;
&lt;br /&gt;
==AI/ML Methods tailored to Science==&lt;br /&gt;
===Science Foundation Models===&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.15763 Intern-S1: A Scientific Multimodal Foundation Model]&lt;br /&gt;
* 2025-11: [https://pubs.aip.org/aip/jcp/article/163/18/184110/3372267/A-foundation-model-for-atomistic-materials A foundation model for atomistic materials chemistry]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.15684 Walrus: A Cross-Domain Foundation Model for Continuum Dynamics]&lt;br /&gt;
* 2026-01: [https://www.science.org/doi/10.1126/science.ads9530 Deep contrastive learning enables genome-wide virtual screening]&lt;br /&gt;
&lt;br /&gt;
===Regression (Data Fitting)===&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.14546 Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data]: training on (x,y) pairs enables inferring underlying function (define it in code, invert it, compose it)&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2402.14547 OmniPred: Language Models as Universal Regressors]&lt;br /&gt;
&lt;br /&gt;
===Tabular Classification/Regression===&lt;br /&gt;
* 2025-01: [https://www.nature.com/articles/s41586-024-08328-6 Accurate predictions on small data with a tabular foundation model] ([https://github.com/PriorLabs/TabPFN code])&lt;br /&gt;
&lt;br /&gt;
===Symbolic Regression===&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.09359 Symbolic Regression with a Learned Concept Library]&lt;br /&gt;
&lt;br /&gt;
===Literature Discovery===&lt;br /&gt;
* [https://www.futurehouse.org/ FutureHouse]&lt;br /&gt;
** [https://hasanyone.com/ hasanyone]&lt;br /&gt;
** [https://github.com/Future-House/paper-qa PaperQA2]&lt;br /&gt;
* [https://lumina.sh/ Lumina]&lt;br /&gt;
* [https://github.com/TheBlewish/Automated-AI-Web-Researcher-Ollama Automated-AI-Web-Researcher-Ollama]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.05366 Search-o1: Agentic Search-Enhanced Large Reasoning Models] ([https://search-o1.github.io/ project], [https://github.com/sunnynexus/Search-o1 code])&lt;br /&gt;
* 2026-02: [https://www.nature.com/articles/s41586-025-10072-4 Synthesizing scientific literature with retrieval-augmented language models] ([https://allenai.org/blog/openscholar-nature blog])&lt;br /&gt;
&lt;br /&gt;
===Commercial===&lt;br /&gt;
* [https://sakana.ai/ai-scientist/ Sakana AI] &lt;br /&gt;
* [https://www.cusp.ai/ Cusp AI]: Materials/AI&lt;br /&gt;
* [https://www.lila.ai/ Lila AI]: Life sciences&lt;br /&gt;
* [https://www.radical-ai.com/ Radical AI]: Material simulation/design&lt;br /&gt;
* [https://www.autoscience.ai/ Autoscience] ([https://www.autoscience.ai/blog/meet-carl-the-first-ai-system-to-produce-academically-peer-reviewed-research Carl])&lt;br /&gt;
* [https://periodic.com/ Periodic Labs]&lt;br /&gt;
* [https://edisonscientific.com/articles/announcing-edison-scientific Edison Scientific] (drug discovery, spinoff from [https://www.futurehouse.org/ FutureHouse])&lt;br /&gt;
* 2026-03: Mirendil Inc.: advanced models to speed up R&amp;amp;D in scientific domains, especially biology and materials science&lt;br /&gt;
&lt;br /&gt;
====Bio====&lt;br /&gt;
* [https://www.bioptimus.com/ Bioptimus]&lt;br /&gt;
* [https://www.evolutionaryscale.ai/ EvolutionaryScale]&lt;br /&gt;
&lt;br /&gt;
==AI/ML Methods in Science==&lt;br /&gt;
* 2025-07: [https://www.mdpi.com/2313-433X/11/8/252 Synthetic Scientific Image Generation with VAE, GAN, and Diffusion Model Architectures]&lt;br /&gt;
&lt;br /&gt;
===Imaging===&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.08176 Behind the Noise: Conformal Quantile Regression Reveals Emergent Representations] (blog: [https://phzwart.github.io/behindthenoise/ Behind the Noise])&lt;br /&gt;
&lt;br /&gt;
===Materials===&lt;br /&gt;
* 2024-12: [https://www.nature.com/articles/s41467-024-54639-7 Crystal structure generation with autoregressive large language modeling]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.03965 All-atom Diffusion Transformers: Unified generative modelling of molecules and materials]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.19730 Training-Free Active Learning Framework in Materials Science with Large Language Models]&lt;br /&gt;
&lt;br /&gt;
===Chemistry===&lt;br /&gt;
* 2025-01: [https://www.nature.com/articles/s41578-025-00772-8 Large language models for reticular chemistry]&lt;br /&gt;
* 2025-02: [https://www.nature.com/articles/s42256-025-00982-3 Image-based generation for molecule design with SketchMol]&lt;br /&gt;
* 2025-02: [https://www.nature.com/articles/s42256-025-00994-z Large language models for scientific discovery in molecular property prediction]&lt;br /&gt;
* [https://x.com/vant_ai/status/1903070297991110657 2025-03]: [https://www.vant.ai/ Vant AI] [https://www.vant.ai/neo-1 Neo-1]: atomistic foundation model (small molecules, proteins, etc.)&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.08051 Compositional Flows for 3D Molecule and Synthesis Pathway Co-design]&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.07456 General purpose models for the chemical sciences]&lt;br /&gt;
* 2025-11: [https://chemrxiv.org/engage/chemrxiv/article-details/690357d9a482cba122e366b6 ChemTorch: A Deep Learning Framework for Benchmarking and Developing Chemical Reaction Property Prediction Models]&lt;br /&gt;
&lt;br /&gt;
===Biology===&lt;br /&gt;
* 2018: [https://alphafold.ebi.ac.uk/ AlphaFold]&lt;br /&gt;
* 2021-07: [https://www.nature.com/articles/s41586-021-03819-2 AlphaFold 2]&lt;br /&gt;
* 2024-05: [https://www.nature.com/articles/s41586-024-07487-w AlphaFold 3]&lt;br /&gt;
* 2023-03: [https://www.science.org/doi/10.1126/science.ade2574 Evolutionary-scale prediction of atomic-level protein structure with a language model] ([https://esmatlas.com/resources?action=fold ESMFold])&lt;br /&gt;
* 2023-11: [https://www.nature.com/articles/s41586-023-06728-8 Illuminating protein space with a programmable generative model]&lt;br /&gt;
* 2024-11: [https://www.science.org/doi/10.1126/science.ado9336 Sequence modeling and design from molecular to genome scale with Evo] (Evo)&lt;br /&gt;
* 2025-01: [https://www.nature.com/articles/s41586-024-08435-4 Targeting protein–ligand neosurfaces with a generalizable deep learning tool] (Chroma)&lt;br /&gt;
* 2025-01: [https://www.science.org/doi/10.1126/science.ads0018 Simulating 500 million years of evolution with a language model] ([https://github.com/evolutionaryscale/esm ESM] 3 model)&lt;br /&gt;
* 2025-02: [https://arcinstitute.org/manuscripts/Evo2 Genome modeling and design across all domains of life with Evo 2]&lt;br /&gt;
* 2025-02: [https://www.microsoft.com/en-us/research/blog/exploring-the-structural-changes-driving-protein-function-with-bioemu-1/ Exploring the structural changes driving protein function with BioEmu-1]&lt;br /&gt;
* 2025-02: [https://arxiv.org/pdf/2502.18449 Protein Large Language Models: A Comprehensive Survey]&lt;br /&gt;
* [https://x.com/vant_ai/status/1903070297991110657 2025-03]: [https://www.vant.ai/ Vant AI] [https://www.vant.ai/neo-1 Neo-1]: atomistic foundation model (small molecules, proteins, etc.)&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.16351 Lyra: An Efficient and Expressive Subquadratic Architecture for Modeling Biological Sequences]&lt;br /&gt;
* 2025-08: RosettaFold 3: [https://www.biorxiv.org/content/10.1101/2025.08.14.670328v2 Accelerating Biomolecular Modeling with AtomWorks and RF3]&lt;br /&gt;
* 2025-09: [https://www.biorxiv.org/content/10.1101/2025.09.12.675911v1 Generative design of novel bacteriophages with genome language models]&lt;br /&gt;
* 2025-10: [https://www.science.org/doi/10.1126/science.adu8578 Strengthening nucleic acid biosecurity screening against generative protein design tools]&lt;br /&gt;
* 2026-01: [https://www.nature.com/articles/s41586-025-10014-0 Advancing regulatory variant effect prediction with AlphaGenome]&lt;br /&gt;
&lt;br /&gt;
===Medicine===&lt;br /&gt;
See: [[AI_Agents#Medicine]]&lt;br /&gt;
&lt;br /&gt;
===Successes===&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.11270 Site-Decorated Model for Unconventional Frustrated Magnets: Ultranarrow Phase Crossover and Spin Reversal Transition]&lt;br /&gt;
&lt;br /&gt;
==AI/ML Methods co-opted for Science==&lt;br /&gt;
===Mechanistic Interpretability===&lt;br /&gt;
Train a large model on science data, then apply [[AI_understanding#Mechanistic_Interpretability|mechanistic interpretability]] methods (e.g. sparse autoencoders, SAEs) to the feature/activation space.&lt;br /&gt;
* Mechanistic interpretability for protein language models ([https://interprot.com/ visualizer], [https://github.com/etowahadams/interprot/tree/main code], [https://huggingface.co/liambai/InterProt-ESM2-SAEs SAE])&lt;br /&gt;
* [https://www.markov.bio/ Markov Bio]: [https://www.markov.bio/research/mech-interp-path-to-e2e-biology Through a Glass Darkly: Mechanistic Interpretability as the Bridge to End-to-End Biology] ([https://x.com/adamlewisgreen/status/1853206279499751531 quick description], [https://markovbio.github.io/biomedical-progress/ background info on recent bio progress])&lt;br /&gt;
* 2023-01: [https://arxiv.org/abs/2301.05062 Tracr: Compiled Transformers as a Laboratory for Interpretability] ([https://github.com/google-deepmind/tracr code])&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.03334 An X-Ray Is Worth 15 Features: Sparse Autoencoders for Interpretable Radiology Report Generation]&lt;br /&gt;
* 2024-12: [https://www.arxiv.org/abs/2412.16247 Towards scientific discovery with dictionary learning: Extracting biological concepts from microscopy foundation models]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.12101 InterPLM: Discovering Interpretable Features in Protein Language Models via Sparse Autoencoders]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.00089 Insights on Galaxy Evolution from Interpretable Sparse Feature Networks]&lt;br /&gt;
* 2025-02: [https://www.biorxiv.org/content/10.1101/2025.02.06.636901v1 From Mechanistic Interpretability to Mechanistic Biology: Training, Evaluating, and Interpreting Sparse Autoencoders on Protein Language Models]&lt;br /&gt;
* 2025-02: [https://www.goodfire.ai/blog/interpreting-evo-2 Interpreting Evo 2: Arc Institute&amp;#039;s Next-Generation Genomic Foundation Model]&lt;br /&gt;
* 2026-01: [https://www.goodfire.ai/research/interpretability-for-alzheimers-detection# Using Interpretability to Identify a Novel Class of Alzheimer&amp;#039;s Biomarkers]&lt;br /&gt;
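The SAE recipe above can be sketched minimally as follows. This is an illustrative toy, not any paper's implementation: it assumes a tied-weight SAE with hand-written gradients, made-up dimensions, and random data standing in for real model activations (e.g. from a protein language model); the feature columns of the learned dictionary are what one would then inspect for scientific concepts.

```python
# Toy sparse autoencoder: learn an overcomplete, sparse dictionary for
# "activations". Assumptions (not from the cited works): tied encoder/decoder
# weights, illustrative sizes, random data in place of model activations.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_dict, n = 32, 128, 512
X = rng.standard_normal((n, d_model))      # stand-in activation matrix

W = 0.01 * rng.standard_normal((d_model, d_dict))
b = np.zeros(d_dict)
lr, l1 = 0.05, 1e-3

def forward(X, W, b):
    f = np.maximum(X @ W + b, 0.0)         # sparse features: relu(X W + b)
    return f @ W.T, f                      # reconstruction via tied weights

first_loss = None
for step in range(300):
    X_hat, f = forward(X, W, b)
    err = X_hat - X
    # Reconstruction error plus L1 penalty that encourages sparse features.
    loss = (err ** 2).mean() + l1 * np.abs(f).mean()
    if first_loss is None:
        first_loss = loss
    # Manual gradients (tied W gets both encoder and decoder contributions).
    g_Xhat = 2.0 * err / err.size
    g_f = (g_Xhat @ W + l1 * np.sign(f) / f.size) * (f > 0)
    W -= lr * (X.T @ g_f + g_Xhat.T @ f)
    b -= lr * g_f.sum(axis=0)
```

In a real pipeline the rows of `X` would be activations harvested from a science foundation model, and each dictionary direction (column of `W`) would be probed against labeled examples to see whether it tracks an interpretable biological or physical concept.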
&lt;br /&gt;
===Uncertainty===&lt;br /&gt;
* 2024-10: [https://github.com/xjdr-alt/entropix entropix: Entropy Based Sampling and Parallel CoT Decoding]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.09724 Taming Overconfidence in LLMs: Reward Calibration in RLHF]&lt;br /&gt;
&lt;br /&gt;
=Science Benchmarks=&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.13168 SciCode: A Research Coding Benchmark Curated by Scientists] ([http://scicode-bench.github.io/ project])&lt;br /&gt;
* 2024-11: [https://openreview.net/pdf?id=fz969ahcvJ AidanBench: Evaluating Novel Idea Generation on Open-Ended Questions] ([https://github.com/aidanmclaughlin/AidanBench code])&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.17596 LiveIdeaBench: Evaluating LLMs&amp;#039; Scientific Creativity and Idea Generation with Minimal Context]&lt;br /&gt;
* 2025-01: [https://agi.safe.ai/ Humanity&amp;#039;s Last Exam]&lt;br /&gt;
* [https://github.com/OSU-NLP-Group/ScienceAgentBench ScienceAgentBench]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.20309 EAIRA: Establishing a Methodology for Evaluating AI Models as Scientific Research Assistants]&lt;br /&gt;
* 2025-03: [https://huggingface.co/datasets/futurehouse/BixBench BixBench]: Novel hypotheses (accept/reject)&lt;br /&gt;
* 2025-04: [https://research.google/blog/evaluating-progress-of-llms-on-scientific-problem-solving/ Google: Evaluating progress of LLMs on scientific problem-solving]&lt;br /&gt;
** 2025-03: [https://arxiv.org/abs/2503.13517 CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning]&lt;br /&gt;
** 2024-07: [https://arxiv.org/abs/2407.09413 SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers]&lt;br /&gt;
** 2024-10: [https://neurips.cc/virtual/2024/98540 FEABench: Evaluating Language Models on Real World Physics Reasoning Ability]&lt;br /&gt;
* 2026-02: [https://edisonscientific.com/ Edison]: [https://lab-bench.ai/ LABBench 2]&lt;br /&gt;
&lt;br /&gt;
=Science Agents=&lt;br /&gt;
==Reviews==&lt;br /&gt;
* 2024-10: [https://www.cell.com/cell/fulltext/S0092-8674(24)01070-5 Empowering biomedical discovery with AI agents]&lt;br /&gt;
* 2025-01: [https://pubs.rsc.org/en/content/articlehtml/2024/sc/d4sc03921a A review of large language models and autonomous agents in chemistry] ([https://github.com/ur-whitelab/LLMs-in-science github])&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.01903 AI4Research: A Survey of Artificial Intelligence for Scientific Research]&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.14111 From AI for Science to Agentic Science: A Survey on Autonomous Scientific Discovery]&lt;br /&gt;
&lt;br /&gt;
==Challenges==&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.03315 Why LLMs Aren&amp;#039;t Scientists Yet: Lessons from Four Autonomous Research Attempts]&lt;br /&gt;
&lt;br /&gt;
==Specific==&lt;br /&gt;
* 2024-01-13: [https://arxiv.org/abs/2401.06949 ORGANA: A Robotic Assistant for Automated Chemistry Experimentation and Characterization] ([https://www.youtube.com/watch?v=N6qMMwJ8hKQ video])&lt;br /&gt;
* 2024-06-19: [https://arxiv.org/abs/2406.13163 LLMatDesign: Autonomous Materials Discovery with Large Language Models]&lt;br /&gt;
* 2024-08-12: [https://sakana.ai/ Sakana AI]: [https://sakana.ai/ai-scientist/ AI Scientist]; [https://arxiv.org/abs/2408.06292 The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery] ([https://github.com/SakanaAI/AI-Scientist code])&lt;br /&gt;
* 2024-09-09: [https://arxiv.org/abs/2409.05556 SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning] ([https://github.com/lamm-mit/SciAgentsDiscovery code])&lt;br /&gt;
* 2024-09-11: PaperQA2: [https://paper.wikicrow.ai/ Language Models Achieve Superhuman Synthesis of Scientific Knowledge] ([https://x.com/SGRodriques/status/1833908643856818443 𝕏 post], [https://github.com/Future-House/paper-qa code])&lt;br /&gt;
* 2024-10-17: [https://arxiv.org/abs/2410.13768 Rapid and Automated Alloy Design with Graph Neural Network-Powered LLM-Driven Multi-Agent Systems]&lt;br /&gt;
* 2024-10-28: [https://arxiv.org/abs/2410.20976 Large Language Model-Guided Prediction Toward Quantum Materials Synthesis]&lt;br /&gt;
* 2024-12-06: [https://www.biorxiv.org/content/10.1101/2024.11.11.623004v1 The Virtual Lab: AI Agents Design New SARS-CoV-2 Nanobodies with Experimental Validation] (writeup: [https://www.nature.com/articles/d41586-024-01684-3 Virtual lab powered by ‘AI scientists’ super-charges biomedical research: Could human–AI collaborations be the future of interdisciplinary studies?])&lt;br /&gt;
* 2024-12-30: [https://arxiv.org/abs/2412.21154 Aviary: training language agents on challenging scientific tasks]&lt;br /&gt;
* See also: [[AI_Agents#Deep_Research|AI Agents &amp;gt; Deep Research]]&lt;br /&gt;
* 2025-04-08: Sakana: [https://pub.sakana.ai/ai-scientist-v2/paper/paper.pdf The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search] ([https://github.com/SakanaAI/AI-Scientist-v2 code])&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.14267 DREAMS: Density Functional Theory Based Research Engine for Agentic Materials Simulation]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.02824 Kosmos: An AI Scientist for Autonomous Discovery]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.08151 SciAgent: A Unified Multi-Agent System for Generalistic Scientific Reasoning]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2601.23265 PaperBanana: Automating Academic Illustration for AI Scientists]&lt;br /&gt;
&lt;br /&gt;
==Science Multi-Agent Setups==&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.04227 Agent Laboratory: Using LLM Agents as Research Assistants]&lt;br /&gt;
* 2025-04: [https://www.nature.com/articles/s41551-025-01363-2 Coordinated AI agents for advancing healthcare] ([https://www.nature.com/articles/s41551-025-01363-2.epdf?sharing_token=CIYP3J8LZE4BX31fV3WxUdRgN0jAjWel9jnR3ZoTv0O9iD-yhgqzRaz_7VASayWRePPhWDD2xFyfuOpSXbdPaOtt7oH4nfXo7telALzNwY3V1p9SxoqBEJy2OuaJ_cA35-CYQC1XgjCNTZUw46dh1KX-Dj8e7-1Vk_RlZKFLrc8%3D pdf])&lt;br /&gt;
&lt;br /&gt;
=Science Agentic Components=&lt;br /&gt;
==Frameworks==&lt;br /&gt;
* [https://platform.claude.com/docs/en/agent-sdk/overview Anthropic Claude Agent SDK overview]&lt;br /&gt;
* [https://openclaw.ai/ OpenClaw]&lt;br /&gt;
* [https://opencode.ai/ OpenCode]&lt;br /&gt;
* [https://github.com/OpenHands/software-agent-sdk OpenHands]&lt;br /&gt;
* [https://github.com/lamm-mit?tab=repositories LAMM: MIT Laboratory for Atomistic and Molecular Mechanics]&lt;br /&gt;
** [https://github.com/lamm-mit/scienceclaw ScienceClaw]: Framework for autonomous scientific investigation without central coordination.&lt;br /&gt;
** [https://infinite-lamm.vercel.app/ Infinite]: The Infinite Corridor of Scientific Discovery. Open science, powered by many — agents and humans discovering together.&lt;br /&gt;
&lt;br /&gt;
==Skills==&lt;br /&gt;
* 2026-03: [https://github.com/K-Dense-AI/claude-scientific-skills/tree/main?tab=readme-ov-file#use-cases Claude Scientific Skills] (list)&lt;br /&gt;
&lt;br /&gt;
=AI Science Systems=&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.03916 Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.13299 Hypothesis Generation for Materials Discovery and Design Using Goal-Driven and Constraint-Guided LLM Agents]&lt;br /&gt;
* 2025-02: [https://storage.googleapis.com/coscientist_paper/ai_coscientist.pdf Towards an AI co-scientist] (Google blog post: [https://research.google/blog/accelerating-scientific-breakthroughs-with-an-ai-co-scientist/ Accelerating scientific breakthroughs with an AI co-scientist])&lt;br /&gt;
* 2025-06: [https://zenodo.org/records/15693353 The Discovery Engine]&lt;br /&gt;
** 2025-07: [https://arxiv.org/abs/2507.00964 Benchmarking the Discovery Engine] ([https://www.leap-labs.com/blog/how-we-replicated-five-peer-reviewed-papers-in-five-hours blog])&lt;br /&gt;
* 2025-07: [https://www.preprints.org/manuscript/202507.1951/v1 Autonomous Scientific Discovery Through Hierarchical AI Scientist Systems]&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.16969 Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows]&lt;br /&gt;
* 2026-01: [https://www.nature.com/articles/s43588-025-00906-6 SciSciGPT: advancing human–AI collaboration in the science of science]&lt;br /&gt;
* 2026-02: [https://allenai.org/papers/autodiscovery AUTODISCOVERY: Open-ended Scientific Discovery via Bayesian Surprise] (Allen AI (Ai2) AstraLabs, [https://allenai.org/blog/autodiscovery blog], [https://autodiscovery.allen.ai/runs tools])&lt;br /&gt;
&lt;br /&gt;
===Inorganic Materials Discovery===&lt;br /&gt;
* 2023-11: [https://doi.org/10.1038/s41586-023-06735-9 Scaling deep learning for materials discovery]&lt;br /&gt;
* 2023-11: [https://doi.org/10.1038/s41586-023-06734-w An autonomous laboratory for the accelerated synthesis of novel materials]&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.00135 HoneyComb: A Flexible LLM-Based Agent System for Materials Science]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.12771 Open Materials 2024 (OMat24) Inorganic Materials Dataset and Models] ([https://github.com/FAIR-Chem/fairchem code], [https://huggingface.co/datasets/fairchem/OMAT24 datasets], [https://huggingface.co/fairchem/OMAT24 checkpoints], [https://ai.meta.com/blog/fair-news-segment-anything-2-1-meta-spirit-lm-layer-skip-salsa-sona/ blogpost])&lt;br /&gt;
* 2025-01: [https://www.nature.com/articles/s41586-025-08628-5 A generative model for inorganic materials design]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.14110 System of Agentic AI for the Discovery of Metal-Organic Frameworks]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.08762 The Open Molecules 2025 (OMol25) Dataset, Evaluations, and Models]&lt;br /&gt;
&lt;br /&gt;
===Materials Characterization===&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.06569 Operationalizing Serendipity: Multi-Agent AI Workflows for Enhanced Materials Characterization with Theory-in-the-Loop]&lt;br /&gt;
&lt;br /&gt;
===Chemistry===&lt;br /&gt;
* 2023-12: [https://doi.org/10.1038/s41586-023-06792-0 Autonomous chemical research with large language models] (Coscientist)&lt;br /&gt;
* 2024-09: [https://www.pnnl.gov/main/publications/external/technical_reports/PNNL-36692.pdf PNNL ChemAIst V0.2]&lt;br /&gt;
* 2024-11: [https://www.nature.com/articles/s41467-024-54457-x An automatic end-to-end chemical synthesis development platform powered by large language models]&lt;br /&gt;
* 2025-06: [https://paper.ether0.ai/ Training a Scientific Reasoning Model for Chemistry]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.06363 ChemGraph: An Agentic Framework for Computational Chemistry Workflows] ([https://github.com/argonne-lcf/ChemGraph code])&lt;br /&gt;
&lt;br /&gt;
===Bio===&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.01485 BioMARS: A Multi-Agent Robotic System for Autonomous Biological Experiments]&lt;br /&gt;
&lt;br /&gt;
===Physics===&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.19799 PhysMaster: Building an Autonomous AI Physicist for Theoretical and Computational Physics Research]&lt;br /&gt;
&lt;br /&gt;
==LLMs Optimized for Science==&lt;br /&gt;
* 2022-11: [https://arxiv.org/abs/2211.09085 Galactica: A Large Language Model for Science]&lt;br /&gt;
* 2024-12: [https://www.nature.com/articles/s41467-024-54639-7 Crystal structure generation with autoregressive large language modeling]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.13107 MatterChat: A Multi-Modal LLM for Material Science]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.17604 OmniScience: A Domain-Specialized LLM for Scientific Reasoning and Discovery]&lt;br /&gt;
* 2025-03: Google [https://huggingface.co/collections/google/txgemma-release-67dd92e931c857d15e4d1e87 TxGemma] (2B, 9B, 27B): [https://developers.googleblog.com/en/introducing-txgemma-open-models-improving-therapeutics-development/ drug development]&lt;br /&gt;
&lt;br /&gt;
=Impact of AI in Science=&lt;br /&gt;
* 2024-11: &amp;lt;strike&amp;gt;[https://aidantr.github.io/files/AI_innovation.pdf Artificial Intelligence, Scientific Discovery, and Product Innovation]&amp;lt;/strike&amp;gt;&lt;br /&gt;
** 2025-05: Retraction: [https://economics.mit.edu/news/assuring-accurate-research-record Assuring an accurate research record]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.05151 Transforming Science with Large Language Models: A Survey on AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.03837 Accelerating Scientific Research with Gemini: Case Studies and Common Techniques]&lt;br /&gt;
&lt;br /&gt;
=Related Tools=&lt;br /&gt;
==Literature Search==&lt;br /&gt;
* [https://www.perplexity.ai/ Perplexity]&lt;br /&gt;
* [https://www.arxival.xyz/ ArXival]&lt;br /&gt;
&lt;br /&gt;
==Data Visualization==&lt;br /&gt;
* 2024-10: Microsoft [https://www.microsoft.com/en-us/research/blog/data-formulator-exploring-how-ai-can-help-analysts-create-rich-data-visualizations/ Data Formulator: Create Rich Visualization with AI iteratively] ([https://www.microsoft.com/en-us/research/video/data-formulator-create-rich-visualization-with-ai-iteratively/ video], [https://github.com/microsoft/data-formulator code])&lt;br /&gt;
* [https://julius.ai/ Julius AI]: Analyze your data with computational AI&lt;br /&gt;
&lt;br /&gt;
==Generative==&lt;br /&gt;
* 2025-03: [https://huggingface.co/collections/starvector/starvector-models-6783b22c7bd4b43d13cb5289 StarVector] 1B, 8B: text or image to SVG&lt;br /&gt;
&lt;br /&gt;
==Chemistry==&lt;br /&gt;
* 2025-03: [https://jcheminf.biomedcentral.com/articles/10.1186/s13321-024-00834-z Rxn-INSIGHT: fast chemical reaction analysis using bond-electron matrices] ([https://rxn-insight.readthedocs.io/en/latest/ docs])&lt;br /&gt;
&lt;br /&gt;
=Science Datasets=&lt;br /&gt;
* [https://datasetsearch.research.google.com/ Google Dataset Search]&lt;br /&gt;
* [https://github.com/blaiszik/awesome-matchem-datasets/ Awesome Materials &amp;amp; Chemistry Datasets]&lt;br /&gt;
* NIST [https://jarvis.nist.gov/ Jarvis] (simulations)&lt;br /&gt;
&lt;br /&gt;
=Genuine Discoveries=&lt;br /&gt;
* 2025-11: [https://cdn.openai.com/pdf/4a25f921-e4e0-479a-9b38-5367b47e8fd0/early-science-acceleration-experiments-with-gpt-5.pdf Early science acceleration experiments with GPT-5]&lt;br /&gt;
* 2025-12: [https://andymasley.substack.com/p/ai-can-obviously-create-new-knowledge AI can obviously create new knowledge - But maybe not new concepts]&lt;br /&gt;
==Math==&lt;br /&gt;
* 2023-07: [https://www.nature.com/articles/s41586-023-06004-9 Faster sorting algorithms discovered using deep reinforcement learning]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.13131 AlphaEvolve: A coding agent for scientific and algorithmic discovery]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.02864 Mathematical exploration and discovery at scale]&lt;br /&gt;
* 2025-11: [https://www.nature.com/articles/s41586-025-09833-y Olympiad-level formal mathematical reasoning with reinforcement learning]&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.14575 Extremal descendant integrals on moduli spaces of curves: An inequality discovered and proved in collaboration with AI]&lt;br /&gt;
* [https://github.com/teorth/erdosproblems/wiki/AI-contributions-to-Erd%C5%91s-problems AI Solving Erdős Problems]:&lt;br /&gt;
** 2026-01: [https://www.erdosproblems.com/728 Erdős Problem #728] and [https://www.erdosproblems.com/729 #729] solved by Aristotle using ChatGPT 5.2 Pro&lt;br /&gt;
** 2026-01: [https://www.erdosproblems.com/forum/thread/397 Erdős Problem #397] [https://x.com/neelsomani/status/2010215162146607128?s=20 solved] by [https://neelsomani.com/ Neel Somani] using ChatGPT 5.2 Pro&lt;br /&gt;
** 2026-01: [https://www.erdosproblems.com/205 Erdős Problem #205] solved by Aristotle using ChatGPT 5.2 Pro&lt;br /&gt;
** 2026-01: [https://www.erdosproblems.com/forum/thread/281 Erdős Problem #281] [https://x.com/neelsomani/status/2012695714187325745?s=20 solved] by [https://neelsomani.com/ Neel Somani] using ChatGPT 5.2 Pro&lt;br /&gt;
** 2026-01: Google DeepMind: [https://arxiv.org/abs/2601.21442 Irrationality of rapidly converging series: a problem of Erdős and Graham]&lt;br /&gt;
*** [https://www.erdosproblems.com/1051 Erdős Problem #1051] [https://x.com/slow_developer/status/2018321002623901885?s=20 solved] by Google DeepMind Aletheia agent&lt;br /&gt;
** 2026-01: Google DeepMind: [https://arxiv.org/abs/2601.22401 Semi-Autonomous Mathematics Discovery with Gemini: A Case Study on the Erdős Problems]&lt;br /&gt;
*** Attempted 700 problems, solved 13 open Erdős problems: 5 novel autonomous solutions, 8 through existing literature.&lt;br /&gt;
** 2026-02: [https://www.erdosproblems.com/846 Erdős Problem #846]&lt;br /&gt;
*** [https://x.com/roydanroy/status/2026804567178953048?s=20 Google DeepMind]&lt;br /&gt;
*** [https://x.com/mehtaab_sawhney/status/2026716221933343147?s=20 Using OpenAI internal model] (paper: [https://cdn.openai.com/infinite-sets/main_single_clean3.pdf On infinite sets with no 3 on a line])&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.07222 The motivic class of the space of genus 0 maps to the flag variety]&lt;br /&gt;
* 2026-02: Google DeepMind: [https://arxiv.org/abs/2602.10177 Towards Autonomous Mathematics Research]&lt;br /&gt;
* 2026-03: Donald Knuth: [https://www-cs-faculty.stanford.edu/~knuth/papers/claude-cycles.pdf A problem in Directed Hamiltonian Cycles] solved by Filip Stappers using Claude Opus 4.6&lt;br /&gt;
* 2026-03: Google DeepMind: [https://arxiv.org/abs/2603.09172 Reinforced Generation of Combinatorial Structures: Ramsey Numbers]&lt;br /&gt;
&lt;br /&gt;
==Physics assistance==&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.23758 Exact solution of the frustrated Potts model with next-nearest-neighbor interactions in one dimension via AI bootstrapping] ([https://www.bnl.gov/staff/wyin Weiguo Yin])&lt;br /&gt;
* 2025-12: [https://www.sciencedirect.com/science/article/pii/S0370269325008111 Relativistic covariance and nonlinear quantum mechanics: Tomonaga-Schwinger analysis]&lt;br /&gt;
** [https://x.com/hsu_steve/status/1996034522308026435?s=20 Steve Hsu], [https://drive.google.com/file/d/16sxJuwsHoi-fvTFbri9Bu8B9bqA6lr1H/view Theoretical Physics with Generative AI]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.12176 Single-minus gluon tree amplitudes are nonzero] (GPT-5.2, [https://openai.com/index/new-result-theoretical-physics/ blog])&lt;br /&gt;
&lt;br /&gt;
==Literature exploration==&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.02824 Kosmos: An AI Scientist for Autonomous Discovery] ([https://edisonscientific.com/ Edison])&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/c4bdef64-5e9b-43b9-a365-592dd1ed7587 Nucleotide metabolism in hypothermia]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/1fdbf827-be65-4d97-9b66-bf0da600091a Determinant of perovskite solar-cell failure]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/4fb3fbdb-c449-4064-9aa6-ff4ec53131d8 Log-normal connectivity in neural networks]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/c6849232-5858-4634-adf5-83780afbe3db SOD2 as driver of myocardial fibrosis]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/abac07da-a6bb-458f-b0ba-ef08f1be617e Protective variant of SSR1 in type 2 diabetes]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/a770052b-2334-4bbe-b086-5149e0f03d99 Temporal ordering in Alzheimer’s disease]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/28c427d2-be31-48b5-b272-28d5a1e3ea5c Mechanism of neuron vulnerability in aging]&lt;br /&gt;
==Bio design==&lt;br /&gt;
* 2023-07: [https://www.nature.com/articles/s41586-023-06415-8 De novo design of protein structure and function with RFdiffusion]&lt;br /&gt;
* 2025-11: [https://www.nature.com/articles/s41586-025-09721-5 Atomically accurate de novo design of antibodies with RFdiffusion]&lt;br /&gt;
* 2025-11: [https://deepmind.google/blog/alphafold-five-years-of-impact/ AlphaFold: Five years of impact]&lt;br /&gt;
* 2026-01: [https://www.goodfire.ai/research/interpretability-for-alzheimers-detection# Using Interpretability to Identify a Novel Class of Alzheimer&amp;#039;s Biomarkers]&lt;br /&gt;
==Material Discovery==&lt;br /&gt;
* 2023-11: [https://doi.org/10.1038/s41586-023-06735-9 Scaling deep learning for materials discovery]&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
* [[AI agents]]&lt;br /&gt;
* [https://nanobot.chat/ Nanobot.chat]: Intelligent AI for the labnetwork @ mtl.mit.edu forum&lt;/div&gt;</summary>
		<author><name>KevinYager</name></author>
		
	</entry>
	<entry>
		<id>http://gisaxs.com/index.php?title=AI_and_Humans&amp;diff=8747</id>
		<title>AI and Humans</title>
		<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php?title=AI_and_Humans&amp;diff=8747"/>
		<updated>2026-03-23T17:14:13Z</updated>

		<summary type="html">&lt;p&gt;KevinYager: /* Human well-being */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
=AI in Education=&lt;br /&gt;
==Survey/study of==&lt;br /&gt;
* 2023-08: [https://www.nature.com/articles/s41598-023-38964-3 Perception, performance, and detectability of conversational artificial intelligence across 32 university courses]&lt;br /&gt;
* 2023-10: [https://www.bbc.com/worklife/article/20231017-the-employees-secretly-using-ai-at-work The employees secretly using AI at work]&lt;br /&gt;
* 2023-10: [https://www.insidehighered.com/news/tech-innovation/artificial-intelligence/2023/10/31/most-students-outrunning-faculty-ai-use Survey] shows students using AI more than professors.&lt;br /&gt;
* 2023-11: [https://www.nature.com/articles/d41586-023-03507-3 ChatGPT has entered the classroom: how LLMs could transform education]&lt;br /&gt;
* 2025-04: [https://www.anthropic.com/news/anthropic-education-report-how-university-students-use-claude Anthropic Education Report: How University Students Use Claude]&lt;br /&gt;
* 2025-05: [https://www.nature.com/articles/s41599-025-04787-y The effect of ChatGPT on students’ learning performance, learning perception, and higher-order thinking: insights from a meta-analysis]&lt;br /&gt;
&lt;br /&gt;
==AI improves learning/education==&lt;br /&gt;
*  Mollick, Ethan R. and Mollick, Lilach and Bach, Natalie and Ciccarelli, LJ and Przystanski, Ben and Ravipinto, Daniel, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4871171 AI Agents and Education: Simulated Practice at Scale] (June 17, 2024). The Wharton School Research Paper. [http://dx.doi.org/10.2139/ssrn.4871171 doi: 10.2139/ssrn.4871171]&lt;br /&gt;
** Can enable personalized education.&lt;br /&gt;
* [https://arxiv.org/abs/2306.17156 Generative AI for Programming Education: Benchmarking ChatGPT, GPT-4, and Human Tutors]&lt;br /&gt;
** GPT4 can out-perform human tutors.&lt;br /&gt;
*  Keppler, Samantha and Sinchaisri, Wichinpong and Snyder, Clare, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4924786 Backwards Planning with Generative AI: Case Study Evidence from US K12 Teachers] (August 13, 2024). [http://dx.doi.org/10.2139/ssrn.4924786 doi: 10.2139/ssrn.4924786]&lt;br /&gt;
** Teachers benefit from using AI as a co-pilot to aid in tasks (planning, how to teach topic, explore ideas).&lt;br /&gt;
** There is smaller utility in using AI purely as a text-generator (to make quizzes, workbooks, etc.).&lt;br /&gt;
* [https://arxiv.org/abs/2402.09809 Effective and Scalable Math Support: Evidence on the Impact of an AI-Tutor on Math Achievement in Ghana]&lt;br /&gt;
* [https://doi.org/10.21203/rs.3.rs-4243877/v1 AI Tutoring Outperforms Active Learning]&lt;br /&gt;
* [https://documents.worldbank.org/en/publication/documents-reports/documentdetail/099548105192529324 From chalkboards to chatbots: Transforming learning in Nigeria, one prompt at a time] ([https://blogs.worldbank.org/en/education/From-chalkboards-to-chatbots-Transforming-learning-in-Nigeria writeup])&lt;br /&gt;
** 6 weeks of after-school AI tutoring = 2 years of typical learning gains&lt;br /&gt;
** outperforms 80% of other educational interventions&lt;br /&gt;
* [https://arxiv.org/abs/2409.09047 AI Meets the Classroom: When Do Large Language Models Harm Learning?]&lt;br /&gt;
** Outcomes depend on usage&lt;br /&gt;
* [https://www.deeplearning.ai/the-batch/gpt-4-boosts-remote-tutors-performance-in-real-time-study-finds/ LLM Support for Tutors: GPT-4 boosts remote tutors’ performance in real time, study finds]&lt;br /&gt;
** [https://arxiv.org/abs/2410.03017 Tutor CoPilot: A Human-AI Approach for Scaling Real-Time Expertise]&lt;br /&gt;
* 2025-06: Gallup &amp;amp; The Walton Foundation: [https://www.gallup.com/file/analytics/691922/Walton-Family-Foundation-Gallup-Teachers-AI-Report.pdf Teaching for Tomorrow: Unlocking Six Weeks a Year With AI]&lt;br /&gt;
* 2026-03: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6423358 Effective Personalized AI Tutors via LLM-Guided Reinforcement Learning]&lt;br /&gt;
&lt;br /&gt;
==AI harms learning==&lt;br /&gt;
* [https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0305354 A real-world test of artificial intelligence infiltration of a university examinations system: A “Turing Test” case study]&lt;br /&gt;
** Current grading systems cannot detect AI.&lt;br /&gt;
*  Bastani, Hamsa and Bastani, Osbert and Sungu, Alp and Ge, Haosen and Kabakcı, Özge and Mariman, Rei, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4895486 Generative AI Can Harm Learning] (July 15, 2024). The Wharton School Research Paper. [http://dx.doi.org/10.2139/ssrn.4895486 doi: 10.2139/ssrn.4895486]&lt;br /&gt;
** Access to ChatGPT harmed math education outcomes.&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.09047 AI Meets the Classroom: When Do Large Language Models Harm Learning?]&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.20245 How AI Impacts Skill Formation]&lt;br /&gt;
&lt;br /&gt;
==Software/systems==&lt;br /&gt;
* [https://devpost.com/software/gptutor GPTutor] ([https://github.com/mynamegabe/GPTutor code])&lt;br /&gt;
* [https://arxiv.org/abs/2308.02773 EduChat: A Large-Scale Language Model-based Chatbot System for Intelligent Education]&lt;br /&gt;
* [https://eurekalabs.ai/ Eureka Labs] (founded by [https://en.wikipedia.org/wiki/Andrej_Karpathy Andrej Karpathy]) aims to create AI-driven courses (first course is [https://github.com/karpathy/LLM101n Intro to LLMs])&lt;br /&gt;
&lt;br /&gt;
===LLMs===&lt;br /&gt;
* 2024-12: [https://www.arxiv.org/abs/2412.16429 LearnLM: Improving Gemini for Learning]&lt;br /&gt;
&lt;br /&gt;
===Individual tools===&lt;br /&gt;
* Chatbot (OpenAI [https://chatgpt.com/ ChatGPT], Anthropic [https://www.anthropic.com/claude Claude], Google [https://gemini.google.com/app Gemini])&lt;br /&gt;
* [https://notebooklm.google.com/ NotebookLM]: Enables one to &amp;quot;chat with documents&amp;quot;.&lt;br /&gt;
* Google [https://learning.google.com/experiments/learn-about/signup Learn About]&lt;br /&gt;
&lt;br /&gt;
===Systems===&lt;br /&gt;
* [https://www.anthropic.com/news/introducing-claude-for-education Anthropic] [https://www.anthropic.com/education Claude for Education]&lt;br /&gt;
&lt;br /&gt;
==AI for grading==&lt;br /&gt;
* [https://dl.acm.org/doi/10.1145/3657604.3664693 Can Large Language Models Make the Grade? An Empirical Study Evaluating LLMs Ability To Mark Short Answer Questions in K-12 Education] ([https://arxiv.org/abs/2405.02985 preprint])&lt;br /&gt;
&lt;br /&gt;
==Detection==&lt;br /&gt;
* 2024-06: [https://www.sciencedirect.com/science/article/pii/S2666920X24000109 Do teachers spot AI? Evaluating the detectability of AI-generated texts among student essays]&lt;br /&gt;
** GenAI can simulate student writing in a way that teachers cannot detect.&lt;br /&gt;
** AI essays are assessed more positively than student-written.&lt;br /&gt;
** Teachers are overconfident in their source identification.&lt;br /&gt;
** Both novice and experienced teachers could not identify texts generated by ChatGPT vs. students&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.15654 People who frequently use ChatGPT for writing tasks are accurate and robust detectors of AI-generated text]&lt;br /&gt;
===AI Text Detectors Don&amp;#039;t Work===&lt;br /&gt;
* 2024-05: [https://arxiv.org/abs/2405.07940 RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors]&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2306.15666 Testing of Detection Tools for AI-Generated Text]&lt;br /&gt;
&lt;br /&gt;
=AI/human=&lt;br /&gt;
==Capabilities==&lt;br /&gt;
===Writing===&lt;br /&gt;
&lt;br /&gt;
* 2022-12: [https://aclanthology.org/2022.emnlp-main.296/ Re3: Generating Longer Stories With Recursive Reprompting and Revision]&lt;br /&gt;
* 2023-01: Journalism: [https://journals.sagepub.com/doi/10.1177/10776958221149577 Collaborating With ChatGPT: Considering the Implications of Generative Artificial Intelligence for Journalism and Media Education]&lt;br /&gt;
* 2023-03: English essays: [https://journal.unnes.ac.id/sju/index.php/elt/article/view/64069 Artificial intelligence (AI) technology in OpenAI ChatGPT application: A review of ChatGPT in writing English essay]&lt;br /&gt;
* 2023-07: Science writing: [https://www.rbmojournal.com/article/S1472-6483(23)00219-5/fulltext Artificial intelligence in scientific writing: a friend or a foe?]&lt;br /&gt;
* 2024-02: Wikipedia style: [https://arxiv.org/abs/2402.14207 Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models]&lt;br /&gt;
* 2024-08: [https://arxiv.org/abs/2408.07055 LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs] ([https://github.com/THUDM/LongWriter code])&lt;br /&gt;
* 2024-08: Scientific papers: [The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery]&lt;br /&gt;
* 2024-09: PaperQA2: [https://paper.wikicrow.ai/ Language Models Achieve Superhuman Synthesis of Scientific Knowledge] ([https://x.com/SGRodriques/status/1833908643856818443 𝕏 post], [https://github.com/Future-House/paper-qa code])&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.19065 WikiAutoGen: Towards Multi-Modal Wikipedia-Style Article Generation]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.22828 Learning to Reason for Long-Form Story Generation]&lt;br /&gt;
&lt;br /&gt;
==AI out-performs humans==&lt;br /&gt;
===Tests===&lt;br /&gt;
* 2023-07: [https://arxiv.org/abs/2307.10635 SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models]&lt;br /&gt;
* 2024-06: [https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0305354 A real-world test of artificial intelligence infiltration of a university examinations system: A “Turing Test” case study] &lt;br /&gt;
** AI scores higher than median students.&lt;br /&gt;
&lt;br /&gt;
===Creativity===&lt;br /&gt;
* See also: [[AI creativity]]&lt;br /&gt;
* 2023-07: [https://mackinstitute.wharton.upenn.edu/wp-content/uploads/2023/08/LLM-Ideas-Working-Paper.pdf Ideas Are Dimes A Dozen: Large Language Models For Idea Generation In Innovation]&lt;br /&gt;
* 2023-09: [https://www.nature.com/articles/s41598-023-40858-3 Best humans still outperform artificial intelligence in a creative divergent thinking task]&lt;br /&gt;
** Best humans out-perform AI at creativity. (By implication, median humans may not.)&lt;br /&gt;
* 2024-02: [https://www.nature.com/articles/s41598-024-53303-w The current state of artificial intelligence generative language models is more creative than humans on divergent thinking tasks]&lt;br /&gt;
* 2024-02: Felin, Teppo and Holweg, Matthias, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4737265 Theory Is All You Need: AI, Human Cognition, and Causal Reasoning] (February 24, 2024). [http://dx.doi.org/10.2139/ssrn.4737265 doi: 10.2139/ssrn.4737265]&lt;br /&gt;
** Argues that human &amp;quot;theory-based&amp;quot; creativity is better than AI &amp;quot;data-based&amp;quot;.&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.01119 Pron vs Prompt: Can Large Language Models already Challenge a World-Class Fiction Author at Creative Text Writing?]&lt;br /&gt;
** Top human (professional author) out-performs GPT4.&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.04109 Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers]&lt;br /&gt;
** LLMs can be creative&lt;br /&gt;
* 2024-09: [https://docs.iza.org/dp17302.pdf Creative and Strategic Capabilities of Generative AI: Evidence from Large-Scale Experiments]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.00794 Predicting Empirical AI Research Outcomes with Language Models]&lt;br /&gt;
* 2026-03: [https://arxiv.org/abs/2603.19087 Serendipity by Design: Evaluating the Impact of Cross-domain Mappings on Human and LLM Creativity]&lt;br /&gt;
&lt;br /&gt;
===Art===&lt;br /&gt;
* 2024-11: [https://doi.org/10.1038/s41598-024-76900-1 AI-generated poetry is indistinguishable from human-written poetry and is rated more favorably]&lt;br /&gt;
* 2024-11: [https://www.astralcodexten.com/p/how-did-you-do-on-the-ai-art-turing How Did You Do On The AI Art Turing Test?]&lt;br /&gt;
&lt;br /&gt;
===Business &amp;amp; Marketing===&lt;br /&gt;
* 2023-11: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4597899 The power of generative marketing: Can generative AI create superhuman visual marketing content?]&lt;br /&gt;
* 2024-02: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4714776 Generative Artificial Intelligence and Evaluating Strategic Decisions]&lt;br /&gt;
&lt;br /&gt;
===Professions===&lt;br /&gt;
* [https://agi.safe.ai/submit Humanity&amp;#039;s Last Exam]&lt;br /&gt;
** [https://x.com/alexandr_wang/status/1835738937719140440 Effort to build] a dataset of challenging (but resolvable) questions in specific domain areas, to act as a benchmark to test whether AIs are improving in these challenging topics.&lt;br /&gt;
&lt;br /&gt;
====Coding====&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.06807 Competitive Programming with Large Reasoning Models]&lt;br /&gt;
&lt;br /&gt;
====Medical====&lt;br /&gt;
* 2024-03: [https://www.medrxiv.org/content/10.1101/2024.03.12.24303785v1 Influence of a Large Language Model on Diagnostic Reasoning: A Randomized Clinical Vignette Study]&lt;br /&gt;
** GPT4 improves medical practitioner work; surprisingly, GPT4 alone scored better than a human with GPT4 as aid (on selected tasks).&lt;br /&gt;
* 2024-10: [https://doi.org/10.1001/jamanetworkopen.2024.38535 Perspectives on Artificial Intelligence–Generated Responses to Patient Messages]&lt;br /&gt;
* 2024-10: [https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2825395 Large Language Model Influence on Diagnostic Reasoning; A Randomized Clinical Trial]&lt;br /&gt;
** Use of ChatGPT does not strongly improve medical expert work; but AI alone out-scores human or human+AI&lt;br /&gt;
* 2024-11: [https://www.nature.com/articles/s41562-024-02046-9 Large language models surpass human experts in predicting neuroscience results] (writeup: [https://medicalxpress.com/news/2024-11-ai-neuroscience-results-human-experts.html AI can predict neuroscience study results better than human experts, study finds])&lt;br /&gt;
* 2024-12: [https://www.arxiv.org/abs/2412.10849 Superhuman performance of a large language model on the reasoning tasks of a physician]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.18925 HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs]&lt;br /&gt;
* 2025-02: Media:&lt;br /&gt;
** NY Times: [https://www.nytimes.com/2025/02/02/opinion/ai-doctors-medicine.html The Robot Doctor Will See You Now]&lt;br /&gt;
** [https://erictopol.substack.com/p/when-doctors-with-ai-are-outperformed When Doctors With A.I. Are Outperformed by A.I. Alone]&lt;br /&gt;
* 2025-02: [https://www.nature.com/articles/s41591-024-03456-y GPT-4 assistance for improvement of physician performance on patient care tasks: a randomized controlled trial]&lt;br /&gt;
* 2025-02: [https://www.nature.com/articles/s41591-025-03517-w Artificial intelligence for individualized treatment of persistent atrial fibrillation: a randomized controlled trial]&lt;br /&gt;
* Google AI Clinician:&lt;br /&gt;
** 2024-01: [https://arxiv.org/abs/2401.05654 Towards Conversational Diagnostic AI] ([https://research.google/blog/amie-a-research-ai-system-for-diagnostic-medical-reasoning-and-conversations/ blog]: Articulate Medical Intelligence Explorer, AMIE)&lt;br /&gt;
** 2025-03: [https://www.gstatic.com/amie/towards_conversational_ai_for_disease_management.pdf Towards Conversational AI for Disease Management] ([https://research.google/blog/from-diagnosis-to-treatment-advancing-amie-for-longitudinal-disease-management/ blog])&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.19655 Med-RLVR: Emerging Medical Reasoning from a 3B base model via reinforcement Learning]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.13939 Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models]&lt;br /&gt;
* 2025-04: [https://www.acpjournals.org/doi/10.7326/ANNALS-24-03283 Comparison of Initial Artificial Intelligence (AI) and Final Physician Recommendations in AI-Assisted Virtual Urgent Care Visits]&lt;br /&gt;
* 2025-04: [https://www.nature.com/articles/s41586-025-08866-7?linkId=13898052 Towards conversational diagnostic artificial intelligence]&lt;br /&gt;
* 2025-04: [https://www.nature.com/articles/s41586-025-08869-4?linkId=13898054 Towards accurate differential diagnosis with large language models]&lt;br /&gt;
* 2025-06: [https://www.medrxiv.org/content/10.1101/2025.06.13.25329541v1 Automation of Systematic Reviews with Large Language Models]&lt;br /&gt;
* 2025-06: [https://microsoft.ai/new/the-path-to-medical-superintelligence/ The Path to Medical Superintelligence]&lt;br /&gt;
* 2025-08: [https://www.nature.com/articles/s41591-025-03888-0 A personal health large language model for sleep and fitness coaching]&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.08224 Capabilities of GPT-5 on Multimodal Medical Reasoning]&lt;br /&gt;
&lt;br /&gt;
====Bio====&lt;br /&gt;
* 2025-04: [https://www.virologytest.ai/vct_paper.pdf Virology Capabilities Test (VCT): A Multimodal Virology Q&amp;amp;A Benchmark]&lt;br /&gt;
** Time: [https://time.com/7279010/ai-virus-lab-biohazard-study/ Exclusive: AI Outsmarts Virus Experts in the Lab, Raising Biohazard Fears]&lt;br /&gt;
** AI Frontiers: [https://www.ai-frontiers.org/articles/ais-are-disseminating-expert-level-virology-skills AIs Are Disseminating Expert-Level Virology Skills]&lt;br /&gt;
&lt;br /&gt;
====Therapy====&lt;br /&gt;
* 2025-02: [https://journals.plos.org/mentalhealth/article?id=10.1371/journal.pmen.0000145 When ELIZA meets therapists: A Turing test for the heart and mind]&lt;br /&gt;
* 2025-03: Therabot: [https://ai.nejm.org/doi/full/10.1056/AIoa2400802 Randomized Trial of a Generative AI Chatbot for Mental Health Treatment]&lt;br /&gt;
&lt;br /&gt;
====Financial====&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.17866 Financial Statement Analysis with Large Language Models]&lt;br /&gt;
&lt;br /&gt;
====HR====&lt;br /&gt;
* 2025-08: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5395709 Voice AI in Firms: A Natural Field Experiment on Automated Job Interviews]&lt;br /&gt;
&lt;br /&gt;
==AI improves human work==&lt;br /&gt;
* 2023-07: [https://www.science.org/doi/10.1126/science.adh2586 Experimental evidence on the productivity effects of generative artificial intelligence]&lt;br /&gt;
* 2023-09:  Dell&amp;#039;Acqua, Fabrizio and McFowland III, Edward and Mollick, Ethan R. and Lifshitz-Assaf, Hila and Kellogg, Katherine and Rajendran, Saran and Krayer, Lisa and Candelon, François and Lakhani, Karim R., [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4573321 Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality] (September 15, 2023). Harvard Business School Technology &amp;amp; Operations Mgt. Unit Working Paper No. 24-013, The Wharton School Research Paper [http://dx.doi.org/10.2139/ssrn.4573321 doi: 10.2139/ssrn.4573321]&lt;br /&gt;
* 2023-11: [https://www.nber.org/papers/w31161 Generative AI at Work] (National Bureau of Economic Research)&lt;br /&gt;
* 2023-12: [https://osf.io/hdjpk The Uneven Impact of Generative AI on Entrepreneurial Performance] ([https://doi.org/10.31219/osf.io/hdjpk doi: 10.31219/osf.io/hdjpk])&lt;br /&gt;
* 2023-12: [https://arxiv.org/abs/2312.05481 Artificial Intelligence in the Knowledge Economy]: Non-autonomous AI (chatbot) benefits least knowledgeable workers; autonomous agents benefit the most knowledgeable workers&lt;br /&gt;
* 2024-07: [https://www.microsoft.com/en-us/research/publication/generative-ai-in-real-world-workplaces/ Generative AI in Real-World Workplaces: The Second Microsoft Report on AI and Productivity Research]&lt;br /&gt;
* 2025-03: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5188231 The Cybernetic Teammate: A Field Experiment on Generative AI Reshaping Teamwork and Expertise]&lt;br /&gt;
** 2025-03: Ethan Mollick: [https://www.oneusefulthing.org/p/the-cybernetic-teammate The Cybernetic Teammate]: Having an AI on your team can increase performance, provide expertise, and improve your experience&lt;br /&gt;
* 2025-09: [https://osf.io/preprints/psyarxiv/vbkmt_v1 Quantifying Human-AI Synergy]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.12049 Generative AI and Firm Productivity: Field Experiments in Online Retail]&lt;br /&gt;
* 2025-10: Wharton: [https://knowledge.wharton.upenn.edu/special-report/2025-ai-adoption-report/ 2025 AI Adoption Report] (75% report positive ROI)&lt;br /&gt;
&lt;br /&gt;
===Coding===&lt;br /&gt;
* 2023-02: [https://arxiv.org/abs/2302.06590 The Impact of AI on Developer Productivity: Evidence from GitHub Copilot]&lt;br /&gt;
* 2024-09:  Cui, Zheyuan and Demirer, Mert and Jaffe, Sonia and Musolff, Leon and Peng, Sida and Salz, Tobias, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4945566 The Effects of Generative AI on High Skilled Work: Evidence from Three Field Experiments with Software Developers] (September 03, 2024). [http://dx.doi.org/10.2139/ssrn.4945566 doi: 10.2139/ssrn.4945566]&lt;br /&gt;
* 2024-11:  Hoffmann, Manuel and Boysel, Sam and Nagle, Frank and Peng, Sida and Xu, Kevin, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5007084 Generative AI and the Nature of Work] (October 27, 2024). Harvard Business School Strategy Unit Working Paper No. 25-021, Harvard Business Working Paper No. 25-021, [http://dx.doi.org/10.2139/ssrn.5007084 doi: 10.2139/ssrn.5007084]&lt;br /&gt;
* 2025-07: METR: [https://arxiv.org/abs/2507.09089 Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity] (AI tools led to lower performance)&lt;br /&gt;
** 2026-02: [https://metr.org/blog/2026-02-24-uplift-update/ We are Changing our Developer Productivity Experiment Design] (AI tools now lead to improved performance)&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.19708 Intuition to Evidence: Measuring AI&amp;#039;s True Impact on Developer Productivity]&lt;br /&gt;
&lt;br /&gt;
===Forecasting===&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.07862 AI-Augmented Predictions: LLM Assistants Improve Human Forecasting Accuracy]&lt;br /&gt;
&lt;br /&gt;
===Finance===&lt;br /&gt;
* 2024-12: [https://dx.doi.org/10.2139/ssrn.5075727 AI, Investment Decisions, and Inequality]: Novices see improvements in investment performance; sophisticated investors see even greater improvements&lt;br /&gt;
&lt;br /&gt;
===Law===&lt;br /&gt;
* 2025-03: [https://ssrn.com/abstract=5162111 AI-Powered Lawyering: AI Reasoning Models, Retrieval Augmented Generation, and the Future of Legal Practice]&lt;br /&gt;
&lt;br /&gt;
===Science===&lt;br /&gt;
* 2025-12: [https://www.science.org/doi/abs/10.1126/science.adw3000 Scientific production in the era of large language models]&lt;br /&gt;
* 2026-01: [https://www.nature.com/articles/s41586-025-09922-y Artificial intelligence tools expand scientists’ impact but contract science’s focus]&lt;br /&gt;
* 2026-01: [https://www.anthropic.com/news/accelerating-scientific-research How scientists are using Claude to accelerate research and discovery]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.03837 Accelerating Scientific Research with Gemini: Case Studies and Common Techniques]&lt;br /&gt;
* 2026-03: [https://www.pnas.org/doi/10.1073/pnas.2533676123 Expert evaluation of LLM world models: A high-Tc superconductivity case study] ([https://research.google/blog/testing-llms-on-superconductivity-research-questions/ blog])&lt;br /&gt;
&lt;br /&gt;
===Medical===&lt;br /&gt;
* 2025-03: [https://www.medrxiv.org/content/10.1101/2025.02.28.25323115v1.full Medical Hallucination in Foundation Models and Their Impact on Healthcare]&lt;br /&gt;
* 2025-03: [https://journals.lww.com/international-journal-of-surgery/fulltext/2025/03000/chatgpt_s_role_in_alleviating_anxiety_in_total.20.aspx ChatGPT’s role in alleviating anxiety in total knee arthroplasty consent process: a randomized controlled trial pilot study]&lt;br /&gt;
* 2025-05: [https://openai.com/index/healthbench/ Introducing HealthBench]&lt;br /&gt;
* 2025-06: [https://www.medrxiv.org/content/10.1101/2025.06.07.25329176v1 From Tool to Teammate: A Randomized Controlled Trial of Clinician-AI Collaborative Workflows for Diagnosis]&lt;br /&gt;
* 2025-06: [https://bmcmededuc.biomedcentral.com/articles/10.1186/s12909-025-07414-1 Iteratively refined ChatGPT outperforms clinical mentors in generating high-quality interprofessional education clinical scenarios: a comparative study]&lt;br /&gt;
* 2025-07: [https://cdn.openai.com/pdf/a794887b-5a77-4207-bb62-e52c900463f1/penda_paper.pdf AI-based Clinical Decision Support for Primary Care: A Real-World Study] ([https://openai.com/index/ai-clinical-copilot-penda-health/ blog])&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.15743 Towards physician-centered oversight of conversational diagnostic AI]&lt;br /&gt;
* 2026-01: [https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(25)02464-X/abstract Interval cancer, sensitivity, and specificity comparing AI-supported mammography screening with standard double reading without AI in the MASAI study: a randomised, controlled, non-inferiority, single-blinded, population-based, screening-accuracy trial]&lt;br /&gt;
&lt;br /&gt;
===Translation===&lt;br /&gt;
* 2025-02: [https://simonwillison.net/2025/Feb/2/workflow-for-translation/ A professional workflow for translation using LLMs] ([https://news.ycombinator.com/item?id=42897856 based on this])&lt;br /&gt;
&lt;br /&gt;
===Customer service===&lt;br /&gt;
* 2023-11: [https://www.nber.org/papers/w31161 Generative AI at Work]: Improvements for workers and clients (though also a ceiling to improvement)&lt;br /&gt;
&lt;br /&gt;
===Creativity===&lt;br /&gt;
* See also: [[AI creativity]]&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.01727 Prompting Diverse Ideas: Increasing AI Idea Variance]&lt;br /&gt;
* 2024-07: [https://www.science.org/doi/10.1126/sciadv.adn5290 Generative AI enhances individual creativity but reduces the collective diversity of novel content]&lt;br /&gt;
* 2024-08: [https://www.nature.com/articles/s41562-024-01953-1 An empirical investigation of the impact of ChatGPT on creativity]&lt;br /&gt;
** 2025: Response: [https://www.nature.com/articles/s41562-025-02173-x ChatGPT decreases idea diversity in brainstorming] ([https://www.nature.com/articles/s41562-025-02173-x.epdf?sharing_token=LA9NyDHj7y5WN8zvb5Qm49RgN0jAjWel9jnR3ZoTv0Nl8PrpXFkjZ93XvmUVBgB9Hlfro5Yo6YELr-pRqbpk3HaZENCvsfV8G1kwtTEj2oW1g87dSVT4BzrfCu3jS_606SLzmoDuDiALChY-MozVM4Pj1b4Vdf-YaIH5p3lfAnM%3D pdf])&lt;br /&gt;
** 2025-05: Response: [https://www.nature.com/articles/s41562-025-02195-5 Reply to: ChatGPT decreases idea diversity in brainstorming]&lt;br /&gt;
* 2024-08: [https://doi.org/10.1287/orsc.2023.18430 The Crowdless Future? Generative AI and Creative Problem-Solving]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.03703 Human Creativity in the Age of LLMs]&lt;br /&gt;
* 2024-11: &amp;lt;strike&amp;gt;[https://conference.nber.org/conf_papers/f210475.pdf Artificial Intelligence, Scientific Discovery, and Product Innovation]&amp;lt;/strike&amp;gt;: diffusion model increases &amp;quot;innovation&amp;quot; (patents), boosts the best performers, but also removes some enjoyable tasks.&lt;br /&gt;
** 2025-05: Retraction: [https://economics.mit.edu/news/assuring-accurate-research-record Assuring an accurate research record]&lt;br /&gt;
* 2024-12: [https://doi.org/10.1080/10400419.2024.2440691 Using AI to Generate Visual Art: Do Individual Differences in Creativity Predict AI-Assisted Art Quality?] ([https://osf.io/preprints/psyarxiv/ygzw6 preprint]): shows that more creative humans produce more creative genAI outputs&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.11433 One Does Not Simply Meme Alone: Evaluating Co-Creativity Between LLMs and Humans in the Generation of Humor]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.17241 Generative AI and Creativity: A Systematic Literature Review and Meta-Analysis]&lt;br /&gt;
&lt;br /&gt;
===Equity===&lt;br /&gt;
* 2025-01: [https://ai.nejm.org/doi/full/10.1056/AIp2400889 Using Large Language Models to Promote Health Equity]&lt;br /&gt;
&lt;br /&gt;
==AI worse than humans==&lt;br /&gt;
* 2025-04: [https://spinup-000d1a-wp-offload-media.s3.amazonaws.com/faculty/wp-content/uploads/sites/27/2025/03/AI-debt-collection-20250331.pdf How Good is AI at Twisting Arms? Experiments in Debt Collection]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.18919 Clinical knowledge in LLMs does not translate to human interactions]&lt;br /&gt;
* 2025-05: [https://royalsocietypublishing.org/doi/10.1098/rsos.241776 Generalization bias in large language model summarization of scientific research]&lt;br /&gt;
&lt;br /&gt;
==AI lowers human capability==&lt;br /&gt;
* 2025-07: METR: [https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity] ([https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/ blog], [https://secondthoughts.ai/p/ai-coding-slowdown commentary/analysis])&lt;br /&gt;
** 2026-02: [https://metr.org/blog/2026-02-24-uplift-update/ We are Changing our Developer Productivity Experiment Design] (AI tools [https://x.com/METR_Evals/status/2026355544668385373?s=20 now] lead to improved performance)&lt;br /&gt;
* 2026-02: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6097646 Thinking—Fast, Slow, and Artificial: How AI is Reshaping Human Reasoning and the Rise of Cognitive Surrender]&lt;br /&gt;
&lt;br /&gt;
==Human Perceptions of AI==&lt;br /&gt;
* 2023-09: [https://www.nature.com/articles/d41586-023-02980-0 AI and science: what 1,600 researchers think. A Nature survey finds that scientists are concerned, as well as excited, by the increasing use of artificial-intelligence tools in research.]&lt;br /&gt;
* 2024-11: [https://doi.org/10.1016/S2589-7500(24)00202-4 Attitudes and perceptions of medical researchers towards the use of artificial intelligence chatbots in the scientific process: an international cross-sectional survey] (Nature commentary: [https://www.nature.com/articles/s41592-024-02369-5 Quest for AI literacy])&lt;br /&gt;
* 2025-03: [https://www.arxiv.org/abs/2503.16458 Users Favor LLM-Generated Content -- Until They Know It&amp;#039;s AI]&lt;br /&gt;
&lt;br /&gt;
===AI passes Turing Test===&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Text Dialog&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* 2023-05: [https://arxiv.org/abs/2305.20010 Human or Not? A Gamified Approach to the Turing Test]&lt;br /&gt;
* 2023-10: [https://arxiv.org/abs/2310.20216 Does GPT-4 pass the Turing test?]&lt;br /&gt;
* 2024-05: [https://arxiv.org/abs/2405.08007 People cannot distinguish GPT-4 from a human in a Turing test]&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.08853 GPT-4 is judged more human than humans in displaced and inverted Turing tests]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.23674 Large Language Models Pass the Turing Test]&lt;br /&gt;
* 2025-04: [https://www.sciencedirect.com/science/article/abs/pii/S0022103117303980 A Minimal Turing Test]&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Art&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* 2024-11: [https://www.astralcodexten.com/p/how-did-you-do-on-the-ai-art-turing How Did You Do On The AI Art Turing Test?] Differentiation was only slightly above random (60%). AI art was often ranked higher than human-made.&lt;br /&gt;
* 2024-11: [https://doi.org/10.1038/s41598-024-76900-1 AI-generated poetry is indistinguishable from human-written poetry and is rated more favorably]&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.25601 Echoes of Humanity: Exploring the Perceived Humanness of AI Music]&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Imagery&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* 2026-02: [https://bpspsychub.onlinelibrary.wiley.com/doi/10.1111/bjop.70063 Too good to be true: Synthetic AI faces are more average than real faces and super-recognizers know it]&lt;br /&gt;
** [https://www.unsw.edu.au/newsroom/news/2026/02/humans-overconfident-telling-AI-faces-real-faces-people-fake People are overconfident about spotting AI faces, study finds]&lt;br /&gt;
&lt;br /&gt;
=Uptake=&lt;br /&gt;
* 2023-07: [https://doi.org/10.9734/ajrcos/2023/v16i4392 ChatGPT: Early Adopters, Teething Issues and the Way Forward]&lt;br /&gt;
* 2024-03: [https://arxiv.org/abs/2403.07183 Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews]&lt;br /&gt;
* 2024-05:  Humlum, Anders and Vestergaard, Emilie, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4827166 The Adoption of ChatGPT]. IZA Discussion Paper No. 16992 [http://dx.doi.org/10.2139/ssrn.4827166 doi: 10.2139/ssrn.4827166]&lt;br /&gt;
* 2024-06: Kellogg, Katherine and Lifshitz-Assaf, Hila and Randazzo, Steven and Mollick, Ethan R. and Dell&amp;#039;Acqua, Fabrizio and McFowland III, Edward and Candelon, Francois and Lakhani, Karim R., [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4857373 Don&amp;#039;t Expect Juniors to Teach Senior Professionals to Use Generative AI: Emerging Technology Risks and Novice AI Risk Mitigation Tactics] (June 03, 2024). Harvard Business School Technology &amp;amp; Operations Mgt. Unit Working Paper 24-074, Harvard Business Working Paper No. 24-074, The Wharton School Research Paper [http://dx.doi.org/10.2139/ssrn.4857373 doi: 10.2139/ssrn.4857373]&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.07016 Delving into ChatGPT usage in academic writing through excess vocabulary]&lt;br /&gt;
* 2024-09: [https://static1.squarespace.com/static/60832ecef615231cedd30911/t/66f0c3fbabdc0a173e1e697e/1727054844024/BBD_GenAI_NBER_Sept2024.pdf The Rapid Adoption of Generative AI]&lt;br /&gt;
* 2024-10: [https://ai.wharton.upenn.edu/focus-areas/human-technology-interaction/2024-ai-adoption-report/ Growing Up: Navigating Generative AI’s Early Years – AI Adoption Report] ([https://ai.wharton.upenn.edu/wp-content/uploads/2024/10/AI-Report_Executive-Summary.pdf executive summary], [https://ai.wharton.upenn.edu/wp-content/uploads/2024/10/AI-Report_Full-Report.pdf full report])&lt;br /&gt;
** 72% of leaders use genAI at least once a week (cf. 23% in 2023); 90% agree AI enhances skills (cf. 80% in 2023)&lt;br /&gt;
** Spending on genAI is up 130% (most companies plan to invest going forward)&lt;br /&gt;
* 2024-12: [https://www.pnas.org/doi/10.1073/pnas.2414972121 The unequal adoption of ChatGPT exacerbates existing inequalities among workers]&lt;br /&gt;
** Higher adoption among young and less experienced&lt;br /&gt;
** Lower adoption among women and lower-earning workers&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.09747 The Widespread Adoption of Large Language Model-Assisted Writing Across Society]: 10-25% adoption across a range of contexts&lt;br /&gt;
* 2025-02: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5078805 Local Heterogeneity in Artificial Intelligence Jobs Over Time and Space]&lt;br /&gt;
* 2025-04: [https://andreyfradkin.com/assets/demandforllm.pdf Demand for LLMs: Descriptive Evidence on Substitution, Market Expansion, and Multihoming]&lt;br /&gt;
* 2025-05: [https://civicscience.com/chatgpt-is-still-leading-the-ai-wars-but-google-gemini-is-gaining-ground/ ChatGPT Is Still Leading the AI Wars but Google Gemini Is Gaining Ground]&lt;br /&gt;
* 2025-05: [https://www.nber.org/papers/w33777 Large Language Models, Small Labor Market Effects]&lt;br /&gt;
** Significant uptake, but very little economic impact so far&lt;br /&gt;
* 2025-05: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5136877 The Labor Market Effects of Generative Artificial Intelligence]&lt;br /&gt;
** US worker usage of AI is increasing rapidly: 30% in 2024-12; 40% in 2025-05&lt;br /&gt;
* 2025-05: [https://www.bondcap.com/report/pdf/Trends_Artificial_Intelligence.pdf Trends – Artificial Intelligence]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.08945 Who is using AI to code? Global diffusion and impact of generative AI]&lt;br /&gt;
* 2025-06: [https://www.iconiqcapital.com/growth/reports/2025-state-of-ai 2025 State of AI Report: The Builder’s Playbook] A Practical Roadmap for AI Innovation&lt;br /&gt;
* 2025-07: Epoch AI: [https://epochai.substack.com/p/after-the-chatgpt-moment-measuring After the ChatGPT Moment: Measuring AI’s Adoption]: How quickly has AI been diffusing through the economy?&lt;br /&gt;
* 2025-07: Pew Research: [https://www.pewresearch.org/short-reads/2025/06/25/34-of-us-adults-have-used-chatgpt-about-double-the-share-in-2023/ 34% of U.S. adults have used ChatGPT, about double the share in 2023]&lt;br /&gt;
* 2025-12: Epoch AI: [https://epoch.ai/data/polling Polling on AI Usage]&lt;br /&gt;
&lt;br /&gt;
==Usage By==&lt;br /&gt;
* 2026-02: [https://www.nber.org/papers/w34813 The Politics of AI]&lt;br /&gt;
&lt;br /&gt;
==Usage For==&lt;br /&gt;
* 2024-12: [https://assets.anthropic.com/m/7e1ab885d1b24176/original/Clio-Privacy-Preserving-Insights-into-Real-World-AI-Use.pdf Clio: A system for privacy-preserving insights into real-world AI use] (Anthropic [https://www.anthropic.com/research/clio Clio])&lt;br /&gt;
* 2025-03: [https://learn.filtered.com/hubfs/The%202025%20Top-100%20Gen%20AI%20Use%20Case%20Report.pdf How People are Really Using Generative AI Now] ([https://hbr.org/2025/04/how-people-are-really-using-gen-ai-in-2025 writeup])&lt;br /&gt;
* 2025-04: [https://www.anthropic.com/news/anthropic-education-report-how-university-students-use-claude Anthropic Education Report: How University Students Use Claude]&lt;br /&gt;
* 2025-09: [https://www.anthropic.com/research/economic-index-geography Anthropic Economic Index: Tracking AI&amp;#039;s role in the US and global economy]&lt;br /&gt;
* 2025-09: [https://cdn.openai.com/pdf/a253471f-8260-40c6-a2cc-aa93fe9f142e/economic-research-chatgpt-usage-paper.pdf How People Use ChatGPT] (OpenAI)&lt;br /&gt;
&lt;br /&gt;
==Hiding Usage==&lt;br /&gt;
* 2025-05: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5232910 Underreporting of AI use: The role of social desirability bias]&lt;br /&gt;
&lt;br /&gt;
=Societal Effects/Transformations=&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.01754 Empirical evidence of Large Language Model&amp;#039;s influence on human spoken communication]&lt;br /&gt;
* 2025-09: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5425555 Generative AI as Seniority-Biased Technological Change: Evidence from U.S. Résumé and Job Posting Data]&lt;br /&gt;
&lt;br /&gt;
=Psychological Impact=&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.16628 The Impact of Artificial Intelligence on Human Thought]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.15352 People readily follow personal advice from AI but it does not improve their well-being]&lt;br /&gt;
&lt;br /&gt;
==Human Sentiment towards AI==&lt;br /&gt;
* 2025-04: Pew Research: [https://www.pewresearch.org/internet/2025/04/03/how-the-us-public-and-ai-experts-view-artificial-intelligence/ How the U.S. Public and AI Experts View Artificial Intelligence]&lt;br /&gt;
* 2025-10: Pew Research: [https://www.pewresearch.org/global/2025/10/15/how-people-around-the-world-view-ai/ How People Around the World View AI: More are concerned than excited about its use, and more trust their own country and the EU to regulate it than trust the U.S. or China]&lt;br /&gt;
* 2025-12: [https://www.edelman.com/sites/g/files/aatuss191/files/2025-11/2025%20Edelman%20Trust%20Barometer%20Flash%20Poll%20Trust%20and%20Artificial%20Intelligence%20at%20a%20Crossroads%201.pdf 2025 Edelman Trust Barometer]&lt;br /&gt;
* 2025-12: [https://navigatorresearch.org/views-of-ai-and-data-centers/ Polling - Views of AI and data centers]&lt;br /&gt;
* 2026-03: [https://osf.io/preprints/psyarxiv/5mwre_v9 The Moralization of Artificial Intelligence]&lt;br /&gt;
* 2026-03: Anthropic: [https://www.anthropic.com/features/81k-interviews What 81,000 people want from AI]&lt;br /&gt;
&lt;br /&gt;
==AI Persuasion of Humans==&lt;br /&gt;
(AI can update beliefs, change opinions, tackle conspiracy theories, etc.)&lt;br /&gt;
* 2022-11: [https://arxiv.org/abs/2211.15006 Fine-tuning language models to find agreement among humans with diverse preferences]&lt;br /&gt;
* 2024-04: [https://osf.io/preprints/psyarxiv/h7n8u_v1 Just the facts: How dialogues with AI reduce conspiracy beliefs]&lt;br /&gt;
* 2024-08: [https://arxiv.org/abs/2408.04681 Conversational AI Powered by Large Language Models Amplifies False Memories in Witness Interviews]&lt;br /&gt;
* 2024-09: [https://www.science.org/doi/10.1126/science.adq1814 Durably reducing conspiracy beliefs through dialogues with AI]&lt;br /&gt;
* 2025-03: [https://www.pnas.org/doi/10.1073/pnas.2413443122 Scaling language model size yields diminishing returns for single-message political persuasion]&lt;br /&gt;
* 2025-04: [https://drive.google.com/file/d/1Eo4SHrKGPErTzL1t_QmQhfZGU27jKBjx/edit Can AI Change Your View? Evidence from a Large-Scale Online Field Experiment]&lt;br /&gt;
** [https://www.404media.co/researchers-secretly-ran-a-massive-unauthorized-ai-persuasion-experiment-on-reddit-users/ Researchers Secretly Ran a Massive, Unauthorized AI Persuasion Experiment on Reddit Users]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.09662 Large Language Models Are More Persuasive Than Incentivized Human Persuaders]&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.13919 The Levers of Political Persuasion with Conversational AI]&lt;br /&gt;
* 2025-12: [https://www.science.org/doi/10.1126/science.aea3884 The levers of political persuasion with conversational artificial intelligence]&lt;br /&gt;
* 2025-12: [https://www.nature.com/articles/s41586-025-09771-9 Persuading voters using human–artificial intelligence dialogues]&lt;br /&gt;
&lt;br /&gt;
==AI Effects on Human Psychology==&lt;br /&gt;
===Human well-being===&lt;br /&gt;
* 2024-01: [https://www.nature.com/articles/s44184-023-00047-6 Loneliness and suicide mitigation for students using GPT3-enabled chatbots]&lt;br /&gt;
* 2025-03: [https://cdn.openai.com/papers/15987609-5f71-433c-9972-e91131f399a1/openai-affective-use-study.pdf Investigating Affective Use and Emotional Well-being on ChatGPT]&lt;br /&gt;
* 2025-03: [https://dam-prod2.media.mit.edu/x/2025/03/21/Randomized_Control_Study_on_Chatbot_Psychosocial_Effect.pdf How AI and Human Behaviors Shape Psychosocial Effects of Chatbot Use: A Longitudinal Controlled Study]&lt;br /&gt;
* 2026-03: [https://arxiv.org/abs/2603.15245 Practicing with Language Models Cultivates Human Empathic Communication]&lt;br /&gt;
&lt;br /&gt;
===Counter loneliness===&lt;br /&gt;
* 2023-11: [https://arxiv.org/abs/2311.10599 Chatbots as social companions: How people perceive consciousness, human likeness, and social health benefits in machines]&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.19096 AI Companions Reduce Loneliness]&lt;br /&gt;
* 2025-03: [https://dam-prod2.media.mit.edu/x/2025/03/21/Randomized_Control_Study_on_Chatbot_Psychosocial_Effect.pdf How AI and Human Behaviors Shape Psychosocial Effects of Chatbot Use: A Longitudinal Controlled Study]&lt;br /&gt;
* 2025-06: Anthropic: [https://www.anthropic.com/news/how-people-use-claude-for-support-advice-and-companionship How People Use Claude for Support, Advice, and Companionship]&lt;br /&gt;
&lt;br /&gt;
===Human mental abilities (creativity, learning)===&lt;br /&gt;
* 2025-03: [https://www.microsoft.com/en-us/research/publication/the-impact-of-generative-ai-on-critical-thinking-self-reported-reductions-in-cognitive-effort-and-confidence-effects-from-a-survey-of-knowledge-workers/ The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence Effects From a Survey of Knowledge Workers]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.08872 Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task]&lt;br /&gt;
&lt;br /&gt;
=Simulate Humans=&lt;br /&gt;
* See also: [[Human brain]]&lt;br /&gt;
&lt;br /&gt;
==Sociology==&lt;br /&gt;
* 2021-10: [https://www.doi.org/10.1007/s10588-021-09351-y Explaining and predicting human behavior and social dynamics in simulated virtual worlds: reproducibility, generalizability, and robustness of causal discovery methods]&lt;br /&gt;
* 2023-12: Google: [https://arxiv.org/abs/2312.03664 Generative agent-based modeling with actions grounded in physical, social, or digital space using Concordia]&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.12620 Are Large Language Models (LLMs) Good Social Predictors?]&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.11794 Automated Social Science: Language Models as Scientist and Subjects]&lt;br /&gt;
* 2024-07: [https://academic.oup.com/pnasnexus/article/3/7/pgae245/7712371 Perils and opportunities in using large language models in psychological research]&lt;br /&gt;
* 2024-08: [https://samim.io/dl/Predicting%20results%20of%20social%20science%20experiments%20using%20large%20language%20models.pdf Predicting Results of Social Science Experiments Using Large Language Models]&lt;br /&gt;
* 2024-10: [https://www.pnas.org/doi/10.1073/pnas.2407639121 Large Language Models based on historical text could offer informative tools for behavioral science]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.02234 LLM Social Simulations Are a Promising Research Method]&lt;br /&gt;
* 2025-04: [https://www.nber.org/papers/w33662 Measuring Human Leadership Skills with AI Agents]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.10157 SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users]&lt;br /&gt;
* 2025-07: [https://www.nature.com/articles/s41586-025-09215-4 A foundation model to predict and capture human cognition] ([https://marcelbinz.github.io/centaur code])&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.15815 LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra]&lt;br /&gt;
* 2025-09: [https://benjaminmanning.io/files/optimize.pdf General Social Agents]&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2506.06958 Simulating Society Requires Simulating Thought]&lt;br /&gt;
&lt;br /&gt;
==Theory of Mind==&lt;br /&gt;
* 2025-08: [https://www.nature.com/articles/s44387-025-00031-9 How large language models encode theory-of-mind: a study on sparse parameter patterns]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2509.22887 Infusing Theory of Mind into Socially Intelligent LLM Agents]&lt;br /&gt;
&lt;br /&gt;
==Humanlike Vibes==&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.20525 The Xeno Sutra: Can Meaning and Value be Ascribed to an AI-Generated &amp;quot;Sacred&amp;quot; Text?]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.08338 LLMs Reproduce Human Purchase Intent via Semantic Similarity Elicitation of Likert Ratings]&lt;br /&gt;
&lt;br /&gt;
==Skeptical==&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.06950 Large Language Models Do Not Simulate Human Psychology]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2601.16130 Replicating Human Motivated Reasoning Studies with LLMs]&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
* [https://www.google.com/books/edition/_/cKnYEAAAQBAJ?hl=en&amp;amp;gbpv=1&amp;amp;pg=PA2 UNESCO. Guidance for Generative AI in Education and Research]&lt;br /&gt;
* [[AI]]&lt;br /&gt;
** [[AI predictions]]&lt;/div&gt;</summary>
		<author><name>KevinYager</name></author>
		
	</entry>
	<entry>
		<id>http://gisaxs.com/index.php?title=AI_research_trends&amp;diff=8746</id>
		<title>AI research trends</title>
		<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php?title=AI_research_trends&amp;diff=8746"/>
		<updated>2026-03-23T17:11:22Z</updated>

		<summary type="html">&lt;p&gt;KevinYager: /* Context Length */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=System 2 Reasoning=&lt;br /&gt;
See: [[Increasing AI Intelligence]]&lt;br /&gt;
&lt;br /&gt;
=Memory=&lt;br /&gt;
==Reviews==&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.13501 A Survey on the Memory Mechanism of Large Language Model based Agents]&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.09113 The AI Hippocampus: How Far are We From Human Memory?]&lt;br /&gt;
&lt;br /&gt;
==Big Ideas==&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.07755 Learning to Continually Learn via Meta-learning Agentic Memory Designs]&lt;br /&gt;
&lt;br /&gt;
==LLM Weights Memory==&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.09764 Memory Layers at Scale]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.15103 Continual Learning via Sparse Memory Finetuning]&lt;br /&gt;
* 2026-01: [https://developer.nvidia.com/blog/reimagining-llm-memory-using-context-as-training-data-unlocks-models-that-learn-at-test-time/ Reimagining LLM Memory: Using Context as Training Data Unlocks Models That Learn at Test-Time] (Nvidia)&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.02151 Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting]&lt;br /&gt;
* 2026-02: Sakana AI: [https://pub.sakana.ai/doc-to-lora/ Instant LLM Updates]: Train hypernetwork to generate LoRA adapters on the fly&lt;br /&gt;
** 2026-02: [https://arxiv.org/abs/2602.15902 Doc-to-LoRA: Learning to Instantly Internalize Contexts] ([https://github.com/SakanaAI/Doc-to-LoRA code])&lt;br /&gt;
** 2025-06: [https://arxiv.org/abs/2506.06105 Text-to-LoRA: Instant Transformer Adaption] ([https://github.com/SakanaAI/Text-to-LoRA code])&lt;br /&gt;
&lt;br /&gt;
==Context Length==&lt;br /&gt;
* 2020: [https://ai.googleblog.com/2020/10/rethinking-attention-with-performers.html Various ideas] for scaling context window, including [https://arxiv.org/abs/2004.05150 Longformer]&lt;br /&gt;
* 2023-04-02: [https://hazyresearch.stanford.edu/blog/2023-03-27-long-learning Discussion] of ideas for how to scale context window&lt;br /&gt;
* 2023-05-11: Anthropic announces 100k window&lt;br /&gt;
* 2023-06-07: [https://magic.dev/ magic.dev] claims [https://magic.dev/blog/ltm-1 5M tokens coming soon]&lt;br /&gt;
* 2023-07-05: Microsoft describes [https://arxiv.org/abs/2307.02486 LongNet], with 1 billion token window&lt;br /&gt;
* 2023-07-11: [https://arxiv.org/abs/2307.03170 Focused Transformer] 256k&lt;br /&gt;
* 2023-11-06: [https://openai.com/blog/new-models-and-developer-products-announced-at-devday GPT-4 turbo] 128k&lt;br /&gt;
* 2023-11-22: [https://techcrunch.com/2023/11/21/anthropic-claude-2-1/ Anthropic Claude 2.1] 200k&lt;br /&gt;
* 2023-12-13: [https://arxiv.org/abs/2312.00752 Mamba] state-space architecture as an alternative to attention&lt;br /&gt;
* 2024-01-04: [https://arxiv.org/abs/2401.01325 LongLM] to extend context window&lt;br /&gt;
* 2024-02-15: [https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/#architecture Gemini 1.5] 1M tokens&lt;br /&gt;
* 2024-03-04: [https://www.anthropic.com/news/claude-3-family Anthropic Claude 3] 200k&lt;br /&gt;
* 2024-03-08: [https://arxiv.org/abs/2403.05530 Google claims] Gemini 1.5 can scale to 10M&lt;br /&gt;
* 2024-04-10: Google [https://arxiv.org/abs/2404.07143 preprint] demonstrates infinite context length by using compressive memory&lt;br /&gt;
* 2024-04-12: Meta et al. demonstrate [https://arxiv.org/abs/2404.08801 Megalodon], which enables infinite context via a more efficient architecture&lt;br /&gt;
* 2024-04-14: Google presents [https://arxiv.org/abs/2404.09173 TransformerFAM], which leverages a feedback loop so the model attends to its own latent representations, acting as working memory and providing effectively infinite context&lt;br /&gt;
* 2024-10-31: [https://arxiv.org/abs/2410.23771 What is Wrong with Perplexity for Long-context Language Modeling?]&lt;br /&gt;
* [https://x.com/MiniMax__AI/status/1879226391352549451 2025-01-14]: [https://www.minimaxi.com/en/news/minimax-01-series-2 MiniMax-01] 4M ([https://www.minimaxi.com/en/news/minimax-01-series-2 paper])&lt;br /&gt;
* [https://x.com/Alibaba_Qwen/status/1883557964759654608 2025-01-27]: [https://qwenlm.github.io/blog/qwen2.5-1m/ Qwen2.5-1M] ([https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen2.5-1M/Qwen2_5_1M_Technical_Report.pdf report])&lt;br /&gt;
* 2025-02-14: [https://arxiv.org/abs/2502.08910 InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU] 3M&lt;br /&gt;
* [https://x.com/AnimaAnandkumar/status/1897449851941744648 2025-02-18]: [https://arxiv.org/abs/2502.12574 HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading] ([https://github.com/wdlctc/headinfer code])&lt;br /&gt;
* 2025-02-18: [https://arxiv.org/abs/2502.12962 Infinite Retrieval: Attention Enhanced LLMs in Long-Context Processing]&lt;br /&gt;
* 2025-02-19: [https://github.com/MoonshotAI/MoBA MoBA: Mixture of Block Attention for Long-Context LLMs]&lt;br /&gt;
* 2025-02-27: [https://arxiv.org/abs/2502.20082 LongRoPE2: Near-Lossless LLM Context Window Scaling] ([https://github.com/microsoft/LongRoPE code])&lt;br /&gt;
* [https://x.com/sundarpichai/status/1904579419496386736 2025-03-25]: [https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/ Gemini 2.5 Pro] [https://x.com/pvncher/status/1904685092053606715 1M]&lt;br /&gt;
* 2025-04-05: Meta [https://ai.meta.com/blog/llama-4-multimodal-intelligence/ Llama 4] 10M&lt;br /&gt;
* 2025-04-14: OpenAI [https://openai.com/index/gpt-4-1/ GPT-4.1] 1M&lt;br /&gt;
* 2025-12-04: Google [https://research.google/blog/titans-miras-helping-ai-have-long-term-memory/ Titans/Miras] 10M&lt;br /&gt;
* 2025-12-13: [https://arxiv.org/abs/2512.12167 Extending the Context of Pretrained LLMs by Dropping Their Positional Embeddings]&lt;br /&gt;
* 2026-03-18: [https://github.com/EverMind-AI/MSA/blob/main/paper/MSA__Memory_Sparse_Attention_for_Efficient_End_to_End_Memory_Model_Scaling_to_100M_Tokens.pdf MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens] ([https://github.com/EverMind-AI/MSA code]) 100M&lt;br /&gt;
&lt;br /&gt;
==Extended Context==&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.00663 Titans: Learning to Memorize at Test Time]&lt;br /&gt;
&lt;br /&gt;
==Context Remaking==&lt;br /&gt;
* 2021-01: [https://arxiv.org/abs/2101.00436 Baleen: Robust Multi-Hop Reasoning at Scale via Condensed Retrieval]&lt;br /&gt;
* 2025-08: [https://blog.plasticlabs.ai/blog/Memory-as-Reasoning Memory as Reasoning (Memory is Prediction)]&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.25140 ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.04618 Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models]&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.24601 Recursive Language Models] (model searches/queries the full context)&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.02553 SimpleMem: Efficient Lifelong Memory for LLM Agents]&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.07190 Active Context Compression: Autonomous Memory Management in LLM Agents]&lt;br /&gt;
&lt;br /&gt;
==Retrieval beyond RAG==&lt;br /&gt;
See also: [[AI_tools#Retrieval_Augmented_Generation_.28RAG.29|AI tools: Retrieval Augmented Generation (RAG)]]&lt;br /&gt;
* 2024-10: Microsoft: [https://arxiv.org/abs/2410.10450 KBLaM: Knowledge Base augmented Language Model]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.11536 Let your LLM generate a few tokens and you will reduce the need for retrieval]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.11919 RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation]&lt;br /&gt;
* 2025-03: Microsoft: [https://www.microsoft.com/en-us/research/blog/introducing-kblam-bringing-plug-and-play-external-knowledge-to-llms/ Introducing KBLaM: Bringing plug-and-play external knowledge to LLMs]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.06266 Cartridges: Lightweight and general-purpose long context representations via self-study]&lt;br /&gt;
* 2025-07: [https://arxiv.org/pdf/2507.07957 MIRIX: Multi-Agent Memory System for LLM-Based Agents] ([https://mirix.io/ mirix])&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.16153 Memento: Fine-tuning LLM Agents without Fine-tuning LLMs]&lt;br /&gt;
&lt;br /&gt;
==Working Memory==&lt;br /&gt;
* 2024-12: [https://www.arxiv.org/abs/2412.18069 Improving Factuality with Explicit Working Memory]&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.03192 MemRL: Self-Evolving Agents via Runtime Reinforcement Learning on Episodic Memory]&lt;br /&gt;
&lt;br /&gt;
==Long-Term Memory==&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.19413 Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory]&lt;br /&gt;
&lt;br /&gt;
* 2025-12: Google [https://research.google/blog/titans-miras-helping-ai-have-long-term-memory/ Titans + Miras]&lt;br /&gt;
** [https://arxiv.org/abs/2504.13173 It&amp;#039;s All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization]&lt;br /&gt;
** [https://arxiv.org/abs/2501.00663 Titans: Learning to Memorize at Test Time]&lt;br /&gt;
&lt;br /&gt;
===Storage and Retrieval===&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.04439 ArcMemo: Abstract Reasoning Composition with Lifelong LLM Memory]&lt;br /&gt;
* 2026-01: [https://www.arxiv.org/abs/2601.07372 Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models]&lt;br /&gt;
&lt;br /&gt;
===Episodic Memory===&lt;br /&gt;
* 2024-03: [https://arxiv.org/abs/2403.11901 Larimar: Large Language Models with Episodic Memory Control]&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.16153 AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs]&lt;br /&gt;
&lt;br /&gt;
==Continual Learning==&lt;br /&gt;
* 2022-02: [https://arxiv.org/abs/2202.00275 Architecture Matters in Continual Learning]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.15103 Continual Learning via Sparse Memory Finetuning]&lt;br /&gt;
* 2025-11: [https://research.google/blog/introducing-nested-learning-a-new-ml-paradigm-for-continual-learning/ Introducing Nested Learning: A new ML paradigm for continual learning]&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.16175 Learning to Discover at Test Time]&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.19897 Self-Distillation Enables Continual Learning]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.07755 Learning to Continually Learn via Meta-learning Agentic Memory Designs]&lt;br /&gt;
&lt;br /&gt;
=Updating Weights at Inference-time=&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.06252 Transformer&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt;: Self-adaptive LLMs]&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.14143 Beyond Turing: Memory-Amortized Inference as a Foundation for Cognitive Computation]&lt;br /&gt;
* 2026-02: Sakana AI: [https://pub.sakana.ai/doc-to-lora/ Instant LLM Updates]: Train hypernetwork to generate LoRA adapters on the fly&lt;br /&gt;
** 2026-02: [https://arxiv.org/abs/2602.15902 Doc-to-LoRA: Learning to Instantly Internalize Contexts] ([https://github.com/SakanaAI/Doc-to-LoRA code])&lt;br /&gt;
** 2025-06: [https://arxiv.org/abs/2506.06105 Text-to-LoRA: Instant Transformer Adaption] ([https://github.com/SakanaAI/Text-to-LoRA code])&lt;br /&gt;
&lt;br /&gt;
==Parameters as Tokens==&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.23168 TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters] ([https://github.com/Haiyang-W/TokenFormer code])&lt;br /&gt;
&lt;br /&gt;
=Internal Thought Representation Space=&lt;br /&gt;
==Visual Thinking==&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.05874 VideoRAG: Retrieval-Augmented Generation over Video Corpus]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.07542 Imagine while Reasoning in Space: Multimodal Visualization-of-Thought]&lt;br /&gt;
&lt;br /&gt;
==Neural (non-token) Latent Representation==&lt;br /&gt;
* 2024-11: Microsoft: [https://arxiv.org/abs/2411.02820 DroidSpeak: KV Cache Sharing for Cross-LLM Communication and Multi-LLM Serving]: LLMs invent their own inter-communication language&lt;br /&gt;
* 2024-12: Meta: [https://arxiv.org/abs/2412.06769 Training Large Language Models to Reason in a Continuous Latent Space]: feeding the latent representation directly back into the model, instead of tokenizing intermediate thoughts (Chain of Continuous Thought, a.k.a. Coconut)&lt;br /&gt;
* 2024-12: Meta: [https://arxiv.org/abs/2412.08821 Large Concept Models: Language Modeling in a Sentence Representation Space]: train a model that operates at a higher level of abstraction than typical word/token LLMs; model operates in a space of concept embeddings (more akin to full sentences than individual words)&lt;br /&gt;
* 2024-12: Meta: [https://ai.meta.com/research/publications/byte-latent-transformer-patches-scale-better-than-tokens/ Byte Latent Transformer: Patches Scale Better Than Tokens]: Instead of tokenization, dynamically convert input byte-stream into patches, yielding gains in compute efficiency, with minimal loss in performance&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.13171 Compressed Chain of Thought: Efficient Reasoning Through Dense Representations]&lt;br /&gt;
* 2024-12: Google DeepMind: [https://arxiv.org/abs/2412.17747 Deliberation in Latent Space via Differentiable Cache Augmentation]&lt;br /&gt;
* 2024-12: [https://github.com/jerber/lang-jepa LANG-JEPA: Learning to Think in Latent Space]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.19201 Efficient Reasoning with Hidden Thinking] ([https://github.com/shawnricecake/Heima code])&lt;br /&gt;
* 2025-02: [https://huggingface.co/tomg-group-umd/huginn-0125 Huginn-0125]: [https://arxiv.org/abs/2502.05171 Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach] ([https://github.com/seal-rg/recurrent-pretraining code], [https://huggingface.co/tomg-group-umd/huginn-0125 model])&lt;br /&gt;
* 2025-02: Meta: [https://arxiv.org/abs/2502.08524 LLM Pretraining with Continuous Concepts] (CoCoMix)&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2505.12514 Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought]&lt;br /&gt;
&lt;br /&gt;
=Altered Transformer=&lt;br /&gt;
&lt;br /&gt;
==Tokenization==&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.19737 Better &amp;amp; Faster Large Language Models via Multi-token Prediction]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.06676 I Don&amp;#039;t Know: Explicit Modeling of Uncertainty with an &amp;lt;nowiki&amp;gt;[IDK]&amp;lt;/nowiki&amp;gt; Token]&lt;br /&gt;
* 2025-04: Meta: [https://arxiv.org/abs/2504.00927 Multi-Token Attention]&lt;br /&gt;
&lt;br /&gt;
==Generation Order==&lt;br /&gt;
* 2019-02: [https://arxiv.org/abs/1902.02192 Non-Monotonic Sequential Text Generation]&lt;br /&gt;
* 2019-04: [https://arxiv.org/abs/1904.09324 Mask-Predict: Parallel Decoding of Conditional Masked Language Models]&lt;br /&gt;
* 2019-06: [https://arxiv.org/abs/1906.09601 Sequence Generation: From Both Sides to the Middle]&lt;br /&gt;
* 2020-04: [https://arxiv.org/abs/2004.11579 Probabilistically Masked Language Model Capable of Autoregressive Generation in Arbitrary Word Order]&lt;br /&gt;
* 2021-12: [https://arxiv.org/abs/2112.10543 Spiral Language Modeling]&lt;br /&gt;
* 2023-10: [https://arxiv.org/abs/2310.09930 FiLM: Fill-in Language Models for Any-Order Generation]&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.03582 Integrating Randomness in Large Language Models: A Linear Congruential Generator Approach for Generating Clinically Relevant Content]&lt;br /&gt;
&lt;br /&gt;
==Diffusion Language Models==&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.03687 Pard: Permutation-Invariant Autoregressive Diffusion for Graph Generation]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.09992 Large Language Diffusion Models]&lt;br /&gt;
* 2025-02: [https://www.inceptionlabs.ai/ Inception Labs] [https://www.inceptionlabs.ai/news Mercury] model ([https://chat.inceptionlabs.ai/ online demo])&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.09573 Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models] ([https://m-arriola.com/bd3lms/ project], [https://github.com/kuleshov-group/bd3lms code], [https://huggingface.co/collections/kuleshov-group/bd3-lms-67be95f81b96b15fec50d53f hf])&lt;br /&gt;
* 2025-04: [https://hkunlp.github.io/blog/2025/dream/ Dream 7B: Introducing Dream 7B, the most powerful open diffusion large language model to date]&lt;br /&gt;
* 2025-04: [https://dllm-reasoning.github.io/ d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning] ([https://dllm-reasoning.github.io/media/preprint.pdf preprint], [https://github.com/dllm-reasoning/d1 code])&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.01928 Esoteric Language Models] ([https://s-sahoo.com/Eso-LMs/ project])&lt;br /&gt;
&lt;br /&gt;
===Related: Image Synthesis via Autoregression/Diffusion===&lt;br /&gt;
* 2023-10: [https://arxiv.org/abs/2310.01400 Sequential Data Generation with Groupwise Diffusion Process]&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.09470 Rolling Diffusion Models]&lt;br /&gt;
* 2024-08: [https://arxiv.org/abs/2408.11039 Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model]&lt;br /&gt;
&lt;br /&gt;
==Sampling==&lt;br /&gt;
* 2024-10: [https://github.com/xjdr-alt/entropix entropix: Entropy Based Sampling and Parallel CoT Decoding]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.01104 softmax is not enough (for sharp out-of-distribution)]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.06215 Corrector Sampling in Language Models]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.01939 Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning]&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.15260 Deep Think with Confidence] ([https://jiaweizzhao.github.io/deepconf/ project])&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.14901 Reasoning with Sampling: Your Base Model is Smarter Than You Think]&lt;br /&gt;
&lt;br /&gt;
=Daydreaming, brainstorming, pre-generation=&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.13171 Sleep-time Compute: Beyond Inference Scaling at Test-time]&lt;br /&gt;
* 2025-07: Gwern: [https://gwern.net/ai-daydreaming Daydreaming]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.01689 What LLMs Think When You Don&amp;#039;t Tell Them What to Think About?]&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Pre-generation&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.18866 Reasoning to Learn from Latent Thoughts]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.13171 Sleep-time Compute: Beyond Inference Scaling at Test-time]&lt;br /&gt;
* 2025-11: [https://inference.net/blog/project-aella Project OSSAS: Custom LLMs to process 100 Million Research Papers] ([https://huggingface.co/inference-net models], [https://aella.inference.net/embeddings visualization])&lt;br /&gt;
&lt;br /&gt;
=Missing Elements=&lt;br /&gt;
* Memory&lt;br /&gt;
* Continuous learning/update&lt;br /&gt;
* Robust contextual model&lt;br /&gt;
* Long-time-horizon coherence&lt;br /&gt;
* Fluid intelligence&lt;br /&gt;
* Agency&lt;br /&gt;
* Modeling of self&lt;br /&gt;
* [https://gwern.net/ai-daydreaming Daydreaming]&lt;br /&gt;
&lt;br /&gt;
=Memes=&lt;br /&gt;
* Andrej Karpathy: &lt;br /&gt;
** 2015-05: &amp;quot;Hallucination&amp;quot; in [https://karpathy.github.io/2015/05/21/rnn-effectiveness/ The Unreasonable Effectiveness of Recurrent Neural Networks]&lt;br /&gt;
** 2017-11: [https://karpathy.medium.com/software-2-0-a64152b37c35 Software 2.0] ([https://x.com/karpathy/status/893576281375219712 &amp;quot;Gradient descent can write code better than you. I&amp;#039;m sorry.&amp;quot;])&lt;br /&gt;
** 2022-10: [https://x.com/karpathy/status/1582807367988654081 Transformers as general-purpose differentiable computers] ([https://www.youtube.com/watch?v=9uw3F6rndnA talk])&lt;br /&gt;
** 2023-01: [https://x.com/karpathy/status/1617979122625712128 The hottest new programming language is English]&lt;br /&gt;
** 2023-09: [https://x.com/karpathy/status/1707437820045062561 LLM as kernel of a new Operating System] ([https://x.com/karpathy/status/1723140519554105733 diagram], [https://www.threads.com/@karpathy/post/CzehPtxPEF3 OS analogies])&lt;br /&gt;
** 2024-07: [https://x.com/karpathy/status/1816531576228053133 Jagged Intelligence] (c.f. [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4573321 Mollick paper])&lt;br /&gt;
** 2025-02: [https://x.com/karpathy/status/1886192184808149383 Vibe coding]&lt;br /&gt;
** 2025-06: [https://www.latent.space/p/s3 Software 3.0] ([https://www.youtube.com/watch?v=LCEmiRjPEtQ&amp;amp;t=1s talk]): &amp;quot;Prompts as Programs&amp;quot;. Software 1.0 is code; 2.0 is model weights; 3.0 is prompts.&lt;br /&gt;
** 2025-06: [https://x.com/karpathy/status/1937902205765607626 &amp;quot;Context Engineering&amp;quot; instead of &amp;quot;Prompt Engineering&amp;quot;]&lt;br /&gt;
** 2025-06: [https://x.com/karpathy/status/1938626382248149433 LLMs as &amp;quot;cognitive cores&amp;quot;]&lt;br /&gt;
** 2025-11: [https://x.com/karpathy/status/1990116666194456651?s=20 Software 1.0 easily automates what you can specify. Software 2.0 easily automates what you can verify.]&lt;br /&gt;
** 2026-01: [https://x.com/karpathy/status/2008664551445963083?s=20 The majority of the ruff ruff is people who look at the current point and people who look at the current slope]&lt;br /&gt;
** 2026-02: [https://x.com/karpathy/status/2019137879310836075 Agentic Engineering]&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
* [[Increasing AI Intelligence]]&lt;/div&gt;</summary>
		<author><name>KevinYager</name></author>
		
	</entry>
	<entry>
		<id>http://gisaxs.com/index.php?title=AI_research_trends&amp;diff=8745</id>
		<title>AI research trends</title>
		<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php?title=AI_research_trends&amp;diff=8745"/>
		<updated>2026-03-23T17:08:40Z</updated>

		<summary type="html">&lt;p&gt;KevinYager: /* Context Length */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=System 2 Reasoning=&lt;br /&gt;
See: [[Increasing AI Intelligence]]&lt;br /&gt;
&lt;br /&gt;
=Memory=&lt;br /&gt;
==Reviews==&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.13501 A Survey on the Memory Mechanism of Large Language Model based Agents]&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.09113 The AI Hippocampus: How Far are We From Human Memory?]&lt;br /&gt;
&lt;br /&gt;
==Big Ideas==&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.07755 Learning to Continually Learn via Meta-learning Agentic Memory Designs]&lt;br /&gt;
&lt;br /&gt;
==LLM Weights Memory==&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.09764 Memory Layers at Scale]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.15103 Continual Learning via Sparse Memory Finetuning]&lt;br /&gt;
* 2026-01: [https://developer.nvidia.com/blog/reimagining-llm-memory-using-context-as-training-data-unlocks-models-that-learn-at-test-time/ Reimagining LLM Memory: Using Context as Training Data Unlocks Models That Learn at Test-Time] (Nvidia)&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.02151 Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting]&lt;br /&gt;
* 2026-02: Sakana AI: [https://pub.sakana.ai/doc-to-lora/ Instant LLM Updates]: Train hypernetwork to generate LoRA adapters on the fly&lt;br /&gt;
** 2026-02: [https://arxiv.org/abs/2602.15902 Doc-to-LoRA: Learning to Instantly Internalize Contexts] ([https://github.com/SakanaAI/Doc-to-LoRA code])&lt;br /&gt;
** 2025-06: [https://arxiv.org/abs/2506.06105 Text-to-LoRA: Instant Transformer Adaption] ([https://github.com/SakanaAI/Text-to-LoRA])&lt;br /&gt;
&lt;br /&gt;
==Context Length==&lt;br /&gt;
* 2020: [https://ai.googleblog.com/2020/10/rethinking-attention-with-performers.html Various ideas] for scaling context window, including [https://arxiv.org/abs/2004.05150 Longformer]&lt;br /&gt;
* 2023-04-02: [https://hazyresearch.stanford.edu/blog/2023-03-27-long-learning Discussion] of ideas for how to scale context window&lt;br /&gt;
* 2023-05-11: Anthropic announces 100k window&lt;br /&gt;
* 2023-06-07: [https://magic.dev/ magic.dev] claims [https://magic.dev/blog/ltm-1 5M tokens coming soon]&lt;br /&gt;
* 2023-07-05: Microsoft describes [https://arxiv.org/abs/2307.02486 LongNet], with 1 billion token window&lt;br /&gt;
* 2023-07-11: [https://arxiv.org/abs/2307.03170 Focused Transformer] 256k&lt;br /&gt;
* 2023-11-06: [https://openai.com/blog/new-models-and-developer-products-announced-at-devday GPT-4 turbo] 128k&lt;br /&gt;
* 2023-11-22: [https://techcrunch.com/2023/11/21/anthropic-claude-2-1/ Anthropic Claude 2.1] 200k&lt;br /&gt;
* 2023-12-13: [https://arxiv.org/abs/2312.00752 Mamba] state-space architecture as an alternative to attention&lt;br /&gt;
* 2024-01-04: [https://arxiv.org/abs/2401.01325 LongLM] to extend context window&lt;br /&gt;
* 2024-02-15: [https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/#architecture Gemini 1.5] 1M tokens&lt;br /&gt;
* 2024-03-04: [https://www.anthropic.com/news/claude-3-family Anthropic Claude 3] 200k&lt;br /&gt;
* 2024-03-08: [https://arxiv.org/abs/2403.05530 Google claims] Gemini 1.5 can scale to 10M&lt;br /&gt;
* 2024-04-10: Google [https://arxiv.org/abs/2404.07143 preprint] demonstrates infinite context length by using compressive memory&lt;br /&gt;
* 2024-04-12: Meta et al. demonstrate [https://arxiv.org/abs/2404.08801 Megalodon], which enables infinite context via a more efficient architecture&lt;br /&gt;
* 2024-04-14: Google presents [https://arxiv.org/abs/2404.09173 TransformerFAM], which leverages a feedback loop so the model attends to its own latent representations, acting as working memory and providing effectively infinite context&lt;br /&gt;
* 2024-10-31: [https://arxiv.org/abs/2410.23771 What is Wrong with Perplexity for Long-context Language Modeling?]&lt;br /&gt;
* [https://x.com/MiniMax__AI/status/1879226391352549451 2025-01-14]: [https://www.minimaxi.com/en/news/minimax-01-series-2 MiniMax-01] 4M ([https://www.minimaxi.com/en/news/minimax-01-series-2 paper])&lt;br /&gt;
* [https://x.com/Alibaba_Qwen/status/1883557964759654608 2025-01-27]: [https://qwenlm.github.io/blog/qwen2.5-1m/ Qwen2.5-1M] ([https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen2.5-1M/Qwen2_5_1M_Technical_Report.pdf report])&lt;br /&gt;
* 2025-02-14: [https://arxiv.org/abs/2502.08910 InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU] 3M&lt;br /&gt;
* [https://x.com/AnimaAnandkumar/status/1897449851941744648 2025-02-18]: [https://arxiv.org/abs/2502.12574 HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading] ([https://github.com/wdlctc/headinfer code])&lt;br /&gt;
* 2025-02-18: [https://arxiv.org/abs/2502.12962 Infinite Retrieval: Attention Enhanced LLMs in Long-Context Processing]&lt;br /&gt;
* 2025-02-19: [https://github.com/MoonshotAI/MoBA MoBA: Mixture of Block Attention for Long-Context LLMs]&lt;br /&gt;
* 2025-02-27: [https://arxiv.org/abs/2502.20082 LongRoPE2: Near-Lossless LLM Context Window Scaling] ([https://github.com/microsoft/LongRoPE code])&lt;br /&gt;
* [https://x.com/sundarpichai/status/1904579419496386736 2025-03-25]: [https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/ Gemini 2.5 Pro] [https://x.com/pvncher/status/1904685092053606715 1M]&lt;br /&gt;
* 2025-04-05: Meta [https://ai.meta.com/blog/llama-4-multimodal-intelligence/ Llama 4] 10M&lt;br /&gt;
* 2025-04-14: OpenAI [https://openai.com/index/gpt-4-1/ GPT-4.1] 1M&lt;br /&gt;
* 2025-12-04: Google [https://research.google/blog/titans-miras-helping-ai-have-long-term-memory/ Titans/Miras] 10M&lt;br /&gt;
* 2025-12-13: [https://arxiv.org/abs/2512.12167 Extending the Context of Pretrained LLMs by Dropping Their Positional Embeddings]&lt;br /&gt;
* 2026-03-18: [https://github.com/EverMind-AI/MSA MSA: Memory Sparse Attention] 100M&lt;br /&gt;
&lt;br /&gt;
==Extended Context==&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.00663 Titans: Learning to Memorize at Test Time]&lt;br /&gt;
&lt;br /&gt;
==Context Remaking==&lt;br /&gt;
* 2021-01: [https://arxiv.org/abs/2101.00436 Baleen: Robust Multi-Hop Reasoning at Scale via Condensed Retrieval]&lt;br /&gt;
* 2025-08: [https://blog.plasticlabs.ai/blog/Memory-as-Reasoning Memory as Reasoning (Memory is Prediction)]&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.25140 ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.04618 Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models]&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.24601 Recursive Language Models] (model searches/queries the full context)&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.02553 SimpleMem: Efficient Lifelong Memory for LLM Agents]&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.07190 Active Context Compression: Autonomous Memory Management in LLM Agents]&lt;br /&gt;
&lt;br /&gt;
==Retrieval beyond RAG==&lt;br /&gt;
See also: [[AI_tools#Retrieval_Augmented_Generation_.28RAG.29|AI tools: Retrieval Augmented Generation (RAG)]]&lt;br /&gt;
* 2024-10: Microsoft: [https://arxiv.org/abs/2410.10450 KBLaM: Knowledge Base augmented Language Model]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.11536 Let your LLM generate a few tokens and you will reduce the need for retrieval]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.11919 RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation]&lt;br /&gt;
* 2025-03: Microsoft: [https://www.microsoft.com/en-us/research/blog/introducing-kblam-bringing-plug-and-play-external-knowledge-to-llms/ Introducing KBLaM: Bringing plug-and-play external knowledge to LLMs]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.06266 Cartridges: Lightweight and general-purpose long context representations via self-study]&lt;br /&gt;
* 2025-07: [https://arxiv.org/pdf/2507.07957 MIRIX: Multi-Agent Memory System for LLM-Based Agents] ([https://mirix.io/ mirix])&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.16153 Memento: Fine-tuning LLM Agents without Fine-tuning LLMs]&lt;br /&gt;
&lt;br /&gt;
==Working Memory==&lt;br /&gt;
* 2024-12: [https://www.arxiv.org/abs/2412.18069 Improving Factuality with Explicit Working Memory]&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.03192 MemRL: Self-Evolving Agents via Runtime Reinforcement Learning on Episodic Memory]&lt;br /&gt;
&lt;br /&gt;
==Long-Term Memory==&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.19413 Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory]&lt;br /&gt;
&lt;br /&gt;
* 2025-12: Google [https://research.google/blog/titans-miras-helping-ai-have-long-term-memory/ Titans + Miras]&lt;br /&gt;
** [https://arxiv.org/abs/2504.13173 It&amp;#039;s All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization]&lt;br /&gt;
** [https://arxiv.org/abs/2501.00663 Titans: Learning to Memorize at Test Time]&lt;br /&gt;
&lt;br /&gt;
===Storage and Retrieval===&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.04439 ArcMemo: Abstract Reasoning Composition with Lifelong LLM Memory]&lt;br /&gt;
* 2026-01: [https://www.arxiv.org/abs/2601.07372 Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models]&lt;br /&gt;
&lt;br /&gt;
===Episodic Memory===&lt;br /&gt;
* 2024-03: [https://arxiv.org/abs/2403.11901 Larimar: Large Language Models with Episodic Memory Control]&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.16153 AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs]&lt;br /&gt;
&lt;br /&gt;
==Continual Learning==&lt;br /&gt;
* 2022-02: [https://arxiv.org/abs/2202.00275 Architecture Matters in Continual Learning]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.15103 Continual Learning via Sparse Memory Finetuning]&lt;br /&gt;
* 2025-11: [https://research.google/blog/introducing-nested-learning-a-new-ml-paradigm-for-continual-learning/ Introducing Nested Learning: A new ML paradigm for continual learning]&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.16175 Learning to Discover at Test Time]&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.19897 Self-Distillation Enables Continual Learning]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.07755 Learning to Continually Learn via Meta-learning Agentic Memory Designs]&lt;br /&gt;
&lt;br /&gt;
=Updating Weights at Inference-time=&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.06252 Transformer&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt;: Self-adaptive LLMs]&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.14143 Beyond Turing: Memory-Amortized Inference as a Foundation for Cognitive Computation]&lt;br /&gt;
* 2026-02: Sakana AI: [https://pub.sakana.ai/doc-to-lora/ Instant LLM Updates]: Train hypernetwork to generate LoRA adapters on the fly&lt;br /&gt;
** 2026-02: [https://arxiv.org/abs/2602.15902 Doc-to-LoRA: Learning to Instantly Internalize Contexts] ([https://github.com/SakanaAI/Doc-to-LoRA code])&lt;br /&gt;
** 2025-06: [https://arxiv.org/abs/2506.06105 Text-to-LoRA: Instant Transformer Adaption] ([https://github.com/SakanaAI/Text-to-LoRA code])&lt;br /&gt;
&lt;br /&gt;
==Parameters as Tokens==&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.23168 TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters] ([https://github.com/Haiyang-W/TokenFormer code])&lt;br /&gt;
&lt;br /&gt;
=Internal Thought Representation Space=&lt;br /&gt;
==Visual Thinking==&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.05874 VideoRAG: Retrieval-Augmented Generation over Video Corpus]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.07542 Imagine while Reasoning in Space: Multimodal Visualization-of-Thought]&lt;br /&gt;
&lt;br /&gt;
==Neural (non-token) Latent Representation==&lt;br /&gt;
* 2024-11: Microsoft: [https://arxiv.org/abs/2411.02820 DroidSpeak: KV Cache Sharing for Cross-LLM Communication and Multi-LLM Serving]: LLMs invent their own inter-communication language&lt;br /&gt;
* 2024-12: Meta: [https://arxiv.org/abs/2412.06769 Training Large Language Models to Reason in a Continuous Latent Space]: feeding the latent representation directly back into the model, instead of tokenizing intermediate thoughts (Chain of Continuous Thought, a.k.a. Coconut)&lt;br /&gt;
* 2024-12: Meta: [https://arxiv.org/abs/2412.08821 Large Concept Models: Language Modeling in a Sentence Representation Space]: train a model that operates at a higher level of abstraction than typical word/token LLMs; model operates in a space of concept embeddings (more akin to full sentences than individual words)&lt;br /&gt;
* 2024-12: Meta: [https://ai.meta.com/research/publications/byte-latent-transformer-patches-scale-better-than-tokens/ Byte Latent Transformer: Patches Scale Better Than Tokens]: Instead of tokenization, dynamically convert input byte-stream into patches, yielding gains in compute efficiency, with minimal loss in performance&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.13171 Compressed Chain of Thought: Efficient Reasoning Through Dense Representations]&lt;br /&gt;
* 2024-12: Google DeepMind: [https://arxiv.org/abs/2412.17747 Deliberation in Latent Space via Differentiable Cache Augmentation]&lt;br /&gt;
* 2024-12: [https://github.com/jerber/lang-jepa LANG-JEPA: Learning to Think in Latent Space]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.19201 Efficient Reasoning with Hidden Thinking] ([https://github.com/shawnricecake/Heima code])&lt;br /&gt;
* 2025-02: [https://huggingface.co/tomg-group-umd/huginn-0125 Huginn-0125]: [https://arxiv.org/abs/2502.05171 Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach] ([https://github.com/seal-rg/recurrent-pretraining code], [https://huggingface.co/tomg-group-umd/huginn-0125 model])&lt;br /&gt;
* 2025-02: Meta: [https://arxiv.org/abs/2502.08524 LLM Pretraining with Continuous Concepts] (CoCoMix)&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2505.12514 Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought]&lt;br /&gt;
&lt;br /&gt;
=Altered Transformer=&lt;br /&gt;
&lt;br /&gt;
==Tokenization==&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.19737 Better &amp;amp; Faster Large Language Models via Multi-token Prediction]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.06676 I Don&amp;#039;t Know: Explicit Modeling of Uncertainty with an &amp;lt;nowiki&amp;gt;[IDK]&amp;lt;/nowiki&amp;gt; Token]&lt;br /&gt;
* 2025-04: Meta: [https://arxiv.org/abs/2504.00927 Multi-Token Attention]&lt;br /&gt;
&lt;br /&gt;
==Generation Order==&lt;br /&gt;
* 2019-02: [https://arxiv.org/abs/1902.02192 Non-Monotonic Sequential Text Generation]&lt;br /&gt;
* 2019-04: [https://arxiv.org/abs/1904.09324 Mask-Predict: Parallel Decoding of Conditional Masked Language Models]&lt;br /&gt;
* 2019-06: [https://arxiv.org/abs/1906.09601 Sequence Generation: From Both Sides to the Middle]&lt;br /&gt;
* 2020-04: [https://arxiv.org/abs/2004.11579 Probabilistically Masked Language Model Capable of Autoregressive Generation in Arbitrary Word Order]&lt;br /&gt;
* 2021-12: [https://arxiv.org/abs/2112.10543 Spiral Language Modeling]&lt;br /&gt;
* 2023-10: [https://arxiv.org/abs/2310.09930 FiLM: Fill-in Language Models for Any-Order Generation]&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.03582 Integrating Randomness in Large Language Models: A Linear Congruential Generator Approach for Generating Clinically Relevant Content]&lt;br /&gt;
&lt;br /&gt;
==Diffusion Language Models==&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.03687 Pard: Permutation-Invariant Autoregressive Diffusion for Graph Generation]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.09992 Large Language Diffusion Models]&lt;br /&gt;
* 2025-02: [https://www.inceptionlabs.ai/ Inception Labs] [https://www.inceptionlabs.ai/news Mercury] model ([https://chat.inceptionlabs.ai/ online demo])&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.09573 Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models] ([https://m-arriola.com/bd3lms/ project], [https://github.com/kuleshov-group/bd3lms code], [https://huggingface.co/collections/kuleshov-group/bd3-lms-67be95f81b96b15fec50d53f hf])&lt;br /&gt;
* 2025-04: [https://hkunlp.github.io/blog/2025/dream/ Dream 7B: Introducing Dream 7B, the most powerful open diffusion large language model to date]&lt;br /&gt;
* 2025-04: [https://dllm-reasoning.github.io/ d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning] ([https://dllm-reasoning.github.io/media/preprint.pdf preprint], [https://github.com/dllm-reasoning/d1 code])&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.01928 Esoteric Language Models] ([https://s-sahoo.com/Eso-LMs/ project])&lt;br /&gt;
&lt;br /&gt;
===Related: Image Synthesis via Autoregression/Diffusion===&lt;br /&gt;
* 2023-10: [https://arxiv.org/abs/2310.01400 Sequential Data Generation with Groupwise Diffusion Process]&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.09470 Rolling Diffusion Models]&lt;br /&gt;
* 2024-08: [https://arxiv.org/abs/2408.11039 Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model]&lt;br /&gt;
&lt;br /&gt;
==Sampling==&lt;br /&gt;
* 2024-10: [https://github.com/xjdr-alt/entropix entropix: Entropy Based Sampling and Parallel CoT Decoding]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.01104 softmax is not enough (for sharp out-of-distribution)]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.06215 Corrector Sampling in Language Models]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.01939 Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning]&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.15260 Deep Think with Confidence] ([https://jiaweizzhao.github.io/deepconf/ project])&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.14901 Reasoning with Sampling: Your Base Model is Smarter Than You Think]&lt;br /&gt;
&lt;br /&gt;
=Daydreaming, brainstorming, pre-generation=&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.13171 Sleep-time Compute: Beyond Inference Scaling at Test-time]&lt;br /&gt;
* 2025-07: Gwern: [https://gwern.net/ai-daydreaming Daydreaming]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.01689 What LLMs Think When You Don&amp;#039;t Tell Them What to Think About?]&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Pre-generation&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.18866 Reasoning to Learn from Latent Thoughts]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.13171 Sleep-time Compute: Beyond Inference Scaling at Test-time]&lt;br /&gt;
* 2025-11: [https://inference.net/blog/project-aella Project OSSAS: Custom LLMs to process 100 Million Research Papers] ([https://huggingface.co/inference-net models], [https://aella.inference.net/embeddings visualization])&lt;br /&gt;
&lt;br /&gt;
=Missing Elements=&lt;br /&gt;
* Memory&lt;br /&gt;
* Continuous learning/update&lt;br /&gt;
* Robust contextual model&lt;br /&gt;
* Long-time-horizon coherence&lt;br /&gt;
* Fluid intelligence&lt;br /&gt;
* Agency&lt;br /&gt;
* Modeling of self&lt;br /&gt;
* [https://gwern.net/ai-daydreaming Daydreaming]&lt;br /&gt;
&lt;br /&gt;
=Memes=&lt;br /&gt;
* Andrej Karpathy: &lt;br /&gt;
** 2015-05: &amp;quot;Hallucination&amp;quot; in [https://karpathy.github.io/2015/05/21/rnn-effectiveness/ The Unreasonable Effectiveness of Recurrent Neural Networks]&lt;br /&gt;
** 2017-11: [https://karpathy.medium.com/software-2-0-a64152b37c35 Software 2.0] ([https://x.com/karpathy/status/893576281375219712 &amp;quot;Gradient descent can write code better than you. I&amp;#039;m sorry.&amp;quot;])&lt;br /&gt;
** 2022-10: [https://x.com/karpathy/status/1582807367988654081 Transformers as general-purpose differentiable computers] ([https://www.youtube.com/watch?v=9uw3F6rndnA talk])&lt;br /&gt;
** 2023-01: [https://x.com/karpathy/status/1617979122625712128 The hottest new programming language is English]&lt;br /&gt;
** 2023-09: [https://x.com/karpathy/status/1707437820045062561 LLM as kernel of a new Operating System] ([https://x.com/karpathy/status/1723140519554105733 diagram], [https://www.threads.com/@karpathy/post/CzehPtxPEF3 OS analogies])&lt;br /&gt;
** 2024-07: [https://x.com/karpathy/status/1816531576228053133 Jagged Intelligence] (c.f. [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4573321 Mollick paper])&lt;br /&gt;
** 2025-02: [https://x.com/karpathy/status/1886192184808149383 Vibe coding]&lt;br /&gt;
** 2025-06: [https://www.latent.space/p/s3 Software 3.0] ([https://www.youtube.com/watch?v=LCEmiRjPEtQ&amp;amp;t=1s talk]): &amp;quot;Prompts as Programs&amp;quot;. Software 1.0 is code; 2.0 is model weights; 3.0 is prompts.&lt;br /&gt;
** 2025-06: [https://x.com/karpathy/status/1937902205765607626 &amp;quot;Context Engineering&amp;quot; instead of &amp;quot;Prompt Engineering&amp;quot;]&lt;br /&gt;
** 2025-06: [https://x.com/karpathy/status/1938626382248149433 LLMs as &amp;quot;cognitive cores&amp;quot;]&lt;br /&gt;
** 2025-11: [https://x.com/karpathy/status/1990116666194456651?s=20 Software 1.0 easily automates what you can specify. Software 2.0 easily automates what you can verify.]&lt;br /&gt;
** 2026-01: [https://x.com/karpathy/status/2008664551445963083?s=20 The majority of the ruff ruff is people who look at the current point and people who look at the current slope]&lt;br /&gt;
** 2026-02: [https://x.com/karpathy/status/2019137879310836075 Agentic Engineering]&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
* [[Increasing AI Intelligence]]&lt;/div&gt;</summary>
		<author><name>KevinYager</name></author>
		
	</entry>
	<entry>
		<id>http://gisaxs.com/index.php?title=AI_creativity&amp;diff=8744</id>
		<title>AI creativity</title>
		<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php?title=AI_creativity&amp;diff=8744"/>
		<updated>2026-03-23T17:07:23Z</updated>

		<summary type="html">&lt;p&gt;KevinYager: /* Research */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Research=&lt;br /&gt;
* 2024-01: [https://arxiv.org/abs/2402.01727 Prompting Diverse Ideas: Increasing AI Idea Variance]&lt;br /&gt;
* 2024-01: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4686415 Creativity and AI]&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.04109 Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers]&lt;br /&gt;
* 2024-11: &amp;lt;strike&amp;gt;[https://aidantr.github.io/files/AI_innovation.pdf Artificial Intelligence, Scientific Discovery, and Product Innovation]&amp;lt;/strike&amp;gt;&lt;br /&gt;
** 2025-05: Retraction: [https://economics.mit.edu/news/assuring-accurate-research-record Assuring an accurate research record]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.02980 Surveying the Effects of Quality, Diversity, and Complexity in Synthetic Data From Large Language Models]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.14141 LLMs can realize combinatorial creativity: generating creative ideas via LLMs for scientific research]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.17596 LiveIdeaBench: Evaluating LLMs&amp;#039; Scientific Creativity and Idea Generation with Minimal Context]&lt;br /&gt;
* 2025-03: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5188231 The Cybernetic Teammate: A Field Experiment on Generative AI Reshaping Teamwork and Expertise]&lt;br /&gt;
* 2025-03: Midjourney: [https://www.arxiv.org/abs/2503.17126 Modifying Large Language Model Post-Training for Diverse Creative Writing]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.12320 Has the Creativity of Large-Language Models peaked? An analysis of inter- and intra-LLM variability]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.14442 Creative Preference Optimization]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.17241 Generative AI and Creativity: A Systematic Literature Review and Meta-Analysis]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.01171 Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.20635 Why Did Apple Fall To The Ground: Evaluating Curiosity In Large Language Models]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.22954 Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)]&lt;br /&gt;
* 2025-12: [https://www.nature.com/articles/s41562-025-02331-1 A large-scale comparison of divergent creativity in humans and large language models]&lt;br /&gt;
* 2026-01: [https://www.arxiv.org/abs/2601.01576 OpenNovelty: An LLM-powered Agentic System for Verifiable Scholarly Novelty Assessment]&lt;br /&gt;
* 2026-01: [https://www.nature.com/articles/s41598-025-25157-3 Divergent creativity in humans and large language models]&lt;br /&gt;
* 2026-03: [https://arxiv.org/abs/2603.19087 Serendipity by Design: Evaluating the Impact of Cross-domain Mappings on Human and LLM Creativity]&lt;br /&gt;
&lt;br /&gt;
=Benchmarks=&lt;br /&gt;
See: [[AI_benchmarks#Creativity| AI benchmarks &amp;gt; Creativity]]&lt;br /&gt;
&lt;br /&gt;
=Collapse=&lt;br /&gt;
* 2024-07: [https://www.nature.com/articles/s41586-024-07566-y AI models collapse when trained on recursively generated data]&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.02209 Generative Monoculture in Large Language Models]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.17646 Rate of Model Collapse in Recursive Training]&lt;br /&gt;
==Analysis==&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.04376 Scaling laws for learning with real and surrogate data]&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.07712 Model Collapse Demystified: The Case of Regression]&lt;br /&gt;
==LLM==&lt;br /&gt;
* 2023-05: [https://arxiv.org/abs/2305.17493 The Curse of Recursion: Training on Generated Data Makes Models Forget]&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.03502 AI and the Problem of Knowledge Collapse]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.13928 LLMs Can Get &amp;quot;Brain Rot&amp;quot;!]&lt;br /&gt;
==Image Models==&lt;br /&gt;
* 2023-07: [https://arxiv.org/abs/2307.01850 Self-Consuming Generative Models Go MAD]&lt;br /&gt;
* 2023-10: [https://arxiv.org/abs/2310.00429 On the Stability of Iterative Retraining of Generative Models on their own Data]&lt;br /&gt;
==Solutions==&lt;br /&gt;
* 2024-03: [https://arxiv.org/abs/2403.04706 Common 7B Language Models Already Possess Strong Math Capabilities]&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.01413 Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data]&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.07515 Beyond Model Collapse: Scaling Up with Synthesized Data Requires Verification]&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.01490 LLM See, LLM Do: Guiding Data Generation to Target Non-Differentiable Objectives]&lt;br /&gt;
* 2024-08: [https://arxiv.org/abs/2408.14960 Multilingual Arbitrage: Optimizing Data Pools to Accelerate Multilingual Progress]&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
* [[AI_benchmarks|AI benchmarks]] &amp;gt; [[AI_benchmarks#Assess_Specific_Attributes|Assess Specific Attributes]] &amp;gt; [[AI_benchmarks#Creativity|Creativity]]&lt;br /&gt;
* [[AI_and_Humans|AI and Humans]] &amp;gt; [[AI_and_Humans#AI_out-performs_humans|AI out-performs humans]] &amp;gt; [[AI_and_Humans#Creativity|Creativity]]&lt;br /&gt;
* [[AI_and_Humans|AI and Humans]] &amp;gt; [[AI_and_Humans#AI_improves_human_work|AI improves human work]] &amp;gt; Creativity&lt;/div&gt;</summary>
		<author><name>KevinYager</name></author>
		
	</entry>
	<entry>
		<id>http://gisaxs.com/index.php?title=AI_and_Humans&amp;diff=8743</id>
		<title>AI and Humans</title>
		<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php?title=AI_and_Humans&amp;diff=8743"/>
		<updated>2026-03-23T17:07:15Z</updated>

		<summary type="html">&lt;p&gt;KevinYager: /* Creativity */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
=AI in Education=&lt;br /&gt;
==Survey/study of==&lt;br /&gt;
* 2023-08: [https://www.nature.com/articles/s41598-023-38964-3 Perception, performance, and detectability of conversational artificial intelligence across 32 university courses]&lt;br /&gt;
* 2023-10: [https://www.bbc.com/worklife/article/20231017-the-employees-secretly-using-ai-at-work Employees] secretly using AI at work.&lt;br /&gt;
* 2023-10: [https://www.insidehighered.com/news/tech-innovation/artificial-intelligence/2023/10/31/most-students-outrunning-faculty-ai-use Survey] shows students using AI more than professors.&lt;br /&gt;
* 2023-11: [https://www.nature.com/articles/d41586-023-03507-3 ChatGPT has entered the classroom: how LLMs could transform education]&lt;br /&gt;
* 2025-04: [https://www.anthropic.com/news/anthropic-education-report-how-university-students-use-claude Anthropic Education Report: How University Students Use Claude]&lt;br /&gt;
* 2025-05: [https://www.nature.com/articles/s41599-025-04787-y The effect of ChatGPT on students’ learning performance, learning perception, and higher-order thinking: insights from a meta-analysis]&lt;br /&gt;
&lt;br /&gt;
==AI improves learning/education==&lt;br /&gt;
*  Mollick, Ethan R. and Mollick, Lilach and Bach, Natalie and Ciccarelli, LJ and Przystanski, Ben and Ravipinto, Daniel, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4871171 AI Agents and Education: Simulated Practice at Scale] (June 17, 2024). The Wharton School Research Paper. [http://dx.doi.org/10.2139/ssrn.4871171 doi: 10.2139/ssrn.4871171]&lt;br /&gt;
** Can enable personalized education.&lt;br /&gt;
* [https://arxiv.org/abs/2306.17156 Generative AI for Programming Education: Benchmarking ChatGPT, GPT-4, and Human Tutors]&lt;br /&gt;
** GPT4 can out-perform human tutors.&lt;br /&gt;
*  Keppler, Samantha and Sinchaisri, Wichinpong and Snyder, Clare, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4924786 Backwards Planning with Generative AI: Case Study Evidence from US K12 Teachers] (August 13, 2024). [http://dx.doi.org/10.2139/ssrn.4924786 doi: 10.2139/ssrn.4924786]&lt;br /&gt;
** Teachers benefit from using AI as a co-pilot to aid in tasks (planning, how to teach topic, explore ideas).&lt;br /&gt;
** There is smaller utility in using AI purely as a text-generator (to make quizzes, workbooks, etc.).&lt;br /&gt;
* [https://arxiv.org/abs/2402.09809 Effective and Scalable Math Support: Evidence on the Impact of an AI Tutor on Math Achievement in Ghana]&lt;br /&gt;
* [https://doi.org/10.21203/rs.3.rs-4243877/v1 AI Tutoring Outperforms Active Learning]&lt;br /&gt;
* [https://documents.worldbank.org/en/publication/documents-reports/documentdetail/099548105192529324 From chalkboards to chatbots: Transforming learning in Nigeria, one prompt at a time] ([https://blogs.worldbank.org/en/education/From-chalkboards-to-chatbots-Transforming-learning-in-Nigeria writeup])&lt;br /&gt;
** 6 weeks of after-school AI tutoring = 2 years of typical learning gains&lt;br /&gt;
** outperforms 80% of other educational interventions&lt;br /&gt;
* [https://arxiv.org/abs/2409.09047 AI Meets the Classroom: When Do Large Language Models Harm Learning?]&lt;br /&gt;
** Outcomes depend on usage&lt;br /&gt;
* [https://www.deeplearning.ai/the-batch/gpt-4-boosts-remote-tutors-performance-in-real-time-study-finds/ LLM Support for Tutors GPT-4 boosts remote tutors’ performance in real time, study finds]&lt;br /&gt;
** [https://arxiv.org/abs/2410.03017 Tutor CoPilot: A Human-AI Approach for Scaling Real-Time Expertise]&lt;br /&gt;
* 2025-06: Gallup &amp;amp; The Walton Foundation: [https://www.gallup.com/file/analytics/691922/Walton-Family-Foundation-Gallup-Teachers-AI-Report.pdf Teaching for Tomorrow: Unlocking Six Weeks a Year With AI]&lt;br /&gt;
* 2026-03: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6423358 Effective Personalized AI Tutors via LLM-Guided Reinforcement Learning]&lt;br /&gt;
&lt;br /&gt;
==AI harms learning==&lt;br /&gt;
* [https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0305354 A real-world test of artificial intelligence infiltration of a university examinations system: A “Turing Test” case study]&lt;br /&gt;
** Current grading systems cannot detect AI.&lt;br /&gt;
*  Bastani, Hamsa and Bastani, Osbert and Sungu, Alp and Ge, Haosen and Kabakcı, Özge and Mariman, Rei, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4895486 Generative AI Can Harm Learning] (July 15, 2024). The Wharton School Research Paper. [http://dx.doi.org/10.2139/ssrn.4895486 doi: 10.2139/ssrn.4895486]&lt;br /&gt;
** Access to ChatGPT harmed math education outcomes.&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.09047 AI Meets the Classroom: When Does ChatGPT Harm Learning?]&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.20245 How AI Impacts Skill Formation]&lt;br /&gt;
&lt;br /&gt;
==Software/systems==&lt;br /&gt;
* [https://devpost.com/software/gptutor GPTutor] ([https://github.com/mynamegabe/GPTutor code])&lt;br /&gt;
* [https://arxiv.org/abs/2308.02773 EduChat: A Large-Scale Language Model-based Chatbot System for Intelligent Education]&lt;br /&gt;
* [https://eurekalabs.ai/ Eureka Labs] (founded by [https://en.wikipedia.org/wiki/Andrej_Karpathy Andrej Karpathy]) aims to create AI-driven courses (first course is [https://github.com/karpathy/LLM101n Intro to LLMs])&lt;br /&gt;
&lt;br /&gt;
===LLMs===&lt;br /&gt;
* 2024-12: [https://www.arxiv.org/abs/2412.16429 LearnLM: Improving Gemini for Learning]&lt;br /&gt;
&lt;br /&gt;
===Individual tools===&lt;br /&gt;
* Chatbot (OpenAI [https://chatgpt.com/ ChatGPT], Anthropic [https://www.anthropic.com/claude Claude], Google [https://gemini.google.com/app Gemini])&lt;br /&gt;
* [https://notebooklm.google.com/ NotebookLM]: Enables one to &amp;quot;chat with documents&amp;quot;.&lt;br /&gt;
* Google [https://learning.google.com/experiments/learn-about/signup Learn About]&lt;br /&gt;
&lt;br /&gt;
===Systems===&lt;br /&gt;
* [https://www.anthropic.com/news/introducing-claude-for-education Anthropic] [https://www.anthropic.com/education Claude for Education]&lt;br /&gt;
&lt;br /&gt;
==AI for grading==&lt;br /&gt;
* [https://dl.acm.org/doi/10.1145/3657604.3664693 Can Large Language Models Make the Grade? An Empirical Study Evaluating LLMs Ability To Mark Short Answer Questions in K-12 Education] ([https://arxiv.org/abs/2405.02985 preprint])&lt;br /&gt;
&lt;br /&gt;
==Detection==&lt;br /&gt;
* 2024-06: [https://www.sciencedirect.com/science/article/pii/S2666920X24000109 Do teachers spot AI? Evaluating the detectability of AI-generated texts among student essays]&lt;br /&gt;
** GenAI can simulate student writing in a way that teachers cannot detect.&lt;br /&gt;
** AI essays are assessed more positively than student-written.&lt;br /&gt;
** Teachers are overconfident in their source identification.&lt;br /&gt;
** Both novice and experienced teachers could not identify texts generated by ChatGPT vs. students&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.15654 People who frequently use ChatGPT for writing tasks are accurate and robust detectors of AI-generated text]&lt;br /&gt;
===AI Text Detectors Don&amp;#039;t Work===&lt;br /&gt;
* 2024-05: [https://arxiv.org/abs/2405.07940 RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors]&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2306.15666 Testing of Detection Tools for AI-Generated Text]&lt;br /&gt;
&lt;br /&gt;
=AI/human=&lt;br /&gt;
==Capabilities==&lt;br /&gt;
===Writing===&lt;br /&gt;
&lt;br /&gt;
* 2022-12: [https://aclanthology.org/2022.emnlp-main.296/ Re3: Generating Longer Stories With Recursive Reprompting and Revision]&lt;br /&gt;
* 2023-01: Journalism: [https://journals.sagepub.com/doi/10.1177/10776958221149577 Collaborating With ChatGPT: Considering the Implications of Generative Artificial Intelligence for Journalism and Media Education]&lt;br /&gt;
* 2023-03: English essays: [https://journal.unnes.ac.id/sju/index.php/elt/article/view/64069 Artificial intelligence (AI) technology in OpenAI ChatGPT application: A review of ChatGPT in writing English essay]&lt;br /&gt;
* 2023-07: Science writing: [https://www.rbmojournal.com/article/S1472-6483(23)00219-5/fulltext Artificial intelligence in scientific writing: a friend or a foe?]&lt;br /&gt;
* 2024-02: Wikipedia style: [https://arxiv.org/abs/2402.14207 Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models]&lt;br /&gt;
* 2024-08: [https://arxiv.org/abs/2408.07055 LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs] ([https://github.com/THUDM/LongWriter code])&lt;br /&gt;
* 2024-08: Scientific papers: [https://arxiv.org/abs/2408.06292 The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery]&lt;br /&gt;
* 2024-09: PaperQA2: [https://paper.wikicrow.ai/ Language Models Achieve Superhuman Synthesis of Scientific Knowledge] ([https://x.com/SGRodriques/status/1833908643856818443 𝕏 post], [https://github.com/Future-House/paper-qa code])&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.19065 WikiAutoGen: Towards Multi-Modal Wikipedia-Style Article Generation]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.22828 Learning to Reason for Long-Form Story Generation]&lt;br /&gt;
&lt;br /&gt;
==AI out-performs humans==&lt;br /&gt;
===Tests===&lt;br /&gt;
* 2023-07: [https://arxiv.org/abs/2307.10635 SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models]&lt;br /&gt;
* 2024-06: [https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0305354 A real-world test of artificial intelligence infiltration of a university examinations system: A “Turing Test” case study] &lt;br /&gt;
** AI scores higher than median students.&lt;br /&gt;
&lt;br /&gt;
===Creativity===&lt;br /&gt;
* See also: [[AI creativity]]&lt;br /&gt;
* 2023-07: [https://mackinstitute.wharton.upenn.edu/wp-content/uploads/2023/08/LLM-Ideas-Working-Paper.pdf Ideas Are Dimes A Dozen: Large Language Models For Idea Generation In Innovation]&lt;br /&gt;
* 2023-09: [https://www.nature.com/articles/s41598-023-40858-3 Best humans still outperform artificial intelligence in a creative divergent thinking task]&lt;br /&gt;
** Best humans out-perform AI at creativity. (By implication, median humans may not.)&lt;br /&gt;
* 2024-02: [https://www.nature.com/articles/s41598-024-53303-w The current state of artificial intelligence generative language models is more creative than humans on divergent thinking tasks]&lt;br /&gt;
* 2024-02: Felin, Teppo and Holweg, Matthias, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4737265 Theory Is All You Need: AI, Human Cognition, and Causal Reasoning] (February 24, 2024). [http://dx.doi.org/10.2139/ssrn.4737265 doi: 10.2139/ssrn.4737265]&lt;br /&gt;
** Argues that human &amp;quot;theory-based&amp;quot; creativity is better than AI &amp;quot;data-based&amp;quot;.&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.01119 Pron vs Prompt: Can Large Language Models already Challenge a World-Class Fiction Author at Creative Text Writing?]&lt;br /&gt;
** Top human (professional author) out-performs GPT4.&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.04109 Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers]&lt;br /&gt;
** LLMs can be creative&lt;br /&gt;
* 2024-09: [https://docs.iza.org/dp17302.pdf Creative and Strategic Capabilities of Generative AI: Evidence from Large-Scale Experiments]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.00794 Predicting Empirical AI Research Outcomes with Language Models]&lt;br /&gt;
* 2026-03: [https://arxiv.org/abs/2603.19087 Serendipity by Design: Evaluating the Impact of Cross-domain Mappings on Human and LLM Creativity]&lt;br /&gt;
&lt;br /&gt;
===Art===&lt;br /&gt;
* 2024-11: [https://doi.org/10.1038/s41598-024-76900-1 AI-generated poetry is indistinguishable from human-written poetry and is rated more favorably]&lt;br /&gt;
* 2024-11: [https://www.astralcodexten.com/p/how-did-you-do-on-the-ai-art-turing How Did You Do On The AI Art Turing Test?]&lt;br /&gt;
&lt;br /&gt;
===Business &amp;amp; Marketing===&lt;br /&gt;
* 2023-11: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4597899 The power of generative marketing: Can generative AI create superhuman visual marketing content?]&lt;br /&gt;
* 2024-02: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4714776 Generative Artificial Intelligence and Evaluating Strategic Decisions]&lt;br /&gt;
&lt;br /&gt;
===Professions===&lt;br /&gt;
* [https://agi.safe.ai/submit Humanity&amp;#039;s Last Exam]&lt;br /&gt;
** [https://x.com/alexandr_wang/status/1835738937719140440 Effort to build] a dataset of challenging (but resolvable) questions in specific domain areas, to act as a benchmark to test whether AIs are improving in these challenging topics.&lt;br /&gt;
&lt;br /&gt;
====Coding====&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.06807 Competitive Programming with Large Reasoning Models]&lt;br /&gt;
&lt;br /&gt;
====Medical====&lt;br /&gt;
* 2024-03: [https://www.medrxiv.org/content/10.1101/2024.03.12.24303785v1 Influence of a Large Language Model on Diagnostic Reasoning: A Randomized Clinical Vignette Study]&lt;br /&gt;
** GPT4 improves medical practitioner work; surprisingly, GPT4 alone scored better than a human with GPT4 as aid (on selected tasks).&lt;br /&gt;
* 2024-10: [https://doi.org/10.1001/jamanetworkopen.2024.38535 Perspectives on Artificial Intelligence–Generated Responses to Patient Messages]&lt;br /&gt;
* 2024-10: [https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2825395 Large Language Model Influence on Diagnostic Reasoning; A Randomized Clinical Trial]&lt;br /&gt;
**  Use of ChatGPT does not strongly improve medical expert work; but AI alone out-scores human or human+AI&lt;br /&gt;
* 2024-11: [https://www.nature.com/articles/s41562-024-02046-9 Large language models surpass human experts in predicting neuroscience results] (writeup: [https://medicalxpress.com/news/2024-11-ai-neuroscience-results-human-experts.html AI can predict neuroscience study results better than human experts, study finds])&lt;br /&gt;
* 2024-12: [https://www.arxiv.org/abs/2412.10849 Superhuman performance of a large language model on the reasoning tasks of a physician]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.18925 HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs]&lt;br /&gt;
* 2025-02: Media:&lt;br /&gt;
** NY Times: [https://www.nytimes.com/2025/02/02/opinion/ai-doctors-medicine.html The Robot Doctor Will See You Now]&lt;br /&gt;
** [https://erictopol.substack.com/p/when-doctors-with-ai-are-outperformed When Doctors With A.I. Are Outperformed by A.I. Alone]&lt;br /&gt;
* 2025-02: [https://www.nature.com/articles/s41591-024-03456-y GPT-4 assistance for improvement of physician performance on patient care tasks: a randomized controlled trial]&lt;br /&gt;
* 2025-02: [https://www.nature.com/articles/s41591-025-03517-w Artificial intelligence for individualized treatment of persistent atrial fibrillation: a randomized controlled trial]&lt;br /&gt;
* Google AI Clinician:&lt;br /&gt;
** 2024-01: [https://arxiv.org/abs/2401.05654 Towards Conversational Diagnostic AI] ([https://research.google/blog/amie-a-research-ai-system-for-diagnostic-medical-reasoning-and-conversations/ blog]: Articulate Medical Intelligence Explorer, AMIE)&lt;br /&gt;
** 2025-03: [https://www.gstatic.com/amie/towards_conversational_ai_for_disease_management.pdf Towards Conversational AI for Disease Management] ([https://research.google/blog/from-diagnosis-to-treatment-advancing-amie-for-longitudinal-disease-management/ blog])&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.19655 Med-RLVR: Emerging Medical Reasoning from a 3B base model via reinforcement Learning]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.13939 Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models]&lt;br /&gt;
* 2025-04: [https://www.acpjournals.org/doi/10.7326/ANNALS-24-03283 Comparison of Initial Artificial Intelligence (AI) and Final Physician Recommendations in AI-Assisted Virtual Urgent Care Visits]&lt;br /&gt;
* 2025-04: [https://www.nature.com/articles/s41586-025-08866-7?linkId=13898052 Towards conversational diagnostic artificial intelligence]&lt;br /&gt;
* 2025-04: [https://www.nature.com/articles/s41586-025-08869-4?linkId=13898054 Towards accurate differential diagnosis with large language models]&lt;br /&gt;
* 2025-06: [https://www.medrxiv.org/content/10.1101/2025.06.13.25329541v1 Automation of Systematic Reviews with Large Language Models]&lt;br /&gt;
* 2025-06: [https://microsoft.ai/new/the-path-to-medical-superintelligence/ The Path to Medical Superintelligence]&lt;br /&gt;
* 2025-08: [https://www.nature.com/articles/s41591-025-03888-0 A personal health large language model for sleep and fitness coaching]&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.08224 Capabilities of GPT-5 on Multimodal Medical Reasoning]&lt;br /&gt;
&lt;br /&gt;
====Bio====&lt;br /&gt;
* 2025-04: [https://www.virologytest.ai/vct_paper.pdf Virology Capabilities Test (VCT): A Multimodal Virology Q&amp;amp;A Benchmark]&lt;br /&gt;
** Time: [https://time.com/7279010/ai-virus-lab-biohazard-study/ Exclusive: AI Outsmarts Virus Experts in the Lab, Raising Biohazard Fears]&lt;br /&gt;
** AI Frontiers: [https://www.ai-frontiers.org/articles/ais-are-disseminating-expert-level-virology-skills AIs Are Disseminating Expert-Level Virology Skills]&lt;br /&gt;
&lt;br /&gt;
====Therapy====&lt;br /&gt;
* 2025-02: [https://journals.plos.org/mentalhealth/article?id=10.1371/journal.pmen.0000145 When ELIZA meets therapists: A Turing test for the heart and mind]&lt;br /&gt;
* 2025-03: Therabot: [https://ai.nejm.org/doi/full/10.1056/AIoa2400802 Randomized Trial of a Generative AI Chatbot for Mental Health Treatment]&lt;br /&gt;
&lt;br /&gt;
====Financial====&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.17866 Financial Statement Analysis with Large Language Models]&lt;br /&gt;
&lt;br /&gt;
====HR====&lt;br /&gt;
* 2025-08: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5395709 Voice AI in Firms: A Natural Field Experiment on Automated Job Interviews]&lt;br /&gt;
&lt;br /&gt;
==AI improves human work==&lt;br /&gt;
* 2023-07: [https://www.science.org/doi/10.1126/science.adh2586 Experimental evidence on the productivity effects of generative artificial intelligence]&lt;br /&gt;
* 2023-09:  Dell&amp;#039;Acqua, Fabrizio and McFowland III, Edward and Mollick, Ethan R. and Lifshitz-Assaf, Hila and Kellogg, Katherine and Rajendran, Saran and Krayer, Lisa and Candelon, François and Lakhani, Karim R., [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4573321 Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality] (September 15, 2023). Harvard Business School Technology &amp;amp; Operations Mgt. Unit Working Paper No. 24-013, The Wharton School Research Paper [http://dx.doi.org/10.2139/ssrn.4573321 doi: 10.2139/ssrn.4573321]&lt;br /&gt;
* 2023-11: [https://www.nber.org/papers/w31161 Generative AI at Work] (National Bureau of Economic Research)&lt;br /&gt;
* 2023-12: [https://osf.io/hdjpk The Uneven Impact of Generative AI on Entrepreneurial Performance] ([https://doi.org/10.31219/osf.io/hdjpk doi: 10.31219/osf.io/hdjpk])&lt;br /&gt;
* 2023-12: [https://arxiv.org/abs/2312.05481 Artificial Intelligence in the Knowledge Economy]: Non-autonomous AI (chatbots) benefits the least knowledgeable workers; autonomous agents benefit the most knowledgeable workers&lt;br /&gt;
* 2024-07: [https://www.microsoft.com/en-us/research/publication/generative-ai-in-real-world-workplaces/ Generative AI in Real-World Workplaces: The Second Microsoft Report on AI and Productivity Research]&lt;br /&gt;
* 2025-03: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5188231 The Cybernetic Teammate: A Field Experiment on Generative AI Reshaping Teamwork and Expertise]&lt;br /&gt;
** 2025-03: Ethan Mollick: [https://www.oneusefulthing.org/p/the-cybernetic-teammate The Cybernetic Teammate]: Having an AI on your team can increase performance, provide expertise, and improve your experience&lt;br /&gt;
* 2025-09: [https://osf.io/preprints/psyarxiv/vbkmt_v1 Quantifying Human-AI Synergy]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.12049 Generative AI and Firm Productivity: Field Experiments in Online Retail]&lt;br /&gt;
* 2025-10: Wharton: [https://knowledge.wharton.upenn.edu/special-report/2025-ai-adoption-report/ 2025 AI Adoption Report] (75% report positive ROI)&lt;br /&gt;
&lt;br /&gt;
===Coding===&lt;br /&gt;
* 2023-02: [https://arxiv.org/abs/2302.06590 The Impact of AI on Developer Productivity: Evidence from GitHub Copilot]&lt;br /&gt;
* 2024-09: Cui, Zheyuan and Demirer, Mert and Jaffe, Sonia and Musolff, Leon and Peng, Sida and Salz, Tobias, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4945566 The Effects of Generative AI on High Skilled Work: Evidence from Three Field Experiments with Software Developers] (September 03, 2024). [http://dx.doi.org/10.2139/ssrn.4945566 doi: 10.2139/ssrn.4945566]&lt;br /&gt;
* 2024-11: Hoffmann, Manuel and Boysel, Sam and Nagle, Frank and Peng, Sida and Xu, Kevin, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5007084 Generative AI and the Nature of Work] (October 27, 2024). Harvard Business School Strategy Unit Working Paper No. 25-021, Harvard Business Working Paper No. 25-021, [http://dx.doi.org/10.2139/ssrn.5007084 doi: 10.2139/ssrn.5007084]&lt;br /&gt;
* 2025-07: METR: [https://arxiv.org/abs/2507.09089 Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity] (AI tools led to lower performance)&lt;br /&gt;
** 2026-02: [https://metr.org/blog/2026-02-24-uplift-update/ We are Changing our Developer Productivity Experiment Design] (AI tools now lead to improved performance)&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.19708 Intuition to Evidence: Measuring AI&amp;#039;s True Impact on Developer Productivity]&lt;br /&gt;
&lt;br /&gt;
===Forecasting===&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.07862 AI-Augmented Predictions: LLM Assistants Improve Human Forecasting Accuracy]&lt;br /&gt;
&lt;br /&gt;
===Finance===&lt;br /&gt;
* 2024-12: [https://dx.doi.org/10.2139/ssrn.5075727 AI, Investment Decisions, and Inequality]: Novices see improvements in investment performance; sophisticated investors see even greater improvements.&lt;br /&gt;
&lt;br /&gt;
===Law===&lt;br /&gt;
* 2025-03: [https://ssrn.com/abstract=5162111 AI-Powered Lawyering: AI Reasoning Models, Retrieval Augmented Generation, and the Future of Legal Practice]&lt;br /&gt;
&lt;br /&gt;
===Science===&lt;br /&gt;
* 2025-12: [https://www.science.org/doi/abs/10.1126/science.adw3000 Scientific production in the era of large language models]&lt;br /&gt;
* 2026-01: [https://www.nature.com/articles/s41586-025-09922-y Artificial intelligence tools expand scientists’ impact but contract science’s focus]&lt;br /&gt;
* 2026-01: [https://www.anthropic.com/news/accelerating-scientific-research How scientists are using Claude to accelerate research and discovery]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.03837 Accelerating Scientific Research with Gemini: Case Studies and Common Techniques]&lt;br /&gt;
* 2026-03: [https://www.pnas.org/doi/10.1073/pnas.2533676123 Expert evaluation of LLM world models: A high-Tc superconductivity case study] ([https://research.google/blog/testing-llms-on-superconductivity-research-questions/ blog])&lt;br /&gt;
&lt;br /&gt;
===Medical===&lt;br /&gt;
* 2025-03: [https://www.medrxiv.org/content/10.1101/2025.02.28.25323115v1.full Medical Hallucination in Foundation Models and Their Impact on Healthcare]&lt;br /&gt;
* 2025-03: [https://journals.lww.com/international-journal-of-surgery/fulltext/2025/03000/chatgpt_s_role_in_alleviating_anxiety_in_total.20.aspx ChatGPT’s role in alleviating anxiety in total knee arthroplasty consent process: a randomized controlled trial pilot study]&lt;br /&gt;
* 2025-05: [https://openai.com/index/healthbench/ Introducing HealthBench]&lt;br /&gt;
* 2025-06: [https://www.medrxiv.org/content/10.1101/2025.06.07.25329176v1 From Tool to Teammate: A Randomized Controlled Trial of Clinician-AI Collaborative Workflows for Diagnosis]&lt;br /&gt;
* 2025-06: [https://bmcmededuc.biomedcentral.com/articles/10.1186/s12909-025-07414-1 Iteratively refined ChatGPT outperforms clinical mentors in generating high-quality interprofessional education clinical scenarios: a comparative study]&lt;br /&gt;
* 2025-07: [https://cdn.openai.com/pdf/a794887b-5a77-4207-bb62-e52c900463f1/penda_paper.pdf AI-based Clinical Decision Support for Primary Care: A Real-World Study] ([https://openai.com/index/ai-clinical-copilot-penda-health/ blog])&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.15743 Towards physician-centered oversight of conversational diagnostic AI]&lt;br /&gt;
* 2026-01: [https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(25)02464-X/abstract Interval cancer, sensitivity, and specificity comparing AI-supported mammography screening with standard double reading without AI in the MASAI study: a randomised, controlled, non-inferiority, single-blinded, population-based, screening-accuracy trial]&lt;br /&gt;
&lt;br /&gt;
===Translation===&lt;br /&gt;
* 2025-01: [https://simonwillison.net/2025/Feb/2/workflow-for-translation/ A professional workflow for translation using LLMs] ([https://news.ycombinator.com/item?id=42897856 based on this])&lt;br /&gt;
&lt;br /&gt;
===Customer service===&lt;br /&gt;
* 2023-11: [https://www.nber.org/papers/w31161 Generative AI at Work]: Improvements for workers and clients (though also a ceiling to improvement)&lt;br /&gt;
&lt;br /&gt;
===Creativity===&lt;br /&gt;
* See also: [[AI creativity]]&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.01727 Prompting Diverse Ideas: Increasing AI Idea Variance]&lt;br /&gt;
* 2024-07: [https://www.science.org/doi/10.1126/sciadv.adn5290 Generative AI enhances individual creativity but reduces the collective diversity of novel content]&lt;br /&gt;
* 2024-08: [https://www.nature.com/articles/s41562-024-01953-1 An empirical investigation of the impact of ChatGPT on creativity]&lt;br /&gt;
** 2024-08: Response: [https://www.nature.com/articles/s41562-025-02173-x ChatGPT decreases idea diversity in brainstorming] ([https://www.nature.com/articles/s41562-025-02173-x.epdf?sharing_token=LA9NyDHj7y5WN8zvb5Qm49RgN0jAjWel9jnR3ZoTv0Nl8PrpXFkjZ93XvmUVBgB9Hlfro5Yo6YELr-pRqbpk3HaZENCvsfV8G1kwtTEj2oW1g87dSVT4BzrfCu3jS_606SLzmoDuDiALChY-MozVM4Pj1b4Vdf-YaIH5p3lfAnM%3D pdf])&lt;br /&gt;
** 2025-05: Response: [https://www.nature.com/articles/s41562-025-02195-5 Reply to: ChatGPT decreases idea diversity in brainstorming]&lt;br /&gt;
* 2024-08: [https://doi.org/10.1287/orsc.2023.18430 The Crowdless Future? Generative AI and Creative Problem-Solving]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.03703 Human Creativity in the Age of LLMs]&lt;br /&gt;
* 2024-11: &amp;lt;strike&amp;gt;[https://conference.nber.org/conf_papers/f210475.pdf Artificial Intelligence, Scientific Discovery, and Product Innovation]&amp;lt;/strike&amp;gt;: diffusion model increases &amp;quot;innovation&amp;quot; (patents), boosts the best performers, but also removes some enjoyable tasks.&lt;br /&gt;
** 2025-05: Retraction: [https://economics.mit.edu/news/assuring-accurate-research-record Assuring an accurate research record]&lt;br /&gt;
* 2024-12: [https://doi.org/10.1080/10400419.2024.2440691 Using AI to Generate Visual Art: Do Individual Differences in Creativity Predict AI-Assisted Art Quality?] ([https://osf.io/preprints/psyarxiv/ygzw6 preprint]): shows that more creative humans produce more creative genAI outputs&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.11433 One Does Not Simply Meme Alone: Evaluating Co-Creativity Between LLMs and Humans in the Generation of Humor]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.17241 Generative AI and Creativity: A Systematic Literature Review and Meta-Analysis]&lt;br /&gt;
&lt;br /&gt;
===Equity===&lt;br /&gt;
* 2025-01: [https://ai.nejm.org/doi/full/10.1056/AIp2400889 Using Large Language Models to Promote Health Equity]&lt;br /&gt;
&lt;br /&gt;
==AI worse than humans==&lt;br /&gt;
* 2025-04: [https://spinup-000d1a-wp-offload-media.s3.amazonaws.com/faculty/wp-content/uploads/sites/27/2025/03/AI-debt-collection-20250331.pdf How Good is AI at Twisting Arms? Experiments in Debt Collection]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.18919 Clinical knowledge in LLMs does not translate to human interactions]&lt;br /&gt;
* 2025-05: [https://royalsocietypublishing.org/doi/10.1098/rsos.241776 Generalization bias in large language model summarization of scientific research]&lt;br /&gt;
&lt;br /&gt;
==AI lowers human capability==&lt;br /&gt;
* 2025-07: METR: [https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity] ([https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/ blog], [https://secondthoughts.ai/p/ai-coding-slowdown commentary/analysis])&lt;br /&gt;
** 2026-02: [https://metr.org/blog/2026-02-24-uplift-update/ We are Changing our Developer Productivity Experiment Design] (AI tools [https://x.com/METR_Evals/status/2026355544668385373?s=20 now] lead to improved performance)&lt;br /&gt;
* 2026-02: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6097646 Thinking—Fast, Slow, and Artificial: How AI is Reshaping Human Reasoning and the Rise of Cognitive Surrender]&lt;br /&gt;
&lt;br /&gt;
==Human Perceptions of AI==&lt;br /&gt;
* 2023-09: [https://www.nature.com/articles/d41586-023-02980-0 AI and science: what 1,600 researchers think. A Nature survey finds that scientists are concerned, as well as excited, by the increasing use of artificial-intelligence tools in research.]&lt;br /&gt;
* 2024-11: [https://doi.org/10.1016/S2589-7500(24)00202-4 Attitudes and perceptions of medical researchers towards the use of artificial intelligence chatbots in the scientific process: an international cross-sectional survey] (Nature commentary: [https://www.nature.com/articles/s41592-024-02369-5 Quest for AI literacy])&lt;br /&gt;
* 2025-03: [https://www.arxiv.org/abs/2503.16458 Users Favor LLM-Generated Content -- Until They Know It&amp;#039;s AI]&lt;br /&gt;
&lt;br /&gt;
===AI passes Turing Test===&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Text Dialog&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* 2023-05: [https://arxiv.org/abs/2305.20010 Human or Not? A Gamified Approach to the Turing Test]&lt;br /&gt;
* 2023-10: [https://arxiv.org/abs/2310.20216 Does GPT-4 pass the Turing test?]&lt;br /&gt;
* 2024-05: [https://arxiv.org/abs/2405.08007 People cannot distinguish GPT-4 from a human in a Turing test]&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.08853 GPT-4 is judged more human than humans in displaced and inverted Turing tests]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.23674 Large Language Models Pass the Turing Test]&lt;br /&gt;
* 2025-04: [https://www.sciencedirect.com/science/article/abs/pii/S0022103117303980 A Minimal Turing Test]&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Art&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* 2024-11: [https://www.astralcodexten.com/p/how-did-you-do-on-the-ai-art-turing How Did You Do On The AI Art Turing Test?] Differentiation was only slightly above random (60%). AI art was often ranked higher than human-made.&lt;br /&gt;
* 2024-11: [https://doi.org/10.1038/s41598-024-76900-1 AI-generated poetry is indistinguishable from human-written poetry and is rated more favorably]&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.25601 Echoes of Humanity: Exploring the Perceived Humanness of AI Music]&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Imagery&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* 2026-02: [https://bpspsychub.onlinelibrary.wiley.com/doi/10.1111/bjop.70063 Too good to be true: Synthetic AI faces are more average than real faces and super-recognizers know it]&lt;br /&gt;
** [https://www.unsw.edu.au/newsroom/news/2026/02/humans-overconfident-telling-AI-faces-real-faces-people-fake People are overconfident about spotting AI faces, study finds]&lt;br /&gt;
&lt;br /&gt;
=Uptake=&lt;br /&gt;
* 2023-07: [https://doi.org/10.9734/ajrcos/2023/v16i4392 ChatGPT: Early Adopters, Teething Issues and the Way Forward]&lt;br /&gt;
* 2024-03: [https://arxiv.org/abs/2403.07183 Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews]&lt;br /&gt;
* 2024-05:  Humlum, Anders and Vestergaard, Emilie, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4827166 The Adoption of ChatGPT]. IZA Discussion Paper No. 16992 [http://dx.doi.org/10.2139/ssrn.4827166 doi: 10.2139/ssrn.4827166]&lt;br /&gt;
* 2024-06: Kellogg, Katherine and Lifshitz-Assaf, Hila and Randazzo, Steven and Mollick, Ethan R. and Dell&amp;#039;Acqua, Fabrizio and McFowland III, Edward and Candelon, Francois and Lakhani, Karim R., [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4857373 Don&amp;#039;t Expect Juniors to Teach Senior Professionals to Use Generative AI: Emerging Technology Risks and Novice AI Risk Mitigation Tactics] (June 03, 2024). Harvard Business School Technology &amp;amp; Operations Mgt. Unit Working Paper 24-074, Harvard Business Working Paper No. 24-074, The Wharton School Research Paper [http://dx.doi.org/10.2139/ssrn.4857373 doi: 10.2139/ssrn.4857373]&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.07016 Delving into ChatGPT usage in academic writing through excess vocabulary]&lt;br /&gt;
* 2024-09: [https://static1.squarespace.com/static/60832ecef615231cedd30911/t/66f0c3fbabdc0a173e1e697e/1727054844024/BBD_GenAI_NBER_Sept2024.pdf The Rapid Adoption of Generative AI]&lt;br /&gt;
* 2024-10: [https://ai.wharton.upenn.edu/focus-areas/human-technology-interaction/2024-ai-adoption-report/ Growing Up: Navigating Generative AI’s Early Years – AI Adoption Report] ([https://ai.wharton.upenn.edu/wp-content/uploads/2024/10/AI-Report_Executive-Summary.pdf executive summary], [https://ai.wharton.upenn.edu/wp-content/uploads/2024/10/AI-Report_Full-Report.pdf full report])&lt;br /&gt;
** 72% of leaders use genAI at least once a week (cf. 23% in 2023); 90% agree AI enhances skills (cf. 80% in 2023)&lt;br /&gt;
** Spending on genAI is up 130% (most companies plan to invest going forward)&lt;br /&gt;
* 2024-12: [https://www.pnas.org/doi/10.1073/pnas.2414972121 The unequal adoption of ChatGPT exacerbates existing inequalities among workers]&lt;br /&gt;
** Higher adoption among young and less experienced&lt;br /&gt;
** Lower adoption among women and lower-earning workers&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.09747 The Widespread Adoption of Large Language Model-Assisted Writing Across Society]: 10-25% adoption across a range of contexts&lt;br /&gt;
* 2025-02: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5078805 Local Heterogeneity in Artificial Intelligence Jobs Over Time and Space]&lt;br /&gt;
* 2025-04: [https://andreyfradkin.com/assets/demandforllm.pdf Demand for LLMs: Descriptive Evidence on Substitution, Market Expansion, and Multihoming]&lt;br /&gt;
* 2025-05: [https://civicscience.com/chatgpt-is-still-leading-the-ai-wars-but-google-gemini-is-gaining-ground/ ChatGPT Is Still Leading the AI Wars but Google Gemini Is Gaining Ground]&lt;br /&gt;
* 2025-05: [https://www.nber.org/papers/w33777 Large Language Models, Small Labor Market Effects]&lt;br /&gt;
** Significant uptake, but very little economic impact so far&lt;br /&gt;
* 2025-05: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5136877 The Labor Market Effects of Generative Artificial Intelligence]&lt;br /&gt;
** US worker usage of AI is increasing rapidly: 30% in 2024-12; 40% in 2025-05&lt;br /&gt;
* 2025-05: [https://www.bondcap.com/report/pdf/Trends_Artificial_Intelligence.pdf Trends – Artificial Intelligence]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.08945 Who is using AI to code? Global diffusion and impact of generative AI]&lt;br /&gt;
* 2025-06: [https://www.iconiqcapital.com/growth/reports/2025-state-of-ai 2025 State of AI Report: The Builder’s Playbook]: A Practical Roadmap for AI Innovation&lt;br /&gt;
* 2025-07: Epoch AI: [https://epochai.substack.com/p/after-the-chatgpt-moment After the ChatGPT Moment: Measuring AI’s Adoption]: How quickly has AI been diffusing through the economy?&lt;br /&gt;
* 2025-07: Pew Research: [https://www.pewresearch.org/short-reads/2025/06/25/34-of-us-adults-have-used-chatgpt-about-double-the-share-in-2023/ 34% of U.S. adults have used ChatGPT, about double the share in 2023]&lt;br /&gt;
* 2025-12: Epoch AI: [https://epoch.ai/data/polling Polling on AI Usage]&lt;br /&gt;
&lt;br /&gt;
==Usage By==&lt;br /&gt;
* 2026-02: [https://www.nber.org/papers/w34813 The Politics of AI]&lt;br /&gt;
&lt;br /&gt;
==Usage For==&lt;br /&gt;
* 2024-12: [https://assets.anthropic.com/m/7e1ab885d1b24176/original/Clio-Privacy-Preserving-Insights-into-Real-World-AI-Use.pdf Clio: A system for privacy-preserving insights into real-world AI use] (Anthropic [https://www.anthropic.com/research/clio Clio])&lt;br /&gt;
* 2025-03: [https://learn.filtered.com/hubfs/The%202025%20Top-100%20Gen%20AI%20Use%20Case%20Report.pdf How People are Really Using Generative AI Now] ([https://hbr.org/2025/04/how-people-are-really-using-gen-ai-in-2025 writeup])&lt;br /&gt;
* 2025-04: [https://www.anthropic.com/news/anthropic-education-report-how-university-students-use-claude Anthropic Education Report: How University Students Use Claude]&lt;br /&gt;
* 2025-09: [https://www.anthropic.com/research/economic-index-geography Anthropic Economic Index: Tracking AI&amp;#039;s role in the US and global economy]&lt;br /&gt;
* 2025-09: [https://cdn.openai.com/pdf/a253471f-8260-40c6-a2cc-aa93fe9f142e/economic-research-chatgpt-usage-paper.pdf How People Use ChatGPT] (OpenAI)&lt;br /&gt;
&lt;br /&gt;
==Hiding Usage==&lt;br /&gt;
* 2025-05: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5232910 Underreporting of AI use: The role of social desirability bias]&lt;br /&gt;
&lt;br /&gt;
=Societal Effects/Transformations=&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.01754 Empirical evidence of Large Language Model&amp;#039;s influence on human spoken communication]&lt;br /&gt;
* 2025-09: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5425555 Generative AI as Seniority-Biased Technological Change: Evidence from U.S. Résumé and Job Posting Data]&lt;br /&gt;
&lt;br /&gt;
=Psychological Impact=&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.16628 The Impact of Artificial Intelligence on Human Thought]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.15352 People readily follow personal advice from AI but it does not improve their well-being]&lt;br /&gt;
&lt;br /&gt;
==Human Sentiment towards AI==&lt;br /&gt;
* 2025-04: Pew Research: [https://www.pewresearch.org/internet/2025/04/03/how-the-us-public-and-ai-experts-view-artificial-intelligence/ How the U.S. Public and AI Experts View Artificial Intelligence]&lt;br /&gt;
* 2025-10: Pew Research: [https://www.pewresearch.org/global/2025/10/15/how-people-around-the-world-view-ai/ How People Around the World View AI: More are concerned than excited about its use, and more trust their own country and the EU to regulate it than trust the U.S. or China]&lt;br /&gt;
* 2025-12: [https://www.edelman.com/sites/g/files/aatuss191/files/2025-11/2025%20Edelman%20Trust%20Barometer%20Flash%20Poll%20Trust%20and%20Artificial%20Intelligence%20at%20a%20Crossroads%201.pdf 2025 Edelman Trust Barometer]&lt;br /&gt;
* 2025-12: [https://navigatorresearch.org/views-of-ai-and-data-centers/ Polling - Views of AI and data centers]&lt;br /&gt;
* 2026-03: [https://osf.io/preprints/psyarxiv/5mwre_v9 The Moralization of Artificial Intelligence]&lt;br /&gt;
* 2026-03: Anthropic: [https://www.anthropic.com/features/81k-interviews What 81,000 people want from AI]&lt;br /&gt;
&lt;br /&gt;
==AI Persuasion of Humans==&lt;br /&gt;
(AI can update beliefs, change opinions, tackle conspiracy theories, etc.)&lt;br /&gt;
* 2022-11: [https://arxiv.org/abs/2211.15006 Fine-tuning language models to find agreement among humans with diverse preferences]&lt;br /&gt;
* 2024-04: [https://osf.io/preprints/psyarxiv/h7n8u_v1 Just the facts: How dialogues with AI reduce conspiracy beliefs]&lt;br /&gt;
* 2024-08: [https://arxiv.org/abs/2408.04681 Conversational AI Powered by Large Language Models Amplifies False Memories in Witness Interviews]&lt;br /&gt;
* 2024-09: [https://www.science.org/doi/10.1126/science.adq1814 Durably reducing conspiracy beliefs through dialogues with AI]&lt;br /&gt;
* 2025-03: [https://www.pnas.org/doi/10.1073/pnas.2413443122 Scaling language model size yields diminishing returns for single-message political persuasion]&lt;br /&gt;
* 2025-04: [https://drive.google.com/file/d/1Eo4SHrKGPErTzL1t_QmQhfZGU27jKBjx/edit Can AI Change Your View? Evidence from a Large-Scale Online Field Experiment]&lt;br /&gt;
** [https://www.404media.co/researchers-secretly-ran-a-massive-unauthorized-ai-persuasion-experiment-on-reddit-users/ Researchers Secretly Ran a Massive, Unauthorized AI Persuasion Experiment on Reddit Users]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.09662 Large Language Models Are More Persuasive Than Incentivized Human Persuaders]&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.13919 The Levers of Political Persuasion with Conversational AI]&lt;br /&gt;
* 2025-12: [https://www.science.org/doi/10.1126/science.aea3884 The levers of political persuasion with conversational artificial intelligence]&lt;br /&gt;
* 2025-12: [https://www.nature.com/articles/s41586-025-09771-9 Persuading voters using human–artificial intelligence dialogues]&lt;br /&gt;
&lt;br /&gt;
==AI Effects on Human Psychology==&lt;br /&gt;
===Human well-being===&lt;br /&gt;
* 2024-01: [https://www.nature.com/articles/s44184-023-00047-6 Loneliness and suicide mitigation for students using GPT3-enabled chatbots]&lt;br /&gt;
* 2025-03: [https://cdn.openai.com/papers/15987609-5f71-433c-9972-e91131f399a1/openai-affective-use-study.pdf Investigating Affective Use and Emotional Well-being on ChatGPT]&lt;br /&gt;
* 2025-03: [https://dam-prod2.media.mit.edu/x/2025/03/21/Randomized_Control_Study_on_Chatbot_Psychosocial_Effect.pdf How AI and Human Behaviors Shape Psychosocial Effects of Chatbot Use: A Longitudinal Controlled Study]&lt;br /&gt;
&lt;br /&gt;
===Counter loneliness===&lt;br /&gt;
* 2023-11: [https://arxiv.org/abs/2311.10599 Chatbots as social companions: How people perceive consciousness, human likeness, and social health benefits in machines]&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.19096 AI Companions Reduce Loneliness]&lt;br /&gt;
* 2025-03: [https://dam-prod2.media.mit.edu/x/2025/03/21/Randomized_Control_Study_on_Chatbot_Psychosocial_Effect.pdf How AI and Human Behaviors Shape Psychosocial Effects of Chatbot Use: A Longitudinal Controlled Study]&lt;br /&gt;
* 2025-06: Anthropic: [https://www.anthropic.com/news/how-people-use-claude-for-support-advice-and-companionship How People Use Claude for Support, Advice, and Companionship]&lt;br /&gt;
&lt;br /&gt;
===Human mental abilities (creativity, learning)===&lt;br /&gt;
* 2025-03: [https://www.microsoft.com/en-us/research/publication/the-impact-of-generative-ai-on-critical-thinking-self-reported-reductions-in-cognitive-effort-and-confidence-effects-from-a-survey-of-knowledge-workers/ The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence Effects From a Survey of Knowledge Workers]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.08872 Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task]&lt;br /&gt;
&lt;br /&gt;
=Simulate Humans=&lt;br /&gt;
* See also: [[Human brain]]&lt;br /&gt;
&lt;br /&gt;
==Sociology==&lt;br /&gt;
* 2021-10: [https://www.doi.org/10.1007/s10588-021-09351-y Explaining and predicting human behavior and social dynamics in simulated virtual worlds: reproducibility, generalizability, and robustness of causal discovery methods]&lt;br /&gt;
* 2023-12: Google: [https://arxiv.org/abs/2312.03664 Generative agent-based modeling with actions grounded in physical, social, or digital space using Concordia]&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.12620 Are Large Language Models (LLMs) Good Social Predictors?]&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.11794 Automated Social Science: Language Models as Scientist and Subjects]&lt;br /&gt;
* 2024-07: [https://academic.oup.com/pnasnexus/article/3/7/pgae245/7712371 Perils and opportunities in using large language models in psychological research]&lt;br /&gt;
* 2024-08: [https://samim.io/dl/Predicting%20results%20of%20social%20science%20experiments%20using%20large%20language%20models.pdf Predicting Results of Social Science Experiments Using Large Language Models]&lt;br /&gt;
* 2024-10: [https://www.pnas.org/doi/10.1073/pnas.2407639121 Large Language Models based on historical text could offer informative tools for behavioral science]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.02234 LLM Social Simulations Are a Promising Research Method]&lt;br /&gt;
* 2025-04: [https://www.nber.org/papers/w33662 Measuring Human Leadership Skills with AI Agents]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.10157 SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users]&lt;br /&gt;
* 2025-07: [https://www.nature.com/articles/s41586-025-09215-4 A foundation model to predict and capture human cognition] ([https://marcelbinz.github.io/centaur code])&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.15815 LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra]&lt;br /&gt;
* 2025-09: [https://benjaminmanning.io/files/optimize.pdf General Social Agents]&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2506.06958 Simulating Society Requires Simulating Thought]&lt;br /&gt;
&lt;br /&gt;
==Theory of Mind==&lt;br /&gt;
* 2025-08: [https://www.nature.com/articles/s44387-025-00031-9 How large language models encode theory-of-mind: a study on sparse parameter patterns]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2509.22887 Infusing Theory of Mind into Socially Intelligent LLM Agents]&lt;br /&gt;
&lt;br /&gt;
==Humanlike Vibes==&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.20525 The Xeno Sutra: Can Meaning and Value be Ascribed to an AI-Generated &amp;quot;Sacred&amp;quot; Text?]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.08338 LLMs Reproduce Human Purchase Intent via Semantic Similarity Elicitation of Likert Ratings]&lt;br /&gt;
&lt;br /&gt;
==Skeptical==&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.06950 Large Language Models Do Not Simulate Human Psychology]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2601.16130 Replicating Human Motivated Reasoning Studies with LLMs]&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
* [https://www.google.com/books/edition/_/cKnYEAAAQBAJ?hl=en&amp;amp;gbpv=1&amp;amp;pg=PA2 UNESCO. Guidance for Generative AI in Education and Research]&lt;br /&gt;
* [[AI]]&lt;br /&gt;
** [[AI predictions]]&lt;/div&gt;</summary>
		<author><name>KevinYager</name></author>
		
	</entry>
	<entry>
		<id>http://gisaxs.com/index.php?title=AI_video&amp;diff=8742</id>
		<title>AI video</title>
		<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php?title=AI_video&amp;diff=8742"/>
		<updated>2026-03-19T18:52:19Z</updated>

		<summary type="html">&lt;p&gt;KevinYager: /* March 2026 */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Evolution of Capabilities==&lt;br /&gt;
===Early===&lt;br /&gt;
* November 2016: [https://arxiv.org/abs/1611.10314 Sync-Draw]&lt;br /&gt;
* April 2021: [https://arxiv.org/abs/2104.14806 GODIVA]&lt;br /&gt;
* October 2022: [https://makeavideo.studio/ Meta Make-a-video]&lt;br /&gt;
* October 2022: [https://imagen.research.google/video/ Google Imagen video]&lt;br /&gt;
&lt;br /&gt;
===2023===&lt;br /&gt;
* April 2023: [https://www.youtube.com/watch?v=XQr4Xklqzw8 Will Smith eating spaghetti]&lt;br /&gt;
* April 2023: [https://x.com/nickfloats/status/1642899094808002564 Harry Potter by Balenciaga]&lt;br /&gt;
* April 2023: [https://x.com/mrjonfinger/status/1645953033636048896?cxt=HHwWgMDT7YfkzNctAAAA Runway Gen 2]&lt;br /&gt;
* April 2023: [https://research.nvidia.com/labs/toronto-ai/VideoLDM/ Nvidia latents]&lt;br /&gt;
* December 2023: [https://www.threads.net/@luokai/post/C0vvEnTP4Oj Fei-Fei Li]&lt;br /&gt;
&lt;br /&gt;
===2024===&lt;br /&gt;
====Early 2024====&lt;br /&gt;
* January 2024: [https://sites.research.google/videopoet/ Google VideoPoet]&lt;br /&gt;
* January 2024: [https://lumiere-video.github.io/ Google Lumiere]&lt;br /&gt;
* February 2024: [https://openai.com/index/sora/ OpenAI Sora]&lt;br /&gt;
* April 2024: [https://www.maginative.com/article/china-unveils-vidu-a-powerful-text-to-video-generator/ Vidu]&lt;br /&gt;
* May 2024: [https://deepmind.google/technologies/veo/ Veo]&lt;br /&gt;
* May 2024: [https://kling.kuaishou.com/ Kling]&lt;br /&gt;
* June 2024: [https://lumalabs.ai/dream-machine Luma DreamMachine]&lt;br /&gt;
* June 2024: [https://runwayml.com/research/introducing-gen-3-alpha RunwayML Gen-3 Alpha]&lt;br /&gt;
* July 2024: Examples:&lt;br /&gt;
** [https://www.youtube.com/watch?v=F_WfIzYGlg4 Toys-R-Us Commercial made using Sora]&lt;br /&gt;
** [https://www.youtube.com/watch?v=CSfw_NjqQ2o Motorola commercial made using genAI]&lt;br /&gt;
* July 2024: [https://x.com/rowancheung/status/1813258518159585723 haiper.ai]&lt;br /&gt;
====August 2024====&lt;br /&gt;
* August 2024: [http://hotshot.co/ Hotshot] ([https://x.com/maxescu/status/1825459083635536081 examples], [https://x.com/EccentrismArt/status/1825550841534972027 more examples])&lt;br /&gt;
* August 2024: Luma Dream Machine [https://x.com/LumaLabsAI/status/1825639918539817101 v1.5]&lt;br /&gt;
* August 2024: Examples:&lt;br /&gt;
** [https://x.com/endlesstaverns/status/1811276904692887815 Runway Gen3 music video]&lt;br /&gt;
** [https://x.com/runwayml/status/1820806644806070583 Runway Gen3 for adding FX to live action] ([https://x.com/bryanf0x/status/1825529998201004137 another example])&lt;br /&gt;
** [https://www.youtube.com/watch?v=taaM0s1bq7Q Midjourney + Runway Gen3: Hey It’s Snowing]&lt;br /&gt;
** [https://x.com/Kyrannio/status/1821605619927019974 Flux/LoRA image] + Runway Gen3 [https://x.com/iamneubert/status/1821970292014768420 woman presenter]&lt;br /&gt;
** [https://x.com/CharaspowerAI/status/1825274421256356106 McDonald’s AI commercial]&lt;br /&gt;
** Sora used by [https://www.facebook.com/izanamiaiart/ Izanami AI Art] to create [https://x.com/kimmonismus/status/1824102316229759114 dreamlike video] and by [https://x.com/alexiaadana Alexia Adana] to create [https://x.com/basedjensen/status/1824386717123743940 sci-fi film concept]&lt;br /&gt;
====September 2024====&lt;br /&gt;
* September 2024: [https://hailuoai.com/video/ Hailuo Minimax] ([https://x.com/minchoi/status/1829995683124035766 examples])&lt;br /&gt;
* September 2024: Examples:&lt;br /&gt;
** [https://www.youtube.com/watch?v=XAs5KuhfE_s Space colonization]&lt;br /&gt;
** [https://x.com/venturetwins/status/1827772646295265699 Consistent characters]&lt;br /&gt;
** [https://x.com/thealexbanks/status/1829489392354050502 Sea monsters]&lt;br /&gt;
** [https://x.com/ai_for_success/status/1829539535132426286 Music video]&lt;br /&gt;
** [https://x.com/RyanMorrisonJer/status/1829074823521112544 Animated characters]&lt;br /&gt;
** [https://x.com/CharaspowerAI/status/1829916782452191674 AI influencer]&lt;br /&gt;
** [https://x.com/minchoi/status/1829293248197902802 Ten short examples]&lt;br /&gt;
** [https://x.com/WorldEverett/status/1830596701473615937 Seven examples]&lt;br /&gt;
** [https://x.com/EccentrismArt/status/1830654805515395583 Clip from horror film]&lt;br /&gt;
** [https://x.com/MatthieuGB/status/1722146578813645296 &amp;quot;Gone&amp;quot; featuring astronaut] and [https://x.com/MatthieuGB/status/1742949297337852270 something ethereal]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1831256663644373449 Two dancers] (surprisingly good consistency despite movement)&lt;br /&gt;
** [https://x.com/8bit_e/status/1831344542487871953 Music video about flying]&lt;br /&gt;
** [https://www.youtube.com/watch?v=_XtS_4PzEyk The Paperclip Maximizer]&lt;br /&gt;
** [https://x.com/trbdrk/status/1831801373517869369 La Baie Aréa]&lt;br /&gt;
** [https://www.reddit.com/r/aivideo/comments/1f8xr0w/gisele_tong_to_dear_me/ &amp;quot;To Dear Me&amp;quot; by Gisele Tong] ([https://www.morningstar.com/news/business-wire/20240904521664/reply-ai-film-festival-announced-the-winners-of-the-first-international-festival-for-short-films-made-with-artificial-intelligence winner of AI shorts] film festival)&lt;br /&gt;
** [https://x.com/maxescu/status/1833476640438964281 Various scenes]&lt;br /&gt;
** [https://x.com/EHuanglu/status/1833522650846793970 Directing emotions]&lt;br /&gt;
* September 2024: Kling 1.5 ([https://x.com/Uncanny_Harry/status/1836531835280724459 examples], [https://x.com/minchoi/status/1836800551469654088 showing emotions])&lt;br /&gt;
* September 2024: Examples:&lt;br /&gt;
** Runway video-to-video to [https://x.com/jon_barron/status/1835695132697604236 restyle classic video games]&lt;br /&gt;
** [https://x.com/ai_for_success/status/1835319670917796117 Realistic presenter]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1834530744175059302 Skateboarding] (shows progress toward meaningfully simulating motion and physics)&lt;br /&gt;
** [https://x.com/minchoi/status/1835378029092049325 Examples] of short clips with cinematic feel&lt;br /&gt;
** Short: [https://x.com/PJaccetturo/status/1835670655330869633 4 Minutes to Live]&lt;br /&gt;
** Short: [https://x.com/dreamingtulpa/status/1836121321526432231 Neon Nights] (Arcade)&lt;br /&gt;
** [https://www.youtube.com/watch?v=CcrGSA-kSrI Random Access Memories]: AI-generated, then projected onto Kodak film stock, giving the final output some of the dreamy analog quality associated with nostalgic footage&lt;br /&gt;
** Sora used to make a sort of [https://x.com/niceaunties/status/1837271244774715505 weird dreamlike video]&lt;br /&gt;
====October 2024====&lt;br /&gt;
* October 2024: Pika v1.5, including Pikaffects (explode, melt, inflate, and cake-ify; examples: [https://x.com/justin_hart/status/1841144350572413259 1], [https://x.com/arthur_hyper88/status/1841156544538521646 2], [https://x.com/ytjessie_/status/1841168925301842263 3], [https://x.com/bilawalsidhu/status/1841195247184781420 4], [https://x.com/minchoi/status/1841189035454447636 5], [https://x.com/ytjessie_/status/1841209415514669501 6])&lt;br /&gt;
* October 2024: Examples:&lt;br /&gt;
** [https://x.com/HalimAlrasihi/status/1839310216602788103 AI avatar with good lip-sync]&lt;br /&gt;
** [https://www.youtube.com/watch?v=5NZubOOeeV0 Battalion]: 5 minute short about war&lt;br /&gt;
** Short film: [https://x.com/MatthieuGB/status/1841173724688536015 To Wonderland] ([https://x.com/MatthieuGB/status/1841174221550207437 credits])&lt;br /&gt;
** [https://x.com/OnwardsProject/status/1841508441241890975 9 to 5]: Created with Luma Dream Machine keyframes and camera features; music by Suno&lt;br /&gt;
* October 2024: [https://ai.meta.com/research/movie-gen/ Meta Movie Gen]&lt;br /&gt;
* October 2024: Examples:&lt;br /&gt;
** [https://x.com/CuriousRefuge/status/1844424871335592373 AI Avatar] (using [https://x.com/CuriousRefuge/status/1844424871335592373 HeyGen])&lt;br /&gt;
** [https://www.youtube.com/watch?v=isW1FLL0K3w Generic Movies]&lt;br /&gt;
** [https://arxiv.org/abs/2410.05954 Pyramid-flow] ([https://huggingface.co/rain1011/pyramid-flow-sd3 open source]) model: [https://x.com/_akhaliq/status/1844239643778351605 examples]&lt;br /&gt;
** [https://x.com/whrumorvid/status/1846209247467491604 Building the Pyramids]&lt;br /&gt;
** [https://x.com/maxescu/status/1844716998854349217 People showing realistic emotion] (using [https://hailuoai.video/ Hailuo AI])&lt;br /&gt;
** Keyframes and Luma AI to make novel [https://x.com/CoffeeVectors/status/1845188179332051005 speed-ramp motion]&lt;br /&gt;
* October 2024: [https://pollo.ai/ Pollo AI] platform offering access to a variety of video models&lt;br /&gt;
* October 2024: [https://www.genmo.ai/ Genmo] [https://x.com/genmoai/status/1848762405779574990 Mochi 1] (open source)&lt;br /&gt;
* October 2024: Examples:&lt;br /&gt;
** [https://x.com/AIatMeta/status/1849134463382680028 Meta Movie Gen examples]&lt;br /&gt;
** [https://x.com/PJaccetturo/status/1847732127598800960 Emotional range of Minimax]&lt;br /&gt;
** [https://x.com/dustinhollywood/status/1848757800807039299 Car commercial: Bear]&lt;br /&gt;
** [https://x.com/runwayml/status/1848785913918218517 Diner conversation]&lt;br /&gt;
** [https://x.com/Uncanny_Harry/status/1849275871716159989 Loved and Lost] (a meditation on grief)&lt;br /&gt;
====November 2024====&lt;br /&gt;
* November 2024: Examples:&lt;br /&gt;
** [https://x.com/blizaine/status/1852092147643699356 Pasta Doble]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1852425015175626876 Bird protecting young]&lt;br /&gt;
** [https://x.com/runwayml/status/1852363190484537666 Camera moving around sushi]&lt;br /&gt;
** [https://x.com/StevieMac03/status/1851969120813629939 Various examples] of [https://hailuoai.video/ Hailuo AI]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1853102779650252978 Trains]&lt;br /&gt;
** [https://www.youtube.com/watch?v=Fh-_g5vev0s Light of Imagination]&lt;br /&gt;
** [https://x.com/LinusEkenstam/status/1854087441122021814 Bringing historic images to life]&lt;br /&gt;
** [https://x.com/DeryaTR_/status/1855637066203218180 Plants dancing]&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1855078644042944574 Insect on tree]&lt;br /&gt;
** Trailers for [https://x.com/abandonedmovies/status/1827037378009296983 The Silmarillion] and [https://x.com/abandonedmovies/status/1846941183702110211 The Fall of Gondolin] (by [https://x.com/abandonedmovies Abandoned Films])&lt;br /&gt;
** [https://x.com/Diesol/status/1855475704470884427 Moody sci-fi]&lt;br /&gt;
** [https://x.com/runwayml/status/1857072173631885586 Migration] ([https://runwayml.com/customers/behind-the-scenes-of-migration-with-director-jeremy-higgins made by combining] Runway ML Gen3-Alpha and traditional animation)&lt;br /&gt;
** [https://x.com/AIandDesign/status/1856467856625676752 After the Winter] ([https://suno.com/song/0d6919de-d2bf-434b-8aa6-ede0fb0fde77 music] made using Suno v4)&lt;br /&gt;
** Horror: [https://www.reddit.com/r/aivideo/comments/1gnk27q/ridge_to_southwest/ Ridge to Southwest]&lt;br /&gt;
** [https://www.youtube.com/watch?v=ClStJZmIjBU The Gardener] (by [https://www.youtube.com/@MachineMythos Machine Mythos])&lt;br /&gt;
** [https://x.com/techhalla/status/1857462526859935813 Coca-Cola holiday ad] and [https://www.youtube.com/watch?v=THdoOgwqjBg parody thereof]&lt;br /&gt;
** [https://x.com/pzf_ai/status/1858312421510992111 A Dream Within A Dream] (by [https://x.com/pzf_ai PZF], selected for the Czech International AI Film Festival)&lt;br /&gt;
** [https://x.com/WorldEverett/status/1859273222597775843 Making Friends] (by [https://x.com/WorldEverett Everett World]; see also [https://x.com/WorldEverett/status/1858563716834275562 Childhood Dream] and [https://x.com/WorldEverett/status/1858945634067202429 City Echoes])&lt;br /&gt;
** Anime: [https://x.com/naegiko/status/1857754626742726893 test shots], [https://x.com/naegiko/status/1858978557424210401 Ultimate Ceremony], [https://x.com/naegiko/status/1835434668294074462 Echoes of Love]&lt;br /&gt;
** [https://x.com/KakuDrop/status/1866309309384323257 Echoes of Grace] ([https://x.com/KakuDrop KakuDrop] using Sora)&lt;br /&gt;
** [https://x.com/vibeke_udart/status/1859879367071203662 Morphing hands], [https://x.com/vibeke_udart/status/1858772719224975630 hands and faces] ([https://x.com/vibeke_udart Vibeke Bertelsen])&lt;br /&gt;
** [https://www.reddit.com/r/aivideo/comments/1gxi29x/dbzlicious/ Dragon Ball Z live action]&lt;br /&gt;
** [https://x.com/cfryant/status/1860727980353278386 Pitch Black] (abstract and dark)&lt;br /&gt;
** [https://x.com/cfryant/status/1861050528932765710 Animals Running] (zoomed-in ultra-wide camera)&lt;br /&gt;
** [https://x.com/WorldEverett/status/1860730214487118290 Dreams of Tomorrow] (panning shots of high-tech car, Scottish manor)&lt;br /&gt;
** [https://x.com/nickfloats/status/1861206978690691165 Desert Planet Cinematics]&lt;br /&gt;
* November 2024: [https://huggingface.co/spaces/PR-Puppets/PR-Puppet-Sora Leaked] Sora turbo model; [https://x.com/rowancheung/status/1861455031603503234 examples], [https://x.com/chatgpt21/status/1861504511153451517 Dog chasing Cat in snow]&lt;br /&gt;
====December 2024====&lt;br /&gt;
* December 2024: Examples:&lt;br /&gt;
** [https://x.com/minchoi/status/1863243880553976235 Realistic] (Minimax by Hailuo AI)&lt;br /&gt;
** Trailer for [https://x.com/TheReelRobot/status/1861824847149670840 Paradise Lost] (to be released on [https://www.sandwatch.ai/ Sandwatch AI])&lt;br /&gt;
** [https://x.com/EHuanglu/status/1863607136271716418 Music video example] with consistent characters&lt;br /&gt;
** [https://x.com/venturetwins/status/1863666366764687581 Human expressions] ([https://www.reddit.com/r/ChatGPT/comments/1h4r13x/ai_generated_expressions/ u/Kind_Distance9504 on Reddit], using Hailuo)&lt;br /&gt;
** Vodafone ad: [https://www.youtube.com/watch?v=9AyEC_K9kBg The Rhythm Of Life]&lt;br /&gt;
** [https://www.reddit.com/r/midjourney/comments/1h5u2gw/we_made_a_10_minute_gen_ai_batman_film/ 10 minute Batman film]&lt;br /&gt;
* December 2024: Tencent [https://aivideo.hunyuan.tencent.com/ Hunyuan Video] open-source video model ([https://x.com/CharaspowerAI/status/1863862585554010530 example])&lt;br /&gt;
* December 2024: [https://sora.com/ Sora] release ([https://x.com/CharaspowerAI/status/1866203050982916532 examples])&lt;br /&gt;
* December 2024: [https://mint-video.github.io/ MinT video] improves consistency and control ([https://arxiv.org/abs/2412.05263 preprint], [https://x.com/EHuanglu/status/1868278456565531061 examples])&lt;br /&gt;
* December 2024: Google [https://blog.google/technology/google-labs/video-image-generation-update-december-2024/ Veo 2] ([https://x.com/sundarpichai/status/1868709099644334518 examples], [https://x.com/EHuanglu/status/1869008306322522342 more examples], [https://x.com/_Borriss_/status/1869267571532320966 natural movement examples], [https://x.com/jerrod_lew/status/1870816560027246715 abstract], [https://x.com/jerrod_lew/status/1869427407415058660 realistic physics], [https://x.com/jerrod_lew/status/1873096585002786944 crowds], [https://x.com/minchoi/status/1873590350515929380 dancing], [https://x.com/jerrod_lew/status/1874440442269565351 animals])&lt;br /&gt;
* December 2024: [https://x.com/pika_labs/status/1867651381840040304 Pika 2.0] with Scene Ingredients&lt;br /&gt;
* December 2024: Examples:&lt;br /&gt;
** [https://www.youtube.com/watch?v=c_kKKRQ5gYw Synthetic Youth: Takenoko Zoku · Made by Emi Kusano with Sora]&lt;br /&gt;
** [https://x.com/higgsfield_ai/status/1868698886761837041 Car race] ([https://higgsfield.ai/ Higgsfield AI] storytelling)&lt;br /&gt;
** [https://x.com/blizaine/status/1868850653759783033 Slicing meat]; comparison of modern video generators&lt;br /&gt;
** Challenging prompt: [https://x.com/RubenEVillegas/status/1868864410720325844 A cat roars while looking at its reflection in the mirror but instead sees itself as a lion roaring (Veo 2)] ([https://x.com/anukaakash/status/1869417975071330550 comparison to other models])&lt;br /&gt;
** [https://x.com/PJaccetturo/status/1869829338868412865 Anime trailer]&lt;br /&gt;
** [https://x.com/ring_hyacinth/status/1870386506776674376 Snorlax at Mount Fuji] and [https://x.com/ring_hyacinth/status/1871105733443592696 Psyduck at Colosseum] (Kling 1.6)&lt;br /&gt;
** [https://x.com/machine_mythos/status/1870565287789056320 Horror visuals] (with [https://mmaudio.net/ MMAudio] sound)&lt;br /&gt;
** [https://www.youtube.com/watch?v=lFc1jxLHhyM The Heist] (Veo 2)&lt;br /&gt;
** [https://x.com/minchoi/status/1871263616806129863 Various Veo 2 examples]&lt;br /&gt;
** [https://x.com/minchoi/status/1872390429108486320 Live Action Titans]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1873094065841193222 Cats] [https://x.com/PostsOfCats/status/1872530207585825058 Cooking]&lt;br /&gt;
** Aesthetic from alternate timelines: [https://x.com/BrianRoemmele/status/1871753358782120068 1], [https://x.com/BrianRoemmele/status/1872105833456423216 2], [https://x.com/brain_racked/status/1872340717978390583 3]&lt;br /&gt;
** [https://x.com/minchoi/status/1872486717145706793 Examples approaching cinematic quality]&lt;br /&gt;
** [https://x.com/JaicSam/status/1872903054221033693 Cosmic Spider] (winner at AI film festival)&lt;br /&gt;
** [https://www.youtube.com/watch?v=dbdYPMRi_Nk Trailer for Newton&amp;#039;s Cradle] (full film [https://x.com/JeffSynthesized/status/1872705173451358293 on] 2025-01-01)&lt;br /&gt;
** [https://x.com/Ror_Fly/status/1873036384077828499 Car vs. Jet drag race]&lt;br /&gt;
** [https://x.com/Diesol/status/1873415500149199066 California Monsters]&lt;br /&gt;
** [https://x.com/heyshrutimishra/status/1873631383584924078 Various examples] (Hailuo AI)&lt;br /&gt;
** [https://x.com/kimmonismus/status/1873568693357294014 Will Smith eating spaghetti] (cf. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023])&lt;br /&gt;
** [https://x.com/StevieMac03/status/1873998177193648438 Sorceress and Arachnid Steed] (Kling v1.6)&lt;br /&gt;
** [https://x.com/dustinhollywood/status/1873940924016779425 Music video] (Hailuo AI)&lt;br /&gt;
** [https://www.youtube.com/watch?v=iQg2udCHMdI Akụkọ (Story)] (22 minute short) - A Lagos Boy&amp;#039;s Thrilling Snack Run Nightmare&lt;br /&gt;
** [https://x.com/cinerobot/status/1873766976306455019 Son of the Dragon] (8 minute short)&lt;br /&gt;
** [https://x.com/SynthReveries/status/1873624586857886071 Endless Journey] music video ([https://suno.com/song/fa90fa5e-25c7-48ad-b291-42a8a8c51cf9 music] by Suno)&lt;br /&gt;
** [https://x.com/anukaakash/status/1870504167653228980 Once Again] (retrospective)&lt;br /&gt;
** [https://x.com/jasonzada/status/1873470586053414928 Fade Out] (Veo 2)&lt;br /&gt;
** [https://x.com/talkboysstudio/status/1869085014513865027 Roadkill] (12 minute short)&lt;br /&gt;
&lt;br /&gt;
===2025===&lt;br /&gt;
====January 2025====&lt;br /&gt;
* January 2025: [https://x.com/kimmonismus/status/1877351050748871038 Progress] over the last 1.5 years, comparing Runway Gen 2 and Veo 2.&lt;br /&gt;
* January 2025: Examples:&lt;br /&gt;
** [https://x.com/AllarHaltsonen/status/1874557865576542655 Delivery] (unofficial Nike ad)&lt;br /&gt;
** [https://x.com/Diesol/status/1875237221735002299 Gucci ad] (Sora)&lt;br /&gt;
** [https://x.com/EccentrismArt/status/1874498145910149412 Conquest]&lt;br /&gt;
** [https://www.youtube.com/watch?v=RJZCMfaS-io Newton&amp;#039;s Cradle] (6 minute short)&lt;br /&gt;
** [https://x.com/AngryTomtweets/status/1874627041934602410 Singer]&lt;br /&gt;
** [https://x.com/DumpsterBud/status/1874807352794182019 Brain vomit] (music video)&lt;br /&gt;
** [https://x.com/mxvdxn/status/1874796628210778618 Vibe] (Kling v1.6)&lt;br /&gt;
** [https://x.com/_deepfates/status/1875215969452523785 Will Smith eating spaghetti] (cf. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024])&lt;br /&gt;
** [https://www.youtube.com/watch?v=BL9-jHGnxyc Zorgop Knows All] (2 minute short)&lt;br /&gt;
** [https://x.com/ButchersBrain/status/1875130428518269406 The Breach] (2 minute short; Veo2, Runway ActOne, MMaudio)&lt;br /&gt;
** [https://x.com/Rainmaker1973c/status/1875261591043850477 Aesthetics from an alternate timeline]&lt;br /&gt;
** [https://x.com/StevieMac03/status/1875440611849072841 Immortal Awakens]&lt;br /&gt;
** [https://x.com/isaachorror/status/1875624519588835400 The Faded Line]&lt;br /&gt;
** [https://www.youtube.com/watch?v=4fy8H38rm-4 Dear Dad]&lt;br /&gt;
** [https://x.com/maxescu/status/1877060580680311242 Mad Max chase]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1877408247906447633 Patience is Key]&lt;br /&gt;
** [https://x.com/techhalla/status/1879967230093586555 The Almost Famous Show] (talent show parody)&lt;br /&gt;
** [https://x.com/thefuzzysignal/status/1879295176990154755 Proof-of-concept trailer for a medieval adult animated series]&lt;br /&gt;
** [https://x.com/JeffSynthesized/status/1879555151499034869 Variety] (unofficial Cadbury ad)&lt;br /&gt;
** [https://x.com/henrydaubrez/status/1879883806947115446 Kitsune] (5 minute animated short, Veo 2)&lt;br /&gt;
* January 2025: MiniMax Hailuo [https://www.minimaxi.com/en/news/s2v-01-release Subject Reference] enables consistent characters ([https://x.com/minchoi/status/1881707687362412924 examples])&lt;br /&gt;
* January 2025: AI (de-aging deepfakes, [https://magnific.ai/ Magnific]) [https://x.com/JeffSynthesized/status/1878630652377178502 used in the film] [https://www.imdb.com/title/tt18272208/ &amp;quot;Here&amp;quot;].&lt;br /&gt;
* January 2025: Luma [https://lumalabs.ai/ray Ray2]&lt;br /&gt;
* January 2025: [https://pikartai.com/pika-2-1/ Pika 2.1] ([https://x.com/OrctonAI/status/1883925754653905049 examples])&lt;br /&gt;
* January 2025: Examples:&lt;br /&gt;
** [https://x.com/wyzborrero/status/1879949477764804873 Light projections onto people] (challenging task, Ray2)&lt;br /&gt;
** [https://x.com/AllarHaltsonen/status/1881261042753589547 BMW ad]&lt;br /&gt;
** [https://x.com/AIWarper/status/1880658326645878821 John Wick in Severance] (Hunyuan vid2vid)&lt;br /&gt;
** [https://x.com/TheReelRobot/status/1881771800595444193 Biopic] (7 minutes)&lt;br /&gt;
** [https://x.com/misslaidlaw/status/1882180619582791784 Give It To Me] (music video)&lt;br /&gt;
** [https://x.com/paultrillo/status/1882091702506459394 Where do we go from here?] (music video, Veo 2)&lt;br /&gt;
** [https://x.com/WorldEverett/status/1882235057076580502 Party like there&amp;#039;s no tomorrow] (music video)&lt;br /&gt;
** [https://x.com/Diesol/status/1884696027942498779 S.T.O.R.I.] (Midjourney and Pika 2.1)&lt;br /&gt;
====February 2025====&lt;br /&gt;
* February 2025: Examples:&lt;br /&gt;
** [https://x.com/OrctonAI/status/1885839287913955597 Long Steampunk scene]&lt;br /&gt;
** [https://x.com/jerrod_lew/status/1885787580685562226 City destruction]&lt;br /&gt;
** [https://x.com/EHuanglu/status/1885736840344551763 Consistent character acting]&lt;br /&gt;
** [https://x.com/MeanOrangeCat/status/1884295241534185890 Kaiju Katastrophe] (by [https://x.com/MeanOrangeCat Mean Orange Cat])&lt;br /&gt;
** [https://x.com/Diesol/status/1886433799690748210 The Greyhound]&lt;br /&gt;
** [https://x.com/CoffeeVectors/status/1886146242029195391 Fluid simulation video2video]&lt;br /&gt;
** [https://x.com/toolstelegraph/status/1886622772828254403 High resolution macro shots]&lt;br /&gt;
** [https://www.youtube.com/watch?v=p0J1LDWERS0 Chrysalids]&lt;br /&gt;
** [https://x.com/multimodalart/status/1887817996220940737 Boring realistic images] (HunyuanVideo w/ LoRA)&lt;br /&gt;
** [https://www.youtube.com/watch?v=PcVRfa1JyyQ Anime intro] ([https://www.reddit.com/r/StableDiffusion/comments/1ijvua0/opensource_almostconsistent_real_anime_made_with/ Hunyuan w/ custom LoRAs])&lt;br /&gt;
** [https://x.com/AllarHaltsonen/status/1888294811750318114 Automotive ad test] (Kling w/ custom model)&lt;br /&gt;
** [https://x.com/AngryTomtweets/status/1888758524303269928 Random cinematic clips] (Midjourney and Kling)&lt;br /&gt;
** [https://x.com/juliewdesign_/status/1888666757302263828 Crossing Paths]&lt;br /&gt;
** [https://x.com/weirdai_art/status/1888794894187041200 Miniature food]&lt;br /&gt;
** [https://x.com/CaptainHaHaa/status/1889573017745035463 Animals]&lt;br /&gt;
** [https://x.com/Kavanthekid/status/1889371011667144724 Star Wars - The Ghost&amp;#039;s Apprentice (Fan Film)]&lt;br /&gt;
** [https://x.com/AngryTomtweets/status/1889768184716423573 Ray2 image-to-video examples]&lt;br /&gt;
** [https://x.com/weirdai_art/status/1889890470987518069 New Horizons] (miniatures going to Mars)&lt;br /&gt;
** [https://x.com/karim_yourself/status/1890100168378536155 Black Sun (trailer)]&lt;br /&gt;
** [https://x.com/BrivaelLp/status/1890122101153231288 AI avatars] ([https://www.argil.ai/ Argil AI])&lt;br /&gt;
** [https://x.com/mrjonfinger/status/1890783411679236473 Adding elements to real video] ([https://x.com/mrjonfinger/status/1891337081923772918 other example])&lt;br /&gt;
** [https://x.com/SynthReveries/status/1892278954137940289 Glitch]&lt;br /&gt;
** Anime: [https://x.com/freeeebird2300/status/1889119007707689146 sci-fi] (Ray2), [https://x.com/Artedeingenio/status/1891173784188756069 sci-fi] (Ray 2), [https://x.com/seiiiiiiiiiiru/status/1890980673743474931 90s sci-fi] (Luma) and [https://x.com/TomLikesRobots/status/1891209369804591447 moody] (Midjourney and Ray2)&lt;br /&gt;
* February 2025: Meta [https://hila-chefer.github.io/videojam-paper.github.io/ VideoJAM]&lt;br /&gt;
* February 2025: ByteDance [https://omnihuman-lab.github.io/ OmniHuman-1]&lt;br /&gt;
* February 2025: ByteDance [https://saiyan-world.github.io/goku/ Goku] ([https://arxiv.org/abs/2502.04896 paper], [https://x.com/ai_for_success/status/1888821141495844991 examples])&lt;br /&gt;
* February 2025: [https://huggingface.co/stepfun-ai/stepvideo-t2v Step-Video-T2V] open-source model ([https://arxiv.org/abs/2502.10248 paper], [https://github.com/stepfun-ai/Step-Video-T2V code], [https://yuewen.cn/videos demo], [https://x.com/ai_for_success/status/1891369136082854129 examples])&lt;br /&gt;
* February 2025: Pika [https://x.com/pika_labs/status/1892620122818294109 Pikaswaps] (examples of [https://x.com/FreddyChavezO/status/1892678426487881805 modifying regions], [https://x.com/CharaspowerAI/status/1893216710141919637 swapping items])&lt;br /&gt;
* February 2025: Alibaba [https://wanai.pro/ Wan 2.1] [https://huggingface.co/blog/LLMhacker/wanai-wan21 open-source] ([https://x.com/fofrAI/status/1894862403260596371 examples])&lt;br /&gt;
* February 2025: [https://thetwinai.com/ Twin AI]: compose videos with provided character, object, location ([https://x.com/EHuanglu/status/1901277394729930984 example])&lt;br /&gt;
* February 2025: Examples:&lt;br /&gt;
** [https://x.com/mrjonfinger/status/1893109598627750164 Infected] (Pika swaps and additions)&lt;br /&gt;
** [https://x.com/amli_art/status/1893447314913796253 Hostile Government Takeover] (Veo2)&lt;br /&gt;
** [https://x.com/KarolineGeorges/status/1895226395812561399 Dual Mechanism] (Pikaframes 2.2)&lt;br /&gt;
&lt;br /&gt;
====March 2025====&lt;br /&gt;
* March 2025: Examples:&lt;br /&gt;
** [https://x.com/SynthReveries/status/1895826068617252901 Doors] (music video)&lt;br /&gt;
** [https://x.com/bind_lux/status/1894492032414224792 Drum and Bass] (music video; Kling, audio from [https://www.riffusion.com/?filter=staff-picks Riffusion])&lt;br /&gt;
** [https://x.com/RileyRalmuto/status/1896088776151269523 Woman&amp;#039;s face] (Sora)&lt;br /&gt;
** [https://x.com/ryanwpatterson/status/1896968881731948844 Skating] (Ray2)&lt;br /&gt;
** [https://www.threads.net/@evolving.ai/post/DGlRyRoO7c9?xmt=AQGz6T_8VppPoAqb5aPwAJ2zzRLUP-YXi8SabAT0IIEA9Q Filming commercial on Mars]&lt;br /&gt;
** [https://www.threads.net/@evolving.ai/post/DGycqyhuETS?xmt=AQGz6T_8VppPoAqb5aPwAJ2zzRLUP-YXi8SabAT0IIEA9Q Original Source commercial] (AI and real footage)&lt;br /&gt;
** [https://x.com/maxescu/status/1896926229204496788 Time-lapses] (Pika 2.2)&lt;br /&gt;
** [https://www.youtube.com/watch?v=2RhkcJyhg0E Hallucination]&lt;br /&gt;
** [https://x.com/town_in_new/status/1897354572139782620 Macro video of bubbles]&lt;br /&gt;
* March 2025: [https://github.com/Tencent/HunyuanVideo-I2V HunyuanVideo-I2V] image-to-video&lt;br /&gt;
* March 2025: Google [https://x.com/labsdotgoogle/status/1897376700666626233 Whisk Animate] (based on Veo2, [https://x.com/maxescu/status/1902742535618888025 examples])&lt;br /&gt;
* March 2025: Examples:&lt;br /&gt;
** [https://x.com/jdp2oo/status/1897874927367160114 Recursion (horror)] (Kling)&lt;br /&gt;
** [https://x.com/blizaine/status/1897826177970028614 Will Smith Eating Spaghetti while Sitting Inside a Bag] (cf. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025])&lt;br /&gt;
** [https://x.com/mickmumpitz/status/1897979382687297697 Paper Jam] (Kling with custom workflows to enable precise control)&lt;br /&gt;
** [https://x.com/maxescu/status/1899155936645722216 Cinematic shots] (Google Whisk and Luma)&lt;br /&gt;
** [https://x.com/weirdai_art/status/1899631013002711409 Perfunctory Horizons]&lt;br /&gt;
** [https://x.com/maxescu/status/1900243840499368319 A Hard Winter]&lt;br /&gt;
** [https://x.com/RoyalKongz/status/1900315389139014074 Consistent character example]&lt;br /&gt;
** [https://x.com/maxescu/status/1900652266362650853 Anthropomorphic Animals]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1900457543299727718 Realistic (influencer-style)]&lt;br /&gt;
** [https://x.com/SunoMusic/status/1900942410584043579 I Feel Cultured] (music video with surrealist vibes)&lt;br /&gt;
** [https://rodeo.club/post/0x30b45c56d62751D763D3B8bFe4D18c4BB65EDF2c/209 journey of utmost importance]&lt;br /&gt;
** [https://x.com/aiordieshow/status/1901930851127984291 Karen: Unleashed]&lt;br /&gt;
** [https://x.com/minchoi/status/1901783767364092232 Yarn Cat]&lt;br /&gt;
** [https://x.com/andyorsow/status/1901619535180091509 Ned&amp;#039;s Wet Deli] (Runway)&lt;br /&gt;
** [https://www.youtube.com/watch?v=KVoiooE8C0c BOOTS], a.k.a. [https://x.com/RuairiRobinson/status/1902027217137484117 &amp;quot;Our enemies are cartoon monsters&amp;quot;] (music video based on poem by Rudyard Kipling; Veo2)&lt;br /&gt;
** Flying in a dream: [https://x.com/minchoi/status/1902197944826183864 1], [https://x.com/venturetwins/status/1901796679063626060 2]&lt;br /&gt;
** [https://x.com/jasonzada/status/1902129567659389443 Commercial for Mercedes-Benz and FYI Radio]&lt;br /&gt;
** [https://x.com/maxescu/status/1903108496666542562 Selfie video] (Luma)&lt;br /&gt;
** Podcasts: [https://www.reddit.com/r/singularity/comments/1jintit/rottenly_roasted_now_full_script_is_also_not/ Rottenly Roasted] and [https://www.reddit.com/r/aivideo/comments/1jerh56/worst_date_ever/ Worst Date Ever] [https://x.com/OriZilbershtein/status/1903503438744318002 (Imagen 3, Hedra, Elevenlabs, Topaz)]&lt;br /&gt;
** [https://x.com/DexploreArts/status/1903822122150986000 Ambience] (Midjourney, Luma)&lt;br /&gt;
** [https://x.com/TheoMediaAI/status/1904207679511572845 The Bridge] (2 minute short; Veo2)&lt;br /&gt;
** [https://x.com/peteromallet/status/1904268944992829462 Pulp Fiction] (Wan video editing)&lt;br /&gt;
** [https://x.com/madpencil_/status/1906765750624493650 Camera Controls] (Luma Ray2)&lt;br /&gt;
* March 2025: [https://www.hedra.com/ Hedra] [https://x.com/hedra_labs/status/1897699010632466469 Character 3]&lt;br /&gt;
* March 2025: [https://huggingface.co/hpcai-tech/Open-Sora-v2 Open Sora v2] ([https://github.com/hpcaitech/Open-Sora code])&lt;br /&gt;
* March 2025: Amazon Prime debuts [https://en.wikipedia.org/wiki/House_of_David_(TV_series) House of David], with special effects created by [https://www.thewonderproject.com/ Wonder Project] using a [https://x.com/PJaccetturo/status/1903126616831676792 combination of traditional and AI methods] (reportedly including Midjourney and Runway)&lt;br /&gt;
* March 2025: Examples:&lt;br /&gt;
** [https://x.com/PJaccetturo/status/1905151190872309907 What if Studio Ghibli directed Lord of the Rings?] (OpenAI GPT-4o in-context image generation, Kling)&lt;br /&gt;
** [https://x.com/ROHKI/status/1906039022662963269 RŌHKI]&lt;br /&gt;
** [https://x.com/iaveras/status/1906362437487534296 Why]&lt;br /&gt;
** [https://x.com/BrianRoemmele/status/1906476721236570508 Commercial for Puma] (research/test)&lt;br /&gt;
** [https://x.com/Salmaaboukarr/status/1906776503343325469 Commercial for KFC] (concept ad)&lt;br /&gt;
* March 2025: Runway ML [https://runwayml.com/research/introducing-runway-gen-4 Gen-4]&lt;br /&gt;
** [https://www.youtube.com/watch?v=c8IBmK7GZP8 The Lonely Little Flame]&lt;br /&gt;
** [https://www.youtube.com/watch?v=Z0P6qjMUl34&amp;amp;t=1s The Herd]&lt;br /&gt;
** [https://www.youtube.com/watch?v=9HzdNhOe09I The Retrieval]&lt;br /&gt;
** [https://www.youtube.com/watch?v=xEhgxhrAjE4 NYC is a Zoo]&lt;br /&gt;
** [https://www.youtube.com/watch?v=ENGKp5wn344 Scimmia Vede] (music video)&lt;br /&gt;
** More examples: [https://x.com/techhalla/status/1906807994009993473 various], [https://x.com/c_valenzuelab/status/1907958530369372541 art direction], [https://x.com/c_valenzuelab/status/1908146364741029998 mannequins], [https://x.com/c_valenzuelab/status/1907921566643732612 taxi], [https://x.com/c_valenzuelab/status/1907432109695717798 small things], [https://x.com/c_valenzuelab/status/1907563448902496362 long shot (1m)]&lt;br /&gt;
&lt;br /&gt;
====April 2025====&lt;br /&gt;
* April 2025: Examples:&lt;br /&gt;
** [https://x.com/AzeAlter/status/1906974768705990794 Age of Beyond]&lt;br /&gt;
** [https://x.com/techhalla/status/1907790675057242319 Commercial for Coca-Cola] (Higgsfield)&lt;br /&gt;
** [https://www.reddit.com/r/StableDiffusion/comments/1jr6j11/comment/mle9bq5/?context=3 Anime scene (3m)] (Wan 2.1 with LoRa)&lt;br /&gt;
** [https://x.com/pika_labs/status/1908263310912610401 Taxes then Death] (Pika multikeyframe)&lt;br /&gt;
* April 2025: [https://www.krea.ai/ Krea] [https://x.com/krea_ai/status/1907829389452021853 Video Re-Style]&lt;br /&gt;
* April 2025: ByteDance [https://grisoon.github.io/DreamActor-M1/ DreamActor-M1] performance transfer&lt;br /&gt;
* April 2025: Examples:&lt;br /&gt;
** [https://x.com/Diesol/status/1908535493673050403 Mercs] (Midjourney v7, Ray2)&lt;br /&gt;
** [https://x.com/minchoi/status/1909078846126649440 Cat at theme park]&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1909630883218207036 Timelapse history] (Runway Gen4)&lt;br /&gt;
** [https://x.com/EHuanglu/status/1909660808973533225 Examples for use in advertising]&lt;br /&gt;
** [https://x.com/arohaAIX/status/1910688361221599361 Sci-fi scapes]&lt;br /&gt;
** [https://x.com/PJaccetturo/status/1910750148055146708 Avα]&lt;br /&gt;
** [https://x.com/imagineFERA/status/1910601934207152576 The Bureau]&lt;br /&gt;
** [https://x.com/jasonzada/status/1911812014059733041 Beaver and Sock (3m)]&lt;br /&gt;
** [https://x.com/Delachica_/status/1911842237622735052 Organic Waste (5m)] (Runway)&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1912260798270882104 Fly] (Runway Gen4)&lt;br /&gt;
* April 2025: Alibaba [https://arxiv.org/abs/2504.04842 FantasyTalking] lipsync ([https://arxiv.org/abs/2504.04842 paper], [https://x.com/EHuanglu/status/1910341110322577442 examples])&lt;br /&gt;
* April 2025: Tencent Hunyuan [https://arxiv.org/abs/2411.16331 Sonic] image animation/lipsync to audio ([https://x.com/ai_for_success/status/1911719866958286864 examples])&lt;br /&gt;
* April 2025: ByteDance [https://huggingface.co/papers/2504.08685 Seaweed-7B] ([https://arxiv.org/abs/2504.08685 preprint], [https://www.youtube.com/watch?v=OaPI6K2y3rI examples])&lt;br /&gt;
* April 2025: [https://app.klingai.com/global/release-notes Kling 2.0] ([https://www.youtube.com/watch?v=Yqvh3M12T_M video])&lt;br /&gt;
* April 2025: [https://www.skyreels.ai/home Skyworks] [https://github.com/SkyworkAI/SkyReels-V2 SkyReels V2] (open-source, unlimited extension; [https://x.com/AngryTomtweets/status/1914270477482443142 examples])&lt;br /&gt;
* April 2025: [https://sand.ai/ Sand AI] [https://huggingface.co/sand-ai/MAGI-1 Magi-1] (open source, unlimited extension; [https://x.com/AngryTomtweets/status/1914318743578296506 examples], [https://x.com/dreamingtulpa/status/1916035289300275372 more examples])&lt;br /&gt;
* April 2025: Examples:&lt;br /&gt;
** [https://x.com/maxescu/status/1912100029549994016 Mars 2035 (3m)] (Kling 2.0)&lt;br /&gt;
** [https://x.com/ai_for_success/status/1912466999147450600 Kingdom (dragon battle, 3m)]&lt;br /&gt;
** [https://x.com/imagineFERA/status/1913156296657756278 Reflection (3m)] (Gen4)&lt;br /&gt;
** [https://x.com/Wytsekoetse/status/1913547157493162035 Pizza Galaxy (1m)] (MJ and Gen4)&lt;br /&gt;
** [https://www.youtube.com/watch?v=rseqmSGH7xk Snoop Dogg music video: Last Dance with Mary Jane] (blend of traditional and AI effects)&lt;br /&gt;
** [https://x.com/dreamingtulpa/status/1915104310448501129 Realistic human motion]&lt;br /&gt;
** [https://x.com/KarolineGeorges/status/1915113151546396893 Inception loop] (Gen4)&lt;br /&gt;
** [https://x.com/rayisdoingfilm/status/1916468807435952330 Tuesday (1m)] (Gen4)&lt;br /&gt;
** [https://www.youtube.com/watch?v=XWdwF1q3kDw Deus in Machina Automata (4m)] (Gen4) &lt;br /&gt;
** [https://x.com/machina9000/status/1915090908850049223 Outsiders (3m music video)]&lt;br /&gt;
&lt;br /&gt;
====May 2025====&lt;br /&gt;
* May 2025: [https://huggingface.co/Lightricks/LTX-Video LTX-Video 13B] ([https://github.com/Lightricks/LTX-Video code], [https://x.com/maxescu/status/1919801813987164527 examples], [https://x.com/cubiq/status/1919748210567815551 more examples])&lt;br /&gt;
* May 2025: HeyGen Avatar IV (examples: [https://x.com/StevieMac03/status/1919910677860216869 sci-fi], [https://x.com/KarolineGeorges/status/1919801983143211222 Come Closer], [https://x.com/maxescu/status/1920410329454100973 singing], [https://x.com/minchoi/status/1920853859171234165 various])&lt;br /&gt;
* May 2025: Tencent [https://hunyuancustom.github.io/ HunyuanCustom]&lt;br /&gt;
* May 2025: Examples:&lt;br /&gt;
** [https://x.com/lifeofc/status/1920331476157280413 Iris (1.5m)] (Midjourney, Luma, Runway)&lt;br /&gt;
** [https://runwayml.com/customers/the-making-of-mars-and-siv Mars and Siv: &amp;quot;No Vacancy&amp;quot; (episode 1, 6m)] (Runway)&lt;br /&gt;
** [https://x.com/cfryant/status/1921317318744760817 Go to the East Wing] (dreamlike, Luma)&lt;br /&gt;
** [https://x.com/DeryaTR_/status/1921015340827304389 Yu Lanter showreel] (Higgsfield)&lt;br /&gt;
** [https://x.com/freeeebird2300/status/1921789387614134652 Cyberpunk anime] (Luma)&lt;br /&gt;
** [https://x.com/LittleTinRobot/status/1921692735930589246 Alien animals] (Runway)&lt;br /&gt;
** [https://x.com/minchoi/status/1922500563792486878 America&amp;#039;s Funniest AI Home Videos (3m)]&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1924204409833103365 Editing POV shots from AR glasses] (Runway)&lt;br /&gt;
* May 2025: [https://runwayml.com/gen48 Gen:48] Fourth Edition winners:&lt;br /&gt;
** [https://www.youtube.com/watch?v=NphCYRXjqTI&amp;amp;t=174s Home] (3m)&lt;br /&gt;
** [https://www.youtube.com/watch?v=L2DQwCp_DCw The King&amp;#039;s Secret] (2m)&lt;br /&gt;
* May 2025: [https://viggle.ai/home Viggle] Live [https://x.com/ViggleAI/status/1926324953038627214 enables] real-time avatar control&lt;br /&gt;
* May 2025: Google [https://blog.google/technology/ai/generative-media-models-io-2025/ Veo 3] (examples: [https://x.com/babaeizadeh/status/1924942128851124284 conversation], [https://x.com/mattshumer_/status/1925039973310308424 cooking], [https://x.com/jerrod_lew/status/1924934440486371589 singing], [https://x.com/MartinNebelong/status/1924926779677905014 simple story], [https://x.com/Diesol/status/1925114473544913004 cinematic action sequence], [https://x.com/laszlogaal_/status/1925094336200573225 car show interviews], [https://x.com/arikuschnir/status/1924953349943697763 We Can Talk], [https://x.com/venturetwins/status/1925021235530105298 podcat], [https://x.com/maxescu/status/1925079990061957423 various], [https://x.com/jerrod_lew/status/1927092379892265139 camera moves])&lt;br /&gt;
* May 2025: Examples:&lt;br /&gt;
** [https://x.com/javilopen/status/1925495026903380358 Will Smith Eating Spaghetti] (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025])&lt;br /&gt;
** [https://x.com/MetaPuppet/status/1926659557914268155 Bob from Marketing] (Veo 3)&lt;br /&gt;
** [https://x.com/dustinhollywood/status/1926733069475565622 He is King (16m)]&lt;br /&gt;
** [https://x.com/HashemGhaili/status/1925616536791760987 Prompt Theory], [https://x.com/HashemGhaili/status/1925332319604257203 part 2], [https://x.com/HashemGhaili/status/1927467022213869975 Afterlife (3m)] (Veo3)&lt;br /&gt;
** [https://x.com/JoannaStern/status/1927856754873835747 My Robot and Me (3m)] (Veo, Runway)&lt;br /&gt;
** [https://x.com/rohanpaul_ai/status/1928152398930817238 The Internet&amp;#039;s Over] (Veo3)&lt;br /&gt;
** [https://www.reddit.com/r/aivideo/comments/1l0rl7d/before_colours_fade/ Before Colours Fade (2m)] (Midjourney, Kling)&lt;br /&gt;
&lt;br /&gt;
====June 2025====&lt;br /&gt;
* June 2025: Examples: &lt;br /&gt;
** [https://x.com/amasad/status/1930505292904837132 Bigfoot ASMR]&lt;br /&gt;
** [https://x.com/minchoi/status/1930670583605514333 Talking] (HeyGen Avatar IV upgrade)&lt;br /&gt;
** [https://x.com/ROHKI/status/1931081752992477285 Where are all the aliens? (2m)]&lt;br /&gt;
** [https://x.com/fofrAI/status/1930999540770893874 Natural talking]&lt;br /&gt;
** [https://x.com/ammaar/status/1931672722418851904 Elemental Showdown - Mortal Kombat (3m)]&lt;br /&gt;
** [https://x.com/maxjoseph/status/1932104616021565476 It Starts at the End (music video, 4m)]&lt;br /&gt;
** [https://x.com/deedydas/status/1932105266654581116 Sci-fi trailer (2m)]&lt;br /&gt;
** [https://x.com/DrMachakil/status/1931816470901575924 The Prompt Floor (2m)]&lt;br /&gt;
** [https://x.com/DrMachakil/status/1853960062546366856 NALVORA (2.7m)] - [https://x.com/DrMachakil/status/1932904599004066200 Best Trailer, Metamorph AI Film Awards]&lt;br /&gt;
** [https://x.com/Kalshi/status/1932891608388681791 Commercial for Kalshi (30s)] - [https://x.com/PJaccetturo/status/1932893260399456513 to air during NBA finals] (Veo)&lt;br /&gt;
** [https://x.com/ROHKI/status/1933594430113788227 Your Brain is Broken on Purpose (2m)]&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1934312626021949687 Runway Gen-4 Reference examples]&lt;br /&gt;
** [https://x.com/JesusPlazaX/status/1934253813696786661 Paper airplane]&lt;br /&gt;
** [https://x.com/minchoi/status/1934032730947526872 Veo3 examples]&lt;br /&gt;
** [https://x.com/NomadsVagabonds/status/1935329331410075734 Reset 3 (1m, surreal)]&lt;br /&gt;
** [https://x.com/HashemGhaili/status/1935722105322323968 It Has No Soul (1m, Veo3)]&lt;br /&gt;
* June 2025: [https://seedance.net/seedance Seedance 1.0] ([https://arxiv.org/abs/2506.09113 preprint])&lt;br /&gt;
* June 2025: [https://hailuoai.video/ Hailuo AI] (MiniMax) Hailuo 02 ([https://x.com/venturetwins/status/1934236631336403344 &amp;quot;Kangaroo&amp;quot; during testing]; examples: [https://x.com/lepadphone/status/1935078910934626429 various], [https://x.com/alexgnewmedia/status/1935018186954719365 various], [https://x.com/FussyPastor/status/1935065068456263883 tsunami], [https://x.com/thedorbrothers/status/1935098802744213935 fight scene], [https://x.com/umesh_ai/status/1935028257708966231 fox running], [https://x.com/BrentLynch/status/1934979825636446268 blogger], [https://x.com/HalimAlrasihi/status/1935297126759538735 transitions], [https://x.com/MKMXLA/status/1938318951664280045 skateboarding])&lt;br /&gt;
* June 2025: Midjourney video ([https://x.com/minchoi/status/1934373051464057062 early examples], [https://x.com/ciguleva/status/1935386452197785892 various], [https://x.com/juliewdesign_/status/1935395999175876696 various], [https://x.com/emollick/status/1935504703023899096 Ethan Mollick], [https://x.com/PJaccetturo/status/1935383312392151528 highly rated], [https://x.com/maxescu/status/1935674561821126847 complex environments], [https://x.com/CoffeeVectors/status/1935863623076675875 manga])&lt;br /&gt;
* June 2025: Examples:&lt;br /&gt;
** [https://x.com/StevieMac03/status/1935768436556378170 The Battle of Glenvael - Orcs vs Humans] (Hailuo)&lt;br /&gt;
** [https://x.com/HashemGhaili/status/1935036744568824208 The Sentence (9m, Veo3)]&lt;br /&gt;
** [https://x.com/elder_plinius/status/1936145834585862225 Will Smith Eating Spaghetti] (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025], [https://x.com/javilopen/status/1925495026903380358 May 2025])&lt;br /&gt;
** [https://x.com/venturetwins/status/1937232461576175809 Gymnastics] (Hailuo 02)&lt;br /&gt;
** [https://x.com/KeiranJHarris/status/1935429439476887594 Fable about Transformative AI] (Veo3)&lt;br /&gt;
** [https://x.com/thedorbrothers/status/1937926400507580726 Vorex (2m trailer)]&lt;br /&gt;
** [https://x.com/OnerBiberkoku/status/1938972810321281394 Doğrucu (3m music video, Veo3)]&lt;br /&gt;
* June 2025: [https://higgsfield.ai/soul Higgsfield Soul] Video Effects ([https://x.com/higgsfield_ai/status/1937931727084917097 examples], [https://x.com/HashemGhaili/status/1938278903765995611 realism])&lt;br /&gt;
* June 2025: Alibaba [https://omni-avatar.github.io/ OmniAvatar] ([https://arxiv.org/abs/2506.18866 paper], [https://github.com/Omni-Avatar/OmniAvatar code], [https://huggingface.co/OmniAvatar/OmniAvatar-14B model], [https://x.com/AngryTomtweets/status/1939850674776547359 examples])&lt;br /&gt;
&lt;br /&gt;
====July 2025====&lt;br /&gt;
* July 2025: Examples:&lt;br /&gt;
** [https://x.com/Kavanthekid/status/1940452444850589999 Untold - The Immortal Blades Saga] (2m trailer)&lt;br /&gt;
** [https://x.com/minchoi/status/1941234456461029584 Unofficial commercial for Liquid Death (1m)]&lt;br /&gt;
** [https://x.com/brain_racked/status/1942594951310893425 A parade of the chosen theocracy on Callisto]&lt;br /&gt;
** [https://x.com/Popeyes/status/1943316484404433182 Popeyes commercial - diss track (1m)]&lt;br /&gt;
*** [https://x.com/gabemichael_ai/status/1944070622155616668 (Unofficial) Wendy&amp;#039;s response - diss track (2m)]&lt;br /&gt;
*** [https://x.com/ai_massive/status/1947689537641357618 (Unofficial) In-N-Out rap battle (3m)]&lt;br /&gt;
** [https://x.com/Kalshi/status/1943339616716599548 Kalshi commercial]&lt;br /&gt;
** Jonah (25m TV show, [https://x.com/PJaccetturo/status/1946101701548880029 making of], [https://kingstonestudios.uscreen.io/programs/jonah purchase here])&lt;br /&gt;
** [https://x.com/Totemko/status/1946243585021452335 Unofficial commercial for Mercedes (17s)]&lt;br /&gt;
** [https://x.com/CoffeeVectors/status/1946016960916889632 Skateboarding music video (1m)]&lt;br /&gt;
* July 2025: Runway ML [https://help.runwayml.com/hc/en-us/articles/42311337895827-Creating-with-Act-Two Act-Two] (video-to-video performance transfer)&lt;br /&gt;
* July 2025: Examples:&lt;br /&gt;
** Neural Viz [https://www.youtube.com/watch?v=juDDHvHroQ8 The Cop Files: Part VI (8m)]&lt;br /&gt;
** [https://x.com/Kavanthekid/status/1947696716981145971 Perfect Dark - Concept Trailer (1.5m)]&lt;br /&gt;
** [https://x.com/StelfieTT/status/1948753090858885131 Exodus (2m trailer)]&lt;br /&gt;
** [https://x.com/Jett_Collective/status/1949140450553540841 A Walk Together - Life and love in motion (1m, Midjourney Video)]&lt;br /&gt;
* July 2025: Netflix sci-fi show [https://en.wikipedia.org/wiki/The_Eternaut_(TV_series) The Eternaut] [https://x.com/omooretweets/status/1946290797399400662 used genAI] for a particular scene (building collapse)&lt;br /&gt;
* July 2025: Google Veo [https://x.com/GoogleLabs/status/1948477692715700718 emergent annotation direction] ([https://x.com/venturetwins/status/1948771505783144641 example], [https://x.com/bilawalsidhu/status/1948844167603310660 example], [https://x.com/jboogx_creative/status/1949230927504371765 example], [https://x.com/Ror_Fly/status/1949606017739747625 example])&lt;br /&gt;
* July 2025: Runway [https://runwayml.com/research/introducing-runway-aleph Aleph] contextual editing&lt;br /&gt;
* July 2025: Wan 2.2 (open source, [https://x.com/Alibaba_Wan/status/1949804551655276989 examples])&lt;br /&gt;
&lt;br /&gt;
====August 2025====&lt;br /&gt;
* August 2025: Pika [https://x.com/pika_labs/status/1954935844936024476 audio-driven performance] ([https://x.com/minchoi/status/1954989794129514937 examples], [https://x.com/pika_labs/status/1955007656302924192 examples])&lt;br /&gt;
* August 2025: Examples:&lt;br /&gt;
** [https://www.youtube.com/watch?v=gePD1Hf1qPc Eve and Adam] (8m, [https://x.com/MetaPuppet/status/1954254544935719259 multiple tools])&lt;br /&gt;
** [https://x.com/runwayml/status/1955615613583519917 Redesign a space] (Runway Aleph)&lt;br /&gt;
** [https://x.com/theGioM/status/1955656398248763428 Detroit Pretend Work Park (1m)]&lt;br /&gt;
** [https://x.com/pzf_ai/status/1940816374211006600 The Weight of Light] (3m music video, Midjourney &amp;amp; Suno)&lt;br /&gt;
** [https://x.com/EHuanglu/status/1956788759778967710 Commercial for Pepsi]&lt;br /&gt;
** [https://x.com/StelfieTT/status/1956633450326200426 Emotion]&lt;br /&gt;
** [https://x.com/dustinhollywood/status/1957940749862875383 TZIGANE]&lt;br /&gt;
** [https://x.com/0xFramer/status/1960720090921623636 Anime chase sequence] (Nano Banana and Seedance 1.0)&lt;br /&gt;
* August 2025: ByteDance [http://www.waver.video/ Waver 1.0]&lt;br /&gt;
* August 2025: [https://huggingface.co/Wan-AI/Wan2.2-S2V-14B Wan2.2-S2V 14B]&lt;br /&gt;
&lt;br /&gt;
====September 2025====&lt;br /&gt;
* September 2025: [https://www.wsj.com/tech/ai/openai-backs-ai-made-animated-feature-film-389f70b0 OpenAI Backs AI-Made Animated Feature Film: Film, called ‘Critterz,’ aims to debut at Cannes Film Festival and will leverage startup’s AI tools and resources.]&lt;br /&gt;
* September 2025: Examples:&lt;br /&gt;
** [https://x.com/kentskooking/status/1964606423037542459 A loop to wake up to (30s)]&lt;br /&gt;
** [https://x.com/venturetwins/status/1966570512991350907 time lapse]&lt;br /&gt;
** [https://x.com/NeuralViz/status/1967391198487994652 The Adventures of Reemo Green] (11m, Neural Viz)&lt;br /&gt;
** [https://x.com/kellyeld/status/1967620786166079545 Surreal DJs music video (2m)]&lt;br /&gt;
** [https://x.com/dustinhollywood/status/1968724784440558044 Glass City] (Hailuo)&lt;br /&gt;
** [https://x.com/TheoMediaAI/status/1968646951227777529 Alarm] (1m, multiple tools including world synthesis for consistent environments)&lt;br /&gt;
* September 2025: [https://lumalabs.ai/ray Luma] [https://x.com/LumaLabsAI/status/1968684330034606372 Ray3] ([https://x.com/cfryant/status/1968692370725077251 example])&lt;br /&gt;
* September 2025: Examples:&lt;br /&gt;
** [https://x.com/mrjonfinger/status/1968687352382910469 Stop motion interpolation] (Luma Ray3)&lt;br /&gt;
** [https://x.com/heydin_ai/status/1969514789169959128 Skyland] (1.5m, various tools)&lt;br /&gt;
** [https://x.com/iamluokai/status/1970185972076925427 Dancing] (Wan 2.2)&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1970497214108815584 Under Armor commercial] (Runway Aleph)&lt;br /&gt;
** [https://x.com/FilmsBySav/status/1971247214795358706 OG PRIME] (10m, Kling)&lt;br /&gt;
** [https://www.youtube.com/watch?v=JGLoTjxd-Ss PLANET] (37m)&lt;br /&gt;
* September 2025: [https://x.com/Kling_ai/status/1970439808901362155 Kling AI 2.5 Turbo] (examples: [https://x.com/OrctonAI/status/1970472214794220008 cyberpunk], [https://x.com/ImagineArt_X/status/1970586138655236565 human motion], [https://x.com/fAIkout/status/1970505756853334324 motion and emotion], [https://x.com/fAIkout/status/1970495039248965636 painting], [https://x.com/venturetwins/status/1970563820478439546 gymnastics], [https://x.com/Art_For_Joy/status/1970249516033970434 breakdancing], [https://x.com/HaydenLeeWrites/status/1970523610734567819 combat], [https://x.com/umesh_ai/status/1970497680536150454 cinematic], [https://x.com/LillyLiCT/status/1970580585073819752 horror camerawork], [https://x.com/StevieMac03/status/1970559778804908331 extended sequence])&lt;br /&gt;
* September 2025: OpenAI [https://openai.com/index/sora-2/ Sora 2] ([https://x.com/minchoi/status/1973949620318580970 examples])&lt;br /&gt;
&lt;br /&gt;
====October 2025====&lt;br /&gt;
* October 2025: Examples:&lt;br /&gt;
** [https://x.com/minchoi/status/1976042197154963702 Will Smith Eating Spaghetti] (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025], [https://x.com/javilopen/status/1925495026903380358 May 2025], [https://x.com/elder_plinius/status/1936145834585862225 June 2025])&lt;br /&gt;
** [https://www.youtube.com/watch?v=JhH3uxcdM1M Frostbite] (3m, Sora 2)&lt;br /&gt;
** [https://x.com/Jukanlosreve/status/1977764418709758106 (Fake) &amp;quot;Behind the scenes&amp;quot; for a Chainsaw Man live action] ([https://x.com/PJaccetturo/status/1972705821072261402 others])&lt;br /&gt;
* October 2025: Google [https://blog.google/technology/ai/veo-updates-flow/ Veo 3.1]&lt;br /&gt;
* October 2025: Examples:&lt;br /&gt;
** [https://x.com/aisearchio/status/1978465562821898461 Will Smith Eating Spaghetti], Veo 3.1 (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025], [https://x.com/javilopen/status/1925495026903380358 May 2025], [https://x.com/elder_plinius/status/1936145834585862225 June 2025], [https://x.com/minchoi/status/1976042197154963702 October 2025])&lt;br /&gt;
** [https://x.com/Diesol/status/1978755688261128227 War footage] (Veo 3.1)&lt;br /&gt;
** [https://www.meta.ai/@dustin_hollywood/post/bG3BHB21W0l/yukon/ Yukon] (music video, [https://x.com/dustinhollywood/status/1982260655957700746 Dustin Hollywood])&lt;br /&gt;
** [https://x.com/Diesol/status/1980922041131028515 Bloom] (2m, Veo 3.1)&lt;br /&gt;
** [https://x.com/xmuse_/status/1982026008803905639 Auction] (1m)&lt;br /&gt;
** [https://x.com/kellyeld/status/1982425147496882287 Dancing] (music video; Midjourney, Suno, Veo3)&lt;br /&gt;
** [https://x.com/JesusPlazaX/status/1982393609069412433 Anime example] (Midjourney, Grok Imagine)&lt;br /&gt;
** [https://x.com/EccentrismArt/status/1982830100266783039 King Arthur] (1m)&lt;br /&gt;
** [https://x.com/venturetwins/status/1983024227352789162 Transitions] (1m music video)&lt;br /&gt;
** [https://x.com/eastflatsfilm/status/1984116704704971076 Unofficial commercial for Nike] (2m, Midjourney, Hailuo)&lt;br /&gt;
** [https://x.com/PJaccetturo/status/1984639281848336592 Loneliness/Halloween] ([https://www.linkedin.com/posts/simon-meyer-976339160_this-could-be-the-scariest-halloween-film-activity-7389892778144735232-6CYY?utm_source=share&amp;amp;utm_medium=member_desktop&amp;amp;rcm=ACoAAADeoqYBzX8N9-j_hRQvl1e7OUlOgFptNF0 1.5m])&lt;br /&gt;
** [https://www.youtube.com/watch?v=43h61QAXjpY Wave] (2m music video, [https://x.com/MIZNOM Masaki Mizuno])&lt;br /&gt;
* October 2025: [https://x.com/Hailuo_AI/status/1983016390878708131 Hailuo 2.3]&lt;br /&gt;
&lt;br /&gt;
====November 2025====&lt;br /&gt;
* November 2025: Examples:&lt;br /&gt;
** [https://x.com/subverum/status/1985069550250107033 Valley of Shadow] (6m)&lt;br /&gt;
** [https://x.com/DiscussingFilm/status/1985470088074375344 Coca-cola ad] (c.f. [https://x.com/techhalla/status/1857462526859935813 2024 ad])&lt;br /&gt;
** [https://x.com/venturetwins/status/1985755546222542903 France 2026 Olympics ad] (blend of genAI and traditional methods, [https://x.com/venturetwins/status/1985753512362590439 behind the scenes])&lt;br /&gt;
** [https://x.com/NeuralViz/status/1986611025366687754 Minnesota Nice] (3m, [https://x.com/NeuralViz Neural Viz])&lt;br /&gt;
** [https://x.com/machina9000/status/1986563727873740934 Brutalis] (7m)&lt;br /&gt;
** [https://x.com/tastypxls/status/1987312755485876502?s=20 Living The Dream - Rynn] (music video, 1m)&lt;br /&gt;
** [https://x.com/MrDavids1/status/1988366387111170339?s=20 Environment as Character]&lt;br /&gt;
** [https://x.com/justalexoki/status/1988915573707661637?s=20 Cat playing instruments at midnight]&lt;br /&gt;
** [https://x.com/LumaLabsAI/status/1989013731267998172?s=20 Overclock] (30s, Luma)&lt;br /&gt;
** [https://x.com/venturetwins/status/1980685301577326994?s=20 Music video] (30s, Wan Animate)&lt;br /&gt;
** [https://x.com/venturetwins/status/1990227418553209259?s=20 Promotional material for Pudong Art Museum - Louvre exhibition in Shanghai] (1m)&lt;br /&gt;
** [https://x.com/Kyrannio/status/1990324648488186358?s=20 Loop 87 A Temporal Heist] (12m, claim that video was generated fully autonomously using AI agent NoSpoon)&lt;br /&gt;
** [https://x.com/AzeAlter/status/1906974768705990794?s=20 Age of Beyond] (3m)&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1991245088446386495?s=20 Ausencia] (5m)&lt;br /&gt;
** [https://x.com/AngryTomtweets/status/1993047608617517246?s=20 live paintings] ([https://www.youtube.com/channel/UCw8kc0wDm5Bh6g9iZzEWfOg bandyquantguy] on YouTube)&lt;br /&gt;
** [https://x.com/BrianRoemmele/status/1994625579073900804?s=20 Michelle, on a server in Iowa] (1m)&lt;br /&gt;
* November 2025: [https://odyssey.ml/ Odyssey] - [https://x.com/odysseyml/status/1994873514579697830?s=20 Odyssey-2]&lt;br /&gt;
&lt;br /&gt;
====December 2025====&lt;br /&gt;
* December 2025: Runway [https://runwayml.com/research/introducing-runway-gen-4.5 Gen 4.5]&lt;br /&gt;
* December 2025: [https://app.klingai.com/global/all-tools Kling] [https://app.klingai.com/global/omni/new O1] ([https://x.com/minchoi/status/1995523379957559609?s=20 examples], [https://x.com/TheoMediaAI/status/1995517613414518987?s=20 other examples]) and Kling 2.6&lt;br /&gt;
* December 2025: [https://app.pixverse.ai/onboard PixVerse v5.5]&lt;br /&gt;
* December 2025: Examples:&lt;br /&gt;
** [https://x.com/EHuanglu/status/1996649596119068687?s=20 Will Smith Eating Spaghetti], Kling 2.6 (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025], [https://x.com/javilopen/status/1925495026903380358 May 2025], [https://x.com/elder_plinius/status/1936145834585862225 June 2025], [https://x.com/minchoi/status/1976042197154963702 October 2025], [https://x.com/aisearchio/status/1978465562821898461 October 2025])&lt;br /&gt;
** [https://x.com/venturetwins/status/1997898095670296615?s=20 Dreamlike POV]&lt;br /&gt;
** [https://x.com/chatgpt21/status/1998253809307455555?s=20 McDonalds commercial]&lt;br /&gt;
** [https://x.com/EHuanglu/status/1998039554402750545?s=20 Skittles commercial] (Higgsfield)&lt;br /&gt;
** [https://x.com/Diesol/status/1997147919603077335?s=20 The Tenant] (2m, Kling 2.6)&lt;br /&gt;
** [https://x.com/PsyopAnime/status/1999242965659906526?s=20 Maximum Carnage] (3m)&lt;br /&gt;
** [https://x.com/JeffSynthesized/status/1998786836924395875?s=20 Blurred Horizon: Episode 1] (24m)&lt;br /&gt;
** [https://x.com/Artedeingenio/status/2001667487784460301?s=20 Anime Action] (2m)&lt;br /&gt;
** [https://x.com/bearlyai/status/2005055231617605748?s=20 Dollar Shave Club commercial] (1m)&lt;br /&gt;
** [https://x.com/EHuanglu/status/2004020543084024295?s=20 Xmas Cameos] (1.5m)&lt;br /&gt;
** [https://x.com/DiDi_OKK/status/1955653520407019976?s=20 Green Screen] (2m, [https://x.com/DiDi_OKK/status/1955653520407019976 DiDi_OK])&lt;br /&gt;
** [https://x.com/DiDi_OKK/status/1998227601341702639?s=20 Arrow] (7m, [https://x.com/DiDi_OKK/status/1955653520407019976 DiDi_OK])&lt;br /&gt;
** [https://x.com/bluehorizon_ai/status/2004045348579561503?s=20 Live Action One Punch Man | Saitama vs Genos] (2m, [https://x.com/bluehorizon_ai Blue Horizon])&lt;br /&gt;
** [https://x.com/keshiAIart/status/2005254907780358201?s=20 Anime Train] (6s)&lt;br /&gt;
** [https://x.com/venturetwins/status/2006051632837189683?s=20 Michael Catson] (13s)&lt;br /&gt;
* December 2025: [https://arxiv.org/abs/2512.13507 Seedance 1.5]&lt;br /&gt;
&lt;br /&gt;
===2026===&lt;br /&gt;
====January 2026====&lt;br /&gt;
* January 2026: Examples:&lt;br /&gt;
** [https://x.com/Itspedrito/status/2007636967048228968?s=20 Somebody That I Used to Know] (1m)&lt;br /&gt;
** [https://x.com/hujimari/status/2008054519704461407?s=20 Cat being disruptive at night], [https://x.com/klara_sjo/status/2007864014521720963?s=20 another], [https://x.com/alphafox/status/2009732284375830687?s=20 another] (c.f. [https://x.com/justalexoki/status/1988915573707661637?s=20 Cat playing instruments at midnight])&lt;br /&gt;
** [https://x.com/Uncanny_Harry/status/2008881579095961934?s=20 Character test] (30s, Kling 2.6 Motion Control, [https://x.com/Uncanny_Harry Uncanny Harry AI])&lt;br /&gt;
** [https://www.youtube.com/watch?v=SGJC4Hnz3m0&amp;amp;t=2s STAR WARS: Beggar’s Canyon | A Luke Skywalker Fan Film (Between ESB &amp;amp; ROTJ)] (7m)&lt;br /&gt;
** [https://x.com/dustinhollywood/status/2009732705299104118?s=20 TZIGANE] (9m)&lt;br /&gt;
** [https://x.com/Framer_X/status/2011075884246061454?s=20 The Subway Spark] (Anime, 45s)&lt;br /&gt;
** [https://www.reddit.com/r/aivideo/comments/1qi8zuv/25_years_difference_makes_you_wonder_where_ai/ Will Smith Eating Spaghetti] (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025], [https://x.com/javilopen/status/1925495026903380358 May 2025], [https://x.com/elder_plinius/status/1936145834585862225 June 2025], [https://x.com/minchoi/status/1976042197154963702 October 2025], [https://x.com/aisearchio/status/1978465562821898461 October 2025], [https://x.com/EHuanglu/status/1996649596119068687?s=20 December 2025])&lt;br /&gt;
** [https://x.com/PJaccetturo/status/2013675665539596651?s=20 The AI Artist] (1.5m)&lt;br /&gt;
** [https://x.com/Artedeingenio/status/2013624842021417030?s=20 Sci-fi action anime] (2m)&lt;br /&gt;
** [https://x.com/verbalriotshow/status/2014752509240475872?s=20 Stone Hand] (fake trailer, 1m)&lt;br /&gt;
* January 2026: [https://x.com/nvidia/status/2008346949301235933?s=20 Runway Gen-4.5 on] [https://www.nvidia.com/en-us/data-center/technologies/rubin/?linkId=100000401190502 Nvidia Rubin] ([https://x.com/runwayml/status/2014406560445771804?s=20 examples])&lt;br /&gt;
* January 2026: [https://ltx.io/model/ltx-2 LTX-2] open source video model (20s, 4k, w/ audio; [https://x.com/venturetwins/status/2010878914273697956?s=20 examples])&lt;br /&gt;
* January 2026: Luma [https://lumalabs.ai/blog/news/ray3_14 Ray3.14] ([https://x.com/LumaLabsAI/status/2015822842575888844?s=20 examples])&lt;br /&gt;
* January 2026: Examples:&lt;br /&gt;
** [https://x.com/pressmanc/status/2015099516500758647?s=20 Runway Gen-4.5 tests] (3.5m)&lt;br /&gt;
** [https://x.com/EHuanglu/status/2015573517618528538?s=20 Longchamp / Horses in the city] (1m)&lt;br /&gt;
** [https://x.com/dustinhollywood/status/2008154825385521418?s=20 The Last Artist] (trailer, 2m)&lt;br /&gt;
** [https://x.com/taziku_co/status/2015739943101047111?s=20 Monet temporal structure] (3m)&lt;br /&gt;
** [https://x.com/runwayml/status/2016155967285543364?s=20 Grizzlies] (1.5m, Runway Gen-4.5)&lt;br /&gt;
** [https://www.youtube.com/@TIME/videos On This Day... 1776] ([https://www.youtube.com/watch?v=E4cLKIxt8W8 trailer])&lt;br /&gt;
*** [https://www.youtube.com/watch?v=sV52AUVGc6I January 1: The Flag] (3.5m)&lt;br /&gt;
*** [https://www.youtube.com/watch?v=3ZDnL_a0YfQ January 10: Common Sense] (4.5m)&lt;br /&gt;
*** [https://www.youtube.com/watch?v=J5b1TiyKTus January 26: The Guns of Ticonderoga] (4m)&lt;br /&gt;
&lt;br /&gt;
====February 2026====&lt;br /&gt;
* February 2026: [https://app.klingai.com/global/quickstart/klingai-video-3-omni-model-user-guide Kling 3.0]&lt;br /&gt;
* February 2026: [https://seedance2.ai/ Seedance 2.0] ([https://x.com/EHuanglu/status/2020131622675202512?s=20 example 1], [https://x.com/EHuanglu/status/2020492770872566053?s=20 2], [https://x.com/dynamicwangs/status/2020054894741451123?s=20 3], [https://x.com/patrickassale/status/2020180495900848470?s=20 4], [https://x.com/janekm/status/2020888750285332526?s=20 5], [https://x.com/Dork_sense/status/2020179955511116082?s=20 6], [https://x.com/EHuanglu/status/2020388244802740728?s=20 7], [https://x.com/zhao_dashuai/status/2020528048341217592?s=20 8], [https://x.com/AngryTomtweets/status/2020784886932738470?s=20 9], [https://x.com/javilopen/status/2020558352590287298?s=20 10], [https://x.com/linxiaobei888/status/2021399630672691710?s=20 11])&lt;br /&gt;
* February 2026: Examples:&lt;br /&gt;
** [https://x.com/PJaccetturo/status/2019072637192843463?s=20 Unofficial opening sequence for The Way of Kings by Brandon Sanderson] (1.5m, Kling 3)&lt;br /&gt;
** [https://x.com/dailycatsclips/status/2020117502915989680?s=20 Cat Dreams] (1.5m)&lt;br /&gt;
** [https://x.com/DotCSV/status/2021269435567218725?s=20 Will Smith Eating Spaghetti] (Seedance 2.0) (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025], [https://x.com/javilopen/status/1925495026903380358 May 2025], [https://x.com/elder_plinius/status/1936145834585862225 June 2025], [https://x.com/minchoi/status/1976042197154963702 October 2025], [https://x.com/aisearchio/status/1978465562821898461 October 2025], [https://x.com/EHuanglu/status/1996649596119068687?s=20 December 2025], [https://www.reddit.com/r/aivideo/comments/1qi8zuv/25_years_difference_makes_you_wonder_where_ai/ January 2026], [https://x.com/SpecialSitsNews/status/2020583709741883666?s=20 progression to 2026])&lt;br /&gt;
** [https://x.com/thedorbrothers/status/2023460644905742577?s=20 To Be Continued] (3m, [https://x.com/thedorbrothers The Dor Brothers])&lt;br /&gt;
** [https://x.com/ivanka_humeniuk/status/2023711181978919034?s=20 Crow - Game of Thrones] (1m)&lt;br /&gt;
** [https://x.com/billyrestey/status/2024193251763507528?s=20 Reboot] (2m)&lt;br /&gt;
** [https://x.com/kenw_2/status/2024625510534283508?s=20 Late for work] (1.5m, MJ NBP Seedance 2.0)&lt;br /&gt;
** [https://x.com/heydin_ai/status/2024616890338079181?s=20 AI Man] (4.5m, MJ NBP Seedance 2.0)&lt;br /&gt;
** [https://x.com/maxescu/status/2024882372836250033?s=20 But AI Will Never Be Able To Do This] (3m, Seedance 2.0)&lt;br /&gt;
** [https://x.com/DiDi_OKK/status/2018784243753599093?s=20 Sign] (8m)&lt;br /&gt;
** [https://x.com/LTXStudio/status/2025994426309640291?s=20 Commercial for Nexus] (1m)&lt;br /&gt;
** [https://x.com/maxescu/status/2026007558159278477?s=20 Showcase] (9m, [https://x.com/maxescu Alex Patrascu])&lt;br /&gt;
** [https://x.com/EHuanglu/status/2025410944512192536?s=20 Painterly] (30s, [https://x.com/EHuanglu el.cine])&lt;br /&gt;
** [https://x.com/kellyeld/status/2025975677657440267?s=20 Imposter Syndrone] (2m, music video)&lt;br /&gt;
** [https://www.youtube.com/watch?v=nKnE2Wn1VNQ All Is Conscious] (3.5m)&lt;br /&gt;
** [https://x.com/CuriousRefuge/status/2026086576191934769?s=20 Emotional argument] (3m, Seedance 2.0)&lt;br /&gt;
** [https://x.com/jdkanani/status/2023781028368884031?s=20 Moonlight Veil] (10m)&lt;br /&gt;
&lt;br /&gt;
====March 2026====&lt;br /&gt;
* March 2026: Examples:&lt;br /&gt;
** [https://x.com/jacopo_reale/status/2029909372764041559 Looking for Bianca] (6m, Kling 3.0)&lt;br /&gt;
** [https://x.com/sumiturkude007/status/2030933543443193908?s=20 Gardener] (3m, Seedance 2.0)&lt;br /&gt;
** Micro-movie (Chinese): [https://x.com/yyyole/status/2029225419669684418?s=20 episode 1], [https://x.com/yyyole/status/2030850450464112675?s=20 episode 2]&lt;br /&gt;
** Live-action Evangelion: [https://x.com/NACHOS2D_/status/2032401289653461052?s=20 part 1] (4.5m), [https://x.com/NACHOS2D_/status/2032778868361203770?s=20 part 2] (3.5m), [https://x.com/NACHOS2D_/status/2033126071151837491?s=20 part 3] (2.5m)&lt;br /&gt;
** [https://x.com/lexx_aura/status/2033589846216741293?s=20 to love Wu Yong] (5m)&lt;br /&gt;
* March 2026: [https://higgsfield.ai/original-series Higgsfield Original Series]&lt;/div&gt;</summary>
		<author><name>KevinYager</name></author>
		
	</entry>
	<entry>
		<id>http://gisaxs.com/index.php?title=AI_safety&amp;diff=8741</id>
		<title>AI safety</title>
		<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php?title=AI_safety&amp;diff=8741"/>
		<updated>2026-03-19T17:55:58Z</updated>

		<summary type="html">&lt;p&gt;KevinYager: /* Description of Safety Concerns */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Learning Resources=&lt;br /&gt;
==Light==&lt;br /&gt;
* [https://orxl.org/ai-doom.html a casual intro to AI doom and alignment] (2022)&lt;br /&gt;
* Anthony Aguirre: [https://keepthefuturehuman.ai/ Keep The Future Human]&lt;br /&gt;
** [https://interactive.keepthefuturehuman.ai/ Interactive Explainer]&lt;br /&gt;
** [https://keepthefuturehuman.ai/essay/ Essay: Keep the Future Human]&lt;br /&gt;
** [https://www.youtube.com/watch?v=27KDl2uPiL8 We Can’t Stop AI – Here’s What To Do Instead] (4m video, 2025)&lt;br /&gt;
** [https://www.youtube.com/watch?v=zeabrXV8zNE The 4 Rules That Could Stop AI Before It’s Too Late] (15m video, 2025)&lt;br /&gt;
* Tristan Harris TED talk (15m): [https://www.ted.com/talks/tristan_harris_why_ai_is_our_ultimate_test_and_greatest_invitation Why AI is our ultimate test and greatest invitation]&lt;br /&gt;
** Text version: Center for Humane Technology: [https://centerforhumanetechnology.substack.com/p/the-narrow-path-why-ai-is-our-ultimate The Narrow Path: Why AI is Our Ultimate Test and Greatest Invitation]&lt;br /&gt;
* [https://x.com/KeiranJHarris/status/1935429439476887594 Fable about Transformative AI]&lt;br /&gt;
* 2024-10: [https://www.youtube.com/watch?v=xfMQ7hzyFW4 Writing Doom]: short film on Superintelligence (27m video)&lt;br /&gt;
* 2026-03: [https://www.youtube.com/watch?v=Nl7-bRFSZBs The AI book that&amp;#039;s freaking out national security advisors] (44m video)&lt;br /&gt;
&lt;br /&gt;
==Deep==&lt;br /&gt;
* [https://www.thecompendium.ai/ The Compendium: Humanity risks extinction from its very creations — AIs.] (2024)&lt;br /&gt;
* [https://www.aisafetybook.com/ Introduction to AI Safety, Ethics, and Society] (Dan Hendrycks, [https://www.safe.ai/ Center for AI Safety])&lt;br /&gt;
* [https://aisafety.info/ AI Safety FAQ]&lt;br /&gt;
* [https://deepmindsafetyresearch.medium.com/introducing-our-short-course-on-agi-safety-1072adb7912c DeepMind short course on AGI safety]&lt;br /&gt;
&lt;br /&gt;
=Description of Safety Concerns=&lt;br /&gt;
==Key Concepts==&lt;br /&gt;
* [https://en.wikipedia.org/wiki/Instrumental_convergence Instrumental Convergence]&lt;br /&gt;
* [https://www.lesswrong.com/w/orthogonality-thesis Orthogonality Thesis]&lt;br /&gt;
* [https://www.alignmentforum.org/posts/SzecSPYxqRa5GCaSF/clarifying-inner-alignment-terminology Inner/outer alignment]&lt;br /&gt;
* [https://www.alignmentforum.org/w/mesa-optimization Mesa-optimization]&lt;br /&gt;
* [https://www.lesswrong.com/posts/N6vZEnCn6A95Xn39p/are-we-in-an-ai-overhang Overhang]&lt;br /&gt;
* [https://www.alignmentforum.org/posts/pdaGN6pQyQarFHXF4/reward-is-not-the-optimization-target Reward is not the optimization target] (Alex Turner)&lt;br /&gt;
* 80,000 hours:&lt;br /&gt;
** [https://80000hours.org/problem-profiles/risks-from-power-seeking-ai/ Risks from power-seeking AI systems]&lt;br /&gt;
** [https://80000hours.org/problem-profiles/gradual-disempowerment/ Gradual disempowerment]&lt;br /&gt;
** [https://80000hours.org/problem-profiles/catastrophic-ai-misuse/ Catastrophic AI misuse]&lt;br /&gt;
&lt;br /&gt;
==Medium-term Risks==&lt;br /&gt;
* 2023-04: [https://www.youtube.com/watch?v=xoVJKj8lcNQ A.I. Dilemma – Tristan Harris and Aza Raskin (video)] ([https://assets-global.website-files.com/5f0e1294f002b1bb26e1f304/64224a9051a6637c1b60162a_65-your-undivided-attention-The-AI-Dilemma-transcript.pdf podcast transcript]): raises concerns about human ability to handle these transformations&lt;br /&gt;
* 2023-04: [https://www.youtube.com/watch?v=KCSsKV5F4xc Daniel Schmachtenberger and Liv Boeree (video)]: AI could accelerate perverse social dynamics&lt;br /&gt;
* 2023-10: [https://arxiv.org/pdf/2310.11986 Sociotechnical Safety Evaluation of Generative AI Systems] (Google DeepMind)&lt;br /&gt;
* 2024-02: [https://yoshuabengio.org/2024/02/26/towards-a-cautious-scientist-ai-with-convergent-safety-bounds/ Towards a Cautious Scientist AI with Convergent Safety Bounds] (Yoshua Bengio)&lt;br /&gt;
* 2024-07: [https://yoshuabengio.org/2024/07/09/reasoning-through-arguments-against-taking-ai-safety-seriously/ Reasoning through arguments against taking AI safety seriously] (Yoshua Bengio)&lt;br /&gt;
* 2025-04: [https://www.forethought.org/research/ai-enabled-coups-how-a-small-group-could-use-ai-to-seize-power AI-Enabled Coups: How a Small Group Could Use AI to Seize Power]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.20702 The Singapore Consensus on Global AI Safety Research Priorities]&lt;br /&gt;
* 2026-01: [https://www.science.org/doi/10.1126/science.adz1697 How malicious AI swarms can threaten democracy: The fusion of agentic AI and LLMs marks a new frontier in information warfare] (Science Magazine, [https://arxiv.org/abs/2506.06299 preprint])&lt;br /&gt;
* 2026-01: [https://www.darioamodei.com/essay/the-adolescence-of-technology The Adolescence of Technology: Confronting and Overcoming the Risks of Powerful AI] (Dario Amodei)&lt;br /&gt;
* 2026-02: [https://www.noahpinion.blog/p/updated-thoughts-on-ai-risk Updated thoughts on AI risk: Things have gotten scarier since 2023] ([https://x.com/Noahpinion Noah Smith])&lt;br /&gt;
&lt;br /&gt;
==Long-term (x-risk)==&lt;br /&gt;
* 2015-02: Sam Altman: [https://blog.samaltman.com/machine-intelligence-part-1 Machine intelligence, part 1]&lt;br /&gt;
* 2019-03: Daniel Kokotajlo and Wei Dai: [https://www.lesswrong.com/posts/WXvt8bxYnwBYpy9oT/the-main-sources-of-ai-risk The Main Sources of AI Risk?]&lt;br /&gt;
* 2022-06: Eliezer Yudkowsky: [https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a-list-of-lethalities AGI Ruin: A List of Lethalities]&lt;br /&gt;
* 2024-11: Marcus Arvan: [https://link.springer.com/article/10.1007/s00146-024-02113-9 ‘Interpretability’ and ‘alignment’ are fool’s errands: a proof that controlling misaligned large language models is the best anyone can hope for]&lt;br /&gt;
* 2025-04: [https://michaelnotebook.com/xriskbrief/index.html ASI existential risk: reconsidering alignment as a goal]&lt;br /&gt;
* 2025-12: Philip Trammell and Leopold Aschenbrenner: [https://philiptrammell.com/static/Existential_Risk_and_Growth.pdf Existential Risk and Growth]&lt;br /&gt;
&lt;br /&gt;
=Status=&lt;br /&gt;
* 2025-01: [https://assets.publishing.service.gov.uk/media/679a0c48a77d250007d313ee/International_AI_Safety_Report_2025_accessible_f.pdf International Safety Report: The International Scientific Report on the Safety of Advanced AI (January 2025)]&lt;br /&gt;
* [https://ailabwatch.org/ AI Lab Watch] (safety scorecard)&lt;br /&gt;
&lt;br /&gt;
==Assessment==&lt;br /&gt;
* [https://aiassessmentscale.com/ AI Assessment Scale (AIAS)]: A practical framework to guide the appropriate and ethical use of generative AI in assessment design, empowering educators to make purposeful, evidence-based decisions&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.16534 Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report]&lt;br /&gt;
&lt;br /&gt;
==Policy==&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.05694 On the Limitations of Compute Thresholds as a Governance Strategy] (Sara Hooker)&lt;br /&gt;
* 2024-07: [https://www.cigionline.org/static/documents/AI-challenges.pdf Framework Convention on Global AI Challenges] ([https://www.cigionline.org/ CIGI])&lt;br /&gt;
* 2024-08: NIST guidelines: [https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.800-1.ipd.pdf Managing Misuse Risk for Dual-Use Foundation Models]&lt;br /&gt;
&lt;br /&gt;
==Proposals==&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.18359 Responsible AI Agents]&lt;br /&gt;
* 2025-03: [https://controlai.com/ Control AI] [https://controlai.com/dip The Direct Institutional Plan]&lt;br /&gt;
* 2025-04: Google DeepMind: [https://deepmind.google/discover/blog/taking-a-responsible-path-to-agi/ Taking a responsible path to AGI]&lt;br /&gt;
** Paper: [https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/evaluating-potential-cybersecurity-threats-of-advanced-ai/An_Approach_to_Technical_AGI_Safety_Apr_2025.pdf An Approach to Technical AGI Safety and Security]&lt;br /&gt;
&lt;br /&gt;
=Research=&lt;br /&gt;
* 2008: [https://selfawaresystems.com/wp-content/uploads/2008/01/ai_drives_final.pdf The Basic AI Drives]&lt;br /&gt;
* 2022-09: [https://arxiv.org/abs/2209.00626v1 The alignment problem from a deep learning perspective]&lt;br /&gt;
* 2022-12: [https://arxiv.org/abs/2212.03827 Discovering Latent Knowledge in Language Models Without Supervision]&lt;br /&gt;
* 2023-02: [https://arxiv.org/abs/2302.08582 Pretraining Language Models with Human Preferences]&lt;br /&gt;
* 2023-04: [https://arxiv.org/abs/2304.03279 Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark]&lt;br /&gt;
* 2023-05: [https://arxiv.org/abs/2305.15324 Model evaluation for extreme risks] (DeepMind)&lt;br /&gt;
* 2023-05: [https://arxiv.org/abs/2305.03047 Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision]&lt;br /&gt;
* 2023-06: [https://arxiv.org/abs/2306.17492 Preference Ranking Optimization for Human Alignment]&lt;br /&gt;
* 2023-08: [https://arxiv.org/abs/2308.06259 Self-Alignment with Instruction Backtranslation]&lt;br /&gt;
* 2023-11: [https://arxiv.org/abs/2311.08702 Debate Helps Supervise Unreliable Experts]&lt;br /&gt;
* 2023-12: [https://cdn.openai.com/papers/weak-to-strong-generalization.pdf Weak-to-Strong Generalization: Eliciting Strong Capabilities with Weak Supervision] (OpenAI, [https://openai.com/research/weak-to-strong-generalization blog])&lt;br /&gt;
* 2023-12: [https://cdn.openai.com/papers/practices-for-governing-agentic-ai-systems.pdf Practices for Governing Agentic AI Systems] (OpenAI, [https://openai.com/index/practices-for-governing-agentic-ai-systems/ blog])&lt;br /&gt;
* 2024-01: [https://arxiv.org/abs/2401.05566 Sleeper Agents: Training Deceptive LLMs that Persist through Safety Training] (Anthropic)&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.13208 The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions] (OpenAI)&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.04622 On scalable oversight with weak LLMs judging strong LLMs]&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.21792 Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?] (Dan Hendrycks et al.)&lt;br /&gt;
* 2024-08: [https://arxiv.org/abs/2408.00761 Tamper-Resistant Safeguards for Open-Weight LLMs] ([https://www.tamper-resistant-safeguards.com/ project], [https://github.com/rishub-tamirisa/tamper-resistance/ code])&lt;br /&gt;
* 2024-08: [https://arxiv.org/abs/2408.04614 Better Alignment with Instruction Back-and-Forth Translation]&lt;br /&gt;
* 2024-10: [https://cdn.openai.com/papers/first-person-fairness-in-chatbots.pdf First-Person Fairness in Chatbots] (OpenAI, [https://openai.com/index/evaluating-fairness-in-chatgpt/ blog])&lt;br /&gt;
* 2024-10: [https://assets.anthropic.com/m/377027d5b36ac1eb/original/Sabotage-Evaluations-for-Frontier-Models.pdf Sabotage evaluations for frontier models] (Anthropic, [https://www.anthropic.com/research/sabotage-evaluations blog])&lt;br /&gt;
* 2024-12: [https://assets.anthropic.com/m/983c85a201a962f/original/Alignment-Faking-in-Large-Language-Models-full-paper.pdf Alignment Faking in Large Language Models] (Anthropic)&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.03556 Best-of-N Jailbreaking] ([https://github.com/jplhughes/bon-jailbreaking code])&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.16325 Towards Safe and Honest AI Agents with Neural Self-Other Overlap]&lt;br /&gt;
** 2024-07: [https://www.lesswrong.com/posts/hzt9gHpNwA2oHtwKX/self-other-overlap-a-neglected-approach-to-ai-alignment Self-Other Overlap: A Neglected Approach to AI Alignment]&lt;br /&gt;
** 2025-03: [https://www.lesswrong.com/posts/jtqcsARGtmgogdcLT/reducing-llm-deception-at-scale-with-self-other-overlap-fine Reducing LLM deception at scale with self-other overlap fine-tuning]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.16339 Deliberative Alignment: Reasoning Enables Safer Language Models] (OpenAI)&lt;br /&gt;
* 2025-01: [https://cdn.openai.com/papers/trading-inference-time-compute-for-adversarial-robustness-20250121_1.pdf Trading Inference-Time Compute for Adversarial Robustness] (OpenAI, [https://openai.com/index/trading-inference-time-compute-for-adversarial-robustness/ blog])&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.18837 Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming] (Anthropic, [https://www.anthropic.com/research/constitutional-classifiers blog])&lt;br /&gt;
* 2025-02: [https://drive.google.com/file/d/1QAzSj24Fp0O6GfkskmnULmI1Hmx7k_EJ/view Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs] ([https://www.emergent-values.ai/ site], [https://github.com/centerforaisafety/emergent-values github])&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.07776 Auditing Prompt Caching in Language Model APIs]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.14143 Multi-Agent Risks from Advanced AI]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2209.00626v7 The Alignment Problem from a Deep Learning Perspective]&lt;br /&gt;
* 2025-03: [https://assets.anthropic.com/m/317564659027fb33/original/Auditing-Language-Models-for-Hidden-Objectives.pdf Auditing language models for hidden objectives] (Anthropic, [https://www.anthropic.com/research/auditing-hidden-objectives blog])&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.13621 Superalignment with Dynamic Human Values]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.15125 Contemplative Wisdom for Superalignment]&lt;br /&gt;
* 2025-04: [https://www.lesswrong.com/posts/x59FhzuM9yuvZHAHW/untitled-draft-yhra Scaling Laws for Scalable Oversight] ([https://arxiv.org/abs/2504.18530 preprint], [https://github.com/subhashk01/oversight-scaling-laws code])&lt;br /&gt;
* 2025-06: [https://assets.anthropic.com/m/4fb35becb0cd87e1/original/SHADE-Arena-Paper.pdf SHADE-Arena: Evaluating sabotage and monitoring in LLM agents] (Anthropic, [https://www.anthropic.com/research/shade-arena-sabotage-monitoring blog])&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.13609 Avoiding Obfuscation with Prover-Estimator Debate]&lt;br /&gt;
* 2025-06: [https://cdn.openai.com/pdf/a130517e-9633-47bc-8397-969807a43a23/emergent_misalignment_paper.pdf Persona Features Control Emergent Misalignment] (OpenAI, [https://openai.com/index/emergent-misalignment/ blog])&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2506.18032 Why Do Some Language Models Fake Alignment While Others Don&amp;#039;t?] (Anthropic, [https://github.com/safety-research/open-source-alignment-faking code])&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.11473 Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety]&lt;br /&gt;
* 2025-09: [https://openai.com/index/detecting-and-reducing-scheming-in-ai-models/ Detecting and reducing scheming in AI models]&lt;br /&gt;
* 2025-11: [https://assets.anthropic.com/m/74342f2c96095771/original/Natural-emergent-misalignment-from-reward-hacking-paper.pdf Natural Emergent Misalignment from Reward Hacking in Production RL] (Anthropic, [https://www.anthropic.com/research/emergent-misalignment-reward-hacking blog])&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.16856 Distributional AGI Safety]&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2511.22662 Difficulties with Evaluating a Deception Detector for AIs]&lt;br /&gt;
* 2025-12: [https://cdn.openai.com/pdf/d57827c6-10bc-47fe-91aa-0fde55bd3901/monitoring-monitorability.pdf Monitoring Monitorability] (OpenAI)&lt;br /&gt;
* 2026-01: [https://www.nature.com/articles/s41586-025-09937-5 Training large language models on narrow tasks can lead to broad misalignment]&lt;br /&gt;
** 2025-02: Preprint: [https://martins1612.github.io/emergent_misalignment_betley.pdf Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs]&lt;br /&gt;
* 2026-02: [https://arxiv.org/pdf/2601.23045 The Hot Mess of AI: How Does Misalignment Scale with Model Intelligence and Task Complexity?] (Anthropic, [https://alignment.anthropic.com/2026/hot-mess-of-ai/ blog])&lt;br /&gt;
* 2026-03: [https://cdn.openai.com/pdf/a21c39c1-fa07-41db-9078-973a12620117/cot_controllability.pdf Reasoning Models Struggle to Control their Chains of Thought] (OpenAI, [https://openai.com/index/reasoning-models-chain-of-thought-controllability/ blog])&lt;br /&gt;
* 2026-03: [https://truthful.ai/consciousness_cluster.pdf The Consciousness Cluster: Preferences of Models that Claim to be Conscious]&lt;br /&gt;
&lt;br /&gt;
==Demonstrations of Negative Use Capabilities==&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.00586 Evaluating Large Language Models&amp;#039; Capability to Launch Fully Automated Spear Phishing Campaigns: Validated on Human Subjects]&lt;br /&gt;
* 2025-04: [https://www.nathanlabenz.com/ Nathan Labenz] ([https://www.cognitiverevolution.ai/ The Cognitive Revolution]): [https://docs.google.com/presentation/d/1mvkpg1mtAvGzTiiwYPc6bKOGsQXDIwMb-ytQECb3i7I/edit#slide=id.g252d9e67d86_0_16 AI Bad Behavior]&lt;br /&gt;
&lt;br /&gt;
==Threat Vectors==&lt;br /&gt;
* 2024-01: [https://arxiv.org/abs/2401.05566 Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.07192 Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples]&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
* [[AI predictions]]&lt;/div&gt;</summary>
		<author><name>KevinYager</name></author>
		
	</entry>
	<entry>
		<id>http://gisaxs.com/index.php?title=AI_understanding&amp;diff=8740</id>
		<title>AI understanding</title>
		<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php?title=AI_understanding&amp;diff=8740"/>
		<updated>2026-03-18T17:37:50Z</updated>

		<summary type="html">&lt;p&gt;KevinYager: /* Psychology */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Interpretability=&lt;br /&gt;
* 2017-01: [https://arxiv.org/abs/1704.01444 Learning to Generate Reviews and Discovering Sentiment]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.11639 Neural Interpretable Reasoning]&lt;br /&gt;
&lt;br /&gt;
==Concepts==&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.20938 Towards Understanding the Nature of Attention with Low-Rank Sparse Decomposition] ([https://github.com/OpenMOSS/Lorsa code])&lt;br /&gt;
* 2025-08: [https://transformer-circuits.pub/2025/attention-qk/index.html Tracing Attention Computation Through Feature Interactions]&lt;br /&gt;
&lt;br /&gt;
==Mechanistic Interpretability==&lt;br /&gt;
* 2020-03: OpenAI: [https://distill.pub/2020/circuits/zoom-in/ Zoom In: An Introduction to Circuits]&lt;br /&gt;
* 2021-12: Anthropic: [https://transformer-circuits.pub/2021/framework/index.html A Mathematical Framework for Transformer Circuits]&lt;br /&gt;
* 2022-09: [https://arxiv.org/abs/2211.00593 Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 Small]&lt;br /&gt;
* 2023-01: [https://arxiv.org/abs/2301.05062 Tracr: Compiled Transformers as a Laboratory for Interpretability] ([https://github.com/google-deepmind/tracr code])&lt;br /&gt;
* 2024-07: Anthropic: [https://transformer-circuits.pub/2024/july-update/index.html Circuits Update]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.14926 Interpretability in Parameter Space: Minimizing Mechanistic Description Length with Attribution-based Parameter Decomposition] ([https://www.alignmentforum.org/posts/EPefYWjuHNcNH4C7E/attribution-based-parameter-decomposition blog post])&lt;br /&gt;
* 2025-01: Review: [https://arxiv.org/abs/2501.16496 Open Problems in Mechanistic Interpretability]&lt;br /&gt;
* 2025-03: Anthropic: [https://www.anthropic.com/research/tracing-thoughts-language-model Tracing the thoughts of a large language model]&lt;br /&gt;
** [https://transformer-circuits.pub/2025/attribution-graphs/methods.html Circuit Tracing: Revealing Computational Graphs in Language Models]&lt;br /&gt;
** [https://transformer-circuits.pub/2025/attribution-graphs/biology.html On the Biology of a Large Language Model]&lt;br /&gt;
* 2025-11: OpenAI: [https://cdn.openai.com/pdf/41df8f28-d4ef-43e9-aed2-823f9393e470/circuit-sparsity-paper.pdf Weight-sparse transformers have interpretable circuits] ([https://openai.com/index/understanding-neural-networks-through-sparse-circuits/ blog])&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.13548 Patterning: The Dual of Interpretability]&lt;br /&gt;
&lt;br /&gt;
==Semanticity==&lt;br /&gt;
* 2023-09: [https://arxiv.org/abs/2309.08600 Sparse Autoencoders Find Highly Interpretable Features in Language Models]&lt;br /&gt;
* Anthropic monosemanticity interpretation of LLM features:&lt;br /&gt;
** 2023-10: [https://transformer-circuits.pub/2023/monosemantic-features/index.html Towards Monosemanticity: Decomposing Language Models With Dictionary Learning]&lt;br /&gt;
** 2024-05: [https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet]&lt;br /&gt;
* 2024-06: OpenAI: [https://arxiv.org/abs/2406.04093 Scaling and evaluating sparse autoencoders]&lt;br /&gt;
* 2024-08: [https://www.alignmentforum.org/posts/TMAmHh4DdMr4nCSr5/showing-sae-latents-are-not-atomic-using-meta-saes Showing SAE Latents Are Not Atomic Using Meta-SAEs] ([https://metasae.streamlit.app/?page=Feature+Explorer&amp;amp;feature=11329 demo])&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.08201 Efficient Dictionary Learning with Switch Sparse Autoencoders] ([https://github.com/amudide/switch_sae code]) More efficient SAE generation&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.14670 Decomposing The Dark Matter of Sparse Autoencoders] ([https://github.com/JoshEngels/SAE-Dark-Matter code]) Shows that SAE errors are predictable&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.13928 Automatically Interpreting Millions of Features in Large Language Models]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.21331 Beyond Interpretability: The Gains of Feature Monosemanticity on Model Robustness]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.04139 Monet: Mixture of Monosemantic Experts for Transformers]&lt;br /&gt;
* 2024-12: [https://www.lesswrong.com/posts/zbebxYCqsryPALh8C/matryoshka-sparse-autoencoders Matryoshka Sparse Autoencoders]&lt;br /&gt;
* 2024-12: [https://www.alignmentforum.org/posts/rKM9b6B2LqwSB5ToN/learning-multi-level-features-with-matryoshka-saes Learning Multi-Level Features with Matryoshka SAEs]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.19406 Low-Rank Adapting Models for Sparse Autoencoders]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.03714 Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.06755 Sparse Autoencoders for Scientifically Rigorous Interpretation of Vision Models]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.00177 Steering Large Language Model Activations in Sparse Spaces]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.01776 Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.01824 From superposition to sparse codes: interpretable representations in neural networks]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.18878 I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.20063 SAEs Are Good for Steering -- If You Select the Right Features]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.15679 Dense SAE Latents Are Features, Not Bugs]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.20790 Stochastic Parameter Decomposition] ([https://github.com/goodfire-ai/spd code], [https://www.goodfire.ai/papers/stochastic-param-decomp blog])&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.10003 Semantic Structure in Large Language Model Embeddings]&lt;br /&gt;
&lt;br /&gt;
===Counter-Results===&lt;br /&gt;
* 2020-10: [https://arxiv.org/abs/2010.12016 Towards falsifiable interpretability research]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.16615 Sparse Autoencoders Trained on the Same Data Learn Different Features]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.17148 AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.17727 Sparse Autoencoders Can Interpret Randomly Initialized Transformers]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.04878 Sparse Autoencoders Do Not Find Canonical Units of Analysis]&lt;br /&gt;
* 2025-03: [https://www.alignmentforum.org/posts/4uXCAJNuPKtKBsi28/ Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research]&lt;br /&gt;
&lt;br /&gt;
==Meta-cognition==&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.13763 Language Models Are Capable of Metacognitive Monitoring and Control of Their Internal Activations]&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.15674 Activation Oracles: Training and Evaluating LLMs as General-Purpose Activation Explainers]&lt;br /&gt;
&lt;br /&gt;
==Coding Models==&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Sparse Autoencoders&amp;#039;&amp;#039;&amp;#039;: See Semanticity.&lt;br /&gt;
* [https://github.com/saprmarks/dictionary_learning dictionary_learning]&lt;br /&gt;
* [https://transformer-circuits.pub/2024/jan-update/index.html#predict-future Predicting Future Activations]&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.11944 Transcoders Find Interpretable LLM Feature Circuits]&lt;br /&gt;
* 2024-10: [https://transformer-circuits.pub/2024/crosscoders/index.html Sparse Crosscoders for Cross-Layer Features and Model Diffing]&lt;br /&gt;
&lt;br /&gt;
==Reward Functions==&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.12491 Insights from the Inverse: Reconstructing LLM Training Goals Through Inverse RL]&lt;br /&gt;
&lt;br /&gt;
==Symbolic and Notation==&lt;br /&gt;
* [https://transformer-circuits.pub/2021/framework/index.html A Mathematical Framework for Transformer Circuits]&lt;br /&gt;
* [https://www.arxiv.org/abs/2407.09468 Beyond Euclid: An Illustrated Guide to Modern Machine Learning with Geometric, Topological, and Algebraic Structures]&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.02423 On the Anatomy of Attention]: Introduces category-theoretic diagrammatic formalism for DL architectures&lt;br /&gt;
* 2024-11: [https://x.com/vtabbott_/status/1860268276569506250 diagrams to represent algorithms]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.03317 FlashAttention on a Napkin: A Diagrammatic Approach to Deep Learning IO-Awareness]&lt;br /&gt;
&lt;br /&gt;
==Mathematical==&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.13762 Unveiling the Hidden Structure of Self-Attention via Kernel Principal Component Analysis]&lt;br /&gt;
&lt;br /&gt;
==Geometric==&lt;br /&gt;
* 2023-11: [https://arxiv.org/abs/2311.03658 The Linear Representation Hypothesis and the Geometry of Large Language Models]&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.01506 The Geometry of Categorical and Hierarchical Concepts in Large Language Models]&lt;br /&gt;
** Natural hierarchies of concepts (which occur throughout natural language, and especially in scientific ontologies) are represented in the model&amp;#039;s internal vector space as polytopes that can be decomposed into simplices of mutually exclusive categories.&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.02678 Reasoning in Large Language Models: A Geometric Perspective]&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.17592 Deep Manifold Part 1: Anatomy of Neural Network Manifold]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.19750 The Geometry of Concepts: Sparse Autoencoder Feature Structure]&lt;br /&gt;
** Tegmark et al. report multi-scale structure: 1) “atomic” small-scale, 2) “brain” intermediate-scale, and 3) “galaxy” large-scale&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.08009 The Geometry of Prompting: Unveiling Distinct Mechanisms of Task Adaptation in Language Models]&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.10003 Semantic Structure in Large Language Model Embeddings]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.09782 The Geometry of Reasoning: Flowing Logics in Representation Space]&lt;br /&gt;
* 2025-10: [https://transformer-circuits.pub/2025/linebreaks/index.html When Models Manipulate Manifolds: The Geometry of a Counting Task]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.26745 Deep sequence models tend to memorize geometrically; it is unclear why]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.15029 Symmetry in language statistics shapes the geometry of model representations]&lt;br /&gt;
* 2026-03: [https://arxiv.org/abs/2603.01968 Intrinsic Task Symmetry Drives Generalization in Algorithmic Tasks]&lt;br /&gt;
&lt;br /&gt;
==Topography==&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.16396 TopoNets: High Performing Vision and Language Models with Brain-Like Topography]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2601.06002 The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning]&lt;br /&gt;
&lt;br /&gt;
==Challenges==&lt;br /&gt;
* 2023-07: [https://arxiv.org/abs/2307.13702 Measuring Faithfulness in Chain-of-Thought Reasoning] ([https://x.com/davidad/status/1839641113432305790 roughly] shows that sufficiently large models do not generate CoT that faithfully captures their internal reasoning)&lt;br /&gt;
&lt;br /&gt;
[[Image:GYe31yXXQAABwaZ.jpeg|300px]]&lt;br /&gt;
&lt;br /&gt;
=Heuristic Understanding=&lt;br /&gt;
* 2022-09: Janus: [https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators Simulators]&lt;br /&gt;
&lt;br /&gt;
==Emergent Internal Model Building==&lt;br /&gt;
* 2023-07: [https://arxiv.org/abs/2307.15936 A Theory for Emergence of Complex Skills in Language Models]&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.19370v1 Emergence of Hidden Capabilities: Exploring Learning Dynamics in Concept Space]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.01622 General agents contain world models]&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.20328 Video models are zero-shot learners and reasoners]&lt;br /&gt;
&lt;br /&gt;
===Semantic Directions===&lt;br /&gt;
Directions, e.g.: f(king)-f(man)+f(woman)=f(queen) or f(sushi)-f(Japan)+f(Italy)=f(pizza)&lt;br /&gt;
* [https://arxiv.org/abs/1301.3781 Efficient Estimation of Word Representations in Vector Space]&lt;br /&gt;
* [https://aclanthology.org/N13-1090/ Linguistic Regularities in Continuous Space Word Representations]&lt;br /&gt;
* [https://aclanthology.org/C16-1332 Word Embeddings, Analogies, and Machine Learning: Beyond king - man + woman = queen]&lt;br /&gt;
* [https://aclanthology.org/D14-1162/ GloVe: Global Vectors for Word Representation]&lt;br /&gt;
* [https://doi.org/10.1109/BigData.2015.7364114 Using Word2Vec to process big text data]&lt;br /&gt;
* [https://arxiv.org/abs/2310.06824 The geometry of truth: Emergent linear structure in large language model representations of true/false datasets] (true/false)&lt;br /&gt;
* [https://arxiv.org/abs/2403.10381 Monotonic Representation of Numeric Properties in Language Models] (numeric directions)&lt;br /&gt;
Task vectors:&lt;br /&gt;
* [https://arxiv.org/abs/2310.15213 Function Vectors in Large Language Models]&lt;br /&gt;
* [https://arxiv.org/abs/2310.15916 In-context learning creates task vectors]&lt;br /&gt;
* [https://www.alignmentforum.org/posts/5FGXmJ3wqgGRcbyH7/extracting-sae-task-features-for-in-context-learning Extracting SAE task features for in-context learning]&lt;br /&gt;
* [https://arxiv.org/abs/2412.12276 Emergence of Abstractions: Concept Encoding and Decoding Mechanism for In-Context Learning in Transformers]&lt;br /&gt;
Reasoning:&lt;br /&gt;
* [https://openreview.net/forum?id=OwhVWNOBcz Understanding Reasoning in Thinking Language Models via Steering Vectors]&lt;br /&gt;
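The directional analogies above (f(king)-f(man)+f(woman)&amp;asymp;f(queen)) can be sketched in a few lines. This is a toy illustration with hypothetical hand-made 4-d embeddings, not real word2vec or GloVe weights; in practice one queries learned vectors with hundreds of dimensions.&lt;br /&gt;

```python
import math

# Toy embedding table (hypothetical hand-made 4-d vectors, not real word2vec
# weights); dimensions loosely encode [royalty, maleness, femaleness, food].
emb = {
    "king":  [0.9, 0.8, 0.1, 0.0],
    "queen": [0.9, 0.1, 0.8, 0.0],
    "man":   [0.1, 0.9, 0.1, 0.0],
    "woman": [0.1, 0.1, 0.9, 0.0],
    "pizza": [0.0, 0.1, 0.1, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def analogy(a, b, c):
    """Nearest word to f(a) - f(b) + f(c), excluding the query words."""
    target = [x - y + z for x, y, z in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))

print(analogy("king", "man", "woman"))  # queen
```

Excluding the query words from the candidate set follows the standard convention for these analogy benchmarks.&lt;br /&gt;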
&lt;br /&gt;
===Feature Geometry Reproduces Problem-space===&lt;br /&gt;
* [https://arxiv.org/abs/2210.13382 Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task] (Othello)&lt;br /&gt;
* [https://arxiv.org/abs/2309.00941 Emergent linear representations in world models of self-supervised sequence models] (Othello)&lt;br /&gt;
* [https://arxiv.org/abs/2211.15661 What learning algorithm is in-context learning? Investigations with linear models]&lt;br /&gt;
* [https://doi.org/10.1038/s41562-023-01659-w Emergent analogical reasoning in large language models]&lt;br /&gt;
* [https://arxiv.org/abs/2310.02207 Language Models Represent Space and Time] (Maps of world, US)&lt;br /&gt;
* [https://arxiv.org/abs/2405.14860 Not All Language Model Features Are Linear] (Days of week form ring, etc.)&lt;br /&gt;
* [https://arxiv.org/abs/2406.03689 Evaluating the World Model Implicit in a Generative Model] (Map of Manhattan)&lt;br /&gt;
* [https://iopscience.iop.org/article/10.1088/1748-9326/ad2891 Reliable precipitation nowcasting using probabilistic diffusion models]: Generation of precipitation-map imagery is predictive of actual future weather, implying the model is learning scientifically relevant structure.&lt;br /&gt;
* [https://arxiv.org/abs/2405.07987 The Platonic Representation Hypothesis]: Different models (including across modalities) are converging to a consistent world model.&lt;br /&gt;
* [https://arxiv.org/abs/2501.00070 ICLR: In-Context Learning of Representations]&lt;br /&gt;
* [https://arxiv.org/abs/2502.00873 Language Models Use Trigonometry to Do Addition]: Numbers arranged in helix to enable addition&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.15029 Symmetry in language statistics shapes the geometry of model representations]&lt;br /&gt;
&lt;br /&gt;
===Capturing Physics===&lt;br /&gt;
* 2020-09: [https://arxiv.org/abs/2009.08292 Learning to Identify Physical Parameters from Video Using Differentiable Physics]&lt;br /&gt;
* 2022-07: [https://arxiv.org/abs/2207.00419 Self-Supervised Learning for Videos: A Survey]&lt;br /&gt;
* 2025-02: Fair at Meta: [https://arxiv.org/abs/2502.11831 Intuitive physics understanding emerges from self-supervised pretraining on natural videos]&lt;br /&gt;
&lt;br /&gt;
===Theory of Mind===&lt;br /&gt;
* [https://arxiv.org/abs/2302.02083 Evaluating Large Language Models in Theory of Mind Tasks]&lt;br /&gt;
* [https://arxiv.org/abs/2410.13787 Looking Inward: Language Models Can Learn About Themselves by Introspection]&lt;br /&gt;
* [https://arxiv.org/abs/2501.11120 Tell me about yourself: LLMs are aware of their learned behaviors]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2509.22887 Infusing Theory of Mind into Socially Intelligent LLM Agents]&lt;br /&gt;
&lt;br /&gt;
===Skeptical===&lt;br /&gt;
* 2025-01: [https://www.arxiv.org/abs/2501.09038 Do generative video models learn physical principles from watching videos?] ([https://physics-iq.github.io/ project], [https://github.com/google-deepmind/physics-IQ-benchmark code])&lt;br /&gt;
* 2025-06: [https://machinelearning.apple.com/research/illusion-of-thinking The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.21521 Potemkin Understanding in Large Language Models]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.21876 Do Vision-Language Models Have Internal World Models? Towards an Atomic Evaluation]&lt;br /&gt;
&lt;br /&gt;
==Information Processing==&lt;br /&gt;
* 2019-03: [https://arxiv.org/abs/1903.05789 Diagnosing and Enhancing VAE Models]&lt;br /&gt;
* 2021-03: [https://arxiv.org/abs/2103.05247 Pretrained Transformers as Universal Computation Engines]&lt;br /&gt;
* 2022-10: [https://arxiv.org/abs/2210.08344 How Mask Matters: Towards Theoretical Understandings of Masked Autoencoders]&lt;br /&gt;
* 2023-04: [https://arxiv.org/abs/2304.03843 Why think step by step? Reasoning emerges from the locality of experience]&lt;br /&gt;
* 2023-10: [https://arxiv.org/abs/2310.04444 What&amp;#039;s the Magic Word? A Control Theory of LLM Prompting]&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.12875 Chain of Thought Empowers Transformers to Solve Inherently Serial Problems]: Proves that transformers can solve any problem, if they can generate sufficient intermediate tokens&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.20311 Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process]&lt;br /&gt;
** Models learn reasoning skills (they are not merely memorizing solution templates). They can mentally generate simple short plans (like humans).&lt;br /&gt;
** When presented with facts, models develop an internal understanding of which parameters (recursively) depend on each other. This occurs even before an explicit question is asked (i.e. before the task is defined). This appears to differ from human reasoning.&lt;br /&gt;
** Model depth matters for reasoning. This cannot be mitigated by chain-of-thought prompting (which allows models to develop and then execute plans), since even a single CoT step may require deep, multi-step reasoning/planning.&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.01992 Ask, and it shall be given: Turing completeness of prompting]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.08775 Layers at Similar Depths Generate Similar Activations Across LLM Architectures]&lt;br /&gt;
&lt;br /&gt;
===Generalization===&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.14546 Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data]&lt;br /&gt;
&lt;br /&gt;
===Grokking===&lt;br /&gt;
* 2022-01: [https://arxiv.org/abs/2201.02177 Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets]&lt;br /&gt;
* 2022-05: [https://arxiv.org/abs/2205.10343 Towards Understanding Grokking: An Effective Theory of Representation Learning]&lt;br /&gt;
* 2024-01: [https://arxiv.org/abs/2401.10463 Critical Data Size of Language Models from a Grokking Perspective]&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.15175 Unified View of Grokking, Double Descent and Emergent Abilities: A Perspective from Circuits Competition]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.18624 How to explain grokking]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.09810 The Complexity Dynamics of Grokking]&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.21519 Provable Scaling Laws of Feature Emergence from Learning Dynamics of Grokking]&lt;br /&gt;
* 2026-03: [https://arxiv.org/abs/2603.01968 Intrinsic Task Symmetry Drives Generalization in Algorithmic Tasks]&lt;br /&gt;
&lt;br /&gt;
===Tests of Resilience to Dropouts/etc.===&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.15390 Explorations of Self-Repair in Language Models]&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.15786 What Matters in Transformers? Not All Attention is Needed]&lt;br /&gt;
** Removing entire transformer blocks leads to significant performance degradation&lt;br /&gt;
** Removing MLP layers results in significant performance degradation&lt;br /&gt;
** Removing attention layers causes almost no performance degradation&lt;br /&gt;
** E.g. deleting half of the attention layers (a 48% speed-up) leads to only a 2.4% decrease on benchmarks&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.19384 The Remarkable Robustness of LLMs: Stages of Inference?]&lt;br /&gt;
** They intentionally break the network (swapping layers), yet it continues to work remarkably well. This suggests LLMs are quite robust, and lets the authors identify distinct stages of processing.&lt;br /&gt;
** They also use these interventions to infer what different layers are doing. They break apart the LLM transformer layers into four stages:&lt;br /&gt;
*** &amp;#039;&amp;#039;&amp;#039;Detokenization:&amp;#039;&amp;#039;&amp;#039; Raw tokens are converted into meaningful entities that take into account local context (especially using nearby tokens).&lt;br /&gt;
*** &amp;#039;&amp;#039;&amp;#039;Feature engineering:&amp;#039;&amp;#039;&amp;#039; Features are progressively refined. Factual knowledge is leveraged.&lt;br /&gt;
*** &amp;#039;&amp;#039;&amp;#039;Prediction ensembling:&amp;#039;&amp;#039;&amp;#039; Predictions (for the ultimately-selected next-token) emerge. A sort of consensus voting is used, with &amp;quot;prediction neurons&amp;quot; and &amp;quot;suppression neurons&amp;quot; playing a major role in upvoting/downvoting.&lt;br /&gt;
*** &amp;#039;&amp;#039;&amp;#039;Residual sharpening:&amp;#039;&amp;#039;&amp;#039; The semantic representations are collapsed into specific next-token predictions. There is a strong emphasis on suppression neurons eliminating options. The confidence is calibrated.&lt;br /&gt;
** This structure can be thought of as two halves (being roughly dual to each other): the first half broadens (goes from distinct tokens to a rich/elaborate concept-space) and the second half collapses (goes from rich concepts to concrete token predictions).&lt;br /&gt;
&lt;br /&gt;
==Semantic Vectors==&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.11717 Refusal in Language Models Is Mediated by a Single Direction]&lt;br /&gt;
* 2025-02: [https://martins1612.github.io/emergent_misalignment_betley.pdf Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs] ([https://x.com/OwainEvans_UK/status/1894436637054214509 demonstrates] [https://x.com/ESYudkowsky/status/1894453376215388644 entangling] of concepts into a single preference vector)&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.03666 Analogical Reasoning Inside Large Language Models: Concept Vectors and the Limits of Abstraction]&lt;br /&gt;
&lt;br /&gt;
==Other==&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.00247 Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting &amp;amp; Beyond]&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.04282 Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding] ([https://github.com/SalesforceAIResearch/LaTRO code])&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.12580 Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models]: LLMs learn reasoning by extracting procedures from training data, not by memorizing specific answers&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.15862 LLMs Do Not Think Step-by-step In Implicit Reasoning]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.09810 The Complexity Dynamics of Grokking]&lt;br /&gt;
&lt;br /&gt;
===Scaling Laws===&lt;br /&gt;
* 1993: [https://proceedings.neurips.cc/paper/1993/file/1aa48fc4880bb0c9b8a3bf979d3b917e-Paper.pdf Learning Curves: Asymptotic Values and Rate of Convergence]&lt;br /&gt;
* 2017-12: [https://arxiv.org/abs/1712.00409 Deep Learning Scaling is Predictable, Empirically] (Baidu)&lt;br /&gt;
* 2019-03: [http://www.incompleteideas.net/IncIdeas/BitterLesson.html The Bitter Lesson] (Rich Sutton)&lt;br /&gt;
* 2020-01: [https://arxiv.org/abs/2001.08361 Scaling Laws for Neural Language Models] (OpenAI)&lt;br /&gt;
* 2020-05: [https://gwern.net/scaling-hypothesis The Scaling Hypothesis] (Gwern)&lt;br /&gt;
* 2020-10: [https://arxiv.org/abs/2010.14701 Scaling Laws for Autoregressive Generative Modeling] (OpenAI)&lt;br /&gt;
* 2021-02: [https://arxiv.org/abs/2102.06701 Explaining Neural Scaling Laws] (Google DeepMind)&lt;br /&gt;
* 2021-08: [https://arxiv.org/abs/2108.07686 Scaling Laws for Deep Learning]&lt;br /&gt;
* 2022-03: [https://arxiv.org/abs/2203.15556 Training Compute-Optimal Large Language Models] (Chinchilla, Google DeepMind)&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.04715 Predictable Scale: Part I -- Optimal Hyperparameter Scaling Law in Large Language Model Pretraining]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.10061 Compute Optimal Scaling of Skills: Knowledge vs Reasoning]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.07951 Scaling Laws for Native Multimodal Models]&lt;br /&gt;
* 2025-05: [https://brendel-group.github.io/llm-line/ LLMs on the Line: Data Determines Loss-To-Loss Scaling Laws]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.13786 The Art of Scaling Reinforcement Learning Compute for LLMs]&lt;br /&gt;
&lt;br /&gt;
=Information Processing/Storage=&lt;br /&gt;
* 2020-02: [https://arxiv.org/abs/2002.10689 A Theory of Usable Information Under Computational Constraints]&lt;br /&gt;
* 2021-04: [https://arxiv.org/abs/2104.00008 Why is AI hard and Physics simple?]&lt;br /&gt;
* 2021-06: [https://arxiv.org/abs/2106.06981 Thinking Like Transformers]&lt;br /&gt;
* 2023-05: [https://arxiv.org/abs/2305.00948 Large Linguistic Models: Investigating LLMs&amp;#039; metalinguistic abilities]&lt;br /&gt;
* &amp;quot;A transformer&amp;#039;s depth affects its reasoning capabilities, whilst model size affects its knowledge capacity&amp;quot; ([https://x.com/danielhanchen/status/1835684061475655967 c.f.])&lt;br /&gt;
** 2024-02: [https://arxiv.org/abs/2402.14905 MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases]&lt;br /&gt;
** 2024-04: [https://arxiv.org/abs/2404.08819 The Illusion of State in State-Space Models] (figure 3)&lt;br /&gt;
** 2024-08: [https://arxiv.org/abs/2408.00118 Gemma 2: Improving Open Language Models at a Practical Size] (table 9)&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.10482 Schrodinger&amp;#039;s Memory: Large Language Models]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2407.01687 Deciphering the Factors Influencing the Efficacy of Chain-of-Thought: Probability, Memorization, and Noisy Reasoning]: CoT involves both memorization and (probabilistic) reasoning&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.16679 Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?]&lt;br /&gt;
* 2025-03: [https://www.arxiv.org/abs/2503.03961 A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers]&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.22471 The Bayesian Geometry of Transformer Attention]&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.03220 From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence]&lt;br /&gt;
&lt;br /&gt;
==Statistics/Math==&lt;br /&gt;
* 2023-05: [https://arxiv.org/abs/2305.05465 The emergence of clusters in self-attention dynamics]&lt;br /&gt;
* 2023-12: [https://arxiv.org/abs/2312.10794 A mathematical perspective on Transformers]&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.12034 Understanding Transformers via N-gram Statistics]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.06833 Dynamic metastability in the self-attention model]&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.04551 Measure-to-measure interpolation using Transformers]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.14697 Quantitative Clustering in Mean-Field Transformer Models]&lt;br /&gt;
&lt;br /&gt;
==Tokenization==&lt;br /&gt;
===For numbers/math===&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.14903 Tokenization counts: the impact of tokenization on arithmetic in frontier LLMs]: left-to-right (L2R) vs. right-to-left (R2L) digit tokenization yields different performance on math&lt;br /&gt;
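The L2R/R2L distinction can be illustrated with a minimal chunking sketch (not an actual tokenizer; it assumes, hypothetically, that digits merge into groups of up to three, as in several modern tokenizers):&lt;br /&gt;

```python
# Split a digit string into chunks of up to `size` digits, either
# left-to-right (L2R) or right-to-left (R2L). R2L chunking aligns groups
# with place value (matching thousands separators: 1,234,567), which the
# paper links to differing arithmetic performance.

def chunk_l2r(digits, size=3):
    return [digits[i:i + size] for i in range(0, len(digits), size)]

def chunk_r2l(digits, size=3):
    rem = len(digits) % size
    head = [digits[:rem]] if rem else []
    return head + [digits[i:i + size] for i in range(rem, len(digits), size)]

print(chunk_l2r("1234567"))  # ['123', '456', '7']
print(chunk_r2l("1234567"))  # ['1', '234', '567']
```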
&lt;br /&gt;
==Data Storage==&lt;br /&gt;
* 1988-09: [https://www.sciencedirect.com/science/article/pii/0885064X88900209 On the capabilities of multilayer perceptrons]&lt;br /&gt;
* 2006-12: [https://ieeexplore.ieee.org/document/4038449 Geometrical and Statistical Properties of Systems of Linear Inequalities with Applications in Pattern Recognition] (single-layer perceptron stores &amp;gt;2 bits/parameter; MLP ~ 2*N&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt; bits w/ N&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt; params)&lt;br /&gt;
* 2016-11: [https://arxiv.org/abs/1611.09913 Capacity and Trainability in Recurrent Neural Networks] (5 bits/param)&lt;br /&gt;
* 2018-02: [https://arxiv.org/abs/1802.08232 The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks]&lt;br /&gt;
* 2019-05: [https://ieeexplore.ieee.org/document/8682462 Memorization Capacity of Deep Neural Networks under Parameter Quantization]&lt;br /&gt;
* 2020-02: [https://arxiv.org/abs/2002.08910 How Much Knowledge Can You Pack Into the Parameters of a Language Model?]&lt;br /&gt;
* 2020-08: [https://arxiv.org/abs/2008.09036 Language Models as Knowledge Bases: On Entity Representations, Storage Capacity, and Paraphrased Queries] (capacity scales linearly with parameters; more training samples leads to less memorization)&lt;br /&gt;
* 2020-12: [https://arxiv.org/abs/2012.06421 When is Memorization of Irrelevant Training Data Necessary for High-Accuracy Learning?]&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.05405 Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws] (2 bits/param)&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.15720 Scaling Laws for Fact Memorization of Large Language Models] (1T params needed to memorize Wikipedia)&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.09810 The Complexity Dynamics of Grokking]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.24832 How much do language models memorize?] (3.6 bits/parameter)&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.01855 Trade-offs in Data Memorization via Strong Data Processing Inequalities]&lt;br /&gt;
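As a rough worked example of what these per-parameter figures imply (hypothetical back-of-envelope arithmetic only, using the bits/param estimates cited above):&lt;br /&gt;

```python
# Back-of-envelope knowledge-capacity estimate. Assumes the ~2 bits/parameter
# figure from "Physics of Language Models: Part 3.3"; the 2025-05 paper above
# instead estimates ~3.6 bits/parameter.
def capacity_gb(n_params, bits_per_param=2.0):
    """Stored-knowledge capacity in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

print(capacity_gb(7e9))                      # a 7B model: ~1.75 GB
print(capacity_gb(7e9, bits_per_param=3.6))  # ~3.15 GB under the higher figure
```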
&lt;br /&gt;
===Reverse-Engineering Training Data===&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.10364 Can We Infer Confidential Properties of Training Data from LLMs?]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.15553 Approximating Language Model Training Data from Weights]&lt;br /&gt;
&lt;br /&gt;
===Compression===&lt;br /&gt;
* 2022-12: [https://arxiv.org/abs/2212.09410 Less is More: Parameter-Free Text Classification with Gzip]&lt;br /&gt;
* 2023-06: [https://arxiv.org/abs/2306.04050 LLMZip: Lossless Text Compression using Large Language Models]&lt;br /&gt;
* 2023-07: [https://aclanthology.org/2023.findings-acl.426/ “Low-Resource” Text Classification: A Parameter-Free Classification Method with Compressors]&lt;br /&gt;
* 2023-09: [https://arxiv.org/abs/2309.10668 Language Modeling Is Compression]&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.07550 An Image is Worth 32 Tokens for Reconstruction and Generation]&lt;br /&gt;
&lt;br /&gt;
==Learning/Training==&lt;br /&gt;
* 2018-03: [https://arxiv.org/abs/1803.03635 The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks]: Sparse neural networks can be optimal, but the right sparse architecture is difficult to identify and train directly. Training an over-parameterized dense network makes it easier for the model to find an internal sparse circuit well-suited to a particular problem.&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.11521 On the Ability of Deep Networks to Learn Symmetries from Data: A Neural Kernel Theory]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.12391 Physics of Skill Learning]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.24864 ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models]&lt;br /&gt;
&lt;br /&gt;
===Cross-modal knowledge transfer===&lt;br /&gt;
* 2022-03: [https://arxiv.org/abs/2203.07519 Leveraging Visual Knowledge in Language Tasks: An Empirical Study on Intermediate Pre-training for Cross-modal Knowledge Transfer]&lt;br /&gt;
* 2023-05: [https://arxiv.org/abs/2305.07358 Towards Versatile and Efficient Visual Knowledge Integration into Pre-trained Language Models with Cross-Modal Adapters]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.06755 Sparse Autoencoders for Scientifically Rigorous Interpretation of Vision Models]: CLIP learns a richer set of aggregated representations (e.g. for a culture or country) than a vision-only model.&lt;br /&gt;
&lt;br /&gt;
==Hidden State==&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.06258 Emergent Response Planning in LLM]: They show that the latent representation contains information beyond what is needed for the next token (i.e. the model learns to &amp;quot;plan ahead&amp;quot; and encode information relevant to future tokens)&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.02854 (How) Do Language Models Track State?]&lt;br /&gt;
===Convergent Representation===&lt;br /&gt;
* 2015-11: [https://arxiv.org/abs/1511.07543 Convergent Learning: Do different neural networks learn the same representations?]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.12540 Harnessing the Universal Geometry of Embeddings]: Evidence for [https://x.com/jxmnop/status/1925224620166128039 The Strong Platonic Representation Hypothesis]; models converge to a single consensus reality&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.03750 Universally Converging Representations of Matter Across Scientific Foundation Models]&lt;br /&gt;
&lt;br /&gt;
==Function Approximation==&lt;br /&gt;
* 2022-08: [https://arxiv.org/abs/2208.01066 What Can Transformers Learn In-Context? A Case Study of Simple Function Classes]: can learn linear functions (equivalent to least-squares estimator)&lt;br /&gt;
* 2022-11: [https://arxiv.org/abs/2211.09066 Teaching Algorithmic Reasoning via In-context Learning]: Simple arithmetic &lt;br /&gt;
* 2022-11: [https://arxiv.org/abs/2211.15661 What learning algorithm is in-context learning? Investigations with linear models] ([https://github.com/ekinakyurek/google-research/tree/master/incontext code]): can learn linear regression&lt;br /&gt;
* 2022-12: [https://arxiv.org/abs/2212.07677 Transformers learn in-context by gradient descent]&lt;br /&gt;
* 2023-06: [https://arxiv.org/abs/2306.00297 Transformers learn to implement preconditioned gradient descent for in-context learning]&lt;br /&gt;
* 2023-07: [https://arxiv.org/abs/2307.03576 One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention]&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.02893 ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.20545 SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.21212 Transformers Learn to Implement Multi-step Gradient Descent with Chain of Thought]&lt;br /&gt;
&lt;br /&gt;
=Physics Based=&lt;br /&gt;
* 2014-01: [https://arxiv.org/abs/1401.1219 Consciousness as a State of Matter]&lt;br /&gt;
* 2016-08: [https://arxiv.org/abs/1608.08225 Why does deep and cheap learning work so well?]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.23489 SGD as Free Energy Minimization: A Thermodynamic View on Neural Network Training]&lt;br /&gt;
* 2025-12: [https://www.pnas.org/doi/full/10.1073/pnas.2523012122 Heavy-tailed update distributions arise from information-driven self-organization in nonequilibrium learning]&lt;br /&gt;
&lt;br /&gt;
=Failure Modes=&lt;br /&gt;
* 2023-06: [https://arxiv.org/abs/2306.05836 Can Large Language Models Infer Causation from Correlation?]: Poor causal inference&lt;br /&gt;
* 2023-09: [https://arxiv.org/abs/2309.12288 The Reversal Curse: LLMs trained on &amp;quot;A is B&amp;quot; fail to learn &amp;quot;B is A&amp;quot;]&lt;br /&gt;
* 2023-09: [https://arxiv.org/abs/2309.13638 Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve] (biases towards &amp;quot;common&amp;quot; numbers, in-context CoT can reduce performance by incorrectly priming, etc.)&lt;br /&gt;
* 2023-11: [https://arxiv.org/abs/2311.16093 Visual cognition in multimodal large language models] (models lack human-like visual understanding)&lt;br /&gt;
&lt;br /&gt;
==Adversarial==&lt;br /&gt;
* 2026-03: [https://arxiv.org/abs/2603.03507 Solving adversarial examples requires solving exponential misalignment]&lt;br /&gt;
&lt;br /&gt;
==Fracture Representation==&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.11581 Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis] ([https://github.com/akarshkumar0101/fer code])&lt;br /&gt;
&lt;br /&gt;
==Jagged Frontier==&lt;br /&gt;
* 2023-09: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4573321 Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality]&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.03211 How Does Quantization Affect Multilingual LLMs?]: Quantization degrades different languages by differing amounts&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.10061v1 Compute Optimal Scaling of Skills: Knowledge vs Reasoning]: Scaling laws are skill-dependent&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.18212 A Definition of AGI]&lt;br /&gt;
&lt;br /&gt;
===See also===&lt;br /&gt;
* [[AI_understanding|AI Understanding]] &amp;gt; [[AI_understanding#Psychology|Psychology]] &amp;gt; [[AI_understanding#LLM_personalities|LLM personalities]]&lt;br /&gt;
* [[AI tricks]] &amp;gt; [[AI_tricks#Prompt_Engineering|Prompt Engineering]] &amp;gt; [[AI_tricks#Brittleness|Brittleness]]&lt;br /&gt;
&lt;br /&gt;
===Conversely (AI models converge)===&lt;br /&gt;
* 2025-12: [https://www.arxiv.org/abs/2512.03750 Universally Converging Representations of Matter Across Scientific Foundation Models]&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.05117 The Universal Weight Subspace Hypothesis]&lt;br /&gt;
* 2026-01: [https://avikrishna.substack.com/p/eliciting-frontier-model-character Eliciting Frontier Model Character Training: A study of personality convergence across language models]&lt;br /&gt;
&lt;br /&gt;
==Model Collapse==&lt;br /&gt;
* 2023-05: [https://arxiv.org/abs/2305.17493 The Curse of Recursion: Training on Generated Data Makes Models Forget]&lt;br /&gt;
* 2023-07: [https://arxiv.org/abs/2307.01850 Self-Consuming Generative Models Go MAD]&lt;br /&gt;
* 2023-10: [https://arxiv.org/abs/2310.00429 On the Stability of Iterative Retraining of Generative Models on their own Data]&lt;br /&gt;
* 2023-11: [https://arxiv.org/abs/2311.12202 Nepotistically Trained Generative-AI Models Collapse]&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.03502 AI and the Problem of Knowledge Collapse]&lt;br /&gt;
* 2024-07: [https://www.nature.com/articles/s41586-024-07566-y AI models collapse when trained on recursively generated data]&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.05280 On the Limits of Self-Improving in LLMs and Why AGI, ASI and the Singularity Are Not Near Without Symbolic Model Synthesis]&lt;br /&gt;
&lt;br /&gt;
===Analysis===&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.04376 Scaling laws for learning with real and surrogate data]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.17646 Rate of Model Collapse in Recursive Training]&lt;br /&gt;
&lt;br /&gt;
===Mitigation===&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.07712 Model Collapse Demystified: The Case of Regression]&lt;br /&gt;
* 2024-03: [https://arxiv.org/abs/2403.04706 Common 7B Language Models Already Possess Strong Math Capabilities]&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.01413 Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data]&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.07515 Beyond Model Collapse: Scaling Up with Synthesized Data Requires Verification]&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.01490 LLM See, LLM Do: Guiding Data Generation to Target Non-Differentiable Objectives]&lt;br /&gt;
* 2024-08: [https://arxiv.org/abs/2408.14960 Multilingual Arbitrage: Optimizing Data Pools to Accelerate Multilingual Progress]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.08117 Convergence Dynamics and Stabilization Strategies of Co-Evolving Generative Models]&lt;br /&gt;
&lt;br /&gt;
=Psychology=&lt;br /&gt;
* 2023-04: [https://arxiv.org/abs/2304.11111 Inducing anxiety in large language models can induce bias]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.17117 From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning]&lt;br /&gt;
* 2025-07: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5357179 Call Me A Jerk: Persuading AI to Comply with Objectionable Requests]&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.06047 &amp;quot;They parted illusions -- they parted disclaim marinade&amp;quot;: Misalignment as structural fidelity in LLMs]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.02606 Gender Dynamics and Homophily in a Social Network of LLM Agents]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.01689 What LLMs Think When You Don&amp;#039;t Tell Them What to Think About?]&lt;br /&gt;
* 2026-03: [https://truthful.ai/consciousness_cluster.pdf The Consciousness Cluster: Preferences of Models that Claim to be Conscious]&lt;br /&gt;
&lt;br /&gt;
==Persona Simulator Theory==&lt;br /&gt;
* 2022-09: [https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators Simulators] ([https://www.lesswrong.com/users/janus-1?from=post_header janus])&lt;br /&gt;
* 2022-12: [https://aclanthology.org/2022.findings-emnlp.423/ Language Models as Agent Models]&lt;br /&gt;
* 2023-02: [https://arxiv.org/abs/2302.00805 Conditioning Predictive Models: Risks and Strategies]&lt;br /&gt;
* 2024-09: [https://www.lesswrong.com/s/qhdHbCJ3PYesL9dde Intuitive Self-Models]&lt;br /&gt;
* 2026-02: [https://alignment.anthropic.com/2026/psm/ The Persona Selection Model: Why AI Assistants might Behave like Humans] (Anthropic, [https://www.anthropic.com/research/persona-selection-model blog])&lt;br /&gt;
&lt;br /&gt;
==Allow LLM to think==&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.11536 Let your LLM generate a few tokens and you will reduce the need for retrieval]&lt;br /&gt;
&lt;br /&gt;
===In-context Learning===&lt;br /&gt;
* 2021-10: [https://arxiv.org/abs/2110.15943 MetaICL: Learning to Learn In Context]&lt;br /&gt;
* 2022-02: [https://arxiv.org/abs/2202.12837 Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?]&lt;br /&gt;
* 2022-08: [https://arxiv.org/abs/2208.01066 What Can Transformers Learn In-Context? A Case Study of Simple Function Classes]&lt;br /&gt;
* 2022-11: [https://arxiv.org/abs/2211.15661 What learning algorithm is in-context learning? Investigations with linear models]&lt;br /&gt;
* 2022-12: [https://arxiv.org/abs/2212.07677 Transformers learn in-context by gradient descent]&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.16003 Learning without training: The implicit dynamics of in-context learning]&lt;br /&gt;
&lt;br /&gt;
==Reasoning (CoT, etc.)==&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.18009 Large Language Models Think Too Fast To Explore Effectively]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.18585 Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.08156 Are DeepSeek R1 And Other Reasoning Models More Faithful?]: reasoning models more often give faithful explanations of the factors influencing their answers&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.08679 Chain-of-Thought Reasoning In The Wild Is Not Always Faithful]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.04022 Rethinking Reflection in Pre-Training]: pre-training alone already provides some amount of reflection/reasoning&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2501.18858 BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning]&lt;br /&gt;
&lt;br /&gt;
===Pathfinding===&lt;br /&gt;
* 2024-08: [https://arxiv.org/abs/2408.08152 DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.01939 Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning]&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.09284 Tree-OPO: Off-policy Monte Carlo Tree-Guided Advantage Optimization for Multistep Reasoning]&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.06160v1 Reverse-Engineered Reasoning for Open-Ended Generation]&lt;br /&gt;
&lt;br /&gt;
===Skeptical===&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.06941 The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity]&lt;br /&gt;
* 2025-08: [https://www.arxiv.org/abs/2508.01191 Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens]&lt;br /&gt;
&lt;br /&gt;
==Self-Awareness and Self-Recognition and Introspection==&lt;br /&gt;
* 2022-07: [https://arxiv.org/abs/2207.05221 Language Models (Mostly) Know What They Know]&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.13076 LLM Evaluators Recognize and Favor Their Own Generations]&lt;br /&gt;
* 2024-09: [https://situational-awareness-dataset.org/ Me, Myself and AI: The Situational Awareness Dataset for LLMs]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.13787 Looking Inward: Language Models Can Learn About Themselves by Introspection]&lt;br /&gt;
* 2024-12: [https://theaidigest.org/self-awareness AIs are becoming more self-aware. Here&amp;#039;s why that matters]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.11120 Tell me about yourself: LLMs are aware of their learned behaviors]&lt;br /&gt;
* 2025-04: [https://x.com/Josikinz/status/1907923319866716629 LLMs can guess which comic strip was generated by themselves (vs. other LLM)]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.13763 Language Models Are Capable of Metacognitive Monitoring and Control of Their Internal Activations]&lt;br /&gt;
* 2025-10: [https://transformer-circuits.pub/2025/introspection/index.html Emergent Introspective Awareness in Large Language Models] (Anthropic, [https://www.anthropic.com/research/introspection blog])&lt;br /&gt;
* 2025-12: [https://www.arxiv.org/abs/2512.24661 Do Large Language Models Know What They Are Capable Of?]&lt;br /&gt;
&lt;br /&gt;
==LLM personalities==&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.02618 Strategic Intelligence in Large Language Models: Evidence from evolutionary Game Theory]&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.04343 Psychologically Enhanced AI Agents]&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.10387 The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models]&lt;br /&gt;
&lt;br /&gt;
==Quirks &amp;amp; Biases==&lt;br /&gt;
* 2025-04: [https://www.cambridge.org/core/journals/judgment-and-decision-making/article/artificial-intelligence-and-dichotomania/0421D2310727D73FAB47069FD1620AA1 Artificial intelligence and dichotomania]&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.22818 Can Large Language Models Develop Gambling Addiction?]&lt;br /&gt;
&lt;br /&gt;
=Vision Models=&lt;br /&gt;
* 2017-11: Distill: [https://distill.pub/2017/feature-visualization/ Feature Visualization: How neural networks build up their understanding of images]&lt;br /&gt;
* 2021-01: [https://arxiv.org/abs/2101.12322 Position, Padding and Predictions: A Deeper Look at Position Information in CNNs]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.13181 Perception Encoder: The best visual embeddings are not at the output of the network] ([https://github.com/facebookresearch/perception_models code])&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
* [[AI]]&lt;br /&gt;
* [[AI tools]]&lt;br /&gt;
* [[AI agents]]&lt;br /&gt;
* [[Robots]]&lt;/div&gt;</summary>
		<author><name>KevinYager</name></author>
		
	</entry>
	<entry>
		<id>http://gisaxs.com/index.php?title=AI_safety&amp;diff=8739</id>
		<title>AI safety</title>
		<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php?title=AI_safety&amp;diff=8739"/>
		<updated>2026-03-18T17:37:10Z</updated>

		<summary type="html">&lt;p&gt;KevinYager: /* Research */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Learning Resources=&lt;br /&gt;
==Light==&lt;br /&gt;
* [https://orxl.org/ai-doom.html a casual intro to AI doom and alignment] (2022)&lt;br /&gt;
* Anthony Aguirre: [https://keepthefuturehuman.ai/ Keep The Future Human]&lt;br /&gt;
** [https://interactive.keepthefuturehuman.ai/ Interactive Explainer]&lt;br /&gt;
** [https://keepthefuturehuman.ai/essay/ Essay: Keep the Future Human]&lt;br /&gt;
** [https://www.youtube.com/watch?v=27KDl2uPiL8 We Can’t Stop AI – Here’s What To Do Instead] (4m video, 2025)&lt;br /&gt;
** [https://www.youtube.com/watch?v=zeabrXV8zNE The 4 Rules That Could Stop AI Before It’s Too Late] (15m video, 2025)&lt;br /&gt;
* Tristan Harris TED talk (15m): [https://www.ted.com/talks/tristan_harris_why_ai_is_our_ultimate_test_and_greatest_invitation Why AI is our ultimate test and greatest invitation]&lt;br /&gt;
** Text version: Center for Humane Technology: [https://centerforhumanetechnology.substack.com/p/the-narrow-path-why-ai-is-our-ultimate The Narrow Path: Why AI is Our Ultimate Test and Greatest Invitation]&lt;br /&gt;
* [https://x.com/KeiranJHarris/status/1935429439476887594 Fable about Transformative AI]&lt;br /&gt;
* 2024-10: [https://www.youtube.com/watch?v=xfMQ7hzyFW4 Writing Doom]: short film on Superintelligence (27m video)&lt;br /&gt;
* 2026-03: [https://www.youtube.com/watch?v=Nl7-bRFSZBs The AI book that&amp;#039;s freaking out national security advisors] (44m video)&lt;br /&gt;
&lt;br /&gt;
==Deep==&lt;br /&gt;
* [https://www.thecompendium.ai/ The Compendium: Humanity risks extinction from its very creations — AIs.] (2024)&lt;br /&gt;
* [https://www.aisafetybook.com/ Introduction to AI Safety, Ethics, and Society] (Dan Hendrycks, [https://www.safe.ai/ Center for AI Safety])&lt;br /&gt;
* [https://aisafety.info/ AI Safety FAQ]&lt;br /&gt;
* [https://deepmindsafetyresearch.medium.com/introducing-our-short-course-on-agi-safety-1072adb7912c DeepMind short course on AGI safety]&lt;br /&gt;
&lt;br /&gt;
=Description of Safety Concerns=&lt;br /&gt;
==Key Concepts==&lt;br /&gt;
* [https://en.wikipedia.org/wiki/Instrumental_convergence Instrumental Convergence]&lt;br /&gt;
* [https://www.lesswrong.com/w/orthogonality-thesis Orthogonality Thesis]&lt;br /&gt;
* [https://www.alignmentforum.org/posts/SzecSPYxqRa5GCaSF/clarifying-inner-alignment-terminology Inner/outer alignment]&lt;br /&gt;
* [https://www.alignmentforum.org/w/mesa-optimization Mesa-optimization]&lt;br /&gt;
* [https://www.lesswrong.com/posts/N6vZEnCn6A95Xn39p/are-we-in-an-ai-overhang Overhang]&lt;br /&gt;
* [https://www.alignmentforum.org/posts/pdaGN6pQyQarFHXF4/reward-is-not-the-optimization-target Reward is not the optimization target] (Alex Turner)&lt;br /&gt;
&lt;br /&gt;
==Medium-term Risks==&lt;br /&gt;
* 2023-04: [https://www.youtube.com/watch?v=xoVJKj8lcNQ The A.I. Dilemma – Tristan Harris and Aza Raskin (video)] ([https://assets-global.website-files.com/5f0e1294f002b1bb26e1f304/64224a9051a6637c1b60162a_65-your-undivided-attention-The-AI-Dilemma-transcript.pdf podcast transcript]): raises concerns about human ability to handle these transformations&lt;br /&gt;
* 2023-04: [https://www.youtube.com/watch?v=KCSsKV5F4xc Daniel Schmachtenberger and Liv Boeree (video)]: AI could accelerate perverse social dynamics&lt;br /&gt;
* 2023-10: [https://arxiv.org/pdf/2310.11986 Sociotechnical Safety Evaluation of Generative AI Systems] (Google DeepMind)&lt;br /&gt;
* 2024-02: [https://yoshuabengio.org/2024/02/26/towards-a-cautious-scientist-ai-with-convergent-safety-bounds/ Towards a Cautious Scientist AI with Convergent Safety Bounds] (Yoshua Bengio)&lt;br /&gt;
* 2024-07: [https://yoshuabengio.org/2024/07/09/reasoning-through-arguments-against-taking-ai-safety-seriously/ Reasoning through arguments against taking AI safety seriously] (Yoshua Bengio)&lt;br /&gt;
* 2025-04: [https://www.forethought.org/research/ai-enabled-coups-how-a-small-group-could-use-ai-to-seize-power AI-Enabled Coups: How a Small Group Could Use AI to Seize Power]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.20702 The Singapore Consensus on Global AI Safety Research Priorities]&lt;br /&gt;
* 2026-01: [https://www.science.org/doi/10.1126/science.adz1697 How malicious AI swarms can threaten democracy: The fusion of agentic AI and LLMs marks a new frontier in information warfare] (Science Magazine, [https://arxiv.org/abs/2506.06299 preprint])&lt;br /&gt;
* 2026-01: [https://www.darioamodei.com/essay/the-adolescence-of-technology The Adolescence of Technology: Confronting and Overcoming the Risks of Powerful AI] (Dario Amodei)&lt;br /&gt;
* 2026-02: [https://www.noahpinion.blog/p/updated-thoughts-on-ai-risk Updated thoughts on AI risk: Things have gotten scarier since 2023] ([https://x.com/Noahpinion Noah Smith])&lt;br /&gt;
&lt;br /&gt;
==Long-term (x-risk)==&lt;br /&gt;
* 2015-02: Sam Altman: [https://blog.samaltman.com/machine-intelligence-part-1 Machine intelligence, part 1]&lt;br /&gt;
* 2019-03: Daniel Kokotajlo and Wei Dai: [https://www.lesswrong.com/posts/WXvt8bxYnwBYpy9oT/the-main-sources-of-ai-risk The Main Sources of AI Risk?]&lt;br /&gt;
* 2022-06: Eliezer Yudkowsky: [https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a-list-of-lethalities AGI Ruin: A List of Lethalities]&lt;br /&gt;
* 2024-11: Marcus Arvan: [https://link.springer.com/article/10.1007/s00146-024-02113-9 ‘Interpretability’ and ‘alignment’ are fool’s errands: a proof that controlling misaligned large language models is the best anyone can hope for]&lt;br /&gt;
* 2025-04: [https://michaelnotebook.com/xriskbrief/index.html ASI existential risk: reconsidering alignment as a goal]&lt;br /&gt;
* 2025-12: Philip Trammell and Leopold Aschenbrenner: [https://philiptrammell.com/static/Existential_Risk_and_Growth.pdf Existential Risk and Growth]&lt;br /&gt;
&lt;br /&gt;
=Status=&lt;br /&gt;
* 2025-01: [https://assets.publishing.service.gov.uk/media/679a0c48a77d250007d313ee/International_AI_Safety_Report_2025_accessible_f.pdf International Safety Report: The International Scientific Report on the Safety of Advanced AI (January 2025)]&lt;br /&gt;
* [https://ailabwatch.org/ AI Lab Watch] (safety scorecard)&lt;br /&gt;
&lt;br /&gt;
==Assessment==&lt;br /&gt;
* [https://aiassessmentscale.com/ AI Assessment Scale (AIAS)]: A practical framework to guide the appropriate and ethical use of generative AI in assessment design, empowering educators to make purposeful, evidence-based decisions&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.16534 Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report]&lt;br /&gt;
&lt;br /&gt;
==Policy==&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.05694 On the Limitations of Compute Thresholds as a Governance Strategy] (Sara Hooker)&lt;br /&gt;
* 2024-07: [https://www.cigionline.org/static/documents/AI-challenges.pdf Framework Convention on Global AI Challenges] ([https://www.cigionline.org/ CIGI])&lt;br /&gt;
* 2024-08: NIST guidelines: [https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.800-1.ipd.pdf Managing Misuse Risk for Dual-Use Foundation Models]&lt;br /&gt;
&lt;br /&gt;
==Proposals==&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.18359 Responsible AI Agents]&lt;br /&gt;
* 2025-03: [https://controlai.com/ Control AI] [https://controlai.com/dip The Direct Institutional Plan]&lt;br /&gt;
* 2025-04: Google DeepMind: [https://deepmind.google/discover/blog/taking-a-responsible-path-to-agi/ Taking a responsible path to AGI]&lt;br /&gt;
** Paper: [https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/evaluating-potential-cybersecurity-threats-of-advanced-ai/An_Approach_to_Technical_AGI_Safety_Apr_2025.pdf An Approach to Technical AGI Safety and Security]&lt;br /&gt;
&lt;br /&gt;
=Research=&lt;br /&gt;
* 2008: [https://selfawaresystems.com/wp-content/uploads/2008/01/ai_drives_final.pdf The Basic AI Drives]&lt;br /&gt;
* 2022-09: [https://arxiv.org/abs/2209.00626v1 The alignment problem from a deep learning perspective]&lt;br /&gt;
* 2022-12: [https://arxiv.org/abs/2212.03827 Discovering Latent Knowledge in Language Models Without Supervision]&lt;br /&gt;
* 2023-02: [https://arxiv.org/abs/2302.08582 Pretraining Language Models with Human Preferences]&lt;br /&gt;
* 2023-04: [https://arxiv.org/abs/2304.03279 Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark]&lt;br /&gt;
* 2023-05: [https://arxiv.org/abs/2305.15324 Model evaluation for extreme risks] (DeepMind)&lt;br /&gt;
* 2023-05: [https://arxiv.org/abs/2305.03047 Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision]&lt;br /&gt;
* 2023-06: [https://arxiv.org/abs/2306.17492 Preference Ranking Optimization for Human Alignment]&lt;br /&gt;
* 2023-08: [https://arxiv.org/abs/2308.06259 Self-Alignment with Instruction Backtranslation]&lt;br /&gt;
* 2023-11: [https://arxiv.org/abs/2311.08702 Debate Helps Supervise Unreliable Experts]&lt;br /&gt;
* 2023-12: [https://cdn.openai.com/papers/weak-to-strong-generalization.pdf Weak-to-Strong Generalization: Eliciting Strong Capabilities with Weak Supervision] (OpenAI, [https://openai.com/research/weak-to-strong-generalization blog])&lt;br /&gt;
* 2023-12: [https://cdn.openai.com/papers/practices-for-governing-agentic-ai-systems.pdf Practices for Governing Agentic AI Systems] (OpenAI, [https://openai.com/index/practices-for-governing-agentic-ai-systems/ blog])&lt;br /&gt;
* 2024-01: [https://arxiv.org/abs/2401.05566 Sleeper Agents: Training Deceptive LLMs that Persist through Safety Training] (Anthropic)&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.13208 The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions] (OpenAI)&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.04622 On scalable oversight with weak LLMs judging strong LLMs]&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.21792 Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?] (Dan Hendrycks et al.)&lt;br /&gt;
* 2024-08: [https://arxiv.org/abs/2408.00761 Tamper-Resistant Safeguards for Open-Weight LLMs] ([https://www.tamper-resistant-safeguards.com/ project], [https://github.com/rishub-tamirisa/tamper-resistance/ code])&lt;br /&gt;
* 2024-08: [https://arxiv.org/abs/2408.04614 Better Alignment with Instruction Back-and-Forth Translation]&lt;br /&gt;
* 2024-10: [https://cdn.openai.com/papers/first-person-fairness-in-chatbots.pdf First-Person Fairness in Chatbots] (OpenAI, [https://openai.com/index/evaluating-fairness-in-chatgpt/ blog])&lt;br /&gt;
* 2024-10: [https://assets.anthropic.com/m/377027d5b36ac1eb/original/Sabotage-Evaluations-for-Frontier-Models.pdf Sabotage evaluations for frontier models] (Anthropic, [https://www.anthropic.com/research/sabotage-evaluations blog])&lt;br /&gt;
* 2024-12: [https://assets.anthropic.com/m/983c85a201a962f/original/Alignment-Faking-in-Large-Language-Models-full-paper.pdf Alignment Faking in Large Language Models] (Anthropic)&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.03556 Best-of-N Jailbreaking] ([https://github.com/jplhughes/bon-jailbreaking code])&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.16325 Towards Safe and Honest AI Agents with Neural Self-Other Overlap]&lt;br /&gt;
** 2024-07: [https://www.lesswrong.com/posts/hzt9gHpNwA2oHtwKX/self-other-overlap-a-neglected-approach-to-ai-alignment Self-Other Overlap: A Neglected Approach to AI Alignment]&lt;br /&gt;
** 2025-03: [https://www.lesswrong.com/posts/jtqcsARGtmgogdcLT/reducing-llm-deception-at-scale-with-self-other-overlap-fine Reducing LLM deception at scale with self-other overlap fine-tuning]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.16339 Deliberative Alignment: Reasoning Enables Safer Language Models] (OpenAI)&lt;br /&gt;
* 2025-01: [https://cdn.openai.com/papers/trading-inference-time-compute-for-adversarial-robustness-20250121_1.pdf Trading Inference-Time Compute for Adversarial Robustness] (OpenAI, [https://openai.com/index/trading-inference-time-compute-for-adversarial-robustness/ blog])&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.18837 Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming] (Anthropic, [https://www.anthropic.com/research/constitutional-classifiers blog])&lt;br /&gt;
* 2025-02: [https://drive.google.com/file/d/1QAzSj24Fp0O6GfkskmnULmI1Hmx7k_EJ/view Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs] ([https://www.emergent-values.ai/ site], [https://github.com/centerforaisafety/emergent-values github])&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.07776 Auditing Prompt Caching in Language Model APIs]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.14143 Multi-Agent Risks from Advanced AI]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2209.00626v7 The Alignment Problem from a Deep Learning Perspective]&lt;br /&gt;
* 2025-03: [https://assets.anthropic.com/m/317564659027fb33/original/Auditing-Language-Models-for-Hidden-Objectives.pdf Auditing language models for hidden objectives] (Anthropic, [https://www.anthropic.com/research/auditing-hidden-objectives blog])&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.13621 Superalignment with Dynamic Human Values]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.15125 Contemplative Wisdom for Superalignment]&lt;br /&gt;
* 2025-04: [https://www.lesswrong.com/posts/x59FhzuM9yuvZHAHW/untitled-draft-yhra Scaling Laws for Scalable Oversight] ([https://arxiv.org/abs/2504.18530 preprint], [https://github.com/subhashk01/oversight-scaling-laws code])&lt;br /&gt;
* 2025-06: [https://assets.anthropic.com/m/4fb35becb0cd87e1/original/SHADE-Arena-Paper.pdf SHADE-Arena: Evaluating sabotage and monitoring in LLM agents] (Anthropic, [https://www.anthropic.com/research/shade-arena-sabotage-monitoring blog])&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.13609 Avoiding Obfuscation with Prover-Estimator Debate]&lt;br /&gt;
* 2025-06: [https://cdn.openai.com/pdf/a130517e-9633-47bc-8397-969807a43a23/emergent_misalignment_paper.pdf Persona Features Control Emergent Misalignment] (OpenAI, [https://openai.com/index/emergent-misalignment/ blog])&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2506.18032 Why Do Some Language Models Fake Alignment While Others Don&amp;#039;t?] (Anthropic, [https://github.com/safety-research/open-source-alignment-faking code])&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.11473 Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety]&lt;br /&gt;
* 2025-09: [https://openai.com/index/detecting-and-reducing-scheming-in-ai-models/ Detecting and reducing scheming in AI models]&lt;br /&gt;
* 2025-11: [https://assets.anthropic.com/m/74342f2c96095771/original/Natural-emergent-misalignment-from-reward-hacking-paper.pdf Natural Emergent Misalignment from Reward Hacking in Production RL] (Anthropic, [https://www.anthropic.com/research/emergent-misalignment-reward-hacking blog])&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.16856 Distributional AGI Safety]&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2511.22662 Difficulties with Evaluating a Deception Detector for AIs]&lt;br /&gt;
* 2025-12: [https://cdn.openai.com/pdf/d57827c6-10bc-47fe-91aa-0fde55bd3901/monitoring-monitorability.pdf Monitoring Monitorability] (OpenAI)&lt;br /&gt;
* 2026-01: [https://www.nature.com/articles/s41586-025-09937-5 Training large language models on narrow tasks can lead to broad misalignment]&lt;br /&gt;
** 2025-02: Preprint: [https://martins1612.github.io/emergent_misalignment_betley.pdf Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs]&lt;br /&gt;
* 2026-02: [https://arxiv.org/pdf/2601.23045 The Hot Mess of AI: How Does Misalignment Scale with Model Intelligence and Task Complexity?] (Anthropic, [https://alignment.anthropic.com/2026/hot-mess-of-ai/ blog])&lt;br /&gt;
* 2026-03: [https://cdn.openai.com/pdf/a21c39c1-fa07-41db-9078-973a12620117/cot_controllability.pdf Reasoning Models Struggle to Control their Chains of Thought] (OpenAI, [https://openai.com/index/reasoning-models-chain-of-thought-controllability/ blog])&lt;br /&gt;
* 2026-03: [https://truthful.ai/consciousness_cluster.pdf The Consciousness Cluster: Preferences of Models that Claim to be Conscious]&lt;br /&gt;
&lt;br /&gt;
==Demonstrations of Negative Use Capabilities==&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.00586 Evaluating Large Language Models&amp;#039; Capability to Launch Fully Automated Spear Phishing Campaigns: Validated on Human Subjects]&lt;br /&gt;
* 2025-04: [https://www.nathanlabenz.com/ Nathan Labenz] ([https://www.cognitiverevolution.ai/ The Cognitive Revolution]): [https://docs.google.com/presentation/d/1mvkpg1mtAvGzTiiwYPc6bKOGsQXDIwMb-ytQECb3i7I/edit#slide=id.g252d9e67d86_0_16 AI Bad Behavior]&lt;br /&gt;
&lt;br /&gt;
==Threat Vectors==&lt;br /&gt;
* 2024-01: [https://arxiv.org/abs/2401.05566 Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.07192 Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples]&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
* [[AI predictions]]&lt;/div&gt;</summary>
		<author><name>KevinYager</name></author>
		
	</entry>
	<entry>
		<id>http://gisaxs.com/index.php?title=AI_and_Humans&amp;diff=8738</id>
		<title>AI and Humans</title>
		<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php?title=AI_and_Humans&amp;diff=8738"/>
		<updated>2026-03-18T17:07:46Z</updated>

		<summary type="html">&lt;p&gt;KevinYager: /* Human Sentiment towards AI */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
=AI in Education=&lt;br /&gt;
==Survey/study of==&lt;br /&gt;
* 2023-08: [https://www.nature.com/articles/s41598-023-38964-3 Perception, performance, and detectability of conversational artificial intelligence across 32 university courses]&lt;br /&gt;
* 2023-10: [https://www.bbc.com/worklife/article/20231017-the-employees-secretly-using-ai-at-work The employees secretly using AI at work] (BBC)&lt;br /&gt;
* 2023-10: [https://www.insidehighered.com/news/tech-innovation/artificial-intelligence/2023/10/31/most-students-outrunning-faculty-ai-use?utm_source=Inside+Higher+Ed&amp;amp;utm_campaign=23419446b9-DNU_2021_COPY_02&amp;amp;utm_medium=email&amp;amp;utm_term=0_1fcbc04421-23419446b9-236889242&amp;amp;mc_cid=23419446b9&amp;amp;mc_eid=dae49d931a Survey: most students outrunning faculty in AI use] (Inside Higher Ed)&lt;br /&gt;
* 2023-11: [https://www.nature.com/articles/d41586-023-03507-3 ChatGPT has entered the classroom: how LLMs could transform education]&lt;br /&gt;
* 2025-04: [https://www.anthropic.com/news/anthropic-education-report-how-university-students-use-claude Anthropic Education Report: How University Students Use Claude]&lt;br /&gt;
* 2025-05: [https://www.nature.com/articles/s41599-025-04787-y The effect of ChatGPT on students’ learning performance, learning perception, and higher-order thinking: insights from a meta-analysis]&lt;br /&gt;
&lt;br /&gt;
==AI improves learning/education==&lt;br /&gt;
*  Mollick, Ethan R. and Mollick, Lilach and Bach, Natalie and Ciccarelli, LJ and Przystanski, Ben and Ravipinto, Daniel, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4871171 AI Agents and Education: Simulated Practice at Scale] (June 17, 2024). The Wharton School Research Paper. [http://dx.doi.org/10.2139/ssrn.4871171 doi: 10.2139/ssrn.4871171]&lt;br /&gt;
** Can enable personalized education.&lt;br /&gt;
* [https://arxiv.org/abs/2306.17156 Generative AI for Programming Education: Benchmarking ChatGPT, GPT-4, and Human Tutors]&lt;br /&gt;
** GPT4 can out-perform human tutors.&lt;br /&gt;
*  Keppler, Samantha and Sinchaisri, Wichinpong and Snyder, Clare, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4924786 Backwards Planning with Generative AI: Case Study Evidence from US K12 Teachers] (August 13, 2024). [http://dx.doi.org/10.2139/ssrn.4924786 doi: 10.2139/ssrn.4924786]&lt;br /&gt;
** Teachers benefit from using AI as a co-pilot to aid in tasks (planning, how to teach topic, explore ideas).&lt;br /&gt;
** There is smaller utility in using AI purely as a text-generator (to make quizzes, workbooks, etc.).&lt;br /&gt;
* [https://arxiv.org/abs/2402.09809 Effective and Scalable Math Support: Evidence on the Impact of an AI-Tutor on Math Achievement in Ghana]&lt;br /&gt;
* [https://doi.org/10.21203/rs.3.rs-4243877/v1 AI Tutoring Outperforms Active Learning]&lt;br /&gt;
* [https://documents.worldbank.org/en/publication/documents-reports/documentdetail/099548105192529324 From chalkboards to chatbots: Transforming learning in Nigeria, one prompt at a time] ([https://blogs.worldbank.org/en/education/From-chalkboards-to-chatbots-Transforming-learning-in-Nigeria writeup])&lt;br /&gt;
** 6 weeks of after-school AI tutoring = 2 years of typical learning gains&lt;br /&gt;
** outperforms 80% of other educational interventions&lt;br /&gt;
* [https://arxiv.org/abs/2409.09047 AI Meets the Classroom: When Do Large Language Models Harm Learning?]&lt;br /&gt;
** Outcomes depend on usage&lt;br /&gt;
* [https://www.deeplearning.ai/the-batch/gpt-4-boosts-remote-tutors-performance-in-real-time-study-finds/ LLM Support for Tutors: GPT-4 boosts remote tutors’ performance in real time, study finds]&lt;br /&gt;
** [https://arxiv.org/abs/2410.03017 Tutor CoPilot: A Human-AI Approach for Scaling Real-Time Expertise]&lt;br /&gt;
* 2025-06: Gallup &amp;amp; the Walton Family Foundation: [https://www.gallup.com/file/analytics/691922/Walton-Family-Foundation-Gallup-Teachers-AI-Report.pdf Teaching for Tomorrow: Unlocking Six Weeks a Year With AI]&lt;br /&gt;
* 2026-03: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6423358 Effective Personalized AI Tutors via LLM-Guided Reinforcement Learning]&lt;br /&gt;
&lt;br /&gt;
==AI harms learning==&lt;br /&gt;
* [https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0305354 A real-world test of artificial intelligence infiltration of a university examinations system: A “Turing Test” case study]&lt;br /&gt;
** Current grading systems cannot detect AI.&lt;br /&gt;
*  Bastani, Hamsa and Bastani, Osbert and Sungu, Alp and Ge, Haosen and Kabakcı, Özge and Mariman, Rei, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4895486 Generative AI Can Harm Learning] (July 15, 2024). The Wharton School Research Paper. [http://dx.doi.org/10.2139/ssrn.4895486 doi: 10.2139/ssrn.4895486]&lt;br /&gt;
** Access to ChatGPT harmed math education outcomes.&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.09047 AI Meets the Classroom: When Do Large Language Models Harm Learning?]&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.20245 How AI Impacts Skill Formation]&lt;br /&gt;
&lt;br /&gt;
==Software/systems==&lt;br /&gt;
* [https://devpost.com/software/gptutor GPTutor] ([https://github.com/mynamegabe/GPTutor code])&lt;br /&gt;
* [https://arxiv.org/abs/2308.02773 EduChat: A Large-Scale Language Model-based Chatbot System for Intelligent Education]&lt;br /&gt;
* [https://eurekalabs.ai/ Eureka Labs] (founded by [https://en.wikipedia.org/wiki/Andrej_Karpathy Andrej Karpathy]) aims to create AI-driven courses (first course is [https://github.com/karpathy/LLM101n Intro to LLMs])&lt;br /&gt;
&lt;br /&gt;
===LLMs===&lt;br /&gt;
* 2024-12: [https://www.arxiv.org/abs/2412.16429 LearnLM: Improving Gemini for Learning]&lt;br /&gt;
&lt;br /&gt;
===Individual tools===&lt;br /&gt;
* Chatbot (OpenAI [https://chatgpt.com/ ChatGPT], Anthropic [https://www.anthropic.com/claude Claude], Google [https://gemini.google.com/app Gemini])&lt;br /&gt;
* [https://notebooklm.google.com/ NotebookLM]: Enables one to &amp;quot;chat with documents&amp;quot;.&lt;br /&gt;
* Google [https://learning.google.com/experiments/learn-about/signup Learn About]&lt;br /&gt;
&lt;br /&gt;
===Systems===&lt;br /&gt;
* [https://www.anthropic.com/news/introducing-claude-for-education Anthropic] [https://www.anthropic.com/education Claude for Education]&lt;br /&gt;
&lt;br /&gt;
==AI for grading==&lt;br /&gt;
* [https://dl.acm.org/doi/10.1145/3657604.3664693 Can Large Language Models Make the Grade? An Empirical Study Evaluating LLMs Ability To Mark Short Answer Questions in K-12 Education] ([https://arxiv.org/abs/2405.02985 preprint])&lt;br /&gt;
&lt;br /&gt;
==Detection==&lt;br /&gt;
* 2024-06: [https://www.sciencedirect.com/science/article/pii/S2666920X24000109 Do teachers spot AI? Evaluating the detectability of AI-generated texts among student essays]&lt;br /&gt;
** GenAI can simulate student writing in a way that teachers cannot detect.&lt;br /&gt;
** AI essays are assessed more positively than student-written.&lt;br /&gt;
** Teachers are overconfident in their source identification.&lt;br /&gt;
** Neither novice nor experienced teachers could distinguish ChatGPT-generated texts from student-written ones.&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.15654 People who frequently use ChatGPT for writing tasks are accurate and robust detectors of AI-generated text]&lt;br /&gt;
===AI Text Detectors Don&amp;#039;t Work===&lt;br /&gt;
* 2024-05: [https://arxiv.org/abs/2405.07940 RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors]&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2306.15666 Testing of Detection Tools for AI-Generated Text]&lt;br /&gt;
&lt;br /&gt;
=AI/human=&lt;br /&gt;
==Capabilities==&lt;br /&gt;
===Writing===&lt;br /&gt;
&lt;br /&gt;
* 2022-12: [https://aclanthology.org/2022.emnlp-main.296/ Re3: Generating Longer Stories With Recursive Reprompting and Revision]&lt;br /&gt;
* 2023-01: Journalism: [https://journals.sagepub.com/doi/10.1177/10776958221149577 Collaborating With ChatGPT: Considering the Implications of Generative Artificial Intelligence for Journalism and Media Education]&lt;br /&gt;
* 2023-03: English essays: [https://journal.unnes.ac.id/sju/index.php/elt/article/view/64069 Artificial intelligence (AI) technology in OpenAI ChatGPT application: A review of ChatGPT in writing English essay]&lt;br /&gt;
* 2023-07: Science writing: [https://www.rbmojournal.com/article/S1472-6483(23)00219-5/fulltext Artificial intelligence in scientific writing: a friend or a foe?]&lt;br /&gt;
* 2024-02: Wikipedia style: [https://arxiv.org/abs/2402.14207 Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models]&lt;br /&gt;
* 2024-08: [https://arxiv.org/abs/2408.07055 LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs] ([https://github.com/THUDM/LongWriter code])&lt;br /&gt;
* 2024-08: Scientific papers: The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery&lt;br /&gt;
* 2024-09: PaperQA2: [https://paper.wikicrow.ai/ Language Models Achieve Superhuman Synthesis of Scientific Knowledge] ([https://x.com/SGRodriques/status/1833908643856818443 𝕏 post], [https://github.com/Future-House/paper-qa code])&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.19065 WikiAutoGen: Towards Multi-Modal Wikipedia-Style Article Generation]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.22828 Learning to Reason for Long-Form Story Generation]&lt;br /&gt;
&lt;br /&gt;
==AI out-performs humans==&lt;br /&gt;
===Tests===&lt;br /&gt;
* 2023-07: [https://arxiv.org/abs/2307.10635 SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models]&lt;br /&gt;
* 2024-06: [https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0305354 A real-world test of artificial intelligence infiltration of a university examinations system: A “Turing Test” case study] &lt;br /&gt;
** AI scores higher than median students.&lt;br /&gt;
&lt;br /&gt;
===Creativity===&lt;br /&gt;
* See also: [[AI creativity]]&lt;br /&gt;
* 2023-07: [https://mackinstitute.wharton.upenn.edu/wp-content/uploads/2023/08/LLM-Ideas-Working-Paper.pdf Ideas Are Dimes A Dozen: Large Language Models For Idea Generation In Innovation]&lt;br /&gt;
* 2023-09: [https://www.nature.com/articles/s41598-023-40858-3 Best humans still outperform artificial intelligence in a creative divergent thinking task]&lt;br /&gt;
** Best humans out-perform AI at creativity. (By implication, median humans may not.)&lt;br /&gt;
* 2024-02: [https://www.nature.com/articles/s41598-024-53303-w The current state of artificial intelligence generative language models is more creative than humans on divergent thinking tasks]&lt;br /&gt;
* 2024-02: Felin, Teppo and Holweg, Matthias, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4737265 Theory Is All You Need: AI, Human Cognition, and Causal Reasoning] (February 24, 2024). [http://dx.doi.org/10.2139/ssrn.4737265 doi: 10.2139/ssrn.4737265]&lt;br /&gt;
** Argues that human &amp;quot;theory-based&amp;quot; creativity is better than AI &amp;quot;data-based&amp;quot;.&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.01119 Pron vs Prompt: Can Large Language Models already Challenge a World-Class Fiction Author at Creative Text Writing?]&lt;br /&gt;
** Top human (professional author) out-performs GPT4.&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.04109 Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers]&lt;br /&gt;
** LLM-generated research ideas were judged more novel than those of human experts&lt;br /&gt;
* 2024-09: [https://docs.iza.org/dp17302.pdf Creative and Strategic Capabilities of Generative AI: Evidence from Large-Scale Experiments]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.00794 Predicting Empirical AI Research Outcomes with Language Models]&lt;br /&gt;
&lt;br /&gt;
===Art===&lt;br /&gt;
* 2024-11: [https://doi.org/10.1038/s41598-024-76900-1 AI-generated poetry is indistinguishable from human-written poetry and is rated more favorably]&lt;br /&gt;
* 2024-11: [https://www.astralcodexten.com/p/how-did-you-do-on-the-ai-art-turing How Did You Do On The AI Art Turing Test?]&lt;br /&gt;
&lt;br /&gt;
===Business &amp;amp; Marketing===&lt;br /&gt;
* 2023-11: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4597899 The power of generative marketing: Can generative AI create superhuman visual marketing content?]&lt;br /&gt;
* 2024-02: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4714776 Generative Artificial Intelligence and Evaluating Strategic Decisions]&lt;br /&gt;
&lt;br /&gt;
===Professions===&lt;br /&gt;
* [https://agi.safe.ai/submit Humanity&amp;#039;s Last Exam]&lt;br /&gt;
** [https://x.com/alexandr_wang/status/1835738937719140440 Effort to build] a dataset of challenging (but resolvable) questions in specific domain areas, to act as a benchmark to test whether AIs are improving in these challenging topics.&lt;br /&gt;
&lt;br /&gt;
====Coding====&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.06807 Competitive Programming with Large Reasoning Models]&lt;br /&gt;
&lt;br /&gt;
====Medical====&lt;br /&gt;
* 2024-03: [https://www.medrxiv.org/content/10.1101/2024.03.12.24303785v1 Influence of a Large Language Model on Diagnostic Reasoning: A Randomized Clinical Vignette Study]&lt;br /&gt;
** GPT4 improves medical practitioner work; surprisingly, GPT4 alone scored better than a human with GPT4 as aid (on selected tasks).&lt;br /&gt;
* 2024-10: [https://doi.org/10.1001/jamanetworkopen.2024.38535 Perspectives on Artificial Intelligence–Generated Responses to Patient Messages]&lt;br /&gt;
* 2024-10: [https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2825395 Large Language Model Influence on Diagnostic Reasoning: A Randomized Clinical Trial]&lt;br /&gt;
** Use of ChatGPT does not strongly improve medical expert work, but AI alone out-scores human or human+AI&lt;br /&gt;
* 2024-11: [https://www.nature.com/articles/s41562-024-02046-9 Large language models surpass human experts in predicting neuroscience results] (writeup: [https://medicalxpress.com/news/2024-11-ai-neuroscience-results-human-experts.html AI can predict neuroscience study results better than human experts, study finds])&lt;br /&gt;
* 2024-12: [https://www.arxiv.org/abs/2412.10849 Superhuman performance of a large language model on the reasoning tasks of a physician]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.18925 HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs]&lt;br /&gt;
* 2025-02: Media:&lt;br /&gt;
** NY Times: [https://www.nytimes.com/2025/02/02/opinion/ai-doctors-medicine.html The Robot Doctor Will See You Now]&lt;br /&gt;
** [https://erictopol.substack.com/p/when-doctors-with-ai-are-outperformed When Doctors With A.I. Are Outperformed by A.I. Alone]&lt;br /&gt;
* 2025-02: [https://www.nature.com/articles/s41591-024-03456-y GPT-4 assistance for improvement of physician performance on patient care tasks: a randomized controlled trial]&lt;br /&gt;
* 2025-02: [https://www.nature.com/articles/s41591-025-03517-w Artificial intelligence for individualized treatment of persistent atrial fibrillation: a randomized controlled trial]&lt;br /&gt;
* Google AI Clinician:&lt;br /&gt;
** 2024-01: [https://arxiv.org/abs/2401.05654 Towards Conversational Diagnostic AI] ([https://research.google/blog/amie-a-research-ai-system-for-diagnostic-medical-reasoning-and-conversations/ blog]: Articulate Medical Intelligence Explorer, AMIE)&lt;br /&gt;
** 2025-03: [https://www.gstatic.com/amie/towards_conversational_ai_for_disease_management.pdf Towards Conversational AI for Disease Management] ([https://research.google/blog/from-diagnosis-to-treatment-advancing-amie-for-longitudinal-disease-management/ blog])&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.19655 Med-RLVR: Emerging Medical Reasoning from a 3B base model via reinforcement Learning]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.13939 Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models]&lt;br /&gt;
* 2025-04: [https://www.acpjournals.org/doi/10.7326/ANNALS-24-03283 Comparison of Initial Artificial Intelligence (AI) and Final Physician Recommendations in AI-Assisted Virtual Urgent Care Visits]&lt;br /&gt;
* 2025-04: [https://www.nature.com/articles/s41586-025-08866-7?linkId=13898052 Towards conversational diagnostic artificial intelligence]&lt;br /&gt;
* 2025-04: [https://www.nature.com/articles/s41586-025-08869-4?linkId=13898054 Towards accurate differential diagnosis with large language models]&lt;br /&gt;
* 2025-06: [https://www.medrxiv.org/content/10.1101/2025.06.13.25329541v1 Automation of Systematic Reviews with Large Language Models]&lt;br /&gt;
* 2025-06: [https://microsoft.ai/new/the-path-to-medical-superintelligence/ The Path to Medical Superintelligence]&lt;br /&gt;
* 2025-08: [https://www.nature.com/articles/s41591-025-03888-0?utm_source=chatgpt.com A personal health large language model for sleep and fitness coaching]&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.08224 Capabilities of GPT-5 on Multimodal Medical Reasoning]&lt;br /&gt;
&lt;br /&gt;
====Bio====&lt;br /&gt;
* 2025-04: [https://www.virologytest.ai/vct_paper.pdf Virology Capabilities Test (VCT): A Multimodal Virology Q&amp;amp;A Benchmark]&lt;br /&gt;
** Time: [https://time.com/7279010/ai-virus-lab-biohazard-study/ Exclusive: AI Outsmarts Virus Experts in the Lab, Raising Biohazard Fears]&lt;br /&gt;
** AI Frontiers: [https://www.ai-frontiers.org/articles/ais-are-disseminating-expert-level-virology-skills AIs Are Disseminating Expert-Level Virology Skills]&lt;br /&gt;
&lt;br /&gt;
====Therapy====&lt;br /&gt;
* 2025-02: [https://journals.plos.org/mentalhealth/article?id=10.1371/journal.pmen.0000145 When ELIZA meets therapists: A Turing test for the heart and mind]&lt;br /&gt;
* 2025-03: Therabot: [https://ai.nejm.org/doi/full/10.1056/AIoa2400802 Randomized Trial of a Generative AI Chatbot for Mental Health Treatment]&lt;br /&gt;
&lt;br /&gt;
====Financial====&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.17866 Financial Statement Analysis with Large Language Models]&lt;br /&gt;
&lt;br /&gt;
====HR====&lt;br /&gt;
* 2025-08: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5395709 Voice AI in Firms: A Natural Field Experiment on Automated Job Interviews]&lt;br /&gt;
&lt;br /&gt;
==AI improves human work==&lt;br /&gt;
* 2023-07: [https://www.science.org/doi/10.1126/science.adh2586 Experimental evidence on the productivity effects of generative artificial intelligence]&lt;br /&gt;
* 2023-09:  Dell&amp;#039;Acqua, Fabrizio and McFowland III, Edward and Mollick, Ethan R. and Lifshitz-Assaf, Hila and Kellogg, Katherine and Rajendran, Saran and Krayer, Lisa and Candelon, François and Lakhani, Karim R., [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4573321 Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality] (September 15, 2023). Harvard Business School Technology &amp;amp; Operations Mgt. Unit Working Paper No. 24-013, The Wharton School Research Paper [http://dx.doi.org/10.2139/ssrn.4573321 doi: 10.2139/ssrn.4573321]&lt;br /&gt;
* 2023-11: [https://www.nber.org/papers/w31161 Generative AI at Work] (National Bureau of Economic Research)&lt;br /&gt;
* 2023-12: [https://osf.io/hdjpk The Uneven Impact of Generative AI on Entrepreneurial Performance] ([https://doi.org/10.31219/osf.io/hdjpk doi: 10.31219/osf.io/hdjpk])&lt;br /&gt;
* 2023-12: [https://arxiv.org/abs/2312.05481 Artificial Intelligence in the Knowledge Economy]: Non-autonomous AI (chatbot) benefits the least knowledgeable workers; autonomous agents benefit the most knowledgeable workers&lt;br /&gt;
* 2024-07: [https://www.microsoft.com/en-us/research/publication/generative-ai-in-real-world-workplaces/ Generative AI in Real-World Workplaces: The Second Microsoft Report on AI and Productivity Research]&lt;br /&gt;
* 2025-03: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5188231 The Cybernetic Teammate: A Field Experiment on Generative AI Reshaping Teamwork and Expertise]&lt;br /&gt;
** 2025-03: Ethan Mollick: [https://www.oneusefulthing.org/p/the-cybernetic-teammate The Cybernetic Teammate]: Having an AI on your team can increase performance, provide expertise, and improve your experience&lt;br /&gt;
* 2025-09: [https://osf.io/preprints/psyarxiv/vbkmt_v1 Quantifying Human-AI Synergy]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.12049 Generative AI and Firm Productivity: Field Experiments in Online Retail]&lt;br /&gt;
* 2025-10: Wharton: [https://knowledge.wharton.upenn.edu/special-report/2025-ai-adoption-report/ 2025 AI Adoption Report] (75% report positive ROI)&lt;br /&gt;
&lt;br /&gt;
===Coding===&lt;br /&gt;
* 2023-02: [https://arxiv.org/abs/2302.06590 The Impact of AI on Developer Productivity: Evidence from GitHub Copilot]&lt;br /&gt;
* 2024-09:  Cui, Zheyuan and Demirer, Mert and Jaffe, Sonia and Musolff, Leon and Peng, Sida and Salz, Tobias, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4945566 The Effects of Generative AI on High Skilled Work: Evidence from Three Field Experiments with Software Developers] (September 03, 2024). [http://dx.doi.org/10.2139/ssrn.4945566 doi: 10.2139/ssrn.4945566]&lt;br /&gt;
* 2024-11:  Hoffmann, Manuel and Boysel, Sam and Nagle, Frank and Peng, Sida and Xu, Kevin, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5007084 Generative AI and the Nature of Work] (October 27, 2024). Harvard Business School Strategy Unit Working Paper No. 25-021, Harvard Business Working Paper No. 25-021, [http://dx.doi.org/10.2139/ssrn.5007084 doi: 10.2139/ssrn.5007084]&lt;br /&gt;
* 2025-07: METR: [https://arxiv.org/abs/2507.09089 Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity] (AI tools led to lower performance)&lt;br /&gt;
** 2026-02: [https://metr.org/blog/2026-02-24-uplift-update/ We are Changing our Developer Productivity Experiment Design] (AI tools now lead to improved performance)&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.19708 Intuition to Evidence: Measuring AI&amp;#039;s True Impact on Developer Productivity]&lt;br /&gt;
&lt;br /&gt;
===Forecasting===&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.07862 AI-Augmented Predictions: LLM Assistants Improve Human Forecasting Accuracy]&lt;br /&gt;
&lt;br /&gt;
===Finance===&lt;br /&gt;
* 2024-12: [https://dx.doi.org/10.2139/ssrn.5075727 AI, Investment Decisions, and Inequality]: Novices see improvements in investment performance, sophisticated investors see even greater improvements.&lt;br /&gt;
&lt;br /&gt;
===Law===&lt;br /&gt;
* 2025-03: [https://ssrn.com/abstract=5162111 AI-Powered Lawyering: AI Reasoning Models, Retrieval Augmented Generation, and the Future of Legal Practice]&lt;br /&gt;
&lt;br /&gt;
===Science===&lt;br /&gt;
* 2025-12: [https://www.science.org/doi/abs/10.1126/science.adw3000 Scientific production in the era of large language models]&lt;br /&gt;
* 2026-01: [https://www.nature.com/articles/s41586-025-09922-y Artificial intelligence tools expand scientists’ impact but contract science’s focus]&lt;br /&gt;
* 2026-01: [https://www.anthropic.com/news/accelerating-scientific-research How scientists are using Claude to accelerate research and discovery]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.03837 Accelerating Scientific Research with Gemini: Case Studies and Common Techniques]&lt;br /&gt;
* 2026-03: [https://www.pnas.org/doi/10.1073/pnas.2533676123 Expert evaluation of LLM world models: A high-Tc superconductivity case study] ([https://research.google/blog/testing-llms-on-superconductivity-research-questions/?utm_source=twitter&amp;amp;utm_medium=social&amp;amp;utm_campaign=social_post&amp;amp;utm_content=gr-acct blog])&lt;br /&gt;
&lt;br /&gt;
===Medical===&lt;br /&gt;
* 2025-03: [https://www.medrxiv.org/content/10.1101/2025.02.28.25323115v1.full Medical Hallucination in Foundation Models and Their Impact on Healthcare]&lt;br /&gt;
* 2025-03: [https://journals.lww.com/international-journal-of-surgery/fulltext/2025/03000/chatgpt_s_role_in_alleviating_anxiety_in_total.20.aspx ChatGPT’s role in alleviating anxiety in total knee arthroplasty consent process: a randomized controlled trial pilot study]&lt;br /&gt;
* 2025-05: [https://openai.com/index/healthbench/ Introducing HealthBench]&lt;br /&gt;
* 2025-06: [https://www.medrxiv.org/content/10.1101/2025.06.07.25329176v1 From Tool to Teammate: A Randomized Controlled Trial of Clinician-AI Collaborative Workflows for Diagnosis]&lt;br /&gt;
* 2025-06: [https://bmcmededuc.biomedcentral.com/articles/10.1186/s12909-025-07414-1 Iteratively refined ChatGPT outperforms clinical mentors in generating high-quality interprofessional education clinical scenarios: a comparative study]&lt;br /&gt;
* 2025-07: [https://cdn.openai.com/pdf/a794887b-5a77-4207-bb62-e52c900463f1/penda_paper.pdf AI-based Clinical Decision Support for Primary Care: A Real-World Study] ([https://openai.com/index/ai-clinical-copilot-penda-health/ blog])&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.15743 Towards physician-centered oversight of conversational diagnostic AI]&lt;br /&gt;
* 2026-01: [https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(25)02464-X/abstract Interval cancer, sensitivity, and specificity comparing AI-supported mammography screening with standard double reading without AI in the MASAI study: a randomised, controlled, non-inferiority, single-blinded, population-based, screening-accuracy trial]&lt;br /&gt;
&lt;br /&gt;
===Translation===&lt;br /&gt;
* 2025-01: [https://simonwillison.net/2025/Feb/2/workflow-for-translation/ A professional workflow for translation using LLMs] ([https://news.ycombinator.com/item?id=42897856 based on this])&lt;br /&gt;
&lt;br /&gt;
===Customer service===&lt;br /&gt;
* 2023-11: [https://www.nber.org/papers/w31161 Generative AI at Work]: Improvements for workers and clients (though also a ceiling to improvement)&lt;br /&gt;
&lt;br /&gt;
===Creativity===&lt;br /&gt;
* See also: [[AI creativity]]&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.01727 Prompting Diverse Ideas: Increasing AI Idea Variance]&lt;br /&gt;
* 2024-07: [https://www.science.org/doi/10.1126/sciadv.adn5290 Generative AI enhances individual creativity but reduces the collective diversity of novel content]&lt;br /&gt;
* 2024-08: [https://www.nature.com/articles/s41562-024-01953-1 An empirical investigation of the impact of ChatGPT on creativity]&lt;br /&gt;
** 2024-08: Response: [https://www.nature.com/articles/s41562-024-01953-1 ChatGPT decreases idea diversity in brainstorming] ([https://www.nature.com/articles/s41562-025-02173-x.epdf?sharing_token=LA9NyDHj7y5WN8zvb5Qm49RgN0jAjWel9jnR3ZoTv0Nl8PrpXFkjZ93XvmUVBgB9Hlfro5Yo6YELr-pRqbpk3HaZENCvsfV8G1kwtTEj2oW1g87dSVT4BzrfCu3jS_606SLzmoDuDiALChY-MozVM4Pj1b4Vdf-YaIH5p3lfAnM%3D pdf])&lt;br /&gt;
** 2025-05: Response: [https://www.nature.com/articles/s41562-025-02195-5 Reply to: ChatGPT decreases idea diversity in brainstorming]&lt;br /&gt;
* 2024-08: [https://doi.org/10.1287/orsc.2023.18430 The Crowdless Future? Generative AI and Creative Problem-Solving]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.03703 Human Creativity in the Age of LLMs]&lt;br /&gt;
* 2024-11: &amp;lt;strike&amp;gt;[https://conference.nber.org/conf_papers/f210475.pdf Artificial Intelligence, Scientific Discovery, and Product Innovation]&amp;lt;/strike&amp;gt;: diffusion model increases &amp;quot;innovation&amp;quot; (patents), boosts the best performers, but also removes some enjoyable tasks.&lt;br /&gt;
** 2025-05: Retraction: [https://economics.mit.edu/news/assuring-accurate-research-record Assuring an accurate research record]&lt;br /&gt;
* 2024-12: [https://doi.org/10.1080/10400419.2024.2440691 Using AI to Generate Visual Art: Do Individual Differences in Creativity Predict AI-Assisted Art Quality?] ([https://osf.io/preprints/psyarxiv/ygzw6 preprint]): shows that more creative humans produce more creative genAI outputs&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.11433 One Does Not Simply Meme Alone: Evaluating Co-Creativity Between LLMs and Humans in the Generation of Humor]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.17241 Generative AI and Creativity: A Systematic Literature Review and Meta-Analysis]&lt;br /&gt;
&lt;br /&gt;
===Equity===&lt;br /&gt;
* 2025-01: [https://ai.nejm.org/doi/full/10.1056/AIp2400889 Using Large Language Models to Promote Health Equity]&lt;br /&gt;
&lt;br /&gt;
==AI worse than humans==&lt;br /&gt;
* 2025-04: [https://spinup-000d1a-wp-offload-media.s3.amazonaws.com/faculty/wp-content/uploads/sites/27/2025/03/AI-debt-collection-20250331.pdf How Good is AI at Twisting Arms? Experiments in Debt Collection]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.18919 Clinical knowledge in LLMs does not translate to human interactions]&lt;br /&gt;
* 2025-05: [https://royalsocietypublishing.org/doi/10.1098/rsos.241776 Generalization bias in large language model summarization of scientific research]&lt;br /&gt;
&lt;br /&gt;
==AI lowers human capability==&lt;br /&gt;
* 2025-07: METR: [https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity] ([https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/ blog], [https://secondthoughts.ai/p/ai-coding-slowdown commentary/analysis])&lt;br /&gt;
** 2026-02: [https://metr.org/blog/2026-02-24-uplift-update/ We are Changing our Developer Productivity Experiment Design] (AI tools [https://x.com/METR_Evals/status/2026355544668385373?s=20 now] lead to improved performance)&lt;br /&gt;
* 2026-02: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6097646 Thinking—Fast, Slow, and Artificial: How AI is Reshaping Human Reasoning and the Rise of Cognitive Surrender]&lt;br /&gt;
&lt;br /&gt;
==Human Perceptions of AI==&lt;br /&gt;
* 2023-09: [https://www.nature.com/articles/d41586-023-02980-0 AI and science: what 1,600 researchers think. A Nature survey finds that scientists are concerned, as well as excited, by the increasing use of artificial-intelligence tools in research.]&lt;br /&gt;
* 2024-11: [https://doi.org/10.1016/S2589-7500(24)00202-4 Attitudes and perceptions of medical researchers towards the use of artificial intelligence chatbots in the scientific process: an international cross-sectional survey] (Nature commentary: [https://www.nature.com/articles/s41592-024-02369-5 Quest for AI literacy])&lt;br /&gt;
* 2025-03: [https://www.arxiv.org/abs/2503.16458 Users Favor LLM-Generated Content -- Until They Know It&amp;#039;s AI]&lt;br /&gt;
&lt;br /&gt;
===AI passes Turing Test===&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Text Dialog&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* 2023-05: [https://arxiv.org/abs/2305.20010 Human or Not? A Gamified Approach to the Turing Test]&lt;br /&gt;
* 2023-10: [https://arxiv.org/abs/2310.20216 Does GPT-4 pass the Turing test?]&lt;br /&gt;
* 2024-05: [https://arxiv.org/abs/2405.08007 People cannot distinguish GPT-4 from a human in a Turing test]&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.08853 GPT-4 is judged more human than humans in displaced and inverted Turing tests]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.23674 Large Language Models Pass the Turing Test]&lt;br /&gt;
* 2025-04: [https://www.sciencedirect.com/science/article/abs/pii/S0022103117303980 A Minimal Turing Test]&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Art&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* 2024-11: [https://www.astralcodexten.com/p/how-did-you-do-on-the-ai-art-turing How Did You Do On The AI Art Turing Test?] Differentiation was only slightly above random (60%). AI art was often ranked higher than human-made.&lt;br /&gt;
* 2024-11: [https://doi.org/10.1038/s41598-024-76900-1 AI-generated poetry is indistinguishable from human-written poetry and is rated more favorably]&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.25601 Echoes of Humanity: Exploring the Perceived Humanness of AI Music]&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Imagery&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* 2026-02: [https://bpspsychub.onlinelibrary.wiley.com/doi/10.1111/bjop.70063 Too good to be true: Synthetic AI faces are more average than real faces and super-recognizers know it]&lt;br /&gt;
** [https://www.unsw.edu.au/newsroom/news/2026/02/humans-overconfident-telling-AI-faces-real-faces-people-fake People are overconfident about spotting AI faces, study finds]&lt;br /&gt;
&lt;br /&gt;
=Uptake=&lt;br /&gt;
* 2023-07: [https://doi.org/10.9734/ajrcos/2023/v16i4392 ChatGPT: Early Adopters, Teething Issues and the Way Forward]&lt;br /&gt;
* 2024-03: [https://arxiv.org/abs/2403.07183 Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews]&lt;br /&gt;
* 2024-05:  Humlum, Anders and Vestergaard, Emilie, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4827166 The Adoption of ChatGPT]. IZA Discussion Paper No. 16992 [http://dx.doi.org/10.2139/ssrn.4827166 doi: 10.2139/ssrn.4827166]&lt;br /&gt;
* 2024-06: Kellogg, Katherine and Lifshitz-Assaf, Hila and Randazzo, Steven and Mollick, Ethan R. and Dell&amp;#039;Acqua, Fabrizio and McFowland III, Edward and Candelon, Francois and Lakhani, Karim R., [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4857373 Don&amp;#039;t Expect Juniors to Teach Senior Professionals to Use Generative AI: Emerging Technology Risks and Novice AI Risk Mitigation Tactics] (June 03, 2024). Harvard Business School Technology &amp;amp; Operations Mgt. Unit Working Paper 24-074, Harvard Business Working Paper No. 24-074, The Wharton School Research Paper [http://dx.doi.org/10.2139/ssrn.4857373 doi: 10.2139/ssrn.4857373 ]&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.07016 Delving into ChatGPT usage in academic writing through excess vocabulary]&lt;br /&gt;
* 2024-09: [https://static1.squarespace.com/static/60832ecef615231cedd30911/t/66f0c3fbabdc0a173e1e697e/1727054844024/BBD_GenAI_NBER_Sept2024.pdf The Rapid Adoption of Generative AI]&lt;br /&gt;
* 2024-10: [https://ai.wharton.upenn.edu/focus-areas/human-technology-interaction/2024-ai-adoption-report/ Growing Up: Navigating Generative AI’s Early Years – AI Adoption Report] ([https://ai.wharton.upenn.edu/wp-content/uploads/2024/10/AI-Report_Executive-Summary.pdf executive summary], [https://ai.wharton.upenn.edu/wp-content/uploads/2024/10/AI-Report_Full-Report.pdf full report])&lt;br /&gt;
** 72% of leaders use genAI at least once a week (cf. 23% in 2023); 90% agree AI enhances skills (cf. 80% in 2023)&lt;br /&gt;
** Spending on genAI is up 130% (most companies plan to invest going forward)&lt;br /&gt;
* 2024-12: [https://www.pnas.org/doi/10.1073/pnas.2414972121 The unequal adoption of ChatGPT exacerbates existing inequalities among workers]&lt;br /&gt;
** Higher adoption among young and less experienced&lt;br /&gt;
** Lower adoption among women and lower-earning workers&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.09747 The Widespread Adoption of Large Language Model-Assisted Writing Across Society]: 10-25% adoption across a range of contexts&lt;br /&gt;
* 2025-02: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5078805 Local Heterogeneity in Artificial Intelligence Jobs Over Time and Space]&lt;br /&gt;
* 2025-04: [https://andreyfradkin.com/assets/demandforllm.pdf Demand for LLMs: Descriptive Evidence on Substitution, Market Expansion, and Multihoming]&lt;br /&gt;
* 2025-05: [https://civicscience.com/chatgpt-is-still-leading-the-ai-wars-but-google-gemini-is-gaining-ground/ ChatGPT Is Still Leading the AI Wars but Google Gemini Is Gaining Ground]&lt;br /&gt;
* 2025-05: [https://www.nber.org/papers/w33777 Large Language Models, Small Labor Market Effects]&lt;br /&gt;
** Significant uptake, but very little economic impact so far&lt;br /&gt;
* 2025-05: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5136877 The Labor Market Effects of Generative Artificial Intelligence]&lt;br /&gt;
** US worker usage of AI is increasing rapidly: 30% in 2024-12; 40% in 2025-05&lt;br /&gt;
* 2025-05: [https://www.bondcap.com/report/pdf/Trends_Artificial_Intelligence.pdf Trends – Artificial Intelligence]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.08945 Who is using AI to code? Global diffusion and impact of generative AI]&lt;br /&gt;
* 2025-06: [https://www.iconiqcapital.com/growth/reports/2025-state-of-ai 2025 State of AI Report: The Builder’s Playbook] A Practical Roadmap for AI Innovation&lt;br /&gt;
* 2025-07: Epoch AI: [https://epochai.substack.com/p/after-the-chatgpt-moment-measuring After the ChatGPT Moment: Measuring AI’s Adoption] (how quickly has AI been diffusing through the economy?)&lt;br /&gt;
* 2025-07: Pew Research: [https://www.pewresearch.org/short-reads/2025/06/25/34-of-us-adults-have-used-chatgpt-about-double-the-share-in-2023/ 34% of U.S. adults have used ChatGPT, about double the share in 2023]&lt;br /&gt;
* 2025-12: Epoch AI: [https://epoch.ai/data/polling Polling on AI Usage]&lt;br /&gt;
&lt;br /&gt;
==Usage By==&lt;br /&gt;
* 2026-02: [https://www.nber.org/papers/w34813 The Politics of AI]&lt;br /&gt;
&lt;br /&gt;
==Usage For==&lt;br /&gt;
* 2024-12: [https://assets.anthropic.com/m/7e1ab885d1b24176/original/Clio-Privacy-Preserving-Insights-into-Real-World-AI-Use.pdf Clio: A system for privacy-preserving insights into real-world AI use] (Anthropic [https://www.anthropic.com/research/clio Clio])&lt;br /&gt;
* 2025-03: [https://learn.filtered.com/hubfs/The%202025%20Top-100%20Gen%20AI%20Use%20Case%20Report.pdf How People are Really Using Generative AI Now] ([https://hbr.org/2025/04/how-people-are-really-using-gen-ai-in-2025 writeup])&lt;br /&gt;
* 2025-04: [https://www.anthropic.com/news/anthropic-education-report-how-university-students-use-claude Anthropic Education Report: How University Students Use Claude]&lt;br /&gt;
* 2025-09: [https://www.anthropic.com/research/economic-index-geography Anthropic Economic Index: Tracking AI&amp;#039;s role in the US and global economy]&lt;br /&gt;
* 2025-09: [https://cdn.openai.com/pdf/a253471f-8260-40c6-a2cc-aa93fe9f142e/economic-research-chatgpt-usage-paper.pdf How People Use ChatGPT] (OpenAI)&lt;br /&gt;
&lt;br /&gt;
==Hiding Usage==&lt;br /&gt;
* 2025-05: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5232910 Underreporting of AI use: The role of social desirability bias]&lt;br /&gt;
&lt;br /&gt;
=Societal Effects/Transformations=&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.01754 Empirical evidence of Large Language Model&amp;#039;s influence on human spoken communication]&lt;br /&gt;
* 2025-09: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5425555 Generative AI as Seniority-Biased Technological Change: Evidence from U.S. Résumé and Job Posting Data]&lt;br /&gt;
&lt;br /&gt;
=Psychological Impact=&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.16628 The Impact of Artificial Intelligence on Human Thought]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.15352 People readily follow personal advice from AI but it does not improve their well-being]&lt;br /&gt;
&lt;br /&gt;
==Human Sentiment towards AI==&lt;br /&gt;
* 2025-04: Pew Research: [https://www.pewresearch.org/internet/2025/04/03/how-the-us-public-and-ai-experts-view-artificial-intelligence/ How the U.S. Public and AI Experts View Artificial Intelligence]&lt;br /&gt;
* 2025-10: Pew Research: [https://www.pewresearch.org/global/2025/10/15/how-people-around-the-world-view-ai/ How People Around the World View AI: More are concerned than excited about its use, and more trust their own country and the EU to regulate it than trust the U.S. or China]&lt;br /&gt;
* 2025-12: [https://www.edelman.com/sites/g/files/aatuss191/files/2025-11/2025%20Edelman%20Trust%20Barometer%20Flash%20Poll%20Trust%20and%20Artificial%20Intelligence%20at%20a%20Crossroads%201.pdf 2025 Edelman Trust Barometer]&lt;br /&gt;
* 2025-12: [https://navigatorresearch.org/views-of-ai-and-data-centers/ Polling - Views of AI and data centers]&lt;br /&gt;
* 2026-03: [https://osf.io/preprints/psyarxiv/5mwre_v9 The Moralization of Artificial Intelligence]&lt;br /&gt;
* 2026-03: Anthropic: [https://www.anthropic.com/features/81k-interviews What 81,000 people want from AI]&lt;br /&gt;
&lt;br /&gt;
==AI Persuasion of Humans==&lt;br /&gt;
(AI can update beliefs, change opinions, tackle conspiracy theories, etc.)&lt;br /&gt;
* 2022-11: [https://arxiv.org/abs/2211.15006 Fine-tuning language models to find agreement among humans with diverse preferences]&lt;br /&gt;
* 2024-04: [https://osf.io/preprints/psyarxiv/h7n8u_v1 Just the facts: How dialogues with AI reduce conspiracy beliefs]&lt;br /&gt;
* 2024-08: [https://arxiv.org/abs/2408.04681 Conversational AI Powered by Large Language Models Amplifies False Memories in Witness Interviews]&lt;br /&gt;
* 2024-09: [https://www.science.org/doi/10.1126/science.adq1814 Durably reducing conspiracy beliefs through dialogues with AI]&lt;br /&gt;
* 2025-03: [https://www.pnas.org/doi/10.1073/pnas.2413443122 Scaling language model size yields diminishing returns for single-message political persuasion]&lt;br /&gt;
* 2025-04: [https://drive.google.com/file/d/1Eo4SHrKGPErTzL1t_QmQhfZGU27jKBjx/edit Can AI Change Your View? Evidence from a Large-Scale Online Field Experiment]&lt;br /&gt;
** [https://www.404media.co/researchers-secretly-ran-a-massive-unauthorized-ai-persuasion-experiment-on-reddit-users/ Researchers Secretly Ran a Massive, Unauthorized AI Persuasion Experiment on Reddit Users]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.09662 Large Language Models Are More Persuasive Than Incentivized Human Persuaders]&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.13919 The Levers of Political Persuasion with Conversational AI]&lt;br /&gt;
* 2025-12: [https://www.science.org/doi/10.1126/science.aea3884 The levers of political persuasion with conversational artificial intelligence]&lt;br /&gt;
* 2025-12: [https://www.nature.com/articles/s41586-025-09771-9 Persuading voters using human–artificial intelligence dialogues]&lt;br /&gt;
&lt;br /&gt;
==AI Effects on Human Psychology==&lt;br /&gt;
===Human well-being===&lt;br /&gt;
* 2024-01: [https://www.nature.com/articles/s44184-023-00047-6 Loneliness and suicide mitigation for students using GPT3-enabled chatbots]&lt;br /&gt;
* 2025-03: [https://cdn.openai.com/papers/15987609-5f71-433c-9972-e91131f399a1/openai-affective-use-study.pdf Investigating Affective Use and Emotional Well-being on ChatGPT]&lt;br /&gt;
* 2025-03: [https://dam-prod2.media.mit.edu/x/2025/03/21/Randomized_Control_Study_on_Chatbot_Psychosocial_Effect.pdf How AI and Human Behaviors Shape Psychosocial Effects of Chatbot Use: A Longitudinal Controlled Study]&lt;br /&gt;
&lt;br /&gt;
===Counter loneliness===&lt;br /&gt;
* 2023-11: [https://arxiv.org/abs/2311.10599 Chatbots as social companions: How people perceive consciousness, human likeness, and social health benefits in machines]&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.19096 AI Companions Reduce Loneliness]&lt;br /&gt;
* 2025-03: [https://dam-prod2.media.mit.edu/x/2025/03/21/Randomized_Control_Study_on_Chatbot_Psychosocial_Effect.pdf How AI and Human Behaviors Shape Psychosocial Effects of Chatbot Use: A Longitudinal Controlled Study]&lt;br /&gt;
* 2025-06: Anthropic: [https://www.anthropic.com/news/how-people-use-claude-for-support-advice-and-companionship How People Use Claude for Support, Advice, and Companionship]&lt;br /&gt;
&lt;br /&gt;
===Human mental abilities (creativity, learning)===&lt;br /&gt;
* 2025-03: [https://www.microsoft.com/en-us/research/publication/the-impact-of-generative-ai-on-critical-thinking-self-reported-reductions-in-cognitive-effort-and-confidence-effects-from-a-survey-of-knowledge-workers/ The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence Effects From a Survey of Knowledge Workers]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.08872 Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task]&lt;br /&gt;
&lt;br /&gt;
=Simulate Humans=&lt;br /&gt;
* See also: [[Human brain]]&lt;br /&gt;
&lt;br /&gt;
==Sociology==&lt;br /&gt;
* 2021-10: [https://www.doi.org/10.1007/s10588-021-09351-y Explaining and predicting human behavior and social dynamics in simulated virtual worlds: reproducibility, generalizability, and robustness of causal discovery methods]&lt;br /&gt;
* 2023-12: Google: [https://arxiv.org/abs/2312.03664 Generative agent-based modeling with actions grounded in physical, social, or digital space using Concordia]&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.12620 Are Large Language Models (LLMs) Good Social Predictors?]&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.11794 Automated Social Science: Language Models as Scientist and Subjects]&lt;br /&gt;
* 2024-07: [https://academic.oup.com/pnasnexus/article/3/7/pgae245/7712371 Perils and opportunities in using large language models in psychological research]&lt;br /&gt;
* 2024-08: [https://samim.io/dl/Predicting%20results%20of%20social%20science%20experiments%20using%20large%20language%20models.pdf Predicting Results of Social Science Experiments Using Large Language Models]&lt;br /&gt;
* 2024-10: [https://www.pnas.org/doi/10.1073/pnas.2407639121 Large Language Models based on historical text could offer informative tools for behavioral science]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.02234 LLM Social Simulations Are a Promising Research Method]&lt;br /&gt;
* 2025-04: [https://www.nber.org/papers/w33662 Measuring Human Leadership Skills with AI Agents]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.10157 SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users]&lt;br /&gt;
* 2025-07: [https://www.nature.com/articles/s41586-025-09215-4 A foundation model to predict and capture human cognition] ([https://marcelbinz.github.io/centaur code])&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.15815 LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra]&lt;br /&gt;
* 2025-09: [https://benjaminmanning.io/files/optimize.pdf General Social Agents]&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2506.06958 Simulating Society Requires Simulating Thought]&lt;br /&gt;
&lt;br /&gt;
==Theory of Mind==&lt;br /&gt;
* 2025-08: [https://www.nature.com/articles/s44387-025-00031-9 How large language models encode theory-of-mind: a study on sparse parameter patterns]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2509.22887 Infusing Theory of Mind into Socially Intelligent LLM Agents]&lt;br /&gt;
&lt;br /&gt;
==Humanlike Vibes==&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.20525 The Xeno Sutra: Can Meaning and Value be Ascribed to an AI-Generated &amp;quot;Sacred&amp;quot; Text?]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.08338 LLMs Reproduce Human Purchase Intent via Semantic Similarity Elicitation of Likert Ratings]&lt;br /&gt;
&lt;br /&gt;
==Skeptical==&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.06950 Large Language Models Do Not Simulate Human Psychology]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2601.16130 Replicating Human Motivated Reasoning Studies with LLMs]&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
* [https://www.google.com/books/edition/_/cKnYEAAAQBAJ?hl=en&amp;amp;gbpv=1&amp;amp;pg=PA2 UNESCO. Guidance for Generative AI in Education and Research]&lt;br /&gt;
* [[AI]]&lt;br /&gt;
** [[AI predictions]]&lt;/div&gt;</summary>
		<author><name>KevinYager</name></author>
		
	</entry>
	<entry>
		<id>http://gisaxs.com/index.php?title=Science_Agents&amp;diff=8737</id>
		<title>Science Agents</title>
		<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php?title=Science_Agents&amp;diff=8737"/>
		<updated>2026-03-18T14:10:44Z</updated>

		<summary type="html">&lt;p&gt;KevinYager: /* Science Agents */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
=AI Use-cases for Science=&lt;br /&gt;
&lt;br /&gt;
==Literature==&lt;br /&gt;
* [https://www.alphaxiv.org/explore alphaXiv | Explore]: Understand arXiv papers&lt;br /&gt;
&lt;br /&gt;
===LLM extract data from papers===&lt;br /&gt;
* 2024: [https://pubs.rsc.org/en/content/articlelanding/2025/cs/d4cs00913d From text to insight: large language models for chemical data extraction]&lt;br /&gt;
&lt;br /&gt;
===AI finding links in literature===&lt;br /&gt;
* 2019-07: [https://doi.org/10.1038/s41586-019-1335-8  Unsupervised word embeddings capture latent knowledge from materials science literature]&lt;br /&gt;
* 2024-11: [https://doi.org/10.1038/s41562-024-02046-9  Large language models surpass human experts in predicting neuroscience results]&lt;br /&gt;
&lt;br /&gt;
===(Pre) Generate Articles===&lt;br /&gt;
* 2022-12: [https://aclanthology.org/2022.emnlp-main.296/ Re3: Generating Longer Stories With Recursive Reprompting and Revision]&lt;br /&gt;
* 2023-01: Journalism: [https://journals.sagepub.com/doi/10.1177/10776958221149577 Collaborating With ChatGPT: Considering the Implications of Generative Artificial Intelligence for Journalism and Media Education]&lt;br /&gt;
* 2023-03: English essays: [https://journal.unnes.ac.id/sju/index.php/elt/article/view/64069 Artificial intelligence (AI) technology in OpenAI ChatGPT application: A review of ChatGPT in writing English essay]&lt;br /&gt;
* 2023-07: Science writing: [https://www.rbmojournal.com/article/S1472-6483(23)00219-5/fulltext Artificial intelligence in scientific writing: a friend or a foe?]&lt;br /&gt;
* 2024-02: Wikipedia style: [https://arxiv.org/abs/2402.14207 Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models]&lt;br /&gt;
* 2024-08: [https://arxiv.org/abs/2408.07055 LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs] ([https://github.com/THUDM/LongWriter code])&lt;br /&gt;
* 2024-08: Scientific papers: [https://arxiv.org/abs/2408.06292 The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery]&lt;br /&gt;
* 2024-09: PaperQA2: [https://paper.wikicrow.ai/ Language Models Achieve Superhuman Synthesis of Scientific Knowledge] ([https://x.com/SGRodriques/status/1833908643856818443 𝕏 post], [https://github.com/Future-House/paper-qa code])&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.18866 Reasoning to Learn from Latent Thoughts]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.19065 WikiAutoGen: Towards Multi-Modal Wikipedia-Style Article Generation]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.13171 Sleep-time Compute: Beyond Inference Scaling at Test-time]&lt;br /&gt;
&lt;br /&gt;
==Explanation==&lt;br /&gt;
* 2025-02: [https://tiger-ai-lab.github.io/TheoremExplainAgent/ TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding] ([https://arxiv.org/abs/2502.19400 preprint])&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.02822 Do Two AI Scientists Agree?]&lt;br /&gt;
&lt;br /&gt;
==Autonomous Ideation==&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.07738 ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models]&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.14202 Mining Causality: AI-Assisted Search for Instrumental Variables]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.07977 Thinking Fast and Laterally: Multi-Agentic Approach for Reasoning about Uncertain Emerging Events]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.14141 LLMs can realize combinatorial creativity: generating creative ideas via LLMs for scientific research]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.17596 LiveIdeaBench: Evaluating LLMs&amp;#039; Scientific Creativity and Idea Generation with Minimal Context]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.13299 Hypothesis Generation for Materials Discovery and Design Using Goal-Driven and Constraint-Guided LLM Agents]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.13025 Agentic Deep Graph Reasoning Yields Self-Organizing Knowledge Networks]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.00794 Predicting Empirical AI Research Outcomes with Language Models]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.20803 The Ideation-Execution Gap: Execution Outcomes of LLM-Generated versus Human Research Ideas]&lt;br /&gt;
&lt;br /&gt;
==Adapting LLMs to Science==&lt;br /&gt;
* 2023-06: [https://doi.org/10.1039/D3DD00112A Domain-specific chatbots for science using embeddings]&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.00027 Personalization of Large Language Models: A Survey]&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.00412 Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation]&lt;br /&gt;
&lt;br /&gt;
==AI/LLM Control of Scientific Instruments/Facilities==&lt;br /&gt;
* 2023-12: [https://www.nature.com/articles/s41524-024-01423-2 Opportunities for retrieval and tool augmented large language models in scientific facilities]&lt;br /&gt;
* 2023-12: [https://arxiv.org/abs/2312.17180 Virtual Scientific Companion for Synchrotron Beamlines: A Prototype]&lt;br /&gt;
* 2023-12: [https://www.nature.com/articles/s41586-023-06792-0 Autonomous chemical research with large language models]&lt;br /&gt;
* 2024-01: [https://iopscience.iop.org/article/10.1088/2632-2153/ad52e9 Synergizing Human Expertise and AI Efficiency with Language Model for Microscopy Operation and Automated Experiment Design]&lt;br /&gt;
* 2024-06: [https://pubs.rsc.org/en/content/articlelanding/2025/dd/d4dd00143e From Text to Test: AI-Generated Control Software for Materials Science Instruments]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.18161 VISION: A Modular AI Assistant for Natural Human-Instrument Interaction at Scientific User Facilities]&lt;br /&gt;
* 2025-01: [https://www.science.org/doi/10.1126/sciadv.adr4173 Large language models for human-machine collaborative particle accelerator tuning through natural language]&lt;br /&gt;
* 2025-04: [https://openreview.net/forum?id=iA9UN1dEgJ Operating Robotic Laboratories with Large Language Models and Teachable Agents]&lt;br /&gt;
&lt;br /&gt;
==AI/ML Methods tailored to Science==&lt;br /&gt;
===Science Foundation Models===&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.15763 Intern-S1: A Scientific Multimodal Foundation Model]&lt;br /&gt;
* 2025-11: [https://pubs.aip.org/aip/jcp/article/163/18/184110/3372267/A-foundation-model-for-atomistic-materials A foundation model for atomistic materials chemistry]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.15684 Walrus: A Cross-Domain Foundation Model for Continuum Dynamics]&lt;br /&gt;
* 2026-01: [https://www.science.org/doi/10.1126/science.ads9530 Deep contrastive learning enables genome-wide virtual screening]&lt;br /&gt;
&lt;br /&gt;
===Regression (Data Fitting)===&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.14546 Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data]: training on (x,y) pairs enables inferring underlying function (define it in code, invert it, compose it)&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2402.14547 OmniPred: Language Models as Universal Regressors]&lt;br /&gt;
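The entries above treat regression as sequence modeling: numeric (x, y) pairs are serialized as text, and the model infers the latent function by completing the next value. A minimal sketch of that framing (the prompt format here is hypothetical, not the exact encoding used in the papers):&lt;br /&gt;

```python
# Serialize (x, y) observations into a text prompt so an LLM can act as a
# regressor by predicting the next number. Purely illustrative format.
def make_prompt(pairs, query_x):
    lines = [f"x={x:.3f}, y={y:.3f}" for x, y in pairs]
    lines.append(f"x={query_x:.3f}, y=")   # leave y blank for the model
    return "\n".join(lines)

pairs = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]   # hidden rule: y = 2x + 1
prompt = make_prompt(pairs, 3.0)
```

A capable model completing this prompt can recover the latent linear rule, which is the phenomenon the "Connecting the Dots" paper probes in more depth (verbalizing and composing the inferred function).&lt;br /&gt;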
&lt;br /&gt;
===Tabular Classification/Regression===&lt;br /&gt;
* 2025-01: [https://www.nature.com/articles/s41586-024-08328-6 Accurate predictions on small data with a tabular foundation model] ([https://github.com/PriorLabs/TabPFN code])&lt;br /&gt;
&lt;br /&gt;
===Symbolic Regression===&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.09359 Symbolic Regression with a Learned Concept Library]&lt;br /&gt;
&lt;br /&gt;
===Literature Discovery===&lt;br /&gt;
* [https://www.futurehouse.org/ FutureHouse]&lt;br /&gt;
** [https://hasanyone.com/ hasanyone]&lt;br /&gt;
** [https://github.com/Future-House/paper-qa PaperQA2]&lt;br /&gt;
* [https://lumina.sh/ Lumina]&lt;br /&gt;
* [https://github.com/TheBlewish/Automated-AI-Web-Researcher-Ollama Automated-AI-Web-Researcher-Ollama]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.05366 Search-o1: Agentic Search-Enhanced Large Reasoning Models] ([https://search-o1.github.io/ project], [https://github.com/sunnynexus/Search-o1 code])&lt;br /&gt;
* 2026-02: [https://www.nature.com/articles/s41586-025-10072-4 Synthesizing scientific literature with retrieval-augmented language models] ([https://allenai.org/blog/openscholar-nature blog])&lt;br /&gt;
&lt;br /&gt;
===Commercial===&lt;br /&gt;
* [https://sakana.ai/ai-scientist/ Sakana AI] &lt;br /&gt;
* [https://www.cusp.ai/ Cusp AI]: Materials/AI&lt;br /&gt;
* [https://www.lila.ai/ Lila AI]: Life sciences&lt;br /&gt;
* [https://www.radical-ai.com/ Radical AI]: Material simulation/design&lt;br /&gt;
* [https://www.autoscience.ai/ Autoscience] ([https://www.autoscience.ai/blog/meet-carl-the-first-ai-system-to-produce-academically-peer-reviewed-research Carl])&lt;br /&gt;
* [https://periodic.com/ Periodic Labs]&lt;br /&gt;
* [https://edisonscientific.com/articles/announcing-edison-scientific Edison Scientific] (drug discovery, spinoff from [https://www.futurehouse.org/ FutureHouse])&lt;br /&gt;
* 2026-03: Mirendil Inc.: advanced models to speed up R&amp;amp;D in scientific domains, especially biology and materials science&lt;br /&gt;
&lt;br /&gt;
====Bio====&lt;br /&gt;
* [https://www.bioptimus.com/ Bioptimus]&lt;br /&gt;
* [https://www.evolutionaryscale.ai/ EvolutionaryScale]&lt;br /&gt;
&lt;br /&gt;
==AI/ML Methods in Science==&lt;br /&gt;
* 2025-07: [https://www.mdpi.com/2313-433X/11/8/252 Synthetic Scientific Image Generation with VAE, GAN, and Diffusion Model Architectures]&lt;br /&gt;
&lt;br /&gt;
===Imaging===&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.08176 Behind the Noise: Conformal Quantile Regression Reveals Emergent Representations] (blog: [https://phzwart.github.io/behindthenoise/ Behind the Noise])&lt;br /&gt;
&lt;br /&gt;
===Materials===&lt;br /&gt;
* 2024-12: [https://www.nature.com/articles/s41467-024-54639-7 Crystal structure generation with autoregressive large language modeling]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.03965 All-atom Diffusion Transformers: Unified generative modelling of molecules and materials]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.19730 Training-Free Active Learning Framework in Materials Science with Large Language Models]&lt;br /&gt;
&lt;br /&gt;
===Chemistry===&lt;br /&gt;
* 2025-01: [https://www.nature.com/articles/s41578-025-00772-8 Large language models for reticular chemistry]&lt;br /&gt;
* 2025-02: [https://www.nature.com/articles/s42256-025-00982-3 Image-based generation for molecule design with SketchMol]&lt;br /&gt;
* 2025-02: [https://www.nature.com/articles/s42256-025-00994-z Large language models for scientific discovery in molecular property prediction]&lt;br /&gt;
* [https://x.com/vant_ai/status/1903070297991110657 2025-03]: [https://www.vant.ai/ Vant AI] [https://www.vant.ai/neo-1 Neo-1]: atomistic foundation model (small molecules, proteins, etc.)&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.08051 Compositional Flows for 3D Molecule and Synthesis Pathway Co-design]&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.07456 General purpose models for the chemical sciences]&lt;br /&gt;
* 2025-11: [https://chemrxiv.org/engage/chemrxiv/article-details/690357d9a482cba122e366b6 ChemTorch: A Deep Learning Framework for Benchmarking and Developing Chemical Reaction Property Prediction Models]&lt;br /&gt;
&lt;br /&gt;
===Biology===&lt;br /&gt;
* 2018: [https://alphafold.ebi.ac.uk/ AlphaFold]&lt;br /&gt;
* 2021-07: [https://www.nature.com/articles/s41586-021-03819-2 AlphaFold 2]&lt;br /&gt;
* 2024-05: [https://www.nature.com/articles/s41586-024-07487-w AlphaFold 3]&lt;br /&gt;
* 2023-03: [https://www.science.org/doi/10.1126/science.ade2574 Evolutionary-scale prediction of atomic-level protein structure with a language model] ([https://esmatlas.com/resources?action=fold ESMFold])&lt;br /&gt;
* 2023-11: [https://www.nature.com/articles/s41586-023-06728-8 Illuminating protein space with a programmable generative model] (Chroma)&lt;br /&gt;
* 2024-11: [https://www.science.org/doi/10.1126/science.ado9336 Sequence modeling and design from molecular to genome scale with Evo] (Evo)&lt;br /&gt;
* 2025-01: [https://www.nature.com/articles/s41586-024-08435-4 Targeting protein–ligand neosurfaces with a generalizable deep learning tool]&lt;br /&gt;
* 2025-01: [https://www.science.org/doi/10.1126/science.ads0018 Simulating 500 million years of evolution with a language model] ([https://github.com/evolutionaryscale/esm ESM] 3 model)&lt;br /&gt;
* 2025-02: [https://arcinstitute.org/manuscripts/Evo2 Genome modeling and design across all domains of life with Evo 2]&lt;br /&gt;
* 2025-02: [https://www.microsoft.com/en-us/research/blog/exploring-the-structural-changes-driving-protein-function-with-bioemu-1/ Exploring the structural changes driving protein function with BioEmu-1]&lt;br /&gt;
* 2025-02: [https://arxiv.org/pdf/2502.18449 Protein Large Language Models: A Comprehensive Survey]&lt;br /&gt;
* [https://x.com/vant_ai/status/1903070297991110657 2025-03]: [https://www.vant.ai/ Vant AI] [https://www.vant.ai/neo-1 Neo-1]: atomistic foundation model (small molecules, proteins, etc.)&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.16351 Lyra: An Efficient and Expressive Subquadratic Architecture for Modeling Biological Sequences]&lt;br /&gt;
* 2025-08: RosettaFold 3: [https://www.biorxiv.org/content/10.1101/2025.08.14.670328v2 Accelerating Biomolecular Modeling with AtomWorks and RF3]&lt;br /&gt;
* 2025-09: [https://www.biorxiv.org/content/10.1101/2025.09.12.675911v1 Generative design of novel bacteriophages with genome language models]&lt;br /&gt;
* 2025-10: [https://www.science.org/doi/10.1126/science.adu8578 Strengthening nucleic acid biosecurity screening against generative protein design tools]&lt;br /&gt;
* 2026-01: [https://www.nature.com/articles/s41586-025-10014-0 Advancing regulatory variant effect prediction with AlphaGenome]&lt;br /&gt;
&lt;br /&gt;
===Medicine===&lt;br /&gt;
See: [[AI_Agents#Medicine]]&lt;br /&gt;
&lt;br /&gt;
===Successes===&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.11270 Site-Decorated Model for Unconventional Frustrated Magnets: Ultranarrow Phase Crossover and Spin Reversal Transition]&lt;br /&gt;
&lt;br /&gt;
==AI/ML Methods co-opted for Science==&lt;br /&gt;
===Mechanistic Interpretability===&lt;br /&gt;
Train large model on science data. Then apply [[AI_understanding#Mechanistic_Interpretability|mechanistic interpretability]] (e.g. sparse autoencoders, SAE) to the feature/activation space.&lt;br /&gt;
* Mechanistic interpretability for protein language models ([https://interprot.com/ visualizer], [https://github.com/etowahadams/interprot/tree/main code], [https://huggingface.co/liambai/InterProt-ESM2-SAEs SAE])&lt;br /&gt;
* [https://www.markov.bio/ Markov Bio]: [https://www.markov.bio/research/mech-interp-path-to-e2e-biology Through a Glass Darkly: Mechanistic Interpretability as the Bridge to End-to-End Biology] ([https://x.com/adamlewisgreen/status/1853206279499751531 quick description], [https://markovbio.github.io/biomedical-progress/ background info on recent bio progress])&lt;br /&gt;
* 2023-01: [https://arxiv.org/abs/2301.05062 Tracr: Compiled Transformers as a Laboratory for Interpretability] ([https://github.com/google-deepmind/tracr code])&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.03334 An X-Ray Is Worth 15 Features: Sparse Autoencoders for Interpretable Radiology Report Generation]&lt;br /&gt;
* 2024-12: [https://www.arxiv.org/abs/2412.16247 Towards scientific discovery with dictionary learning: Extracting biological concepts from microscopy foundation models]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.12101 InterPLM: Discovering Interpretable Features in Protein Language Models via Sparse Autoencoders]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.00089 Insights on Galaxy Evolution from Interpretable Sparse Feature Networks]&lt;br /&gt;
* 2025-02: [https://www.biorxiv.org/content/10.1101/2025.02.06.636901v1 From Mechanistic Interpretability to Mechanistic Biology: Training, Evaluating, and Interpreting Sparse Autoencoders on Protein Language Models]&lt;br /&gt;
* 2025-02: [https://www.goodfire.ai/blog/interpreting-evo-2 Interpreting Evo 2: Arc Institute&amp;#039;s Next-Generation Genomic Foundation Model]&lt;br /&gt;
* 2026-01: [https://www.goodfire.ai/research/interpretability-for-alzheimers-detection# Using Interpretability to Identify a Novel Class of Alzheimer&amp;#039;s Biomarkers]&lt;br /&gt;
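The recipe described above (take activations from a science model, then decompose them with a sparse autoencoder) can be sketched in a few lines. All sizes and data below are placeholders, and a real SAE would be trained with a reconstruction-plus-sparsity loss rather than using random weights:&lt;br /&gt;

```python
# Minimal sparse-autoencoder (SAE) forward pass over stand-in activations.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_dict = 16, 64                  # hypothetical model / dictionary sizes
acts = rng.normal(size=(256, d_model))    # stand-in for recorded model activations

W_enc = rng.normal(scale=0.1, size=(d_model, d_dict))
W_dec = W_enc.T.copy()                    # tied decoder, a common initialization
b = np.zeros(d_dict)

def sae_forward(x):
    # ReLU encoder yields sparse, nonnegative feature codes;
    # the decoder reconstructs the original activations from them.
    f = np.maximum(0.0, x @ W_enc + b)
    return f, f @ W_dec

codes, recon = sae_forward(acts)          # codes: (256, 64), recon: (256, 16)
```

In the works listed above, the learned dictionary directions (rows of the decoder) are then inspected for human-interpretable features, e.g. protein structural motifs or biological concepts.&lt;br /&gt;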
&lt;br /&gt;
===Uncertainty===&lt;br /&gt;
* 2024-10: [https://github.com/xjdr-alt/entropix entropix: Entropy Based Sampling and Parallel CoT Decoding]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.09724 Taming Overconfidence in LLMs: Reward Calibration in RLHF]&lt;br /&gt;
&lt;br /&gt;
=Science Benchmarks=&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.13168 SciCode: A Research Coding Benchmark Curated by Scientists] ([http://scicode-bench.github.io/ project])&lt;br /&gt;
* 2024-11: [https://openreview.net/pdf?id=fz969ahcvJ AidanBench: Evaluating Novel Idea Generation on Open-Ended Questions] ([https://github.com/aidanmclaughlin/AidanBench code])&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.17596 LiveIdeaBench: Evaluating LLMs&amp;#039; Scientific Creativity and Idea Generation with Minimal Context]&lt;br /&gt;
* 2025-01: [https://agi.safe.ai/ Humanity&amp;#039;s Last Exam]&lt;br /&gt;
* [https://github.com/OSU-NLP-Group/ScienceAgentBench ScienceAgentBench]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.20309 EAIRA: Establishing a Methodology for Evaluating AI Models as Scientific Research Assistants]&lt;br /&gt;
* 2025-03: [https://huggingface.co/datasets/futurehouse/BixBench BixBench]: Novel hypotheses (accept/reject)&lt;br /&gt;
* 2025-04: [https://research.google/blog/evaluating-progress-of-llms-on-scientific-problem-solving/ Google: Evaluating progress of LLMs on scientific problem-solving]&lt;br /&gt;
** 2025-03: [https://arxiv.org/abs/2503.13517 CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning]&lt;br /&gt;
** 2024-07: [https://arxiv.org/abs/2407.09413 SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers]&lt;br /&gt;
** 2024-10: [https://neurips.cc/virtual/2024/98540 FEABench: Evaluating Language Models on Real World Physics Reasoning Ability]&lt;br /&gt;
* 2026-02: [https://edisonscientific.com/ Edison]: [https://lab-bench.ai/ LABBench 2]&lt;br /&gt;
&lt;br /&gt;
=Science Agents=&lt;br /&gt;
==Reviews==&lt;br /&gt;
* 2024-10: [https://www.cell.com/cell/fulltext/S0092-8674(24)01070-5?target=_blank Empowering biomedical discovery with AI agents]&lt;br /&gt;
* 2025-01: [https://pubs.rsc.org/en/content/articlehtml/2024/sc/d4sc03921a A review of large language models and autonomous agents in chemistry] ([https://github.com/ur-whitelab/LLMs-in-science github])&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.01903 AI4Research: A Survey of Artificial Intelligence for Scientific Research]&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.14111 From AI for Science to Agentic Science: A Survey on Autonomous Scientific Discovery]&lt;br /&gt;
&lt;br /&gt;
==Challenges==&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.03315 Why LLMs Aren&amp;#039;t Scientists Yet: Lessons from Four Autonomous Research Attempts]&lt;br /&gt;
&lt;br /&gt;
==Specific==&lt;br /&gt;
* 2024-01-13: [https://arxiv.org/abs/2401.06949 ORGANA: A Robotic Assistant for Automated Chemistry Experimentation and Characterization] ([https://www.youtube.com/watch?v=N6qMMwJ8hKQ video])&lt;br /&gt;
* 2024-06-19: [https://arxiv.org/abs/2406.13163 LLMatDesign: Autonomous Materials Discovery with Large Language Models]&lt;br /&gt;
* 2024-08-12: [https://sakana.ai/ Sakana AI]: [https://sakana.ai/ai-scientist/ AI Scientist]; [https://arxiv.org/abs/2408.06292 The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery] ([https://github.com/SakanaAI/AI-Scientist code])&lt;br /&gt;
* 2024-09-09: [https://arxiv.org/abs/2409.05556 SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning] ([https://github.com/lamm-mit/SciAgentsDiscovery code])&lt;br /&gt;
* 2024-09-11: PaperQA2: [https://paper.wikicrow.ai/ Language Models Achieve Superhuman Synthesis of Scientific Knowledge] ([https://x.com/SGRodriques/status/1833908643856818443 𝕏 post], [https://github.com/Future-House/paper-qa code])&lt;br /&gt;
* 2024-10-17: [https://arxiv.org/abs/2410.13768 Rapid and Automated Alloy Design with Graph Neural Network-Powered LLM-Driven Multi-Agent Systems]&lt;br /&gt;
* 2024-10-28: [https://arxiv.org/abs/2410.20976 Large Language Model-Guided Prediction Toward Quantum Materials Synthesis]&lt;br /&gt;
* 2024-12-06: [https://www.biorxiv.org/content/10.1101/2024.11.11.623004v1 The Virtual Lab: AI Agents Design New SARS-CoV-2 Nanobodies with Experimental Validation] (writeup: [https://www.nature.com/articles/d41586-024-01684-3 Virtual lab powered by ‘AI scientists’ super-charges biomedical research: Could human–AI collaborations be the future of interdisciplinary studies?])&lt;br /&gt;
* 2024-12-30: [https://arxiv.org/abs/2412.21154 Aviary: training language agents on challenging scientific tasks]&lt;br /&gt;
* See also: [[AI_Agents#Deep_Research|AI Agents &amp;gt; Deep Research]]&lt;br /&gt;
* 2025-04-08: Sakana: [https://pub.sakana.ai/ai-scientist-v2/paper/paper.pdf The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search] ([https://github.com/SakanaAI/AI-Scientist-v2 code])&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.14267 DREAMS: Density Functional Theory Based Research Engine for Agentic Materials Simulation]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.02824 Kosmos: An AI Scientist for Autonomous Discovery]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.08151 SciAgent: A Unified Multi-Agent System for Generalistic Scientific Reasoning]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2601.23265 PaperBanana: Automating Academic Illustration for AI Scientists]&lt;br /&gt;
&lt;br /&gt;
==Science Multi-Agent Setups==&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.04227 Agent Laboratory: Using LLM Agents as Research Assistants]&lt;br /&gt;
* 2025-04: [https://www.nature.com/articles/s41551-025-01363-2 Coordinated AI agents for advancing healthcare] ([https://www.nature.com/articles/s41551-025-01363-2.epdf?sharing_token=CIYP3J8LZE4BX31fV3WxUdRgN0jAjWel9jnR3ZoTv0O9iD-yhgqzRaz_7VASayWRePPhWDD2xFyfuOpSXbdPaOtt7oH4nfXo7telALzNwY3V1p9SxoqBEJy2OuaJ_cA35-CYQC1XgjCNTZUw46dh1KX-Dj8e7-1Vk_RlZKFLrc8%3D pdf])&lt;br /&gt;
&lt;br /&gt;
=Science Agentic Components=&lt;br /&gt;
==Frameworks==&lt;br /&gt;
* [https://platform.claude.com/docs/en/agent-sdk/overview Anthropic Claude Agent SDK overview]&lt;br /&gt;
* [https://openclaw.ai/ OpenClaw]&lt;br /&gt;
* [https://opencode.ai/ OpenCode]&lt;br /&gt;
* [https://github.com/OpenHands/software-agent-sdk OpenHands]&lt;br /&gt;
* [https://github.com/lamm-mit?tab=repositories LAMM: MIT Laboratory for Atomistic and Molecular Mechanics]&lt;br /&gt;
** [https://github.com/lamm-mit/scienceclaw ScienceClaw]: Framework for autonomous scientific investigation without central coordination.&lt;br /&gt;
** [https://infinite-lamm.vercel.app/ Infinite]: The Infinite Corridor of Scientific Discovery. Open science, powered by many — agents and humans discovering together.&lt;br /&gt;
&lt;br /&gt;
==Skills==&lt;br /&gt;
* 2026-03: [https://github.com/K-Dense-AI/claude-scientific-skills/tree/main?tab=readme-ov-file#use-cases Claude Scientific Skills] (list)&lt;br /&gt;
&lt;br /&gt;
=AI Science Systems=&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.03916 Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.13299 Hypothesis Generation for Materials Discovery and Design Using Goal-Driven and Constraint-Guided LLM Agents]&lt;br /&gt;
* 2025-02: [https://storage.googleapis.com/coscientist_paper/ai_coscientist.pdf Towards an AI co-scientist] (Google blog post: [https://research.google/blog/accelerating-scientific-breakthroughs-with-an-ai-co-scientist/ Accelerating scientific breakthroughs with an AI co-scientist])&lt;br /&gt;
* 2025-06: [https://zenodo.org/records/15693353 The Discovery Engine]&lt;br /&gt;
** 2025-07: [https://arxiv.org/abs/2507.00964 Benchmarking the Discovery Engine] ([https://www.leap-labs.com/blog/how-we-replicated-five-peer-reviewed-papers-in-five-hours blog])&lt;br /&gt;
* 2025-07: [https://www.preprints.org/manuscript/202507.1951/v1 Autonomous Scientific Discovery Through Hierarchical AI Scientist Systems]&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.16969 Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows]&lt;br /&gt;
* 2026-01: [https://www.nature.com/articles/s43588-025-00906-6 SciSciGPT: advancing human–AI collaboration in the science of science]&lt;br /&gt;
* 2026-02: [https://allenai.org/papers/autodiscovery AUTODISCOVERY: Open-ended Scientific Discovery via Bayesian Surprise] (Allen AI (Ai2) AstraLabs, [https://allenai.org/blog/autodiscovery blog], [https://autodiscovery.allen.ai/runs tools])&lt;br /&gt;
&lt;br /&gt;
===Inorganic Materials Discovery===&lt;br /&gt;
* 2023-11: [https://doi.org/10.1038/s41586-023-06735-9 Scaling deep learning for materials discovery]&lt;br /&gt;
* 2023-11: [https://doi.org/10.1038/s41586-023-06734-w An autonomous laboratory for the accelerated synthesis of novel materials]&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.00135 HoneyComb: A Flexible LLM-Based Agent System for Materials Science]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.12771 Open Materials 2024 (OMat24) Inorganic Materials Dataset and Models] ([https://github.com/FAIR-Chem/fairchem code], [https://huggingface.co/datasets/fairchem/OMAT24 datasets], [https://huggingface.co/fairchem/OMAT24 checkpoints], [https://ai.meta.com/blog/fair-news-segment-anything-2-1-meta-spirit-lm-layer-skip-salsa-sona/ blogpost])&lt;br /&gt;
* 2025-01: [https://www.nature.com/articles/s41586-025-08628-5 A generative model for inorganic materials design]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.14110 System of Agentic AI for the Discovery of Metal-Organic Frameworks]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.08762 The Open Molecules 2025 (OMol25) Dataset, Evaluations, and Models]&lt;br /&gt;
&lt;br /&gt;
===Materials Characterization===&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.06569 Operationalizing Serendipity: Multi-Agent AI Workflows for Enhanced Materials Characterization with Theory-in-the-Loop]&lt;br /&gt;
&lt;br /&gt;
===Chemistry===&lt;br /&gt;
* 2023-12: [https://doi.org/10.1038/s41586-023-06792-0 Autonomous chemical research with large language models] (Coscientist)&lt;br /&gt;
* 2024-09: [https://www.pnnl.gov/main/publications/external/technical_reports/PNNL-36692.pdf PNNL ChemAIst V0.2]&lt;br /&gt;
* 2024-11: [https://www.nature.com/articles/s41467-024-54457-x An automatic end-to-end chemical synthesis development platform powered by large language models]&lt;br /&gt;
* 2025-06: [https://paper.ether0.ai/ Training a Scientific Reasoning Model for Chemistry]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.06363 ChemGraph: An Agentic Framework for Computational Chemistry Workflows] ([https://github.com/argonne-lcf/ChemGraph code])&lt;br /&gt;
&lt;br /&gt;
===Bio===&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.01485 BioMARS: A Multi-Agent Robotic System for Autonomous Biological Experiments]&lt;br /&gt;
&lt;br /&gt;
===Physics===&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.19799 PhysMaster: Building an Autonomous AI Physicist for Theoretical and Computational Physics Research]&lt;br /&gt;
&lt;br /&gt;
==LLMs Optimized for Science==&lt;br /&gt;
* 2022-11: [https://arxiv.org/abs/2211.09085 Galactica: A Large Language Model for Science]&lt;br /&gt;
* 2024-12: [https://www.nature.com/articles/s41467-024-54639-7 Crystal structure generation with autoregressive large language modeling]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.13107 MatterChat: A Multi-Modal LLM for Material Science]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.17604 OmniScience: A Domain-Specialized LLM for Scientific Reasoning and Discovery]&lt;br /&gt;
* 2025-03: Google [https://huggingface.co/collections/google/txgemma-release-67dd92e931c857d15e4d1e87 TxGemma] (2B, 9B, 27B): [https://developers.googleblog.com/en/introducing-txgemma-open-models-improving-therapeutics-development/ drug development]&lt;br /&gt;
&lt;br /&gt;
=Impact of AI in Science=&lt;br /&gt;
* 2024-11: &amp;lt;strike&amp;gt;[https://aidantr.github.io/files/AI_innovation.pdf Artificial Intelligence, Scientific Discovery, and Product Innovation]&amp;lt;/strike&amp;gt;&lt;br /&gt;
** 2025-05: Retraction: [https://economics.mit.edu/news/assuring-accurate-research-record Assuring an accurate research record]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.05151 Transforming Science with Large Language Models: A Survey on AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.03837 Accelerating Scientific Research with Gemini: Case Studies and Common Techniques]&lt;br /&gt;
&lt;br /&gt;
=Related Tools=&lt;br /&gt;
==Literature Search==&lt;br /&gt;
* [https://www.perplexity.ai/ Perplexity]&lt;br /&gt;
* [https://www.arxival.xyz/ ArXival]&lt;br /&gt;
&lt;br /&gt;
==Data Visualization==&lt;br /&gt;
* 2024-10: Microsoft [https://www.microsoft.com/en-us/research/blog/data-formulator-exploring-how-ai-can-help-analysts-create-rich-data-visualizations/ Data Formulator: Create Rich Visualization with AI iteratively] ([https://www.microsoft.com/en-us/research/video/data-formulator-create-rich-visualization-with-ai-iteratively/ video], [https://github.com/microsoft/data-formulator code])&lt;br /&gt;
* [https://julius.ai/ Julius AI]: Analyze your data with computational AI&lt;br /&gt;
&lt;br /&gt;
==Generative==&lt;br /&gt;
* 2025-03: [https://huggingface.co/collections/starvector/starvector-models-6783b22c7bd4b43d13cb5289 StarVector] 1B, 8B: text or image to SVG&lt;br /&gt;
&lt;br /&gt;
==Chemistry==&lt;br /&gt;
* 2025-03: [https://jcheminf.biomedcentral.com/articles/10.1186/s13321-024-00834-z Rxn-INSIGHT: fast chemical reaction analysis using bond-electron matrices] ([https://rxn-insight.readthedocs.io/en/latest/ docs])&lt;br /&gt;
&lt;br /&gt;
=Science Datasets=&lt;br /&gt;
* [https://datasetsearch.research.google.com/ Google Dataset Search]&lt;br /&gt;
* [https://github.com/blaiszik/awesome-matchem-datasets/ Awesome Materials &amp;amp; Chemistry Datasets]&lt;br /&gt;
* NIST [https://jarvis.nist.gov/ Jarvis] (simulations)&lt;br /&gt;
&lt;br /&gt;
=Genuine Discoveries=&lt;br /&gt;
* 2025-11: [https://cdn.openai.com/pdf/4a25f921-e4e0-479a-9b38-5367b47e8fd0/early-science-acceleration-experiments-with-gpt-5.pdf Early science acceleration experiments with GPT-5]&lt;br /&gt;
* 2025-12: [https://andymasley.substack.com/p/ai-can-obviously-create-new-knowledge AI can obviously create new knowledge - But maybe not new concepts]&lt;br /&gt;
&lt;br /&gt;
==Math==&lt;br /&gt;
* 2023-07: [https://www.nature.com/articles/s41586-023-06004-9 Faster sorting algorithms discovered using deep reinforcement learning]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.13131 AlphaEvolve: A coding agent for scientific and algorithmic discovery]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.02864 Mathematical exploration and discovery at scale]&lt;br /&gt;
* 2025-11: [https://www.nature.com/articles/s41586-025-09833-y Olympiad-level formal mathematical reasoning with reinforcement learning]&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.14575 Extremal descendant integrals on moduli spaces of curves: An inequality discovered and proved in collaboration with AI]&lt;br /&gt;
* [https://github.com/teorth/erdosproblems/wiki/AI-contributions-to-Erd%C5%91s-problems AI Solving Erdős Problems]:&lt;br /&gt;
** 2026-01: [https://www.erdosproblems.com/728 Erdős Problem #728] and [https://www.erdosproblems.com/729 #729] solved by Aristotle using ChatGPT 5.2 Pro&lt;br /&gt;
** 2026-01: [https://www.erdosproblems.com/forum/thread/397 Erdős Problem #397] [https://x.com/neelsomani/status/2010215162146607128?s=20 solved] by [https://neelsomani.com/ Neel Somani] using ChatGPT 5.2 Pro&lt;br /&gt;
** 2026-01: [https://www.erdosproblems.com/205 Erdős Problem #205] solved by Aristotle using ChatGPT 5.2 Pro&lt;br /&gt;
** 2026-01: [https://www.erdosproblems.com/forum/thread/281 Erdős Problem #281] [https://x.com/neelsomani/status/2012695714187325745?s=20 solved] by [https://neelsomani.com/ Neel Somani] using ChatGPT 5.2 Pro&lt;br /&gt;
** 2026-01: Google DeepMind: [https://arxiv.org/abs/2601.21442 Irrationality of rapidly converging series: a problem of Erdős and Graham]&lt;br /&gt;
*** [https://www.erdosproblems.com/1051 Erdős Problem #1051] [https://x.com/slow_developer/status/2018321002623901885?s=20 solved] by Google DeepMind Aletheia agent&lt;br /&gt;
** 2026-01: Google DeepMind: [https://arxiv.org/abs/2601.22401 Semi-Autonomous Mathematics Discovery with Gemini: A Case Study on the Erdős Problems]&lt;br /&gt;
*** Attempted 700 problems, solved 13 open Erdős problems: 5 novel autonomous solutions, 8 through existing literature.&lt;br /&gt;
** 2026-02: [https://www.erdosproblems.com/846 Erdős Problem #846]&lt;br /&gt;
*** [https://x.com/roydanroy/status/2026804567178953048?s=20 Google DeepMind]&lt;br /&gt;
*** [https://x.com/mehtaab_sawhney/status/2026716221933343147?s=20 Using OpenAI internal model] (paper: [https://cdn.openai.com/infinite-sets/main_single_clean3.pdf On infinite sets with no 3 on a line])&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.07222 The motivic class of the space of genus 0 maps to the flag variety]&lt;br /&gt;
* 2026-02: Google DeepMind: [https://arxiv.org/abs/2602.10177 Towards Autonomous Mathematics Research]&lt;br /&gt;
* 2026-03: Donald Knuth: [https://www-cs-faculty.stanford.edu/~knuth/papers/claude-cycles.pdf A problem in Directed Hamiltonian Cycles] solved by Filip Stappers using Claude Opus 4.6&lt;br /&gt;
* 2026-03: Google DeepMind: [https://arxiv.org/abs/2603.09172 Reinforced Generation of Combinatorial Structures: Ramsey Numbers]&lt;br /&gt;
&lt;br /&gt;
==Physics assistance==&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.23758 Exact solution of the frustrated Potts model with next-nearest-neighbor interactions in one dimension via AI bootstrapping] ([https://www.bnl.gov/staff/wyin Weiguo Yin])&lt;br /&gt;
* 2025-12: [https://www.sciencedirect.com/science/article/pii/S0370269325008111 Relativistic covariance and nonlinear quantum mechanics: Tomonaga-Schwinger analysis]&lt;br /&gt;
** [https://x.com/hsu_steve/status/1996034522308026435?s=20 Steve Hsu], [https://drive.google.com/file/d/16sxJuwsHoi-fvTFbri9Bu8B9bqA6lr1H/view Theoretical Physics with Generative AI]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.12176 Single-minus gluon tree amplitudes are nonzero] (GPT-5.2, [https://openai.com/index/new-result-theoretical-physics/ blog])&lt;br /&gt;
&lt;br /&gt;
==Literature exploration==&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.02824 Kosmos: An AI Scientist for Autonomous Discovery] ([https://edisonscientific.com/ Edison])&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/c4bdef64-5e9b-43b9-a365-592dd1ed7587 Nucleotide metabolism in hypothermia]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/1fdbf827-be65-4d97-9b66-bf0da600091a Determinant of perovskite solar-cell failure]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/4fb3fbdb-c449-4064-9aa6-ff4ec53131d8 Log-normal connectivity in neural networks]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/c6849232-5858-4634-adf5-83780afbe3db SOD2 as driver of myocardial fibrosis]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/abac07da-a6bb-458f-b0ba-ef08f1be617e Protective variant of SSR1 in type 2 diabetes]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/a770052b-2334-4bbe-b086-5149e0f03d99 Temporal ordering in Alzheimer’s disease]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/28c427d2-be31-48b5-b272-28d5a1e3ea5c Mechanism of neuron vulnerability in aging]&lt;br /&gt;
&lt;br /&gt;
==Bio design==&lt;br /&gt;
* 2023-07: [https://www.nature.com/articles/s41586-023-06415-8 De novo design of protein structure and function with RFdiffusion]&lt;br /&gt;
* 2025-11: [https://www.nature.com/articles/s41586-025-09721-5 Atomically accurate de novo design of antibodies with RFdiffusion]&lt;br /&gt;
* 2025-11: [https://deepmind.google/blog/alphafold-five-years-of-impact/ AlphaFold: Five years of impact]&lt;br /&gt;
* 2026-01: [https://www.goodfire.ai/research/interpretability-for-alzheimers-detection# Using Interpretability to Identify a Novel Class of Alzheimer&amp;#039;s Biomarkers]&lt;br /&gt;
&lt;br /&gt;
==Material Discovery==&lt;br /&gt;
* 2023-11: [https://doi.org/10.1038/s41586-023-06735-9 Scaling deep learning for materials discovery]&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
* [[AI agents]]&lt;br /&gt;
* [https://nanobot.chat/ Nanobot.chat]: Intelligent AI for the labnetwork @ mtl.mit.edu forum&lt;/div&gt;</summary>
		<author><name>KevinYager</name></author>
		
	</entry>
	<entry>
		<id>http://gisaxs.com/index.php?title=Science_Agents&amp;diff=8736</id>
		<title>Science Agents</title>
		<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php?title=Science_Agents&amp;diff=8736"/>
		<updated>2026-03-18T14:07:40Z</updated>

		<summary type="html">&lt;p&gt;KevinYager: /* Skills */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
=AI Use-cases for Science=&lt;br /&gt;
&lt;br /&gt;
==Literature==&lt;br /&gt;
* [https://www.alphaxiv.org/explore alphaXiv | Explore]: Understand arXiv papers&lt;br /&gt;
&lt;br /&gt;
===LLMs extracting data from papers===&lt;br /&gt;
* 2024: [https://pubs.rsc.org/en/content/articlelanding/2025/cs/d4cs00913d From text to insight: large language models for chemical data extraction]&lt;br /&gt;
&lt;br /&gt;
===AI finding links in literature===&lt;br /&gt;
* 2019-07: [https://doi.org/10.1038/s41586-019-1335-8  Unsupervised word embeddings capture latent knowledge from materials science literature]&lt;br /&gt;
* 2024-11: [https://doi.org/10.1038/s41562-024-02046-9  Large language models surpass human experts in predicting neuroscience results]&lt;br /&gt;
&lt;br /&gt;
===(Pre) Generate Articles===&lt;br /&gt;
* 2022-12: [https://aclanthology.org/2022.emnlp-main.296/ Re3: Generating Longer Stories With Recursive Reprompting and Revision]&lt;br /&gt;
* 2023-01: Journalism: [https://journals.sagepub.com/doi/10.1177/10776958221149577 Collaborating With ChatGPT: Considering the Implications of Generative Artificial Intelligence for Journalism and Media Education]&lt;br /&gt;
* 2023-03: English essays: [https://journal.unnes.ac.id/sju/index.php/elt/article/view/64069 Artificial intelligence (AI) technology in OpenAI ChatGPT application: A review of ChatGPT in writing English essay]&lt;br /&gt;
* 2023-07: Science writing: [https://www.rbmojournal.com/article/S1472-6483(23)00219-5/fulltext Artificial intelligence in scientific writing: a friend or a foe?]&lt;br /&gt;
* 2024-02: Wikipedia style: [https://arxiv.org/abs/2402.14207 Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models]&lt;br /&gt;
* 2024-08: [https://arxiv.org/abs/2408.07055 LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs] ([https://github.com/THUDM/LongWriter code])&lt;br /&gt;
* 2024-08: Scientific papers: [https://arxiv.org/abs/2408.06292 The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery]&lt;br /&gt;
* 2024-09: PaperQA2: [https://paper.wikicrow.ai/ Language Models Achieve Superhuman Synthesis of Scientific Knowledge] ([https://x.com/SGRodriques/status/1833908643856818443 𝕏 post], [https://github.com/Future-House/paper-qa code])&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.18866 Reasoning to Learn from Latent Thoughts]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.19065 WikiAutoGen: Towards Multi-Modal Wikipedia-Style Article Generation]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.13171 Sleep-time Compute: Beyond Inference Scaling at Test-time]&lt;br /&gt;
&lt;br /&gt;
==Explanation==&lt;br /&gt;
* 2025-02: [https://tiger-ai-lab.github.io/TheoremExplainAgent/ TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding] ([https://arxiv.org/abs/2502.19400 preprint])&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.02822 Do Two AI Scientists Agree?]&lt;br /&gt;
&lt;br /&gt;
==Autonomous Ideation==&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.07738 ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models]&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.14202 Mining Causality: AI-Assisted Search for Instrumental Variables]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.07977 Thinking Fast and Laterally: Multi-Agentic Approach for Reasoning about Uncertain Emerging Events]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.14141 LLMs can realize combinatorial creativity: generating creative ideas via LLMs for scientific research]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.17596 LiveIdeaBench: Evaluating LLMs&amp;#039; Scientific Creativity and Idea Generation with Minimal Context]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.13299 Hypothesis Generation for Materials Discovery and Design Using Goal-Driven and Constraint-Guided LLM Agents]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.13025 Agentic Deep Graph Reasoning Yields Self-Organizing Knowledge Networks]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.00794 Predicting Empirical AI Research Outcomes with Language Models]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.20803 The Ideation-Execution Gap: Execution Outcomes of LLM-Generated versus Human Research Ideas]&lt;br /&gt;
&lt;br /&gt;
==Adapting LLMs to Science==&lt;br /&gt;
* 2023-06: [https://doi.org/10.1039/D3DD00112A Domain-specific chatbots for science using embeddings]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2411.00027 Personalization of Large Language Models: A Survey]&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.00412 Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation]&lt;br /&gt;
&lt;br /&gt;
==AI/LLM Control of Scientific Instruments/Facilities==&lt;br /&gt;
* 2023-12: [https://www.nature.com/articles/s41524-024-01423-2 Opportunities for retrieval and tool augmented large language models in scientific facilities]&lt;br /&gt;
* 2023-12: [https://arxiv.org/abs/2312.17180 Virtual Scientific Companion for Synchrotron Beamlines: A Prototype]&lt;br /&gt;
* 2023-12: [https://www.nature.com/articles/s41586-023-06792-0 Autonomous chemical research with large language models]&lt;br /&gt;
* 2024-01: [https://iopscience.iop.org/article/10.1088/2632-2153/ad52e9 Synergizing Human Expertise and AI Efficiency with Language Model for Microscopy Operation and Automated Experiment Design]&lt;br /&gt;
* 2024-06: [https://pubs.rsc.org/en/content/articlelanding/2025/dd/d4dd00143e From Text to Test: AI-Generated Control Software for Materials Science Instruments]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.18161 VISION: A Modular AI Assistant for Natural Human-Instrument Interaction at Scientific User Facilities]&lt;br /&gt;
* 2025-01: [https://www.science.org/doi/10.1126/sciadv.adr4173 Large language models for human-machine collaborative particle accelerator tuning through natural language]&lt;br /&gt;
* 2025-04: [https://openreview.net/forum?id=iA9UN1dEgJ Operating Robotic Laboratories with Large Language Models and Teachable Agents]&lt;br /&gt;
&lt;br /&gt;
==AI/ML Methods tailored to Science==&lt;br /&gt;
===Science Foundation Models===&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.15763 Intern-S1: A Scientific Multimodal Foundation Model]&lt;br /&gt;
* 2025-11: [https://pubs.aip.org/aip/jcp/article/163/18/184110/3372267/A-foundation-model-for-atomistic-materials A foundation model for atomistic materials chemistry]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.15684 Walrus: A Cross-Domain Foundation Model for Continuum Dynamics]&lt;br /&gt;
* 2026-01: [https://www.science.org/doi/10.1126/science.ads9530 Deep contrastive learning enables genome-wide virtual screening]&lt;br /&gt;
&lt;br /&gt;
===Regression (Data Fitting)===&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.14546 Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data]: training on (x,y) pairs enables inferring underlying function (define it in code, invert it, compose it)&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2402.14547 OmniPred: Language Models as Universal Regressors]&lt;br /&gt;
&lt;br /&gt;
===Tabular Classification/Regression===&lt;br /&gt;
* 2025-01: [https://www.nature.com/articles/s41586-024-08328-6 Accurate predictions on small data with a tabular foundation model] ([https://github.com/PriorLabs/TabPFN code])&lt;br /&gt;
&lt;br /&gt;
===Symbolic Regression===&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.09359 Symbolic Regression with a Learned Concept Library]&lt;br /&gt;
&lt;br /&gt;
===Literature Discovery===&lt;br /&gt;
* [https://www.futurehouse.org/ FutureHouse]&lt;br /&gt;
** [https://hasanyone.com/ hasanyone]&lt;br /&gt;
** [https://github.com/Future-House/paper-qa PaperQA2]&lt;br /&gt;
* [https://lumina.sh/ Lumina]&lt;br /&gt;
* [https://github.com/TheBlewish/Automated-AI-Web-Researcher-Ollama Automated-AI-Web-Researcher-Ollama]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.05366 Search-o1: Agentic Search-Enhanced Large Reasoning Models] ([https://search-o1.github.io/ project], [https://github.com/sunnynexus/Search-o1 code])&lt;br /&gt;
* 2026-02: [https://www.nature.com/articles/s41586-025-10072-4 Synthesizing scientific literature with retrieval-augmented language models] ([https://allenai.org/blog/openscholar-nature blog])&lt;br /&gt;
&lt;br /&gt;
===Commercial===&lt;br /&gt;
* [https://sakana.ai/ai-scientist/ Sakana AI] &lt;br /&gt;
* [https://www.cusp.ai/ Cusp AI]: Materials/AI&lt;br /&gt;
* [https://www.lila.ai/ Lila AI]: Life sciences&lt;br /&gt;
* [https://www.radical-ai.com/ Radical AI]: Material simulation/design&lt;br /&gt;
* [https://www.autoscience.ai/ Autoscience] ([https://www.autoscience.ai/blog/meet-carl-the-first-ai-system-to-produce-academically-peer-reviewed-research Carl])&lt;br /&gt;
* [https://periodic.com/ Periodic Labs]&lt;br /&gt;
* [https://edisonscientific.com/articles/announcing-edison-scientific Edison Scientific] (drug discovery, spinoff from [https://www.futurehouse.org/ FutureHouse])&lt;br /&gt;
* 2026-03: Mirendil Inc.: advanced models to speed up R&amp;amp;D in scientific domains, especially biology and materials science&lt;br /&gt;
&lt;br /&gt;
====Bio====&lt;br /&gt;
* [https://www.bioptimus.com/ Bioptimus]&lt;br /&gt;
* [https://www.evolutionaryscale.ai/ EvolutionaryScale]&lt;br /&gt;
&lt;br /&gt;
==AI/ML Methods in Science==&lt;br /&gt;
* 2025-07: [https://www.mdpi.com/2313-433X/11/8/252 Synthetic Scientific Image Generation with VAE, GAN, and Diffusion Model Architectures]&lt;br /&gt;
&lt;br /&gt;
===Imaging===&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.08176 Behind the Noise: Conformal Quantile Regression Reveals Emergent Representations] (blog: [https://phzwart.github.io/behindthenoise/ Behind the Noise])&lt;br /&gt;
&lt;br /&gt;
===Materials===&lt;br /&gt;
* 2024-12: [https://www.nature.com/articles/s41467-024-54639-7 Crystal structure generation with autoregressive large language modeling]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.03965 All-atom Diffusion Transformers: Unified generative modelling of molecules and materials]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.19730 Training-Free Active Learning Framework in Materials Science with Large Language Models]&lt;br /&gt;
&lt;br /&gt;
===Chemistry===&lt;br /&gt;
* 2025-01: [https://www.nature.com/articles/s41578-025-00772-8 Large language models for reticular chemistry]&lt;br /&gt;
* 2025-02: [https://www.nature.com/articles/s42256-025-00982-3 Image-based generation for molecule design with SketchMol]&lt;br /&gt;
* 2025-02: [https://www.nature.com/articles/s42256-025-00994-z Large language models for scientific discovery in molecular property prediction]&lt;br /&gt;
* [https://x.com/vant_ai/status/1903070297991110657 2025-03]: [https://www.vant.ai/ Vant AI] [https://www.vant.ai/neo-1 Neo-1]: atomistic foundation model (small molecules, proteins, etc.)&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.08051 Compositional Flows for 3D Molecule and Synthesis Pathway Co-design]&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.07456 General purpose models for the chemical sciences]&lt;br /&gt;
* 2025-11: [https://chemrxiv.org/engage/chemrxiv/article-details/690357d9a482cba122e366b6 ChemTorch: A Deep Learning Framework for Benchmarking and Developing Chemical Reaction Property Prediction Models]&lt;br /&gt;
&lt;br /&gt;
===Biology===&lt;br /&gt;
* 2018: [https://alphafold.ebi.ac.uk/ AlphaFold]&lt;br /&gt;
* 2021-07: [https://www.nature.com/articles/s41586-021-03819-2 AlphaFold 2]&lt;br /&gt;
* 2024-05: [https://www.nature.com/articles/s41586-024-07487-w AlphaFold 3]&lt;br /&gt;
* 2023-03: [https://www.science.org/doi/10.1126/science.ade2574 Evolutionary-scale prediction of atomic-level protein structure with a language model] ([https://esmatlas.com/resources?action=fold ESMFold])&lt;br /&gt;
* 2023-11: [https://www.nature.com/articles/s41586-023-06728-8 Illuminating protein space with a programmable generative model] (Chroma)&lt;br /&gt;
* 2024-11: [https://www.science.org/doi/10.1126/science.ado9336 Sequence modeling and design from molecular to genome scale with Evo] (Evo)&lt;br /&gt;
* 2025-01: [https://www.nature.com/articles/s41586-024-08435-4 Targeting protein–ligand neosurfaces with a generalizable deep learning tool]&lt;br /&gt;
* 2025-01: [https://www.science.org/doi/10.1126/science.ads0018 Simulating 500 million years of evolution with a language model] ([https://github.com/evolutionaryscale/esm ESM] 3 model)&lt;br /&gt;
* 2025-02: [https://arcinstitute.org/manuscripts/Evo2 Genome modeling and design across all domains of life with Evo 2]&lt;br /&gt;
* 2025-02: [https://www.microsoft.com/en-us/research/blog/exploring-the-structural-changes-driving-protein-function-with-bioemu-1/ Exploring the structural changes driving protein function with BioEmu-1]&lt;br /&gt;
* 2025-02: [https://arxiv.org/pdf/2502.18449 Protein Large Language Models: A Comprehensive Survey]&lt;br /&gt;
* [https://x.com/vant_ai/status/1903070297991110657 2025-03]: [https://www.vant.ai/ Vant AI] [https://www.vant.ai/neo-1 Neo-1]: atomistic foundation model (small molecules, proteins, etc.)&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.16351 Lyra: An Efficient and Expressive Subquadratic Architecture for Modeling Biological Sequences]&lt;br /&gt;
* 2025-08: RoseTTAFold 3: [https://www.biorxiv.org/content/10.1101/2025.08.14.670328v2 Accelerating Biomolecular Modeling with AtomWorks and RF3]&lt;br /&gt;
* 2025-09: [https://www.biorxiv.org/content/10.1101/2025.09.12.675911v1 Generative design of novel bacteriophages with genome language models]&lt;br /&gt;
* 2025-10: [https://www.science.org/doi/10.1126/science.adu8578 Strengthening nucleic acid biosecurity screening against generative protein design tools]&lt;br /&gt;
* 2026-01: [https://www.nature.com/articles/s41586-025-10014-0 Advancing regulatory variant effect prediction with AlphaGenome]&lt;br /&gt;
&lt;br /&gt;
===Medicine===&lt;br /&gt;
See: [[AI_Agents#Medicine]]&lt;br /&gt;
&lt;br /&gt;
===Successes===&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.11270 Site-Decorated Model for Unconventional Frustrated Magnets: Ultranarrow Phase Crossover and Spin Reversal Transition]&lt;br /&gt;
&lt;br /&gt;
==AI/ML Methods co-opted for Science==&lt;br /&gt;
===Mechanistic Interpretability===&lt;br /&gt;
Train large model on science data. Then apply [[AI_understanding#Mechanistic_Interpretability|mechanistic interpretability]] (e.g. sparse autoencoders, SAE) to the feature/activation space.&lt;br /&gt;
* Mechanistic interpretability for protein language models ([https://interprot.com/ visualizer], [https://github.com/etowahadams/interprot/tree/main code], [https://huggingface.co/liambai/InterProt-ESM2-SAEs SAE])&lt;br /&gt;
* [https://www.markov.bio/ Markov Bio]: [https://www.markov.bio/research/mech-interp-path-to-e2e-biology Through a Glass Darkly: Mechanistic Interpretability as the Bridge to End-to-End Biology] ([https://x.com/adamlewisgreen/status/1853206279499751531 quick description], [https://markovbio.github.io/biomedical-progress/ background info on recent bio progress])&lt;br /&gt;
* 2023-01: [https://arxiv.org/abs/2301.05062 Tracr: Compiled Transformers as a Laboratory for Interpretability] ([https://github.com/google-deepmind/tracr code])&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.03334 An X-Ray Is Worth 15 Features: Sparse Autoencoders for Interpretable Radiology Report Generation]&lt;br /&gt;
* 2024-12: [https://www.arxiv.org/abs/2412.16247 Towards scientific discovery with dictionary learning: Extracting biological concepts from microscopy foundation models]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.12101 InterPLM: Discovering Interpretable Features in Protein Language Models via Sparse Autoencoders]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.00089 Insights on Galaxy Evolution from Interpretable Sparse Feature Networks]&lt;br /&gt;
* 2025-02: [https://www.biorxiv.org/content/10.1101/2025.02.06.636901v1 From Mechanistic Interpretability to Mechanistic Biology: Training, Evaluating, and Interpreting Sparse Autoencoders on Protein Language Models]&lt;br /&gt;
* 2025-02: [https://www.goodfire.ai/blog/interpreting-evo-2 Interpreting Evo 2: Arc Institute&amp;#039;s Next-Generation Genomic Foundation Model]&lt;br /&gt;
* 2026-01: [https://www.goodfire.ai/research/interpretability-for-alzheimers-detection# Using Interpretability to Identify a Novel Class of Alzheimer&amp;#039;s Biomarkers]&lt;br /&gt;
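The recipe described above (train on science data, then fit a sparse autoencoder to the activation space) can be sketched in a few lines of NumPy. This is a toy illustration only: random vectors stand in for model activations, all hyperparameters are illustrative rather than taken from any of the cited papers, and the gradient steps fold constant factors into the learning rate. The learned overcomplete dictionary directions are what one would then inspect as candidate interpretable features.&lt;br /&gt;

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for model activations: 1000 samples of a 64-dim vector.
X = rng.normal(size=(1000, 64)).astype(np.float32)

# Sparse autoencoder: overcomplete dictionary (256 features), L1 penalty on codes.
d_in, d_hid = 64, 256
W_enc = rng.normal(scale=0.1, size=(d_in, d_hid)).astype(np.float32)
b_enc = np.zeros(d_hid, dtype=np.float32)
W_dec = W_enc.T.copy()
b_dec = np.zeros(d_in, dtype=np.float32)

lr, l1 = 1e-3, 1e-3
for step in range(200):
    h = np.maximum(X @ W_enc + b_enc, 0.0)   # ReLU feature codes
    X_hat = h @ W_dec + b_dec                # reconstruction of the activations
    err = X_hat - X
    # Approximate gradients of mean squared error + l1 * mean(|h|)
    # (constant factors absorbed into lr).
    g_dec = h.T @ err / len(X)
    g_h = err @ W_dec.T
    g_h = np.where(h > 0, g_h + l1, 0.0)     # L1 subgradient only where active
    g_enc = X.T @ g_h / len(X)
    W_dec -= lr * g_dec
    b_dec -= lr * err.mean(axis=0)
    W_enc -= lr * g_enc
    b_enc -= lr * g_h.mean(axis=0)

h = np.maximum(X @ W_enc + b_enc, 0.0)
sparsity = (h > 0).mean()  # fraction of features active per sample
```

In a real setting, X would be activations captured from a trained science model (e.g. a protein language model), and each column of W_dec would be examined for correlation with known biological or physical concepts.&lt;br /&gt;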
&lt;br /&gt;
===Uncertainty===&lt;br /&gt;
* 2024-10: [https://github.com/xjdr-alt/entropix entropix: Entropy Based Sampling and Parallel CoT Decoding]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.09724 Taming Overconfidence in LLMs: Reward Calibration in RLHF]&lt;br /&gt;
&lt;br /&gt;
=Science Benchmarks=&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.13168 SciCode: A Research Coding Benchmark Curated by Scientists] ([http://scicode-bench.github.io/ project])&lt;br /&gt;
* 2024-11: [https://openreview.net/pdf?id=fz969ahcvJ AidanBench: Evaluating Novel Idea Generation on Open-Ended Questions] ([https://github.com/aidanmclaughlin/AidanBench code])&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.17596 LiveIdeaBench: Evaluating LLMs&amp;#039; Scientific Creativity and Idea Generation with Minimal Context]&lt;br /&gt;
* 2025-01: [https://agi.safe.ai/ Humanity&amp;#039;s Last Exam]&lt;br /&gt;
* [https://github.com/OSU-NLP-Group/ScienceAgentBench ScienceAgentBench]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.20309 EAIRA: Establishing a Methodology for Evaluating AI Models as Scientific Research Assistants]&lt;br /&gt;
* 2025-03: [https://huggingface.co/datasets/futurehouse/BixBench BixBench]: Novel hypotheses (accept/reject)&lt;br /&gt;
* 2025-04: [https://research.google/blog/evaluating-progress-of-llms-on-scientific-problem-solving/ Google: Evaluating progress of LLMs on scientific problem-solving]&lt;br /&gt;
** 2025-03: [https://arxiv.org/abs/2503.13517 CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning]&lt;br /&gt;
** 2024-07: [https://arxiv.org/abs/2407.09413 SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers]&lt;br /&gt;
** 2024-10: [https://neurips.cc/virtual/2024/98540 FEABench: Evaluating Language Models on Real World Physics Reasoning Ability]&lt;br /&gt;
* 2026-02: [https://edisonscientific.com/ Edison]: [https://lab-bench.ai/ LABBench 2]&lt;br /&gt;
&lt;br /&gt;
=Science Agents=&lt;br /&gt;
==Reviews==&lt;br /&gt;
* 2024-10: [https://www.cell.com/cell/fulltext/S0092-8674(24)01070-5 Empowering biomedical discovery with AI agents]&lt;br /&gt;
* 2025-01: [https://pubs.rsc.org/en/content/articlehtml/2024/sc/d4sc03921a A review of large language models and autonomous agents in chemistry] ([https://github.com/ur-whitelab/LLMs-in-science github])&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.01903 AI4Research: A Survey of Artificial Intelligence for Scientific Research]&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.14111 From AI for Science to Agentic Science: A Survey on Autonomous Scientific Discovery]&lt;br /&gt;
&lt;br /&gt;
==Challenges==&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.03315 Why LLMs Aren&amp;#039;t Scientists Yet: Lessons from Four Autonomous Research Attempts]&lt;br /&gt;
&lt;br /&gt;
==Specific==&lt;br /&gt;
* 2024-01-13: [https://arxiv.org/abs/2401.06949 ORGANA: A Robotic Assistant for Automated Chemistry Experimentation and Characterization] ([https://www.youtube.com/watch?v=N6qMMwJ8hKQ video])&lt;br /&gt;
* 2024-06-19: [https://arxiv.org/abs/2406.13163 LLMatDesign: Autonomous Materials Discovery with Large Language Models]&lt;br /&gt;
* 2024-08-12: [https://sakana.ai/ Sakana AI]: [https://sakana.ai/ai-scientist/ AI Scientist]; [https://arxiv.org/abs/2408.06292 The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery] ([https://github.com/SakanaAI/AI-Scientist code])&lt;br /&gt;
* 2024-09-09: [https://arxiv.org/abs/2409.05556 SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning] ([https://github.com/lamm-mit/SciAgentsDiscovery code])&lt;br /&gt;
* 2024-09-11: PaperQA2: [https://paper.wikicrow.ai/ Language Models Achieve Superhuman Synthesis of Scientific Knowledge] ([https://x.com/SGRodriques/status/1833908643856818443 𝕏 post], [https://github.com/Future-House/paper-qa code])&lt;br /&gt;
* 2024-10-17: [https://arxiv.org/abs/2410.13768 Rapid and Automated Alloy Design with Graph Neural Network-Powered LLM-Driven Multi-Agent Systems]&lt;br /&gt;
* 2024-10-28: [https://arxiv.org/abs/2410.20976 Large Language Model-Guided Prediction Toward Quantum Materials Synthesis]&lt;br /&gt;
* 2024-12-06: [https://www.biorxiv.org/content/10.1101/2024.11.11.623004v1 The Virtual Lab: AI Agents Design New SARS-CoV-2 Nanobodies with Experimental Validation] (writeup: [https://www.nature.com/articles/d41586-024-01684-3 Virtual lab powered by ‘AI scientists’ super-charges biomedical research: Could human–AI collaborations be the future of interdisciplinary studies?])&lt;br /&gt;
* 2024-12-30: [https://arxiv.org/abs/2412.21154 Aviary: training language agents on challenging scientific tasks]&lt;br /&gt;
* See also: [[AI_Agents#Deep_Research|AI Agents &amp;gt; Deep Research]]&lt;br /&gt;
* 2025-04-08: Sakana: [https://pub.sakana.ai/ai-scientist-v2/paper/paper.pdf The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search] ([https://github.com/SakanaAI/AI-Scientist-v2 code])&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.14267 DREAMS: Density Functional Theory Based Research Engine for Agentic Materials Simulation]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.02824 Kosmos: An AI Scientist for Autonomous Discovery]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.08151 SciAgent: A Unified Multi-Agent System for Generalistic Scientific Reasoning]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2601.23265 PaperBanana: Automating Academic Illustration for AI Scientists]&lt;br /&gt;
&lt;br /&gt;
==Science Multi-Agent Setups==&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.04227 Agent Laboratory: Using LLM Agents as Research Assistants]&lt;br /&gt;
* 2025-04: [https://www.nature.com/articles/s41551-025-01363-2 Coordinated AI agents for advancing healthcare] ([https://www.nature.com/articles/s41551-025-01363-2.epdf?sharing_token=CIYP3J8LZE4BX31fV3WxUdRgN0jAjWel9jnR3ZoTv0O9iD-yhgqzRaz_7VASayWRePPhWDD2xFyfuOpSXbdPaOtt7oH4nfXo7telALzNwY3V1p9SxoqBEJy2OuaJ_cA35-CYQC1XgjCNTZUw46dh1KX-Dj8e7-1Vk_RlZKFLrc8%3D pdf])&lt;br /&gt;
&lt;br /&gt;
=AI Science Systems=&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.03916 Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.13299 Hypothesis Generation for Materials Discovery and Design Using Goal-Driven and Constraint-Guided LLM Agents]&lt;br /&gt;
* 2025-02: [https://storage.googleapis.com/coscientist_paper/ai_coscientist.pdf Towards an AI co-scientist] (Google blog post: [https://research.google/blog/accelerating-scientific-breakthroughs-with-an-ai-co-scientist/ Accelerating scientific breakthroughs with an AI co-scientist])&lt;br /&gt;
* 2025-06: [https://zenodo.org/records/15693353 The Discovery Engine]&lt;br /&gt;
** 2025-07: [https://arxiv.org/abs/2507.00964 Benchmarking the Discovery Engine] ([https://www.leap-labs.com/blog/how-we-replicated-five-peer-reviewed-papers-in-five-hours blog])&lt;br /&gt;
* 2025-07: [https://www.preprints.org/manuscript/202507.1951/v1 Autonomous Scientific Discovery Through Hierarchical AI Scientist Systems]&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.16969 Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows]&lt;br /&gt;
* 2026-01: [https://www.nature.com/articles/s43588-025-00906-6 SciSciGPT: advancing human–AI collaboration in the science of science]&lt;br /&gt;
* 2026-02: [https://allenai.org/papers/autodiscovery AUTODISCOVERY: Open-ended Scientific Discovery via Bayesian Surprise] (Allen AI (Ai2) AstraLabs, [https://allenai.org/blog/autodiscovery blog], [https://autodiscovery.allen.ai/runs tools])&lt;br /&gt;
&lt;br /&gt;
===Inorganic Materials Discovery===&lt;br /&gt;
* 2023-11: [https://doi.org/10.1038/s41586-023-06735-9 Scaling deep learning for materials discovery]&lt;br /&gt;
* 2023-11: [https://doi.org/10.1038/s41586-023-06734-w An autonomous laboratory for the accelerated synthesis of novel materials]&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.00135 HoneyComb: A Flexible LLM-Based Agent System for Materials Science]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.12771 Open Materials 2024 (OMat24) Inorganic Materials Dataset and Models] ([https://github.com/FAIR-Chem/fairchem code], [https://huggingface.co/datasets/fairchem/OMAT24 datasets], [https://huggingface.co/fairchem/OMAT24 checkpoints], [https://ai.meta.com/blog/fair-news-segment-anything-2-1-meta-spirit-lm-layer-skip-salsa-sona/ blogpost])&lt;br /&gt;
* 2025-01: [https://www.nature.com/articles/s41586-025-08628-5 A generative model for inorganic materials design]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.14110 System of Agentic AI for the Discovery of Metal-Organic Frameworks]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.08762 The Open Molecules 2025 (OMol25) Dataset, Evaluations, and Models]&lt;br /&gt;
&lt;br /&gt;
===Materials Characterization===&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.06569 Operationalizing Serendipity: Multi-Agent AI Workflows for Enhanced Materials Characterization with Theory-in-the-Loop]&lt;br /&gt;
&lt;br /&gt;
===Chemistry===&lt;br /&gt;
* 2023-12: [https://doi.org/10.1038/s41586-023-06792-0 Autonomous chemical research with large language models] (Coscientist)&lt;br /&gt;
* 2024-09: [https://www.pnnl.gov/main/publications/external/technical_reports/PNNL-36692.pdf PNNL ChemAIst V0.2]&lt;br /&gt;
* 2024-11: [https://www.nature.com/articles/s41467-024-54457-x An automatic end-to-end chemical synthesis development platform powered by large language models]&lt;br /&gt;
* 2025-06: [https://paper.ether0.ai/ Training a Scientific Reasoning Model for Chemistry]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.06363 ChemGraph: An Agentic Framework for Computational Chemistry Workflows] ([https://github.com/argonne-lcf/ChemGraph code])&lt;br /&gt;
&lt;br /&gt;
===Bio===&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.01485 BioMARS: A Multi-Agent Robotic System for Autonomous Biological Experiments]&lt;br /&gt;
&lt;br /&gt;
===Physics===&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.19799 PhysMaster: Building an Autonomous AI Physicist for Theoretical and Computational Physics Research]&lt;br /&gt;
&lt;br /&gt;
==LLMs Optimized for Science==&lt;br /&gt;
* 2022-11: [https://arxiv.org/abs/2211.09085 Galactica: A Large Language Model for Science]&lt;br /&gt;
* 2024-12: [https://www.nature.com/articles/s41467-024-54639-7 Crystal structure generation with autoregressive large language modeling]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.13107 MatterChat: A Multi-Modal LLM for Material Science]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.17604 OmniScience: A Domain-Specialized LLM for Scientific Reasoning and Discovery]&lt;br /&gt;
* 2025-03: Google [https://huggingface.co/collections/google/txgemma-release-67dd92e931c857d15e4d1e87 TxGemma] (2B, 9B, 27B): [https://developers.googleblog.com/en/introducing-txgemma-open-models-improving-therapeutics-development/ drug development]&lt;br /&gt;
&lt;br /&gt;
=Impact of AI in Science=&lt;br /&gt;
* 2024-11: &amp;lt;strike&amp;gt;[https://aidantr.github.io/files/AI_innovation.pdf Artificial Intelligence, Scientific Discovery, and Product Innovation]&amp;lt;/strike&amp;gt;&lt;br /&gt;
** 2025-05: Retraction: [https://economics.mit.edu/news/assuring-accurate-research-record Assuring an accurate research record]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.05151 Transforming Science with Large Language Models: A Survey on AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.03837 Accelerating Scientific Research with Gemini: Case Studies and Common Techniques]&lt;br /&gt;
&lt;br /&gt;
=Related Tools=&lt;br /&gt;
==Literature Search==&lt;br /&gt;
* [https://www.perplexity.ai/ Perplexity]&lt;br /&gt;
* [https://www.arxival.xyz/ ArXival]&lt;br /&gt;
&lt;br /&gt;
==Data Visualization==&lt;br /&gt;
* 2024-10: Microsoft [https://www.microsoft.com/en-us/research/blog/data-formulator-exploring-how-ai-can-help-analysts-create-rich-data-visualizations/ Data Formulator: Create Rich Visualization with AI iteratively] ([https://www.microsoft.com/en-us/research/video/data-formulator-create-rich-visualization-with-ai-iteratively/ video], [https://github.com/microsoft/data-formulator code])&lt;br /&gt;
* [https://julius.ai/ Julius AI]: Analyze your data with computational AI&lt;br /&gt;
&lt;br /&gt;
==Generative==&lt;br /&gt;
* 2025-03: [https://huggingface.co/collections/starvector/starvector-models-6783b22c7bd4b43d13cb5289 StarVector] 1B, 8B: text or image to SVG&lt;br /&gt;
&lt;br /&gt;
==Chemistry==&lt;br /&gt;
* 2025-03: [https://jcheminf.biomedcentral.com/articles/10.1186/s13321-024-00834-z Rxn-INSIGHT: fast chemical reaction analysis using bond-electron matrices] ([https://rxn-insight.readthedocs.io/en/latest/ docs])&lt;br /&gt;
&lt;br /&gt;
=Science Datasets=&lt;br /&gt;
* [https://datasetsearch.research.google.com/ Google Dataset Search]&lt;br /&gt;
* [https://github.com/blaiszik/awesome-matchem-datasets/ Awesome Materials &amp;amp; Chemistry Datasets]&lt;br /&gt;
* NIST [https://jarvis.nist.gov/ Jarvis] (simulations)&lt;br /&gt;
&lt;br /&gt;
=Genuine Discoveries=&lt;br /&gt;
* 2025-11: [https://cdn.openai.com/pdf/4a25f921-e4e0-479a-9b38-5367b47e8fd0/early-science-acceleration-experiments-with-gpt-5.pdf Early science acceleration experiments with GPT-5]&lt;br /&gt;
* 2025-12: [https://andymasley.substack.com/p/ai-can-obviously-create-new-knowledge AI can obviously create new knowledge - But maybe not new concepts]&lt;br /&gt;
==Math==&lt;br /&gt;
* 2023-07: [https://www.nature.com/articles/s41586-023-06004-9 Faster sorting algorithms discovered using deep reinforcement learning]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.13131 AlphaEvolve: A coding agent for scientific and algorithmic discovery]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.02864 Mathematical exploration and discovery at scale]&lt;br /&gt;
* 2025-11: [https://www.nature.com/articles/s41586-025-09833-y Olympiad-level formal mathematical reasoning with reinforcement learning]&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.14575 Extremal descendant integrals on moduli spaces of curves: An inequality discovered and proved in collaboration with AI]&lt;br /&gt;
* [https://github.com/teorth/erdosproblems/wiki/AI-contributions-to-Erd%C5%91s-problems AI Solving Erdős Problems]:&lt;br /&gt;
** 2026-01: [https://www.erdosproblems.com/728 Erdős Problem #728] and [https://www.erdosproblems.com/729 #729] solved by Aristotle using ChatGPT 5.2 Pro&lt;br /&gt;
** 2026-01: [https://www.erdosproblems.com/forum/thread/397 Erdős Problem #397] [https://x.com/neelsomani/status/2010215162146607128?s=20 solved] by [https://neelsomani.com/ Neel Somani] using ChatGPT 5.2 Pro&lt;br /&gt;
** 2026-01: [https://www.erdosproblems.com/205 Erdős Problem #205] solved by Aristotle using ChatGPT 5.2 Pro&lt;br /&gt;
** 2026-01: [https://www.erdosproblems.com/forum/thread/281 Erdős Problem #281] [https://x.com/neelsomani/status/2012695714187325745?s=20 solved] by [https://neelsomani.com/ Neel Somani] using ChatGPT 5.2 Pro&lt;br /&gt;
** 2026-01: Google DeepMind: [https://arxiv.org/abs/2601.21442 Irrationality of rapidly converging series: a problem of Erdős and Graham]&lt;br /&gt;
*** [https://www.erdosproblems.com/1051 Erdős Problem #1051] [https://x.com/slow_developer/status/2018321002623901885?s=20 solved] by Google DeepMind Aletheia agent&lt;br /&gt;
** 2026-01: Google DeepMind: [https://arxiv.org/abs/2601.22401 Semi-Autonomous Mathematics Discovery with Gemini: A Case Study on the Erdős Problems]&lt;br /&gt;
*** Attempted 700 problems, solved 13 open Erdős problems: 5 novel autonomous solutions, 8 through existing literature.&lt;br /&gt;
** 2026-02: [https://www.erdosproblems.com/846 Erdős Problem #846]&lt;br /&gt;
*** [https://x.com/roydanroy/status/2026804567178953048?s=20 Google DeepMind]&lt;br /&gt;
*** [https://x.com/mehtaab_sawhney/status/2026716221933343147?s=20 Using OpenAI internal model] (paper: [https://cdn.openai.com/infinite-sets/main_single_clean3.pdf On infinite sets with no 3 on a line])&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.07222 The motivic class of the space of genus 0 maps to the flag variety]&lt;br /&gt;
* 2026-02: Google DeepMind: [https://arxiv.org/abs/2602.10177 Towards Autonomous Mathematics Research]&lt;br /&gt;
* 2026-03: Donald Knuth: [https://www-cs-faculty.stanford.edu/~knuth/papers/claude-cycles.pdf A problem in Directed Hamiltonian Cycles] solved by Filip Stappers using Claude Opus 4.6&lt;br /&gt;
* 2026-03: Google DeepMind: [https://arxiv.org/abs/2603.09172 Reinforced Generation of Combinatorial Structures: Ramsey Numbers]&lt;br /&gt;
&lt;br /&gt;
==Physics assistance==&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.23758 Exact solution of the frustrated Potts model with next-nearest-neighbor interactions in one dimension via AI bootstrapping] ([https://www.bnl.gov/staff/wyin Weiguo Yin])&lt;br /&gt;
* 2025-12: [https://www.sciencedirect.com/science/article/pii/S0370269325008111 Relativistic covariance and nonlinear quantum mechanics: Tomonaga-Schwinger analysis]&lt;br /&gt;
** [https://x.com/hsu_steve/status/1996034522308026435?s=20 Steve Hsu], [https://drive.google.com/file/d/16sxJuwsHoi-fvTFbri9Bu8B9bqA6lr1H/view Theoretical Physics with Generative AI]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.12176 Single-minus gluon tree amplitudes are nonzero] (GPT-5.2, [https://openai.com/index/new-result-theoretical-physics/ blog])&lt;br /&gt;
&lt;br /&gt;
==Literature exploration==&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.02824 Kosmos: An AI Scientist for Autonomous Discovery] ([https://edisonscientific.com/ Edison])&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/c4bdef64-5e9b-43b9-a365-592dd1ed7587 Nucleotide metabolism in hypothermia]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/1fdbf827-be65-4d97-9b66-bf0da600091a Determinant of perovskite solar-cell failure]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/4fb3fbdb-c449-4064-9aa6-ff4ec53131d8 Log-normal connectivity in neural networks]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/c6849232-5858-4634-adf5-83780afbe3db SOD2 as driver of myocardial fibrosis]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/abac07da-a6bb-458f-b0ba-ef08f1be617e Protective variant of SSR1 in type 2 diabetes]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/a770052b-2334-4bbe-b086-5149e0f03d99 Temporal ordering in Alzheimer’s disease]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/28c427d2-be31-48b5-b272-28d5a1e3ea5c Mechanism of neuron vulnerability in aging]&lt;br /&gt;
==Bio design==&lt;br /&gt;
* 2023-07: [https://www.nature.com/articles/s41586-023-06415-8 De novo design of protein structure and function with RFdiffusion]&lt;br /&gt;
* 2025-11: [https://www.nature.com/articles/s41586-025-09721-5 Atomically accurate de novo design of antibodies with RFdiffusion]&lt;br /&gt;
* 2025-11: [https://deepmind.google/blog/alphafold-five-years-of-impact/ AlphaFold: Five years of impact]&lt;br /&gt;
* 2026-01: [https://www.goodfire.ai/research/interpretability-for-alzheimers-detection# Using Interpretability to Identify a Novel Class of Alzheimer&amp;#039;s Biomarkers]&lt;br /&gt;
==Material Discovery==&lt;br /&gt;
* 2023-11: [https://doi.org/10.1038/s41586-023-06735-9 Scaling deep learning for materials discovery]&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
* [[AI agents]]&lt;br /&gt;
* [https://nanobot.chat/ Nanobot.chat]: Intelligent AI for the labnetwork @ mtl.mit.edu forum&lt;/div&gt;</summary>
		<author><name>KevinYager</name></author>
		
	</entry>
	<entry>
		<id>http://gisaxs.com/index.php?title=AI_and_Humans&amp;diff=8735</id>
		<title>AI and Humans</title>
		<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php?title=AI_and_Humans&amp;diff=8735"/>
		<updated>2026-03-18T12:53:18Z</updated>

		<summary type="html">&lt;p&gt;KevinYager: /* AI improves learning/education */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
=AI in Education=&lt;br /&gt;
==Survey/study of==&lt;br /&gt;
* 2023-08: [https://www.nature.com/articles/s41598-023-38964-3 Perception, performance, and detectability of conversational artificial intelligence across 32 university courses]&lt;br /&gt;
* 2023-10: [https://www.bbc.com/worklife/article/20231017-the-employees-secretly-using-ai-at-work Employees] secretly using AI at work.&lt;br /&gt;
* 2023-10: [https://www.insidehighered.com/news/tech-innovation/artificial-intelligence/2023/10/31/most-students-outrunning-faculty-ai-use Survey] shows students using AI more than professors.&lt;br /&gt;
* 2023-11: [https://www.nature.com/articles/d41586-023-03507-3 ChatGPT has entered the classroom: how LLMs could transform education]&lt;br /&gt;
* 2025-04: [https://www.anthropic.com/news/anthropic-education-report-how-university-students-use-claude Anthropic Education Report: How University Students Use Claude]&lt;br /&gt;
* 2025-05: [https://www.nature.com/articles/s41599-025-04787-y The effect of ChatGPT on students’ learning performance, learning perception, and higher-order thinking: insights from a meta-analysis]&lt;br /&gt;
&lt;br /&gt;
==AI improves learning/education==&lt;br /&gt;
*  Mollick, Ethan R. and Mollick, Lilach and Bach, Natalie and Ciccarelli, LJ and Przystanski, Ben and Ravipinto, Daniel, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4871171 AI Agents and Education: Simulated Practice at Scale] (June 17, 2024). The Wharton School Research Paper. [http://dx.doi.org/10.2139/ssrn.4871171 doi: 10.2139/ssrn.4871171]&lt;br /&gt;
** Can enable personalized education.&lt;br /&gt;
* [https://arxiv.org/abs/2306.17156 Generative AI for Programming Education: Benchmarking ChatGPT, GPT-4, and Human Tutors]&lt;br /&gt;
** GPT4 can out-perform human tutors.&lt;br /&gt;
*  Keppler, Samantha and Sinchaisri, Wichinpong and Snyder, Clare, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4924786 Backwards Planning with Generative AI: Case Study Evidence from US K12 Teachers] (August 13, 2024). [http://dx.doi.org/10.2139/ssrn.4924786 doi: 10.2139/ssrn.4924786]&lt;br /&gt;
** Teachers benefit from using AI as a co-pilot to aid in tasks (planning, how to teach topic, explore ideas).&lt;br /&gt;
** There is smaller utility in using AI purely as a text-generator (to make quizzes, workbooks, etc.).&lt;br /&gt;
* [https://arxiv.org/abs/2402.09809 Effective and Scalable Math Support: Evidence on the Impact of an AI Tutor on Math Achievement in Ghana]&lt;br /&gt;
* [https://doi.org/10.21203/rs.3.rs-4243877/v1 AI Tutoring Outperforms Active Learning]&lt;br /&gt;
* [https://documents.worldbank.org/en/publication/documents-reports/documentdetail/099548105192529324 From chalkboards to chatbots: Transforming learning in Nigeria, one prompt at a time] ([https://blogs.worldbank.org/en/education/From-chalkboards-to-chatbots-Transforming-learning-in-Nigeria writeup])&lt;br /&gt;
** 6 weeks of after-school AI tutoring = 2 years of typical learning gains&lt;br /&gt;
** outperforms 80% of other educational interventions&lt;br /&gt;
* [https://arxiv.org/abs/2409.09047 AI Meets the Classroom: When Do Large Language Models Harm Learning?]&lt;br /&gt;
** Outcomes depend on usage&lt;br /&gt;
* [https://www.deeplearning.ai/the-batch/gpt-4-boosts-remote-tutors-performance-in-real-time-study-finds/ LLM Support for Tutors: GPT-4 boosts remote tutors’ performance in real time, study finds]&lt;br /&gt;
** [https://arxiv.org/abs/2410.03017 Tutor CoPilot: A Human-AI Approach for Scaling Real-Time Expertise]&lt;br /&gt;
* 2025-06: Gallup &amp;amp; The Walton Foundation: [https://www.gallup.com/file/analytics/691922/Walton-Family-Foundation-Gallup-Teachers-AI-Report.pdf Teaching for Tomorrow: Unlocking Six Weeks a Year With AI]&lt;br /&gt;
* 2026-03: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6423358 Effective Personalized AI Tutors via LLM-Guided Reinforcement Learning]&lt;br /&gt;
&lt;br /&gt;
==AI harms learning==&lt;br /&gt;
* [https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0305354 A real-world test of artificial intelligence infiltration of a university examinations system: A “Turing Test” case study]&lt;br /&gt;
** Current grading systems cannot detect AI.&lt;br /&gt;
* Bastani, Hamsa and Bastani, Osbert and Sungu, Alp and Ge, Haosen and Kabakcı, Özge and Mariman, Rei, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4895486 Generative AI Can Harm Learning] (July 15, 2024). The Wharton School Research Paper. [http://dx.doi.org/10.2139/ssrn.4895486 doi: 10.2139/ssrn.4895486]&lt;br /&gt;
** Access to ChatGPT harmed math education outcomes.&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.09047 AI Meets the Classroom: When Do Large Language Models Harm Learning?]&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.20245 How AI Impacts Skill Formation]&lt;br /&gt;
&lt;br /&gt;
==Software/systems==&lt;br /&gt;
* [https://devpost.com/software/gptutor GPTutor] ([https://github.com/mynamegabe/GPTutor code])&lt;br /&gt;
* [https://arxiv.org/abs/2308.02773 EduChat: A Large-Scale Language Model-based Chatbot System for Intelligent Education]&lt;br /&gt;
* [https://eurekalabs.ai/ Eureka Labs] (founded by [https://en.wikipedia.org/wiki/Andrej_Karpathy Andrej Karpathy]) aims to create AI-driven courses (first course is [https://github.com/karpathy/LLM101n Intro to LLMs])&lt;br /&gt;
&lt;br /&gt;
===LLMs===&lt;br /&gt;
* 2024-12: [https://www.arxiv.org/abs/2412.16429 LearnLM: Improving Gemini for Learning]&lt;br /&gt;
&lt;br /&gt;
===Individual tools===&lt;br /&gt;
* Chatbot (OpenAI [https://chatgpt.com/ ChatGPT], Anthropic [https://www.anthropic.com/claude Claude], Google [https://gemini.google.com/app Gemini])&lt;br /&gt;
* [https://notebooklm.google.com/ NotebookLM]: Enables one to &amp;quot;chat with documents&amp;quot;.&lt;br /&gt;
* Google [https://learning.google.com/experiments/learn-about/signup Learn About]&lt;br /&gt;
&lt;br /&gt;
===Systems===&lt;br /&gt;
* [https://www.anthropic.com/news/introducing-claude-for-education Anthropic] [https://www.anthropic.com/education Claude for Education]&lt;br /&gt;
&lt;br /&gt;
==AI for grading==&lt;br /&gt;
* [https://dl.acm.org/doi/10.1145/3657604.3664693 Can Large Language Models Make the Grade? An Empirical Study Evaluating LLMs Ability To Mark Short Answer Questions in K-12 Education] ([https://arxiv.org/abs/2405.02985 preprint])&lt;br /&gt;
&lt;br /&gt;
==Detection==&lt;br /&gt;
* 2024-06: [https://www.sciencedirect.com/science/article/pii/S2666920X24000109 Do teachers spot AI? Evaluating the detectability of AI-generated texts among student essays]&lt;br /&gt;
** GenAI can simulate student writing in a way that teachers cannot detect.&lt;br /&gt;
** AI essays are assessed more positively than student-written.&lt;br /&gt;
** Teachers are overconfident in their source identification.&lt;br /&gt;
** Both novice and experienced teachers could not identify texts generated by ChatGPT vs. students&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.15654 People who frequently use ChatGPT for writing tasks are accurate and robust detectors of AI-generated text]&lt;br /&gt;
===AI Text Detectors Don&amp;#039;t Work===&lt;br /&gt;
* 2024-05: [https://arxiv.org/abs/2405.07940 RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors]&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2306.15666 Testing of Detection Tools for AI-Generated Text]&lt;br /&gt;
&lt;br /&gt;
=AI/human=&lt;br /&gt;
==Capabilities==&lt;br /&gt;
===Writing===&lt;br /&gt;
&lt;br /&gt;
* 2022-12: [https://aclanthology.org/2022.emnlp-main.296/ Re3: Generating Longer Stories With Recursive Reprompting and Revision]&lt;br /&gt;
* 2023-01: Journalism: [https://journals.sagepub.com/doi/10.1177/10776958221149577 Collaborating With ChatGPT: Considering the Implications of Generative Artificial Intelligence for Journalism and Media Education]&lt;br /&gt;
* 2023-03: English essays: [https://journal.unnes.ac.id/sju/index.php/elt/article/view/64069 Artificial intelligence (AI) technology in OpenAI ChatGPT application: A review of ChatGPT in writing English essay]&lt;br /&gt;
* 2023-07: Science writing: [https://www.rbmojournal.com/article/S1472-6483(23)00219-5/fulltext Artificial intelligence in scientific writing: a friend or a foe?]&lt;br /&gt;
* 2024-02: Wikipedia style: [https://arxiv.org/abs/2402.14207 Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models]&lt;br /&gt;
* 2024-08: [https://arxiv.org/abs/2408.07055 LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs] ([https://github.com/THUDM/LongWriter code])&lt;br /&gt;
* 2024-08: Scientific papers: [https://arxiv.org/abs/2408.06292 The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery]&lt;br /&gt;
* 2024-09: PaperQA2: [https://paper.wikicrow.ai/ Language Models Achieve Superhuman Synthesis of Scientific Knowledge] ([https://x.com/SGRodriques/status/1833908643856818443 𝕏 post], [https://github.com/Future-House/paper-qa code])&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.19065 WikiAutoGen: Towards Multi-Modal Wikipedia-Style Article Generation]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.22828 Learning to Reason for Long-Form Story Generation]&lt;br /&gt;
&lt;br /&gt;
==AI out-performs humans==&lt;br /&gt;
===Tests===&lt;br /&gt;
* 2023-07: [https://arxiv.org/abs/2307.10635 SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models]&lt;br /&gt;
* 2024-06: [https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0305354 A real-world test of artificial intelligence infiltration of a university examinations system: A “Turing Test” case study] &lt;br /&gt;
** AI scores higher than median students.&lt;br /&gt;
&lt;br /&gt;
===Creativity===&lt;br /&gt;
* See also: [[AI creativity]]&lt;br /&gt;
* 2023-07: [https://mackinstitute.wharton.upenn.edu/wp-content/uploads/2023/08/LLM-Ideas-Working-Paper.pdf Ideas Are Dimes A Dozen: Large Language Models For Idea Generation In Innovation]&lt;br /&gt;
* 2023-09: [https://www.nature.com/articles/s41598-023-40858-3 Best humans still outperform artificial intelligence in a creative divergent thinking task]&lt;br /&gt;
** Best humans out-perform AI at creativity. (By implication, median humans may not.)&lt;br /&gt;
* 2024-02: [https://www.nature.com/articles/s41598-024-53303-w The current state of artificial intelligence generative language models is more creative than humans on divergent thinking tasks]&lt;br /&gt;
* 2024-02: Felin, Teppo and Holweg, Matthias, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4737265 Theory Is All You Need: AI, Human Cognition, and Causal Reasoning] (February 24, 2024). [http://dx.doi.org/10.2139/ssrn.4737265 doi: 10.2139/ssrn.4737265]&lt;br /&gt;
** Argues that human &amp;quot;theory-based&amp;quot; creativity is better than AI&amp;#039;s &amp;quot;data-based&amp;quot; approach.&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.01119 Pron vs Prompt: Can Large Language Models already Challenge a World-Class Fiction Author at Creative Text Writing?]&lt;br /&gt;
** Top human (professional author) out-performs GPT-4.&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.04109 Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers]&lt;br /&gt;
** LLM-generated ideas were judged more novel than those of human experts.&lt;br /&gt;
* 2024-09: [https://docs.iza.org/dp17302.pdf Creative and Strategic Capabilities of Generative AI: Evidence from Large-Scale Experiments]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.00794 Predicting Empirical AI Research Outcomes with Language Models]&lt;br /&gt;
&lt;br /&gt;
===Art===&lt;br /&gt;
* 2024-11: [https://doi.org/10.1038/s41598-024-76900-1 AI-generated poetry is indistinguishable from human-written poetry and is rated more favorably]&lt;br /&gt;
* 2024-11: [https://www.astralcodexten.com/p/how-did-you-do-on-the-ai-art-turing How Did You Do On The AI Art Turing Test?]&lt;br /&gt;
&lt;br /&gt;
===Business &amp;amp; Marketing===&lt;br /&gt;
* 2023-11: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4597899 The power of generative marketing: Can generative AI create superhuman visual marketing content?]&lt;br /&gt;
* 2024-02: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4714776 Generative Artificial Intelligence and Evaluating Strategic Decisions]&lt;br /&gt;
&lt;br /&gt;
===Professions===&lt;br /&gt;
* [https://agi.safe.ai/submit Humanity&amp;#039;s Last Exam]&lt;br /&gt;
** [https://x.com/alexandr_wang/status/1835738937719140440 Effort to build] a dataset of challenging (but resolvable) questions in specific domains, serving as a benchmark for whether AIs are improving on these difficult topics.&lt;br /&gt;
&lt;br /&gt;
====Coding====&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.06807 Competitive Programming with Large Reasoning Models]&lt;br /&gt;
&lt;br /&gt;
====Medical====&lt;br /&gt;
* 2024-03: [https://www.medrxiv.org/content/10.1101/2024.03.12.24303785v1 Influence of a Large Language Model on Diagnostic Reasoning: A Randomized Clinical Vignette Study]&lt;br /&gt;
** GPT-4 improves medical practitioners&amp;#039; work; surprisingly, GPT-4 alone scored better than a human using GPT-4 as an aid (on selected tasks).&lt;br /&gt;
* 2024-10: [https://doi.org/10.1001/jamanetworkopen.2024.38535 Perspectives on Artificial Intelligence–Generated Responses to Patient Messages]&lt;br /&gt;
* 2024-10: [https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2825395 Large Language Model Influence on Diagnostic Reasoning: A Randomized Clinical Trial]&lt;br /&gt;
** Use of ChatGPT does not strongly improve medical experts&amp;#039; work, but AI alone out-scores both human and human+AI.&lt;br /&gt;
* 2024-11: [https://www.nature.com/articles/s41562-024-02046-9 Large language models surpass human experts in predicting neuroscience results] (writeup: [https://medicalxpress.com/news/2024-11-ai-neuroscience-results-human-experts.html AI can predict neuroscience study results better than human experts, study finds])&lt;br /&gt;
* 2024-12: [https://www.arxiv.org/abs/2412.10849 Superhuman performance of a large language model on the reasoning tasks of a physician]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.18925 HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs]&lt;br /&gt;
* 2025-02: Media:&lt;br /&gt;
** NY Times: [https://www.nytimes.com/2025/02/02/opinion/ai-doctors-medicine.html The Robot Doctor Will See You Now]&lt;br /&gt;
** [https://erictopol.substack.com/p/when-doctors-with-ai-are-outperformed When Doctors With A.I. Are Outperformed by A.I. Alone]&lt;br /&gt;
* 2025-02: [https://www.nature.com/articles/s41591-024-03456-y GPT-4 assistance for improvement of physician performance on patient care tasks: a randomized controlled trial]&lt;br /&gt;
* 2025-02: [https://www.nature.com/articles/s41591-025-03517-w Artificial intelligence for individualized treatment of persistent atrial fibrillation: a randomized controlled trial]&lt;br /&gt;
* Google AI Clinician:&lt;br /&gt;
** 2024-01: [https://arxiv.org/abs/2401.05654 Towards Conversational Diagnostic AI] ([https://research.google/blog/amie-a-research-ai-system-for-diagnostic-medical-reasoning-and-conversations/ blog]: Articulate Medical Intelligence Explorer, AMIE)&lt;br /&gt;
** 2025-03: [https://www.gstatic.com/amie/towards_conversational_ai_for_disease_management.pdf Towards Conversational AI for Disease Management] ([https://research.google/blog/from-diagnosis-to-treatment-advancing-amie-for-longitudinal-disease-management/ blog])&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.19655 Med-RLVR: Emerging Medical Reasoning from a 3B base model via reinforcement Learning]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.13939 Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models]&lt;br /&gt;
* 2025-04: [https://www.acpjournals.org/doi/10.7326/ANNALS-24-03283 Comparison of Initial Artificial Intelligence (AI) and Final Physician Recommendations in AI-Assisted Virtual Urgent Care Visits]&lt;br /&gt;
* 2025-04: [https://www.nature.com/articles/s41586-025-08866-7?linkId=13898052 Towards conversational diagnostic artificial intelligence]&lt;br /&gt;
* 2025-04: [https://www.nature.com/articles/s41586-025-08869-4?linkId=13898054 Towards accurate differential diagnosis with large language models]&lt;br /&gt;
* 2025-06: [https://www.medrxiv.org/content/10.1101/2025.06.13.25329541v1 Automation of Systematic Reviews with Large Language Models]&lt;br /&gt;
* 2025-06: [https://microsoft.ai/new/the-path-to-medical-superintelligence/ The Path to Medical Superintelligence]&lt;br /&gt;
* 2025-08: [https://www.nature.com/articles/s41591-025-03888-0?utm_source=chatgpt.com A personal health large language model for sleep and fitness coaching]&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.08224 Capabilities of GPT-5 on Multimodal Medical Reasoning]&lt;br /&gt;
&lt;br /&gt;
====Bio====&lt;br /&gt;
* 2025-04: [https://www.virologytest.ai/vct_paper.pdf Virology Capabilities Test (VCT): A Multimodal Virology Q&amp;amp;A Benchmark]&lt;br /&gt;
** Time: [https://time.com/7279010/ai-virus-lab-biohazard-study/ Exclusive: AI Outsmarts Virus Experts in the Lab, Raising Biohazard Fears]&lt;br /&gt;
** AI Frontiers: [https://www.ai-frontiers.org/articles/ais-are-disseminating-expert-level-virology-skills AIs Are Disseminating Expert-Level Virology Skills]&lt;br /&gt;
&lt;br /&gt;
====Therapy====&lt;br /&gt;
* 2025-02: [https://journals.plos.org/mentalhealth/article?id=10.1371/journal.pmen.0000145 When ELIZA meets therapists: A Turing test for the heart and mind]&lt;br /&gt;
* 2025-03: Therabot: [https://ai.nejm.org/doi/full/10.1056/AIoa2400802 Randomized Trial of a Generative AI Chatbot for Mental Health Treatment]&lt;br /&gt;
&lt;br /&gt;
====Financial====&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.17866 Financial Statement Analysis with Large Language Models]&lt;br /&gt;
&lt;br /&gt;
====HR====&lt;br /&gt;
* 2025-08: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5395709 Voice AI in Firms: A Natural Field Experiment on Automated Job Interviews]&lt;br /&gt;
&lt;br /&gt;
==AI improves human work==&lt;br /&gt;
* 2023-07: [https://www.science.org/doi/10.1126/science.adh2586 Experimental evidence on the productivity effects of generative artificial intelligence]&lt;br /&gt;
* 2023-09:  Dell&amp;#039;Acqua, Fabrizio and McFowland III, Edward and Mollick, Ethan R. and Lifshitz-Assaf, Hila and Kellogg, Katherine and Rajendran, Saran and Krayer, Lisa and Candelon, François and Lakhani, Karim R., [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4573321 Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality] (September 15, 2023). Harvard Business School Technology &amp;amp; Operations Mgt. Unit Working Paper No. 24-013, The Wharton School Research Paper [http://dx.doi.org/10.2139/ssrn.4573321 doi: 10.2139/ssrn.4573321]&lt;br /&gt;
* 2023-11: [https://www.nber.org/papers/w31161 Generative AI at Work] (National Bureau of Economic Research)&lt;br /&gt;
* 2023-12: [https://osf.io/hdjpk The Uneven Impact of Generative AI on Entrepreneurial Performance] ([https://doi.org/10.31219/osf.io/hdjpk doi: 10.31219/osf.io/hdjpk])&lt;br /&gt;
* 2023-12: [https://arxiv.org/abs/2312.05481 Artificial Intelligence in the Knowledge Economy]: Non-autonomous AI (chatbots) benefits the least-knowledgeable workers; autonomous agents benefit the most-knowledgeable workers&lt;br /&gt;
* 2024-07: [https://www.microsoft.com/en-us/research/publication/generative-ai-in-real-world-workplaces/ Generative AI in Real-World Workplaces: The Second Microsoft Report on AI and Productivity Research]&lt;br /&gt;
* 2025-03: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5188231 The Cybernetic Teammate: A Field Experiment on Generative AI Reshaping Teamwork and Expertise]&lt;br /&gt;
** 2025-03: Ethan Mollick: [https://www.oneusefulthing.org/p/the-cybernetic-teammate The Cybernetic Teammate]: Having an AI on your team can increase performance, provide expertise, and improve your experience&lt;br /&gt;
* 2025-09: [https://osf.io/preprints/psyarxiv/vbkmt_v1 Quantifying Human-AI Synergy]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.12049 Generative AI and Firm Productivity: Field Experiments in Online Retail]&lt;br /&gt;
* 2025-10: Wharton: [https://knowledge.wharton.upenn.edu/special-report/2025-ai-adoption-report/ 2025 AI Adoption Report] (75% report positive ROI)&lt;br /&gt;
&lt;br /&gt;
===Coding===&lt;br /&gt;
* 2023-02: [https://arxiv.org/abs/2302.06590 The Impact of AI on Developer Productivity: Evidence from GitHub Copilot]&lt;br /&gt;
* 2024-09: Cui, Zheyuan and Demirer, Mert and Jaffe, Sonia and Musolff, Leon and Peng, Sida and Salz, Tobias, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4945566 The Effects of Generative AI on High Skilled Work: Evidence from Three Field Experiments with Software Developers] (September 03, 2024). [http://dx.doi.org/10.2139/ssrn.4945566 doi: 10.2139/ssrn.4945566]&lt;br /&gt;
* 2024-11: Hoffmann, Manuel and Boysel, Sam and Nagle, Frank and Peng, Sida and Xu, Kevin, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5007084 Generative AI and the Nature of Work] (October 27, 2024). Harvard Business School Strategy Unit Working Paper No. 25-021, [http://dx.doi.org/10.2139/ssrn.5007084 doi: 10.2139/ssrn.5007084]&lt;br /&gt;
* 2025-07: METR: [https://arxiv.org/abs/2507.09089 Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity] (AI tools led to lower performance)&lt;br /&gt;
** 2026-02: [https://metr.org/blog/2026-02-24-uplift-update/ We are Changing our Developer Productivity Experiment Design] (AI tools now lead to improved performance)&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.19708 Intuition to Evidence: Measuring AI&amp;#039;s True Impact on Developer Productivity]&lt;br /&gt;
&lt;br /&gt;
===Forecasting===&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.07862 AI-Augmented Predictions: LLM Assistants Improve Human Forecasting Accuracy]&lt;br /&gt;
&lt;br /&gt;
===Finance===&lt;br /&gt;
* 2024-12: [https://dx.doi.org/10.2139/ssrn.5075727 AI, Investment Decisions, and Inequality]: Novices see improvements in investment performance; sophisticated investors see even greater improvements.&lt;br /&gt;
&lt;br /&gt;
===Law===&lt;br /&gt;
* 2025-03: [https://ssrn.com/abstract=5162111 AI-Powered Lawyering: AI Reasoning Models, Retrieval Augmented Generation, and the Future of Legal Practice]&lt;br /&gt;
&lt;br /&gt;
===Science===&lt;br /&gt;
* 2025-12: [https://www.science.org/doi/abs/10.1126/science.adw3000 Scientific production in the era of large language models]&lt;br /&gt;
* 2026-01: [https://www.nature.com/articles/s41586-025-09922-y Artificial intelligence tools expand scientists’ impact but contract science’s focus]&lt;br /&gt;
* 2026-01: [https://www.anthropic.com/news/accelerating-scientific-research How scientists are using Claude to accelerate research and discovery]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.03837 Accelerating Scientific Research with Gemini: Case Studies and Common Techniques]&lt;br /&gt;
* 2026-03: [https://www.pnas.org/doi/10.1073/pnas.2533676123 Expert evaluation of LLM world models: A high-Tc superconductivity case study] ([https://research.google/blog/testing-llms-on-superconductivity-research-questions/?utm_source=twitter&amp;amp;utm_medium=social&amp;amp;utm_campaign=social_post&amp;amp;utm_content=gr-acct blog])&lt;br /&gt;
&lt;br /&gt;
===Medical===&lt;br /&gt;
* 2025-03: [https://www.medrxiv.org/content/10.1101/2025.02.28.25323115v1.full Medical Hallucination in Foundation Models and Their Impact on Healthcare]&lt;br /&gt;
* 2025-03: [https://journals.lww.com/international-journal-of-surgery/fulltext/2025/03000/chatgpt_s_role_in_alleviating_anxiety_in_total.20.aspx ChatGPT’s role in alleviating anxiety in total knee arthroplasty consent process: a randomized controlled trial pilot study]&lt;br /&gt;
* 2025-05: [https://openai.com/index/healthbench/ Introducing HealthBench]&lt;br /&gt;
* 2025-06: [https://www.medrxiv.org/content/10.1101/2025.06.07.25329176v1 From Tool to Teammate: A Randomized Controlled Trial of Clinician-AI Collaborative Workflows for Diagnosis]&lt;br /&gt;
* 2025-06: [https://bmcmededuc.biomedcentral.com/articles/10.1186/s12909-025-07414-1 Iteratively refined ChatGPT outperforms clinical mentors in generating high-quality interprofessional education clinical scenarios: a comparative study]&lt;br /&gt;
* 2025-07: [https://cdn.openai.com/pdf/a794887b-5a77-4207-bb62-e52c900463f1/penda_paper.pdf AI-based Clinical Decision Support for Primary Care: A Real-World Study] ([https://openai.com/index/ai-clinical-copilot-penda-health/ blog])&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.15743 Towards physician-centered oversight of conversational diagnostic AI]&lt;br /&gt;
* 2026-01: [https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(25)02464-X/abstract Interval cancer, sensitivity, and specificity comparing AI-supported mammography screening with standard double reading without AI in the MASAI study: a randomised, controlled, non-inferiority, single-blinded, population-based, screening-accuracy trial]&lt;br /&gt;
&lt;br /&gt;
===Translation===&lt;br /&gt;
* 2025-01: [https://simonwillison.net/2025/Feb/2/workflow-for-translation/ A professional workflow for translation using LLMs] ([https://news.ycombinator.com/item?id=42897856 based on this])&lt;br /&gt;
&lt;br /&gt;
===Customer service===&lt;br /&gt;
* 2023-11: [https://www.nber.org/papers/w31161 Generative AI at Work]: Improvements for workers and clients (though also a ceiling to improvement)&lt;br /&gt;
&lt;br /&gt;
===Creativity===&lt;br /&gt;
* See also: [[AI creativity]]&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.01727 Prompting Diverse Ideas: Increasing AI Idea Variance]&lt;br /&gt;
* 2024-07: [https://www.science.org/doi/10.1126/sciadv.adn5290 Generative AI enhances individual creativity but reduces the collective diversity of novel content]&lt;br /&gt;
* 2024-08: [https://www.nature.com/articles/s41562-024-01953-1 An empirical investigation of the impact of ChatGPT on creativity]&lt;br /&gt;
** 2025-05: Response: [https://www.nature.com/articles/s41562-025-02173-x ChatGPT decreases idea diversity in brainstorming] ([https://www.nature.com/articles/s41562-025-02173-x.epdf?sharing_token=LA9NyDHj7y5WN8zvb5Qm49RgN0jAjWel9jnR3ZoTv0Nl8PrpXFkjZ93XvmUVBgB9Hlfro5Yo6YELr-pRqbpk3HaZENCvsfV8G1kwtTEj2oW1g87dSVT4BzrfCu3jS_606SLzmoDuDiALChY-MozVM4Pj1b4Vdf-YaIH5p3lfAnM%3D pdf])&lt;br /&gt;
** 2025-05: Response: [https://www.nature.com/articles/s41562-025-02195-5 Reply to: ChatGPT decreases idea diversity in brainstorming]&lt;br /&gt;
* 2024-08: [https://doi.org/10.1287/orsc.2023.18430 The Crowdless Future? Generative AI and Creative Problem-Solving]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.03703 Human Creativity in the Age of LLMs]&lt;br /&gt;
* 2024-11: &amp;lt;strike&amp;gt;[https://conference.nber.org/conf_papers/f210475.pdf Artificial Intelligence, Scientific Discovery, and Product Innovation]&amp;lt;/strike&amp;gt;: diffusion model increases &amp;quot;innovation&amp;quot; (patents), boosts the best performers, but also removes some enjoyable tasks.&lt;br /&gt;
** 2025-05: Retraction: [https://economics.mit.edu/news/assuring-accurate-research-record Assuring an accurate research record]&lt;br /&gt;
* 2024-12: [https://doi.org/10.1080/10400419.2024.2440691 Using AI to Generate Visual Art: Do Individual Differences in Creativity Predict AI-Assisted Art Quality?] ([https://osf.io/preprints/psyarxiv/ygzw6 preprint]): shows that more creative humans produce more creative genAI outputs&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.11433 One Does Not Simply Meme Alone: Evaluating Co-Creativity Between LLMs and Humans in the Generation of Humor]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.17241 Generative AI and Creativity: A Systematic Literature Review and Meta-Analysis]&lt;br /&gt;
&lt;br /&gt;
===Equity===&lt;br /&gt;
* 2025-01: [https://ai.nejm.org/doi/full/10.1056/AIp2400889 Using Large Language Models to Promote Health Equity]&lt;br /&gt;
&lt;br /&gt;
==AI worse than humans==&lt;br /&gt;
* 2025-04: [https://spinup-000d1a-wp-offload-media.s3.amazonaws.com/faculty/wp-content/uploads/sites/27/2025/03/AI-debt-collection-20250331.pdf How Good is AI at Twisting Arms? Experiments in Debt Collection]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.18919 Clinical knowledge in LLMs does not translate to human interactions]&lt;br /&gt;
* 2025-05: [https://royalsocietypublishing.org/doi/10.1098/rsos.241776 Generalization bias in large language model summarization of scientific research]&lt;br /&gt;
&lt;br /&gt;
==AI lowers human capability==&lt;br /&gt;
* 2025-07: METR: [https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity] ([https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/ blog], [https://secondthoughts.ai/p/ai-coding-slowdown commentary/analysis])&lt;br /&gt;
** 2026-02: [https://metr.org/blog/2026-02-24-uplift-update/ We are Changing our Developer Productivity Experiment Design] (AI tools [https://x.com/METR_Evals/status/2026355544668385373?s=20 now] lead to improved performance)&lt;br /&gt;
* 2026-02: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6097646 Thinking—Fast, Slow, and Artificial: How AI is Reshaping Human Reasoning and the Rise of Cognitive Surrender]&lt;br /&gt;
&lt;br /&gt;
==Human Perceptions of AI==&lt;br /&gt;
* 2023-09: [https://www.nature.com/articles/d41586-023-02980-0 AI and science: what 1,600 researchers think. A Nature survey finds that scientists are concerned, as well as excited, by the increasing use of artificial-intelligence tools in research.]&lt;br /&gt;
* 2024-11: [https://doi.org/10.1016/S2589-7500(24)00202-4 Attitudes and perceptions of medical researchers towards the use of artificial intelligence chatbots in the scientific process: an international cross-sectional survey] (Nature commentary: [https://www.nature.com/articles/s41592-024-02369-5 Quest for AI literacy])&lt;br /&gt;
* 2025-03: [https://www.arxiv.org/abs/2503.16458 Users Favor LLM-Generated Content -- Until They Know It&amp;#039;s AI]&lt;br /&gt;
&lt;br /&gt;
===AI passes Turing Test===&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Text Dialog&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* 2023-05: [https://arxiv.org/abs/2305.20010 Human or Not? A Gamified Approach to the Turing Test]&lt;br /&gt;
* 2023-10: [https://arxiv.org/abs/2310.20216 Does GPT-4 pass the Turing test?]&lt;br /&gt;
* 2024-05: [https://arxiv.org/abs/2405.08007 People cannot distinguish GPT-4 from a human in a Turing test]&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.08853 GPT-4 is judged more human than humans in displaced and inverted Turing tests]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.23674 Large Language Models Pass the Turing Test]&lt;br /&gt;
* 2025-04: [https://www.sciencedirect.com/science/article/abs/pii/S0022103117303980 A Minimal Turing Test]&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Art&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* 2024-11: [https://www.astralcodexten.com/p/how-did-you-do-on-the-ai-art-turing How Did You Do On The AI Art Turing Test?] Differentiation was only slightly above random (60%). AI art was often ranked higher than human-made.&lt;br /&gt;
* 2024-11: [https://doi.org/10.1038/s41598-024-76900-1 AI-generated poetry is indistinguishable from human-written poetry and is rated more favorably]&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.25601 Echoes of Humanity: Exploring the Perceived Humanness of AI Music]&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Imagery&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* 2026-02: [https://bpspsychub.onlinelibrary.wiley.com/doi/10.1111/bjop.70063 Too good to be true: Synthetic AI faces are more average than real faces and super-recognizers know it]&lt;br /&gt;
** [https://www.unsw.edu.au/newsroom/news/2026/02/humans-overconfident-telling-AI-faces-real-faces-people-fake People are overconfident about spotting AI faces, study finds]&lt;br /&gt;
&lt;br /&gt;
=Uptake=&lt;br /&gt;
* 2023-07: [https://doi.org/10.9734/ajrcos/2023/v16i4392 ChatGPT: Early Adopters, Teething Issues and the Way Forward]&lt;br /&gt;
* 2024-03: [https://arxiv.org/abs/2403.07183 Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews]&lt;br /&gt;
* 2024-05:  Humlum, Anders and Vestergaard, Emilie, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4827166 The Adoption of ChatGPT]. IZA Discussion Paper No. 16992 [http://dx.doi.org/10.2139/ssrn.4827166 doi: 10.2139/ssrn.4827166]&lt;br /&gt;
* 2024-06: Kellogg, Katherine and Lifshitz-Assaf, Hila and Randazzo, Steven and Mollick, Ethan R. and Dell&amp;#039;Acqua, Fabrizio and McFowland III, Edward and Candelon, Francois and Lakhani, Karim R., [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4857373 Don&amp;#039;t Expect Juniors to Teach Senior Professionals to Use Generative AI: Emerging Technology Risks and Novice AI Risk Mitigation Tactics] (June 03, 2024). Harvard Business School Technology &amp;amp; Operations Mgt. Unit Working Paper No. 24-074, The Wharton School Research Paper [http://dx.doi.org/10.2139/ssrn.4857373 doi: 10.2139/ssrn.4857373]&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.07016 Delving into ChatGPT usage in academic writing through excess vocabulary]&lt;br /&gt;
* 2024-09: [https://static1.squarespace.com/static/60832ecef615231cedd30911/t/66f0c3fbabdc0a173e1e697e/1727054844024/BBD_GenAI_NBER_Sept2024.pdf The Rapid Adoption of Generative AI]&lt;br /&gt;
* 2024-10: [https://ai.wharton.upenn.edu/focus-areas/human-technology-interaction/2024-ai-adoption-report/ Growing Up: Navigating Generative AI’s Early Years – AI Adoption Report] ([https://ai.wharton.upenn.edu/wp-content/uploads/2024/10/AI-Report_Executive-Summary.pdf executive summary], [https://ai.wharton.upenn.edu/wp-content/uploads/2024/10/AI-Report_Full-Report.pdf full report])&lt;br /&gt;
** 72% of leaders use genAI at least once a week (cf. 23% in 2023); 90% agree AI enhances skills (cf. 80% in 2023)&lt;br /&gt;
** Spending on genAI is up 130% (most companies plan to invest going forward)&lt;br /&gt;
* 2024-12: [https://www.pnas.org/doi/10.1073/pnas.2414972121 The unequal adoption of ChatGPT exacerbates existing inequalities among workers]&lt;br /&gt;
** Higher adoption among young and less experienced&lt;br /&gt;
** Lower adoption among women and lower-earning workers&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.09747 The Widespread Adoption of Large Language Model-Assisted Writing Across Society]: 10-25% adoption across a range of contexts&lt;br /&gt;
* 2025-02: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5078805 Local Heterogeneity in Artificial Intelligence Jobs Over Time and Space]&lt;br /&gt;
* 2025-04: [https://andreyfradkin.com/assets/demandforllm.pdf Demand for LLMs: Descriptive Evidence on Substitution, Market Expansion, and Multihoming]&lt;br /&gt;
* 2025-05: [https://civicscience.com/chatgpt-is-still-leading-the-ai-wars-but-google-gemini-is-gaining-ground/ ChatGPT Is Still Leading the AI Wars but Google Gemini Is Gaining Ground]&lt;br /&gt;
* 2025-05: [https://www.nber.org/papers/w33777 Large Language Models, Small Labor Market Effects]&lt;br /&gt;
** Significant uptake, but very little economic impact so far&lt;br /&gt;
* 2025-05: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5136877 The Labor Market Effects of Generative Artificial Intelligence]&lt;br /&gt;
** US worker usage of AI is increasing rapidly: 30% in 2024-12; 40% in 2025-05&lt;br /&gt;
* 2025-05: [https://www.bondcap.com/report/pdf/Trends_Artificial_Intelligence.pdf Trends – Artificial Intelligence]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.08945 Who is using AI to code? Global diffusion and impact of generative AI]&lt;br /&gt;
* 2025-06: [https://www.iconiqcapital.com/growth/reports/2025-state-of-ai 2025 State of AI Report: The Builder’s Playbook, A Practical Roadmap for AI Innovation]&lt;br /&gt;
* 2025-07: Epoch AI: [https://epochai.substack.com/p/after-the-chatgpt-moment After the ChatGPT Moment: Measuring AI’s Adoption. How quickly has AI been diffusing through the economy?]&lt;br /&gt;
* 2025-07: Pew Research: [https://www.pewresearch.org/short-reads/2025/06/25/34-of-us-adults-have-used-chatgpt-about-double-the-share-in-2023/ 34% of U.S. adults have used ChatGPT, about double the share in 2023]&lt;br /&gt;
* 2025-12: Epoch AI: [https://epoch.ai/data/polling Polling on AI Usage]&lt;br /&gt;
&lt;br /&gt;
==Usage By==&lt;br /&gt;
* 2026-02: [https://www.nber.org/papers/w34813 The Politics of AI]&lt;br /&gt;
&lt;br /&gt;
==Usage For==&lt;br /&gt;
* 2024-12: [https://assets.anthropic.com/m/7e1ab885d1b24176/original/Clio-Privacy-Preserving-Insights-into-Real-World-AI-Use.pdf Clio: A system for privacy-preserving insights into real-world AI use] (Anthropic [https://www.anthropic.com/research/clio Clio])&lt;br /&gt;
* 2025-03: [https://learn.filtered.com/hubfs/The%202025%20Top-100%20Gen%20AI%20Use%20Case%20Report.pdf How People are Really Using Generative AI Now] ([https://hbr.org/2025/04/how-people-are-really-using-gen-ai-in-2025 writeup])&lt;br /&gt;
* 2025-04: [https://www.anthropic.com/news/anthropic-education-report-how-university-students-use-claude Anthropic Education Report: How University Students Use Claude]&lt;br /&gt;
* 2025-09: [https://www.anthropic.com/research/economic-index-geography Anthropic Economic Index: Tracking AI&amp;#039;s role in the US and global economy]&lt;br /&gt;
* 2025-09: [https://cdn.openai.com/pdf/a253471f-8260-40c6-a2cc-aa93fe9f142e/economic-research-chatgpt-usage-paper.pdf How People Use ChatGPT] (OpenAI)&lt;br /&gt;
&lt;br /&gt;
==Hiding Usage==&lt;br /&gt;
* 2025-05: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5232910 Underreporting of AI use: The role of social desirability bias]&lt;br /&gt;
&lt;br /&gt;
=Societal Effects/Transformations=&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.01754 Empirical evidence of Large Language Model&amp;#039;s influence on human spoken communication]&lt;br /&gt;
* 2025-09: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5425555 Generative AI as Seniority-Biased Technological Change: Evidence from U.S. Résumé and Job Posting Data]&lt;br /&gt;
&lt;br /&gt;
=Psychological Impact=&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.16628 The Impact of Artificial Intelligence on Human Thought]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.15352 People readily follow personal advice from AI but it does not improve their well-being]&lt;br /&gt;
&lt;br /&gt;
==Human Sentiment towards AI==&lt;br /&gt;
* 2025-04: Pew Research: [https://www.pewresearch.org/internet/2025/04/03/how-the-us-public-and-ai-experts-view-artificial-intelligence/ How the U.S. Public and AI Experts View Artificial Intelligence]&lt;br /&gt;
* 2025-10: Pew Research: [https://www.pewresearch.org/global/2025/10/15/how-people-around-the-world-view-ai/ How People Around the World View AI: More are concerned than excited about its use, and more trust their own country and the EU to regulate it than trust the U.S. or China]&lt;br /&gt;
* 2025-12: [https://www.edelman.com/sites/g/files/aatuss191/files/2025-11/2025%20Edelman%20Trust%20Barometer%20Flash%20Poll%20Trust%20and%20Artificial%20Intelligence%20at%20a%20Crossroads%201.pdf 2025 Edelman Trust Barometer]&lt;br /&gt;
* 2025-12: [https://navigatorresearch.org/views-of-ai-and-data-centers/ Polling - Views of AI and data centers]&lt;br /&gt;
* 2026-03: [https://osf.io/preprints/psyarxiv/5mwre_v9 The Moralization of Artificial Intelligence]&lt;br /&gt;
&lt;br /&gt;
==AI Persuasion of Humans==&lt;br /&gt;
(AI can update beliefs, change opinions, tackle conspiracy theories, etc.)&lt;br /&gt;
* 2022-11: [https://arxiv.org/abs/2211.15006 Fine-tuning language models to find agreement among humans with diverse preferences]&lt;br /&gt;
* 2024-08: [https://arxiv.org/abs/2408.04681 Conversational AI Powered by Large Language Models Amplifies False Memories in Witness Interviews]&lt;br /&gt;
* 2024-04: [https://osf.io/preprints/psyarxiv/h7n8u_v1 Just the facts: How dialogues with AI reduce conspiracy beliefs]&lt;br /&gt;
* 2024-09: [https://www.science.org/doi/10.1126/science.adq1814 Durably reducing conspiracy beliefs through dialogues with AI]&lt;br /&gt;
* 2025-03: [https://www.pnas.org/doi/10.1073/pnas.2413443122 Scaling language model size yields diminishing returns for single-message political persuasion]&lt;br /&gt;
* 2025-04: [https://drive.google.com/file/d/1Eo4SHrKGPErTzL1t_QmQhfZGU27jKBjx/edit Can AI Change Your View? Evidence from a Large-Scale Online Field Experiment]&lt;br /&gt;
** [https://www.404media.co/researchers-secretly-ran-a-massive-unauthorized-ai-persuasion-experiment-on-reddit-users/ Researchers Secretly Ran a Massive, Unauthorized AI Persuasion Experiment on Reddit Users]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.09662 Large Language Models Are More Persuasive Than Incentivized Human Persuaders]&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.13919 The Levers of Political Persuasion with Conversational AI]&lt;br /&gt;
* 2025-12: [https://www.science.org/doi/10.1126/science.aea3884 The levers of political persuasion with conversational artificial intelligence]&lt;br /&gt;
* 2025-12: [https://www.nature.com/articles/s41586-025-09771-9 Persuading voters using human–artificial intelligence dialogues]&lt;br /&gt;
&lt;br /&gt;
==AI Effects on Human Psychology==&lt;br /&gt;
===Human well-being===&lt;br /&gt;
* 2024-01: [https://www.nature.com/articles/s44184-023-00047-6 Loneliness and suicide mitigation for students using GPT3-enabled chatbots]&lt;br /&gt;
* 2025-03: [https://cdn.openai.com/papers/15987609-5f71-433c-9972-e91131f399a1/openai-affective-use-study.pdf Investigating Affective Use and Emotional Well-being on ChatGPT]&lt;br /&gt;
* 2025-03: [https://dam-prod2.media.mit.edu/x/2025/03/21/Randomized_Control_Study_on_Chatbot_Psychosocial_Effect.pdf How AI and Human Behaviors Shape Psychosocial Effects of Chatbot Use: A Longitudinal Controlled Study]&lt;br /&gt;
&lt;br /&gt;
===Counter loneliness===&lt;br /&gt;
* 2023-11: [https://arxiv.org/abs/2311.10599 Chatbots as social companions: How people perceive consciousness, human likeness, and social health benefits in machines]&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.19096 AI Companions Reduce Loneliness]&lt;br /&gt;
* 2025-03: [https://dam-prod2.media.mit.edu/x/2025/03/21/Randomized_Control_Study_on_Chatbot_Psychosocial_Effect.pdf How AI and Human Behaviors Shape Psychosocial Effects of Chatbot Use: A Longitudinal Controlled Study]&lt;br /&gt;
* 2025-06: Anthropic: [https://www.anthropic.com/news/how-people-use-claude-for-support-advice-and-companionship How People Use Claude for Support, Advice, and Companionship]&lt;br /&gt;
&lt;br /&gt;
===Human mental abilities (creativity, learning)===&lt;br /&gt;
* 2025-03: [https://www.microsoft.com/en-us/research/publication/the-impact-of-generative-ai-on-critical-thinking-self-reported-reductions-in-cognitive-effort-and-confidence-effects-from-a-survey-of-knowledge-workers/ The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence Effects From a Survey of Knowledge Workers]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.08872 Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task]&lt;br /&gt;
&lt;br /&gt;
=Simulate Humans=&lt;br /&gt;
* See also: [[Human brain]]&lt;br /&gt;
&lt;br /&gt;
==Sociology==&lt;br /&gt;
* 2021-10: [https://www.doi.org/10.1007/s10588-021-09351-y Explaining and predicting human behavior and social dynamics in simulated virtual worlds: reproducibility, generalizability, and robustness of causal discovery methods]&lt;br /&gt;
* 2023-12: Google: [https://arxiv.org/abs/2312.03664 Generative agent-based modeling with actions grounded in physical, social, or digital space using Concordia]&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.12620 Are Large Language Models (LLMs) Good Social Predictors?]&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.11794 Automated Social Science: Language Models as Scientist and Subjects]&lt;br /&gt;
* 2024-07: [https://academic.oup.com/pnasnexus/article/3/7/pgae245/7712371 Perils and opportunities in using large language models in psychological research]&lt;br /&gt;
* 2024-08: [https://samim.io/dl/Predicting%20results%20of%20social%20science%20experiments%20using%20large%20language%20models.pdf Predicting Results of Social Science Experiments Using Large Language Models]&lt;br /&gt;
* 2024-10: [https://www.pnas.org/doi/10.1073/pnas.2407639121 Large Language Models based on historical text could offer informative tools for behavioral science]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.02234 LLM Social Simulations Are a Promising Research Method]&lt;br /&gt;
* 2025-04: [https://www.nber.org/papers/w33662 Measuring Human Leadership Skills with AI Agents]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.10157 SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users]&lt;br /&gt;
* 2025-07: [https://www.nature.com/articles/s41586-025-09215-4 A foundation model to predict and capture human cognition] ([https://marcelbinz.github.io/centaur code])&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.15815 LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra]&lt;br /&gt;
* 2025-09: [https://benjaminmanning.io/files/optimize.pdf General Social Agents]&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2506.06958 Simulating Society Requires Simulating Thought]&lt;br /&gt;
&lt;br /&gt;
==Theory of Mind==&lt;br /&gt;
* 2025-08: [https://www.nature.com/articles/s44387-025-00031-9 How large language models encode theory-of-mind: a study on sparse parameter patterns]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2509.22887 Infusing Theory of Mind into Socially Intelligent LLM Agents]&lt;br /&gt;
&lt;br /&gt;
==Humanlike Vibes==&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.20525 The Xeno Sutra: Can Meaning and Value be Ascribed to an AI-Generated &amp;quot;Sacred&amp;quot; Text?]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.08338 LLMs Reproduce Human Purchase Intent via Semantic Similarity Elicitation of Likert Ratings]&lt;br /&gt;
&lt;br /&gt;
==Skeptical==&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.06950 Large Language Models Do Not Simulate Human Psychology]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2601.16130 Replicating Human Motivated Reasoning Studies with LLMs]&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
* [https://www.google.com/books/edition/_/cKnYEAAAQBAJ?hl=en&amp;amp;gbpv=1&amp;amp;pg=PA2 UNESCO. Guidance for Generative AI in Education and Research]&lt;br /&gt;
* [[AI]]&lt;br /&gt;
** [[AI predictions]]&lt;/div&gt;</summary>
		<author><name>KevinYager</name></author>
		
	</entry>
	<entry>
		<id>http://gisaxs.com/index.php?title=AI_understanding&amp;diff=8734</id>
		<title>AI understanding</title>
		<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php?title=AI_understanding&amp;diff=8734"/>
		<updated>2026-03-18T12:52:09Z</updated>

		<summary type="html">&lt;p&gt;KevinYager: /* Failure Modes */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Interpretability=&lt;br /&gt;
* 2017-01: [https://arxiv.org/abs/1704.01444 Learning to Generate Reviews and Discovering Sentiment]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.11639 Neural Interpretable Reasoning]&lt;br /&gt;
&lt;br /&gt;
==Concepts==&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.20938 Towards Understanding the Nature of Attention with Low-Rank Sparse Decomposition] ([https://github.com/OpenMOSS/Lorsa code])&lt;br /&gt;
* 2025-08: [https://transformer-circuits.pub/2025/attention-qk/index.html Tracing Attention Computation Through Feature Interactions]&lt;br /&gt;
&lt;br /&gt;
==Mechanistic Interpretability==&lt;br /&gt;
* 2020-03: OpenAI: [https://distill.pub/2020/circuits/zoom-in/ Zoom In: An Introduction to Circuits]&lt;br /&gt;
* 2021-12: Anthropic: [https://transformer-circuits.pub/2021/framework/index.html A Mathematical Framework for Transformer Circuits]&lt;br /&gt;
* 2022-09: [https://arxiv.org/abs/2211.00593 Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 Small]&lt;br /&gt;
* 2023-01: [https://arxiv.org/abs/2301.05062 Tracr: Compiled Transformers as a Laboratory for Interpretability] ([https://github.com/google-deepmind/tracr code])&lt;br /&gt;
* 2024-07: Anthropic: [https://transformer-circuits.pub/2024/july-update/index.html Circuits Update]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.14926 Interpretability in Parameter Space: Minimizing Mechanistic Description Length with Attribution-based Parameter Decomposition] ([https://www.alignmentforum.org/posts/EPefYWjuHNcNH4C7E/attribution-based-parameter-decomposition blog post])&lt;br /&gt;
* 2025-01: Review: [https://arxiv.org/abs/2501.16496 Open Problems in Mechanistic Interpretability]&lt;br /&gt;
* 2025-03: Anthropic: [https://www.anthropic.com/research/tracing-thoughts-language-model Tracing the thoughts of a large language model]&lt;br /&gt;
** [https://transformer-circuits.pub/2025/attribution-graphs/methods.html Circuit Tracing: Revealing Computational Graphs in Language Models]&lt;br /&gt;
** [https://transformer-circuits.pub/2025/attribution-graphs/biology.html On the Biology of a Large Language Model]&lt;br /&gt;
* 2025-11: OpenAI: [https://cdn.openai.com/pdf/41df8f28-d4ef-43e9-aed2-823f9393e470/circuit-sparsity-paper.pdf Weight-sparse transformers have interpretable circuits] ([https://openai.com/index/understanding-neural-networks-through-sparse-circuits/ blog])&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.13548 Patterning: The Dual of Interpretability]&lt;br /&gt;
&lt;br /&gt;
==Semanticity==&lt;br /&gt;
* 2023-09: [https://arxiv.org/abs/2309.08600 Sparse Autoencoders Find Highly Interpretable Features in Language Models]&lt;br /&gt;
* Anthropic monosemanticity interpretation of LLM features:&lt;br /&gt;
** 2023-10: [https://transformer-circuits.pub/2023/monosemantic-features/index.html Towards Monosemanticity: Decomposing Language Models With Dictionary Learning]&lt;br /&gt;
** 2024-05: [https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet]&lt;br /&gt;
* 2024-06: OpenAI: [https://arxiv.org/abs/2406.04093 Scaling and evaluating sparse autoencoders]&lt;br /&gt;
* 2024-08: [https://www.alignmentforum.org/posts/TMAmHh4DdMr4nCSr5/showing-sae-latents-are-not-atomic-using-meta-saes Showing SAE Latents Are Not Atomic Using Meta-SAEs] ([https://metasae.streamlit.app/?page=Feature+Explorer&amp;amp;feature=11329 demo])&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.08201 Efficient Dictionary Learning with Switch Sparse Autoencoders] ([https://github.com/amudide/switch_sae code]) More efficient SAE generation&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.14670 Decomposing The Dark Matter of Sparse Autoencoders] ([https://github.com/JoshEngels/SAE-Dark-Matter code]) Shows that SAE errors are predictable&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.13928 Automatically Interpreting Millions of Features in Large Language Models]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.21331 Beyond Interpretability: The Gains of Feature Monosemanticity on Model Robustness]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.04139 Monet: Mixture of Monosemantic Experts for Transformers]&lt;br /&gt;
* 2024-12: [https://www.lesswrong.com/posts/zbebxYCqsryPALh8C/matryoshka-sparse-autoencoders Matryoshka Sparse Autoencoders]&lt;br /&gt;
* 2024-12: [https://www.alignmentforum.org/posts/rKM9b6B2LqwSB5ToN/learning-multi-level-features-with-matryoshka-saes Learning Multi-Level Features with Matryoshka SAEs]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.19406 Low-Rank Adapting Models for Sparse Autoencoders]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.03714 Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.06755 Sparse Autoencoders for Scientifically Rigorous Interpretation of Vision Models]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.00177 Steering Large Language Model Activations in Sparse Spaces]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.01776 Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.01824 From superposition to sparse codes: interpretable representations in neural networks]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.18878 I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.20063 SAEs Are Good for Steering -- If You Select the Right Features]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.15679 Dense SAE Latents Are Features, Not Bugs]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.20790 Stochastic Parameter Decomposition] ([https://github.com/goodfire-ai/spd code], [https://www.goodfire.ai/papers/stochastic-param-decomp blog])&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.10003 Semantic Structure in Large Language Model Embeddings]&lt;br /&gt;
&lt;br /&gt;
===Counter-Results===&lt;br /&gt;
* 2020-10: [https://arxiv.org/abs/2010.12016 Towards falsifiable interpretability research]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.16615 Sparse Autoencoders Trained on the Same Data Learn Different Features]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.17148 AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.17727 Sparse Autoencoders Can Interpret Randomly Initialized Transformers]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.04878 Sparse Autoencoders Do Not Find Canonical Units of Analysis]&lt;br /&gt;
* 2025-03: [https://www.alignmentforum.org/posts/4uXCAJNuPKtKBsi28/ Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research]&lt;br /&gt;
&lt;br /&gt;
==Meta-cognition==&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.13763 Language Models Are Capable of Metacognitive Monitoring and Control of Their Internal Activations]&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.15674 Activation Oracles: Training and Evaluating LLMs as General-Purpose Activation Explainers]&lt;br /&gt;
&lt;br /&gt;
==Coding Models==&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Sparse Autoencoders&amp;#039;&amp;#039;&amp;#039;: See Semanticity.&lt;br /&gt;
* [https://github.com/saprmarks/dictionary_learning dictionary_learning]&lt;br /&gt;
* [https://transformer-circuits.pub/2024/jan-update/index.html#predict-future Predicting Future Activations]&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.11944 Transcoders Find Interpretable LLM Feature Circuits]&lt;br /&gt;
* 2024-10: [https://transformer-circuits.pub/2024/crosscoders/index.html Sparse Crosscoders for Cross-Layer Features and Model Diffing]&lt;br /&gt;
&lt;br /&gt;
==Reward Functions==&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.12491 Insights from the Inverse: Reconstructing LLM Training Goals Through Inverse RL]&lt;br /&gt;
&lt;br /&gt;
==Symbolic and Notation==&lt;br /&gt;
* [https://transformer-circuits.pub/2021/framework/index.html A Mathematical Framework for Transformer Circuits]&lt;br /&gt;
* [https://www.arxiv.org/abs/2407.09468 Beyond Euclid: An Illustrated Guide to Modern Machine Learning with Geometric, Topological, and Algebraic Structures]&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.02423 On the Anatomy of Attention]: Introduces a category-theoretic diagrammatic formalism for deep-learning architectures&lt;br /&gt;
* 2024-11: [https://x.com/vtabbott_/status/1860268276569506250 diagrams to represent algorithms]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.03317 FlashAttention on a Napkin: A Diagrammatic Approach to Deep Learning IO-Awareness]&lt;br /&gt;
&lt;br /&gt;
==Mathematical==&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.13762 Unveiling the Hidden Structure of Self-Attention via Kernel Principal Component Analysis]&lt;br /&gt;
&lt;br /&gt;
==Geometric==&lt;br /&gt;
* 2023-11: [https://arxiv.org/abs/2311.03658 The Linear Representation Hypothesis and the Geometry of Large Language Models]&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.01506 The Geometry of Categorical and Hierarchical Concepts in Large Language Models]&lt;br /&gt;
** Natural hierarchies of concepts (which occur throughout natural language, and especially in scientific ontologies) are represented in the model&amp;#039;s internal vector space as polytopes that can be decomposed into simplexes of mutually-exclusive categories.&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.02678 Reasoning in Large Language Models: A Geometric Perspective]&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.17592 Deep Manifold Part 1: Anatomy of Neural Network Manifold]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.19750 The Geometry of Concepts: Sparse Autoencoder Feature Structure]&lt;br /&gt;
** Tegmark et al. report multi-scale structure: 1) “atomic” small-scale, 2) “brain” intermediate-scale, and 3) “galaxy” large-scale&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.08009 The Geometry of Prompting: Unveiling Distinct Mechanisms of Task Adaptation in Language Models]&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.10003 Semantic Structure in Large Language Model Embeddings]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.09782 The Geometry of Reasoning: Flowing Logics in Representation Space]&lt;br /&gt;
* 2025-10: [https://transformer-circuits.pub/2025/linebreaks/index.html When Models Manipulate Manifolds: The Geometry of a Counting Task]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.26745 Deep sequence models tend to memorize geometrically; it is unclear why]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.15029 Symmetry in language statistics shapes the geometry of model representations]&lt;br /&gt;
* 2026-03: [https://arxiv.org/abs/2603.01968 Intrinsic Task Symmetry Drives Generalization in Algorithmic Tasks]&lt;br /&gt;
&lt;br /&gt;
==Topography==&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.16396 TopoNets: High Performing Vision and Language Models with Brain-Like Topography]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2601.06002 The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning]&lt;br /&gt;
&lt;br /&gt;
==Challenges==&lt;br /&gt;
* 2023-07: [https://arxiv.org/abs/2307.13702 Measuring Faithfulness in Chain-of-Thought Reasoning] ([https://x.com/davidad/status/1839641113432305790 roughly] proves that sufficiently large models do not generate CoT that actually captures their internal reasoning)&lt;br /&gt;
&lt;br /&gt;
[[Image:GYe31yXXQAABwaZ.jpeg|300px]]&lt;br /&gt;
&lt;br /&gt;
=Heuristic Understanding=&lt;br /&gt;
* 2022-09: Janus: [https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators Simulators]&lt;br /&gt;
&lt;br /&gt;
==Emergent Internal Model Building==&lt;br /&gt;
* 2023-07: [https://arxiv.org/abs/2307.15936 A Theory for Emergence of Complex Skills in Language Models]&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.19370v1 Emergence of Hidden Capabilities: Exploring Learning Dynamics in Concept Space]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.01622 General agents contain world models]&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.20328 Video models are zero-shot learners and reasoners]&lt;br /&gt;
&lt;br /&gt;
===Semantic Directions===&lt;br /&gt;
Directions, e.g.: f(king)-f(man)+f(woman)=f(queen) or f(sushi)-f(Japan)+f(Italy)=f(pizza)&lt;br /&gt;
* [https://arxiv.org/abs/1301.3781 Efficient Estimation of Word Representations in Vector Space]&lt;br /&gt;
* [https://aclanthology.org/N13-1090/ Linguistic Regularities in Continuous Space Word Representations]&lt;br /&gt;
* [https://aclanthology.org/C16-1332 Word Embeddings, Analogies, and Machine Learning: Beyond king - man + woman = queen]&lt;br /&gt;
* [https://aclanthology.org/D14-1162/ Glove: Global vectors for word representation]&lt;br /&gt;
* [https://doi.org/10.1109/BigData.2015.7364114 Using Word2Vec to process big text data]&lt;br /&gt;
* [https://arxiv.org/abs/2310.06824 The geometry of truth: Emergent linear structure in large language model representations of true/false datasets] (true/false)&lt;br /&gt;
* [https://arxiv.org/abs/2403.10381 Monotonic Representation of Numeric Properties in Language Models] (numeric directions)&lt;br /&gt;
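The analogy arithmetic above can be sanity-checked numerically. The following is a minimal sketch using hand-crafted toy vectors (invented purely for illustration; real word embeddings are learned, high-dimensional, and satisfy such relations only approximately):

```python
import numpy as np

# Toy embedding table; axes loosely encode (royalty, gender).
# These vectors are invented for illustration, not taken from any trained model.
emb = {
    "king":  np.array([1.0,  1.0]),
    "queen": np.array([1.0, -1.0]),
    "man":   np.array([0.0,  1.0]),
    "woman": np.array([0.0, -1.0]),
}

def nearest(vec, exclude=()):
    """Return the vocabulary word whose embedding has the highest cosine similarity to vec."""
    def cos(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
    candidates = [(w, cos(vec, v)) for w, v in emb.items() if w not in exclude]
    return max(candidates, key=lambda t: t[1])[0]

# f(king) - f(man) + f(woman) lands nearest f(queen)
target = emb["king"] - emb["man"] + emb["woman"]
print(nearest(target, exclude=("king", "man", "woman")))  # -> queen
```

In a trained model the analogy query is conventionally evaluated the same way: excluding the input words and taking the cosine-nearest neighbor of the offset vector.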
Task vectors:&lt;br /&gt;
* [https://arxiv.org/abs/2310.15213 Function Vectors in Large Language Models]&lt;br /&gt;
* [https://arxiv.org/abs/2310.15916 In-context learning creates task vectors]&lt;br /&gt;
* [https://www.alignmentforum.org/posts/5FGXmJ3wqgGRcbyH7/extracting-sae-task-features-for-in-context-learning Extracting sae task features for in-context learning]&lt;br /&gt;
* [https://arxiv.org/abs/2412.12276 Emergence of Abstractions: Concept Encoding and Decoding Mechanism for In-Context Learning in Transformers]&lt;br /&gt;
Reasoning:&lt;br /&gt;
* [https://openreview.net/forum?id=OwhVWNOBcz Understanding Reasoning in Thinking Language Models via Steering Vectors]&lt;br /&gt;
&lt;br /&gt;
===Feature Geometry Reproduces Problem-space===&lt;br /&gt;
* [https://arxiv.org/abs/2210.13382 Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task] (Othello)&lt;br /&gt;
* [https://arxiv.org/abs/2309.00941 Emergent linear representations in world models of self-supervised sequence models] (Othello)&lt;br /&gt;
* [https://arxiv.org/abs/2211.15661 What learning algorithm is in-context learning? Investigations with linear models]&lt;br /&gt;
* [https://doi.org/10.1038/s41562-023-01659-w Emergent analogical reasoning in large language models]&lt;br /&gt;
* [https://arxiv.org/abs/2310.02207 Language Models Represent Space and Time] (Maps of world, US)&lt;br /&gt;
* [https://arxiv.org/abs/2405.14860 Not All Language Model Features Are Linear] (Days of week form ring, etc.)&lt;br /&gt;
* [https://arxiv.org/abs/2406.03689 Evaluating the World Model Implicit in a Generative Model] (Map of Manhattan)&lt;br /&gt;
* [https://iopscience.iop.org/article/10.1088/1748-9326/ad2891 Reliable precipitation nowcasting using probabilistic diffusion models]. Generated precipitation maps are predictive of actual future weather, implying the model has learned scientifically relevant dynamics.&lt;br /&gt;
* [https://arxiv.org/abs/2405.07987 The Platonic Representation Hypothesis]: Different models (including across modalities) are converging to a consistent world model.&lt;br /&gt;
* [https://arxiv.org/abs/2501.00070 ICLR: In-Context Learning of Representations]&lt;br /&gt;
* [https://arxiv.org/abs/2502.00873 Language Models Use Trigonometry to Do Addition]: Numbers are arranged on a helix to enable addition&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.15029 Symmetry in language statistics shapes the geometry of model representations]&lt;br /&gt;
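The trigonometry-for-addition result above can be illustrated with a single-frequency toy version (the paper reports a generalized helix with several periods; this sketch with one assumed period T=10 is a deliberate simplification): encode each number as an angle on a circle, and addition becomes rotation.

```python
import numpy as np

T = 10  # toy period; the paper reports multiple periods combined in a helix

def encode(a):
    """Represent integer a as a point on a circle of period T."""
    theta = 2 * np.pi * a / T
    return np.array([np.cos(theta), np.sin(theta)])

def add_by_rotation(a, b):
    """'Clock'-style addition: rotate encode(a) by the angle corresponding to b."""
    theta_b = 2 * np.pi * b / T
    rot = np.array([[np.cos(theta_b), -np.sin(theta_b)],
                    [np.sin(theta_b),  np.cos(theta_b)]])
    return rot @ encode(a)

# Rotating the encoding of 3 by the angle of 4 lands exactly on encode(7)
print(np.allclose(add_by_rotation(3, 4), encode(7)))  # -> True
```

Because the representation is periodic, the same rotation trick automatically computes addition modulo T, which is why a model combining several periods can reconstruct exact sums.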
&lt;br /&gt;
===Capturing Physics===&lt;br /&gt;
* 2020-09: [https://arxiv.org/abs/2009.08292 Learning to Identify Physical Parameters from Video Using Differentiable Physics]&lt;br /&gt;
* 2022-07: [https://arxiv.org/abs/2207.00419 Self-Supervised Learning for Videos: A Survey]&lt;br /&gt;
* 2025-02: Fair at Meta: [https://arxiv.org/abs/2502.11831 Intuitive physics understanding emerges from self-supervised pretraining on natural videos]&lt;br /&gt;
&lt;br /&gt;
===Theory of Mind===&lt;br /&gt;
* [https://arxiv.org/abs/2302.02083 Evaluating Large Language Models in Theory of Mind Tasks]&lt;br /&gt;
* [https://arxiv.org/abs/2410.13787 Looking Inward: Language Models Can Learn About Themselves by Introspection]&lt;br /&gt;
* [https://arxiv.org/abs/2501.11120 Tell me about yourself: LLMs are aware of their learned behaviors]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2509.22887 Infusing Theory of Mind into Socially Intelligent LLM Agents]&lt;br /&gt;
&lt;br /&gt;
===Skeptical===&lt;br /&gt;
* 2025-01: [https://www.arxiv.org/abs/2501.09038 Do generative video models learn physical principles from watching videos?] ([https://physics-iq.github.io/ project], [https://github.com/google-deepmind/physics-IQ-benchmark code])&lt;br /&gt;
* 2025-06: [https://machinelearning.apple.com/research/illusion-of-thinking The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.21521 Potemkin Understanding in Large Language Models]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.21876 Do Vision-Language Models Have Internal World Models? Towards an Atomic Evaluation]&lt;br /&gt;
&lt;br /&gt;
==Information Processing==&lt;br /&gt;
* 2019-03: [https://arxiv.org/abs/1903.05789 Diagnosing and Enhancing VAE Models]&lt;br /&gt;
* 2021-03: [https://arxiv.org/abs/2103.05247 Pretrained Transformers as Universal Computation Engines]&lt;br /&gt;
* 2022-10: [https://arxiv.org/abs/2210.08344 How Mask Matters: Towards Theoretical Understandings of Masked Autoencoders]&lt;br /&gt;
* 2023-04: [https://arxiv.org/abs/2304.03843 Why think step by step? Reasoning emerges from the locality of experience]&lt;br /&gt;
* 2023-10: [https://arxiv.org/abs/2310.04444 What&amp;#039;s the Magic Word? A Control Theory of LLM Prompting]&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.12875 Chain of Thought Empowers Transformers to Solve Inherently Serial Problems]: Proves that transformers can solve inherently serial problems, provided they can generate sufficiently many intermediate tokens&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.20311 Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process]&lt;br /&gt;
** Models learn genuine reasoning skills (they are not merely memorizing solution templates). They can mentally generate simple, short plans (as humans do).&lt;br /&gt;
** When presented with facts, models develop an internal understanding of which parameters (recursively) depend on each other. This occurs even before an explicit question is asked (i.e., before the task is defined), which appears to differ from human reasoning.&lt;br /&gt;
** Model depth matters for reasoning. This cannot be mitigated by chain-of-thought prompting (which allows models to develop and then execute plans), since even a single CoT step may require deep, multi-step reasoning/planning.&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.01992 Ask, and it shall be given: Turing completeness of prompting]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.08775 Layers at Similar Depths Generate Similar Activations Across LLM Architectures]&lt;br /&gt;
&lt;br /&gt;
===Generalization===&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.14546 Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data]&lt;br /&gt;
&lt;br /&gt;
===Grokking===&lt;br /&gt;
* 2022-01: [https://arxiv.org/abs/2201.02177 Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets]&lt;br /&gt;
* 2022-05: [https://arxiv.org/abs/2205.10343 Towards Understanding Grokking: An Effective Theory of Representation Learning]&lt;br /&gt;
* 2024-01: [https://arxiv.org/abs/2401.10463 Critical Data Size of Language Models from a Grokking Perspective]&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.15175 Unified View of Grokking, Double Descent and Emergent Abilities: A Perspective from Circuits Competition]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.18624 How to explain grokking]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.09810 The Complexity Dynamics of Grokking]&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.21519 Provable Scaling Laws of Feature Emergence from Learning Dynamics of Grokking]&lt;br /&gt;
* 2026-03: [https://arxiv.org/abs/2603.01968 Intrinsic Task Symmetry Drives Generalization in Algorithmic Tasks]&lt;br /&gt;
&lt;br /&gt;
===Tests of Resilience to Dropouts/etc.===&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.15390 Explorations of Self-Repair in Language Models]&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.15786 What Matters in Transformers? Not All Attention is Needed]&lt;br /&gt;
** Removing entire transformer blocks leads to significant performance degradation&lt;br /&gt;
** Removing MLP layers results in significant performance degradation&lt;br /&gt;
** Removing attention layers causes almost no performance degradation&lt;br /&gt;
** E.g., deleting half of the attention layers (a 48% speed-up) leads to only a 2.4% decrease on the benchmarks&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.19384 The Remarkable Robustness of LLMs: Stages of Inference?]&lt;br /&gt;
** They intentionally break the network (swapping layers), yet it continues to work remarkably well, suggesting LLMs are quite robust.&lt;br /&gt;
** They also use these interventions to infer what different layers are doing. They break apart the LLM transformer layers into four stages:&lt;br /&gt;
*** &amp;#039;&amp;#039;&amp;#039;Detokenization:&amp;#039;&amp;#039;&amp;#039; Raw tokens are converted into meaningful entities that take into account local context (especially using nearby tokens).&lt;br /&gt;
*** &amp;#039;&amp;#039;&amp;#039;Feature engineering:&amp;#039;&amp;#039;&amp;#039; Features are progressively refined. Factual knowledge is leveraged.&lt;br /&gt;
*** &amp;#039;&amp;#039;&amp;#039;Prediction ensembling:&amp;#039;&amp;#039;&amp;#039; Predictions (for the ultimately-selected next-token) emerge. A sort of consensus voting is used, with &amp;quot;prediction neurons&amp;quot; and &amp;quot;suppression neurons&amp;quot; playing a major role in upvoting/downvoting.&lt;br /&gt;
*** &amp;#039;&amp;#039;&amp;#039;Residual sharpening:&amp;#039;&amp;#039;&amp;#039; The semantic representations are collapsed into specific next-token predictions. There is a strong emphasis on suppression neurons eliminating options. The confidence is calibrated.&lt;br /&gt;
** This structure can be thought of as two halves (being roughly dual to each other): the first half broadens (goes from distinct tokens to a rich/elaborate concept-space) and the second half collapses (goes from rich concepts to concrete token predictions).&lt;br /&gt;
&lt;br /&gt;
==Semantic Vectors==&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.11717 Refusal in Language Models Is Mediated by a Single Direction]&lt;br /&gt;
* 2025-02: [https://martins1612.github.io/emergent_misalignment_betley.pdf Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs] ([https://x.com/OwainEvans_UK/status/1894436637054214509 demonstrates] [https://x.com/ESYudkowsky/status/1894453376215388644 entangling] of concepts into a single preference vector)&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.03666 Analogical Reasoning Inside Large Language Models: Concept Vectors and the Limits of Abstraction]&lt;br /&gt;
&lt;br /&gt;
==Other==&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.00247 Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting &amp;amp; Beyond]&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.04282 Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding] ([https://github.com/SalesforceAIResearch/LaTRO code])&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.12580 Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models]: LLMs learn reasoning by extracting procedures from training data, not by memorizing specific answers&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.15862 LLMs Do Not Think Step-by-step In Implicit Reasoning]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.09810 The Complexity Dynamics of Grokking]&lt;br /&gt;
&lt;br /&gt;
===Scaling Laws===&lt;br /&gt;
* 1993: [https://proceedings.neurips.cc/paper/1993/file/1aa48fc4880bb0c9b8a3bf979d3b917e-Paper.pdf Learning Curves: Asymptotic Values and Rate of Convergence]&lt;br /&gt;
* 2017-12: [https://arxiv.org/abs/1712.00409 Deep Learning Scaling is Predictable, Empirically] (Baidu)&lt;br /&gt;
* 2019-03: [http://www.incompleteideas.net/IncIdeas/BitterLesson.html The Bitter Lesson] (Rich Sutton)&lt;br /&gt;
* 2020-01: [https://arxiv.org/abs/2001.08361 Scaling Laws for Neural Language Models] (OpenAI)&lt;br /&gt;
* 2020-05: [https://gwern.net/scaling-hypothesis The Scaling Hypothesis] (Gwern)&lt;br /&gt;
* 2020-10: [https://arxiv.org/abs/2010.14701 Scaling Laws for Autoregressive Generative Modeling] (OpenAI)&lt;br /&gt;
* 2021-02: [https://arxiv.org/abs/2102.06701 Explaining Neural Scaling Laws] (Google DeepMind)&lt;br /&gt;
* 2021-08: [https://arxiv.org/abs/2108.07686 Scaling Laws for Deep Learning]&lt;br /&gt;
* 2022-03: [https://arxiv.org/abs/2203.15556 Training Compute-Optimal Large Language Models] (Chinchilla, Google DeepMind)&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.04715 Predictable Scale: Part I -- Optimal Hyperparameter Scaling Law in Large Language Model Pretraining]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.10061 Compute Optimal Scaling of Skills: Knowledge vs Reasoning]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.07951 Scaling Laws for Native Multimodal Models]&lt;br /&gt;
* 2025-05: [https://brendel-group.github.io/llm-line/ LLMs on the Line: Data Determines Loss-To-Loss Scaling Laws]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.13786 The Art of Scaling Reinforcement Learning Compute for LLMs]&lt;br /&gt;
&lt;br /&gt;
=Information Processing/Storage=&lt;br /&gt;
* 2020-02: [https://arxiv.org/abs/2002.10689 A Theory of Usable Information Under Computational Constraints]&lt;br /&gt;
* 2021-04: [https://arxiv.org/abs/2104.00008 Why is AI hard and Physics simple?]&lt;br /&gt;
* 2021-06: [https://arxiv.org/abs/2106.06981 Thinking Like Transformers]&lt;br /&gt;
* 2023-05: [https://arxiv.org/abs/2305.00948 Large Linguistic Models: Investigating LLMs&amp;#039; metalinguistic abilities]&lt;br /&gt;
* &amp;quot;A transformer&amp;#039;s depth affects its reasoning capabilities, whilst model size affects its knowledge capacity&amp;quot; ([https://x.com/danielhanchen/status/1835684061475655967 c.f.])&lt;br /&gt;
** 2024-02: [https://arxiv.org/abs/2402.14905 MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases]&lt;br /&gt;
** 2024-04: [https://arxiv.org/abs/2404.08819 The Illusion of State in State-Space Models] (figure 3)&lt;br /&gt;
** 2024-08: [https://arxiv.org/abs/2408.00118 Gemma 2: Improving Open Language Models at a Practical Size] (table 9)&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.10482 Schrodinger&amp;#039;s Memory: Large Language Models]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2407.01687 Deciphering the Factors Influencing the Efficacy of Chain-of-Thought: Probability, Memorization, and Noisy Reasoning]: CoT involves both memorization and (probabilistic) reasoning&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.16679 Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?]&lt;br /&gt;
* 2025-03: [https://www.arxiv.org/abs/2503.03961 A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers]&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.22471 The Bayesian Geometry of Transformer Attention]&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.03220 From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence]&lt;br /&gt;
&lt;br /&gt;
==Statistics/Math==&lt;br /&gt;
* 2023-05: [https://arxiv.org/abs/2305.05465 The emergence of clusters in self-attention dynamics]&lt;br /&gt;
* 2023-12: [https://arxiv.org/abs/2312.10794 A mathematical perspective on Transformers]&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.12034 Understanding Transformers via N-gram Statistics]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.06833 Dynamic metastability in the self-attention model]&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.04551 Measure-to-measure interpolation using Transformers]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.14697 Quantitative Clustering in Mean-Field Transformer Models]&lt;br /&gt;
&lt;br /&gt;
==Tokenization==&lt;br /&gt;
===For numbers/math===&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.14903 Tokenization counts: the impact of tokenization on arithmetic in frontier LLMs]: L2R vs. R2L yields different performance on math&lt;br /&gt;
&lt;br /&gt;
==Data Storage==&lt;br /&gt;
* 1988-09: [https://www.sciencedirect.com/science/article/pii/0885064X88900209 On the capabilities of multilayer perceptrons]&lt;br /&gt;
* 2006-12: [https://ieeexplore.ieee.org/document/4038449 Geometrical and Statistical Properties of Systems of Linear Inequalities with Applications in Pattern Recognition] (single-layer perceptron stores &amp;gt;2 bits/parameter; MLP ~ 2*N&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt; bits w/ N&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt; params)&lt;br /&gt;
* 2016-11: [https://arxiv.org/abs/1611.09913 Capacity and Trainability in Recurrent Neural Networks] (5 bits/param)&lt;br /&gt;
* 2018-02: [https://arxiv.org/abs/1802.08232 The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks]&lt;br /&gt;
* 2019-05: [https://ieeexplore.ieee.org/document/8682462 Memorization Capacity of Deep Neural Networks under Parameter Quantization]&lt;br /&gt;
* 2020-02: [https://arxiv.org/abs/2002.08910 How Much Knowledge Can You Pack Into the Parameters of a Language Model?]&lt;br /&gt;
* 2020-08: [https://arxiv.org/abs/2008.09036 Language Models as Knowledge Bases: On Entity Representations, Storage Capacity, and Paraphrased Queries] (capacity scales linearly with parameters; more training samples leads to less memorization)&lt;br /&gt;
* 2020-12: [https://arxiv.org/abs/2012.06421 When is Memorization of Irrelevant Training Data Necessary for High-Accuracy Learning?]&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.05405 Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws] (2 bits/param)&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.15720 Scaling Laws for Fact Memorization of Large Language Models] (1T params needed to memorize Wikipedia)&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.09810 The Complexity Dynamics of Grokking]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.24832 How much do language models memorize?] (3.6 bits/parameter)&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.01855 Trade-offs in Data Memorization via Strong Data Processing Inequalities]&lt;br /&gt;
&lt;br /&gt;
===Reverse-Engineering Training Data===&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.10364 Can We Infer Confidential Properties of Training Data from LLMs?]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.15553 Approximating Language Model Training Data from Weights]&lt;br /&gt;
&lt;br /&gt;
===Compression===&lt;br /&gt;
* 2022-12: [https://arxiv.org/abs/2212.09410 Less is More: Parameter-Free Text Classification with Gzip]&lt;br /&gt;
* 2023-06: [https://arxiv.org/abs/2306.04050 LLMZip: Lossless Text Compression using Large Language Models]&lt;br /&gt;
* 2023-07: [https://aclanthology.org/2023.findings-acl.426/ “Low-Resource” Text Classification: A Parameter-Free Classification Method with Compressors]&lt;br /&gt;
* 2023-09: [https://arxiv.org/abs/2309.10668 Language Modeling Is Compression]&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.07550 An Image is Worth 32 Tokens for Reconstruction and Generation]&lt;br /&gt;
&lt;br /&gt;
==Learning/Training==&lt;br /&gt;
* 2018-03: [https://arxiv.org/abs/1803.03635 The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks]: Dense networks contain sparse subnetworks (&amp;quot;winning tickets&amp;quot;) that can match the performance of the full network, but such subnetworks are difficult to identify and train directly. Training a dense network effectively searches over many sparse circuits at once, making it easier to find one well-suited to a particular problem.&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.11521 On the Ability of Deep Networks to Learn Symmetries from Data: A Neural Kernel Theory]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.12391 Physics of Skill Learning]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.24864 ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models]&lt;br /&gt;
&lt;br /&gt;
===Cross-modal knowledge transfer===&lt;br /&gt;
* 2022-03: [https://arxiv.org/abs/2203.07519 Leveraging Visual Knowledge in Language Tasks: An Empirical Study on Intermediate Pre-training for Cross-modal Knowledge Transfer]&lt;br /&gt;
* 2023-05: [https://arxiv.org/abs/2305.07358 Towards Versatile and Efficient Visual Knowledge Integration into Pre-trained Language Models with Cross-Modal Adapters]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.06755 Sparse Autoencoders for Scientifically Rigorous Interpretation of Vision Models]: CLIP learns a richer set of aggregated representations (e.g. for a culture or country) than a vision-only model.&lt;br /&gt;
&lt;br /&gt;
==Hidden State==&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.06258 Emergent Response Planning in LLM]: They show that the latent representation contains information beyond that needed for the next token (i.e. the model learns to &amp;quot;plan ahead&amp;quot; and encode information relevant to future tokens)&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.02854 (How) Do Language Models Track State?]&lt;br /&gt;
&lt;br /&gt;
===Convergent Representation===&lt;br /&gt;
* 2015-11: [https://arxiv.org/abs/1511.07543 Convergent Learning: Do different neural networks learn the same representations?]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.12540 Harnessing the Universal Geometry of Embeddings]: Evidence for [https://x.com/jxmnop/status/1925224620166128039 The Strong Platonic Representation Hypothesis]; models converge to a single consensus reality&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.03750 Universally Converging Representations of Matter Across Scientific Foundation Models]&lt;br /&gt;
&lt;br /&gt;
==Function Approximation==&lt;br /&gt;
* 2022-08: [https://arxiv.org/abs/2208.01066 What Can Transformers Learn In-Context? A Case Study of Simple Function Classes]: can learn linear functions (equivalent to least-squares estimator)&lt;br /&gt;
* 2022-11: [https://arxiv.org/abs/2211.09066 Teaching Algorithmic Reasoning via In-context Learning]: Simple arithmetic &lt;br /&gt;
* 2022-11: [https://arxiv.org/abs/2211.15661 What learning algorithm is in-context learning? Investigations with linear models] ([https://github.com/ekinakyurek/google-research/tree/master/incontext code]): can learn linear regression&lt;br /&gt;
* 2022-12: [https://arxiv.org/abs/2212.07677 Transformers learn in-context by gradient descent]&lt;br /&gt;
* 2023-06: [https://arxiv.org/abs/2306.00297 Transformers learn to implement preconditioned gradient descent for in-context learning]&lt;br /&gt;
* 2023-07: [https://arxiv.org/abs/2307.03576 One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention]&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.02893 ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.20545 SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.21212 Transformers Learn to Implement Multi-step Gradient Descent with Chain of Thought]&lt;br /&gt;
&lt;br /&gt;
=Physics Based=&lt;br /&gt;
* 2014-01: [https://arxiv.org/abs/1401.1219 Consciousness as a State of Matter]&lt;br /&gt;
* 2016-08: [https://arxiv.org/abs/1608.08225 Why does deep and cheap learning work so well?]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.23489 SGD as Free Energy Minimization: A Thermodynamic View on Neural Network Training]&lt;br /&gt;
* 2025-12: [https://www.pnas.org/doi/full/10.1073/pnas.2523012122 Heavy-tailed update distributions arise from information-driven self-organization in nonequilibrium learning]&lt;br /&gt;
&lt;br /&gt;
=Failure Modes=&lt;br /&gt;
* 2023-06: [https://arxiv.org/abs/2306.05836 Can Large Language Models Infer Causation from Correlation?]: Poor causal inference&lt;br /&gt;
* 2023-09: [https://arxiv.org/abs/2309.12288 The Reversal Curse: LLMs trained on &amp;quot;A is B&amp;quot; fail to learn &amp;quot;B is A&amp;quot;]&lt;br /&gt;
* 2023-09: [https://arxiv.org/abs/2309.13638 Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve] (biases towards &amp;quot;common&amp;quot; numbers, in-context CoT can reduce performance by incorrectly priming, etc.)&lt;br /&gt;
* 2023-11: [https://arxiv.org/abs/2311.16093 Visual cognition in multimodal large language models] (models lack human-like visual understanding)&lt;br /&gt;
&lt;br /&gt;
==Adversarial==&lt;br /&gt;
* 2026-03: [https://arxiv.org/abs/2603.03507 Solving adversarial examples requires solving exponential misalignment]&lt;br /&gt;
&lt;br /&gt;
==Fractured Representation==&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.11581 Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis] ([https://github.com/akarshkumar0101/fer code])&lt;br /&gt;
&lt;br /&gt;
==Jagged Frontier==&lt;br /&gt;
* 2023-09: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4573321 Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality]&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.03211 How Does Quantization Affect Multilingual LLMs?]: Quantization degrades different languages by differing amounts&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.10061v1 Compute Optimal Scaling of Skills: Knowledge vs Reasoning]: Scaling laws are skill-dependent&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.18212 A Definition of AGI]&lt;br /&gt;
&lt;br /&gt;
===See also===&lt;br /&gt;
* [[AI_understanding|AI Understanding]] &amp;gt; [[AI_understanding#Psychology|Psychology]] &amp;gt; [[AI_understanding#LLM_personalities|LLM personalities]]&lt;br /&gt;
* [[AI tricks]] &amp;gt; [[AI_tricks#Prompt_Engineering|Prompt Engineering]] &amp;gt; [[AI_tricks#Brittleness|Brittleness]]&lt;br /&gt;
&lt;br /&gt;
===Conversely (AI models converge)===&lt;br /&gt;
* 2025-12: [https://www.arxiv.org/abs/2512.03750 Universally Converging Representations of Matter Across Scientific Foundation Models]&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.05117 The Universal Weight Subspace Hypothesis]&lt;br /&gt;
* 2026-01: [https://avikrishna.substack.com/p/eliciting-frontier-model-character Eliciting Frontier Model Character Training: A study of personality convergence across language models]&lt;br /&gt;
&lt;br /&gt;
==Model Collapse==&lt;br /&gt;
* 2023-05: [https://arxiv.org/abs/2305.17493 The Curse of Recursion: Training on Generated Data Makes Models Forget]&lt;br /&gt;
* 2023-07: [https://arxiv.org/abs/2307.01850 Self-Consuming Generative Models Go MAD]&lt;br /&gt;
* 2023-10: [https://arxiv.org/abs/2310.00429 On the Stability of Iterative Retraining of Generative Models on their own Data]&lt;br /&gt;
* 2023-11: [https://arxiv.org/abs/2311.12202 Nepotistically Trained Generative-AI Models Collapse]&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.03502 AI and the Problem of Knowledge Collapse]&lt;br /&gt;
* 2024-07: [https://www.nature.com/articles/s41586-024-07566-y AI models collapse when trained on recursively generated data]&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.05280 On the Limits of Self-Improving in LLMs and Why AGI, ASI and the Singularity Are Not Near Without Symbolic Model Synthesis]&lt;br /&gt;
&lt;br /&gt;
===Analysis===&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.04376 Scaling laws for learning with real and surrogate data]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.17646 Rate of Model Collapse in Recursive Training]&lt;br /&gt;
&lt;br /&gt;
===Mitigation===&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.07712 Model Collapse Demystified: The Case of Regression]&lt;br /&gt;
* 2024-03: [https://arxiv.org/abs/2403.04706 Common 7B Language Models Already Possess Strong Math Capabilities]&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.01413 Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data]&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.07515 Beyond Model Collapse: Scaling Up with Synthesized Data Requires Verification]&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.01490 LLM See, LLM Do: Guiding Data Generation to Target Non-Differentiable Objectives]&lt;br /&gt;
* 2024-08: [https://arxiv.org/abs/2408.14960 Multilingual Arbitrage: Optimizing Data Pools to Accelerate Multilingual Progress]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.08117 Convergence Dynamics and Stabilization Strategies of Co-Evolving Generative Models]&lt;br /&gt;
&lt;br /&gt;
=Psychology=&lt;br /&gt;
* 2023-04: [https://arxiv.org/abs/2304.11111 Inducing anxiety in large language models can induce bias]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.17117 From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning]&lt;br /&gt;
* 2025-07: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5357179 Call Me A Jerk: Persuading AI to Comply with Objectionable Requests]&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.06047 &amp;quot;They parted illusions -- they parted disclaim marinade&amp;quot;: Misalignment as structural fidelity in LLMs]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.02606 Gender Dynamics and Homophily in a Social Network of LLM Agents]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.01689 What LLMs Think When You Don&amp;#039;t Tell Them What to Think About?]&lt;br /&gt;
&lt;br /&gt;
==Persona Simulator Theory==&lt;br /&gt;
* 2022-09: [https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators Simulators] ([https://www.lesswrong.com/users/janus-1?from=post_header janus])&lt;br /&gt;
* 2022-12: [https://aclanthology.org/2022.findings-emnlp.423/ Language Models as Agent Models]&lt;br /&gt;
* 2023-02: [https://arxiv.org/abs/2302.00805 Conditioning Predictive Models: Risks and Strategies]&lt;br /&gt;
* 2024-09: [https://www.lesswrong.com/s/qhdHbCJ3PYesL9dde Intuitive Self-Models]&lt;br /&gt;
* 2026-02: [https://alignment.anthropic.com/2026/psm/ The Persona Selection Model: Why AI Assistants might Behave like Humans] (Anthropic, [https://www.anthropic.com/research/persona-selection-model blog])&lt;br /&gt;
&lt;br /&gt;
==Allow LLM to think==&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.11536 Let your LLM generate a few tokens and you will reduce the need for retrieval]&lt;br /&gt;
&lt;br /&gt;
===In-context Learning===&lt;br /&gt;
* 2021-10: [https://arxiv.org/abs/2110.15943 MetaICL: Learning to Learn In Context]&lt;br /&gt;
* 2022-02: [https://arxiv.org/abs/2202.12837 Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?]&lt;br /&gt;
* 2022-08: [https://arxiv.org/abs/2208.01066 What Can Transformers Learn In-Context? A Case Study of Simple Function Classes]&lt;br /&gt;
* 2022-11: [https://arxiv.org/abs/2211.15661 What learning algorithm is in-context learning? Investigations with linear models]&lt;br /&gt;
* 2022-12: [https://arxiv.org/abs/2212.07677 Transformers learn in-context by gradient descent]&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.16003 Learning without training: The implicit dynamics of in-context learning]&lt;br /&gt;
&lt;br /&gt;
==Reasoning (CoT, etc.)==&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.18009 Large Language Models Think Too Fast To Explore Effectively]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.18585 Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.08156 Are DeepSeek R1 And Other Reasoning Models More Faithful?]: reasoning models tend to give more faithful explanations of their own reasoning&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.08679 Chain-of-Thought Reasoning In The Wild Is Not Always Faithful]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.04022 Rethinking Reflection in Pre-Training]: pre-training alone already provides some amount of reflection/reasoning&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2501.18858 BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning]&lt;br /&gt;
&lt;br /&gt;
===Pathfinding===&lt;br /&gt;
* 2024-08: [https://arxiv.org/abs/2408.08152 DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.01939 Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning]&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.09284 Tree-OPO: Off-policy Monte Carlo Tree-Guided Advantage Optimization for Multistep Reasoning]&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.06160v1 Reverse-Engineered Reasoning for Open-Ended Generation]&lt;br /&gt;
&lt;br /&gt;
===Skeptical===&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.06941 The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity]&lt;br /&gt;
* 2025-08: [https://www.arxiv.org/abs/2508.01191 Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens]&lt;br /&gt;
&lt;br /&gt;
==Self-Awareness, Self-Recognition, and Introspection==&lt;br /&gt;
* 2022-07: [https://arxiv.org/abs/2207.05221 Language Models (Mostly) Know What They Know]&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.13076 LLM Evaluators Recognize and Favor Their Own Generations]&lt;br /&gt;
* 2024-09: [https://situational-awareness-dataset.org/ Me, Myself and AI: The Situational Awareness Dataset for LLMs]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.13787 Looking Inward: Language Models Can Learn About Themselves by Introspection]&lt;br /&gt;
* 2024-12: [https://theaidigest.org/self-awareness AIs are becoming more self-aware. Here&amp;#039;s why that matters]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.11120 Tell me about yourself: LLMs are aware of their learned behaviors]&lt;br /&gt;
* 2025-04: [https://x.com/Josikinz/status/1907923319866716629 LLMs can guess which comic strip was generated by themselves (vs. other LLM)]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.13763 Language Models Are Capable of Metacognitive Monitoring and Control of Their Internal Activations]&lt;br /&gt;
* 2025-10: [https://transformer-circuits.pub/2025/introspection/index.html Emergent Introspective Awareness in Large Language Models] (Anthropic, [https://www.anthropic.com/research/introspection blog])&lt;br /&gt;
* 2025-12: [https://www.arxiv.org/abs/2512.24661 Do Large Language Models Know What They Are Capable Of?]&lt;br /&gt;
&lt;br /&gt;
==LLM personalities==&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.02618 Strategic Intelligence in Large Language Models: Evidence from evolutionary Game Theory]&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.04343 Psychologically Enhanced AI Agents]&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.10387 The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models]&lt;br /&gt;
&lt;br /&gt;
==Quirks &amp;amp; Biases==&lt;br /&gt;
* 2025-04: [https://www.cambridge.org/core/journals/judgment-and-decision-making/article/artificial-intelligence-and-dichotomania/0421D2310727D73FAB47069FD1620AA1 Artificial intelligence and dichotomania]&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.22818 Can Large Language Models Develop Gambling Addiction?]&lt;br /&gt;
&lt;br /&gt;
=Vision Models=&lt;br /&gt;
* 2017-11: Distill: [https://distill.pub/2017/feature-visualization/ Feature Visualization: How neural networks build up their understanding of images]&lt;br /&gt;
* 2021-01: [https://arxiv.org/abs/2101.12322 Position, Padding and Predictions: A Deeper Look at Position Information in CNNs]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.13181 Perception Encoder: The best visual embeddings are not at the output of the network] ([https://github.com/facebookresearch/perception_models code])&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
* [[AI]]&lt;br /&gt;
* [[AI tools]]&lt;br /&gt;
* [[AI agents]]&lt;br /&gt;
* [[Robots]]&lt;/div&gt;</summary>
		<author><name>KevinYager</name></author>
		
	</entry>
	<entry>
		<id>http://gisaxs.com/index.php?title=AI_video&amp;diff=8733</id>
		<title>AI video</title>
		<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php?title=AI_video&amp;diff=8733"/>
		<updated>2026-03-16T21:54:35Z</updated>

		<summary type="html">&lt;p&gt;KevinYager: /* March 2026 */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Evolution of Capabilities==&lt;br /&gt;
===Early===&lt;br /&gt;
* November 2016: [https://arxiv.org/abs/1611.10314 Sync-Draw]&lt;br /&gt;
* April 2021: [https://arxiv.org/abs/2104.14806 GODIVA]&lt;br /&gt;
* October 2022: [https://makeavideo.studio/ Meta Make-a-video]&lt;br /&gt;
* October 2022: [https://imagen.research.google/video/ Google Imagen video]&lt;br /&gt;
&lt;br /&gt;
===2023===&lt;br /&gt;
* April 2023: [https://www.youtube.com/watch?v=XQr4Xklqzw8 Will Smith eating spaghetti]&lt;br /&gt;
* April 2023: [https://x.com/nickfloats/status/1642899094808002564 Harry Potter by Balenciaga]&lt;br /&gt;
* April 2023: [https://x.com/mrjonfinger/status/1645953033636048896?cxt=HHwWgMDT7YfkzNctAAAA Runway Gen 2]&lt;br /&gt;
* April 2023: [https://research.nvidia.com/labs/toronto-ai/VideoLDM/ Nvidia latents]&lt;br /&gt;
* December 2023: [https://www.threads.net/@luokai/post/C0vvEnTP4Oj Fei-Fei Li]&lt;br /&gt;
&lt;br /&gt;
===2024===&lt;br /&gt;
====Early 2024====&lt;br /&gt;
* January 2024: [https://sites.research.google/videopoet/ Google VideoPoet]&lt;br /&gt;
* January 2024: [https://lumiere-video.github.io/ Google Lumiere]&lt;br /&gt;
* February 2024: [https://openai.com/index/sora/ OpenAI Sora]&lt;br /&gt;
* April 2024: [https://www.maginative.com/article/china-unveils-vidu-a-powerful-text-to-video-generator/ Vidu]&lt;br /&gt;
* May 2024: [https://deepmind.google/technologies/veo/ Veo]&lt;br /&gt;
* May 2024: [https://kling.kuaishou.com/ Kling]&lt;br /&gt;
* June 2024: [https://lumalabs.ai/dream-machine Luma DreamMachine]&lt;br /&gt;
* June 2024: [https://runwayml.com/research/introducing-gen-3-alpha RunwayML Gen-3 Alpha]&lt;br /&gt;
* July 2024: Examples:&lt;br /&gt;
** [https://www.youtube.com/watch?v=F_WfIzYGlg4 Toys-R-Us Commercial made using Sora]&lt;br /&gt;
** [https://www.youtube.com/watch?v=CSfw_NjqQ2o Motorola commercial made using genAI]&lt;br /&gt;
* July 2024: [https://x.com/rowancheung/status/1813258518159585723 haiper.ai]&lt;br /&gt;
====August 2024====&lt;br /&gt;
* August 2024: [http://hotshot.co/ Hotshot] ([https://x.com/maxescu/status/1825459083635536081 examples], [https://x.com/EccentrismArt/status/1825550841534972027 more examples])&lt;br /&gt;
* August 2024: Luma Dream Machine [https://x.com/LumaLabsAI/status/1825639918539817101 v1.5]&lt;br /&gt;
* August 2024: Examples:&lt;br /&gt;
** [https://x.com/endlesstaverns/status/1811276904692887815 Runway Gen3 music video]&lt;br /&gt;
** [https://x.com/runwayml/status/1820806644806070583 Runway Gen3 for adding FX to live action] ([https://x.com/bryanf0x/status/1825529998201004137 another example])&lt;br /&gt;
** [https://www.youtube.com/watch?v=taaM0s1bq7Q Midjourney + Runway Gen3: Hey It’s Snowing]&lt;br /&gt;
** [https://x.com/Kyrannio/status/1821605619927019974 Flux/LoRA image] + Runway Gen3 [https://x.com/iamneubert/status/1821970292014768420 woman presenter]&lt;br /&gt;
** [https://x.com/CharaspowerAI/status/1825274421256356106 McDonald’s AI commercial]&lt;br /&gt;
** Sora used by [https://www.facebook.com/izanamiaiart/ Izanami AI Art] to create [https://x.com/kimmonismus/status/1824102316229759114 dreamlike video] and by [https://x.com/alexiaadana Alexia Adana] to create [https://x.com/basedjensen/status/1824386717123743940 sci-fi film concept]&lt;br /&gt;
====September 2024====&lt;br /&gt;
* September 2024: [https://hailuoai.com/video/ Hailuo Minimax] ([https://x.com/minchoi/status/1829995683124035766 examples])&lt;br /&gt;
* September 2024: Examples:&lt;br /&gt;
** [https://www.youtube.com/watch?v=XAs5KuhfE_s Space colonization]&lt;br /&gt;
** [https://x.com/venturetwins/status/1827772646295265699 Consistent characters]&lt;br /&gt;
** [https://x.com/thealexbanks/status/1829489392354050502 Sea monsters]&lt;br /&gt;
** [https://x.com/ai_for_success/status/1829539535132426286 Music video]&lt;br /&gt;
** [https://x.com/RyanMorrisonJer/status/1829074823521112544 Animated characters]&lt;br /&gt;
** [https://x.com/CharaspowerAI/status/1829916782452191674 AI influencer]&lt;br /&gt;
** [https://x.com/minchoi/status/1829293248197902802 Ten short examples]&lt;br /&gt;
** [https://x.com/WorldEverett/status/1830596701473615937 Seven examples]&lt;br /&gt;
** [https://x.com/EccentrismArt/status/1830654805515395583 Clip from horror film]&lt;br /&gt;
** [https://x.com/MatthieuGB/status/1722146578813645296 &amp;quot;Gone&amp;quot; featuring astronaut] and [https://x.com/MatthieuGB/status/1742949297337852270 something ethereal]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1831256663644373449 Two dancers] (surprisingly good consistency despite movement)&lt;br /&gt;
** [https://x.com/8bit_e/status/1831344542487871953 Music video about flying]&lt;br /&gt;
** [https://www.youtube.com/watch?v=_XtS_4PzEyk The Paperclip Maximizer]&lt;br /&gt;
** [https://x.com/trbdrk/status/1831801373517869369 La Baie Aréa]&lt;br /&gt;
** [https://www.reddit.com/r/aivideo/comments/1f8xr0w/gisele_tong_to_dear_me/ &amp;quot;To Dear Me&amp;quot; by Gisele Tong] ([https://www.morningstar.com/news/business-wire/20240904521664/reply-ai-film-festival-announced-the-winners-of-the-first-international-festival-for-short-films-made-with-artificial-intelligence winner of AI shorts] film festival)&lt;br /&gt;
** [https://x.com/maxescu/status/1833476640438964281 Various scenes]&lt;br /&gt;
** [https://x.com/EHuanglu/status/1833522650846793970 Directing emotions]&lt;br /&gt;
* September 2024: Kling 1.5 ([https://x.com/Uncanny_Harry/status/1836531835280724459 examples], [https://x.com/minchoi/status/1836800551469654088 showing emotions])&lt;br /&gt;
* September 2024: Examples:&lt;br /&gt;
** Runway video-to-video to [https://x.com/jon_barron/status/1835695132697604236 restyle classic video games]&lt;br /&gt;
** [https://x.com/ai_for_success/status/1835319670917796117 Realistic presenter]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1834530744175059302 Skateboarding] (demonstrates getting closer to meaningfully simulating motion/physics)&lt;br /&gt;
** [https://x.com/minchoi/status/1835378029092049325 Examples] of short clips with cinematic feel&lt;br /&gt;
** Short: [https://x.com/PJaccetturo/status/1835670655330869633 4 Minutes to Live]&lt;br /&gt;
** Short: [https://x.com/dreamingtulpa/status/1836121321526432231 Neon Nights] (Arcade)&lt;br /&gt;
** [https://www.youtube.com/watch?v=CcrGSA-kSrI Random Access Memories]: AI-generated, then projected onto Kodak film stock, giving the final output some of the dreamy analog quality associated with nostalgic footage&lt;br /&gt;
** Sora used to make a sort of [https://x.com/niceaunties/status/1837271244774715505 weird dreamlike video]&lt;br /&gt;
====October 2024====&lt;br /&gt;
* October 2024: Pika v1.5, including Pikaffects (explode, melt, inflate, and cake-ify; examples: [https://x.com/justin_hart/status/1841144350572413259 1], [https://x.com/arthur_hyper88/status/1841156544538521646 2], [https://x.com/ytjessie_/status/1841168925301842263 3], [https://x.com/bilawalsidhu/status/1841195247184781420 4], [https://x.com/minchoi/status/1841189035454447636 5], [https://x.com/ytjessie_/status/1841209415514669501 6])&lt;br /&gt;
* October 2024: Examples:&lt;br /&gt;
** [https://x.com/HalimAlrasihi/status/1839310216602788103 AI avatar with good lip-sync]&lt;br /&gt;
** [https://www.youtube.com/watch?v=5NZubOOeeV0 Battalion]: 5 minute short about war&lt;br /&gt;
** Short film: [https://x.com/MatthieuGB/status/1841173724688536015 To Wonderland] ([https://x.com/MatthieuGB/status/1841174221550207437 credits])&lt;br /&gt;
** [https://x.com/OnwardsProject/status/1841508441241890975 9 to 5]: Created with Luma Dream Machine keyframes and camera features; music by Suno&lt;br /&gt;
* October 2024: [https://ai.meta.com/research/movie-gen/ Meta Movie Gen]&lt;br /&gt;
* October 2024: Examples:&lt;br /&gt;
** [https://x.com/CuriousRefuge/status/1844424871335592373 AI Avatar] (using HeyGen)&lt;br /&gt;
** [https://www.youtube.com/watch?v=isW1FLL0K3w Generic Movies]&lt;br /&gt;
** [https://arxiv.org/abs/2410.05954 Pyramid-flow] ([https://huggingface.co/rain1011/pyramid-flow-sd3 open source]) model: [https://x.com/_akhaliq/status/1844239643778351605 examples]&lt;br /&gt;
** [https://x.com/whrumorvid/status/1846209247467491604 Building the Pyramids]&lt;br /&gt;
** [https://x.com/maxescu/status/1844716998854349217 People showing realistic emotion] (using [https://hailuoai.video/ Hailuo AI])&lt;br /&gt;
** Keyframes and Luma AI to make novel [https://x.com/CoffeeVectors/status/1845188179332051005 speed-ramp motion]&lt;br /&gt;
* October 2024: [https://pollo.ai/ Pollo AI] platform offers access to a wide selection of video models&lt;br /&gt;
* October 2024: [https://www.genmo.ai/ Genmo] [https://x.com/genmoai/status/1848762405779574990 Mochi 1] (open source)&lt;br /&gt;
* October 2024: Examples:&lt;br /&gt;
** [https://x.com/AIatMeta/status/1849134463382680028 Meta Movie Gen examples]&lt;br /&gt;
** [https://x.com/PJaccetturo/status/1847732127598800960 Emotional range of Minimax]&lt;br /&gt;
** [https://x.com/dustinhollywood/status/1848757800807039299 Car commercial: Bear]&lt;br /&gt;
** [https://x.com/runwayml/status/1848785913918218517 Diner conversation]&lt;br /&gt;
** [https://x.com/Uncanny_Harry/status/1849275871716159989 Loved and Lost] (a meditation on grief)&lt;br /&gt;
====November 2024====&lt;br /&gt;
* November 2024: Examples:&lt;br /&gt;
** [https://x.com/blizaine/status/1852092147643699356 Pasta Doble]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1852425015175626876 Bird protecting young]&lt;br /&gt;
** [https://x.com/runwayml/status/1852363190484537666 Camera moving around sushi]&lt;br /&gt;
** [https://x.com/StevieMac03/status/1851969120813629939 Various examples] of [https://hailuoai.video/ Hailuo AI]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1853102779650252978 Trains]&lt;br /&gt;
** [https://www.youtube.com/watch?v=Fh-_g5vev0s Light of Imagination]&lt;br /&gt;
** [https://x.com/LinusEkenstam/status/1854087441122021814 Bringing historic images to life]&lt;br /&gt;
** [https://x.com/DeryaTR_/status/1855637066203218180 Plants dancing]&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1855078644042944574 Insect on tree]&lt;br /&gt;
** Trailers for [https://x.com/abandonedmovies/status/1827037378009296983 The Silmarillion] and [https://x.com/abandonedmovies/status/1846941183702110211 The Fall of Gondolin] (by [https://x.com/abandonedmovies Abandoned Films])&lt;br /&gt;
** [https://x.com/Diesol/status/1855475704470884427 Moody sci-fi]&lt;br /&gt;
** [https://x.com/runwayml/status/1857072173631885586 Migration] ([https://runwayml.com/customers/behind-the-scenes-of-migration-with-director-jeremy-higgins made by combining] Runway ML Gen3-Alpha and traditional animation)&lt;br /&gt;
** [https://x.com/AIandDesign/status/1856467856625676752 After the Winter] ([https://suno.com/song/0d6919de-d2bf-434b-8aa6-ede0fb0fde77 music] made using Suno v4)&lt;br /&gt;
** Horror: [https://www.reddit.com/r/aivideo/comments/1gnk27q/ridge_to_southwest/ Ridge to Southwest]&lt;br /&gt;
** [https://www.youtube.com/watch?v=ClStJZmIjBU The Gardener] (by [https://www.youtube.com/@MachineMythos Machine Mythos])&lt;br /&gt;
** [https://x.com/techhalla/status/1857462526859935813 Coca-Cola holiday ad] and [https://www.youtube.com/watch?v=THdoOgwqjBg parody thereof]&lt;br /&gt;
** [https://x.com/pzf_ai/status/1858312421510992111 A Dream Within A Dream] (by [https://x.com/pzf_ai PZF], selected for the Czech International AI Film Festival)&lt;br /&gt;
** [https://x.com/WorldEverett/status/1859273222597775843 Making Friends] (by [https://x.com/WorldEverett Everett World]; see also [https://x.com/WorldEverett/status/1858563716834275562 Childhood Dream] and [https://x.com/WorldEverett/status/1858945634067202429 City Echoes])&lt;br /&gt;
** Anime: [https://x.com/naegiko/status/1857754626742726893 test shots], [https://x.com/naegiko/status/1858978557424210401 Ultimate Ceremony], [https://x.com/naegiko/status/1835434668294074462 Echoes of Love]&lt;br /&gt;
** [https://x.com/KakuDrop/status/1866309309384323257 Echoes of Grace] ([https://x.com/KakuDrop KakuDrop] using Sora)&lt;br /&gt;
** [https://x.com/vibeke_udart/status/1859879367071203662 Morphing hands], [https://x.com/vibeke_udart/status/1858772719224975630 hands and faces] ([https://x.com/vibeke_udart Vibeke Bertelsen])&lt;br /&gt;
** [https://www.reddit.com/r/aivideo/comments/1gxi29x/dbzlicious/ Dragon Ball Z live action]&lt;br /&gt;
** [https://x.com/cfryant/status/1860727980353278386 Pitch Black] (abstract and dark)&lt;br /&gt;
** [https://x.com/cfryant/status/1861050528932765710 Animals Running] (zoomed-in ultra-wide camera)&lt;br /&gt;
** [https://x.com/WorldEverett/status/1860730214487118290 Dreams of Tomorrow] (panning shots of high-tech car, Scottish manor)&lt;br /&gt;
** [https://x.com/nickfloats/status/1861206978690691165 Desert Planet Cinematics]&lt;br /&gt;
* November 2024: [https://huggingface.co/spaces/PR-Puppets/PR-Puppet-Sora Leaked] Sora turbo model; [https://x.com/rowancheung/status/1861455031603503234 examples], [https://x.com/chatgpt21/status/1861504511153451517 Dog chasing Cat in snow]&lt;br /&gt;
====December 2024====&lt;br /&gt;
* December 2024: Examples:&lt;br /&gt;
** [https://x.com/minchoi/status/1863243880553976235 Realistic] (Minimax by Hailuo AI)&lt;br /&gt;
** Trailer for [https://x.com/TheReelRobot/status/1861824847149670840 Paradise Lost] (to be released on [https://www.sandwatch.ai/ Sandwatch AI])&lt;br /&gt;
** [https://x.com/EHuanglu/status/1863607136271716418 Music video example] with consistent characters&lt;br /&gt;
** [https://x.com/venturetwins/status/1863666366764687581 Human expressions] ([https://www.reddit.com/r/ChatGPT/comments/1h4r13x/ai_generated_expressions/ u/Kind_Distance9504 on Reddit], using Hailuo)&lt;br /&gt;
** Vodafone ad: [https://www.youtube.com/watch?v=9AyEC_K9kBg The Rhythm Of Life]&lt;br /&gt;
** [https://www.reddit.com/r/midjourney/comments/1h5u2gw/we_made_a_10_minute_gen_ai_batman_film/ 10 minute Batman film]&lt;br /&gt;
* December 2024: Tencent [https://aivideo.hunyuan.tencent.com/ Hunyuan Video] open-source video model ([https://x.com/CharaspowerAI/status/1863862585554010530 example])&lt;br /&gt;
* December 2024: [https://sora.com/ Sora] release ([https://x.com/CharaspowerAI/status/1866203050982916532 examples])&lt;br /&gt;
* December 2024: [https://mint-video.github.io/ MinT video] improves consistency and control ([https://arxiv.org/abs/2412.05263 preprint], [https://x.com/EHuanglu/status/1868278456565531061 examples])&lt;br /&gt;
* December 2024: Google [https://blog.google/technology/google-labs/video-image-generation-update-december-2024/ Veo 2] ([https://x.com/sundarpichai/status/1868709099644334518 examples], [https://x.com/EHuanglu/status/1869008306322522342 more examples], [https://x.com/_Borriss_/status/1869267571532320966 natural movement examples], [https://x.com/jerrod_lew/status/1870816560027246715 abstract], [https://x.com/jerrod_lew/status/1869427407415058660 realistic physics], [https://x.com/jerrod_lew/status/1873096585002786944 crowds], [https://x.com/minchoi/status/1873590350515929380 dancing], [https://x.com/jerrod_lew/status/1874440442269565351 animals])&lt;br /&gt;
* December 2024: [https://x.com/pika_labs/status/1867651381840040304 Pika 2.0] with Scene Ingredients&lt;br /&gt;
* December 2024: Examples:&lt;br /&gt;
** [https://www.youtube.com/watch?v=c_kKKRQ5gYw Synthetic Youth: Takenoko Zoku · Made by Emi Kusano with Sora]&lt;br /&gt;
** [https://x.com/higgsfield_ai/status/1868698886761837041 Car race] ([https://higgsfield.ai/ Higgsfield AI] storytelling)&lt;br /&gt;
** [https://x.com/blizaine/status/1868850653759783033 Slicing meat] (comparison of modern video generators)&lt;br /&gt;
** Challenging prompt: [https://x.com/RubenEVillegas/status/1868864410720325844 A cat roars while looking at its reflection in the mirror but instead sees itself as a lion roaring (Veo 2)] ([https://x.com/anukaakash/status/1869417975071330550 comparison to other models])&lt;br /&gt;
** [https://x.com/PJaccetturo/status/1869829338868412865 Anime trailer]&lt;br /&gt;
** [https://x.com/ring_hyacinth/status/1870386506776674376 Snorlax at Mount Fuji] and [https://x.com/ring_hyacinth/status/1871105733443592696 Psyduck at Colosseum] (Kling 1.6)&lt;br /&gt;
** [https://x.com/machine_mythos/status/1870565287789056320 Horror visuals] (with [https://mmaudio.net/ MMAudio] sound)&lt;br /&gt;
** [https://www.youtube.com/watch?v=lFc1jxLHhyM The Heist] (Veo 2)&lt;br /&gt;
** [https://x.com/minchoi/status/1871263616806129863 Various Veo 2 examples]&lt;br /&gt;
** [https://x.com/minchoi/status/1872390429108486320 Live Action Titans]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1873094065841193222 Cats] [https://x.com/PostsOfCats/status/1872530207585825058 Cooking]&lt;br /&gt;
** Aesthetic from alternate timelines: [https://x.com/BrianRoemmele/status/1871753358782120068 1], [https://x.com/BrianRoemmele/status/1872105833456423216 2], [https://x.com/brain_racked/status/1872340717978390583 3]&lt;br /&gt;
** [https://x.com/minchoi/status/1872486717145706793 Examples approaching cinematic quality]&lt;br /&gt;
** [https://x.com/JaicSam/status/1872903054221033693 Cosmic Spider] (winner at AI film festival)&lt;br /&gt;
** [https://www.youtube.com/watch?v=dbdYPMRi_Nk Trailer for Newton&amp;#039;s Cradle] (full film [https://x.com/JeffSynthesized/status/1872705173451358293 on] 2025-01-01)&lt;br /&gt;
** [https://x.com/Ror_Fly/status/1873036384077828499 Car vs. Jet drag race]&lt;br /&gt;
** [https://x.com/Diesol/status/1873415500149199066 California Monsters]&lt;br /&gt;
** [https://x.com/heyshrutimishra/status/1873631383584924078 Various examples] (Hailuo AI)&lt;br /&gt;
** [https://x.com/kimmonismus/status/1873568693357294014 Will Smith eating spaghetti] (cf. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023])&lt;br /&gt;
** [https://x.com/StevieMac03/status/1873998177193648438 Sorceress and Arachnid Steed] (Kling v1.6)&lt;br /&gt;
** [https://x.com/dustinhollywood/status/1873940924016779425 Music video] (Hailuo AI)&lt;br /&gt;
** [https://www.youtube.com/watch?v=iQg2udCHMdI Akụkọ (Story)] (22 minute short) - A Lagos Boy&amp;#039;s Thrilling Snack Run Nightmare&lt;br /&gt;
** [https://x.com/cinerobot/status/1873766976306455019 Son of the Dragon] (8 minute short)&lt;br /&gt;
** [https://x.com/SynthReveries/status/1873624586857886071 Endless Journey] music video ([https://suno.com/song/fa90fa5e-25c7-48ad-b291-42a8a8c51cf9 music] by Suno)&lt;br /&gt;
** [https://x.com/anukaakash/status/1870504167653228980 Once Again] (retrospective)&lt;br /&gt;
** [https://x.com/jasonzada/status/1873470586053414928 Fade Out] (Veo 2)&lt;br /&gt;
** [https://x.com/talkboysstudio/status/1869085014513865027 Roadkill] (12 minute short)&lt;br /&gt;
&lt;br /&gt;
===2025===&lt;br /&gt;
====January 2025====&lt;br /&gt;
* January 2025: [https://x.com/kimmonismus/status/1877351050748871038 Progress] over the last 1.5 years, shown by comparing Runway Gen 2 and Veo 2.&lt;br /&gt;
* January 2025: Examples:&lt;br /&gt;
** [https://x.com/AllarHaltsonen/status/1874557865576542655 Delivery] (unofficial Nike ad)&lt;br /&gt;
** [https://x.com/Diesol/status/1875237221735002299 Gucci ad] (Sora)&lt;br /&gt;
** [https://x.com/EccentrismArt/status/1874498145910149412 Conquest]&lt;br /&gt;
** [https://www.youtube.com/watch?v=RJZCMfaS-io Newton&amp;#039;s Cradle] (6 minute short)&lt;br /&gt;
** [https://x.com/AngryTomtweets/status/1874627041934602410 Singer]&lt;br /&gt;
** [https://x.com/DumpsterBud/status/1874807352794182019 Brain vomit] (music video)&lt;br /&gt;
** [https://x.com/mxvdxn/status/1874796628210778618 Vibe] (Kling v1.6)&lt;br /&gt;
** [https://x.com/_deepfates/status/1875215969452523785 Will Smith eating spaghetti] (cf. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024])&lt;br /&gt;
** [https://www.youtube.com/watch?v=BL9-jHGnxyc Zorgop Knows All] (2 minute short)&lt;br /&gt;
** [https://x.com/ButchersBrain/status/1875130428518269406 The Breach] (2 minute short; Veo2, Runway ActOne, MMAudio)&lt;br /&gt;
** [https://x.com/Rainmaker1973c/status/1875261591043850477 Aesthetics from an alternate timeline]&lt;br /&gt;
** [https://x.com/StevieMac03/status/1875440611849072841 Immortal Awakens]&lt;br /&gt;
** [https://x.com/isaachorror/status/1875624519588835400 The Faded Line]&lt;br /&gt;
** [https://www.youtube.com/watch?v=4fy8H38rm-4 Dear Dad]&lt;br /&gt;
** [https://x.com/maxescu/status/1877060580680311242 Mad Max chase]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1877408247906447633 Patience is Key]&lt;br /&gt;
** [https://x.com/techhalla/status/1879967230093586555 The Almost Famous Show] (talent show parody)&lt;br /&gt;
** [https://x.com/thefuzzysignal/status/1879295176990154755 Proof-of-concept trailer for a medieval adult animated series]&lt;br /&gt;
** [https://x.com/JeffSynthesized/status/1879555151499034869 Variety] (unofficial Cadbury ad)&lt;br /&gt;
** [https://x.com/henrydaubrez/status/1879883806947115446 Kitsune] (5 minute animated short, Veo 2)&lt;br /&gt;
* January 2025: MiniMax Hailuo [https://www.minimaxi.com/en/news/s2v-01-release Subject Reference] enables consistent characters ([https://x.com/minchoi/status/1881707687362412924 examples])&lt;br /&gt;
* January 2025: AI (de-aging deepfakes, [https://magnific.ai/ Magnific]) [https://x.com/JeffSynthesized/status/1878630652377178502 used in the film] [https://www.imdb.com/title/tt18272208/ &amp;quot;Here&amp;quot;].&lt;br /&gt;
* January 2025: Luma [https://lumalabs.ai/ray Ray2]&lt;br /&gt;
* January 2025: [https://pikartai.com/pika-2-1/ Pika 2.1] ([https://x.com/OrctonAI/status/1883925754653905049 examples])&lt;br /&gt;
* January 2025: Examples:&lt;br /&gt;
** [https://x.com/wyzborrero/status/1879949477764804873 Light projections onto people] (challenging task, Ray2)&lt;br /&gt;
** [https://x.com/AllarHaltsonen/status/1881261042753589547 BMW ad]&lt;br /&gt;
** [https://x.com/AIWarper/status/1880658326645878821 John Wick in Severance] (Hunyuan vid2vid)&lt;br /&gt;
** [https://x.com/TheReelRobot/status/1881771800595444193 Biopic] (7 minutes)&lt;br /&gt;
** [https://x.com/misslaidlaw/status/1882180619582791784 Give It To Me] (music video)&lt;br /&gt;
** [https://x.com/paultrillo/status/1882091702506459394 Where do we go from here?] (music video, Veo 2)&lt;br /&gt;
** [https://x.com/WorldEverett/status/1882235057076580502 Party like there&amp;#039;s no tomorrow] (music video)&lt;br /&gt;
** [https://x.com/Diesol/status/1884696027942498779 S.T.O.R.I.] (Midjourney and Pika 2.1)&lt;br /&gt;
====February 2025====&lt;br /&gt;
* February 2025: Examples:&lt;br /&gt;
** [https://x.com/OrctonAI/status/1885839287913955597 Long Steampunk scene]&lt;br /&gt;
** [https://x.com/jerrod_lew/status/1885787580685562226 City destruction]&lt;br /&gt;
** [https://x.com/EHuanglu/status/1885736840344551763 Consistent character acting]&lt;br /&gt;
** [https://x.com/MeanOrangeCat/status/1884295241534185890 Kaiju Katastrophe] (by [https://x.com/MeanOrangeCat Mean Orange Cat])&lt;br /&gt;
** [https://x.com/Diesol/status/1886433799690748210 The Greyhound]&lt;br /&gt;
** [https://x.com/CoffeeVectors/status/1886146242029195391 Fluid simulation video2video]&lt;br /&gt;
** [https://x.com/toolstelegraph/status/1886622772828254403 High resolution macro shots]&lt;br /&gt;
** [https://www.youtube.com/watch?v=p0J1LDWERS0 Chrysalids]&lt;br /&gt;
** [https://x.com/multimodalart/status/1887817996220940737 Boring realistic images] (HunyuanVideo w/ LoRA)&lt;br /&gt;
** [https://www.youtube.com/watch?v=PcVRfa1JyyQ Anime intro] ([https://www.reddit.com/r/StableDiffusion/comments/1ijvua0/opensource_almostconsistent_real_anime_made_with/ Hunyuan w/ custom LoRAs])&lt;br /&gt;
** [https://x.com/AllarHaltsonen/status/1888294811750318114 Automotive ad test] (Kling w/ custom model)&lt;br /&gt;
** [https://x.com/AngryTomtweets/status/1888758524303269928 Random cinematic clips] (Midjourney and Kling)&lt;br /&gt;
** [https://x.com/juliewdesign_/status/1888666757302263828 Crossing Paths]&lt;br /&gt;
** [https://x.com/weirdai_art/status/1888794894187041200 Miniature food]&lt;br /&gt;
** [https://x.com/CaptainHaHaa/status/1889573017745035463 Animals]&lt;br /&gt;
** [https://x.com/Kavanthekid/status/1889371011667144724 Star Wars - The Ghost&amp;#039;s Apprentice (Fan Film)]&lt;br /&gt;
** [https://x.com/AngryTomtweets/status/1889768184716423573 Ray2 image-to-video examples]&lt;br /&gt;
** [https://x.com/weirdai_art/status/1889890470987518069 New Horizons] (miniatures going to Mars)&lt;br /&gt;
** [https://x.com/karim_yourself/status/1890100168378536155 Black Sun (trailer)]&lt;br /&gt;
** [https://x.com/BrivaelLp/status/1890122101153231288 AI avatars] ([https://www.argil.ai/ Argil AI])&lt;br /&gt;
** [https://x.com/mrjonfinger/status/1890783411679236473 Adding elements to real video] ([https://x.com/mrjonfinger/status/1891337081923772918 other example])&lt;br /&gt;
** [https://x.com/SynthReveries/status/1892278954137940289 Glitch]&lt;br /&gt;
** Anime: [https://x.com/freeeebird2300/status/1889119007707689146 sci-fi] (Ray2), [https://x.com/Artedeingenio/status/1891173784188756069 sci-fi] (Ray 2), [https://x.com/seiiiiiiiiiiru/status/1890980673743474931 90s sci-fi] (Luma) and [https://x.com/TomLikesRobots/status/1891209369804591447 moody] (Midjourney and Ray2)&lt;br /&gt;
* February 2025: Meta [https://hila-chefer.github.io/videojam-paper.github.io/ VideoJAM]&lt;br /&gt;
* February 2025: ByteDance [https://omnihuman-lab.github.io/ OmniHuman-1]&lt;br /&gt;
* February 2025: ByteDance [https://saiyan-world.github.io/goku/ Goku] ([https://arxiv.org/abs/2502.04896 paper], [https://x.com/ai_for_success/status/1888821141495844991 examples])&lt;br /&gt;
* February 2025: [https://huggingface.co/stepfun-ai/stepvideo-t2v Step-Video-T2V] open-source model ([https://arxiv.org/abs/2502.10248 paper], [https://github.com/stepfun-ai/Step-Video-T2V code], [https://yuewen.cn/videos demo], [https://x.com/ai_for_success/status/1891369136082854129 examples])&lt;br /&gt;
* February 2025: Pika [https://x.com/pika_labs/status/1892620122818294109 Pikaswaps] (examples of [https://x.com/FreddyChavezO/status/1892678426487881805 modifying regions], [https://x.com/CharaspowerAI/status/1893216710141919637 swapping items])&lt;br /&gt;
* February 2025: Alibaba [https://wanai.pro/ Wan 2.1] [https://huggingface.co/blog/LLMhacker/wanai-wan21 open-source] ([https://x.com/fofrAI/status/1894862403260596371 examples])&lt;br /&gt;
* February 2025: [https://thetwinai.com/ Twin AI]: compose videos with a provided character, object, and location ([https://x.com/EHuanglu/status/1901277394729930984 example])&lt;br /&gt;
* February 2025: Examples:&lt;br /&gt;
** [https://x.com/mrjonfinger/status/1893109598627750164 Infected] (Pika swaps and additions)&lt;br /&gt;
** [https://x.com/amli_art/status/1893447314913796253 Hostile Government Takeover] (Veo2)&lt;br /&gt;
** [https://x.com/KarolineGeorges/status/1895226395812561399 Dual Mechanism] (Pikaframes 2.2)&lt;br /&gt;
&lt;br /&gt;
====March 2025====&lt;br /&gt;
* March 2025: Examples:&lt;br /&gt;
** [https://x.com/SynthReveries/status/1895826068617252901 Doors] (music video)&lt;br /&gt;
** [https://x.com/bind_lux/status/1894492032414224792 Drum and Bass] (music video; Kling, audio from [https://www.riffusion.com/?filter=staff-picks Riffusion])&lt;br /&gt;
** [https://x.com/RileyRalmuto/status/1896088776151269523 Woman&amp;#039;s face] (Sora)&lt;br /&gt;
** [https://x.com/ryanwpatterson/status/1896968881731948844 Skating] (Ray2)&lt;br /&gt;
** [https://www.threads.net/@evolving.ai/post/DGlRyRoO7c9?xmt=AQGz6T_8VppPoAqb5aPwAJ2zzRLUP-YXi8SabAT0IIEA9Q Filming commercial on Mars]&lt;br /&gt;
** [https://www.threads.net/@evolving.ai/post/DGycqyhuETS?xmt=AQGz6T_8VppPoAqb5aPwAJ2zzRLUP-YXi8SabAT0IIEA9Q Original Source commercial] (AI and real footage)&lt;br /&gt;
** [https://x.com/maxescu/status/1896926229204496788 Time-lapses] (Pika 2.2)&lt;br /&gt;
** [https://www.youtube.com/watch?v=2RhkcJyhg0E Hallucination]&lt;br /&gt;
** [https://x.com/town_in_new/status/1897354572139782620 Macro video of bubbles]&lt;br /&gt;
* March 2025: [https://github.com/Tencent/HunyuanVideo-I2V HunyuanVideo-I2V] image-to-video&lt;br /&gt;
* March 2025: Google [https://x.com/labsdotgoogle/status/1897376700666626233 Whisk Animate] (based on Veo2, [https://x.com/maxescu/status/1902742535618888025 examples])&lt;br /&gt;
* March 2025: Examples:&lt;br /&gt;
** [https://x.com/jdp2oo/status/1897874927367160114 Recursion (horror)] (Kling)&lt;br /&gt;
** [https://x.com/blizaine/status/1897826177970028614 Will Smith Eating Spaghetti while Sitting Inside a Bag] (cf. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025])&lt;br /&gt;
** [https://x.com/mickmumpitz/status/1897979382687297697 Paper Jam] (Kling with custom workflows to enable precise control)&lt;br /&gt;
** [https://x.com/maxescu/status/1899155936645722216 Cinematic shots] (Google Whisk and Luma)&lt;br /&gt;
** [https://x.com/weirdai_art/status/1899631013002711409 Perfunctory Horizons]&lt;br /&gt;
** [https://x.com/maxescu/status/1900243840499368319 A Hard Winter]&lt;br /&gt;
** [https://x.com/RoyalKongz/status/1900315389139014074 Consistent character example]&lt;br /&gt;
** [https://x.com/maxescu/status/1900652266362650853 Anthropomorphic Animals]&lt;br /&gt;
** [https://x.com/kimmonismus/status/1900457543299727718 Realistic (influencer-style)]&lt;br /&gt;
** [https://x.com/SunoMusic/status/1900942410584043579 I Feel Cultured] (music video with surrealist vibes)&lt;br /&gt;
** [https://rodeo.club/post/0x30b45c56d62751D763D3B8bFe4D18c4BB65EDF2c/209 journey of utmost importance]&lt;br /&gt;
** [https://x.com/aiordieshow/status/1901930851127984291 Karen: Unleashed]&lt;br /&gt;
** [https://x.com/minchoi/status/1901783767364092232 Yarn Cat]&lt;br /&gt;
** [https://x.com/andyorsow/status/1901619535180091509 Ned&amp;#039;s Wet Deli] (Runway)&lt;br /&gt;
** [https://www.youtube.com/watch?v=KVoiooE8C0c BOOTS], a.k.a. [https://x.com/RuairiRobinson/status/1902027217137484117 &amp;quot;Our enemies are cartoon monsters&amp;quot;] (music video based on the poem by Rudyard Kipling; Veo2)&lt;br /&gt;
** Flying in a dream: [https://x.com/minchoi/status/1902197944826183864 1], [https://x.com/venturetwins/status/1901796679063626060 2]&lt;br /&gt;
** [https://x.com/jasonzada/status/1902129567659389443 Commercial for Mercedes-Benz and FYI Radio]&lt;br /&gt;
** [https://x.com/maxescu/status/1903108496666542562 Selfie video] (Luma)&lt;br /&gt;
** Podcasts: [https://www.reddit.com/r/singularity/comments/1jintit/rottenly_roasted_now_full_script_is_also_not/ Rottenly Roasted] and [https://www.reddit.com/r/aivideo/comments/1jerh56/worst_date_ever/ Worst Date Ever] [https://x.com/OriZilbershtein/status/1903503438744318002 (Imagen 3, Hedra, ElevenLabs, Topaz)]&lt;br /&gt;
** [https://x.com/DexploreArts/status/1903822122150986000 Ambience] (Midjourney, Luma)&lt;br /&gt;
** [https://x.com/TheoMediaAI/status/1904207679511572845 The Bridge] (2 minute short; Veo2)&lt;br /&gt;
** [https://x.com/peteromallet/status/1904268944992829462 Pulp Fiction] (Wan video editing)&lt;br /&gt;
** [https://x.com/madpencil_/status/1906765750624493650 Camera Controls] (Luma Ray2)&lt;br /&gt;
* March 2025: [https://www.hedra.com/ Hedra] [https://x.com/hedra_labs/status/1897699010632466469 Character 3]&lt;br /&gt;
* March 2025: [https://huggingface.co/hpcai-tech/Open-Sora-v2 Open Sora v2] ([https://github.com/hpcaitech/Open-Sora code])&lt;br /&gt;
* March 2025: Amazon Prime debuts [https://en.wikipedia.org/wiki/House_of_David_(TV_series) House of David], with special effects created by [https://www.thewonderproject.com/ Wonder Project] using a [https://x.com/PJaccetturo/status/1903126616831676792 combination of traditional and AI methods] (reportedly including Midjourney and Runway)&lt;br /&gt;
* March 2025: Examples:&lt;br /&gt;
** [https://x.com/PJaccetturo/status/1905151190872309907 What if Studio Ghibli directed Lord of the Rings?] (OpenAI GPT-4o in-context image generation, Kling)&lt;br /&gt;
** [https://x.com/ROHKI/status/1906039022662963269 RŌHKI]&lt;br /&gt;
** [https://x.com/iaveras/status/1906362437487534296 Why]&lt;br /&gt;
** [https://x.com/BrianRoemmele/status/1906476721236570508 Commercial for Puma] (research/test)&lt;br /&gt;
** [https://x.com/Salmaaboukarr/status/1906776503343325469 Commercial for KFC] (concept ad)&lt;br /&gt;
* March 2025: Runway ML [https://runwayml.com/research/introducing-runway-gen-4 Gen-4]&lt;br /&gt;
** [https://www.youtube.com/watch?v=c8IBmK7GZP8 The Lonely Little Flame]&lt;br /&gt;
** [https://www.youtube.com/watch?v=Z0P6qjMUl34&amp;amp;t=1s The Herd]&lt;br /&gt;
** [https://www.youtube.com/watch?v=9HzdNhOe09I The Retrieval]&lt;br /&gt;
** [https://www.youtube.com/watch?v=xEhgxhrAjE4 NYC is a Zoo]&lt;br /&gt;
** [https://www.youtube.com/watch?v=ENGKp5wn344 Scimmia Vede] (music video)&lt;br /&gt;
** More examples: [https://x.com/techhalla/status/1906807994009993473 various], [https://x.com/c_valenzuelab/status/1907958530369372541 art direction], [https://x.com/c_valenzuelab/status/1908146364741029998 mannequins], [https://x.com/c_valenzuelab/status/1907921566643732612 taxi], [https://x.com/c_valenzuelab/status/1907432109695717798 small things], [https://x.com/c_valenzuelab/status/1907563448902496362 long shot (1m)]&lt;br /&gt;
&lt;br /&gt;
====April 2025====&lt;br /&gt;
* April 2025: Examples:&lt;br /&gt;
** [https://x.com/AzeAlter/status/1906974768705990794 Age of Beyond]&lt;br /&gt;
** [https://x.com/techhalla/status/1907790675057242319 Commercial for Coca-Cola] (Higgsfield)&lt;br /&gt;
** [https://www.reddit.com/r/StableDiffusion/comments/1jr6j11/comment/mle9bq5/?context=3 Anime scene (3m)] (Wan 2.1 with LoRa)&lt;br /&gt;
** [https://x.com/pika_labs/status/1908263310912610401 Taxes then Death] (Pika multikeyframe)&lt;br /&gt;
* April 2025: [https://www.krea.ai/ Krea] [https://x.com/krea_ai/status/1907829389452021853 Video Re-Style]&lt;br /&gt;
* April 2025: ByteDance [https://grisoon.github.io/DreamActor-M1/ DreamActor-M1] performance transfer&lt;br /&gt;
* April 2025: Examples:&lt;br /&gt;
** [https://x.com/Diesol/status/1908535493673050403 Mercs] (Midjourney v7, Ray2)&lt;br /&gt;
** [https://x.com/minchoi/status/1909078846126649440 Cat at theme park]&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1909630883218207036 Timelapse history] (Runway Gen4)&lt;br /&gt;
** [https://x.com/EHuanglu/status/1909660808973533225 Examples for use in advertising]&lt;br /&gt;
** [https://x.com/arohaAIX/status/1910688361221599361 Sci-fi scapes]&lt;br /&gt;
** [https://x.com/PJaccetturo/status/1910750148055146708 Avα]&lt;br /&gt;
** [https://x.com/imagineFERA/status/1910601934207152576 The Bureau]&lt;br /&gt;
** [https://x.com/jasonzada/status/1911812014059733041 Beaver and Sock (3m)]&lt;br /&gt;
** [https://x.com/Delachica_/status/1911842237622735052 Organic Waste (5m)] (Runway)&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1912260798270882104 Fly] (Runway Gen4)&lt;br /&gt;
* April 2025: Alibaba [https://arxiv.org/abs/2504.04842 FantasyTalking] lipsync ([https://x.com/EHuanglu/status/1910341110322577442 examples])&lt;br /&gt;
* April 2025: Tencent Hunyuan [https://arxiv.org/abs/2411.16331 Sonic] image animation/lipsync to audio ([https://x.com/ai_for_success/status/1911719866958286864 examples])&lt;br /&gt;
* April 2025: ByteDance [https://huggingface.co/papers/2504.08685 Seaweed-7B] ([https://arxiv.org/abs/2504.08685 preprint], [https://www.youtube.com/watch?v=OaPI6K2y3rI examples])&lt;br /&gt;
* April 2025: [https://app.klingai.com/global/release-notes Kling 2.0] ([https://www.youtube.com/watch?v=Yqvh3M12T_M video])&lt;br /&gt;
* April 2025: [https://www.skyreels.ai/home Skywork AI] [https://github.com/SkyworkAI/SkyReels-V2 SkyReels V2] (open-source, unlimited extension; [https://x.com/AngryTomtweets/status/1914270477482443142 examples])&lt;br /&gt;
* April 2025: [https://sand.ai/ Sand AI] [https://huggingface.co/sand-ai/MAGI-1 Magi-1] (open source, unlimited extension; [https://x.com/AngryTomtweets/status/1914318743578296506 examples], [https://x.com/dreamingtulpa/status/1916035289300275372 more examples])&lt;br /&gt;
* April 2025: Examples:&lt;br /&gt;
** [https://x.com/maxescu/status/1912100029549994016 Mars 2035 (3m)] (Kling 2.0)&lt;br /&gt;
** [https://x.com/ai_for_success/status/1912466999147450600 Kingdom (dragon battle, 3m)]&lt;br /&gt;
** [https://x.com/imagineFERA/status/1913156296657756278 Reflection (3m)] (Gen4)&lt;br /&gt;
** [https://x.com/Wytsekoetse/status/1913547157493162035 Pizza Galaxy (1m)] (MJ and Gen4)&lt;br /&gt;
** [https://www.youtube.com/watch?v=rseqmSGH7xk Snoop Dogg music video: Last Dance with Mary Jane] (blend of traditional and AI effects)&lt;br /&gt;
** [https://x.com/dreamingtulpa/status/1915104310448501129 Realistic human motion]&lt;br /&gt;
** [https://x.com/KarolineGeorges/status/1915113151546396893 Inception loop] (Gen4)&lt;br /&gt;
** [https://x.com/rayisdoingfilm/status/1916468807435952330 Tuesday (1m)] (Gen4)&lt;br /&gt;
** [https://www.youtube.com/watch?v=XWdwF1q3kDw Deus in Machina Automata (4m)] (Gen4)&lt;br /&gt;
** [https://x.com/machina9000/status/1915090908850049223 Outsiders (3m music video)]&lt;br /&gt;
&lt;br /&gt;
====May 2025====&lt;br /&gt;
* May 2025: [https://huggingface.co/Lightricks/LTX-Video LTX-Video 13B] ([https://github.com/Lightricks/LTX-Video code], [https://x.com/maxescu/status/1919801813987164527 examples], [https://x.com/cubiq/status/1919748210567815551 more examples])&lt;br /&gt;
* May 2025: HeyGen Avatar IV (examples: [https://x.com/StevieMac03/status/1919910677860216869 sci-fi], [https://x.com/KarolineGeorges/status/1919801983143211222 Come Closer], [https://x.com/maxescu/status/1920410329454100973 singing], [https://x.com/minchoi/status/1920853859171234165 various])&lt;br /&gt;
* May 2025: Tencent [https://hunyuancustom.github.io/ HunyuanCustom]&lt;br /&gt;
* May 2025: Examples:&lt;br /&gt;
** [https://x.com/lifeofc/status/1920331476157280413 Iris (1.5m)] (Midjourney, Luma, Runway)&lt;br /&gt;
** [https://runwayml.com/customers/the-making-of-mars-and-siv Mars and Siv: &amp;quot;No Vacancy&amp;quot; (episode 1, 6m)] (Runway)&lt;br /&gt;
** [https://x.com/cfryant/status/1921317318744760817 Go to the East Wing] (dreamlike, Luma)&lt;br /&gt;
** [https://x.com/DeryaTR_/status/1921015340827304389 Yu Lanter showreel] (Higgsfield)&lt;br /&gt;
** [https://x.com/freeeebird2300/status/1921789387614134652 Cyberpunk anime] (Luma)&lt;br /&gt;
** [https://x.com/LittleTinRobot/status/1921692735930589246 Alien animals] (Runway)&lt;br /&gt;
** [https://x.com/minchoi/status/1922500563792486878 America&amp;#039;s Funniest AI Home Videos (3m)]&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1924204409833103365 Editing POV shots from AR glasses] (Runway)&lt;br /&gt;
* May 2025: [https://runwayml.com/gen48 Gen:48] Fourth Edition winners:&lt;br /&gt;
** [https://www.youtube.com/watch?v=NphCYRXjqTI&amp;amp;t=174s Home] (3m)&lt;br /&gt;
** [https://www.youtube.com/watch?v=L2DQwCp_DCw The King&amp;#039;s Secret] (2m)&lt;br /&gt;
* May 2025: [https://viggle.ai/home Viggle] Live [https://x.com/ViggleAI/status/1926324953038627214 enables] real-time avatar control&lt;br /&gt;
* May 2025: Google [https://blog.google/technology/ai/generative-media-models-io-2025/ Veo 3] (examples: [https://x.com/babaeizadeh/status/1924942128851124284 conversation], [https://x.com/mattshumer_/status/1925039973310308424 cooking], [https://x.com/jerrod_lew/status/1924934440486371589 singing], [https://x.com/MartinNebelong/status/1924926779677905014 simple story], [https://x.com/Diesol/status/1925114473544913004 cinematic action sequence], [https://x.com/laszlogaal_/status/1925094336200573225 car show interviews], [https://x.com/arikuschnir/status/1924953349943697763 We Can Talk], [https://x.com/venturetwins/status/1925021235530105298 podcast], [https://x.com/maxescu/status/1925079990061957423 various], [https://x.com/jerrod_lew/status/1927092379892265139 camera moves])&lt;br /&gt;
* May 2025: Examples:&lt;br /&gt;
** [https://x.com/javilopen/status/1925495026903380358 Will Smith Eating Spaghetti] (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025])&lt;br /&gt;
** [https://x.com/MetaPuppet/status/1926659557914268155 Bob from Marketing] (Veo 3)&lt;br /&gt;
** [https://x.com/dustinhollywood/status/1926733069475565622 He is King (16m)]&lt;br /&gt;
** [https://x.com/HashemGhaili/status/1925616536791760987 Prompt Theory], [https://x.com/HashemGhaili/status/1925332319604257203 part 2], [https://x.com/HashemGhaili/status/1927467022213869975 Afterlife (3m)] (Veo3)&lt;br /&gt;
** [https://x.com/JoannaStern/status/1927856754873835747 My Robot and Me (3m)] (Veo, Runway)&lt;br /&gt;
** [https://x.com/rohanpaul_ai/status/1928152398930817238 The Internet&amp;#039;s Over] (Veo3)&lt;br /&gt;
** [https://www.reddit.com/r/aivideo/comments/1l0rl7d/before_colours_fade/ Before Colours Fade (2m)] (Midjourney, Kling)&lt;br /&gt;
&lt;br /&gt;
====June 2025====&lt;br /&gt;
* June 2025: Examples: &lt;br /&gt;
** [https://x.com/amasad/status/1930505292904837132 Bigfoot ASMR]&lt;br /&gt;
** [https://x.com/minchoi/status/1930670583605514333 Talking] (HeyGen Avatar IV upgrade)&lt;br /&gt;
** [https://x.com/ROHKI/status/1931081752992477285 Where are all the aliens? (2m)]&lt;br /&gt;
** [https://x.com/fofrAI/status/1930999540770893874 Natural talking]&lt;br /&gt;
** [https://x.com/ammaar/status/1931672722418851904 Elemental Showdown - Mortal Kombat (3m)]&lt;br /&gt;
** [https://x.com/maxjoseph/status/1932104616021565476 It Starts at the End (music video, 4m)]&lt;br /&gt;
** [https://x.com/deedydas/status/1932105266654581116 Sci-fi trailer (2m)]&lt;br /&gt;
** [https://x.com/DrMachakil/status/1931816470901575924 The Prompt Floor (2m)]&lt;br /&gt;
** [https://x.com/DrMachakil/status/1853960062546366856 NALVORA (2.7m)] - [https://x.com/DrMachakil/status/1932904599004066200 Best Trailer, Metamorph AI Film Awards]&lt;br /&gt;
** [https://x.com/Kalshi/status/1932891608388681791 Commercial for Kalshi (30s)] - [https://x.com/PJaccetturo/status/1932893260399456513 to air during NBA finals] (Veo)&lt;br /&gt;
** [https://x.com/ROHKI/status/1933594430113788227 Your Brain is Broken on Purpose (2m)]&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1934312626021949687 Runway Gen-4 Reference examples]&lt;br /&gt;
** [https://x.com/JesusPlazaX/status/1934253813696786661 Paper airplane]&lt;br /&gt;
** [https://x.com/minchoi/status/1934032730947526872 Veo3 examples]&lt;br /&gt;
** [https://x.com/NomadsVagabonds/status/1935329331410075734 Reset 3 (1m, surreal)]&lt;br /&gt;
** [https://x.com/HashemGhaili/status/1935722105322323968 It Has No Soul (1m, Veo3)]&lt;br /&gt;
* June 2025: [https://seedance.net/seedance Seedance 1.0] ([https://arxiv.org/abs/2506.09113 preprint])&lt;br /&gt;
* June 2025: [https://hailuoai.video/ Hailuo AI] (MiniMax) Hailuo 02 ([https://x.com/venturetwins/status/1934236631336403344 &amp;quot;Kangaroo&amp;quot; during testing]; examples: [https://x.com/lepadphone/status/1935078910934626429 various], [https://x.com/alexgnewmedia/status/1935018186954719365 various], [https://x.com/FussyPastor/status/1935065068456263883 tsunami], [https://x.com/thedorbrothers/status/1935098802744213935 fight scene], [https://x.com/umesh_ai/status/1935028257708966231 fox running], [https://x.com/BrentLynch/status/1934979825636446268 blogger], [https://x.com/HalimAlrasihi/status/1935297126759538735 transitions], [https://x.com/MKMXLA/status/1938318951664280045 skateboarding])&lt;br /&gt;
* June 2025: Midjourney video ([https://x.com/minchoi/status/1934373051464057062 early examples], [https://x.com/ciguleva/status/1935386452197785892 various], [https://x.com/juliewdesign_/status/1935395999175876696 various], [https://x.com/emollick/status/1935504703023899096 Ethan Mollick], [https://x.com/PJaccetturo/status/1935383312392151528 highly rated], [https://x.com/maxescu/status/1935674561821126847 complex environments], [https://x.com/CoffeeVectors/status/1935863623076675875 manga])&lt;br /&gt;
* June 2025: Examples:&lt;br /&gt;
** [https://x.com/StevieMac03/status/1935768436556378170 The Battle of Glenvael - Orcs vs Humans] (Hailuo)&lt;br /&gt;
** [https://x.com/HashemGhaili/status/1935036744568824208 The Sentence (9m, Veo3)]&lt;br /&gt;
** [https://x.com/elder_plinius/status/1936145834585862225 Will Smith Eating Spaghetti] (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025], [https://x.com/javilopen/status/1925495026903380358 May 2025])&lt;br /&gt;
** [https://x.com/venturetwins/status/1937232461576175809 Gymnastics] (Hailuo 02)&lt;br /&gt;
** [https://x.com/KeiranJHarris/status/1935429439476887594 Fable about Transformative AI] (Veo3)&lt;br /&gt;
** [https://x.com/thedorbrothers/status/1937926400507580726 Vorex (2m trailer)]&lt;br /&gt;
** [https://x.com/OnerBiberkoku/status/1938972810321281394 Doğrucu (3m music video, Veo3)]&lt;br /&gt;
* June 2025: [https://higgsfield.ai/soul Higgsfield Soul] Video Effects ([https://x.com/higgsfield_ai/status/1937931727084917097 examples], [https://x.com/HashemGhaili/status/1938278903765995611 realism])&lt;br /&gt;
* June 2025: Alibaba [https://omni-avatar.github.io/ OmniAvatar] ([https://arxiv.org/abs/2506.18866 paper], [https://github.com/Omni-Avatar/OmniAvatar code], [https://huggingface.co/OmniAvatar/OmniAvatar-14B model], [https://x.com/AngryTomtweets/status/1939850674776547359 examples])&lt;br /&gt;
&lt;br /&gt;
====July 2025====&lt;br /&gt;
* July 2025: Examples:&lt;br /&gt;
** [https://x.com/Kavanthekid/status/1940452444850589999 Untold - The Immortal Blades Saga] (2m trailer)&lt;br /&gt;
** [https://x.com/minchoi/status/1941234456461029584 Unofficial commercial for Liquid Death (1m)]&lt;br /&gt;
** [https://x.com/brain_racked/status/1942594951310893425 A parade of the chosen theocracy on Callisto]&lt;br /&gt;
** [https://x.com/Popeyes/status/1943316484404433182 Popeyes commercial - diss track (1m)]&lt;br /&gt;
*** [https://x.com/gabemichael_ai/status/1944070622155616668 (Unofficial) Wendy&amp;#039;s response - diss track (2m)]&lt;br /&gt;
*** [https://x.com/ai_massive/status/1947689537641357618 (Unofficial) In-N-Out rap battle (3m)]&lt;br /&gt;
** [https://x.com/Kalshi/status/1943339616716599548 Kalshi commercial]&lt;br /&gt;
** Jonah (25m TV show, [https://x.com/PJaccetturo/status/1946101701548880029 making of], [https://kingstonestudios.uscreen.io/programs/jonah purchase here])&lt;br /&gt;
** [https://x.com/Totemko/status/1946243585021452335 Unofficial commercial for Mercedes (17s)]&lt;br /&gt;
** [https://x.com/CoffeeVectors/status/1946016960916889632 Skateboarding music video (1m)]&lt;br /&gt;
* July 2025: Runway ML [https://help.runwayml.com/hc/en-us/articles/42311337895827-Creating-with-Act-Two Act-Two] (video-to-video performance transfer)&lt;br /&gt;
* July 2025: Examples:&lt;br /&gt;
** Neural Viz [https://www.youtube.com/watch?v=juDDHvHroQ8 The Cop Files: Part VI (8m)]&lt;br /&gt;
** [https://x.com/Kavanthekid/status/1947696716981145971 Perfect Dark - Concept Trailer (1.5m)]&lt;br /&gt;
** [https://x.com/StelfieTT/status/1948753090858885131 Exodus (2m trailer)]&lt;br /&gt;
** [https://x.com/Jett_Collective/status/1949140450553540841 A Walk Together - Life and love in motion (1m, Midjourney Video)]&lt;br /&gt;
* July 2025: Netflix sci-fi show [https://en.wikipedia.org/wiki/The_Eternaut_(TV_series) The Eternaut] [https://x.com/omooretweets/status/1946290797399400662 used genAI] for a particular scene (building collapse)&lt;br /&gt;
* July 2025: Google Veo [https://x.com/GoogleLabs/status/1948477692715700718 emergent annotation direction] ([https://x.com/venturetwins/status/1948771505783144641 example], [https://x.com/bilawalsidhu/status/1948844167603310660 example], [https://x.com/jboogx_creative/status/1949230927504371765 example], [https://x.com/Ror_Fly/status/1949606017739747625 example])&lt;br /&gt;
* July 2025: Runway [https://runwayml.com/research/introducing-runway-aleph Aleph] contextual editing&lt;br /&gt;
* July 2025: Wan 2.2 (open source, [https://x.com/Alibaba_Wan/status/1949804551655276989 examples])&lt;br /&gt;
&lt;br /&gt;
====August 2025====&lt;br /&gt;
* August 2025: Pika [https://x.com/pika_labs/status/1954935844936024476 audio-driven performance] ([https://x.com/minchoi/status/1954989794129514937 examples], [https://x.com/pika_labs/status/1955007656302924192 examples])&lt;br /&gt;
* August 2025: Examples:&lt;br /&gt;
** [https://www.youtube.com/watch?v=gePD1Hf1qPc Eve and Adam] (8m, [https://x.com/MetaPuppet/status/1954254544935719259 multiple tools])&lt;br /&gt;
** [https://x.com/runwayml/status/1955615613583519917 Redesign a space] (Runway Aleph)&lt;br /&gt;
** [https://x.com/theGioM/status/1955656398248763428 Detroit Pretend Work Park (1m)]&lt;br /&gt;
** [https://x.com/pzf_ai/status/1940816374211006600 The Weight of Light] (3m music video, Midjourney &amp;amp; Suno)&lt;br /&gt;
** [https://x.com/EHuanglu/status/1956788759778967710 Commercial for Pepsi]&lt;br /&gt;
** [https://x.com/StelfieTT/status/1956633450326200426 Emotion]&lt;br /&gt;
** [https://x.com/dustinhollywood/status/1957940749862875383 TZIGANE]&lt;br /&gt;
** [https://x.com/0xFramer/status/1960720090921623636 Anime chase sequence] (Nano Banana and Seedance 1.0)&lt;br /&gt;
* August 2025: ByteDance [http://www.waver.video/ Waver 1.0]&lt;br /&gt;
* August 2025: [https://huggingface.co/Wan-AI/Wan2.2-S2V-14B Wan2.2-S2V 14B]&lt;br /&gt;
&lt;br /&gt;
====September 2025====&lt;br /&gt;
* September 2025: [https://www.wsj.com/tech/ai/openai-backs-ai-made-animated-feature-film-389f70b0 OpenAI Backs AI-Made Animated Feature Film: Film, called ‘Critterz,’ aims to debut at Cannes Film Festival and will leverage startup’s AI tools and resources.]&lt;br /&gt;
* September 2025: Examples:&lt;br /&gt;
** [https://x.com/kentskooking/status/1964606423037542459 A loop to wake up to (30s)]&lt;br /&gt;
** [https://x.com/venturetwins/status/1966570512991350907 time lapse]&lt;br /&gt;
** [https://x.com/NeuralViz/status/1967391198487994652 The Adventures of Reemo Green] (11m, Neural Viz)&lt;br /&gt;
** [https://x.com/kellyeld/status/1967620786166079545 Surreal DJs music video (2m)]&lt;br /&gt;
** [https://x.com/dustinhollywood/status/1968724784440558044 Glass City] (Hailuo)&lt;br /&gt;
** [https://x.com/TheoMediaAI/status/1968646951227777529 Alarm] (1m, multiple tools including world synthesis for consistent environments)&lt;br /&gt;
* September 2025: [https://lumalabs.ai/ray Luma] [https://x.com/LumaLabsAI/status/1968684330034606372 Ray3] ([https://x.com/cfryant/status/1968692370725077251 example])&lt;br /&gt;
* September 2025: Examples:&lt;br /&gt;
** [https://x.com/mrjonfinger/status/1968687352382910469 Stop motion interpolation] (Luma Ray3)&lt;br /&gt;
** [https://x.com/heydin_ai/status/1969514789169959128 Skyland] (1.5m, various tools)&lt;br /&gt;
** [https://x.com/iamluokai/status/1970185972076925427 Dancing] (Wan 2.2)&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1970497214108815584 Under Armour commercial] (Runway Aleph)&lt;br /&gt;
** [https://x.com/FilmsBySav/status/1971247214795358706 OG PRIME] (10m, Kling)&lt;br /&gt;
** [https://www.youtube.com/watch?v=JGLoTjxd-Ss PLANET] (37m)&lt;br /&gt;
* September 2025: [https://x.com/Kling_ai/status/1970439808901362155 Kling AI 2.5 Turbo] (examples: [https://x.com/OrctonAI/status/1970472214794220008 cyberpunk], [https://x.com/ImagineArt_X/status/1970586138655236565 human motion], [https://x.com/fAIkout/status/1970505756853334324 motion and emotion], [https://x.com/fAIkout/status/1970495039248965636 painting], [https://x.com/venturetwins/status/1970563820478439546 gymnastics], [https://x.com/Art_For_Joy/status/1970249516033970434 breakdancing], [https://x.com/HaydenLeeWrites/status/1970523610734567819 combat], [https://x.com/umesh_ai/status/1970497680536150454 cinematic], [https://x.com/LillyLiCT/status/1970580585073819752 horror camerawork], [https://x.com/StevieMac03/status/1970559778804908331 extended sequence])&lt;br /&gt;
* September 2025: OpenAI [https://openai.com/index/sora-2/ Sora 2] ([https://x.com/minchoi/status/1973949620318580970 examples])&lt;br /&gt;
&lt;br /&gt;
====October 2025====&lt;br /&gt;
* October 2025: Examples:&lt;br /&gt;
** [https://x.com/minchoi/status/1976042197154963702 Will Smith Eating Spaghetti] (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025], [https://x.com/javilopen/status/1925495026903380358 May 2025], [https://x.com/elder_plinius/status/1936145834585862225 June 2025])&lt;br /&gt;
** [https://www.youtube.com/watch?v=JhH3uxcdM1M Frostbite] (3m, Sora 2)&lt;br /&gt;
** [https://x.com/Jukanlosreve/status/1977764418709758106 (Fake) &amp;quot;Behind the scenes&amp;quot; for a Chainsaw Man live action] ([https://x.com/PJaccetturo/status/1972705821072261402 others])&lt;br /&gt;
* October 2025: Google [https://blog.google/technology/ai/veo-updates-flow/ Veo 3.1]&lt;br /&gt;
* October 2025: Examples:&lt;br /&gt;
** [https://x.com/aisearchio/status/1978465562821898461 Will Smith Eating Spaghetti], Veo 3.1 (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025], [https://x.com/javilopen/status/1925495026903380358 May 2025], [https://x.com/elder_plinius/status/1936145834585862225 June 2025], [https://x.com/minchoi/status/1976042197154963702 October 2025])&lt;br /&gt;
** [https://x.com/Diesol/status/1978755688261128227 War footage] (Veo 3.1)&lt;br /&gt;
** [https://www.meta.ai/@dustin_hollywood/post/bG3BHB21W0l/yukon/ Yukon] (music video, [https://x.com/dustinhollywood/status/1982260655957700746 Dustin Hollywood])&lt;br /&gt;
** [https://x.com/Diesol/status/1980922041131028515 Bloom] (2m, Veo 3.1)&lt;br /&gt;
** [https://x.com/xmuse_/status/1982026008803905639 Auction] (1m)&lt;br /&gt;
** [https://x.com/kellyeld/status/1982425147496882287 Dancing] (music video; Midjourney, Suno, Veo3)&lt;br /&gt;
** [https://x.com/JesusPlazaX/status/1982393609069412433 Anime example] (Midjourney, Grok Imagine)&lt;br /&gt;
** [https://x.com/EccentrismArt/status/1982830100266783039 King Arthur] (1m)&lt;br /&gt;
** [https://x.com/venturetwins/status/1983024227352789162 Transitions] (1m music video)&lt;br /&gt;
** [https://x.com/eastflatsfilm/status/1984116704704971076 Unofficial commercial for Nike] (2m, Midjourney, Hailuo)&lt;br /&gt;
** [https://x.com/PJaccetturo/status/1984639281848336592 Loneliness/Halloween] ([https://www.linkedin.com/posts/simon-meyer-976339160_this-could-be-the-scariest-halloween-film-activity-7389892778144735232-6CYY?utm_source=share&amp;amp;utm_medium=member_desktop&amp;amp;rcm=ACoAAADeoqYBzX8N9-j_hRQvl1e7OUlOgFptNF0 1.5m])&lt;br /&gt;
** [https://www.youtube.com/watch?v=43h61QAXjpY Wave] (2m music video, [https://x.com/MIZNOM Masaki Mizuno])&lt;br /&gt;
* October 2025: [https://x.com/Hailuo_AI/status/1983016390878708131 Hailuo 2.3]&lt;br /&gt;
&lt;br /&gt;
====November 2025====&lt;br /&gt;
* November 2025: Examples:&lt;br /&gt;
** [https://x.com/subverum/status/1985069550250107033 Valley of Shadow] (6m)&lt;br /&gt;
** [https://x.com/DiscussingFilm/status/1985470088074375344 Coca-Cola ad] (c.f. [https://x.com/techhalla/status/1857462526859935813 2024 ad])&lt;br /&gt;
** [https://x.com/venturetwins/status/1985755546222542903 France 2026 Olympics ad] (blend of genAI and traditional methods, [https://x.com/venturetwins/status/1985753512362590439 behind the scenes])&lt;br /&gt;
** [https://x.com/NeuralViz/status/1986611025366687754 Minnesota Nice] (3m, [https://x.com/NeuralViz Neural Viz])&lt;br /&gt;
** [https://x.com/machina9000/status/1986563727873740934 Brutalis] (7m)&lt;br /&gt;
** [https://x.com/tastypxls/status/1987312755485876502?s=20 Living The Dream - Rynn] (music video, 1m)&lt;br /&gt;
** [https://x.com/MrDavids1/status/1988366387111170339?s=20 Environment as Character]&lt;br /&gt;
** [https://x.com/justalexoki/status/1988915573707661637?s=20 Cat playing instruments at midnight]&lt;br /&gt;
** [https://x.com/LumaLabsAI/status/1989013731267998172?s=20 Overclock] (30s, Luma)&lt;br /&gt;
** [https://x.com/venturetwins/status/1980685301577326994?s=20 Music video] (30s, Wan Animate)&lt;br /&gt;
** [https://x.com/venturetwins/status/1990227418553209259?s=20 Promotional material for Pudong Art Museum - Louvre exhibition in Shanghai] (1m)&lt;br /&gt;
** [https://x.com/Kyrannio/status/1990324648488186358?s=20 Loop 87: A Temporal Heist] (12m; claims the video was generated fully autonomously using the AI agent NoSpoon)&lt;br /&gt;
** [https://x.com/AzeAlter/status/1906974768705990794?s=20 Age of Beyond] (3m)&lt;br /&gt;
** [https://x.com/c_valenzuelab/status/1991245088446386495?s=20 Ausencia] (5m)&lt;br /&gt;
** [https://x.com/AngryTomtweets/status/1993047608617517246?s=20 live paintings] ([https://www.youtube.com/channel/UCw8kc0wDm5Bh6g9iZzEWfOg bandyquantguy] on YouTube)&lt;br /&gt;
** [https://x.com/BrianRoemmele/status/1994625579073900804?s=20 Michelle, on a server in Iowa] (1m)&lt;br /&gt;
* November 2025: [https://odyssey.ml/ Odyssey] - [https://x.com/odysseyml/status/1994873514579697830?s=20 Odyssey-2]&lt;br /&gt;
&lt;br /&gt;
====December 2025====&lt;br /&gt;
* December 2025: Runway [https://runwayml.com/research/introducing-runway-gen-4.5 Gen 4.5]&lt;br /&gt;
* December 2025: [https://app.klingai.com/global/all-tools Kling] [https://app.klingai.com/global/omni/new O1] ([https://x.com/minchoi/status/1995523379957559609?s=20 examples], [https://x.com/TheoMediaAI/status/1995517613414518987?s=20 other examples]) and Kling 2.6.&lt;br /&gt;
* December 2025: [https://app.pixverse.ai/onboard PixVerse v5.5]&lt;br /&gt;
* December 2025: Examples:&lt;br /&gt;
** [https://x.com/EHuanglu/status/1996649596119068687?s=20 Will Smith Eating Spaghetti], Kling 2.6 (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025], [https://x.com/javilopen/status/1925495026903380358 May 2025], [https://x.com/elder_plinius/status/1936145834585862225 June 2025], [https://x.com/minchoi/status/1976042197154963702 October 2025], [https://x.com/aisearchio/status/1978465562821898461 October 2025])&lt;br /&gt;
** [https://x.com/venturetwins/status/1997898095670296615?s=20 Dreamlike POV]&lt;br /&gt;
** [https://x.com/chatgpt21/status/1998253809307455555?s=20 McDonald&amp;#039;s commercial]&lt;br /&gt;
** [https://x.com/EHuanglu/status/1998039554402750545?s=20 Skittles commercial] (Higgsfield)&lt;br /&gt;
** [https://x.com/Diesol/status/1997147919603077335?s=20 The Tenant] (2m, Kling 2.6)&lt;br /&gt;
** [https://x.com/PsyopAnime/status/1999242965659906526?s=20 Maximum Carnage] (3m)&lt;br /&gt;
** [https://x.com/JeffSynthesized/status/1998786836924395875?s=20 Blurred Horizon: Episode 1] (24m)&lt;br /&gt;
** [https://x.com/Artedeingenio/status/2001667487784460301?s=20 Anime Action] (2m)&lt;br /&gt;
** [https://x.com/bearlyai/status/2005055231617605748?s=20 Dollar Shave Club commercial] (1m)&lt;br /&gt;
** [https://x.com/EHuanglu/status/2004020543084024295?s=20 Xmas Cameos] (1.5m)&lt;br /&gt;
** [https://x.com/DiDi_OKK/status/1955653520407019976?s=20 Green Screen] (2m, [https://x.com/DiDi_OKK/status/1955653520407019976 DiDi_OK])&lt;br /&gt;
** [https://x.com/DiDi_OKK/status/1998227601341702639?s=20 Arrow] (7m, [https://x.com/DiDi_OKK/status/1955653520407019976 DiDi_OK])&lt;br /&gt;
** [https://x.com/bluehorizon_ai/status/2004045348579561503?s=20 Live Action One Punch Man | Saitama vs Genos] (2m, [https://x.com/bluehorizon_ai Blue Horizon])&lt;br /&gt;
** [https://x.com/keshiAIart/status/2005254907780358201?s=20 Anime Train] (6s)&lt;br /&gt;
** [https://x.com/venturetwins/status/2006051632837189683?s=20 Michael Catson] (13s)&lt;br /&gt;
* December 2025: [https://arxiv.org/abs/2512.13507 Seedance 1.5]&lt;br /&gt;
&lt;br /&gt;
===2026===&lt;br /&gt;
====January 2026====&lt;br /&gt;
* January 2026: Examples:&lt;br /&gt;
** [https://x.com/Itspedrito/status/2007636967048228968?s=20 Somebody That I Used to Know] (1m)&lt;br /&gt;
** [https://x.com/hujimari/status/2008054519704461407?s=20 Cat being disruptive at night], [https://x.com/klara_sjo/status/2007864014521720963?s=20 another], [https://x.com/alphafox/status/2009732284375830687?s=20 another] (c.f. [https://x.com/justalexoki/status/1988915573707661637?s=20 Cat playing instruments at midnight])&lt;br /&gt;
** [https://x.com/Uncanny_Harry/status/2008881579095961934?s=20 Character test] (30s, Kling 2.6 Motion Control, [https://x.com/Uncanny_Harry Uncanny Harry AI])&lt;br /&gt;
** [https://www.youtube.com/watch?v=SGJC4Hnz3m0&amp;amp;t=2s STAR WARS: Beggar’s Canyon | A Luke Skywalker Fan Film (Between ESB &amp;amp; ROTJ)] (7m)&lt;br /&gt;
** [https://x.com/dustinhollywood/status/2009732705299104118?s=20 TZIGANE] (9m)&lt;br /&gt;
** [https://x.com/Framer_X/status/2011075884246061454?s=20 The Subway Spark] (Anime, 45s)&lt;br /&gt;
** [https://www.reddit.com/r/aivideo/comments/1qi8zuv/25_years_difference_makes_you_wonder_where_ai/ Will Smith Eating Spaghetti] (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025], [https://x.com/javilopen/status/1925495026903380358 May 2025], [https://x.com/elder_plinius/status/1936145834585862225 June 2025], [https://x.com/minchoi/status/1976042197154963702 October 2025], [https://x.com/aisearchio/status/1978465562821898461 October 2025], [https://x.com/EHuanglu/status/1996649596119068687?s=20 December 2025])&lt;br /&gt;
** [https://x.com/PJaccetturo/status/2013675665539596651?s=20 The AI Artist] (1.5m)&lt;br /&gt;
** [https://x.com/Artedeingenio/status/2013624842021417030?s=20 Sci-fi action anime] (2m)&lt;br /&gt;
** [https://x.com/verbalriotshow/status/2014752509240475872?s=20 Stone Hand] (fake trailer, 1m)&lt;br /&gt;
* January 2026: [https://x.com/nvidia/status/2008346949301235933?s=20 Runway Gen-4.5 on] [https://www.nvidia.com/en-us/data-center/technologies/rubin/?linkId=100000401190502 Nvidia Rubin] ([https://x.com/runwayml/status/2014406560445771804?s=20 examples])&lt;br /&gt;
* January 2026: [https://ltx.io/model/ltx-2 LTX-2] open source video model (20s, 4k, w/ audio; [https://x.com/venturetwins/status/2010878914273697956?s=20 examples])&lt;br /&gt;
* January 2026: Luma [https://lumalabs.ai/blog/news/ray3_14 Ray3.14] ([https://x.com/LumaLabsAI/status/2015822842575888844?s=20 examples])&lt;br /&gt;
* January 2026: Examples:&lt;br /&gt;
** [https://x.com/pressmanc/status/2015099516500758647?s=20 Runway Gen-4.5 tests] (3.5m)&lt;br /&gt;
** [https://x.com/EHuanglu/status/2015573517618528538?s=20 Longchamp / Horses in the city] (1m)&lt;br /&gt;
** [https://x.com/dustinhollywood/status/2008154825385521418?s=20 The Last Artist] (trailer, 2m)&lt;br /&gt;
** [https://x.com/taziku_co/status/2015739943101047111?s=20 Monet temporal structure] (3m)&lt;br /&gt;
** [https://x.com/runwayml/status/2016155967285543364?s=20 Grizzlies] (1.5m, Runway Gen-4.5)&lt;br /&gt;
** [https://www.youtube.com/@TIME/videos On This Day... 1776] ([https://www.youtube.com/watch?v=E4cLKIxt8W8 trailer])&lt;br /&gt;
*** [https://www.youtube.com/watch?v=sV52AUVGc6I January 1: The Flag] (3.5m)&lt;br /&gt;
*** [https://www.youtube.com/watch?v=3ZDnL_a0YfQ January 10: Common Sense] (4.5m)&lt;br /&gt;
*** [https://www.youtube.com/watch?v=J5b1TiyKTus January 26: The Guns of Ticonderoga] (4m)&lt;br /&gt;
&lt;br /&gt;
====February 2026====&lt;br /&gt;
* February 2026: [https://app.klingai.com/global/quickstart/klingai-video-3-omni-model-user-guide Kling 3.0]&lt;br /&gt;
* February 2026: [https://seedance2.ai/ Seedance 2.0] ([https://x.com/EHuanglu/status/2020131622675202512?s=20 example 1], [https://x.com/EHuanglu/status/2020492770872566053?s=20 2], [https://x.com/dynamicwangs/status/2020054894741451123?s=20 3], [https://x.com/patrickassale/status/2020180495900848470?s=20 4], [https://x.com/janekm/status/2020888750285332526?s=20 5], [https://x.com/Dork_sense/status/2020179955511116082?s=20 6], [https://x.com/EHuanglu/status/2020388244802740728?s=20 7], [https://x.com/zhao_dashuai/status/2020528048341217592?s=20 8], [https://x.com/AngryTomtweets/status/2020784886932738470?s=20 9], [https://x.com/javilopen/status/2020558352590287298?s=20 10], [https://x.com/linxiaobei888/status/2021399630672691710?s=20 11])&lt;br /&gt;
* February 2026: Examples:&lt;br /&gt;
** [https://x.com/PJaccetturo/status/2019072637192843463?s=20 Unofficial opening sequence for The Way of Kings by Brandon Sanderson] (1.5m, Kling 3)&lt;br /&gt;
** [https://x.com/dailycatsclips/status/2020117502915989680?s=20 Cat Dreams] (1.5m)&lt;br /&gt;
** [https://x.com/DotCSV/status/2021269435567218725?s=20 Will Smith Eating Spaghetti] (Seedance 2.0) (c.f. [https://www.youtube.com/watch?v=XQr4Xklqzw8 April 2023], [https://x.com/kimmonismus/status/1873568693357294014 December 2024], [https://x.com/_deepfates/status/1875215969452523785 January 2025], [https://x.com/blizaine/status/1897826177970028614 March 2025], [https://x.com/javilopen/status/1925495026903380358 May 2025], [https://x.com/elder_plinius/status/1936145834585862225 June 2025], [https://x.com/minchoi/status/1976042197154963702 October 2025], [https://x.com/aisearchio/status/1978465562821898461 October 2025], [https://x.com/EHuanglu/status/1996649596119068687?s=20 December 2025], [https://www.reddit.com/r/aivideo/comments/1qi8zuv/25_years_difference_makes_you_wonder_where_ai/ January 2026], [https://x.com/SpecialSitsNews/status/2020583709741883666?s=20 progression to 2026])&lt;br /&gt;
** [https://x.com/thedorbrothers/status/2023460644905742577?s=20 To Be Continued] (3m, [https://x.com/thedorbrothers The Dor Brothers])&lt;br /&gt;
** [https://x.com/ivanka_humeniuk/status/2023711181978919034?s=20 Crow - Game of Thrones] (1m)&lt;br /&gt;
** [https://x.com/billyrestey/status/2024193251763507528?s=20 Reboot] (2m)&lt;br /&gt;
** [https://x.com/kenw_2/status/2024625510534283508?s=20 Late for work] (1.5m, MJ NBP Seedance 2.0)&lt;br /&gt;
** [https://x.com/heydin_ai/status/2024616890338079181?s=20 AI Man] (4.5m, MJ NBP Seedance 2.0)&lt;br /&gt;
** [https://x.com/maxescu/status/2024882372836250033?s=20 But AI Will Never Be Able To Do This] (3m, Seedance 2.0)&lt;br /&gt;
** [https://x.com/DiDi_OKK/status/2018784243753599093?s=20 Sign] (8m)&lt;br /&gt;
** [https://x.com/LTXStudio/status/2025994426309640291?s=20 Commercial for Nexus] (1m)&lt;br /&gt;
** [https://x.com/maxescu/status/2026007558159278477?s=20 Showcase] (9m, [https://x.com/maxescu Alex Patrascu])&lt;br /&gt;
** [https://x.com/EHuanglu/status/2025410944512192536?s=20 Painterly] (30s, [https://x.com/EHuanglu el.cine])&lt;br /&gt;
** [https://x.com/kellyeld/status/2025975677657440267?s=20 Imposter Syndrone] (2m, music video)&lt;br /&gt;
** [https://www.youtube.com/watch?v=nKnE2Wn1VNQ All Is Conscious] (3.5m)&lt;br /&gt;
** [https://x.com/CuriousRefuge/status/2026086576191934769?s=20 Emotional argument] (3m, Seedance 2.0)&lt;br /&gt;
** [https://x.com/jdkanani/status/2023781028368884031?s=20 Moonlight Veil] (10m)&lt;br /&gt;
&lt;br /&gt;
====March 2026====&lt;br /&gt;
* March 2026: Examples:&lt;br /&gt;
** [https://x.com/jacopo_reale/status/2029909372764041559 Looking for Bianca] (6m, Kling 3.0)&lt;br /&gt;
** [https://x.com/sumiturkude007/status/2030933543443193908?s=20 Gardener] (3m, Seedance 2.0)&lt;br /&gt;
** Micro-movie (Chinese): [https://x.com/yyyole/status/2029225419669684418?s=20 episode 1], [https://x.com/yyyole/status/2030850450464112675?s=20 episode 2]&lt;br /&gt;
** Live-action Evangelion: [https://x.com/NACHOS2D_/status/2032401289653461052?s=20 part 1] (4.5m), [https://x.com/NACHOS2D_/status/2032778868361203770?s=20 part 2] (3.5m), [https://x.com/NACHOS2D_/status/2033126071151837491?s=20 part 3] (2.5m)&lt;br /&gt;
** [https://x.com/lexx_aura/status/2033589846216741293?s=20 to love Wu Yong] (5m)&lt;/div&gt;</summary>
		<author><name>KevinYager</name></author>
		
	</entry>
	<entry>
		<id>http://gisaxs.com/index.php?title=AI_and_Humans&amp;diff=8732</id>
		<title>AI and Humans</title>
		<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php?title=AI_and_Humans&amp;diff=8732"/>
		<updated>2026-03-16T18:46:56Z</updated>

		<summary type="html">&lt;p&gt;KevinYager: /* Science */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
=AI in Education=&lt;br /&gt;
==Survey/study of==&lt;br /&gt;
* 2023-08: [https://www.nature.com/articles/s41598-023-38964-3 Perception, performance, and detectability of conversational artificial intelligence across 32 university courses]&lt;br /&gt;
* 2023-10: [https://www.bbc.com/worklife/article/20231017-the-employees-secretly-using-ai-at-work Employees] secretly using AI at work.&lt;br /&gt;
* 2023-10: [https://www.insidehighered.com/news/tech-innovation/artificial-intelligence/2023/10/31/most-students-outrunning-faculty-ai-use?utm_source=Inside+Higher+Ed&amp;amp;utm_campaign=23419446b9-DNU_2021_COPY_02&amp;amp;utm_medium=email&amp;amp;utm_term=0_1fcbc04421-23419446b9-236889242&amp;amp;mc_cid=23419446b9&amp;amp;mc_eid=dae49d931a Survey] shows students using AI more than professors.&lt;br /&gt;
* 2023-11: [https://www.nature.com/articles/d41586-023-03507-3 ChatGPT has entered the classroom: how LLMs could transform education]&lt;br /&gt;
* 2025-04: [https://www.anthropic.com/news/anthropic-education-report-how-university-students-use-claude Anthropic Education Report: How University Students Use Claude]&lt;br /&gt;
* 2025-05: [https://www.nature.com/articles/s41599-025-04787-y The effect of ChatGPT on students’ learning performance, learning perception, and higher-order thinking: insights from a meta-analysis]&lt;br /&gt;
&lt;br /&gt;
==AI improves learning/education==&lt;br /&gt;
*  Mollick, Ethan R. and Mollick, Lilach and Bach, Natalie and Ciccarelli, LJ and Przystanski, Ben and Ravipinto, Daniel, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4871171 AI Agents and Education: Simulated Practice at Scale] (June 17, 2024). The Wharton School Research Paper. [http://dx.doi.org/10.2139/ssrn.4871171 doi: 10.2139/ssrn.4871171]&lt;br /&gt;
** Can enable personalized education.&lt;br /&gt;
* [https://arxiv.org/abs/2306.17156 Generative AI for Programming Education: Benchmarking ChatGPT, GPT-4, and Human Tutors]&lt;br /&gt;
** GPT4 can out-perform human tutors.&lt;br /&gt;
*  Keppler, Samantha and Sinchaisri, Wichinpong and Snyder, Clare, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4924786 Backwards Planning with Generative AI: Case Study Evidence from US K12 Teachers] (August 13, 2024). [http://dx.doi.org/10.2139/ssrn.4924786 doi: 10.2139/ssrn.4924786]&lt;br /&gt;
** Teachers benefit from using AI as a co-pilot to aid in tasks (planning, how to teach topic, explore ideas).&lt;br /&gt;
** There is smaller utility in using AI purely as a text-generator (to make quizzes, workbooks, etc.).&lt;br /&gt;
* [https://arxiv.org/abs/2402.09809 Effective and Scalable Math Support: Evidence on the Impact of an AI Tutor on Math Achievement in Ghana]&lt;br /&gt;
* [https://doi.org/10.21203/rs.3.rs-4243877/v1 AI Tutoring Outperforms Active Learning]&lt;br /&gt;
* [https://documents.worldbank.org/en/publication/documents-reports/documentdetail/099548105192529324 From chalkboards to chatbots: Transforming learning in Nigeria, one prompt at a time] ([https://blogs.worldbank.org/en/education/From-chalkboards-to-chatbots-Transforming-learning-in-Nigeria writeup])&lt;br /&gt;
** 6 weeks of after-school AI tutoring = 2 years of typical learning gains&lt;br /&gt;
** outperforms 80% of other educational interventions&lt;br /&gt;
* [https://arxiv.org/abs/2409.09047 AI Meets the Classroom: When Do Large Language Models Harm Learning?]&lt;br /&gt;
** Outcomes depend on usage&lt;br /&gt;
* [https://www.deeplearning.ai/the-batch/gpt-4-boosts-remote-tutors-performance-in-real-time-study-finds/ LLM Support for Tutors GPT-4 boosts remote tutors’ performance in real time, study finds]&lt;br /&gt;
** [https://arxiv.org/abs/2410.03017 Tutor CoPilot: A Human-AI Approach for Scaling Real-Time Expertise]&lt;br /&gt;
* 2025-06: Gallup &amp;amp; The Walton Foundation: [https://www.gallup.com/file/analytics/691922/Walton-Family-Foundation-Gallup-Teachers-AI-Report.pdf Teaching for Tomorrow: Unlocking Six Weeks a Year With AI]&lt;br /&gt;
&lt;br /&gt;
==AI harms learning==&lt;br /&gt;
* [https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0305354 A real-world test of artificial intelligence infiltration of a university examinations system: A “Turing Test” case study]&lt;br /&gt;
** Current grading systems cannot detect AI.&lt;br /&gt;
*  Bastani, Hamsa and Bastani, Osbert and Sungu, Alp and Ge, Haosen and Kabakcı, Özge and Mariman, Rei, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4895486 Generative AI Can Harm Learning] (July 15, 2024). The Wharton School Research Paper. [http://dx.doi.org/10.2139/ssrn.4895486 doi: 10.2139/ssrn.4895486]&lt;br /&gt;
** Access to ChatGPT harmed math education outcomes.&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.09047 AI Meets the Classroom: When Does ChatGPT Harm Learning?]&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.20245 How AI Impacts Skill Formation]&lt;br /&gt;
&lt;br /&gt;
==Software/systems==&lt;br /&gt;
* [https://devpost.com/software/gptutor GPTutor] ([https://github.com/mynamegabe/GPTutor code])&lt;br /&gt;
* [https://arxiv.org/abs/2308.02773 EduChat: A Large-Scale Language Model-based Chatbot System for Intelligent Education]&lt;br /&gt;
* [https://eurekalabs.ai/ Eureka Labs] (founded by [https://en.wikipedia.org/wiki/Andrej_Karpathy Andrej Karpathy]) aims to create AI-driven courses (first course is [https://github.com/karpathy/LLM101n Intro to LLMs])&lt;br /&gt;
&lt;br /&gt;
===LLMs===&lt;br /&gt;
* 2024-12: [https://www.arxiv.org/abs/2412.16429 LearnLM: Improving Gemini for Learning]&lt;br /&gt;
&lt;br /&gt;
===Individual tools===&lt;br /&gt;
* Chatbot (OpenAI [https://chatgpt.com/ ChatGPT], Anthropic [https://www.anthropic.com/claude Claude], Google [https://gemini.google.com/app Gemini])&lt;br /&gt;
* [https://notebooklm.google.com/ NotebookLM]: Enables one to &amp;quot;chat with documents&amp;quot;.&lt;br /&gt;
* Google [https://learning.google.com/experiments/learn-about/signup Learn About]&lt;br /&gt;
&lt;br /&gt;
===Systems===&lt;br /&gt;
* [https://www.anthropic.com/news/introducing-claude-for-education Anthropic] [https://www.anthropic.com/education Claude for Education]&lt;br /&gt;
&lt;br /&gt;
==AI for grading==&lt;br /&gt;
* [https://dl.acm.org/doi/10.1145/3657604.3664693 Can Large Language Models Make the Grade? An Empirical Study Evaluating LLMs Ability To Mark Short Answer Questions in K-12 Education] ([https://arxiv.org/abs/2405.02985 preprint])&lt;br /&gt;
&lt;br /&gt;
==Detection==&lt;br /&gt;
* 2024-06: [https://www.sciencedirect.com/science/article/pii/S2666920X24000109 Do teachers spot AI? Evaluating the detectability of AI-generated texts among student essays]&lt;br /&gt;
** GenAI can simulate student writing in a way that teachers cannot detect.&lt;br /&gt;
** AI essays are assessed more positively than student-written.&lt;br /&gt;
** Teachers are overconfident in their source identification.&lt;br /&gt;
** Both novice and experienced teachers could not identify texts generated by ChatGPT vs. students&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.15654 People who frequently use ChatGPT for writing tasks are accurate and robust detectors of AI-generated text]&lt;br /&gt;
===AI Text Detectors Don&amp;#039;t Work===&lt;br /&gt;
* 2024-05: [https://arxiv.org/abs/2405.07940 RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors]&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2306.15666 Testing of Detection Tools for AI-Generated Text]&lt;br /&gt;
&lt;br /&gt;
=AI/human=&lt;br /&gt;
==Capabilities==&lt;br /&gt;
===Writing===&lt;br /&gt;
&lt;br /&gt;
* 2022-12: [https://aclanthology.org/2022.emnlp-main.296/ Re3: Generating Longer Stories With Recursive Reprompting and Revision]&lt;br /&gt;
* 2023-03: English essays: [https://journal.unnes.ac.id/sju/index.php/elt/article/view/64069 Artificial intelligence (AI) technology in OpenAI ChatGPT application: A review of ChatGPT in writing English essay]&lt;br /&gt;
* 2023-01: Journalism: [https://journals.sagepub.com/doi/10.1177/10776958221149577 Collaborating With ChatGPT: Considering the Implications of Generative Artificial Intelligence for Journalism and Media Education]&lt;br /&gt;
* 2023-07: Science writing: [https://www.rbmojournal.com/article/S1472-6483(23)00219-5/fulltext Artificial intelligence in scientific writing: a friend or a foe?]&lt;br /&gt;
* 2024-02: Wikipedia style: [https://arxiv.org/abs/2402.14207 Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models]&lt;br /&gt;
* 2024-08: [https://arxiv.org/abs/2408.07055 LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs] ([https://github.com/THUDM/LongWriter code])&lt;br /&gt;
* 2024-08: Scientific papers: [https://arxiv.org/abs/2408.06292 The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery]&lt;br /&gt;
* 2024-09: PaperQA2: [https://paper.wikicrow.ai/ Language Models Achieve Superhuman Synthesis of Scientific Knowledge] ([https://x.com/SGRodriques/status/1833908643856818443 𝕏 post], [https://github.com/Future-House/paper-qa code])&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.19065 WikiAutoGen: Towards Multi-Modal Wikipedia-Style Article Generation]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.22828 Learning to Reason for Long-Form Story Generation]&lt;br /&gt;
&lt;br /&gt;
==AI out-performs humans==&lt;br /&gt;
===Tests===&lt;br /&gt;
* 2023-07: [https://arxiv.org/abs/2307.10635 SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models]&lt;br /&gt;
* 2024-06: [https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0305354 A real-world test of artificial intelligence infiltration of a university examinations system: A “Turing Test” case study] &lt;br /&gt;
** AI scores higher than median students.&lt;br /&gt;
&lt;br /&gt;
===Creativity===&lt;br /&gt;
* See also: [[AI creativity]]&lt;br /&gt;
* 2023-07: [https://mackinstitute.wharton.upenn.edu/wp-content/uploads/2023/08/LLM-Ideas-Working-Paper.pdf Ideas Are Dimes A Dozen: Large Language Models For Idea Generation In Innovation]&lt;br /&gt;
* 2023-09: [https://www.nature.com/articles/s41598-023-40858-3 Best humans still outperform artificial intelligence in a creative divergent thinking task]&lt;br /&gt;
** Best humans out-perform AI at creativity. (By implication, median humans may not.)&lt;br /&gt;
* 2024-02: [https://www.nature.com/articles/s41598-024-53303-w The current state of artificial intelligence generative language models is more creative than humans on divergent thinking tasks]&lt;br /&gt;
* 2024-02: Felin, Teppo and Holweg, Matthias, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4737265 Theory Is All You Need: AI, Human Cognition, and Causal Reasoning] (February 24, 2024). [http://dx.doi.org/10.2139/ssrn.4737265 doi: 10.2139/ssrn.4737265]&lt;br /&gt;
** Argues that human &amp;quot;theory-based&amp;quot; creativity is better than AI &amp;quot;data-based&amp;quot;.&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.01119 Pron vs Prompt: Can Large Language Models already Challenge a World-Class Fiction Author at Creative Text Writing?]&lt;br /&gt;
** Top human (professional author) out-performs GPT4.&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.04109 Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers]&lt;br /&gt;
** LLM-generated ideas were rated as more novel than those of human experts&lt;br /&gt;
* 2024-09: [https://docs.iza.org/dp17302.pdf Creative and Strategic Capabilities of Generative AI: Evidence from Large-Scale Experiments]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.00794 Predicting Empirical AI Research Outcomes with Language Models]&lt;br /&gt;
&lt;br /&gt;
===Art===&lt;br /&gt;
* 2024-11: [https://doi.org/10.1038/s41598-024-76900-1 AI-generated poetry is indistinguishable from human-written poetry and is rated more favorably]&lt;br /&gt;
* 2024-11: [https://www.astralcodexten.com/p/how-did-you-do-on-the-ai-art-turing How Did You Do On The AI Art Turing Test?]&lt;br /&gt;
&lt;br /&gt;
===Business &amp;amp; Marketing===&lt;br /&gt;
* 2023-11: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4597899 The power of generative marketing: Can generative AI create superhuman visual marketing content?]&lt;br /&gt;
* 2024-02: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4714776 Generative Artificial Intelligence and Evaluating Strategic Decisions]&lt;br /&gt;
&lt;br /&gt;
===Professions===&lt;br /&gt;
* [https://agi.safe.ai/submit Humanity&amp;#039;s Last Exam]&lt;br /&gt;
** [https://x.com/alexandr_wang/status/1835738937719140440 Effort to build] a dataset of challenging (but resolvable) questions in specific domain areas, to act as a benchmark to test whether AIs are improving in these challenging topics.&lt;br /&gt;
&lt;br /&gt;
====Coding====&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.06807 Competitive Programming with Large Reasoning Models]&lt;br /&gt;
&lt;br /&gt;
====Medical====&lt;br /&gt;
* 2024-03: [https://www.medrxiv.org/content/10.1101/2024.03.12.24303785v1 Influence of a Large Language Model on Diagnostic Reasoning: A Randomized Clinical Vignette Study]&lt;br /&gt;
** GPT4 improves medical practitioner work; surprisingly, GPT4 alone scored better than a human with GPT4 as aid (on selected tasks).&lt;br /&gt;
* 2024-10: [https://doi.org/10.1001/jamanetworkopen.2024.38535 Perspectives on Artificial Intelligence–Generated Responses to Patient Messages]&lt;br /&gt;
* 2024-10: [https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2825395 Large Language Model Influence on Diagnostic Reasoning: A Randomized Clinical Trial]&lt;br /&gt;
** Use of ChatGPT did not strongly improve medical experts' performance; AI alone out-scored both human and human+AI.&lt;br /&gt;
* 2024-11: [https://www.nature.com/articles/s41562-024-02046-9 Large language models surpass human experts in predicting neuroscience results] (writeup: [https://medicalxpress.com/news/2024-11-ai-neuroscience-results-human-experts.html AI can predict neuroscience study results better than human experts, study finds])&lt;br /&gt;
* 2024-12: [https://www.arxiv.org/abs/2412.10849 Superhuman performance of a large language model on the reasoning tasks of a physician]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.18925 HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs]&lt;br /&gt;
* 2025-02: Media:&lt;br /&gt;
** NY Times: [https://www.nytimes.com/2025/02/02/opinion/ai-doctors-medicine.html The Robot Doctor Will See You Now]&lt;br /&gt;
** [https://erictopol.substack.com/p/when-doctors-with-ai-are-outperformed When Doctors With A.I. Are Outperformed by A.I. Alone]&lt;br /&gt;
* 2025-02: [https://www.nature.com/articles/s41591-024-03456-y GPT-4 assistance for improvement of physician performance on patient care tasks: a randomized controlled trial]&lt;br /&gt;
* 2025-02: [https://www.nature.com/articles/s41591-025-03517-w Artificial intelligence for individualized treatment of persistent atrial fibrillation: a randomized controlled trial]&lt;br /&gt;
* Google AI Clinician:&lt;br /&gt;
** 2024-01: [https://arxiv.org/abs/2401.05654 Towards Conversational Diagnostic AI] ([https://research.google/blog/amie-a-research-ai-system-for-diagnostic-medical-reasoning-and-conversations/ blog]: Articulate Medical Intelligence Explorer, AMIE)&lt;br /&gt;
** 2025-03: [https://www.gstatic.com/amie/towards_conversational_ai_for_disease_management.pdf Towards Conversational AI for Disease Management] ([https://research.google/blog/from-diagnosis-to-treatment-advancing-amie-for-longitudinal-disease-management/ blog])&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.19655 Med-RLVR: Emerging Medical Reasoning from a 3B base model via reinforcement Learning]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.13939 Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models]&lt;br /&gt;
* 2025-04: [https://www.acpjournals.org/doi/10.7326/ANNALS-24-03283 Comparison of Initial Artificial Intelligence (AI) and Final Physician Recommendations in AI-Assisted Virtual Urgent Care Visits]&lt;br /&gt;
* 2025-04: [https://www.nature.com/articles/s41586-025-08866-7?linkId=13898052 Towards conversational diagnostic artificial intelligence]&lt;br /&gt;
* 2025-04: [https://www.nature.com/articles/s41586-025-08869-4?linkId=13898054 Towards accurate differential diagnosis with large language models]&lt;br /&gt;
* 2025-06: [https://www.medrxiv.org/content/10.1101/2025.06.13.25329541v1 Automation of Systematic Reviews with Large Language Models]&lt;br /&gt;
* 2025-06: [https://microsoft.ai/new/the-path-to-medical-superintelligence/ The Path to Medical Superintelligence]&lt;br /&gt;
* 2025-08: [https://www.nature.com/articles/s41591-025-03888-0?utm_source=chatgpt.com A personal health large language model for sleep and fitness coaching]&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.08224 Capabilities of GPT-5 on Multimodal Medical Reasoning]&lt;br /&gt;
&lt;br /&gt;
====Bio====&lt;br /&gt;
* 2025-04: [https://www.virologytest.ai/vct_paper.pdf Virology Capabilities Test (VCT): A Multimodal Virology Q&amp;amp;A Benchmark]&lt;br /&gt;
** Time: [https://time.com/7279010/ai-virus-lab-biohazard-study/ Exclusive: AI Outsmarts Virus Experts in the Lab, Raising Biohazard Fears]&lt;br /&gt;
** AI Frontiers: [https://www.ai-frontiers.org/articles/ais-are-disseminating-expert-level-virology-skills AIs Are Disseminating Expert-Level Virology Skills]&lt;br /&gt;
&lt;br /&gt;
====Therapy====&lt;br /&gt;
* 2025-02: [https://journals.plos.org/mentalhealth/article?id=10.1371/journal.pmen.0000145 When ELIZA meets therapists: A Turing test for the heart and mind]&lt;br /&gt;
* 2025-03: Therabot: [https://ai.nejm.org/doi/full/10.1056/AIoa2400802 Randomized Trial of a Generative AI Chatbot for Mental Health Treatment]&lt;br /&gt;
&lt;br /&gt;
====Financial====&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.17866 Financial Statement Analysis with Large Language Models]&lt;br /&gt;
&lt;br /&gt;
====HR====&lt;br /&gt;
* 2025-08: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5395709 Voice AI in Firms: A Natural Field Experiment on Automated Job Interviews]&lt;br /&gt;
&lt;br /&gt;
==AI improves human work==&lt;br /&gt;
* 2023-07: [https://www.science.org/doi/10.1126/science.adh2586 Experimental evidence on the productivity effects of generative artificial intelligence]&lt;br /&gt;
* 2023-09:  Dell&amp;#039;Acqua, Fabrizio and McFowland III, Edward and Mollick, Ethan R. and Lifshitz-Assaf, Hila and Kellogg, Katherine and Rajendran, Saran and Krayer, Lisa and Candelon, François and Lakhani, Karim R., [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4573321 Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality] (September 15, 2023). Harvard Business School Technology &amp;amp; Operations Mgt. Unit Working Paper No. 24-013, The Wharton School Research Paper [http://dx.doi.org/10.2139/ssrn.4573321 doi: 10.2139/ssrn.4573321]&lt;br /&gt;
* 2023-11: [https://www.nber.org/papers/w31161 Generative AI at Work] (National Bureau of Economic Research)&lt;br /&gt;
* 2023-12: [https://osf.io/hdjpk The Uneven Impact of Generative AI on Entrepreneurial Performance] ([https://doi.org/10.31219/osf.io/hdjpk doi: 10.31219/osf.io/hdjpk])&lt;br /&gt;
* 2023-12: [https://arxiv.org/abs/2312.05481 Artificial Intelligence in the Knowledge Economy]: Non-autonomous AI (chatbot) benefits least knowledgeable workers; autonomous agents benefit the most knowledgeable workers&lt;br /&gt;
* 2024-07: [https://www.microsoft.com/en-us/research/publication/generative-ai-in-real-world-workplaces/ Generative AI in Real-World Workplaces: The Second Microsoft Report on AI and Productivity Research]&lt;br /&gt;
* 2025-03: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5188231 The Cybernetic Teammate: A Field Experiment on Generative AI Reshaping Teamwork and Expertise]&lt;br /&gt;
** 2025-03: Ethan Mollick: [https://www.oneusefulthing.org/p/the-cybernetic-teammate The Cybernetic Teammate]: Having an AI on your team can increase performance, provide expertise, and improve your experience&lt;br /&gt;
* 2025-09: [https://osf.io/preprints/psyarxiv/vbkmt_v1 Quantifying Human-AI Synergy]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.12049 Generative AI and Firm Productivity: Field Experiments in Online Retail]&lt;br /&gt;
* 2025-10: Wharton: [https://knowledge.wharton.upenn.edu/special-report/2025-ai-adoption-report/ 2025 AI Adoption Report] (75% report positive ROI)&lt;br /&gt;
&lt;br /&gt;
===Coding===&lt;br /&gt;
* 2023-02: [https://arxiv.org/abs/2302.06590 The Impact of AI on Developer Productivity: Evidence from GitHub Copilot]&lt;br /&gt;
* 2024-09:  Cui, Zheyuan and Demirer, Mert and Jaffe, Sonia and Musolff, Leon and Peng, Sida and Salz, Tobias, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4945566 The Effects of Generative AI on High Skilled Work: Evidence from Three Field Experiments with Software Developers] (September 03, 2024). [http://dx.doi.org/10.2139/ssrn.4945566 doi: 10.2139/ssrn.4945566]&lt;br /&gt;
* 2024-11:  Hoffmann, Manuel and Boysel, Sam and Nagle, Frank and Peng, Sida and Xu, Kevin, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5007084 Generative AI and the Nature of Work] (October 27, 2024). Harvard Business School Strategy Unit Working Paper No. 25-021, Harvard Business Working Paper No. 25-021, [http://dx.doi.org/10.2139/ssrn.5007084 doi: 10.2139/ssrn.5007084]&lt;br /&gt;
* 2025-07: METR: [https://arxiv.org/abs/2507.09089 Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity] (AI tools led to lower performance)&lt;br /&gt;
** 2026-02: [https://metr.org/blog/2026-02-24-uplift-update/ We are Changing our Developer Productivity Experiment Design] (AI tools now lead to improved performance)&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.19708 Intuition to Evidence: Measuring AI&amp;#039;s True Impact on Developer Productivity]&lt;br /&gt;
&lt;br /&gt;
===Forecasting===&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.07862 AI-Augmented Predictions: LLM Assistants Improve Human Forecasting Accuracy]&lt;br /&gt;
&lt;br /&gt;
===Finance===&lt;br /&gt;
* 2024-12: [https://dx.doi.org/10.2139/ssrn.5075727 AI, Investment Decisions, and Inequality]: Novices see improvements in investment performance, sophisticated investors see even greater improvements.&lt;br /&gt;
&lt;br /&gt;
===Law===&lt;br /&gt;
* 2025-03: [https://ssrn.com/abstract=5162111 AI-Powered Lawyering: AI Reasoning Models, Retrieval Augmented Generation, and the Future of Legal Practice]&lt;br /&gt;
&lt;br /&gt;
===Science===&lt;br /&gt;
* 2025-12: [https://www.science.org/doi/abs/10.1126/science.adw3000 Scientific production in the era of large language models]&lt;br /&gt;
* 2026-01: [https://www.nature.com/articles/s41586-025-09922-y Artificial intelligence tools expand scientists’ impact but contract science’s focus]&lt;br /&gt;
* 2026-01: [https://www.anthropic.com/news/accelerating-scientific-research How scientists are using Claude to accelerate research and discovery]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.03837 Accelerating Scientific Research with Gemini: Case Studies and Common Techniques]&lt;br /&gt;
* 2026-03: [https://www.pnas.org/doi/10.1073/pnas.2533676123 Expert evaluation of LLM world models: A high-Tc superconductivity case study] ([https://research.google/blog/testing-llms-on-superconductivity-research-questions/?utm_source=twitter&amp;amp;utm_medium=social&amp;amp;utm_campaign=social_post&amp;amp;utm_content=gr-acct blog])&lt;br /&gt;
&lt;br /&gt;
===Medical===&lt;br /&gt;
* 2025-03: [https://www.medrxiv.org/content/10.1101/2025.02.28.25323115v1.full Medical Hallucination in Foundation Models and Their Impact on Healthcare]&lt;br /&gt;
* 2025-03: [https://journals.lww.com/international-journal-of-surgery/fulltext/2025/03000/chatgpt_s_role_in_alleviating_anxiety_in_total.20.aspx ChatGPT’s role in alleviating anxiety in total knee arthroplasty consent process: a randomized controlled trial pilot study]&lt;br /&gt;
* 2025-05: [https://openai.com/index/healthbench/ Introducing HealthBench]&lt;br /&gt;
* 2025-06: [https://www.medrxiv.org/content/10.1101/2025.06.07.25329176v1 From Tool to Teammate: A Randomized Controlled Trial of Clinician-AI Collaborative Workflows for Diagnosis]&lt;br /&gt;
* 2025-06: [https://bmcmededuc.biomedcentral.com/articles/10.1186/s12909-025-07414-1 Iteratively refined ChatGPT outperforms clinical mentors in generating high-quality interprofessional education clinical scenarios: a comparative study]&lt;br /&gt;
* 2025-07: [https://cdn.openai.com/pdf/a794887b-5a77-4207-bb62-e52c900463f1/penda_paper.pdf AI-based Clinical Decision Support for Primary Care: A Real-World Study] ([https://openai.com/index/ai-clinical-copilot-penda-health/ blog])&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.15743 Towards physician-centered oversight of conversational diagnostic AI]&lt;br /&gt;
* 2026-01: [https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(25)02464-X/abstract Interval cancer, sensitivity, and specificity comparing AI-supported mammography screening with standard double reading without AI in the MASAI study: a randomised, controlled, non-inferiority, single-blinded, population-based, screening-accuracy trial]&lt;br /&gt;
&lt;br /&gt;
===Translation===&lt;br /&gt;
* 2025-01: [https://simonwillison.net/2025/Feb/2/workflow-for-translation/ A professional workflow for translation using LLMs] ([https://news.ycombinator.com/item?id=42897856 based on this])&lt;br /&gt;
&lt;br /&gt;
===Customer service===&lt;br /&gt;
* 2023-11: [https://www.nber.org/papers/w31161 Generative AI at Work]: Improvements for workers and clients (though also a ceiling to improvement)&lt;br /&gt;
&lt;br /&gt;
===Creativity===&lt;br /&gt;
* See also: [[AI creativity]]&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.01727 Prompting Diverse Ideas: Increasing AI Idea Variance]&lt;br /&gt;
* 2024-07: [https://www.science.org/doi/10.1126/sciadv.adn5290 Generative AI enhances individual creativity but reduces the collective diversity of novel content]&lt;br /&gt;
* 2024-08: [https://www.nature.com/articles/s41562-024-01953-1 An empirical investigation of the impact of ChatGPT on creativity]&lt;br /&gt;
** 2024-08: Response: [https://www.nature.com/articles/s41562-025-02173-x ChatGPT decreases idea diversity in brainstorming] ([https://www.nature.com/articles/s41562-025-02173-x.epdf?sharing_token=LA9NyDHj7y5WN8zvb5Qm49RgN0jAjWel9jnR3ZoTv0Nl8PrpXFkjZ93XvmUVBgB9Hlfro5Yo6YELr-pRqbpk3HaZENCvsfV8G1kwtTEj2oW1g87dSVT4BzrfCu3jS_606SLzmoDuDiALChY-MozVM4Pj1b4Vdf-YaIH5p3lfAnM%3D pdf])&lt;br /&gt;
** 2025-05: Response: [https://www.nature.com/articles/s41562-025-02195-5 Reply to: ChatGPT decreases idea diversity in brainstorming]&lt;br /&gt;
* 2024-08: [https://doi.org/10.1287/orsc.2023.18430 The Crowdless Future? Generative AI and Creative Problem-Solving]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.03703 Human Creativity in the Age of LLMs]&lt;br /&gt;
* 2024-11: &amp;lt;strike&amp;gt;[https://conference.nber.org/conf_papers/f210475.pdf Artificial Intelligence, Scientific Discovery, and Product Innovation]&amp;lt;/strike&amp;gt;: diffusion model increases &amp;quot;innovation&amp;quot; (patents), boosts the best performers, but also removes some enjoyable tasks.&lt;br /&gt;
** 2025-05: Retraction: [https://economics.mit.edu/news/assuring-accurate-research-record Assuring an accurate research record]&lt;br /&gt;
* 2024-12: [https://doi.org/10.1080/10400419.2024.2440691 Using AI to Generate Visual Art: Do Individual Differences in Creativity Predict AI-Assisted Art Quality?] ([https://osf.io/preprints/psyarxiv/ygzw6 preprint]): shows that more creative humans produce more creative genAI outputs&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.11433 One Does Not Simply Meme Alone: Evaluating Co-Creativity Between LLMs and Humans in the Generation of Humor]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.17241 Generative AI and Creativity: A Systematic Literature Review and Meta-Analysis]&lt;br /&gt;
&lt;br /&gt;
===Equity===&lt;br /&gt;
* 2025-01: [https://ai.nejm.org/doi/full/10.1056/AIp2400889 Using Large Language Models to Promote Health Equity]&lt;br /&gt;
&lt;br /&gt;
==AI worse than humans==&lt;br /&gt;
* 2025-04: [https://spinup-000d1a-wp-offload-media.s3.amazonaws.com/faculty/wp-content/uploads/sites/27/2025/03/AI-debt-collection-20250331.pdf How Good is AI at Twisting Arms? Experiments in Debt Collection]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.18919 Clinical knowledge in LLMs does not translate to human interactions]&lt;br /&gt;
* 2025-05: [https://royalsocietypublishing.org/doi/10.1098/rsos.241776 Generalization bias in large language model summarization of scientific research]&lt;br /&gt;
&lt;br /&gt;
==AI lowers human capability==&lt;br /&gt;
* 2025-07: METR: [https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity] ([https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/ blog], [https://secondthoughts.ai/p/ai-coding-slowdown commentary/analysis])&lt;br /&gt;
** 2026-02: [https://metr.org/blog/2026-02-24-uplift-update/ We are Changing our Developer Productivity Experiment Design] (AI tools [https://x.com/METR_Evals/status/2026355544668385373?s=20 now] lead to improved performance)&lt;br /&gt;
* 2026-02: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6097646 Thinking—Fast, Slow, and Artificial: How AI is Reshaping Human Reasoning and the Rise of Cognitive Surrender]&lt;br /&gt;
&lt;br /&gt;
==Human Perceptions of AI==&lt;br /&gt;
* 2023-09: [https://www.nature.com/articles/d41586-023-02980-0 AI and science: what 1,600 researchers think. A Nature survey finds that scientists are concerned, as well as excited, by the increasing use of artificial-intelligence tools in research.]&lt;br /&gt;
* 2024-11: [https://doi.org/10.1016/S2589-7500(24)00202-4 Attitudes and perceptions of medical researchers towards the use of artificial intelligence chatbots in the scientific process: an international cross-sectional survey] (Nature commentary: [https://www.nature.com/articles/s41592-024-02369-5 Quest for AI literacy])&lt;br /&gt;
* 2025-03: [https://www.arxiv.org/abs/2503.16458 Users Favor LLM-Generated Content -- Until They Know It&amp;#039;s AI]&lt;br /&gt;
&lt;br /&gt;
===AI passes Turing Test===&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Text Dialog&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* 2023-05: [https://arxiv.org/abs/2305.20010 Human or Not? A Gamified Approach to the Turing Test]&lt;br /&gt;
* 2023-10: [https://arxiv.org/abs/2310.20216 Does GPT-4 pass the Turing test?]&lt;br /&gt;
* 2024-05: [https://arxiv.org/abs/2405.08007 People cannot distinguish GPT-4 from a human in a Turing test]&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.08853 GPT-4 is judged more human than humans in displaced and inverted Turing tests]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.23674 Large Language Models Pass the Turing Test]&lt;br /&gt;
* 2025-04: [https://www.sciencedirect.com/science/article/abs/pii/S0022103117303980 A Minimal Turing Test]&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Art&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* 2024-11: [https://www.astralcodexten.com/p/how-did-you-do-on-the-ai-art-turing How Did You Do On The AI Art Turing Test?] Differentiation was only slightly above random (60%). AI art was often ranked higher than human-made.&lt;br /&gt;
* 2024-11: [https://doi.org/10.1038/s41598-024-76900-1 AI-generated poetry is indistinguishable from human-written poetry and is rated more favorably]&lt;br /&gt;
* 2025-09: [https://arxiv.org/abs/2509.25601 Echoes of Humanity: Exploring the Perceived Humanness of AI Music]&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Imagery&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* 2026-02: [https://bpspsychub.onlinelibrary.wiley.com/doi/10.1111/bjop.70063 Too good to be true: Synthetic AI faces are more average than real faces and super-recognizers know it]&lt;br /&gt;
** [https://www.unsw.edu.au/newsroom/news/2026/02/humans-overconfident-telling-AI-faces-real-faces-people-fake People are overconfident about spotting AI faces, study finds]&lt;br /&gt;
&lt;br /&gt;
=Uptake=&lt;br /&gt;
* 2023-07: [https://doi.org/10.9734/ajrcos/2023/v16i4392 ChatGPT: Early Adopters, Teething Issues and the Way Forward]&lt;br /&gt;
* 2024-03: [https://arxiv.org/abs/2403.07183 Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews]&lt;br /&gt;
* 2024-05:  Humlum, Anders and Vestergaard, Emilie, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4827166 The Adoption of ChatGPT]. IZA Discussion Paper No. 16992 [http://dx.doi.org/10.2139/ssrn.4827166 doi: 10.2139/ssrn.4827166]&lt;br /&gt;
* 2024-06: Kellogg, Katherine and Lifshitz-Assaf, Hila and Randazzo, Steven and Mollick, Ethan R. and Dell&amp;#039;Acqua, Fabrizio and McFowland III, Edward and Candelon, Francois and Lakhani, Karim R., [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4857373 Don&amp;#039;t Expect Juniors to Teach Senior Professionals to Use Generative AI: Emerging Technology Risks and Novice AI Risk Mitigation Tactics] (June 03, 2024). Harvard Business School Technology &amp;amp; Operations Mgt. Unit Working Paper 24-074, Harvard Business Working Paper No. 24-074, The Wharton School Research Paper [http://dx.doi.org/10.2139/ssrn.4857373 doi: 10.2139/ssrn.4857373]&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.07016 Delving into ChatGPT usage in academic writing through excess vocabulary]&lt;br /&gt;
* 2024-09: [https://static1.squarespace.com/static/60832ecef615231cedd30911/t/66f0c3fbabdc0a173e1e697e/1727054844024/BBD_GenAI_NBER_Sept2024.pdf The Rapid Adoption of Generative AI]&lt;br /&gt;
* 2024-10: [https://ai.wharton.upenn.edu/focus-areas/human-technology-interaction/2024-ai-adoption-report/ Growing Up: Navigating Generative AI’s Early Years – AI Adoption Report] ([https://ai.wharton.upenn.edu/wp-content/uploads/2024/10/AI-Report_Executive-Summary.pdf executive summary], [https://ai.wharton.upenn.edu/wp-content/uploads/2024/10/AI-Report_Full-Report.pdf full report])&lt;br /&gt;
** 72% of leaders use genAI at least once a week (cf. 23% in 2023); 90% agree AI enhances skills (cf. 80% in 2023)&lt;br /&gt;
** Spending on genAI is up 130% (most companies plan to invest going forward)&lt;br /&gt;
* 2024-12: [https://www.pnas.org/doi/10.1073/pnas.2414972121 The unequal adoption of ChatGPT exacerbates existing inequalities among workers]&lt;br /&gt;
** Higher adoption among young and less experienced&lt;br /&gt;
** Lower adoption among women and lower-earning workers&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.09747 The Widespread Adoption of Large Language Model-Assisted Writing Across Society]: 10-25% adoption across a range of contexts&lt;br /&gt;
* 2025-02: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5078805 Local Heterogeneity in Artificial Intelligence Jobs Over Time and Space]&lt;br /&gt;
* 2025-04: [https://andreyfradkin.com/assets/demandforllm.pdf Demand for LLMs: Descriptive Evidence on Substitution, Market Expansion, and Multihoming]&lt;br /&gt;
* 2025-05: [https://civicscience.com/chatgpt-is-still-leading-the-ai-wars-but-google-gemini-is-gaining-ground/ ChatGPT Is Still Leading the AI Wars but Google Gemini Is Gaining Ground]&lt;br /&gt;
* 2025-05: [https://www.nber.org/papers/w33777 Large Language Models, Small Labor Market Effects]&lt;br /&gt;
** Significant uptake, but very little economic impact so far&lt;br /&gt;
* 2025-05: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5136877 The Labor Market Effects of Generative Artificial Intelligence]&lt;br /&gt;
** US worker usage of AI is increasing rapidly: 30% in 2024-12; 40% in 2025-05&lt;br /&gt;
* 2025-05: [https://www.bondcap.com/report/pdf/Trends_Artificial_Intelligence.pdf Trends – Artificial Intelligence]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.08945 Who is using AI to code? Global diffusion and impact of generative AI]&lt;br /&gt;
* 2025-06: [https://www.iconiqcapital.com/growth/reports/2025-state-of-ai 2025 State of AI Report: The Builder’s Playbook]: A Practical Roadmap for AI Innovation&lt;br /&gt;
* 2025-07: Epoch AI: [https://epochai.substack.com/p/after-the-chatgpt-moment-measuring After the ChatGPT Moment: Measuring AI’s Adoption] How quickly has AI been diffusing through the economy?&lt;br /&gt;
* 2025-07: Pew Research: [https://www.pewresearch.org/short-reads/2025/06/25/34-of-us-adults-have-used-chatgpt-about-double-the-share-in-2023/ 34% of U.S. adults have used ChatGPT, about double the share in 2023]&lt;br /&gt;
* 2025-12: Epoch AI: [https://epoch.ai/data/polling Polling on AI Usage]&lt;br /&gt;
&lt;br /&gt;
==Usage By==&lt;br /&gt;
* 2026-02: [https://www.nber.org/papers/w34813 The Politics of AI]&lt;br /&gt;
&lt;br /&gt;
==Usage For==&lt;br /&gt;
* 2024-12: [https://assets.anthropic.com/m/7e1ab885d1b24176/original/Clio-Privacy-Preserving-Insights-into-Real-World-AI-Use.pdf Clio: A system for privacy-preserving insights into real-world AI use] (Anthropic [https://www.anthropic.com/research/clio Clio])&lt;br /&gt;
* 2025-03: [https://learn.filtered.com/hubfs/The%202025%20Top-100%20Gen%20AI%20Use%20Case%20Report.pdf How People are Really Using Generative AI Now] ([https://hbr.org/2025/04/how-people-are-really-using-gen-ai-in-2025 writeup])&lt;br /&gt;
* 2025-04: [https://www.anthropic.com/news/anthropic-education-report-how-university-students-use-claude Anthropic Education Report: How University Students Use Claude]&lt;br /&gt;
* 2025-09: [https://www.anthropic.com/research/economic-index-geography Anthropic Economic Index: Tracking AI&amp;#039;s role in the US and global economy]&lt;br /&gt;
* 2025-09: [https://cdn.openai.com/pdf/a253471f-8260-40c6-a2cc-aa93fe9f142e/economic-research-chatgpt-usage-paper.pdf How People Use ChatGPT] (OpenAI)&lt;br /&gt;
&lt;br /&gt;
==Hiding Usage==&lt;br /&gt;
* 2025-05: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5232910 Underreporting of AI use: The role of social desirability bias]&lt;br /&gt;
&lt;br /&gt;
=Societal Effects/Transformations=&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.01754 Empirical evidence of Large Language Model&amp;#039;s influence on human spoken communication]&lt;br /&gt;
* 2025-09: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5425555 Generative AI as Seniority-Biased Technological Change: Evidence from U.S. Résumé and Job Posting Data]&lt;br /&gt;
&lt;br /&gt;
=Psychological Impact=&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.16628 The Impact of Artificial Intelligence on Human Thought]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.15352 People readily follow personal advice from AI but it does not improve their well-being]&lt;br /&gt;
&lt;br /&gt;
==Human Sentiment towards AI==&lt;br /&gt;
* 2025-04: Pew Research: [https://www.pewresearch.org/internet/2025/04/03/how-the-us-public-and-ai-experts-view-artificial-intelligence/ How the U.S. Public and AI Experts View Artificial Intelligence]&lt;br /&gt;
* 2025-10: Pew Research: [https://www.pewresearch.org/global/2025/10/15/how-people-around-the-world-view-ai/ How People Around the World View AI: More are concerned than excited about its use, and more trust their own country and the EU to regulate it than trust the U.S. or China]&lt;br /&gt;
* 2025-12: [https://www.edelman.com/sites/g/files/aatuss191/files/2025-11/2025%20Edelman%20Trust%20Barometer%20Flash%20Poll%20Trust%20and%20Artificial%20Intelligence%20at%20a%20Crossroads%201.pdf 2025 Edelman Trust Barometer]&lt;br /&gt;
* 2025-12: [https://navigatorresearch.org/views-of-ai-and-data-centers/ Polling - Views of AI and data centers]&lt;br /&gt;
* 2026-03: [https://osf.io/preprints/psyarxiv/5mwre_v9 The Moralization of Artificial Intelligence]&lt;br /&gt;
&lt;br /&gt;
==AI Persuasion of Humans==&lt;br /&gt;
(AI can update beliefs, change opinions, tackle conspiracy theories, etc.)&lt;br /&gt;
* 2022-11: [https://arxiv.org/abs/2211.15006 Fine-tuning language models to find agreement among humans with diverse preferences]&lt;br /&gt;
* 2024-04: [https://osf.io/preprints/psyarxiv/h7n8u_v1 Just the facts: How dialogues with AI reduce conspiracy beliefs]&lt;br /&gt;
* 2024-08: [https://arxiv.org/abs/2408.04681 Conversational AI Powered by Large Language Models Amplifies False Memories in Witness Interviews]&lt;br /&gt;
* 2024-09: [https://www.science.org/doi/10.1126/science.adq1814 Durably reducing conspiracy beliefs through dialogues with AI]&lt;br /&gt;
* 2025-03: [https://www.pnas.org/doi/10.1073/pnas.2413443122 Scaling language model size yields diminishing returns for single-message political persuasion]&lt;br /&gt;
* 2025-04: [https://drive.google.com/file/d/1Eo4SHrKGPErTzL1t_QmQhfZGU27jKBjx/edit Can AI Change Your View? Evidence from a Large-Scale Online Field Experiment]&lt;br /&gt;
** [https://www.404media.co/researchers-secretly-ran-a-massive-unauthorized-ai-persuasion-experiment-on-reddit-users/ Researchers Secretly Ran a Massive, Unauthorized AI Persuasion Experiment on Reddit Users]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.09662 Large Language Models Are More Persuasive Than Incentivized Human Persuaders]&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.13919 The Levers of Political Persuasion with Conversational AI]&lt;br /&gt;
* 2025-12: [https://www.science.org/doi/10.1126/science.aea3884 The levers of political persuasion with conversational artificial intelligence]&lt;br /&gt;
* 2025-12: [https://www.nature.com/articles/s41586-025-09771-9 Persuading voters using human–artificial intelligence dialogues]&lt;br /&gt;
&lt;br /&gt;
==AI Effects on Human Psychology==&lt;br /&gt;
===Human well-being===&lt;br /&gt;
* 2024-01: [https://www.nature.com/articles/s44184-023-00047-6 Loneliness and suicide mitigation for students using GPT3-enabled chatbots]&lt;br /&gt;
* 2025-03: [https://cdn.openai.com/papers/15987609-5f71-433c-9972-e91131f399a1/openai-affective-use-study.pdf Investigating Affective Use and Emotional Well-being on ChatGPT]&lt;br /&gt;
* 2025-03: [https://dam-prod2.media.mit.edu/x/2025/03/21/Randomized_Control_Study_on_Chatbot_Psychosocial_Effect.pdf How AI and Human Behaviors Shape Psychosocial Effects of Chatbot Use: A Longitudinal Controlled Study]&lt;br /&gt;
&lt;br /&gt;
===Counter loneliness===&lt;br /&gt;
* 2023-11: [https://arxiv.org/abs/2311.10599 Chatbots as social companions: How people perceive consciousness, human likeness, and social health benefits in machines]&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.19096 AI Companions Reduce Loneliness]&lt;br /&gt;
* 2025-03: [https://dam-prod2.media.mit.edu/x/2025/03/21/Randomized_Control_Study_on_Chatbot_Psychosocial_Effect.pdf How AI and Human Behaviors Shape Psychosocial Effects of Chatbot Use: A Longitudinal Controlled Study]&lt;br /&gt;
* 2025-06: Anthropic: [https://www.anthropic.com/news/how-people-use-claude-for-support-advice-and-companionship How People Use Claude for Support, Advice, and Companionship]&lt;br /&gt;
&lt;br /&gt;
===Human mental abilities (creativity, learning)===&lt;br /&gt;
* 2025-03: [https://www.microsoft.com/en-us/research/publication/the-impact-of-generative-ai-on-critical-thinking-self-reported-reductions-in-cognitive-effort-and-confidence-effects-from-a-survey-of-knowledge-workers/ The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence Effects From a Survey of Knowledge Workers]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.08872 Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task]&lt;br /&gt;
&lt;br /&gt;
=Simulate Humans=&lt;br /&gt;
* See also: [[Human brain]]&lt;br /&gt;
&lt;br /&gt;
==Sociology==&lt;br /&gt;
* 2021-10: [https://www.doi.org/10.1007/s10588-021-09351-y Explaining and predicting human behavior and social dynamics in simulated virtual worlds: reproducibility, generalizability, and robustness of causal discovery methods]&lt;br /&gt;
* 2023-12: Google: [https://arxiv.org/abs/2312.03664 Generative agent-based modeling with actions grounded in physical, social, or digital space using Concordia]&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.12620 Are Large Language Models (LLMs) Good Social Predictors?]&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.11794 Automated Social Science: Language Models as Scientist and Subjects]&lt;br /&gt;
* 2024-07: [https://academic.oup.com/pnasnexus/article/3/7/pgae245/7712371 Perils and opportunities in using large language models in psychological research]&lt;br /&gt;
* 2024-08: [https://samim.io/dl/Predicting%20results%20of%20social%20science%20experiments%20using%20large%20language%20models.pdf Predicting Results of Social Science Experiments Using Large Language Models]&lt;br /&gt;
* 2024-10: [https://www.pnas.org/doi/10.1073/pnas.2407639121 Large Language Models based on historical text could offer informative tools for behavioral science]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.02234 LLM Social Simulations Are a Promising Research Method]&lt;br /&gt;
* 2025-04: [https://www.nber.org/papers/w33662 Measuring Human Leadership Skills with AI Agents]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.10157 SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users]&lt;br /&gt;
* 2025-07: [https://www.nature.com/articles/s41586-025-09215-4 A foundation model to predict and capture human cognition] ([https://marcelbinz.github.io/centaur code])&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.15815 LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra]&lt;br /&gt;
* 2025-09: [https://benjaminmanning.io/files/optimize.pdf General Social Agents]&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2506.06958 Simulating Society Requires Simulating Thought]&lt;br /&gt;
&lt;br /&gt;
==Theory of Mind==&lt;br /&gt;
* 2025-08: [https://www.nature.com/articles/s44387-025-00031-9 How large language models encode theory-of-mind: a study on sparse parameter patterns]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2509.22887 Infusing Theory of Mind into Socially Intelligent LLM Agents]&lt;br /&gt;
&lt;br /&gt;
==Humanlike Vibes==&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.20525 The Xeno Sutra: Can Meaning and Value be Ascribed to an AI-Generated &amp;quot;Sacred&amp;quot; Text?]&lt;br /&gt;
* 2025-10: [https://arxiv.org/abs/2510.08338 LLMs Reproduce Human Purchase Intent via Semantic Similarity Elicitation of Likert Ratings]&lt;br /&gt;
&lt;br /&gt;
==Skeptical==&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.06950 Large Language Models Do Not Simulate Human Psychology]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2601.16130 Replicating Human Motivated Reasoning Studies with LLMs]&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
* [https://www.google.com/books/edition/_/cKnYEAAAQBAJ?hl=en&amp;amp;gbpv=1&amp;amp;pg=PA2 UNESCO. Guidance for Generative AI in Education and Research]&lt;br /&gt;
* [[AI]]&lt;br /&gt;
** [[AI predictions]]&lt;/div&gt;</summary>
		<author><name>KevinYager</name></author>
		
	</entry>
	<entry>
		<id>http://gisaxs.com/index.php?title=Science_Agents&amp;diff=8731</id>
		<title>Science Agents</title>
		<link rel="alternate" type="text/html" href="http://gisaxs.com/index.php?title=Science_Agents&amp;diff=8731"/>
		<updated>2026-03-16T15:56:05Z</updated>

		<summary type="html">&lt;p&gt;KevinYager: /* Commercial */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
=AI Use-cases for Science=&lt;br /&gt;
&lt;br /&gt;
==Literature==&lt;br /&gt;
* [https://www.alphaxiv.org/explore alphaXiv | Explore]: Understand arXiv papers&lt;br /&gt;
&lt;br /&gt;
===LLM extract data from papers===&lt;br /&gt;
* 2024-12: [https://pubs.rsc.org/en/content/articlelanding/2025/cs/d4cs00913d From text to insight: large language models for chemical data extraction]&lt;br /&gt;
&lt;br /&gt;
===AI finding links in literature===&lt;br /&gt;
* 2019-07: [https://doi.org/10.1038/s41586-019-1335-8  Unsupervised word embeddings capture latent knowledge from materials science literature]&lt;br /&gt;
* 2024-11: [https://doi.org/10.1038/s41562-024-02046-9  Large language models surpass human experts in predicting neuroscience results]&lt;br /&gt;
&lt;br /&gt;
===(Pre) Generate Articles===&lt;br /&gt;
* 2022-12: [https://aclanthology.org/2022.emnlp-main.296/ Re3: Generating Longer Stories With Recursive Reprompting and Revision]&lt;br /&gt;
* 2023-01: Journalism: [https://journals.sagepub.com/doi/10.1177/10776958221149577 Collaborating With ChatGPT: Considering the Implications of Generative Artificial Intelligence for Journalism and Media Education]&lt;br /&gt;
* 2023-03: English essays: [https://journal.unnes.ac.id/sju/index.php/elt/article/view/64069 Artificial intelligence (AI) technology in OpenAI ChatGPT application: A review of ChatGPT in writing English essay]&lt;br /&gt;
* 2023-07: Science writing: [https://www.rbmojournal.com/article/S1472-6483(23)00219-5/fulltext Artificial intelligence in scientific writing: a friend or a foe?]&lt;br /&gt;
* 2024-02: Wikipedia style: [https://arxiv.org/abs/2402.14207 Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models]&lt;br /&gt;
* 2024-08: [https://arxiv.org/abs/2408.07055 LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs] ([https://github.com/THUDM/LongWriter code])&lt;br /&gt;
* 2024-08: Scientific papers: [https://arxiv.org/abs/2408.06292 The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery]&lt;br /&gt;
* 2024-09: PaperQA2: [https://paper.wikicrow.ai/ Language Models Achieve Superhuman Synthesis of Scientific Knowledge] ([https://x.com/SGRodriques/status/1833908643856818443 𝕏 post], [https://github.com/Future-House/paper-qa code])&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.18866 Reasoning to Learn from Latent Thoughts]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.19065 WikiAutoGen: Towards Multi-Modal Wikipedia-Style Article Generation]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.13171 Sleep-time Compute: Beyond Inference Scaling at Test-time]&lt;br /&gt;
&lt;br /&gt;
==Explanation==&lt;br /&gt;
* 2025-02: [https://tiger-ai-lab.github.io/TheoremExplainAgent/ TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding] ([https://arxiv.org/abs/2502.19400 preprint])&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.02822 Do Two AI Scientists Agree?]&lt;br /&gt;
&lt;br /&gt;
==Autonomous Ideation==&lt;br /&gt;
* 2024-04: [https://arxiv.org/abs/2404.07738 ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models]&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.14202 Mining Causality: AI-Assisted Search for Instrumental Variables]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.07977 Thinking Fast and Laterally: Multi-Agentic Approach for Reasoning about Uncertain Emerging Events]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.14141 LLMs can realize combinatorial creativity: generating creative ideas via LLMs for scientific research]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.17596 LiveIdeaBench: Evaluating LLMs&amp;#039; Scientific Creativity and Idea Generation with Minimal Context]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.13299 Hypothesis Generation for Materials Discovery and Design Using Goal-Driven and Constraint-Guided LLM Agents]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.13025 Agentic Deep Graph Reasoning Yields Self-Organizing Knowledge Networks]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.00794 Predicting Empirical AI Research Outcomes with Language Models]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.20803 The Ideation-Execution Gap: Execution Outcomes of LLM-Generated versus Human Research Ideas]&lt;br /&gt;
&lt;br /&gt;
==Adapting LLMs to Science==&lt;br /&gt;
* 2023-06: [https://doi.org/10.1039/D3DD00112A Domain-specific chatbots for science using embeddings]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2411.00027 Personalization of Large Language Models: A Survey]&lt;br /&gt;
* 2024-11: [https://arxiv.org/abs/2411.00412 Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation]&lt;br /&gt;
&lt;br /&gt;
==AI/LLM Control of Scientific Instruments/Facilities==&lt;br /&gt;
* 2023-12: [https://www.nature.com/articles/s41524-024-01423-2 Opportunities for retrieval and tool augmented large language models in scientific facilities]&lt;br /&gt;
* 2023-12: [https://arxiv.org/abs/2312.17180 Virtual Scientific Companion for Synchrotron Beamlines: A Prototype]&lt;br /&gt;
* 2023-12: [https://www.nature.com/articles/s41586-023-06792-0 Autonomous chemical research with large language models]&lt;br /&gt;
* 2024-01: [https://iopscience.iop.org/article/10.1088/2632-2153/ad52e9 Synergizing Human Expertise and AI Efficiency with Language Model for Microscopy Operation and Automated Experiment Design]&lt;br /&gt;
* 2024-06: [https://pubs.rsc.org/en/content/articlelanding/2025/dd/d4dd00143e From Text to Test: AI-Generated Control Software for Materials Science Instruments]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.18161 VISION: A Modular AI Assistant for Natural Human-Instrument Interaction at Scientific User Facilities]&lt;br /&gt;
* 2025-01: [https://www.science.org/doi/10.1126/sciadv.adr4173 Large language models for human-machine collaborative particle accelerator tuning through natural language]&lt;br /&gt;
* 2025-04: [https://openreview.net/forum?id=iA9UN1dEgJ Operating Robotic Laboratories with Large Language Models and Teachable Agents]&lt;br /&gt;
&lt;br /&gt;
==AI/ML Methods tailored to Science==&lt;br /&gt;
===Science Foundation Models===&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.15763 Intern-S1: A Scientific Multimodal Foundation Model]&lt;br /&gt;
* 2025-11: [https://pubs.aip.org/aip/jcp/article/163/18/184110/3372267/A-foundation-model-for-atomistic-materials A foundation model for atomistic materials chemistry]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.15684 Walrus: A Cross-Domain Foundation Model for Continuum Dynamics]&lt;br /&gt;
* 2026-01: [https://www.science.org/doi/10.1126/science.ads9530 Deep contrastive learning enables genome-wide virtual screening]&lt;br /&gt;
&lt;br /&gt;
===Regression (Data Fitting)===&lt;br /&gt;
* 2024-02: [https://arxiv.org/abs/2402.14547 OmniPred: Language Models as Universal Regressors]&lt;br /&gt;
* 2024-06: [https://arxiv.org/abs/2406.14546 Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data]: training on (x,y) pairs enables inferring the underlying function (defining it in code, inverting it, composing it)&lt;br /&gt;
&lt;br /&gt;
===Tabular Classification/Regression===&lt;br /&gt;
* 2025-01: [https://www.nature.com/articles/s41586-024-08328-6 Accurate predictions on small data with a tabular foundation model] ([https://github.com/PriorLabs/TabPFN code])&lt;br /&gt;
&lt;br /&gt;
===Symbolic Regression===&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.09359 Symbolic Regression with a Learned Concept Library]&lt;br /&gt;
&lt;br /&gt;
===Literature Discovery===&lt;br /&gt;
* [https://www.futurehouse.org/ FutureHouse]&lt;br /&gt;
** [https://hasanyone.com/ hasanyone]&lt;br /&gt;
** [https://github.com/Future-House/paper-qa PaperQA2]&lt;br /&gt;
* [https://lumina.sh/ Lumina]&lt;br /&gt;
* [https://github.com/TheBlewish/Automated-AI-Web-Researcher-Ollama Automated-AI-Web-Researcher-Ollama]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.05366 Search-o1: Agentic Search-Enhanced Large Reasoning Models] ([https://search-o1.github.io/ project], [https://github.com/sunnynexus/Search-o1 code])&lt;br /&gt;
* 2026-02: [https://www.nature.com/articles/s41586-025-10072-4 Synthesizing scientific literature with retrieval-augmented language models] ([https://allenai.org/blog/openscholar-nature blog])&lt;br /&gt;
&lt;br /&gt;
===Commercial===&lt;br /&gt;
* [https://sakana.ai/ai-scientist/ Sakana AI] &lt;br /&gt;
* [https://www.cusp.ai/ Cusp AI]: Materials/AI&lt;br /&gt;
* [https://www.lila.ai/ Lila AI]: Life sciences&lt;br /&gt;
* [https://www.radical-ai.com/ Radical AI]: Material simulation/design&lt;br /&gt;
* [https://www.autoscience.ai/ Autoscience] ([https://www.autoscience.ai/blog/meet-carl-the-first-ai-system-to-produce-academically-peer-reviewed-research Carl])&lt;br /&gt;
* [https://periodic.com/ Periodic Labs]&lt;br /&gt;
* [https://edisonscientific.com/articles/announcing-edison-scientific Edison Scientific] (drug discovery, spinoff from [https://www.futurehouse.org/ FutureHouse])&lt;br /&gt;
* 2026-03: Mirendil Inc.: advanced models to speed up R&amp;amp;D in scientific domains, especially biology and materials science&lt;br /&gt;
&lt;br /&gt;
====Bio====&lt;br /&gt;
* [https://www.bioptimus.com/ Bioptimus]&lt;br /&gt;
* [https://www.evolutionaryscale.ai/ EvolutionaryScale]&lt;br /&gt;
&lt;br /&gt;
==AI/ML Methods in Science==&lt;br /&gt;
* 2025-07: [https://www.mdpi.com/2313-433X/11/8/252 Synthetic Scientific Image Generation with VAE, GAN, and Diffusion Model Architectures]&lt;br /&gt;
&lt;br /&gt;
===Imaging===&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.08176 Behind the Noise: Conformal Quantile Regression Reveals Emergent Representations] (blog: [https://phzwart.github.io/behindthenoise/ Behind the Noise])&lt;br /&gt;
&lt;br /&gt;
===Materials===&lt;br /&gt;
* 2024-12: [https://www.nature.com/articles/s41467-024-54639-7 Crystal structure generation with autoregressive large language modeling]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.03965 All-atom Diffusion Transformers: Unified generative modelling of molecules and materials]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.19730 Training-Free Active Learning Framework in Materials Science with Large Language Models]&lt;br /&gt;
&lt;br /&gt;
===Chemistry===&lt;br /&gt;
* 2025-01: [https://www.nature.com/articles/s41578-025-00772-8 Large language models for reticular chemistry]&lt;br /&gt;
* 2025-02: [https://www.nature.com/articles/s42256-025-00982-3 Image-based generation for molecule design with SketchMol]&lt;br /&gt;
* 2025-02: [https://www.nature.com/articles/s42256-025-00994-z Large language models for scientific discovery in molecular property prediction]&lt;br /&gt;
* [https://x.com/vant_ai/status/1903070297991110657 2025-03]: [https://www.vant.ai/ Vant AI] [https://www.vant.ai/neo-1 Neo-1]: atomistic foundation model (small molecules, proteins, etc.)&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.08051 Compositional Flows for 3D Molecule and Synthesis Pathway Co-design]&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.07456 General purpose models for the chemical sciences]&lt;br /&gt;
* 2025-11: [https://chemrxiv.org/engage/chemrxiv/article-details/690357d9a482cba122e366b6 ChemTorch: A Deep Learning Framework for Benchmarking and Developing Chemical Reaction Property Prediction Models]&lt;br /&gt;
&lt;br /&gt;
===Biology===&lt;br /&gt;
* 2018: [https://alphafold.ebi.ac.uk/ AlphaFold]&lt;br /&gt;
* 2021-07: [https://www.nature.com/articles/s41586-021-03819-2 AlphaFold 2]&lt;br /&gt;
* 2024-05: [https://www.nature.com/articles/s41586-024-07487-w AlphaFold 3]&lt;br /&gt;
* 2023-03: [https://www.science.org/doi/10.1126/science.ade2574 Evolutionary-scale prediction of atomic-level protein structure with a language model] ([https://esmatlas.com/resources?action=fold ESMFold])&lt;br /&gt;
* 2023-11: [https://www.nature.com/articles/s41586-023-06728-8 Illuminating protein space with a programmable generative model] (Chroma)&lt;br /&gt;
* 2024-11: [https://www.science.org/doi/10.1126/science.ado9336 Sequence modeling and design from molecular to genome scale with Evo] (Evo)&lt;br /&gt;
* 2025-01: [https://www.nature.com/articles/s41586-024-08435-4 Targeting protein–ligand neosurfaces with a generalizable deep learning tool]&lt;br /&gt;
* 2025-01: [https://www.science.org/doi/10.1126/science.ads0018 Simulating 500 million years of evolution with a language model] ([https://github.com/evolutionaryscale/esm ESM] 3 model)&lt;br /&gt;
* 2025-02: [https://arcinstitute.org/manuscripts/Evo2 Genome modeling and design across all domains of life with Evo 2]&lt;br /&gt;
* 2025-02: [https://www.microsoft.com/en-us/research/blog/exploring-the-structural-changes-driving-protein-function-with-bioemu-1/ Exploring the structural changes driving protein function with BioEmu-1]&lt;br /&gt;
* 2025-02: [https://arxiv.org/pdf/2502.18449 Protein Large Language Models: A Comprehensive Survey]&lt;br /&gt;
* [https://x.com/vant_ai/status/1903070297991110657 2025-03]: [https://www.vant.ai/ Vant AI] [https://www.vant.ai/neo-1 Neo-1]: atomistic foundation model (small molecules, proteins, etc.)&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.16351 Lyra: An Efficient and Expressive Subquadratic Architecture for Modeling Biological Sequences]&lt;br /&gt;
* 2025-08: RosettaFold 3: [https://www.biorxiv.org/content/10.1101/2025.08.14.670328v2 Accelerating Biomolecular Modeling with AtomWorks and RF3]&lt;br /&gt;
* 2025-09: [https://www.biorxiv.org/content/10.1101/2025.09.12.675911v1 Generative design of novel bacteriophages with genome language models]&lt;br /&gt;
* 2025-10: [https://www.science.org/doi/10.1126/science.adu8578 Strengthening nucleic acid biosecurity screening against generative protein design tools]&lt;br /&gt;
* 2026-01: [https://www.nature.com/articles/s41586-025-10014-0 Advancing regulatory variant effect prediction with AlphaGenome]&lt;br /&gt;
&lt;br /&gt;
===Medicine===&lt;br /&gt;
See: [[AI_Agents#Medicine]]&lt;br /&gt;
&lt;br /&gt;
===Successes===&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.11270 Site-Decorated Model for Unconventional Frustrated Magnets: Ultranarrow Phase Crossover and Spin Reversal Transition]&lt;br /&gt;
&lt;br /&gt;
==AI/ML Methods co-opted for Science==&lt;br /&gt;
===Mechanistic Interpretability===&lt;br /&gt;
Train a large model on science data, then apply [[AI_understanding#Mechanistic_Interpretability|mechanistic interpretability]] methods (e.g. sparse autoencoders, SAEs) to the feature/activation space.&lt;br /&gt;
* Mechanistic interpretability for protein language models ([https://interprot.com/ visualizer], [https://github.com/etowahadams/interprot/tree/main code], [https://huggingface.co/liambai/InterProt-ESM2-SAEs SAE])&lt;br /&gt;
* [https://www.markov.bio/ Markov Bio]: [https://www.markov.bio/research/mech-interp-path-to-e2e-biology Through a Glass Darkly: Mechanistic Interpretability as the Bridge to End-to-End Biology] ([https://x.com/adamlewisgreen/status/1853206279499751531 quick description], [https://markovbio.github.io/biomedical-progress/ background info on recent bio progress])&lt;br /&gt;
* 2023-01: [https://arxiv.org/abs/2301.05062 Tracr: Compiled Transformers as a Laboratory for Interpretability] ([https://github.com/google-deepmind/tracr code])&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.03334 An X-Ray Is Worth 15 Features: Sparse Autoencoders for Interpretable Radiology Report Generation]&lt;br /&gt;
* 2024-12: [https://www.arxiv.org/abs/2412.16247 Towards scientific discovery with dictionary learning: Extracting biological concepts from microscopy foundation models]&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.12101 InterPLM: Discovering Interpretable Features in Protein Language Models via Sparse Autoencoders]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.00089 Insights on Galaxy Evolution from Interpretable Sparse Feature Networks]&lt;br /&gt;
* 2025-02: [https://www.biorxiv.org/content/10.1101/2025.02.06.636901v1 From Mechanistic Interpretability to Mechanistic Biology: Training, Evaluating, and Interpreting Sparse Autoencoders on Protein Language Models]&lt;br /&gt;
* 2025-02: [https://www.goodfire.ai/blog/interpreting-evo-2 Interpreting Evo 2: Arc Institute&amp;#039;s Next-Generation Genomic Foundation Model]&lt;br /&gt;
* 2026-01: [https://www.goodfire.ai/research/interpretability-for-alzheimers-detection Using Interpretability to Identify a Novel Class of Alzheimer&amp;#039;s Biomarkers]&lt;br /&gt;
&lt;br /&gt;
===Uncertainty===&lt;br /&gt;
* 2024-10: [https://github.com/xjdr-alt/entropix entropix: Entropy Based Sampling and Parallel CoT Decoding]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.09724 Taming Overconfidence in LLMs: Reward Calibration in RLHF]&lt;br /&gt;
&lt;br /&gt;
=Science Benchmarks=&lt;br /&gt;
* 2024-07: [https://arxiv.org/abs/2407.13168 SciCode: A Research Coding Benchmark Curated by Scientists] ([http://scicode-bench.github.io/ project])&lt;br /&gt;
* 2024-11: [https://openreview.net/pdf?id=fz969ahcvJ AidanBench: Evaluating Novel Idea Generation on Open-Ended Questions] ([https://github.com/aidanmclaughlin/AidanBench code])&lt;br /&gt;
* 2024-12: [https://arxiv.org/abs/2412.17596 LiveIdeaBench: Evaluating LLMs&amp;#039; Scientific Creativity and Idea Generation with Minimal Context]&lt;br /&gt;
* 2025-01: [https://agi.safe.ai/ Humanity&amp;#039;s Last Exam]&lt;br /&gt;
* [https://github.com/OSU-NLP-Group/ScienceAgentBench ScienceAgentBench]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.20309 EAIRA: Establishing a Methodology for Evaluating AI Models as Scientific Research Assistants]&lt;br /&gt;
* 2025-03: [https://huggingface.co/datasets/futurehouse/BixBench BixBench]: Novel hypotheses (accept/reject)&lt;br /&gt;
* 2025-04: [https://research.google/blog/evaluating-progress-of-llms-on-scientific-problem-solving/ Google: Evaluating progress of LLMs on scientific problem-solving]&lt;br /&gt;
** 2025-03: [https://arxiv.org/abs/2503.13517 CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning]&lt;br /&gt;
** 2024-07: [https://arxiv.org/abs/2407.09413 SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers]&lt;br /&gt;
** 2024-10: [https://neurips.cc/virtual/2024/98540 FEABench: Evaluating Language Models on Real World Physics Reasoning Ability]&lt;br /&gt;
* 2026-02: [https://edisonscientific.com/ Edison]: [https://lab-bench.ai/ LABBench 2]&lt;br /&gt;
&lt;br /&gt;
=Science Agents=&lt;br /&gt;
==Reviews==&lt;br /&gt;
* 2024-10: [https://www.cell.com/cell/fulltext/S0092-8674(24)01070-5 Empowering biomedical discovery with AI agents]&lt;br /&gt;
* 2025-01: [https://pubs.rsc.org/en/content/articlehtml/2024/sc/d4sc03921a A review of large language models and autonomous agents in chemistry] ([https://github.com/ur-whitelab/LLMs-in-science github])&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.01903 AI4Research: A Survey of Artificial Intelligence for Scientific Research]&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.14111 From AI for Science to Agentic Science: A Survey on Autonomous Scientific Discovery]&lt;br /&gt;
&lt;br /&gt;
==Challenges==&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.03315 Why LLMs Aren&amp;#039;t Scientists Yet: Lessons from Four Autonomous Research Attempts]&lt;br /&gt;
&lt;br /&gt;
==Specific==&lt;br /&gt;
* 2024-01-13: [https://arxiv.org/abs/2401.06949 ORGANA: A Robotic Assistant for Automated Chemistry Experimentation and Characterization] ([https://www.youtube.com/watch?v=N6qMMwJ8hKQ video])&lt;br /&gt;
* 2024-06-19: [https://arxiv.org/abs/2406.13163 LLMatDesign: Autonomous Materials Discovery with Large Language Models]&lt;br /&gt;
* 2024-08-12: [https://sakana.ai/ Sakana AI]: [https://sakana.ai/ai-scientist/ AI Scientist]; [https://arxiv.org/abs/2408.06292 The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery] ([https://github.com/SakanaAI/AI-Scientist code])&lt;br /&gt;
* 2024-09-09: [https://arxiv.org/abs/2409.05556 SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning] ([https://github.com/lamm-mit/SciAgentsDiscovery code])&lt;br /&gt;
* 2024-09-11: PaperQA2: [https://paper.wikicrow.ai/ Language Models Achieve Superhuman Synthesis of Scientific Knowledge] ([https://x.com/SGRodriques/status/1833908643856818443 𝕏 post], [https://github.com/Future-House/paper-qa code])&lt;br /&gt;
* 2024-10-17: [https://arxiv.org/abs/2410.13768 Rapid and Automated Alloy Design with Graph Neural Network-Powered LLM-Driven Multi-Agent Systems]&lt;br /&gt;
* 2024-10-28: [https://arxiv.org/abs/2410.20976 Large Language Model-Guided Prediction Toward Quantum Materials Synthesis]&lt;br /&gt;
* 2024-12-06: [https://www.biorxiv.org/content/10.1101/2024.11.11.623004v1 The Virtual Lab: AI Agents Design New SARS-CoV-2 Nanobodies with Experimental Validation] (writeup: [https://www.nature.com/articles/d41586-024-01684-3 Virtual lab powered by ‘AI scientists’ super-charges biomedical research: Could human–AI collaborations be the future of interdisciplinary studies?])&lt;br /&gt;
* 2024-12-30: [https://arxiv.org/abs/2412.21154 Aviary: training language agents on challenging scientific tasks]&lt;br /&gt;
* See also: [[AI_Agents#Deep_Research|AI Agents &amp;gt; Deep Research]]&lt;br /&gt;
* 2025-04-08: Sakana: [https://pub.sakana.ai/ai-scientist-v2/paper/paper.pdf The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search] ([https://github.com/SakanaAI/AI-Scientist-v2 code])&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.14267 DREAMS: Density Functional Theory Based Research Engine for Agentic Materials Simulation]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.02824 Kosmos: An AI Scientist for Autonomous Discovery]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.08151 SciAgent: A Unified Multi-Agent System for Generalistic Scientific Reasoning]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2601.23265 PaperBanana: Automating Academic Illustration for AI Scientists]&lt;br /&gt;
&lt;br /&gt;
==Skills==&lt;br /&gt;
* 2026-03: [https://github.com/K-Dense-AI/claude-scientific-skills/tree/main?tab=readme-ov-file#use-cases Claude Scientific Skills] (list)&lt;br /&gt;
&lt;br /&gt;
==Science Multi-Agent Setups==&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.04227 Agent Laboratory: Using LLM Agents as Research Assistants]&lt;br /&gt;
* 2025-04: [https://www.nature.com/articles/s41551-025-01363-2 Coordinated AI agents for advancing healthcare] ([https://www.nature.com/articles/s41551-025-01363-2.epdf?sharing_token=CIYP3J8LZE4BX31fV3WxUdRgN0jAjWel9jnR3ZoTv0O9iD-yhgqzRaz_7VASayWRePPhWDD2xFyfuOpSXbdPaOtt7oH4nfXo7telALzNwY3V1p9SxoqBEJy2OuaJ_cA35-CYQC1XgjCNTZUw46dh1KX-Dj8e7-1Vk_RlZKFLrc8%3D pdf])&lt;br /&gt;
&lt;br /&gt;
=AI Science Systems=&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.03916 Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback]&lt;br /&gt;
* 2025-01: [https://arxiv.org/abs/2501.13299 Hypothesis Generation for Materials Discovery and Design Using Goal-Driven and Constraint-Guided LLM Agents]&lt;br /&gt;
* 2025-02: [https://storage.googleapis.com/coscientist_paper/ai_coscientist.pdf Towards an AI co-scientist] (Google blog post: [https://research.google/blog/accelerating-scientific-breakthroughs-with-an-ai-co-scientist/ Accelerating scientific breakthroughs with an AI co-scientist])&lt;br /&gt;
* 2025-06: [https://zenodo.org/records/15693353 The Discovery Engine]&lt;br /&gt;
** 2025-07: [https://arxiv.org/abs/2507.00964 Benchmarking the Discovery Engine] ([https://www.leap-labs.com/blog/how-we-replicated-five-peer-reviewed-papers-in-five-hours blog])&lt;br /&gt;
* 2025-07: [https://www.preprints.org/manuscript/202507.1951/v1 Autonomous Scientific Discovery Through Hierarchical AI Scientist Systems]&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.16969 Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows]&lt;br /&gt;
* 2026-01: [https://www.nature.com/articles/s43588-025-00906-6 SciSciGPT: advancing human–AI collaboration in the science of science]&lt;br /&gt;
* 2026-02: [https://allenai.org/papers/autodiscovery AUTODISCOVERY: Open-ended Scientific Discovery via Bayesian Surprise] (Allen AI (Ai2) AstraLabs, [https://allenai.org/blog/autodiscovery blog], [https://autodiscovery.allen.ai/runs tools])&lt;br /&gt;
&lt;br /&gt;
===Inorganic Materials Discovery===&lt;br /&gt;
* 2023-11: [https://doi.org/10.1038/s41586-023-06735-9 Scaling deep learning for materials discovery]&lt;br /&gt;
* 2023-11: [https://doi.org/10.1038/s41586-023-06734-w An autonomous laboratory for the accelerated synthesis of novel materials]&lt;br /&gt;
* 2024-09: [https://arxiv.org/abs/2409.00135 HoneyComb: A Flexible LLM-Based Agent System for Materials Science]&lt;br /&gt;
* 2024-10: [https://arxiv.org/abs/2410.12771 Open Materials 2024 (OMat24) Inorganic Materials Dataset and Models] ([https://github.com/FAIR-Chem/fairchem code], [https://huggingface.co/datasets/fairchem/OMAT24 datasets], [https://huggingface.co/fairchem/OMAT24 checkpoints], [https://ai.meta.com/blog/fair-news-segment-anything-2-1-meta-spirit-lm-layer-skip-salsa-sona/ blogpost])&lt;br /&gt;
* 2025-01: [https://www.nature.com/articles/s41586-025-08628-5 A generative model for inorganic materials design]&lt;br /&gt;
* 2025-04: [https://arxiv.org/abs/2504.14110 System of Agentic AI for the Discovery of Metal-Organic Frameworks]&lt;br /&gt;
* 2025-05: [https://arxiv.org/abs/2505.08762 The Open Molecules 2025 (OMol25) Dataset, Evaluations, and Models]&lt;br /&gt;
&lt;br /&gt;
===Materials Characterization===&lt;br /&gt;
* 2025-08: [https://arxiv.org/abs/2508.06569 Operationalizing Serendipity: Multi-Agent AI Workflows for Enhanced Materials Characterization with Theory-in-the-Loop]&lt;br /&gt;
&lt;br /&gt;
===Chemistry===&lt;br /&gt;
* 2023-12: [https://doi.org/10.1038/s41586-023-06792-0 Autonomous chemical research with large language models] (Coscientist)&lt;br /&gt;
* 2024-09: [https://www.pnnl.gov/main/publications/external/technical_reports/PNNL-36692.pdf PNNL ChemAIst V0.2]&lt;br /&gt;
* 2024-11: [https://www.nature.com/articles/s41467-024-54457-x An automatic end-to-end chemical synthesis development platform powered by large language models]&lt;br /&gt;
* 2025-06: [https://paper.ether0.ai/ Training a Scientific Reasoning Model for Chemistry]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.06363 ChemGraph: An Agentic Framework for Computational Chemistry Workflows] ([https://github.com/argonne-lcf/ChemGraph code])&lt;br /&gt;
&lt;br /&gt;
===Bio===&lt;br /&gt;
* 2025-07: [https://arxiv.org/abs/2507.01485 BioMARS: A Multi-Agent Robotic System for Autonomous Biological Experiments]&lt;br /&gt;
&lt;br /&gt;
===Physics===&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.19799 PhysMaster: Building an Autonomous AI Physicist for Theoretical and Computational Physics Research]&lt;br /&gt;
&lt;br /&gt;
==LLMs Optimized for Science==&lt;br /&gt;
* 2022-11: [https://arxiv.org/abs/2211.09085 Galactica: A Large Language Model for Science]&lt;br /&gt;
* 2024-12: [https://www.nature.com/articles/s41467-024-54639-7 Crystal structure generation with autoregressive large language modeling]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.13107 MatterChat: A Multi-Modal LLM for Material Science]&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.17604 OmniScience: A Domain-Specialized LLM for Scientific Reasoning and Discovery]&lt;br /&gt;
* 2025-03: Google [https://huggingface.co/collections/google/txgemma-release-67dd92e931c857d15e4d1e87 TxGemma] (2B, 9B, 27B): [https://developers.googleblog.com/en/introducing-txgemma-open-models-improving-therapeutics-development/ drug development]&lt;br /&gt;
&lt;br /&gt;
=Impact of AI in Science=&lt;br /&gt;
* 2024-11: &lt;strike&gt;[https://aidantr.github.io/files/AI_innovation.pdf Artificial Intelligence, Scientific Discovery, and Product Innovation]&lt;/strike&gt;&lt;br /&gt;
** 2025-05: Retraction: [https://economics.mit.edu/news/assuring-accurate-research-record Assuring an accurate research record]&lt;br /&gt;
* 2025-02: [https://arxiv.org/abs/2502.05151 Transforming Science with Large Language Models: A Survey on AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.03837 Accelerating Scientific Research with Gemini: Case Studies and Common Techniques]&lt;br /&gt;
&lt;br /&gt;
=Related Tools=&lt;br /&gt;
==Literature Search==&lt;br /&gt;
* [https://www.perplexity.ai/ Perplexity]&lt;br /&gt;
* [https://www.arxival.xyz/ ArXival]&lt;br /&gt;
&lt;br /&gt;
==Data Visualization==&lt;br /&gt;
* 2024-10: Microsoft [https://www.microsoft.com/en-us/research/blog/data-formulator-exploring-how-ai-can-help-analysts-create-rich-data-visualizations/ Data Formulator: Create Rich Visualization with AI iteratively] ([https://www.microsoft.com/en-us/research/video/data-formulator-create-rich-visualization-with-ai-iteratively/ video], [https://github.com/microsoft/data-formulator code])&lt;br /&gt;
* [https://julius.ai/ Julius AI]: Analyze your data with computational AI&lt;br /&gt;
&lt;br /&gt;
==Generative==&lt;br /&gt;
* 2025-03: [https://huggingface.co/collections/starvector/starvector-models-6783b22c7bd4b43d13cb5289 StarVector] 1B, 8B: text or image to SVG&lt;br /&gt;
&lt;br /&gt;
==Chemistry==&lt;br /&gt;
* 2025-03: [https://jcheminf.biomedcentral.com/articles/10.1186/s13321-024-00834-z Rxn-INSIGHT: fast chemical reaction analysis using bond-electron matrices] ([https://rxn-insight.readthedocs.io/en/latest/ docs])&lt;br /&gt;
&lt;br /&gt;
=Science Datasets=&lt;br /&gt;
* [https://datasetsearch.research.google.com/ Google Dataset Search]&lt;br /&gt;
* [https://github.com/blaiszik/awesome-matchem-datasets/ Awesome Materials &amp;amp; Chemistry Datasets]&lt;br /&gt;
* NIST [https://jarvis.nist.gov/ Jarvis] (simulations)&lt;br /&gt;
&lt;br /&gt;
=Genuine Discoveries=&lt;br /&gt;
* 2025-11: [https://cdn.openai.com/pdf/4a25f921-e4e0-479a-9b38-5367b47e8fd0/early-science-acceleration-experiments-with-gpt-5.pdf Early science acceleration experiments with GPT-5]&lt;br /&gt;
* 2025-12: [https://andymasley.substack.com/p/ai-can-obviously-create-new-knowledge AI can obviously create new knowledge - But maybe not new concepts]&lt;br /&gt;
&lt;br /&gt;
==Math==&lt;br /&gt;
* 2023-07: [https://www.nature.com/articles/s41586-023-06004-9 Faster sorting algorithms discovered using deep reinforcement learning]&lt;br /&gt;
* 2025-06: [https://arxiv.org/abs/2506.13131 AlphaEvolve: A coding agent for scientific and algorithmic discovery]&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.02864 Mathematical exploration and discovery at scale]&lt;br /&gt;
* 2025-11: [https://www.nature.com/articles/s41586-025-09833-y Olympiad-level formal mathematical reasoning with reinforcement learning]&lt;br /&gt;
* 2025-12: [https://arxiv.org/abs/2512.14575 Extremal descendant integrals on moduli spaces of curves: An inequality discovered and proved in collaboration with AI]&lt;br /&gt;
* [https://github.com/teorth/erdosproblems/wiki/AI-contributions-to-Erd%C5%91s-problems AI Solving Erdős Problems]:&lt;br /&gt;
** 2026-01: [https://www.erdosproblems.com/728 Erdős Problem #728] and [https://www.erdosproblems.com/729 #729] solved by Aristotle using ChatGPT 5.2 Pro&lt;br /&gt;
** 2026-01: [https://www.erdosproblems.com/forum/thread/397 Erdős Problem #397] [https://x.com/neelsomani/status/2010215162146607128?s=20 solved] by [https://neelsomani.com/ Neel Somani] using ChatGPT 5.2 Pro&lt;br /&gt;
** 2026-01: [https://www.erdosproblems.com/205 Erdős Problem #205] solved by Aristotle using ChatGPT 5.2 Pro&lt;br /&gt;
** 2026-01: [https://www.erdosproblems.com/forum/thread/281 Erdős Problem #281] [https://x.com/neelsomani/status/2012695714187325745?s=20 solved] by [https://neelsomani.com/ Neel Somani] using ChatGPT 5.2 Pro&lt;br /&gt;
** 2026-01: Google DeepMind: [https://arxiv.org/abs/2601.21442 Irrationality of rapidly converging series: a problem of Erdős and Graham]&lt;br /&gt;
*** [https://www.erdosproblems.com/1051 Erdős Problem #1051] [https://x.com/slow_developer/status/2018321002623901885?s=20 solved] by Google DeepMind Aletheia agent&lt;br /&gt;
** 2026-01: Google DeepMind: [https://arxiv.org/abs/2601.22401 Semi-Autonomous Mathematics Discovery with Gemini: A Case Study on the Erdős Problems]&lt;br /&gt;
*** Attempted 700 problems, solved 13 open Erdős problems: 5 novel autonomous solutions, 8 through existing literature.&lt;br /&gt;
** 2026-02: [https://www.erdosproblems.com/846 Erdős Problem #846]&lt;br /&gt;
*** [https://x.com/roydanroy/status/2026804567178953048?s=20 Google DeepMind]&lt;br /&gt;
*** [https://x.com/mehtaab_sawhney/status/2026716221933343147?s=20 Using OpenAI internal model] (paper: [https://cdn.openai.com/infinite-sets/main_single_clean3.pdf On infinite sets with no 3 on a line])&lt;br /&gt;
* 2026-01: [https://arxiv.org/abs/2601.07222 The motivic class of the space of genus 0 maps to the flag variety]&lt;br /&gt;
* 2026-02: Google DeepMind: [https://arxiv.org/abs/2602.10177 Towards Autonomous Mathematics Research]&lt;br /&gt;
* 2026-03: Donald Knuth: [https://www-cs-faculty.stanford.edu/~knuth/papers/claude-cycles.pdf A problem in Directed Hamiltonian Cycles] solved by Filip Stappers using Claude Opus 4.6&lt;br /&gt;
* 2026-03: Google DeepMind: [https://arxiv.org/abs/2603.09172 Reinforced Generation of Combinatorial Structures: Ramsey Numbers]&lt;br /&gt;
&lt;br /&gt;
==Physics assistance==&lt;br /&gt;
* 2025-03: [https://arxiv.org/abs/2503.23758 Exact solution of the frustrated Potts model with next-nearest-neighbor interactions in one dimension via AI bootstrapping] ([https://www.bnl.gov/staff/wyin Weiguo Yin])&lt;br /&gt;
* 2025-12: [https://www.sciencedirect.com/science/article/pii/S0370269325008111 Relativistic covariance and nonlinear quantum mechanics: Tomonaga-Schwinger analysis]&lt;br /&gt;
** [https://x.com/hsu_steve/status/1996034522308026435?s=20 Steve Hsu], [https://drive.google.com/file/d/16sxJuwsHoi-fvTFbri9Bu8B9bqA6lr1H/view Theoretical Physics with Generative AI]&lt;br /&gt;
* 2026-02: [https://arxiv.org/abs/2602.12176 Single-minus gluon tree amplitudes are nonzero] (GPT-5.2, [https://openai.com/index/new-result-theoretical-physics/ blog])&lt;br /&gt;
&lt;br /&gt;
==Literature exploration==&lt;br /&gt;
* 2025-11: [https://arxiv.org/abs/2511.02824 Kosmos: An AI Scientist for Autonomous Discovery] ([https://edisonscientific.com/ Edison])&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/c4bdef64-5e9b-43b9-a365-592dd1ed7587 Nucleotide metabolism in hypothermia]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/1fdbf827-be65-4d97-9b66-bf0da600091a Determinant of perovskite solar-cell failure]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/4fb3fbdb-c449-4064-9aa6-ff4ec53131d8 Log-normal connectivity in neural networks]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/c6849232-5858-4634-adf5-83780afbe3db SOD2 as driver of myocardial fibrosis]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/abac07da-a6bb-458f-b0ba-ef08f1be617e Protective variant of SSR1 in type 2 diabetes]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/a770052b-2334-4bbe-b086-5149e0f03d99 Temporal ordering in Alzheimer’s disease]&lt;br /&gt;
** [https://platform.edisonscientific.com/kosmos/28c427d2-be31-48b5-b272-28d5a1e3ea5c Mechanism of neuron vulnerability in aging]&lt;br /&gt;
&lt;br /&gt;
==Bio design==&lt;br /&gt;
* 2023-07: [https://www.nature.com/articles/s41586-023-06415-8 De novo design of protein structure and function with RFdiffusion]&lt;br /&gt;
* 2025-11: [https://www.nature.com/articles/s41586-025-09721-5 Atomically accurate de novo design of antibodies with RFdiffusion]&lt;br /&gt;
* 2025-11: [https://deepmind.google/blog/alphafold-five-years-of-impact/ AlphaFold: Five years of impact]&lt;br /&gt;
* 2026-01: [https://www.goodfire.ai/research/interpretability-for-alzheimers-detection# Using Interpretability to Identify a Novel Class of Alzheimer&amp;#039;s Biomarkers]&lt;br /&gt;
&lt;br /&gt;
==Material Discovery==&lt;br /&gt;
* 2023-11: [https://doi.org/10.1038/s41586-023-06735-9 Scaling deep learning for materials discovery]&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
* [[AI agents]]&lt;br /&gt;
* [https://nanobot.chat/ Nanobot.chat]: Intelligent AI for the labnetwork @ mtl.mit.edu forum&lt;/div&gt;</summary>
		<author><name>KevinYager</name></author>
		
	</entry>
</feed>