Difference between revisions of "AI compute"
=Cloud GPU=
* [https://lambdalabs.com/ Lambda]
* [https://vast.ai/ Vast AI]
* [https://hpc-ai.com/ HPC-AI]

=Cloud Training Compute=
* [https://nebius.ai/ Nebius AI]
* [https://glaive.ai/ Glaive AI]

=Cloud LLM Routers & Inference Providers=
* [https://openrouter.ai/ OpenRouter] (open and closed models, no Enterprise tier)
* [https://www.litellm.ai/ LiteLLM] (closed models, Enterprise tier)
* [https://centml.ai/ CentML] (open models, Enterprise tier)
* [https://fireworks.ai/ Fireworks AI] (open models, Enterprise tier)
* [https://abacus.ai/ Abacus AI] (open and closed models, Enterprise tier)
* [https://portkey.ai/ Portkey] (open? and closed models, Enterprise tier)
* [https://www.together.ai/ Together AI] (open models, Enterprise tier)
* [https://hyperbolic.xyz/ Hyperbolic AI] (open models, Enterprise tier)
* Hugging Face [https://huggingface.co/blog/inference-providers Inference Providers Hub]

==Multi-model with Model Selection==
* [https://www.notdiamond.ai/ Not Diamond ¬⋄]
* [https://withmartian.com/ Martian]

==Multi-model Web Chat Interfaces==
* [https://simtheory.ai/ SimTheory]
* [https://abacus.ai/ Abacus AI] [https://chatllm.abacus.ai/ ChatLLM]
* [https://poe.com/about Poe]
* [https://gab.ai/ Gab AI]
* [https://www.vectal.ai/login Vectal] ?
* [https://www.blackbox.ai/ BlackboxAI]

==Multi-model Web Playground Interfaces==
* [https://www.together.ai/ Together AI]
* [https://hyperbolic.xyz/ Hyperbolic AI]

=Local Router=
* [https://ollama.com/ Ollama]
* [https://github.com/mudler/LocalAI LocalAI]
* [https://github.com/AK391/ai-gradio ai-gradio]: unified model interface (based on [https://www.gradio.app/ gradio])

=Acceleration Hardware=
* [https://www.nvidia.com/ Nvidia] GPUs
* Google [https://en.wikipedia.org/wiki/Tensor_Processing_Unit TPU]
* [https://www.etched.com/ Etched]: Transformer ASICs
* [https://cerebras.ai/ Cerebras]
* [https://www.untether.ai/ Untether AI]
* [https://www.graphcore.ai/ Graphcore]
* [https://sambanova.ai/ SambaNova Systems]
* [https://groq.com/ Groq]
* Tesla [https://en.wikipedia.org/wiki/Tesla_Dojo Dojo]
* [https://deepsilicon.com/ Deep Silicon]: Combined hardware/software solution for accelerated AI ([https://x.com/sdianahu/status/1833186687369023550 e.g.] ternary math)

=Energy Use=
* 2021-04: [https://arxiv.org/abs/2104.10350 Carbon Emissions and Large Neural Network Training]
* 2023-10: [https://arxiv.org/abs/2310.03003 From Words to Watts: Benchmarking the Energy Costs of Large Language Model Inference]
* 2024-01: [https://iea.blob.core.windows.net/assets/6b2fd954-2017-408e-bf08-952fdd62118a/Electricity2024-Analysisandforecastto2026.pdf Electricity 2024: Analysis and forecast to 2026]
* 2024-02: [https://www.nature.com/articles/s41598-024-54271-x The carbon emissions of writing and illustrating are lower for AI than for humans]
* 2025-04: [https://andymasley.substack.com/p/a-cheat-sheet-for-conversations-about Why using ChatGPT is not bad for the environment - a cheat sheet]
** A single LLM response uses only ~3 Wh ≈ 11 kJ (~10 Google searches; [https://docs.google.com/document/d/1pDdpPq3MyPdEAoTkho9YABZ0NBEhBH2v4EA98fm3pXQ/edit?usp=sharing examples of 3 Wh energy usage])
** Reading an LLM-generated response (computer running for a few minutes) typically uses more energy than the LLM generation of the text.
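
The arithmetic behind the two energy notes above can be checked with a short sketch. The ~3 Wh per response and the ~10 Google searches ratio come from the cheat sheet cited above; the 50 W device power and the 5 minute reading time are illustrative assumptions, not measured values, and the conclusion shifts if they change.

```python
# Rough energy arithmetic for a single LLM response (a sketch, not a measurement).

JOULES_PER_WH = 3600.0  # 1 Wh = 3600 J by definition


def wh_to_kj(energy_wh: float) -> float:
    """Convert watt-hours to kilojoules."""
    return energy_wh * JOULES_PER_WH / 1000.0


def device_energy_wh(power_w: float, minutes: float) -> float:
    """Energy used by a device drawing power_w watts for the given minutes."""
    return power_w * minutes / 60.0


def break_even_minutes(energy_wh: float, power_w: float) -> float:
    """Minutes a device at power_w must run to use energy_wh."""
    return energy_wh / power_w * 60.0


# Figures taken from the cited cheat sheet.
LLM_RESPONSE_WH = 3.0
SEARCHES_PER_RESPONSE = 10

# Assumptions for illustration only (hypothetical laptop and reading time).
ASSUMED_DEVICE_POWER_W = 50.0
ASSUMED_READING_MINUTES = 5.0

response_kj = wh_to_kj(LLM_RESPONSE_WH)
per_search_wh = LLM_RESPONSE_WH / SEARCHES_PER_RESPONSE
reading_wh = device_energy_wh(ASSUMED_DEVICE_POWER_W, ASSUMED_READING_MINUTES)
crossover_min = break_even_minutes(LLM_RESPONSE_WH, ASSUMED_DEVICE_POWER_W)

print(f"One response: {LLM_RESPONSE_WH:.1f} Wh = {response_kj:.1f} kJ")
print(f"Implied energy per search: {per_search_wh:.1f} Wh")
print(f"Reading for {ASSUMED_READING_MINUTES:.0f} min at "
      f"{ASSUMED_DEVICE_POWER_W:.0f} W: {reading_wh:.1f} Wh")
print(f"Reading matches generation after {crossover_min:.1f} min")
```

Under these assumed figures, reading costs more than generation once the device has been on for a few minutes; a lower-power device or a shorter reading time would reverse the comparison.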
Latest revision as of 14:35, 20 June 2025