1 Comment
JP

Coming to this late, but it reads well with hindsight. The landscape has shifted quite a bit since you wrote this. Cerebras, which you mentioned briefly as a specialised accelerator, just landed a $10 billion inference deal with OpenAI and shipped the first production model on non-NVIDIA silicon, serving 1,000+ tokens per second for code generation. The memory-bandwidth angle turned out to be the unlock. I covered the Cerebras architecture and why it matters for developer workflows here: https://reading.sh/chatgpt-and-codex-are-about-to-get-helluva-lot-faster-51ad25a7eed0