1 Comment
JP

Coming to this late, but it reads well with hindsight. The landscape has shifted quite a bit since you wrote this. Cerebras, which you mentioned briefly as a specialised accelerator, just landed a $10 billion inference deal with OpenAI and shipped the first production model on non-NVIDIA silicon, serving 1,000+ tokens per second for code generation. The memory-bandwidth angle turned out to be the unlock. I covered the Cerebras architecture and why it matters for developer workflows here: https://reading.sh/chatgpt-and-codex-are-about-to-get-helluva-lot-faster-51ad25a7eed0