If you were wondering whether assembly language optimization is still relevant, it was partially responsible for NVDA's stock price dropping by 17% today:
"DeepSeek actually programmed 20 of the 132 processing units on each H800 specifically to manage cross-chip communications. This is actually impossible to do in CUDA. DeepSeek engineers had to drop down to PTX, a low-level instruction set for Nvidia GPUs that is basically like assembly language."
Posted by sehugg@infosec.exchange