The older I get, the more it seems that the rate at which we mint new programmers has sheered away from the rate at which we inspire people to want to learn about the computers they use.
=> More informations about this toot | More toots from slightlyoff@toot.cafe
Like, reading the deepseek paper, my overwhelming thought is "so wait, the secret sauce was...profiling the workload? Then deciding to program to the available hardware?"
I stare at the results of people carelessly composing UI systems without the faintest concern for how they will work in practice, but somehow imagined that wasn't how it was going in the rest of the industry. Woof.
=> More informations about this toot | More toots from slightlyoff@toot.cafe
Reading up on some interesting memory allocator work this AM, I was struck by how there only seem to be like a couple of dozen[1] people in industry that really understand this stuff. And you see the same thing in UI; only a small cohort actually at the top of the game in really making the system sing. And these people are always undervalued.
[1]: this is surely a gross underestimate, but it troubles me that I can't tell by how much
=> More informations about this toot | More toots from slightlyoff@toot.cafe
(and no, none of those people spend their time in React if they have any say in the matter. The React ecosystem is a high-performer dead zone)
=> More informations about this toot | More toots from slightlyoff@toot.cafe
For anyone following, this is the section that really stood out:
https://arxiv.org/html/2412.19437v1#:~:text=3.2.2,to%2DAll%20Communication
...which builds on some really cool profiling and scheduling work:
https://arxiv.org/abs/2401.10241
=> More informations about this toot | More toots from slightlyoff@toot.cafe
@slightlyoff This is a super interesting insight, I had no idea. I only skimmed the papers but it does seem like there is some careful low-level study and sensible algorithms that had been neglected in the previous literature, which I had no idea about. Thanks for putting that on my radar!
=> More informations about this toot | More toots from hlabrande@digipres.club
@slightlyoff Ben Thompson over at Stratechery pointed out, "DeepSeek actually programmed 20 of the 132 processing units on each H800 specifically to manage cross-chip communications. This is actually impossible to do in CUDA. DeepSeek engineers had to drop down to PTX, a low-level instruction set for Nvidia GPUs that is basically like assembly language. This is an insane level of optimization that only makes sense if you are using H800s." https://stratechery.com/2025/deepseek-faq/
=> More informations about this toot | More toots from cratermoon@zirk.us
@cratermoon It's not "insane"; what do you think JS engines do all day?
This is the job when you have a stable workload with understandable properties and there's money riding on efficiency. PTX isn't voodoo; there's an LLVM backend! This stuff only reads as exotic when your expectations are set at "failure is fine".
=> More informations about this toot | More toots from slightlyoff@toot.cafe
@cratermoon Read the rest of the paper on the tuning. It's impressive, in the sense of "these people grok their hardware" (see also, the wavefront tuning & message dispatch topology), but we engineers get to that point iteratively, working backwards from fix-points in workloads and hardware. They went deep on an architecture they spent hundreds of millions on...so what's everyone else doing?
Thompson being shocked at this is like these mugs who think nobody can make a web page without React.
=> More informations about this toot | More toots from slightlyoff@toot.cafe
@slightlyoff I do not think of it as "insane", that's Stratechery editorializing. What I do think is that OpenAI doesn't have very good programmers. I can speculate on why their code might not be especially good, but it doesn't really matter. The evidence suggests that the work OpenAI has done to date falls in line with the typical mediocrity of the industry as a whole.
=> More informations about this toot | More toots from cratermoon@zirk.us