@nicd There is no direct answer to this. For a GPT-3-style (dense) model, the relationship between parameter count and energy consumption is close to a square root. GPT-4, however, uses a different architecture, a "mixture of experts": effectively several GPT-3-style models combined, where not all parameters contribute to any given computation (a sparse model). Such a model has a higher overhead, so for the same number of effective parameters, its energy consumption is higher.
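As a toy sketch of the comparison in the toot (all constants and the scaling exponent are taken from the toot's claim or invented for illustration, not measured values):

```python
import math

def dense_energy(n_params: float, k: float = 1.0) -> float:
    """Claimed near-square-root scaling for a dense GPT-3-style model (per the toot)."""
    return k * math.sqrt(n_params)

def moe_energy(n_params: float, active_fraction: float,
               overhead: float = 1.5, k: float = 1.0) -> float:
    """Sparse mixture-of-experts: scale with the *active* parameters only,
    multiplied by a hypothetical routing/communication overhead factor."""
    return overhead * dense_energy(active_fraction * n_params, k)

# For the same number of effective (active) parameters, the MoE model
# comes out more expensive because of the overhead factor:
n_active = 175e9                               # effective parameters, hypothetical
print(dense_energy(n_active))                  # dense model of that size
print(moe_energy(n_active / 0.125, 0.125))     # MoE with 1/8 of its parameters active
```

The exponent, the 1/8 active fraction, and the 1.5x overhead are placeholders; the only point is the structural one the toot makes, that sparsity reduces the active parameter count while the overhead pushes the cost back up.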
=> More information about this toot | View the thread | More toots from wim_v12e@scholar.social