@nicd There is no direct answer to this. For a GPT-3-style (dense) model, the relationship between parameter count and energy consumption is close to a square root. GPT-4, however, uses a different architecture, a "mixture of experts": effectively several GPT-3-style models combined, where not all parameters contribute to any given computation (a sparse model). Such a model has a higher overhead, so for the same number of effective parameters, its energy consumption is higher.
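As a toy sketch of the comparison in the toot (all constants and the scaling exponent are taken from the toot's claim or invented for illustration, not measured values):

```python
import math

def dense_energy(n_params: float, k: float = 1.0) -> float:
    """Claimed near-square-root scaling for a dense GPT-3-style model (per the toot)."""
    return k * math.sqrt(n_params)

def moe_energy(n_params: float, active_fraction: float,
               overhead: float = 1.5, k: float = 1.0) -> float:
    """Sparse mixture-of-experts: scale with the *active* parameters only,
    multiplied by a hypothetical routing/communication overhead factor."""
    return overhead * dense_energy(active_fraction * n_params, k)

# For the same number of effective (active) parameters, the MoE model
# comes out more expensive because of the overhead factor:
n_active = 175e9                               # effective parameters, hypothetical
print(dense_energy(n_active))                  # dense model of that size
print(moe_energy(n_active / 0.125, 0.125))     # MoE with 1/8 of its parameters active
```

The exponent, the 1/8 active fraction, and the 1.5x overhead are placeholders; the only point is the structural one the toot makes, that sparsity reduces the active parameter count while the overhead pushes the cost back up.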
=> More information about this toot | View the thread | More toots from wim_v12e@scholar.social