Can someone explain why I am being downvoted and attacked in this thread? I swear I am not sealioning. Genuinely confused.
@sc_griffith@awful.systems asked how request frequency might affect cost per request. Batch inference is one way it does (ask anyone in the self-hosted LLM community): cost per request falls as batches fill up. I noted that this effect only matters at very small scale, because once traffic is high enough to keep batches full, extra requests stop lowering the per-request cost. That threshold is probably far below the scale OpenAI operates at.
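To make the point concrete, here is a toy back-of-the-envelope model. Every number in it is invented for illustration (hypothetical GPU price, batch window, and batch size, nothing from OpenAI or the linked tweet); the shape of the curve is what matters, not the values.

```python
# Toy model of batched LLM inference cost.
# All numbers are made up; only the qualitative behavior matters.

GPU_COST_PER_HOUR = 2.00   # hypothetical hourly GPU price, in dollars
BATCH_WINDOW_S = 2.0       # hypothetical window over which requests are batched
OPTIMAL_BATCH_SIZE = 64    # hypothetical batch size that saturates the GPU

def cost_per_request(requests_per_second: float) -> float:
    """Per-request cost when the GPU spends one window per batch,
    whether or not the batch is full."""
    arrivals = requests_per_second * BATCH_WINDOW_S
    batch = max(1.0, min(arrivals, OPTIMAL_BATCH_SIZE))  # can't exceed arrivals or the batch cap
    window_cost = GPU_COST_PER_HOUR / 3600 * BATCH_WINDOW_S
    return window_cost / batch

for rps in (0.5, 2, 8, 32, 128, 512):
    print(f"{rps:6.1f} req/s -> ${cost_per_request(rps):.6f} per request")
```

Below roughly 32 req/s (64 requests per 2-second window, in this made-up setup) the cost per request keeps dropping as traffic grows; above that, batches are always full and the cost per request is flat, so request frequency no longer matters.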
@dgerard@awful.systems why did you say I am demanding someone disprove the assertion? Are you misunderstanding “I would be very very surprised if they couldn’t fill [the optimal batch size] for any few-seconds window” to mean “I would be very very surprised if they are not profitable”?
The tweet I linked shows that LLM inference can be done much more cheaply and efficiently. My point is that OpenAI is very inefficient and therefore economically “cooked”, as the post title has it. How does that make me FYGM? @froztbyte@awful.systems