Toot

Written by BB84@mander.xyz on 2025-01-07 at 06:51

Can someone explain why I am being downvoted and attacked in this thread? I swear I am not sealioning. Genuinely confused.

@sc_griffith@awful.systems asked how request frequency might impact cost per request. Batch inference is one mechanism: higher request frequency lets you fill larger batches, which lowers the cost per request (ask anyone in the self-hosted LLM community). I noted that this effect only matters at very small scale, probably much smaller than the scale OpenAI is operating at.
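
To spell out what I mean by "only matters at very small scale", here is a toy back-of-the-envelope sketch in Python. All the numbers (batch cost, optimal batch size, batching window) are made up for illustration; they are not OpenAI's figures.

```python
# Toy model (illustration only, hypothetical numbers): cost per request
# under batched inference. A batched forward pass costs roughly the same
# up to some optimal batch size, so per-request cost falls as the batch
# fills. Once traffic fills the batch within the window every time,
# additional request volume no longer lowers cost per request.

BATCH_COST = 1.0        # hypothetical cost of one batched forward pass
OPTIMAL_BATCH = 64      # hypothetical optimal batch size for the hardware
WINDOW_SECONDS = 2.0    # hypothetical batching window ("any few-seconds window")

def cost_per_request(requests_per_second: float) -> float:
    """Cost per request when a batch is formed every WINDOW_SECONDS."""
    arrivals = requests_per_second * WINDOW_SECONDS
    batch_size = min(max(arrivals, 1.0), OPTIMAL_BATCH)
    return BATCH_COST / batch_size

for rps in (0.1, 1, 10, 100, 1000):
    print(f"{rps:>7} req/s -> cost/request ~ {cost_per_request(rps):.4f}")

# With these made-up numbers, below ~32 req/s batches are underfilled and
# cost per request drops steeply as traffic grows; above that, batches are
# always full and the curve is flat.
```

With numbers anywhere in this ballpark, any provider handling more than a few dozen requests per second is already on the flat part of the curve, which is why I would be very, very surprised if OpenAI couldn't fill the optimal batch size in any few-seconds window.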

@dgerard@awful.systems why did you say I am demanding someone disprove the assertion? Are you misunderstanding “I would be very very surprised if they couldn’t fill [the optimal batch size] for any few-seconds window” to mean “I would be very very surprised if they are not profitable”?

The tweet I linked shows that LLM inference can be done much more cheaply and efficiently. I am saying that OpenAI is very inefficient and thus economically "cooked", as the post title puts it. How does this make me FYGM? @froztbyte@awful.systems
