LLM inference can be batched, reducing the cost per request. If you have too few customers, you can’t fill the optimal batch size.
That said, the optimal batch size on today’s hardware is not big (<20). I would be very, very surprised if they couldn’t fill it.
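To make the per-request economics concrete, here is a toy Python sketch (all numbers are hypothetical, not measurements). It assumes a roughly fixed cost per forward pass, dominated by loading the model weights, that gets amortized across however many requests are in the batch, which is the basic reason batching lowers the cost per request.

```python
# Toy model of batched LLM inference cost (hypothetical numbers).
# A forward pass is dominated by loading the model weights from memory,
# which costs about the same whether the batch holds 1 or 16 requests,
# so the average cost per request falls as the batch fills up.

FIXED_COST_PER_PASS = 1.00        # hypothetical: weight loading / overhead per pass
MARGINAL_COST_PER_REQUEST = 0.05  # hypothetical: extra compute per request in the batch

def cost_per_request(batch_size: int) -> float:
    """Average cost of serving one request at a given batch size."""
    total = FIXED_COST_PER_PASS + MARGINAL_COST_PER_REQUEST * batch_size
    return total / batch_size

for b in (1, 2, 4, 8, 16):
    print(f"batch={b:2d}  cost/request={cost_per_request(b):.3f}")
```

With these made-up numbers the per-request cost drops steeply and then flattens well before the batch gets large, which lines up with the point above: the batch doesn’t need to be huge, so even a modest stream of concurrent requests should be enough to fill it.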