Toot

Written by BB84@mander.xyz on 2025-01-06 at 20:16

LLM inference requests can be batched, reducing the cost per request. If you have too few concurrent customers, you can’t fill the optimal batch size.

That said, the optimal batch size on today’s hardware is not big (<20). I would be very very surprised if they couldn’t fill it.
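A minimal sketch (not part of the toot), in Python with an assumed toy cost model, of why batching lowers the cost per request: one forward pass has a largely fixed cost, so each extra request in the batch adds only a small increment. The constants and the cost_per_request helper below are illustrative assumptions, not measurements of real hardware.

```python
# Toy cost model (assumed numbers, for illustration only): one batched
# forward pass costs a fixed amount plus a small per-request increment.
from dataclasses import dataclass

@dataclass
class BatchEstimate:
    batch_size: int
    cost_per_request: float

FIXED_PASS_COST = 1.0    # cost of running one batched forward pass
PER_REQUEST_COST = 0.05  # marginal cost of one more request in the batch

def cost_per_request(batch_size: int) -> BatchEstimate:
    total = FIXED_PASS_COST + PER_REQUEST_COST * batch_size
    return BatchEstimate(batch_size, total / batch_size)

if __name__ == "__main__":
    # With too few concurrent customers the batch stays small and each
    # request bears most of the fixed cost; near a modest batch size
    # (e.g. <20, per the toot) that cost is already spread thin.
    for size in (1, 2, 4, 8, 16):
        est = cost_per_request(size)
        print(f"batch={est.batch_size:2d}  cost/request={est.cost_per_request:.3f}")
```

Under this toy model a lone request pays roughly the full pass cost, while a batch of 16 pays only a small fraction of it per request, which is the effect the toot describes.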

=> More information about this toot | View the thread | More toots from BB84@mander.xyz

Mentions

=> View sc_griffith@awful.systems profile

Proxy Information
Original URL
gemini://mastogem.picasoft.net/toot/113783248627903761