LLM inference can be batched, reducing the cost per request. If you have too few customers, you can’t fill the optimal batch size.
That said, the optimal batch size on today’s hardware is not big (<20). I would be very, very surprised if they couldn’t fill it.
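To make the per-request economics concrete, here is a toy Python sketch (all numbers are hypothetical, not measurements). It assumes a roughly fixed cost per forward pass, dominated by loading the model weights, that gets amortized across however many requests are in the batch, which is the basic reason batching lowers the cost per request.

```python
# Toy model of batched LLM inference cost (hypothetical numbers).
# A forward pass is dominated by loading the model weights from memory,
# which costs about the same whether the batch holds 1 or 16 requests,
# so the average cost per request falls as the batch fills up.

FIXED_COST_PER_PASS = 1.00        # hypothetical: weight loading / overhead per pass
MARGINAL_COST_PER_REQUEST = 0.05  # hypothetical: extra compute per request in the batch

def cost_per_request(batch_size: int) -> float:
    """Average cost of serving one request at a given batch size."""
    total = FIXED_COST_PER_PASS + MARGINAL_COST_PER_REQUEST * batch_size
    return total / batch_size

for b in (1, 2, 4, 8, 16):
    print(f"batch={b:2d}  cost/request={cost_per_request(b):.3f}")
```

With these made-up numbers the per-request cost drops steeply and then flattens well before the batch gets large, which lines up with the point above: the batch doesn’t need to be huge, so even a modest stream of concurrent requests should be enough to fill it.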