Questions about HW for local LLM.
https://lemmy.zip/post/27939366
=> More informations about this toot | More toots from sith@lemmy.zip
I think this warrants a separate post. The beginners' thread is a year old, and I guess not many people watch the comments there.
I use KoboldCpp and like to recommend it to people who are new to the hobby or don't own a proper gaming rig. It's relatively easy to install, and you can try it right now, without any GPU, and see if you like it. I'd say it's usable on CPU up to about 13B (with quantized models). Of course it'll be orders of magnitude slower than on a GPU.
I'd say every bit of VRAM counts, so you might as well buy as much as you can afford; more VRAM lets you run larger, more capable models. Use one of the VRAM calculators to see what fits in 16GB or 24GB, and whether you actually need that much.
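For a rough idea of what those calculators do, here is a minimal Python sketch of the arithmetic: weight memory plus KV cache. The model shape below (8B parameters, 32 layers, 8 KV heads, 8K context) is only an illustrative placeholder, roughly the size of a Llama-3-8B-class model, not a figure from this thread.

```
# Rough VRAM estimate for a quantized model plus its KV cache.
# All numbers in the example call are illustrative placeholders.

def model_vram_gb(params_billion: float, bits_per_weight: float) -> float:
    """Weight memory: parameter count times bits per weight."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

def kv_cache_gb(n_layers: int, ctx_len: int, n_kv_heads: int,
                head_dim: int, bytes_per_elem: int = 2) -> float:
    """KV cache: 2 (K and V) * layers * context * kv_heads * head_dim."""
    return 2 * n_layers * ctx_len * n_kv_heads * head_dim * bytes_per_elem / 1e9

if __name__ == "__main__":
    weights = model_vram_gb(params_billion=8, bits_per_weight=4.5)  # ~Q4 quant
    cache = kv_cache_gb(n_layers=32, ctx_len=8192, n_kv_heads=8, head_dim=128)
    print(f"weights ~{weights:.1f} GB, KV cache ~{cache:.1f} GB, "
          f"total ~{weights + cache + 1:.1f} GB (+~1 GB overhead)")
```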
=> More informations about this toot | More toots from hendrik@palaver.p3x.de
24GB VRAM will easily let you run medium-sized models with good context length, and if you’re a gamer the XTX is a beast for raster performance and has good price/performance.
If you want to get serious about LLMs, also keep in mind that most models and tools scale well across multiple GPUs, so you might buy one today (even a lesser one with “only” 16 or 12GB) and add another later. Just make sure your motherboard can fit two, and that your CPU, RAM and power supply can handle it.
Here’s a good example from a guy who glued two much more modest cards together with decent results: adamniederer.com/blog/rocm-cross-arch.html
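If you do end up with two cards, here is a minimal sketch of splitting one model across them, assuming llama-cpp-python; the model path and split ratio are placeholders, and the parameter names should be checked against the version you have installed.

```
# Sketch: splitting one GGUF model across two GPUs with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="models/some-model-Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=-1,          # offload every layer to the GPUs
    tensor_split=[0.6, 0.4],  # e.g. a 16GB card plus a 12GB card
    n_ctx=8192,
)
print(llm("Hello, world", max_tokens=32)["choices"][0]["text"])
```

Most llama.cpp-based frontends expose a similar tensor-split setting, so the same idea carries over if you use KoboldCpp instead of the Python bindings.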
=> More informations about this toot | More toots from will_a113@lemmy.ml
i have an XTX. it has a TDP of 400 watts. if you install two of them you’ve basically built a medium-power space heater. you’ll need shitloads of cooling and a pretty beefy power supply.
performance-wise it’s pretty good. over 100 tokens a second with llama3 and it runs SDXL-Turbo about as fast as i can type.
word of warning, if you run Linux you need to manually set the fan curves. i had to RMA my first XTX because it didn’t spin the fans up and cooked itself. the VRAM reached 115°C and started failing.
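For reference, manual fan control on amdgpu goes through the hwmon sysfs interface. Below is a crude sketch of a fan-curve loop, assuming the GPU shows up as card0 (paths vary per machine, and dedicated tools like fancontrol, LACT or CoreCtrl are the safer option).

```
# Crude fan-curve loop for an amdgpu card via the hwmon sysfs interface.
# Run as root. Find your hwmon path with: ls /sys/class/drm/card*/device/hwmon/
import glob, time

HWMON = glob.glob("/sys/class/drm/card0/device/hwmon/hwmon*")[0]  # assumes card0 is the GPU

def write(name, value):
    with open(f"{HWMON}/{name}", "w") as f:
        f.write(str(value))

def read(name):
    with open(f"{HWMON}/{name}") as f:
        return int(f.read())

write("pwm1_enable", 1)          # 1 = manual fan control
try:
    while True:
        temp_c = read("temp1_input") / 1000       # edge temperature in degrees C
        # Simple linear curve: 30% fan at 40C, 100% at 90C.
        frac = min(max((temp_c - 40) / 50, 0.3), 1.0)
        write("pwm1", int(frac * 255))            # pwm1 takes 0-255
        time.sleep(2)
finally:
    write("pwm1_enable", 2)      # hand control back to the driver on exit
```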
=> More informations about this toot | More toots from lime@feddit.nu
That’s crazy - does it not have any thermal protection? I’ve had CPUs overheat and they tend to throttle or shut down before anything gets damaged.
=> More informations about this toot | More toots from atzanteol@sh.itjust.works
no, it goes full speed until it dies if the fans don’t work. it’s limited to 300W from the factory, but that’s about it. it pulls insane amounts of power.
from what I understand the 4090 is worse, but that’s also a much larger card, so it probably handles the heat better.
=> More informations about this toot | More toots from lime@feddit.nu
I’m also curious about experimenting with a no-GPU setup.
In my limited experimentation on a 12th Gen Intel® Core™ i7-12700H with 64GB of RAM, this is probably not worth it for anything beyond simple usage. You can do it, but I get something like 4-5 tokens/sec with Llama 3B Instruct, and that's one of the faster models.
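A quick way to measure that kind of number yourself, assuming llama-cpp-python and a local GGUF file (the model filename and thread count below are placeholders):

```
# Quick tokens/sec check for CPU-only inference with llama-cpp-python.
import time
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-3.2-3b-instruct-Q4_K_M.gguf",  # placeholder
    n_gpu_layers=0,   # CPU only
    n_threads=12,     # adjust for your machine
    n_ctx=4096,
)

start = time.time()
out = llm("Explain what VRAM is in one paragraph.", max_tokens=200)
elapsed = time.time() - start
n_tokens = out["usage"]["completion_tokens"]
print(f"{n_tokens} tokens in {elapsed:.1f}s -> {n_tokens / elapsed:.1f} tok/s")
```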
=> More informations about this toot | More toots from atzanteol@sh.itjust.works