The thing I often need the most but don't have is a private test machine with as many CPUs as possible so I can do meaningful performance testing. For example, right now I want to test some refcount improvements, but I lack a machine with enough CPUs to do that, which is really annoying.
=> More information about this toot | More toots from brauner@mastodon.social
@brauner How many CPUs is that?
=> More information about this toot | More toots from vegard@mastodon.social
@vegard north of 64
=> More information about this toot | More toots from brauner@mastodon.social
@brauner it looks like we are both looking into scalability of reference counters in the Linux kernel. In my case it's in the scheduler+mm subsystems: https://lore.kernel.org/lkml/20241002010205.1341915-1-mathieu.desnoyers@efficios.com/
Are there specific reference counters which you suspect to be bottlenecks?
=> More information about this toot | More toots from DesnoyersMa@discuss.systems
@DesnoyersMa I'm confused, is this a separate hazard pointer implementation from Boqun's?
=> More information about this toot | More toots from brauner@mastodon.social
@brauner Yes, this is a separate implementation. I've done a prototype implementation in userspace based on per-CPU HP slots, and then created a minimalistic port of that implementation to kernel space.
=> More information about this toot | More toots from DesnoyersMa@discuss.systems
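Since the thread never spells out the mechanism, here is a minimal userspace-flavoured sketch of the per-CPU hazard pointer slot idea described above, with one slot per CPU to match the "minimalistic" description. All names, the single-slot layout, and the busy-wait reclaim are assumptions for illustration, not Mathieu's actual implementation:

```c
/* Hypothetical sketch of per-CPU hazard pointer (HP) slots; one slot
 * per CPU, C11 atomics. Illustrative only, not the posted patch. */
#include <stdatomic.h>
#include <stddef.h>

#define NR_CPUS 512

struct hp_slot {
	_Atomic(void *) addr;		/* object this CPU is protecting */
} __attribute__((aligned(64)));		/* avoid false sharing */

static struct hp_slot hp_slots[NR_CPUS];

/* Reader: publish the pointer, then re-check that it is still the
 * current one; retry if the source changed underneath us. */
static void *hp_protect(int cpu, _Atomic(void *) *src)
{
	void *p;

	for (;;) {
		p = atomic_load_explicit(src, memory_order_relaxed);
		atomic_store_explicit(&hp_slots[cpu].addr, p,
				      memory_order_seq_cst);
		if (atomic_load_explicit(src, memory_order_seq_cst) == p)
			return p;	/* p cannot be freed while published */
	}
}

static void hp_unprotect(int cpu)
{
	atomic_store_explicit(&hp_slots[cpu].addr, NULL,
			      memory_order_release);
}

/* Updater: after unpublishing the object, scan every per-CPU slot and
 * wait until no CPU still holds a hazard pointer to it, then free. */
static void hp_wait_for_readers(void *p)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		while (atomic_load_explicit(&hp_slots[cpu].addr,
					    memory_order_seq_cst) == p)
			;	/* a real port would cpu_relax() or block */
}
```

The appeal over a plain refcount is that readers only write to their own cache line instead of contending on one shared counter; the cost moves to the updater, which must scan all per-CPU slots at reclaim time.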
@DesnoyersMa I'll try that on the big box I have, curious! Not about the mm side specifically, just the HP case in general for other uses.
=> More information about this toot | More toots from axboe@fosstodon.org
@axboe Let me know how it goes. Note that if you run into limitations with my minimalistic implementation, there are various ways it can be improved to cover more use cases (e.g. more hazard pointer slots per CPU, dynamically adjusting the per-CPU scan depth, scanning for HP ranges, ...). My approach is to enhance it only when use cases require it.
=> More information about this toot | More toots from DesnoyersMa@discuss.systems
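One possible shape for the "more hazard pointer slots per CPU" extension mentioned above, reusing NR_CPUS and the headers from the earlier sketch; again, the names and the fallback strategy are assumptions, not anything from the posted series:

```c
/* Hypothetical multi-slot variant: several HP slots per CPU, so one
 * CPU can protect several objects at once. Scanners walk
 * slot[0..scan_depth) on each CPU, and scan_depth could be adjusted
 * dynamically as suggested above. */
#define HP_SLOTS_PER_CPU 8

struct hp_cpu {
	_Atomic(void *) slot[HP_SLOTS_PER_CPU];
} __attribute__((aligned(64)));

static struct hp_cpu hp_cpus[NR_CPUS];

/* Claim the first free slot on this CPU; returns the slot index, or
 * -1 when all slots are busy (the caller would then fall back to,
 * e.g., a plain refcount). */
static int hp_acquire_slot(int cpu, void *p)
{
	for (int i = 0; i < HP_SLOTS_PER_CPU; i++) {
		void *expected = NULL;

		if (atomic_compare_exchange_strong(&hp_cpus[cpu].slot[i],
						   &expected, p))
			return i;
	}
	return -1;
}
```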
@DesnoyersMa Sure will do. It's a 512-thread box, I'll run 24/48/96/192/256/512/1024 threads and dump the numbers here for -git and -git + patched.
=> More information about this toot | More toots from axboe@fosstodon.org
@DesnoyersMa Here's the quick run, 48..2048 threads. System is a 2x 9754. Not sure this is what you expected, but it's 100% reproducible. Ran the tests twice on both, separate boots, and it's consistent. Test is context_switch1_threads -t.
=> More information about this toot | More toots from axboe@fosstodon.org
@axboe That's unexpected. I tested on an AMD EPYC 9654 96-Core Processor (2 sockets, 384 HW threads total) and got very different results. Perhaps we should share our kernel configs by email.
=> More information about this toot | More toots from DesnoyersMa@discuss.systems
@axboe scratch my previous comment. That's a 4.9x speedup (490%) for 192 threads??
=> More information about this toot | More toots from DesnoyersMa@discuss.systems
@DesnoyersMa Right, the Diff results show how much faster the patched kernel is compared to the stock one. So at 192 threads, that's a +390% speedup, or 4.9x as fast.
=> More information about this toot | More toots from axboe@fosstodon.org
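For anyone skimming the thread, the two figures quoted above relate by plain arithmetic; nothing new is measured here:

```
speedup     = patched_result / stock_result = 4.9x
improvement = (speedup - 1) * 100%          = +390%
```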
@brauner I've got a few bigger boxes that you're welcome to get an account on for testing. Maybe that'll help until you get one sourced?
=> More information about this toot | More toots from axboe@fosstodon.org