Ancestors

Written by AndresFreundTec on 2025-01-12 at 17:51

Sometimes I hate performance stuff.

How to quickly zero large amounts of memory differs rather vastly between CPU generations.

On both systems, the core and memory are bound to node 0, and the same mmap flags are used (MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB | MAP_POPULATE).
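
For readers who want to reproduce the setup, here is a minimal sketch of mapping a buffer with the flags listed above. The function name map_benchmark_buffer and the fallback without MAP_HUGETLB are assumptions made for illustration, not part of the original benchmark. Binding to node 0 is not shown and is assumed to happen outside the program.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes MAP_HUGETLB and MAP_POPULATE on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map `len` bytes with the flags named in the post.
 * Returns NULL on failure. `len` should be a multiple of the huge page size. */
static void *map_benchmark_buffer(size_t len)
{
    assert(len > 0);

    const int prot = PROT_READ | PROT_WRITE;
    const int base = MAP_SHARED | MAP_ANONYMOUS | MAP_POPULATE;

    /* Flags as listed in the post. This fails if no huge pages are reserved. */
    void *p = mmap(NULL, len, prot, base | MAP_HUGETLB, -1, 0);

    /* Assumption, not from the post: retry with normal pages so the sketch
     * still runs on machines without a huge page pool. */
    if (p == MAP_FAILED)
        p = mmap(NULL, len, prot, base, -1, 0);

    return p == MAP_FAILED ? NULL : p;
}
```

Note that the fallback changes what is being measured, so a real benchmark would probably rather fail than silently use 4 KiB pages.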

=> More information about this toot | More toots from AndresFreundTec@mastodon.social

Written by AndresFreundTec on 2025-01-12 at 17:53

2x Xeon Gold 5215:

..

memzero_rep_movsb(): 6.171 GB/s

1x_mm256_store_si256(): 7.100 GB/s

2x_mm256_store_si256(): 10.685 GB/s

1x_mm256_stream_si256(): 6.988 GB/s

2x_mm256_stream_si256(): 6.994 GB/s

...

2x Xeon Gold 6442Y:

memzero_rep_movsb(): 11.155 GB/s

1x_mm256_store_si256(): 28.981 GB/s

2x_mm256_store_si256(): 10.034 GB/s

1x_mm256_stream_si256(): 29.154 GB/s

2x_mm256_stream_si256(): 29.155 GB/s


Written by AndresFreundTec on 2025-01-12 at 17:53

I.e., on Cascade Lake non-temporal stores suck and it's important to use more than one store per loop iteration. On Sapphire Rapids it's the other way round.

On both, ERMSB is considerably worse than the alternatives.
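
The benchmark code itself is not in the thread, so the following is only a sketch of what the variants being compared might look like: one or two regular 32-byte stores per loop iteration, and a non-temporal (streaming) store. The function names are hypothetical, and the code assumes x86-64 with AVX, GCC or Clang, and a 32-byte aligned buffer.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* One regular (cached) 32-byte store per loop iteration. */
__attribute__((target("avx")))
static void zero_store_1x(void *dst, size_t n)
{
    assert((uintptr_t)dst % 32 == 0 && n % 32 == 0);
    const __m256i zero = _mm256_setzero_si256();
    char *p = dst;

    for (size_t i = 0; i < n; i += 32)
        _mm256_store_si256((__m256i *)(p + i), zero);
}

/* Two regular 32-byte stores per loop iteration. */
__attribute__((target("avx")))
static void zero_store_2x(void *dst, size_t n)
{
    assert((uintptr_t)dst % 32 == 0 && n % 64 == 0);
    const __m256i zero = _mm256_setzero_si256();
    char *p = dst;

    for (size_t i = 0; i < n; i += 64) {
        _mm256_store_si256((__m256i *)(p + i), zero);
        _mm256_store_si256((__m256i *)(p + i + 32), zero);
    }
}

/* One non-temporal 32-byte store per loop iteration. */
__attribute__((target("avx")))
static void zero_stream_1x(void *dst, size_t n)
{
    assert((uintptr_t)dst % 32 == 0 && n % 32 == 0);
    const __m256i zero = _mm256_setzero_si256();
    char *p = dst;

    for (size_t i = 0; i < n; i += 32)
        _mm256_stream_si256((__m256i *)(p + i), zero);

    /* Order the streaming stores before any later stores. */
    _mm_sfence();
}
```

An optimizing compiler may unroll these loops or replace them with a memset call, so anyone measuring this should probably inspect the generated assembly.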


Written by Paul Khuong on 2025-01-12 at 19:12

@AndresFreundTec did you mean rep movsb or rep stosb?

=> More information about this toot | More toots from pkhuong@discuss.systems

Written by AndresFreundTec on 2025-01-12 at 20:10

@pkhuong I tried both, same performance. Probably should have included stosb, not movsb, in the post.


Written by Paul Khuong on 2025-01-12 at 20:15

@AndresFreundTec they do very different things


Toot

Written by AndresFreundTec on 2025-01-13 at 15:16

@pkhuong But they can do very similar things: rep mov* copying from a zero buffer, rep stos* setting the memory to zero directly.
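
To make the comparison concrete, here is a rough sketch of both approaches using GCC/Clang inline assembly on x86-64. The function names and the caller-supplied zero source buffer are assumptions for illustration; the thread does not show the actual code.

```c
#include <assert.h>
#include <stddef.h>

/* rep stosb: store AL (zero here) to [RDI], RCX times. */
static void memzero_rep_stosb(void *dst, size_t n)
{
    assert(dst != NULL);
    __asm__ volatile("rep stosb"
                     : "+D"(dst), "+c"(n)
                     : "a"(0)
                     : "memory");
}

/* rep movsb: copy RCX bytes from [RSI] to [RDI]. This zeroes dst only
 * because the caller passes a source buffer of at least n zero bytes. */
static void memzero_rep_movsb(void *dst, const void *zero_src, size_t n)
{
    assert(dst != NULL && zero_src != NULL);
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(zero_src), "+c"(n)
                     :
                     : "memory");
}
```

The movsb variant also has to read the source buffer, while stosb only writes. Both rely on the direction flag being clear, which the x86-64 System V ABI guarantees at function entry.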


Descendants
