Ancestors

Written by AndresFreundTec on 2025-01-12 at 17:51

Sometimes I hate performance stuff.

The fastest way to zero large amounts of memory differs vastly between CPU generations.

On both systems, the core and memory are bound to node 0, and the same mmap flags are used (MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB | MAP_POPULATE).
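
For readers who want to reproduce the setup, here is a minimal sketch of a mapping call using the flags listed above. The helper name `map_test_buffer` and its `use_hugetlb` switch are hypothetical, not taken from the original benchmark, and binding to node 0 is not shown.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes MAP_ANONYMOUS, MAP_HUGETLB, MAP_POPULATE on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map `len` bytes with the flags named in the post.
 * Returns NULL on failure. With MAP_HUGETLB the kernel needs reserved huge
 * pages and `len` should be a multiple of the huge page size. */
static void *map_test_buffer(size_t len, int use_hugetlb)
{
    assert(len > 0); /* mmap rejects zero-length mappings */

    int flags = MAP_SHARED | MAP_ANONYMOUS | MAP_POPULATE;
    if (use_hugetlb)
        flags |= MAP_HUGETLB;

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, flags, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

MAP_POPULATE faults the pages in at mmap time, so a timing loop that follows would measure the stores rather than first-touch page faults. If no huge pages are reserved, the MAP_HUGETLB call fails, and a caller could retry with `use_hugetlb` set to 0.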

Written by AndresFreundTec on 2025-01-12 at 17:53

2x Xeon Gold 5215:

..

memzero_rep_movsb(): 6.171 GB/s

1x_mm256_store_si256(): 7.100 GB/s

2x_mm256_store_si256(): 10.685 GB/s

1x_mm256_stream_si256(): 6.988 GB/s

2x_mm256_stream_si256(): 6.994 GB/s

...

2x Xeon Gold 6442Y:

memzero_rep_movsb(): 11.155 GB/s

1x_mm256_store_si256(): 28.981 GB/s

2x_mm256_store_si256(): 10.034 GB/s

1x_mm256_stream_si256(): 29.154 GB/s

2x_mm256_stream_si256(): 29.155 GB/s

Written by AndresFreundTec on 2025-01-12 at 17:53

I.e., on Cascade Lake, non-temporal stores suck and it's important to use more than one store per loop iteration. On Sapphire Rapids it's the other way round.
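
The benchmark source is not part of the thread, so the following is only a guess at what labels like `1x_mm256_store_si256()` and `2x_mm256_stream_si256()` stand for: one or two 32-byte stores per loop iteration, using either regular or non-temporal stores. The function names and the `ZERO_PRECOND` macro are made up for illustration, and the `target` attribute is GCC/Clang-specific.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* All helpers assume buf is 32-byte aligned and len is a multiple of 64. */
#define ZERO_PRECOND(buf, len) \
    assert((uintptr_t)(buf) % 32 == 0 && (len) % 64 == 0)

/* "1x store": one regular (cached) 32-byte store per iteration. */
__attribute__((target("avx")))
static void zero_1x_store(void *buf, size_t len)
{
    ZERO_PRECOND(buf, len);
    const __m256i z = _mm256_setzero_si256();
    char *p = buf;
    for (size_t i = 0; i < len; i += 32)
        _mm256_store_si256((__m256i *)(p + i), z);
}

/* "2x store": two regular stores per iteration, together one 64-byte line. */
__attribute__((target("avx")))
static void zero_2x_store(void *buf, size_t len)
{
    ZERO_PRECOND(buf, len);
    const __m256i z = _mm256_setzero_si256();
    char *p = buf;
    for (size_t i = 0; i < len; i += 64) {
        _mm256_store_si256((__m256i *)(p + i), z);
        _mm256_store_si256((__m256i *)(p + i + 32), z);
    }
}

/* "2x stream": the same shape with non-temporal stores, which bypass the
 * cache hierarchy. The sfence orders them before any later stores. */
__attribute__((target("avx")))
static void zero_2x_stream(void *buf, size_t len)
{
    ZERO_PRECOND(buf, len);
    const __m256i z = _mm256_setzero_si256();
    char *p = buf;
    for (size_t i = 0; i < len; i += 64) {
        _mm256_stream_si256((__m256i *)(p + i), z);
        _mm256_stream_si256((__m256i *)(p + i + 32), z);
    }
    _mm_sfence();
}
```

When timing variants like these, it is worth checking the generated assembly, since an optimizing compiler may rewrite the regular-store loops (for example into a memset call) and hide the difference being measured.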

On both, ERMSB is considerably worse than the alternatives.

Written by Paul Khuong on 2025-01-12 at 19:12

@AndresFreundTec did you mean rep movsb or rep stosb?
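
For context on the distinction: `rep movsb` copies bytes from one buffer to another, while `rep stosb` stores the byte in AL repeatedly, which is the natural fit for zeroing. A minimal sketch of the latter with GCC/Clang inline assembly on x86-64 follows; the function name `zero_rep_stosb` is hypothetical, and the thread does not say which instruction the benchmark actually used.

```c
#include <assert.h>
#include <stddef.h>

/* Zero `len` bytes at `buf` with `rep stosb` (x86-64, GCC/Clang syntax).
 * RDI holds the destination, RCX the count, AL the byte to store.
 * Assumes the direction flag is clear, as the System V ABI guarantees
 * on function entry. */
static void zero_rep_stosb(void *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    __asm__ volatile("rep stosb"
                     : "+D"(buf), "+c"(len) /* both are modified by the instruction */
                     : "a"(0)
                     : "memory");
}
```

Unlike the vector-store variants, this form has no alignment or length requirements.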

Toot

Written by Paul Khuong on 2025-01-12 at 19:14

@AndresFreundTec also, you might see a small speedup with AVX-512 or even MOVDIR, just from avoiding partial-line stores.
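
To illustrate the partial-line point: a 32-byte AVX store covers half of a 64-byte cache line, whereas one 64-byte AVX-512 store covers the whole line. A rough sketch of the AVX-512 variant is below; the function name is hypothetical, the `target` attribute is GCC/Clang-specific, and the code must only be called on CPUs that support AVX-512F.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Zero `len` bytes with one full-cache-line (64-byte) store per iteration.
 * Assumes buf is 64-byte aligned and len is a multiple of 64. */
__attribute__((target("avx512f")))
static void zero_full_lines_avx512(void *buf, size_t len)
{
    assert((uintptr_t)buf % 64 == 0 && len % 64 == 0);
    const __m512i z = _mm512_setzero_si512();
    char *p = buf;
    for (size_t i = 0; i < len; i += 64)
        _mm512_store_si512(p + i, z);
}
```

A caller could guard the call with `__builtin_cpu_supports("avx512f")` and fall back to a 256-bit loop otherwise.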

Descendants

Written by AndresFreundTec on 2025-01-12 at 20:14

@pkhuong I did test AVX-512 too, but it wasn't different in an interesting way. No perf difference on Cascade Lake. On SR (Sapphire Rapids), there's no difference with 1x, and it's a bit faster with 2x/4x (but 2x/4x is much slower than 1x, to a degree that doesn't make any sense to me).

Proxy Information
Original URL
gemini://mastogem.picasoft.net/thread/113816978181538554