Cascade Lake has the added fun that non-temporal stores to memory on another node are actually a lot faster than anything achievable on the local node.
nodebind: 0, membind: 0
1x_mm_stream_si128(): 6.926 GB/s
nodebind: 0, membind: 1
1x_mm_stream_si128(): 20.693 GB/s
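For reference, a minimal sketch of how such a binding can be set up programmatically, assuming libnuma (the nodebind/membind lines above look like numactl-style output; the actual benchmark setup may differ):

```c
/* Hedged sketch, assuming libnuma (link with -lnuma): run on cpu_node,
 * allocate memory only from mem_node, mirroring nodebind/membind above. */
#include <numa.h>
#include <stdlib.h>

static void bind_cpu_and_memory(int cpu_node, int mem_node)
{
    if (numa_available() < 0)
        abort();                          /* no NUMA support */

    numa_run_on_node(cpu_node);           /* pin execution to cpu_node */

    struct bitmask *nodes = numa_allocate_nodemask();
    numa_bitmask_setbit(nodes, mem_node);
    numa_set_membind(nodes);              /* allocations come from mem_node */
    numa_free_nodemask(nodes);
}
```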
I.e., on Cascade Lake non-temporal stores suck, and it's important to use more than one store per loop iteration. On Sapphire Rapids it's the other way round.
On both, ERMSB (rep movsb) is considerably worse than the alternatives.
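A minimal sketch of what "one vs. two stores per loop" means here, assuming AVX2, a 32-byte-aligned buffer, and a length that's a multiple of 64 (the real benchmark code may differ):

```c
#include <immintrin.h>
#include <stddef.h>

/* one non-temporal 32-byte store per iteration */
static void zero_1x_stream(char *buf, size_t len)
{
    __m256i zero = _mm256_setzero_si256();
    for (size_t i = 0; i < len; i += 32)
        _mm256_stream_si256((__m256i *)(buf + i), zero);
    _mm_sfence();   /* order the NT stores before subsequent accesses */
}

/* two non-temporal 32-byte stores per iteration */
static void zero_2x_stream(char *buf, size_t len)
{
    __m256i zero = _mm256_setzero_si256();
    for (size_t i = 0; i < len; i += 64) {
        _mm256_stream_si256((__m256i *)(buf + i), zero);
        _mm256_stream_si256((__m256i *)(buf + i + 32), zero);
    }
    _mm_sfence();
}
```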
2x Xeon Gold 5215 (Cascade Lake):
...
memzero_rep_movsb(): 6.171 GB/s
1x_mm256_store_si256(): 7.100 GB/s
2x_mm256_store_si256(): 10.685 GB/s
1x_mm256_stream_si256(): 6.988 GB/s
2x_mm256_stream_si256(): 6.994 GB/s
...
2x Xeon Gold 6442Y (Sapphire Rapids):
memzero_rep_movsb(): 11.155 GB/s
1x_mm256_store_si256(): 28.981 GB/s
2x_mm256_store_si256(): 10.034 GB/s
1x_mm256_stream_si256(): 29.154 GB/s
2x_mm256_stream_si256(): 29.155 GB/s
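For completeness, a hedged sketch of the ERMSB variant. The classic ERMSB zeroing idiom is rep stosb; the benchmark's memzero_rep_movsb() name suggests it may instead rep movsb from a pre-zeroed source, so this is an illustration, not the benchmarked code:

```c
#include <stddef.h>

/* ERMSB-style zeroing via rep stosb (x86-64, GCC/Clang inline asm);
 * a sketch, not necessarily what memzero_rep_movsb() above does. */
static void memzero_rep_stosb(void *buf, size_t len)
{
    asm volatile("rep stosb"
                 : "+D"(buf), "+c"(len)   /* rdi = dest, rcx = count */
                 : "a"(0)                 /* al = fill byte (zero) */
                 : "memory");
}
```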
Sometimes I hate performance stuff.
How to quickly zero large amounts of memory differs rather vastly between CPU generations.
On both systems, core and memory are bound to node 0 and the same mmap flags are used (MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB | MAP_POPULATE).
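That corresponds to an allocation along these lines (a sketch; assumes huge pages have been reserved beforehand, e.g. via vm.nr_hugepages):

```c
#include <sys/mman.h>
#include <stdlib.h>

/* allocate explicit, pre-faulted huge pages with the flags quoted above */
static void *alloc_hugepages(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB | MAP_POPULATE,
                   -1, 0);
    if (p == MAP_FAILED)
        abort();
    return p;
}
```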
@axboe Doing concurrent direct IO into 1GB anonymous huge pages hits severe contention within bio_set_pages_dirty().
In my case this makes IO into 1GB pages max out at ~18GB/s, whereas 2MB huge pages reach ~34GB/s (the hardware limit).
Is dirtying the pages even needed when anonymous huge pages are targeted?
https://gist.github.com/anarazel/304aa6b81d05feb3f4990b467d02dabc
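A minimal repro sketch of the pattern (the gist above has the real benchmark; the filename and 1MB chunk size here are made up, and MAP_HUGE_1GB is defined manually in case the libc headers lack it):

```c
#define _GNU_SOURCE
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>

#ifndef MAP_HUGE_1GB
#define MAP_HUGE_1GB (30 << 26)   /* log2(1GB) << MAP_HUGE_SHIFT */
#endif

int main(void)
{
    size_t len = 1UL << 30;       /* one 1GB huge page */
    char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB |
                     MAP_HUGE_1GB | MAP_POPULATE, -1, 0);
    int fd = open("datafile", O_RDONLY | O_DIRECT);
    if (buf == MAP_FAILED || fd < 0)
        abort();

    /* each completed read dirties part of the huge page; with many
     * concurrent readers this is where bio_set_pages_dirty() contends */
    for (size_t off = 0; off < len; off += 1UL << 20)
        if (pread(fd, buf + off, 1UL << 20, off) < 0)
            abort();
    return 0;
}
```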
Turns out
tc qdisc add dev lo root netem delay 70ms 10ms
makes things slower. Surprise.
When you wonder why regression tests are suddenly slower and then, after a decent bit of investigating, remember that you recently added a 70ms delay to localhost connections while debugging something else...
Of course that happens just when there's a new HW generation announced but not yet shipping.
The Mac mini I was using for Postgres CI died. I suspect the PSU. Too old for warranty. And I don't (yet) have the right screwdrivers to open it up. I guess I could put it under a drill press... :/
(the context is Postgres CI)
Is there a way to get a license to run more than two macOS VMs on one host?
@twoscomplement IIRC SPARC was in practice also TSO.
Just gave a talk about improving Postgres on NUMA systems:
https://anarazel.de/talks/2024-10-23-pgconf-eu-numa-vs-postgresql/numa-vs-postgresql.pdf
Does anybody have a recommendation for a decent read-heavy (or read-only) "OLTP-ish" database benchmark where the queries aren't just single-table point or range selects?
I'd like to showcase some scalability problems (and a solution) that I see with such queries, but right now they're mostly visible with queries I made up on my own. It's easy enough to make up workloads that show almost anything, so I'd rather use something designed by someone else.
Uh. Huh. Is it expected that SMAP (Supervisor Mode Access Prevention) can lead to a noticeable reduction in the speed of copying from kernel to userspace?
It seems to be tied to CPU caches to some degree.
Reading a large, cached file from the kernel, I get substantially higher throughput when aggressively reusing the "target" buffers. But there's very little difference if I boot with clearcpuid=smap.
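A rough sketch of that comparison (filename and sizes are placeholders): read a cached file either into the same cache-hot buffer every time, or into rotating, cache-cold offsets:

```c
#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>

#define CHUNK   (1UL << 20)       /* 1MB per read */
#define WINDOW  (256UL << 20)     /* 256MB of target buffers */

/* reuse=1: every copy to userspace lands in the same hot buffer;
 * reuse=0: targets rotate through WINDOW and tend to stay cache-cold */
static void read_file(int fd, char *buf, int reuse)
{
    off_t off = 0;
    ssize_t n;

    do {
        char *dst = reuse ? buf : buf + (size_t)(off % WINDOW);
        n = pread(fd, dst, CHUNK, off);
        off += n;
    } while (n == (ssize_t)CHUNK);
}

int main(void)
{
    int fd = open("bigfile", O_RDONLY);
    char *buf = malloc(WINDOW);
    if (fd < 0 || !buf)
        abort();
    read_file(fd, buf, 1);        /* hot target buffer */
    read_file(fd, buf, 0);        /* cold target buffers */
    return 0;
}
```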
Hrmpf. Getting correctable AER errors on my new workstation when utilizing PCIe 5.0 NVMe storage.