Ancestors

Toot

Written by AndresFreundTec on 2024-08-02 at 20:23

Hrmpf. Getting correctable AER errors on my new workstation when using PCIe 5.0 NVMe storage.

=> More information about this toot | More toots from AndresFreundTec@mastodon.social

Descendants

Written by Edwin Török on 2024-08-02 at 20:59

@AndresFreundTec first time I got an M.2 NVMe device, I didn't plug it in well enough (didn't push it in completely as I was afraid to damage it). To my surprise it did negotiate a PCIe link successfully, but not the highest one it was supposed to be capable of (LnkCap2/LnkCtl2 discrepancy in lspci). Unplugging and plugging it back in solved it for me.

I wouldn't exclude some physical connectivity issue or interference as the reason for AER errors either.

Although there might be other reasons why it can't reach full speed ('dmesg' should show a line containing 'PCIe bandwidth, limited by' when the link is slower than the device supports).
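A link that trained below its maximum can also be spotted without reading lspci output, because Linux exposes the negotiated and maximum link speed per device in sysfs. The following is a minimal sketch, assuming the files `current_link_speed` and `max_link_speed` hold strings such as `32.0 GT/s PCIe`; the device address in the usage note is hypothetical.

```python
from pathlib import Path


def parse_gts(speed: str) -> float:
    """Extract the numeric GT/s value from a string such as '32.0 GT/s PCIe'.

    Raises ValueError for strings that do not start with a number
    (for example when the kernel reports an unknown speed).
    """
    return float(speed.split()[0])


def link_is_downgraded(current: str, maximum: str) -> bool:
    """True if the negotiated speed is below what the device supports."""
    return parse_gts(current) < parse_gts(maximum)


def describe_device(bdf: str) -> str:
    """Read both sysfs attributes for one PCI address and summarize them."""
    dev = Path("/sys/bus/pci/devices") / bdf
    current = (dev / "current_link_speed").read_text().strip()
    maximum = (dev / "max_link_speed").read_text().strip()
    state = "downgraded" if link_is_downgraded(current, maximum) else "full speed"
    return f"{bdf}: {current} of {maximum} ({state})"
```

Calling `describe_device("0000:01:00.0")` (replace the address with the one lspci shows for the SSD) would report whether a PCIe 5.0 device, which runs at 32 GT/s, only negotiated 16 GT/s, the PCIe 4.0 rate.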

=> More information about this toot | More toots from edwintorok@discuss.systems

Written by AndresFreundTec on 2024-08-02 at 23:57

@edwintorok Turns out there are no AER errors if I 1) disable PCIe link power management in the BIOS and 2) boot with pcie_aspm=off. Quite curious that the combination of both is necessary...

Even with the errors I reach full PCIe 5 bandwidth, which makes me suspect that the issue isn't a physical connectivity issue. I've tried to reseat and clean nonetheless. Could be interference, I guess.

I've ordered a different PCIe 5 SSD and a different PCIe slot -> M.2 adapter.
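When comparing boots with and without `pcie_aspm=off`, it helps to confirm that the option actually reached the kernel. A minimal sketch, assuming the running kernel's command line is read from `/proc/cmdline`; the helper name is made up for illustration.

```python
from pathlib import Path


def has_kernel_param(cmdline: str, param: str) -> bool:
    """True if `param` appears as a whole token on the kernel command line."""
    return param in cmdline.split()


def aspm_disabled_on_cmdline() -> bool:
    """Check the running kernel's command line for pcie_aspm=off."""
    return has_kernel_param(Path("/proc/cmdline").read_text(), "pcie_aspm=off")
```

This only shows what was requested on the command line. Whether ASPM is really off on a given link still has to be read from the device's LnkCtl line in `lspci -vvv`.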


Written by AndresFreundTec on 2024-08-04 at 17:34

@edwintorok They either train as PCIe 4, don't work at all, or also have AER errors. So I suspect it's a mainboard / firmware issue :/


Written by Edwin Török on 2024-08-04 at 17:47

@AndresFreundTec Try comparing lspci -vvv output when disabling ASPM on the kernel cmdline vs. when you also disable it in the BIOS: maybe the BIOS alters some other settings too.

If you find out what it is then the motherboard/chipset might need an entry in quirks.c to disable ASPM if it doesn't work.
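The comparison can be done by saving `lspci -vvv` to a file on each boot and diffing the two dumps. A minimal sketch using Python's standard `difflib`; the function name and the labels `cmdline-only` and `cmdline+bios` are chosen here for illustration.

```python
import difflib


def diff_lspci(before: str, after: str) -> list[str]:
    """Return only the lines that differ between two saved `lspci -vvv` dumps.

    Lines prefixed with '-' come from `before`, lines with '+' from `after`.
    """
    diff = list(difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile="cmdline-only",
        tofile="cmdline+bios",
        lineterm="",
        n=0,  # no context lines, only the changes themselves
    ))
    # The first two entries are the '---'/'+++' file headers; '@@' hunk
    # markers are dropped by the prefix filter.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]
```

Passing the contents of the two saved files to `diff_lspci` leaves a short list of register fields that changed, which is easier to scan than two full dumps side by side.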


Written by AndresFreundTec on 2024-08-06 at 16:50

@edwintorok I think disabling ASPM fixing the AER errors turned out to be a fluke - I couldn't reproduce it for quite a while. When I finally could, it turned out that on that boot PCIe training had only reached PCIe 4 - which also explains why there weren't any errors. I suspect that had happened before, although I am surprised I didn't notice the throughput difference.

I thought a couple other times I had "fixed" it, but every time it just turned out that I needed to wait longer.
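Because the errors can take a while to show up, one way to avoid declaring a fix too early is to snapshot the per-device AER counters and compare them later. A minimal sketch, assuming the kernel exposes the sysfs file `aer_dev_correctable` for the device, with one `NAME COUNT` pair per line; the function names are made up for illustration.

```python
def parse_aer_counters(text: str) -> dict[str, int]:
    """Parse lines such as 'BadTLP 3' into a {name: count} mapping."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def new_errors(earlier: dict[str, int], later: dict[str, int]) -> dict[str, int]:
    """Return the counters that changed between two snapshots, with the delta."""
    return {
        name: count - earlier.get(name, 0)
        for name, count in later.items()
        if count != earlier.get(name, 0)
    }
```

Reading `/sys/bus/pci/devices/<address>/aer_dev_correctable` once after boot and again after a long I/O run, then passing both parsed snapshots to `new_errors`, gives an empty result only if no correctable errors were logged in between.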


Written by AndresFreundTec on 2024-08-06 at 16:53

@edwintorok I've now tried 4 different PCIe 5.0 devices and all had issues - one only training to PCIe 4, another not training at all. I've removed every other nearby device and the issue remains.

So I'm concluding it's the mainboard that's the issue.


Written by AndresFreundTec on 2024-08-06 at 16:58

@edwintorok (Or I guess firmware on the mainboard, who knows)


Proxy Information
Original URL
gemini://mastogem.picasoft.net/thread/112894293567403804