New blog post:
Executing Linux applications on a Raspberry Pi in less than 3.5s from power-up! 🚀🏎️
(and other power saving tricks)
https://kittenlabs.de/blog/2024/09/01/extreme-pi-boot-optimization/
=> More informations about this toot | More toots from manawyrm@chaos.social
@manawyrm cool stuff! Have you tried kernel compression with lz4, i.e., CONFIG_KERNEL_LZ4? IIRC, that beat the default gzip decompression very solidly in speed. You might also want to try _ZSTD, as I suspect the lower decompression speed of that might balance with the potentially higher compression ratio.
=> More informations about this toot | More toots from funkylab@mastodon.social
@funkylab I've played around with kernel compression, but the extra energy required to decompress the kernel is harmful in my application.
In a less power-constrained application, there might be some benefit, yeah!
A hardcore solution would be to write a custom minimal bootloader to move the kernel load away from the GPU and onto the CPU.
I'm not that desperate yet, but it might help :)
=> More informations about this toot | More toots from manawyrm@chaos.social
@manawyrm that's why I ask: lz4 decompression is (at least on x86) so much faster that it's hard for me to imagine it using the same energy as gzip decompression! But if your tests show that's not the case, then my takeaway is that more hardware-compatible unpackers really are faster by giving the CPU fewer opportunities to do nothing and hence run with less power. Nice lesson!
=> More informations about this toot | More toots from funkylab@mastodon.social
@funkylab At that point, the other CPU cores also aren't up yet. That limits the performance somewhat.
=> More informations about this toot | More toots from manawyrm@chaos.social
@manawyrm I don't think more cores would even help with lz4
=> More informations about this toot | More toots from funkylab@mastodon.social
@manawyrm @funkylab I'm guessing you tried U-Boot and it didn't help?
=> More informations about this toot | More toots from erincandescent@erincandescent.net
@erincandescent @funkylab U-Boot takes soo long to initialize and run that it just invalidates all benefit. I either need a brutally stripped down version or something custom.
=> More informations about this toot | More toots from manawyrm@chaos.social
@erincandescent @funkylab U-Boot does things "the right way", I kinda want something hacky. Let's assume all the peripherals are already up, don't validate anything, make some wild assumptions and just brutally load stuff from hardcoded addresses into memory.
=> More informations about this toot | More toots from manawyrm@chaos.social
@manawyrm @erincandescent @funkylab this looks vaguely promising, albeit with a bit of effort to get working
https://github.com/DOGSHITD/Simple-UEFI-Bootloader-ARM64
=> More informations about this toot | More toots from gsuberland@chaos.social
@gsuberland @manawyrm @erincandescent wait, there's no UEFI on the RPi, unless you teach it (e.g. by running U-Boot)
=> More informations about this toot | More toots from funkylab@mastodon.social
@funkylab @gsuberland @erincandescent yeah, extra layers of glue code aren't really what I need.
There are several bare-metal projects like Pi1541 where I might be able to steal the toolchain + boilerplate from.
I'm just not sure if it's worth the (engineering) time.
=> More informations about this toot | More toots from manawyrm@chaos.social
@manawyrm Great write-up!
=> More informations about this toot | More toots from Purple@woof.tech
@manawyrm That's seriously fast *tips hat*
One question though - why go the buildroot way instead of "just" building a custom kernel image?
=> More informations about this toot | More toots from diebarschlampe@mas.to
@diebarschlampe I wanted to have buildroot anyway for the main project, it helps a lot with CI/reproducible builds.
Buildroot also takes away a lot of the pain of creating disk images, getting a compatible aarch64 toolchain, etc.
While the initial setup is more complex, just being able to call "make" without worrying about local toolchain options (like CROSS_COMPILE and ARCH) is nice.
=> More informations about this toot | More toots from manawyrm@chaos.social
@manawyrm this is an excellent study into power tuning the pi. I am going to use some of these settings for a pi project I’ve been working on for a while. Boot up time always disappoints me (I’m using bookworm 12 on an original pi zero w)
Great work! Thank you!
=> More informations about this toot | More toots from ironcladlou@hachyderm.io
@manawyrm I definitely wish I knew about USB-SD-Mux a year ago... but a little pricey for me right now.
Also, in the last graph, are pin 0 and pin 1 both part of the first userspace app to run, or is pin 0 "boot is done, from kernelspace" and pin 1 is "toggle a pin from userspace"?
=> More informations about this toot | More toots from cr1901@mastodon.social
@cr1901 the latter.
=> More informations about this toot | More toots from manawyrm@chaos.social
@manawyrm How do you configure that, out of curiosity? I know you can configure LEDs to do specific things (e.g. on my TinkerBoard, the yellow LED is a "heartbeat"), but I don't know the specifics...
=> More informations about this toot | More toots from cr1901@mastodon.social
@cr1901 On the Pi, using the dtoverlay mechanism you can configure the pins however you want. In my case I'm using the gpio-shutdown overlay, which turns a GPIO on as long as the kernel is running and then turns it off after shutdown/halt (so you can cut power)
=> More informations about this toot | More toots from manawyrm@chaos.social
@manawyrm I need one of these SD-Card muxes! 😍
=> More informations about this toot | More toots from G33KatWork@infosec.exchange
@G33KatWork Right?! Best tool for embedded hackers.
=> More informations about this toot | More toots from manawyrm@chaos.social
@manawyrm something worth trying is dropping the 5V supply voltage for the Pi down to 4V. the limited schematics available show the board using PAM2306 and RT8088 buck regulators to derive the 3.3V and VDD_CORE voltage rails. the PAM2306 goes from 60% efficient at 5V up to 85% efficient at 4V in, which is a major increase. the RT8088 gains a couple of percent efficiency by dropping to 4V too.
=> More informations about this toot | More toots from gsuberland@chaos.social
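To put the quoted efficiency figures into perspective, here is a small worked example. It assumes the 60% and 85% values from the toot above apply at the operating point in question; the 1 W load is a made-up placeholder, and the function names are illustrative only.

```python
def input_power(load_w, efficiency):
    """Power drawn at the regulator input for a given output load."""
    return load_w / efficiency


def relative_saving(load_w, eff_before, eff_after):
    """Fraction of input power saved on one rail when efficiency improves."""
    before = input_power(load_w, eff_before)
    after = input_power(load_w, eff_after)
    return 1 - after / before


# Placeholder load of 1 W on the 3.3V rail (assumption, not a measurement).
# The relative saving does not depend on the load value chosen here.
saving_3v3 = relative_saving(1.0, 0.60, 0.85)
print(f"3.3V rail input power drops by {saving_3v3:.1%}")
```

This only covers the one rail fed by the PAM2306; the saving for the whole board depends on how the load is split between the rails.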
@gsuberland Uhhh! Interesting! That should be easy to test, thanks.
Not sure what the camera module thinks of this, but I'll give it a shot!
=> More informations about this toot | More toots from manawyrm@chaos.social
@manawyrm according to the schematic I'm looking at, the camera port runs 3.3V, so you'll be saving power there too.
https://datasheets.raspberrypi.com/rpizero2/raspberry-pi-zero-2-w-reduced-schematics.pdf
=> More informations about this toot | More toots from gsuberland@chaos.social
@gsuberland Huh!! Thanks 😻
=> More informations about this toot | More toots from manawyrm@chaos.social
@manawyrm I don't know where else the 5V rails go on the board since I don't have the rest of the schematic, but based on how they've named the power nets it's a pretty reasonable guess that the only other usage of the 5V rail is for the USB ports, and since you're disabling those anyway you should be fine. the 5V_CORE is almost certainly a filtered power domain used only for the RT8088.
=> More informations about this toot | More toots from gsuberland@chaos.social
Yea, like 99% certain 5V only powers the regulators and the USB ports.
=> More informations about this toot | More toots from timonsku@mastodon.social
Ah, it's a Zero; yeah, then def. nothing critical. On Pi5 I'm not as certain with RP1.
HDMI also needs the 5V for the EDID but that will be fine at 4V too.
=> More informations about this toot | More toots from timonsku@mastodon.social
@timonsku EDID is also disabled here so it sounds like we're golden
=> More informations about this toot | More toots from gsuberland@chaos.social
@timonsku @manawyrm nice.
I think technically the efficiency peaks at about 3.6V but at that point you drop down into the lower range of the RT8088's current delivery capabilities, which might glitch the core voltage rail out during high load.
if it's not obviously unstable at 4V I'd maybe try 3.8V and stress test it. writing a script that goes from 0% load to 100% load and back repeatedly, alternating pure CPU and memory bound loads, is a great way to check for voltage regulation stability.
=> More informations about this toot | More toots from gsuberland@chaos.social
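A minimal sketch of the load-cycling test described above, assuming Python 3 is available on the target image. The thread does not say which tool to use; the function names, buffer size and durations below are illustrative choices, not anything from the thread.

```python
import multiprocessing as mp
import os
import time


def cpu_burn(seconds):
    """Spin on integer arithmetic until the deadline passes (CPU-bound)."""
    deadline = time.monotonic() + seconds
    x = 0
    while time.monotonic() < deadline:
        x = (x * 31 + 7) % 1000003
    return x


def mem_burn(seconds, size_mb=16):
    """Copy a buffer larger than the caches over and over (memory-bound)."""
    buf = bytearray(size_mb * 1024 * 1024)
    deadline = time.monotonic() + seconds
    copies = 0
    while time.monotonic() < deadline:
        _ = bytes(buf)  # forces a full read and write of the buffer
        copies += 1
    return copies


def run_phase(worker, seconds, procs):
    """Run one busy phase with `procs` worker processes in parallel."""
    with mp.Pool(procs) as pool:
        pool.map(worker, [seconds] * procs)


def stress(cycles=100, busy_s=2.0, idle_s=2.0, procs=None):
    """Alternate CPU-bound and memory-bound bursts with idle gaps between."""
    procs = procs or os.cpu_count() or 1
    for i in range(cycles):
        worker = cpu_burn if i % 2 == 0 else mem_burn
        run_phase(worker, busy_s, procs)  # step from idle to full load
        time.sleep(idle_s)                # and back to idle
```

Calling `stress()` from a script (under an `if __name__ == "__main__":` guard) while watching the rails with a scope, or simply checking for crashes and undervoltage warnings, would be one way to use it.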
@gsuberland @timonsku with the final device being outdoors in anything between -20°C and +70°C weather I'm a bit concerned about any sort of overclocking or operating outside of regular operating parameters. Efficiency is nice, but compromising reliability for that isn't worth it (at least in this application)
=> More informations about this toot | More toots from manawyrm@chaos.social
@manawyrm @timonsku away from my desktop at the moment but I can check the datasheets to see what the derating is on current delivery and efficiency. it should be fine at 4V though.
=> More informations about this toot | More toots from gsuberland@chaos.social
@manawyrm @timonsku just checked. they all look pretty stable in that temperature range. you get some switching frequency drift but that shouldn't be an issue.
=> More informations about this toot | More toots from gsuberland@chaos.social
@manawyrm @timonsku also since the efficiency is going up by a solid 15% you'll probably be seeing lower temps on the buck ICs anyway.
=> More informations about this toot | More toots from gsuberland@chaos.social
@gsuberland Holy shit.
I just tried this and yes, you're totally right.
The switching regulator gets vastly more efficient at lower voltages.
I saved 20% total energy by reducing the input voltage down to 3.6V (updated blog post):
https://kittenlabs.de/blog/2024/09/01/extreme-pi-boot-optimization/#reducing-input-voltage
=> More informations about this toot | More toots from manawyrm@chaos.social
@manawyrm @gsuberland would a zstd compressed kernel save some time?
=> More informations about this toot | More toots from thomask77@mastodon.gamedev.place
@thomask77 @gsuberland Yes, saves a bit of time, but the decompression uses lots of energy.
Not the right tradeoff for my application (where total energy is king, not time).
SD cards are fast -- best option would probably be to write a home-grown mini bootloader and read an uncompressed kernel at full 50+ MByte/s.
=> More informations about this toot | More toots from manawyrm@chaos.social
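The time-versus-energy tradeoff discussed here can be written down as a rough model. This is only a sketch: apart from the 50 MByte/s read speed mentioned above, every number below (image size, compression ratio, unpack speed, power draw) is a hypothetical placeholder and not a measurement from the blog post.

```python
def boot_cost(image_mb, read_mb_s, power_read_w,
              ratio=1.0, unpack_mb_s=None, power_unpack_w=0.0):
    """Return (seconds, joules) for loading one kernel image.

    ratio is compressed size / uncompressed size; leave unpack_mb_s as
    None to model an uncompressed image.
    """
    t_read = image_mb * ratio / read_mb_s
    t_unpack = image_mb / unpack_mb_s if unpack_mb_s else 0.0
    energy = t_read * power_read_w + t_unpack * power_unpack_w
    return t_read + t_unpack, energy


# Hypothetical 20 MB image, 1 W while reading, 2 W while decompressing.
plain = boot_cost(20, 50, 1.0)
packed = boot_cost(20, 50, 1.0, ratio=0.5, unpack_mb_s=100, power_unpack_w=2.0)
print("uncompressed:", plain)
print("compressed:  ", packed)
```

With these made-up inputs both variants take the same time, but the compressed one costs more energy, because the CPU draws more power while unpacking than the system does while reading. Real numbers could shift the balance either way.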
@manawyrm @thomask77 @gsuberland I see you’ve benchmarked gzip'd kernel and initramfs but zstd is optimized for fast decompression, so it can be a lot faster, and I wonder if it still uses more energy (and which compression setting is optimal)
especially if the video core loads the kernel and that step is slow
=> More informations about this toot | More toots from uint8_t@chaos.social
@manawyrm Thank you so much for writing and sharing that - absolutely fascinating.
I had already turned off the LED and carefully set the temperature the active cooling kicks in, but there are other tips there that I am keen to try.
Thank you again for your work, really appreciate it.
=> More informations about this toot | More toots from plwt@mstdn.social
@manawyrm
That's sick as hell, thanks for sharing the process :)
=> More informations about this toot | More toots from jonny@neuromatch.social
@manawyrm nice ^-^
=> More informations about this toot | More toots from tamtararam@chaos.social
@manawyrm the hardware setup is especially novel to me. seems pretty cool that you can automate it this way ^-^
=> More informations about this toot | More toots from tamtararam@chaos.social
@manawyrm elsewhere I don’t think I’ve seen much use of watt-seconds as a unit (it’s J)
=> More informations about this toot | More toots from uint8_t@chaos.social
@uint8_t I must admit, my brain can deal with Ws, Wh, mA, mAs, mAh much better than mC and J (even though those are technically more correct, I guess)
But that's just personal preference after all :)
=> More informations about this toot | More toots from manawyrm@chaos.social
@manawyrm watt is joules per second, so watt-second is joules per second-second
=> More informations about this toot | More toots from uint8_t@chaos.social
@uint8_t be thankful I don't start using pirate-ninjas :P
=> More informations about this toot | More toots from manawyrm@chaos.social
@manawyrm @uint8_t So 1J = 0.000239006 kcal. A 500ml bottle of clubmate has 20 kcal (according to $random foodwebsite). So 1J = 83679.908 club mate bottles energy equivalence units.
=> More informations about this toot | More toots from sebastian@schottkydio.de
@sebastian @uint8_t that doesn't sound quite right.
500ml Club Mate has 585 kJ of energy.
For comparison, a single 18650 with 3000mAh has about 40 kJ.
Are you volunteering to pedal a bike (while fed only Club Mate) on that mountain in winter days to recharge my battery? 😹
=> More informations about this toot | More toots from manawyrm@chaos.social
@manawyrm @uint8_t
Yeah sorry I'm stupid (a bottle of mate might have prevented that one...)
https://www.foodrepo.org/en/products/1741
The photo says: 20 kcal or 84kJ.
I got my division wrong.
1 club mate bottles energy equivalence units = 83679.908J ~ 84kJ
=> More informations about this toot | More toots from sebastian@schottkydio.de
@sebastian @manawyrm It’s 84kJ / 100ml so 420 kJ total
(you should really get a mate)
=> More informations about this toot | More toots from uint8_t@chaos.social
@uint8_t @sebastian haha, this means Club Mate and modern 18650 have the same energy density by weight :)
500ml Mate -> 420kJ
10x 18650 (at 48g each) -> 416kJ
Fun :)
=> More informations about this toot | More toots from manawyrm@chaos.social
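The unit juggling in this sub-thread can be checked with a few lines. The figures of 20 kcal per 100 ml, 3000 mAh and 48 g come from the toots above; the 3.7 V nominal cell voltage and the 1 g/ml density for the drink are assumptions made here for illustration.

```python
KJ_PER_KCAL = 4.184  # thermochemical calorie


def kcal_to_kj(kcal):
    return kcal * KJ_PER_KCAL


def cell_energy_kj(capacity_mah, nominal_v):
    """mAh -> Ah -> Wh -> Ws (= J) -> kJ."""
    return capacity_mah / 1000 * nominal_v * 3600 / 1000


# 20 kcal per 100 ml, 500 ml bottle.
mate_kj = kcal_to_kj(20) * 5
# 3000 mAh cell; 3.7 V nominal is an assumption, not from the thread.
cell_kj = cell_energy_kj(3000, 3.7)

# Energy per kilogram: 500 ml taken as 0.5 kg (assumes ~1 g/ml, liquid only).
mate_kj_per_kg = mate_kj / 0.5
cell_kj_per_kg = cell_kj / 0.048

print(f"Club Mate: {mate_kj:.1f} kJ per bottle, {mate_kj_per_kg:.0f} kJ/kg")
print(f"18650:     {cell_kj:.2f} kJ per cell,   {cell_kj_per_kg:.0f} kJ/kg")
```

Under these assumptions the two land within about one percent of each other per kilogram, which matches the comparison above.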
@manawyrm Really interesting read.
=> More informations about this toot | More toots from whynothugo@fosstodon.org
@manawyrm oh nice, that was exactly the setup and the aim of my master's thesis (8 years ago, with a sloooow RPi 1, and I didn't finish it, but I got my current job by sending patches related to it…)
=> More informations about this toot | More toots from daniel_bohrer@chaos.social
@daniel_bohrer We would've almost been colleagues at your current job ;)
=> More informations about this toot | More toots from manawyrm@chaos.social
@manawyrm ah yes, I remember now :D
=> More informations about this toot | More toots from daniel_bohrer@chaos.social
@manawyrm having slept over this:
=> More informations about this toot | More toots from funkylab@mastodon.social
@manawyrm … see LZ4 really being a significant sink of CPU cycles
=> More informations about this toot | More toots from funkylab@mastodon.social
@manawyrm
=> More informations about this toot | More toots from funkylab@mastodon.social
@manawyrm
=> More informations about this toot | More toots from funkylab@mastodon.social
@manawyrm and then the much faster (main) CPU takes over, executes the small uboot SPL, which then initializes just enough things to be able to jump to linux.
=> More informations about this toot | More toots from funkylab@mastodon.social