Ancestors

Toot

Written by Manawyrm | Sarah on 2024-09-01 at 21:22

New blog post:

Executing Linux applications on a Raspberry Pi in less than 3.5s from power-up! 🚀🏎️

(and other power saving tricks)

https://kittenlabs.de/blog/2024/09/01/extreme-pi-boot-optimization/

=> More informations about this toot | More toots from manawyrm@chaos.social

Descendants

Written by Marcus Müller on 2024-09-01 at 21:41

@manawyrm cool stuff! Have you tried kernel compression with lz4, i.e., CONFIG_KERNEL_LZ4? IIRC, that beat the default gzip decompression very solidly in speed. You might also want to try _ZSTD, as I suspect its lower decompression speed might be balanced by the potentially higher compression ratio.

=> More informations about this toot | More toots from funkylab@mastodon.social

Written by Manawyrm | Sarah on 2024-09-01 at 21:45

@funkylab I‘ve played around with kernel compression, but the extra energy required to decompress the kernel is harmful in my application.

In a less power constrained application, there might be some benefit, yeah!

A hardcore solution would be to write a custom minimal bootloader to move the kernel load away from the GPU and onto the CPU.

I‘m not that desperate yet, but it might help :)

=> More informations about this toot | More toots from manawyrm@chaos.social

Written by Marcus Müller on 2024-09-01 at 21:56

@manawyrm that's why I ask: lz4 decompression is (at least on x86) so much faster that it's hard for me to imagine it using the same energy as gzip decompression! But if your tests show that's not the case, then my takeaway is that more hardware-compatible unpackers really are faster by giving the CPU fewer opportunities to do nothing and hence run with less power. Nice lesson!

=> More informations about this toot | More toots from funkylab@mastodon.social

Written by Manawyrm | Sarah on 2024-09-01 at 21:59

@funkylab At that point, the other CPU cores also aren‘t up yet. That limits the performance somewhat.

=> More informations about this toot | More toots from manawyrm@chaos.social

Written by Marcus Müller on 2024-09-01 at 22:00

@manawyrm I don't think more cores would even help with lz4

=> More informations about this toot | More toots from funkylab@mastodon.social

Written by Erin 💽✨ on 2024-09-01 at 22:53

@manawyrm @funkylab I'm guessing you tried U-Boot and it didn't help?

=> More informations about this toot | More toots from erincandescent@erincandescent.net

Written by Manawyrm | Sarah on 2024-09-01 at 22:54

@erincandescent @funkylab U-Boot takes soo long to initialize and run that it just invalidates all benefit. I either need a brutally stripped down version or something custom.

=> More informations about this toot | More toots from manawyrm@chaos.social

Written by Manawyrm | Sarah on 2024-09-01 at 22:55

@erincandescent @funkylab U-Boot does things „the right way“, I kinda want something hacky. Let‘s assume all the peripherals are already up, don‘t validate anything, make some wild assumptions and just brutally load stuff from hardcoded addresses into memory.

=> More informations about this toot | More toots from manawyrm@chaos.social

Written by Graham Sutherland / Polynomial on 2024-09-01 at 23:02

@manawyrm @erincandescent @funkylab this looks vaguely promising, if a bit of effort to get working

https://github.com/DOGSHITD/Simple-UEFI-Bootloader-ARM64

=> More informations about this toot | More toots from gsuberland@chaos.social

Written by Marcus Müller on 2024-09-01 at 23:12

@gsuberland @manawyrm @erincandescent wait, there's no UEFI on the RPi, unless you teach it (e.g. by running U-boot)

=> More informations about this toot | More toots from funkylab@mastodon.social

Written by Manawyrm | Sarah on 2024-09-01 at 23:15

@funkylab @gsuberland @erincandescent yeah, extra layers of glue code aren‘t really what I need.

There are several bare-metal projects like Pi1541 where I might be able to steal the toolchain + boilerplate from.

I‘m just not sure if it‘s worth the (engineering) time.

=> More informations about this toot | More toots from manawyrm@chaos.social

Written by Purple :verified: on 2024-09-01 at 21:43

@manawyrm Great write-up!

=> More informations about this toot | More toots from Purple@woof.tech

Written by Flauschdompteur on 2024-09-01 at 21:47

@manawyrm That's seriously fast *tips hat*

One question though - why go the buildroot way instead of "just" building a custom kernel image?

=> More informations about this toot | More toots from diebarschlampe@mas.to

Written by Manawyrm | Sarah on 2024-09-01 at 21:53

@diebarschlampe I wanted to have buildroot anyway for the main project, it helps a lot with CI/reproducible builds.

Buildroot also takes away a lot of the pain of creating disk images, getting a compatible aarch64 toolchain, etc.

While the initial setup is more complex, just being able to call „make“ without worrying about local toolchain options (like CROSS_COMPILE and ARCH) is nice.

=> More informations about this toot | More toots from manawyrm@chaos.social

Written by IronCladLou on 2024-09-01 at 22:13

@manawyrm this is an excellent study into power tuning the pi. I am going to use some of these settings for a pi project I’ve been working on for a while. Boot up time always disappoints me (I’m using bookworm 12 on an original pi zero w)

Great work! Thank you!

=> More informations about this toot | More toots from ironcladlou@hachyderm.io

Written by William D. Jones on 2024-09-01 at 22:40

@manawyrm I definitely wish I knew about USB-SD-Mux a year ago... but a little pricey for me right now.

Also, in the last graph, are pin 0 and pin 1 both part of the first userspace app to run, or is pin 0 "boot is done, from kernelspace" and pin 1 is "toggle a pin from userspace"?

=> More informations about this toot | More toots from cr1901@mastodon.social

Written by Manawyrm | Sarah on 2024-09-01 at 22:45

@cr1901 the latter.

=> More informations about this toot | More toots from manawyrm@chaos.social

Written by William D. Jones on 2024-09-01 at 22:46

@manawyrm How do you configure that, out of curiosity? I know you can configure LEDs to do specific things (e.g. on my TinkerBoard, the yellow LED is a "heartbeat"), but I don't know the specifics...

=> More informations about this toot | More toots from cr1901@mastodon.social

Written by Manawyrm | Sarah on 2024-09-01 at 22:48

@cr1901 On the Pi, using the dtoverlay mechanism you can configure the pins however you want. In my case I‘m using the gpio-shutdown overlay, which turns a GPIO on as long as the kernel is running and then turns it off after shutdown/halt (so you can cut power)

=> More informations about this toot | More toots from manawyrm@chaos.social

Written by Andy on 2024-09-01 at 22:47

@manawyrm I need one of these SD-Card muxes! 😍

=> More informations about this toot | More toots from G33KatWork@infosec.exchange

Written by Manawyrm | Sarah on 2024-09-01 at 22:49

@G33KatWork Right?! Best tool for embedded hackers.

=> More informations about this toot | More toots from manawyrm@chaos.social

Written by Graham Sutherland / Polynomial on 2024-09-01 at 22:49

@manawyrm something worth trying is dropping the 5V supply voltage for the Pi down to 4V. the limited schematics available show the board using PAM2306 and RT8088 buck regulators to derive the 3.3V and VDD_CORE voltage rails. the PAM2306 goes from 60% efficient at 5V up to 85% efficient at 4V in, which is a major increase. the RT8088 gains a couple of percent efficiency by dropping to 4V too.
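The effect of those efficiency figures can be sketched with a little arithmetic. This is only an illustration: the 60% and 85% values are the ones quoted above for the PAM2306, while the 1 W load and the `input_power` helper are hypothetical. It covers that one regulator only, not the whole board.

```python
def input_power(load_w: float, efficiency: float) -> float:
    """Power drawn from the supply to deliver load_w at the given efficiency."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return load_w / efficiency


# Efficiencies quoted in the thread for the PAM2306 buck regulator.
EFF_AT_5V = 0.60
EFF_AT_4V = 0.85

# Hypothetical load on the 3.3V rail; the relative saving does not depend on it.
LOAD_W = 1.0

p_5v = input_power(LOAD_W, EFF_AT_5V)
p_4v = input_power(LOAD_W, EFF_AT_4V)

# Fraction of input power saved on this rail by dropping the supply to 4V.
saving = 1 - p_4v / p_5v
print(f"input power saved on this rail: {saving:.1%}")
```

Because the load cancels out, the saving on this rail is simply 1 - 0.60/0.85, a bit under 30%. The saving for the whole board depends on how the load splits across the regulators.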

=> More informations about this toot | More toots from gsuberland@chaos.social

Written by Manawyrm | Sarah on 2024-09-01 at 22:52

@gsuberland Uhhh! Interesting! That should be easy to test, thanks.

Not sure what the camera module thinks of this, but I‘ll give it a shot!

=> More informations about this toot | More toots from manawyrm@chaos.social

Written by Graham Sutherland / Polynomial on 2024-09-01 at 22:53

@manawyrm according to the schematic I'm looking at, the camera port runs 3.3V, so you'll be saving power there too.

https://datasheets.raspberrypi.com/rpizero2/raspberry-pi-zero-2-w-reduced-schematics.pdf

=> More informations about this toot | More toots from gsuberland@chaos.social

Written by Manawyrm | Sarah on 2024-09-01 at 22:53

@gsuberland Huh!! Thanks 😻

=> More informations about this toot | More toots from manawyrm@chaos.social

Written by Graham Sutherland / Polynomial on 2024-09-01 at 22:56

@manawyrm I don't know where else the 5V rails go on the board since I don't have the rest of the schematic, but based on how they've named the power nets it's a pretty reasonable guess that the only other usage of the 5V rail is for the USB ports, and since you're disabling those anyway you should be fine. the 5V_CORE is almost certainly a filtered power domain used only for the RT8088.

=> More informations about this toot | More toots from gsuberland@chaos.social

Written by Timon 🛠 on 2024-09-01 at 23:01

Yea, like 99% certain 5V only powers the regulators and the USB ports.

=> More informations about this toot | More toots from timonsku@mastodon.social

Written by Timon 🛠 on 2024-09-01 at 23:04

Ah, it's a Zero, yea then def. nothing critical. On Pi5 I'm not as certain with RP1.

HDMI also needs the 5V for the EDID but that will be fine at 4V too.

=> More informations about this toot | More toots from timonsku@mastodon.social

Written by Graham Sutherland / Polynomial on 2024-09-01 at 23:09

@timonsku EDID is also disabled here so it sounds like we're golden

=> More informations about this toot | More toots from gsuberland@chaos.social

Written by Graham Sutherland / Polynomial on 2024-09-01 at 23:08

@timonsku @manawyrm nice.

I think technically the efficiency peaks at about 3.6V but at that point you drop down into the lower range of the RT8088's current delivery capabilities, which might glitch the core voltage rail out during high load.

if it's not obviously unstable at 4V I'd maybe try 3.8V and stress test it. writing a script that goes from 0% load to 100% load and back repeatedly, alternating pure CPU and memory bound loads, is a great way to check for voltage regulation stability.
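A minimal sketch of such a load-cycling script, assuming plain Python on the Pi. The function names, the 16 MiB buffer size and the timings are all made up for illustration, and a single instance only loads one core, so one copy per core would be needed to reach full load.

```python
import time


def cpu_burn(seconds: float) -> int:
    """Spin in a tight loop until the deadline; returns the iteration count."""
    deadline = time.monotonic() + seconds
    iterations = 0
    while time.monotonic() < deadline:
        iterations += 1
    return iterations


def mem_burn(seconds: float, size: int = 16 * 1024 * 1024) -> int:
    """Copy a large buffer repeatedly until the deadline; returns the copy count."""
    buf = bytearray(size)
    deadline = time.monotonic() + seconds
    copies = 0
    while time.monotonic() < deadline:
        buf = bytearray(buf)  # full copy, mostly memory traffic
        copies += 1
    return copies


def run_cycles(cycles: int, busy_s: float, idle_s: float,
               mem_size: int = 16 * 1024 * 1024) -> list:
    """Alternate CPU-bound and memory-bound bursts, idling between them."""
    phases = []
    for i in range(cycles):
        if i % 2 == 0:
            cpu_burn(busy_s)
            phases.append("cpu_burn")
        else:
            mem_burn(busy_s, mem_size)
            phases.append("mem_burn")
        time.sleep(idle_s)  # drop back to (near) 0% load
    return phases
```

Calling something like `run_cycles(cycles=100, busy_s=2.0, idle_s=2.0)` while watching the supply and the kernel log would be one way to look for brown-outs. Those durations are guesses, not tested values.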

=> More informations about this toot | More toots from gsuberland@chaos.social

Written by Manawyrm | Sarah on 2024-09-01 at 23:13

@gsuberland @timonsku with the final device being outdoors in anything between -20°C and +70°C weather I‘m a bit concerned about any sorts of overclocking or operating outside of regular operating parameters. Efficiency is nice, but compromising reliability for that isn‘t worth it (at least in this application)

=> More informations about this toot | More toots from manawyrm@chaos.social

Written by Graham Sutherland / Polynomial on 2024-09-01 at 23:24

@manawyrm @timonsku away from my desktop at the moment but I can check the datasheets to see what the derating is on current delivery and efficiency. it should be fine at 4V though.

=> More informations about this toot | More toots from gsuberland@chaos.social

Written by Graham Sutherland / Polynomial on 2024-09-02 at 00:40

@manawyrm @timonsku just checked. they all look pretty stable in that temperature range. you get some switching frequency drift but that shouldn't be an issue.

=> More informations about this toot | More toots from gsuberland@chaos.social

Written by Graham Sutherland / Polynomial on 2024-09-02 at 00:48

@manawyrm @timonsku also since the efficiency is going up by a solid 15% you'll probably be seeing lower temps on the buck ICs anyway.

=> More informations about this toot | More toots from gsuberland@chaos.social

Written by Manawyrm | Sarah on 2024-09-02 at 07:32

@gsuberland Holy shit.

I just tried this and yes, you're totally right.

The switching regulator gets vastly more efficient at lower voltages.

I saved 20% total energy by reducing the input voltage down to 3.6V (updated blog post):

https://kittenlabs.de/blog/2024/09/01/extreme-pi-boot-optimization/#reducing-input-voltage

=> More informations about this toot | More toots from manawyrm@chaos.social

Written by thomask77 on 2024-09-02 at 07:48

@manawyrm @gsuberland would a zstd compressed kernel save some time?

=> More informations about this toot | More toots from thomask77@mastodon.gamedev.place

Written by Manawyrm | Sarah on 2024-09-02 at 07:50

@thomask77 @gsuberland Yes, saves a bit of time, but the decompression uses lots of energy.

Not the right tradeoff for my application (where total energy is king, not time).

SD cards are fast -- best option would probably be to write a home-grown mini bootloader and read an uncompressed kernel at full 50+ MByte/s.
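The time-versus-energy trade-off can be written down as a small model. This is a sketch under loud assumptions: the function and its parameters are hypothetical, and no measured numbers from the blog post are used. Only the 50 MByte/s read rate comes from the post above.

```python
def load_energy_j(image_mb: float, read_mb_per_s: float, read_power_w: float,
                  decompress_s: float = 0.0,
                  decompress_power_w: float = 0.0) -> float:
    """Energy in joules to read an image and, optionally, decompress it.

    Energy is power multiplied by time, summed over both stages.
    """
    read_s = image_mb / read_mb_per_s
    return read_s * read_power_w + decompress_s * decompress_power_w


# Hypothetical inputs, for illustration only (not measurements):
# a 10 MB uncompressed image read at 50 MB/s while the board draws 1 W.
uncompressed_j = load_energy_j(image_mb=10, read_mb_per_s=50, read_power_w=1.0)

# The same image compressed to 5 MB, plus 0.3 s of decompression at 1.5 W.
compressed_j = load_energy_j(image_mb=5, read_mb_per_s=50, read_power_w=1.0,
                             decompress_s=0.3, decompress_power_w=1.5)
```

With made-up inputs like these, the compressed image reads faster but costs more energy overall. Whether that holds on real hardware depends entirely on the measured durations and power draw.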

=> More informations about this toot | More toots from manawyrm@chaos.social

Written by Z̈oé ⛵ on 2024-09-02 at 08:05

@manawyrm @thomask77 @gsuberland I see you’ve benchmarked gzip‘d kernel and initramfs but zstd is optimized for fast decompression, so it can be a lot faster, and I wonder if it still uses more energy (and which compression setting is optimal)

especially if the video core loads the kernel and that step is slow

=> More informations about this toot | More toots from uint8_t@chaos.social

Written by Paul on 2024-09-01 at 23:27

@manawyrm Thank you so much for writing and sharing that - absolutely fascinating.

I had already turned off the LED and carefully set the temperature at which the active cooling kicks in, but there are other tips there that I am keen to try.

Thank you again for your work, really appreciate it.

=> More informations about this toot | More toots from plwt@mstdn.social

Written by jonny (good kind) on 2024-09-01 at 23:31

@manawyrm

That's sick as hell, thanks for sharing the process :)

=> More informations about this toot | More toots from jonny@neuromatch.social

Written by Tammi 🥴🐈‍⬛ on 2024-09-02 at 00:35

@manawyrm nice ^-^

=> More informations about this toot | More toots from tamtararam@chaos.social

Written by Tammi 🥴🐈‍⬛ on 2024-09-02 at 00:37

@manawyrm the hardware setup is especially novel to me. seems pretty cool that you can automate it this way ^-^

=> More informations about this toot | More toots from tamtararam@chaos.social

Written by Z̈oé ⛵ on 2024-09-02 at 07:57

@manawyrm elsewhere I don’t think I’ve seen much use of watt-seconds as unit (it’s J)

=> More informations about this toot | More toots from uint8_t@chaos.social

Written by Manawyrm | Sarah on 2024-09-02 at 07:59

@uint8_t I must admit, my brain can deal with Ws, Wh, mA, mAs, mAh much better than mC and J (even though those are technically more correct, I guess)

But that's just personal preference after all :)

=> More informations about this toot | More toots from manawyrm@chaos.social

Written by Z̈oé ⛵ on 2024-09-02 at 08:00

@manawyrm watt is joules per second, so watt-second is joules per second-second

=> More informations about this toot | More toots from uint8_t@chaos.social

Written by Manawyrm | Sarah on 2024-09-02 at 08:00

@uint8_t be thankful I don't start using pirate-ninjas :P

=> More informations about this toot | More toots from manawyrm@chaos.social

Written by sebastian on 2024-09-02 at 09:43

@manawyrm @uint8_t So 1J = 0.000239006 kcal. A 500ml bottle of clubmate has 20 kcal (according to $random foodwebsite). So 1J = 83679.908 club mate bottles energy equivalence units.

=> More informations about this toot | More toots from sebastian@schottkydio.de

Written by Manawyrm | Sarah on 2024-09-02 at 09:45

@sebastian @uint8_t that doesn't sound quite right.

500ml Club Mate has 585 kJ of energy.

For comparison, a single 18650 with 3000mAh has about 40 kJ.

Are you volunteering to pedal a bike (while fed only Club Mate) on that mountain in winter days to recharge my battery? 😹

=> More informations about this toot | More toots from manawyrm@chaos.social

Written by sebastian on 2024-09-02 at 10:06

@manawyrm @uint8_t

Yeah sorry I'm stupid (a bottle of mate might have prevented that one...)

https://www.foodrepo.org/en/products/1741

The photo says: 20 kcal or 84kJ.

I got my division wrong.

1 club mate bottles energy equivalence units = 83679.908J ~ 84kJ

=> More informations about this toot | More toots from sebastian@schottkydio.de

Written by Z̈oé ⛵ on 2024-09-02 at 13:03

@sebastian @manawyrm It’s 84kJ / 100ml so 420 kJ total

(you should really get a mate)

=> More informations about this toot | More toots from uint8_t@chaos.social

Written by Manawyrm | Sarah on 2024-09-02 at 13:10

@uint8_t @sebastian haha, this means Club Mate and modern 18650 have the same energy density by weight :)

500ml Mate -> 420kJ

10x 18650 (at 48g each) -> 416kJ

Fun :)
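The numbers in this sub-thread can be re-derived in a few lines. A sketch with two assumptions that are not in the thread: a nominal cell voltage of 3.7 V, and a 500 ml bottle weighing about 0.5 kg (contents only). The helper names are made up.

```python
KJ_PER_KCAL = 4.184  # 1 kcal = 4.184 kJ; also note 1 Ws = 1 J


def kcal_to_kj(kcal: float) -> float:
    """Convert kilocalories to kilojoules."""
    return kcal * KJ_PER_KCAL


def battery_kj(capacity_mah: float, nominal_v: float) -> float:
    """Battery energy in kJ: Ah * V = Wh, and 1 Wh = 3.6 kJ."""
    return capacity_mah / 1000 * nominal_v * 3.6


# 20 kcal per 100 ml, five times that for a 500 ml bottle: about 418 kJ.
mate_kj = kcal_to_kj(20) * 5

# One 3000 mAh 18650 at an ASSUMED 3.7 V nominal voltage: about 40 kJ.
cell_kj = battery_kj(3000, 3.7)

# Energy density by weight; 0.5 kg for the drink is an assumption,
# 48 g per cell is the figure from the thread.
mate_kj_per_kg = mate_kj / 0.5
cell_kj_per_kg = cell_kj / 0.048

print(f"mate: {mate_kj:.0f} kJ, {mate_kj_per_kg:.0f} kJ/kg")
print(f"cell: {cell_kj:.1f} kJ, {cell_kj_per_kg:.0f} kJ/kg")
```

Under those assumptions both come out a little above 830 kJ/kg, which is consistent with the comparison above.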

=> More informations about this toot | More toots from manawyrm@chaos.social

Written by Hugo 雨果 on 2024-09-02 at 08:15

@manawyrm Really interesting read.

=> More informations about this toot | More toots from whynothugo@fosstodon.org

Written by Daniel Bohrer on 2024-09-02 at 12:24

@manawyrm oh nice, that was exactly the setup and the aim of my master's thesis (8 years ago, with a sloooow RPi 1, and I didn't finish it, but I got my current job by sending patches related to it…)

=> More informations about this toot | More toots from daniel_bohrer@chaos.social

Written by Manawyrm | Sarah on 2024-09-02 at 12:31

@daniel_bohrer We would've almost been colleagues at your current job ;)

=> More informations about this toot | More toots from manawyrm@chaos.social

Written by Daniel Bohrer on 2024-09-02 at 12:33

@manawyrm ah yes, I remember now :D

=> More informations about this toot | More toots from daniel_bohrer@chaos.social

Written by Marcus Müller on 2024-09-02 at 15:57

@manawyrm having slept over this:

=> More informations about this toot | More toots from funkylab@mastodon.social

Written by Marcus Müller on 2024-09-02 at 16:02

@manawyrm … see LZ4 really being a significant sink of CPU cycles

=> More informations about this toot | More toots from funkylab@mastodon.social

Written by Marcus Müller on 2024-09-02 at 16:12

@manawyrm

=> More informations about this toot | More toots from funkylab@mastodon.social

Written by Marcus Müller on 2024-09-02 at 16:18

@manawyrm

=> More informations about this toot | More toots from funkylab@mastodon.social

Written by Marcus Müller on 2024-09-02 at 16:20

@manawyrm and then the much faster (main) CPU takes over, executes the small uboot SPL, which then initializes just enough things to be able to jump to linux.

=> More informations about this toot | More toots from funkylab@mastodon.social
