Yesterday I finally solved the third hardest discrete bug I’ve worked on in my career. Took about 3.5 weeks of work. At the end, I spent 10 consecutive days doing nothing but investigating this bug, all day, every day. I then took a day off to pick up my folks in Pennsylvania because they wrecked their car (everyone is fine, just car damage). Came back, and yesterday I finally found it.
1/
=> More informations about this toot | More toots from cocoaphony@mastodon.social
The bug was in Android code that worked fine on arm64 and x86, but crashed on x86_64. Heisenbug: adding logs would change its behavior. All the relevant bits were deep in C++ code, 3 layers of modules away from the Java, so using a debugger was basically impossible. Just logs.
Different modules have to be built on different machines with different build systems and assembled by hacking .so files into the APK by hand. And the results had to be tested on a Windows box for x86_64.
2/
Logs indicated that on x86_64, even incredibly simple function calls would corrupt their parameters. Local, constant strings would log as garbage, and sometimes just logging them would crash the app.
I figured it was a compiler setting mismatch. Maybe something like Microsoft parameter passing conventions, though clearly it wasn’t that. Checked for weird 64/32-bit mismatches. Maybe an NDK mismatch. Found a mismatch on the version of Android targeted, but that wasn’t it.
3/
Finally realized the Heisenbug nature. Code that worked stopped working when I added logs.
Tried calling __android_log_print directly rather than using the LOGE macro and the corruption went away.
It was the logger the entire time.
The default Log() function that came with the 3rdparty code passed a va_list to printf(). It had been modified to also pass the same va_list to android_log_print. Reusing a va_list after another function has consumed it is not legal (it's undefined behavior). On x86_64 it led to crashes.
Quick fix.
/fin
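A minimal sketch of the pattern described above, not the actual patch. The function name log_both and the two buffers are hypothetical stand-ins for the console and Android log sinks, and vsnprintf stands in for whatever va_list-taking calls the real logger used. A plausible reason the bug only showed on x86_64: there a va_list is an array type that decays to a pointer when passed, so the first consumer advances the caller's state, while on 32-bit x86 and arm64 the callee typically works on a copy. The portable fix is to give each consumer its own va_list with va_copy.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the two log sinks (console and Android log). */
static char console_buf[128];
static char android_buf[128];

/* Hypothetical logger that formats the same arguments for two sinks. */
static void log_both(const char *fmt, ...)
{
    va_list args;
    va_list args_copy;

    va_start(args, fmt);
    va_copy(args_copy, args);  /* independent copy for the second consumer */

    /* First consumer: after this call, `args` is indeterminate. */
    vsnprintf(console_buf, sizeof console_buf, fmt, args);

    /* The buggy version would pass `args` again here, which is undefined
       behavior. Using the copy keeps both calls well-defined. */
    vsnprintf(android_buf, sizeof android_buf, fmt, args_copy);

    va_end(args_copy);
    va_end(args);
}
```

Calling log_both("%s:%d", "x86_64", 42) fills both buffers with the same text on every architecture, because neither call depends on what the other did to its va_list.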
@cocoaphony Jesus fucking Christ.
=> More informations about this toot | More toots from Migueldeicaza@mastodon.social
@cocoaphony good catch!
=> More informations about this toot | More toots from madewulf@mastodon.social
@cocoaphony Great description! What did it tell you about the testing of that 3rd party code?
=> More informations about this toot | More toots from jgordon@appdot.net
@jgordon well, the 3rdparty code itself was fine. The bug was in a patch we wrote to make it log on Android. And it does work, but it’s undefined behavior. And that undefined behavior doesn’t work on one processor that’s very rarely used in phones. So it originally was reported as an Android Auto bug. Took a long time to realize it was the architecture not the OS.
=> More informations about this toot | More toots from cocoaphony@mastodon.social
@cocoaphony woooow!
=> More informations about this toot | More toots from bigzaphod@mastodon.social
@cocoaphony DID YOU TRY UNPLUGGING IT AND PLUGGING IT AGAIN CONGRATS
=> More informations about this toot | More toots from chockenberry@mastodon.social