Breaking Ascon-128 with Differential Fault Analysis

Background

Last month, NIST announced[1] that Ascon will be the standard for lightweight crypto. Ascon is a tiny and fast family of algorithms which can perform authenticated encryption and hashing. Although I had been loosely following the lightweight crypto contest, I had not actually looked into the details of any of the entries. Although Ascon has a lot of impressive features, one of the first things that stood out to me as a hardware hacker was that AEAD encryption seems very susceptible to differential fault analysis.

Coincidentally, my ChipWhisperer Husky[2] arrived only a few days after I first looked into Ascon. The ChipWhisperer Husky is a brand new tool from NewAE for performing fault injection and side channel analysis attacks, and it comes with a lot of fancy features (most of which won't be covered in this post). Given that Ascon was fresh on my mind, I decided to try to break it with DFA. This post will be about that attack.

Fault Injection

Fault injection, or glitching, is a type of active attack on hardware devices where the attacker modifies the operating conditions of a system to cause it to misbehave. By very precisely controlling these changes, specific effects can be induced in the target. In this post, I am using voltage fault injection against code running on a microcontroller. There are many other forms of fault injection (electromagnetic, laser, thermal, clock), and these techniques can work on both hardware (e.g. crypto accelerators which are not implemented with code running on a cpu) and software targets. Using the ChipWhisperer Husky, I can short the victim processors power supply to ground through a MOSFET for a controlled amount of time. When this happens, the internals of the CPU will misbehave. Although it is impossible to directly measure exactly what happens internally, some examples of common effects are:

As a programmer, you can create some mitigations to help with some of these fault types. But as you can probably imagine, it is basically impossible to write a completely secure program in the presence of faults. High-security systems rely on a combination of hardware and software countermeasures to mitigate and detect fault attacks.

In order to perform fault injection attacks, we need:

With voltage fault injection on the Husky, we can vary two parameters:

If the glitch length is too long, we will always crash the target. If it's too short, we will never affect the target. If it's just right, we should see a mix of crashes, corrupted outputs, and (hopefully) successful glitches as we vary the glitch delay.

The glitch delay doesn't give us as much feedback for tuning it, but at a minimum we can usually observe data corruptions, try to guess what operation we affected, and tune based on that.

My approach in this case was to manually tune the glitch length down to a small range, set wide parameters for glitch delay, and brute force the search space. This isn't always practical if you have a large search space or a very slow operation, but it was good enough for this PoC.

Differential Fault Analysis

Differential fault analysis (DFA) is an active attack against implementations of cryptographic algorithms. The attack is performed by introducing faults during intermediate steps of the algorithm, and observing how the output changes. If an attacker can reliably corrupt the right data, it may be possible to calculate the key based on differences between the faulted and non-faulted/normal outputs.

In the lightweight crypto algorithm requirements[3], NIST states:

While implementations will not be required to protect against fault attacks, the ability to provide it easily and at low cost will be taken into consideration

Although DFA was probably considered in the design of Ascon, Ascon was not required to be secure in the presence of fault attacks.

To explain the attack we will be performing, lets look at how Ascon works. Ascon encryption is performed in four steps[4]:

The final step is the one that stood out to me. Injecting the key means xoring the key with the state. Specifically, the tag is exactly calculated as K ^ {state[3],state[4]}, where state is a 320-bit value arranged as 5 64-bit words. This is a juicy target for DFA because if we can inject a fault that can skip the xor operation, the key can be calculated as K=tag_normal ^ tag_faulted. Simple enough :)

I googled around to find out if anyone had published anything about DFA on Ascon before. The only attacks I saw were either impractical[5] or statistical (e.g. SIFA)[6][7]. From my experience, statistical attacks like SIFA are much less useful to a real-world attacker than classical DFA since SIFA requires significant control over the results of the glitch. I expect more techniques and implementation-specific attacks to show up over the next few years, but for now this is all I was able to find. I haven't seen any mentions of the basic DFA attack described in this post. If I missed any other prior work, please let me know!

Update: I found another similar attack[11]. This is a more complicated attack which relies on ineffective faults and biasing the s-box. That said, the authors point out that the key addition in the finalization stage is a weakness, and they describe methods to distinguish faulty tags, which is helpful for this attack.

First Proof-of-Concept

I decided to use the SAM4S target board that came with my ChipWhisperer Husky. This is a 32-bit ARM Cortex M4. I hacked together a SimpleSerial application with the pure C Ascon reference implementation[9]. I also added a trigger just before the xor operations, since my initial goal was a quick and dirty proof of concept. Since this is a 32-bit implementation, I expect to require 4 unique glitches to recover the full 128-bit key. I kept all inputs constant and performed a quick brute force of parameters. As this is a characterization experiment, I classified results by xoring the known-good tag with the corrupted tag, and comparing the results to the known key.

I performed a scan of parameters, and was able to recover they key.

$ python3 swascon_dfa_constn.py 
Connected...
(('corrupt', 5, 100, b'\x00\xd5\xc0H\xca\xd3\xa5f\x81v\xae\x1a\xca:\x00m<\x0e\x00 \xbf\x0c@\x000\x00\x00\x00\x10\x00\x00\x00'), b'\x06O\x01\xb1N\xecn\x83\xd9Ii\x84D\x13.\xc1')
(('corrupt', 5, 101, b'\x00\xd5\xc0H\xca\xd3\xa5f\x81v\xae\x1a\xca:\x00m<\x0e\x00 \xbf\x0c@\x000\x00\x00\x00\x10\x00\x00\x00'), b'\x06O\x01\xb1N\xecn\x83\xd9Ii\x84D\x13.\xc1')
(('key02', 5, 103, b'\x00\xd5\xc0H\xca\xd3\xa5f\x81v\xae\x1a\xca:\x00m:A\x01\x91\xf5\xe5(\x84\xe9Ii\x84T\x13.\xc1'), b'\x00\x00\x00\x00\x04\x05\x06\x07\x00\x00\x00\x00\x00\x00\x00\x00')
(('key01', 5, 104, b'\x00\xd5\xc0H\xca\xd3\xa5f\x81v\xae\x1a\xca:\x00m:@\x03\x92\xf1\xe0.\x83\xe9Ii\x84T\x13.\xc1'), b'\x00\x01\x02\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00')
(('corrupt', 5, 106, b'\x00\xd5\xc0H\xca\xd3\xa5f\x81v\xae\x1a\xca:\x00m:A\x01\x91\xf1\xe0.\x83\xe9Ii\x84b_!_'), b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x006L\x0f\x9e')
(('corrupt', 5, 107, b'\x00\xd5\xc0H\xca\xd3\xa5f\x81v\xae\x1a\xca:\x00m:A\x01\x91\xf1\xe0.\x83\xe9Ii\x84\x00\x00\x00\x00'), b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00T\x13.\xc1')
(('corrupt', 5, 108, b'\x00\xd5\xc0H\xca\xd3\xa5f\x81v\xae\x1a\xca:\x00m:A\x01\x91\xf1\xe0.\x83P\x17*\xc5T\x13.\xc1'), b'\x00\x00\x00\x00\x00\x00\x00\x00\xb9^CA\x00\x00\x00\x00')
(('corrupt', 5, 112, b'\x00\xd5\xc0H\xca\xd3\xa5f\x81v\xae\x1a\xca:\x00m:A\x01\x91\xf1\xe0.\x83\x00\x00\x00\x00T\x13.\xc1'), b'\x00\x00\x00\x00\x00\x00\x00\x00\xe9Ii\x84\x00\x00\x00\x00')
(('key04', 5, 117, b'\x00\xd5\xc0H\xca\xd3\xa5f\x81v\xae\x1a\xca:\x00m:A\x01\x91\xf1\xe0.\x83\xe9Ii\x84X\x1e \xce'), b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0c\r\x0e\x0f')
(('key03', 5, 118, b'\x00\xd5\xc0H\xca\xd3\xa5f\x81v\xae\x1a\xca:\x00m:A\x01\x91\xf1\xe0.\x83\xe1@c\x8fT\x13.\xc1'), b'\x00\x00\x00\x00\x00\x00\x00\x00\x08\t\n\x0b\x00\x00\x00\x00')
(('corrupt', 5, 120, b'\x00\xd5\xc0H\xca\xd3\xa5f\x81v\xae\x1a\xca:\x00m:A\x01\x91\xe9Ii\x84\xe9Ii\x84\x00\x00\x00\x00'), b'\x00\x00\x00\x00\x18\xa9G\x07\x00\x00\x00\x00T\x13.\xc1')

The output is a bit messy, but the important parts are the 2nd and 3rd fields. The glitch length was 5 cycles, and the delay for successful results was 103, 104, 117, or 118 cycles.

=> You can download the script here.

Double Glitch

That was a good proof of concept, but the Ascon V1.2 paper[10] notes that:

In order to fulfill the security claims[...], implementations must ensure that the nonce (public message number) is never repeated for two encryptions under the same key, and that decrypted plaintexts are only released after successful verification of the final tag

Thus, the implementaton where an attacker can control the nonce is already insecure. In fact, a practical adaptive chosen plaintext attack based on nonce misuse has already been published[8].

Although there is no specific recommendation for nonce generation, the specification notes that:

Except for the single-use requirement, there are no constraints on the choice of the nonce (public message number); in particular, it is possible to use a simple counter.

Based on this, I modified my implementation to use a global counter approach. A common approach in embedded devices is to avoid having to persistently store (and secure) the nonce across reboots by randomizing part of it, and using the other part as a counter. In my implementation, the first 64 bits will be randomized and the remaining 64 bits will be a counter which is initialized to 0 when the device boots. In a real device, you might see something like:

device_init(){
    ...
    uint64_t nonce[2];
    nonce[0]=64bit_trng();
    nonce[1]=0;
    ...
}

do_encryption(...){
    do_ascon_aead(nonce,...);
    nonce[1]++;
}

This is sufficient to prevent the cryptanalytic nonce-misuse attack in the real world.

In order to recover a single word of the key, we need the same nonce to be reused one time. So, our attack flow will look like:

  1. Perform one encryption with a fault

  1. glitch the counter increment operation to prevent it from updating (or, depending on the fault effects, fault it to a specific value such as 0)

  1. Perform a second encryption with the same inputs

  1. xor the tag values together to recover a word of the key

This requires two glitches, which means we have a lower chance of success per attempt. Luckily, the SAM4S was very easy to glitch, with a success rate of ~90% for the single-glitch attack. Based on that, we can probably expect a >80% success rate with the double-glitch attack.

I first tried to perform a single glitch to fault the increment so I could find the timing correct and... It didn't work. Instead of skipping the increment, I was reliably incrementing by 2. This makes me suspect that the actual fault model that I'm triggering here (and possibly in the previous test) is causing instructions to be re-executed (perhaps by corrupting the program counter update). Although this is a fault model that I have seen before, my experience is that it is usually rare and comes alongside many of the other more common fault types mentioned above.

I worked around this by refactoring my code slightly. By forcing the system to load an increment value from memory, then adding that to the nonce counter instead of performing a addition with an immediate value, I found that I could reliably skip the nonce increment. I'm most likely faulting the load operation rather than the increment itself, but at the end of the day the outcome is the same. I identified the required timing and was able to try the double-glitch attack.

After implementing the double-glitch, I was able to successfully extract the key very quickly and reliably.

$ python3 swascon_dfa.py 
Connected...
00010203000000000000000000000000
00010203040506070000000000000000
000102030405060708090a0b00000000
000102030405060708090a0b0c0d0e0f
Key recovered: 000102030405060708090a0b0c0d0e0f

=> You can download the double-glitch script here.

But what about decryption?

The tag calculation makes targeting encryption very easy and fairly obvious why it works. With decryption, its not as immediately clear that DFA is even possible. After staring at it, the cipher looks like it essentially constructs a keystream out of hash operations. I don't think we can invert the hash operations, and even if we could, we probably wouldn't easily be able to get the full key out without a lot of extra work. A minimum of 2 glitches would be required if the tag value is checked before returning the plaintext, but the best attack that I might be able to perform on a software implementation would require two triple-glitches.

One attack would look like:

This attack is probably not applicable to hardware implementations, and probably not feasible in most real-world software implementations. That said, I will probably attempt something like this attack at some point since the SAM4S was so susceptible to double-glitches.

If anyone has more ideas for performing this attack with fewer glitches, I'd love to hear them. A practical DFA attack on decryption would probably be more interesting than this encryption attack, since I find attacks on decryption (eg firmware loading) to be more common in the real world.

Update: Single-Fault Method

This attack assumes the following:

Under these assumptions, the attack works as follows:

This attack replaces the fault on the nonce increment with a brute force step.

The total complexity of the attack is 128/wordsize faults and (128/wordsize) * 2^wordsize decryptions. On my target, the 4*2^32 brute force will be pretty impractical since the serial interface is 38400 baud. Other 32-bit implementations might make this attack practical. and 8- or 16-bit implementations should be very attackable with this method. 64-bit implementations probably won't be vulnerable to this method in practice.

This attack was partially inspired by [11], which mentioned using decryption as an oracle for correct tags.

Files

=> double-glitch script
=> single-glitch script
=> firwmare

References

=> [1] https://csrc.nist.gov/News/2023/lightweight-cryptography-nist-selects-ascon
=> [2] https://www.crowdsupply.com/newae/chipwhisperer-husky
=> [3] https://csrc.nist.gov/CSRC/media/Projects/Lightweight-Cryptography/documents/final-lwc-submission-requirements-august2018.pdf
=> [4] https://ascon.iaik.tugraz.at/specification.html
=> [5] https://eprint.iacr.org/2019/1370.pdf
=> [6] https://doi.org/10.1109/HST.2019.8741029
=> [7] https://doi.org/10.1007/978-3-030-16350-1_5
=> [8] https://csrc.nist.gov/csrc/media/Events/2022/lightweight-cryptography-workshop-2022/documents/papers/practical-cube-attack-against-nonce-isused-ascon.pdf
=> [9] https://github.com/ascon/ascon-c/tree/main/crypto_aead/ascon128v12/ref
=> [10] https://ascon.iaik.tugraz.at/files/asconv12-nist.pdf
=> [11] https://csrc.nist.gov/CSRC/media/Events/lightweight-cryptography-workshop-2020/documents/papers/active-passive-recovery-attacks-ascon-lwc2020.pdf

Proxy Information
Original URL
gemini://gem.hacking.rip/ascon.gmi
Status Code
Success (20)
Meta
text/gemini;lang=en-US
Capsule Response Time
353.161696 milliseconds
Gemini-to-HTML Time
2.542799 milliseconds

This content has been proxied by September (ba2dc).