Belgian Cyber Security Challenge 2024: Additional problems

9 March 2024 at 00:00

Title: Additional Problems

Category: cryptography

Points: 500

Solves: 2

Description:

Version 1.0 of our new encryption service has just launched! It is blazingly fast and uses state-of-the-art encryption. Stay tuned for version 2.0; I hear it will bring tons of improvements and security fixes.

Download challenge files

Access the server via nc IP_ADDRESS PORT (the server is now down, but you can run it yourself)

Introduction

This challenge proved to be among the hardest challenges at CSCBE, with only one team submitting a flag. The teams at Zeus WPI ended up in 2nd, 9th (virtual, since winners are not allowed to participate again), 52nd, 83rd and 104th place at the online qualifiers.

Description

The challenge provides a Python file, which runs a server accessible through a TCP socket. The server implements a form of homomorphic encryption: with homomorphic encryption, addition of two ciphertexts results in a ciphertext that decrypts to the addition of the two plaintexts. Here, the encryption was implemented with a secret 128-bit prime p, and an N between 128 and 255. The cryptosystem works as follows:

#!python
def dghv_encrypt(p, N, m):
    """
    Encrypt a value to later decrypt with `dghv_decrypt`
    """
    assert 2**7 <= N < 2**8 # Normally this is 2, but by using a bigger `N` we can encode ASCII bytes instead of bits! That's much more efficient. All `N` in this range should be secure, so let's make it an assertion

    q = reduce(lambda x, y: x*y, [Crypto.Util.number.getPrime(128) for _ in range(8)]) # `q` can be any number, but as we all know, big primes are the safest numbers there are
    rmax = 2**128 / N / 4
    r = random.randint(0, rmax) # In v2.0, we will let `r` be negative as well as positive => double the randomness!
    return p*q + N*r + m

def dghv_decrypt(p, N, c):
    """
    Since c = pq + Nr + m, we can find m as (c mod p) mod N!
    """
    return (c % p) % N

def dghv_add(c1, c2):
    """
    The sum of ciphertexts decodes to the sum of plaintexts!!!
    """
    return c1 + c2 # We will add bootstrapping to make this fully homomorphic in v2.0

Encryption, decryption and addition all work at byte granularity: every byte is encrypted separately.

When first connecting, the server generates a new p, encrypts the flag with it and prints the encrypted flag. It then asks for an N value and a (hexadecimal) plaintext that will be used in your first session.

It then provides a menu where you have three options:

  • Start a new session, where you need to give an N value and a plaintext; it will use dghv_encrypt to set the ciphertext
  • Add an additional plaintext to the current session: it will encrypt that plaintext with dghv_encrypt, then use dghv_add to add it to the current ciphertext
  • Decrypt the ciphertext: it returns the result of dghv_decrypt on the current ciphertext

Note that other than the initial ciphertext printed at startup, we never have access to any ciphertexts.

Vulnerability

If you add too many plaintexts, the accumulated N*r terms become greater than p, so the decryption step (c % p) % N no longer works correctly. Suppose you always send zero-filled plaintexts and the result eventually decrypts to a value x != 0; then we have:


(p*q + N*r + (m_0 + m_1 + m_... + m_n)) % p % N == x

because all message bytes m_n are 0, we have

(p*q + N*r) % p % N == x

now, let N*r > p, but smaller than 2*p (we can be sure of this at the
first nonzero decryption, since r increases by at most 2**128 / N / 4
per encryption, so N*r grows by at most 2**126 per step, and p is a
128-bit prime), then we have

N*r = p + a.

By substituting N*r for p + a

(p*q + p + a) % p % N == x

which is then equal to

a % p % N == x

and thus, since a < p,

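a % N == x.

Because N*r is a multiple of N and N*r = p + a, we have (p + a) % N == 0,
so p % N == (-a) % N == (-x) % N, and since 0 < x < N,

p % N == N - x.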

We can then iterate through all prime values of N between 128 and 255 to get each value of p % N_i; once we have these values, we can use the Chinese Remainder Theorem to calculate p (since the product of all those primes is larger than 2**128, p is uniquely determined).
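The argument above can be checked offline, without the server. The sketch below is a simulation under stated assumptions: `SECRET_P` is a fixed stand-in for the server's secret prime, `leak_residue` and `crt` are hypothetical helper names, and only the noise term N*r is modelled, since p*q vanishes modulo p and all plaintext bytes are zero.

```python
import random

# Stand-in for the server's secret p (assumption: the real p is unknown).
# Any 128-bit value that is not divisible by the chosen N values behaves
# the same way in this simulation.
SECRET_P = 2**128 - 159

PRIMES = [131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191,
          193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251]


def leak_residue(p, N, rng):
    """Keep adding zero-byte encryptions until decryption turns nonzero.

    Returns N - x, which equals p % N by the derivation above.
    """
    rmax = 2**128 // N // 4
    noise = 0  # running sum of N*r over all additions
    while True:
        noise += N * rng.randint(0, rmax)
        x = (noise % p) % N  # what the "decrypt" menu option would report
        if x != 0:
            return N - x


def crt(moduli, residues):
    """Solve x = r_i (mod m_i) for pairwise coprime moduli m_i."""
    prod = 1
    for m in moduli:
        prod *= m
    total = 0
    for m, r in zip(moduli, residues):
        partial = prod // m
        total += r * partial * pow(partial, -1, m)  # needs Python 3.8+
    return total % prod


rng = random.Random(0)
residues = [leak_residue(SECRET_P, N, rng) for N in PRIMES]
recovered = crt(PRIMES, residues)
```

Against the real server the residues come from the menu interaction instead of `leak_residue`; the CRT step is the same.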

Solution script

This script was written in the heat of the moment, so it’s not the cleanest, but it works. It uses the excellent pwntools library.

#!python
from pwn import remote

conn = remote('additional_problems.challenges.cybersecuritychallenge.be', 1340)

def dghv_decrypt(p, N, c):
    """
    Since c = pq + Nr + m, we can find m as (c mod p) mod N!
    """
    return (c % p) % N


def add(conn, msg):
    print('add')
    conn.sendline(b'2')
    conn.sendline(msg)
    conn.recvuntil(b'> ')

def get(conn):
    print('get')
    conn.sendline(b'3')
    lines = conn.recvuntil(b'> ')
    hexed = [int(a, 16) for a in lines.decode().split('\n')[0].split(': ')[1].split(' ')]
    return hexed

def new(conn, N, msg):
    print('new')
    conn.sendline(b'1')
    conn.recvuntil(b'Choose N: ')
    conn.sendline(str(N).encode())
    conn.recvuntil(b'Message to encode (converted to hexadecimal): ')
    conn.sendline(msg)
    conn.recvuntil(b'> ')

conn.recvuntil(b"We've even encrypted our secret flag with it:\n")

encrypted = conn.recvuntil(b'\n\n').decode().strip()

# only in modified testserver
# p, encrypted = encrypted.split('\n', maxsplit=1)
# p = int(p.strip())

ciphertexts = [int(e.strip()) for e in encrypted.split('\n')]

# skip initial setup
conn.sendline(b'128')
conn.recvuntil(b'Message to encode (converted to hexadecimal): ')
conn.sendline((' '.join('00' for _ in range(30))).encode())
conn.recvuntil(b'> ')

# real shit
all_zeroes = (' '.join('00' for _ in range(10))).encode()

primes = [131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251]

mapped = {}


from functools import reduce
def chinese_remainder(m, a):
    """Solve x = a_i (mod m_i) for pairwise coprime moduli m_i."""
    total = 0
    prod = reduce(lambda acc, b: acc*b, m)
    for n_i, a_i in zip(m, a):
        p = prod // n_i
        total += a_i * mul_inv(p, n_i) * p
    return total % prod
 
def mul_inv(a, b):
    b0 = b
    x0, x1 = 0, 1
    if b == 1: return 1
    while a > 1:
        q = a // b
        a, b = b, a%b
        x0, x1 = x1 - q * x0, x0
    if x1 < 0: x1 += b0
    return x1

for prime_idx, prime in enumerate(primes):
    new(conn, prime, all_zeroes)
    for i in range(32):
        add(conn, all_zeroes)
        received = get(conn)
        reduced = [d for d in received if d != 0]
        if reduced:
            mapped[prime] = prime - reduced[0]
            # only in modified testserver
            # assert (p % prime == mapped[prime])
            # print("OK!")
            break
        print(f'{prime_idx} of {len(primes)} {i=}')


# residues[i] == p % primes[i]
residues = [mapped[prime] for prime in primes]
recovered = chinese_remainder(primes, residues)

print('FLAG=')
for cipher in ciphertexts:
    print(chr(dghv_decrypt(recovered, 128, cipher)), end='')

print('')

Unveiling secrets of the ESP32 part 2: reverse engineering RX

7 December 2023 at 00:00

This is the second article in a series about reverse engineering the ESP32 Wi-Fi networking stack, with the goal of building our own open-source MAC layer. In the previous article in this series, we built static and dynamic analysis tools for reverse engineering. We also started reverse engineering the transmit path of sending packets, and concluded with a rough roadmap and a call for contributors.

In this part, we’ll continue reverse engineering, starting with the ‘receiving packets’ functionality: last time, we successfully transmitted packets. The goal of this part is to have both transmitting and receiving working. To prove that our setup is working, we’ll try to connect to an access point and send some UDP packets to a computer also connected to the network.

Receive functionality

As a short recap, the transmit functionality worked as follows:

  1. Put the packet you want to transmit in memory
  2. Create a DMA (direct memory access) struct. This struct contains:
    • the address of the packet you want to transmit
    • the length and size of the packet (I haven’t entirely figured out the difference, but one always seems to be 32 bigger than the other one)
    • the address of the next packet (we set this to NULL to transmit a single packet)
  3. Write to some other memory-mapped peripherals to configure the settings for the packet you’re about to transmit
  4. Write the address of the DMA struct to a memory mapped IO address
  5. The hardware then automatically reads the DMA struct, and transmits the packet
  6. After this is done, interrupt 0 will fire, telling us how successful the transmission was

The receive functionality seems to use the same DMA struct, but in a slightly different way:

  1. Set up a linked list of DMA structs, where the next field of the struct points to the next DMA struct in the linked list. The final DMA struct points to NULL. Every address field points to a buffer, and the length and size fields are set to the size of the buffer.
  2. Write the address of the first DMA struct to a memory mapped IO address (WIFI_BASE_RX_DSCR). Now the setup is done, and we can receive packets.
  3. When a packet is received by the hardware, it will put the packet into the address of the first available DMA struct. The length field will indicate the length of the packet; the size field will not be updated. The has_data field will be set to 1.
  4. Interrupt 0 will fire to notify the processor that a packet was received. This interrupt will notify a non-interrupt task that a packet was received. We should avoid doing much processing in the interrupt handler, since we want to return as quickly as possible.
  5. Outside of the interrupt, we can then look at the linked list of DMA structs to see which ones have their has_data bit set. The address buffers can then be passed up further in the Wi-Fi MAC stack. We want to avoid running out of DMA structs to receive packets into, so we have to extend the linked list. We could do it by just allocating a new DMA struct and space for a packet and putting it at the end of the DMA linked list, but this constant allocating and deallocating would be rather inefficient. Instead, we recycle existing DMA structs by resetting their fields and inserting them at the end of the linked list.

Practicalities

Now we have a basic way to receive packets, but when we implemented this, no packets were received. This was likely because of the hardware MAC address filters. If you are a Wi-Fi device, there are a lot of packets flying through the air that you’re not interested in. For example, if you’re a station (such as a phone) connected to an access point, you don’t really care about the packets other access points are sending to their stations. To avoid the overhead of also having to process ‘uninteresting’ packets, most Wi-Fi devices have a hardware filter where you can set the MAC addresses of packets you want to receive. The hardware will then filter out the packets with different MAC addresses, and will only forward packets with matching MAC addresses to the software.

The ESP32 also seems to have this implemented, but luckily for us, the ESP32 also implements a sort of monitor mode (also known as promiscuous mode), where every packet that is received by the hardware is passed to the software. The ESP32 SDK has a call esp_wifi_set_promiscuous(bool) where you can enable or disable this feature. When we enabled this, we did start to receive packets. We’ll eventually reverse engineer and implement hardware MAC address filtering as well, but for now, we’ll just filter in software.

Connecting to an access point

Now that we have send and receive working, you’d think that we’d be able to connect to an access point and start sending packets, right? Well, not entirely: since this is such a big project, we only implemented the bare minimum to proceed in every phase. This is the same approach Ladybird takes to build a novel browser:

If you tried to build a browser one spec at a time, or even one feature at a time, you’d most likely run out of steam and lose interest altogether. So instead of that, we tend to focus on building “vertical slices” of functionality. This means setting practical, cross-cutting goals, such as “let’s get twitter.com/awesomekling to load”, “let’s get login working on discord.com”, and other similar objectives.

This approach is very motivating, but sometimes bites you in the ass when you have to figure out why something is not working.

Step 1: using Scapy

Before we take on connecting the ESP32 to an access point, we’ll first connect a regular USB Wi-Fi dongle to an access point by constructing and sending the packets ourselves. This makes sure we understand everything that’s needed, and gives us a known-working reference implementation. We found this blog post about using Scapy, a Python packet manipulation library, for connecting to an open access point. We need 4 packets to set up the connection:

  1. Authentication, from client to AP
  2. Authentication, from AP to client
  3. Association request, from client to AP
  4. Association response, from AP to client

After that, if everything has gone well, we can send data frames from the client to the access point and they’ll get accepted. We extended the blog post code a bit to also send data frames at the end of the connection setup, and verified that everything was working. For the data frames, we used UDP packets, because we can just construct the packet once, and then keep sending it; UDP is stateless, unlike TCP.
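For reference, the first and third packets are small enough to assemble by hand. The sketch below builds an open-system authentication frame and a minimal association request with Python's `struct` module instead of Scapy. The MAC addresses, the SSID and the supported-rates list are made-up placeholders, and the helper names are hypothetical. It only constructs the bytes (without a radiotap header or FCS) and does not send anything.

```python
import struct

AUTH, ASSOC_REQ = 11, 0  # 802.11 management frame subtypes


def mac(text: str) -> bytes:
    return bytes.fromhex(text.replace(":", ""))


def mgmt_header(subtype: int, dst: bytes, src: bytes, bssid: bytes, seq: int = 0) -> bytes:
    """24-byte management header: frame control, duration, 3 addresses, sequence control."""
    frame_control = subtype << 4  # protocol version 0, type 0 (management)
    return struct.pack("<BBH6s6s6sH", frame_control, 0, 0, dst, src, bssid, seq << 4)


def auth_frame(client: bytes, ap: bytes) -> bytes:
    # algorithm 0 (open system), transaction sequence 1, status 0
    return mgmt_header(AUTH, ap, client, ap) + struct.pack("<HHH", 0, 1, 0)


def assoc_request(client: bytes, ap: bytes, ssid: bytes) -> bytes:
    rates = bytes([0x82, 0x84, 0x8B, 0x96])  # 1, 2, 5.5, 11 Mbit/s (basic rates)
    body = struct.pack("<HH", 0x0001, 10)    # capabilities (ESS), listen interval
    body += bytes([0, len(ssid)]) + ssid     # SSID element
    body += bytes([1, len(rates)]) + rates   # supported rates element
    return mgmt_header(ASSOC_REQ, ap, client, ap, seq=1) + body


CLIENT = mac("00:23:45:67:89:ab")  # placeholder station address
AP = mac("02:00:00:00:00:01")      # placeholder access point address
auth = auth_frame(CLIENT, AP)
assoc = assoc_request(CLIENT, AP, b"example-ssid")
```

Packets 2 and 4 come from the access point, so the client only has to recognise them, not build them.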

Step 2: using the ESP32

We implemented this on the ESP32, by copying the packets from Scapy and hardcoding the packet contents in the C source code. To make sure we could discern the ESP32 from the Scapy implementation, we replaced the MAC address of the adapter we use for testing with an arbitrary MAC address (01:23:45:67:89:ab). When we then sent the packets, we saw that we received an ACK frame in response to our authentication, but we didn’t receive an authentication answer back from the AP. Even stranger, the ACK was towards a different MAC address: 00:23:45:67:89:ab.

Apparently, MAC addresses aren’t just 6 arbitrary bytes with the first 3 bytes being vendor specific: the least significant bit of the first byte indicates whether the address is unicast or multicast. By using the 01:... MAC address, we had sent multicast packets instead of unicast packets.
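A quick way to avoid this mistake is to test that bit before picking an address. The helper below is a minimal sketch; the function name is hypothetical.

```python
def is_multicast(mac: str) -> bool:
    """True if the least significant bit of the first byte is set."""
    first_byte = int(mac.split(":")[0], 16)
    return bool(first_byte & 0x01)


ours = is_multicast("01:23:45:67:89:ab")    # the address we picked
theirs = is_multicast("00:23:45:67:89:ab")  # the address the ACK was sent to
```

Here `ours` is `True` and `theirs` is `False`: the two addresses differ only in that single bit.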

After fixing this by using a different MAC address, we started to receive frames back from the access point. Because we didn’t implement sending ACKs back, we received every frame from the access point 4 times: since the access point didn’t receive any ACKs back, it would assume the packet was not received correctly. At that point, that wasn’t a problem: the AP would happily proceed with association request and response.

However, when we started to send data packets, we immediately started to receive disassociation frames from the AP in reply to our data packets. The only difference between the (working) Scapy implementation and the current ESP32 implementation was not sending ACKs back; so I guess implementing that is necessary after all.

Sending ACK frames back in software is not as easy as it seems though: the ACK frame needs to be sent exactly one SIFS (Short Interframe Space) time period after the last symbol of the received frame. For 802.11b, such a SIFS is only 10 microseconds; the round-trip-time through the hardware and software is already more than 10 us, so we can’t implement this in software. The proprietary network stack does send ACK frames back, so this must be implemented somehow. And indeed, sending ACKs is implemented in hardware: by writing to a memory-mapped IO address, you can configure a MAC address for which the hardware will automatically send back an ACK.

After also implementing this, we received our first packets on the computer that had netcat listening for UDP packets 🎉

First successfully received data packets sent by ESP32

Since we now implement the interrupt ourselves, we can send and receive frames, without any proprietary code running (proprietary code is still used to initialize the hardware at the beginning, but is not needed anymore after that).

The current way of hardcoding the contents of packets was appropriate for the proof-of-concept showing that we can connect to an AP and send packets, but is not useable for our eventual goal. We’re searching for an open source implementation that handles the higher level functionality of the 802.11 MAC layer (constructing and parsing packets, knowing what packets to send when, …). For the higher layers, we can use the existing lwIP TCP/IP-stack on the ESP32.

All code is available on the esp32-open-mac GitHub organisation.

Roadmap

  • ☑ Send packets
  • ☑ Receive packets
  • ☑ Send ACK (acknowledgment) packets back if we receive a packet that is destined for us
  • ☑ Implement hardware filtering based on MAC address so we don’t receive as many packets
  • ☐ Find or build an open source 802.11 MAC implementation to construct the packets we want to send. The Linux kernel has mac80211, but including the full Linux kernel does not seem to be feasible. This is not ESP32-specific; we’d ideally find an implementation where you can pass your own TX and RX functions, and they do the rest.
  • ☐ Implement changing the wifi channel, rate, transmit power, …
  • ☐ Implement the hardware initialization (now done by esp_phy_enable()). This will be a hard undertaking, since all calibration routines will need to be implemented, but also has a high payoff: we’ll then have a completely blob-free firmware for the ESP32.
  • ☐ Write SVD documentation for all reverse engineered registers. An SVD file is an XML file that describes the hardware features of a microcontroller; this makes it possible to automatically generate an API from the hardware description. Espressif already has an SVD file containing the documented hardware registers; we can document the undocumented registers and (automatically) merge them in.

The two hardest (but most important) tasks are implementing hardware initialization, and connecting our sending and receiving primitives to an open source 802.11 MAC stack.

Bonus: Charlotte breaking everything

Charlotte playing music completely broke the setup: the music setup at our hackerspace works via RTP (Realtime Transport Protocol). Under the hood, RTP sends UDP packets containing the audio data to a multicast address; so these packets were also transmitted over the Wi-Fi. Because this was a lot of packets per second, the receive buffer was always full, and very few other packets could be received/ACKed. This made it clear that hardware filtering would need to be implemented sooner rather than later; reverse engineering turned out to be not as much work as expected.

The hardware filtering seems to have two ‘slots’; for every slot you can filter on a destination MAC address and on a BSSID (not sure if you can do both in each slot or you have to choose). By default, the hardware will not let any packets through. The hardware will only send an ACK frame back if the packet was let through via one of the filters and was copied into an RX DMA buffer: packets that were copied into an RX DMA buffer because of promiscuous mode will not result in an ACK frame getting sent.

Questions? Want to collaborate?

This is a sizeable project that could definitely use multiple contributors; I’d really like to collaborate with other people to create a fully functional, open-source Wi-Fi stack for the ESP32. If this sounds like something you’d like to work on, contact me via zeusblog@notdevreker.be; maybe we can have a weekly hacking session?

As far as I know, this is the first undertaking to build an open source 802.11 MAC for an affordable microcontroller. If you want to financially support this project, you can wire money via https://zeus.ugent.be/contact/#payment-info, please put “ESP32” in the transaction description, so our treasurer knows what the money is for. Please do not donate if you’re a student or if you’re not financially independent. If you’re a company and would like to donate hardware (for example, a faraday cage or measuring equipment that might be useful), please contact me.

This project was funded through the NGI0 Core Fund, a fund established by NLnet with financial support from the European Commission’s Next Generation Internet programme, under the aegis of DG Communications Networks, Content and Technology under grant agreement No 101092990.

Feel free to send me an email in case you have questions, you think something in this blog post could be worded better or you spotted a mistake.

Unveiling secrets of the ESP32: creating an open-source MAC Layer

6 December 2023 at 00:00

(This is part 1 of this series, part 2 is here)

The ESP32 is a popular microcontroller known in the maker community for its low price (~ €5) and useful features: it has a dual-core CPU, built-in Wi-Fi and Bluetooth connectivity and 520 KB of RAM. It is also used commercially, in devices ranging from smart CO₂-meters to industrial automation controllers. Most of the software development kit that is used to program for the ESP32 is open-source, except notably the wireless bits (Wi-Fi, Bluetooth, low-level RF functions): that functionality is distributed as precompiled libraries, that are then linked into the firmware the developer writes.

A closed-source Wi-Fi implementation has several disadvantages compared to an open-source implementation though:

  • You are dependent on the vendor (Espressif in this case) to add features; if you have a somewhat non-standard use case, you might be out of luck. For example, standards-compliant mesh networking (IEEE 802.11s) is not supported on the ESP32; there is a partially closed-source mesh networking implementation made by Espressif, but this is rather limited: the mesh network has a tree topology, and uses NAT on the nodes connected to the root network, making it hard to connect from outside the mesh network to nodes in the mesh network. The protocol is also not documented, so it’s not interoperable with other devices.
  • It’s hard to audit the security of the implementation: since there is no source code available, you have to resort to black-box fuzzing and reverse engineering to find security vulnerabilities.
  • Additionally, an open-source implementation would make research into low-power Wi-Fi mesh networking more affordable; if each node only costs about €5, research involving hundreds of nodes can be affordable on a modest budget.

Espressif has an open issue in their esp32-wifi-lib repository, asking to open-source the MAC layer. In that issue, they confirmed in 2016 that open sourcing the upper MAC is on their roadmap, but as of 2023, nothing has been published yet. Having the source code would for example allow us to implement proper 802.11s-compliant mesh networking.

Goals

The main goal of this project is to build a minimal replacement for Espressif’s proprietary Wi-Fi binary blobs. We don’t intend to be API-compatible with existing code that uses the Espressif ESP-IDF API; rather, we’d like to have a fully working, open source networking stack.

The rest of this section will contain information about how the network stack and Wi-Fi (the 802.11 standard) works, so if you’re already familiar, you can skip it.

OSI model of the network stack (the difference between application/presentation/session is a bit murky)

Above, you can see a diagram showing the network stack. Computer networking is done with a network stack, where every layer in the stack has its own purpose; this design makes it easier to swap out layers and allows for separate development of layers. The layer at the bottom of the stack interacts with the physical world (for example, by using radio waves or electric signals); every layer adds its own features. Wi-Fi (also known as the 802.11 standard by engineers) is implemented in the bottom two layers: the PHY layer (what the radio waveforms look like, …) and the MAC layer (how we connect to an access point, what packets exist, how to send packets to local devices, …).

On the ESP32, the PHY layer is implemented in hardware; most of the MAC layer is implemented in the proprietary blob. One notable exception to this separation is sending acknowledgement frames: if a device receives a frame, it should send a frame back to acknowledge that it was received correctly. This ACK packet needs to be sent within ~10 microseconds; it would be hard to get this timing correct in software.

There are 3 types of MAC frames:

  • Management frames: mostly for managing the connection between the access point and station (client)
  • Control frames: help with delivery of other types of frames (for example ACK, but also request-to-send and clear-to-send)
  • Data frames: contain the data of the layers above the MAC layer

Previous work

Since it doesn’t look like Espressif will release an open source MAC implementation anytime soon, we’re on our own to create this. This is rather hard to do, because the hardware with which we send and receive 802.11 packets on the ESP32 is entirely undocumented. This means that we will need to reverse engineer the hardware; first we’ll need to document what the hardware does, then we’ll need to write our own code to correctly interact with it. In 2021, Uri Shaked did some very light reverse engineering of ESP32 Wi-Fi hardware, to mock this in his emulator. That way, programs for the ESP32 can be emulated instead of running them on real hardware. Shaked gave a talk about this, but only discussed very high level details about the hardware. Espressif has their own fork of QEMU (a popular, open-source emulator) that can also emulate the ESP32, but this fork does not support emulating the Wi-Fi hardware. In 2022, Martin Johnson added basic support for the Wi-Fi hardware to their own fork of Espressif’s QEMU. The emulated ESP32 can connect to a virtual access point, or have a virtual client connect to it.

esp-idf (the SDK for the ESP32) has a function to transmit frames (esp_wifi_80211_tx), but this function only accepts certain types of frames; it does not allow sending most management frames, severely limiting the usefulness of this API to base an 802.11 MAC stack on. They also have a function (esp_wifi_set_promiscuous_rx_cb) to receive a callback on reception of a frame.

Tools

Before we can start reverse engineering how the 802.11 PHY hardware works and how we interact with it, we first need to find or build tools that will help. We’ll use 3 main approaches:

  • Static reverse engineering: we have the compiled libraries that implement the Wi-Fi stack, so we can look at the compiled code and try to decompile it to human-readable code. From this more readable code, we then try to see what the hardware expects the software to do.
  • Dynamic code analysis in an emulator: we can run the firmware in an emulator and inspect how it interacts with the virtual hardware. This has the advantage of having a lot of freedom to how we inspect the hardware, but the disadvantage that the emulator might not behave the same as real hardware. Since we’ll need to write the emulated peripherals ourselves, this risk is real: there is no public datasheet for the Wi-Fi peripheral, so we have to guess how the hardware will behave from the code that interacts with it.
  • Dynamic code analysis on real hardware: we can run the firmware on an actual ESP32, and debug it using a JTAG debugger. This allows us to place breakpoints, inspect the memory and registers, stop and resume the execution, … The disadvantage is that the debugging capabilities are more limited compared to running in an emulator: we can only place 2 breakpoints, we cannot place watchpoints (breakpoints that trigger on memory reads/writes to a certain address), … The big advantage compared to using an emulator is that we’ll know for sure that the behaviour of the hardware is correct.

Static analysis

For the static analysis, we use Ghidra, an open-source reverse engineering tool made by the NSA. Out of the box, Ghidra does not have support yet for Xtensa (the CPU architecture of the ESP32), but there is a plugin that adds support. The build tools used in the ESP32 SDK generate both an ELF file (a type of binary file that can contain metadata) and a flat binary file: using the ELF file has the benefit of automatically setting most function names.

Dynamic analysis in emulator

We started off from Martin Johnson’s fork of Espressif’s version of QEMU (a popular open-source emulator), and ported their changes to the latest version of Espressif’s QEMU fork. The ESP32 talks to its peripherals via memory mapped IO: by reading from and writing to certain memory addresses, the CPU gets information from the peripherals and makes them perform actions. To help in reverse engineering, we added log statements to the QEMU Wi-Fi peripherals that log every access to their memory ranges.

Additionally, we also implemented stack unwinding in QEMU; this is done for every memory access to a hardware peripheral related to Wi-Fi. That way, we can get a full stack trace for every peripheral access. Symbols are not stripped, so this is a very useful tool. However, to get stack unwinding properly working, we have to run QEMU in single step mode: QEMU has a JIT compiler that compiles sequences of emulated assembly instructions into optimized basic blocks. This greatly improves the execution speed, but since the CPU execution state is only guaranteed to be correct at the beginning of a basic block, if a peripheral memory access happens in the middle of such a basic block, the stack unwinding algorithm gives wrong results.

Running in single-step mode negates much of the benefit of the QEMU JIT compiler, causing the code to run much slower. This is not that big of a disadvantage, compared to the treasure trove of information the execution trace gives us.

Below is an example of a single memory access logged by QEMU: it’s a write (W) to address 3ff46094 with value 00010005, done by the function ram_pbus_force_test. The rest of the callstack is also logged, and translated to a symbol name if available.

W 3ff46094 00010005 ram_pbus_force_test 400044f4 set_rx_gain_cal_dc set_rx_gain_testchip_70 set_rx_gain_table bb_init register_chipv7_phy esp_phy_load_cal_and_init esp_phy_enable wifi_hw_start wifi_start_process ieee80211_ioctl_process ppTask vPortTaskWrapper
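With thousands of such lines, a small parser helps when filtering the trace. The sketch below assumes every log line follows the layout of the example above (access kind, address, value, then the call stack, all separated by whitespace) and that reads are logged with `R`; both are assumptions based on this single sample, and the names are hypothetical.

```python
from typing import List, NamedTuple


class Access(NamedTuple):
    kind: str             # "W" for a write; "R" is assumed for reads
    address: int
    value: int
    callstack: List[str]  # symbol names, or raw hex addresses if unresolved


def parse_access(line: str) -> Access:
    kind, address, value, *callstack = line.split()
    return Access(kind, int(address, 16), int(value, 16), callstack)


sample = (
    "W 3ff46094 00010005 ram_pbus_force_test 400044f4 set_rx_gain_cal_dc "
    "set_rx_gain_testchip_70 set_rx_gain_table bb_init register_chipv7_phy "
    "esp_phy_load_cal_and_init esp_phy_enable wifi_hw_start wifi_start_process "
    "ieee80211_ioctl_process ppTask vPortTaskWrapper"
)
access = parse_access(sample)
```

From here it is easy to group accesses by the innermost function (`access.callstack[0]`) or by address range.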

Finally, we also corrected the handling of MAC addresses (compared to Martin Johnson’s version), so that a packet capture has correct MAC addresses in packets instead of hardcoded addresses.

Dynamic analysis on real hardware

To dynamically analyze the firmware on real hardware, we use the JTAG hardware debugging interface. By connecting some jumper wires between the ESP32 and a JTAG debugger, we can debug the ESP32. We followed the steps described in this GitHub repository to get our JTAG debugger (CJMCU-232H) working.

In addition to the JTAG debugger, we also connected a USB Wi-Fi dongle directly to the ESP32: the ESP32-WROOM-32U variant of the ESP32 has an antenna connector. We connect that antenna connector to a 60 dB attenuator (this weakens the signal by 60 dB), then connect that to the antenna connector of the wireless dongle. That way we’ll be able to only receive the packets coming from the ESP32, and the ESP32 will only receive packets sent by the wireless dongle.

This idea unfortunately did not entirely work: enough radio waves from outside access points leaked into the antenna connector that the wireless dongle also received their packets. We tried to build a low-cost faraday cage from a paint can to prevent this, but this only attenuated outside signals by an extra 10 dB: this removed some APs, but not all of them. The current solution is definitely not ideal, so we’ve started work on building a better and larger faraday cage, from conducting fabric and with fiber-optic data communication.

Wi-Fi dongle connected to the ESP32, with two 30 dB attenuators in between
Faraday cage made from a paint tin, with copper tape to close the hole for the USB connectors, and ferrite chokes to reduce the RF leaking in

Architecture

SoftMAC vs HardMAC

SoftMAC (Software MAC) and HardMAC (Hardware MAC) refer to two different approaches for implementing the MAC layer for Wi-Fi. SoftMAC relies on software to manage MAC layer functions, which offers flexibility and ease of modification but can consume more power/CPU cycles. HardMAC, on the other hand, offloads MAC layer processing to dedicated hardware, reducing CPU usage and power consumption but limiting the ability to adapt to new features without hardware changes.

The ESP32 seems to use a SoftMAC approach: you can directly send and receive 802.11 frames (with HardMAC, you would instead tell the hardware that you want to connect to a certain AP, and it would then automatically craft and send the necessary frames). This is good news for our open source implementation, since there already exist open-source 802.11 MAC stacks for SoftMAC (for example, mac80211 in the Linux kernel).

Peripherals

The Wi-Fi functionality is implemented via multiple hardware peripherals, each responsible for a separate part of the functionality. Through reverse engineering, the following peripherals were identified as ‘used for Wi-Fi functionality’ (listed with the memory address ranges through which they can be accessed):

  • MAC peripherals, at 0x3ff73000 to 0x3ff73fff and at 0x3ff74000 to 0x3ff74fff
  • RX control registers, at 0x3ff5c000 to 0x3ff5cfff
  • baseband, at 0x3ff5d000 to 0x3ff5dfff
  • chipv7_phy (?), at 0x3ff71000 to 0x3ff71fff
  • chipv7_wdev (?), at 0x3ff75000 to 0x3ff75fff
  • RF frontend, at 0x3ff45000 to 0x3ff45fff and at 0x3ff46000 to 0x3ff46fff
  • analog, at 0x3ff4e000 to 0x3ff4efff (this is also used by the DAC connected to GPIO pins)

It should be noted that these peripherals are mirrored to another place in the address space:

Peripherals accessed by the CPU via 0x3FF40000 ~ 0x3FF7FFFF address space (DPORT address) can also be accessed via 0x60000000 ~ 0x6003FFFF (AHB address). (0x3FF40000 + n) address and (0x60000000 + n) address access the same content, where n = 0 ~ 0x3FFFF.
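To make the address map and the mirroring rule concrete, here is a small Python sketch that normalizes an address to its DPORT form and looks up which of the Wi-Fi peripherals listed above it falls into. The ranges and names are the ones from this post; the helper names `to_dport` and `wifi_peripheral` are made up for illustration.

```python
# Inclusive address ranges of the Wi-Fi related peripherals, as listed above.
WIFI_PERIPHERALS = [
    ("MAC", 0x3FF73000, 0x3FF73FFF),
    ("MAC", 0x3FF74000, 0x3FF74FFF),
    ("RX control", 0x3FF5C000, 0x3FF5CFFF),
    ("baseband", 0x3FF5D000, 0x3FF5DFFF),
    ("chipv7_phy", 0x3FF71000, 0x3FF71FFF),
    ("chipv7_wdev", 0x3FF75000, 0x3FF75FFF),
    ("RF frontend", 0x3FF45000, 0x3FF45FFF),
    ("RF frontend", 0x3FF46000, 0x3FF46FFF),
    ("analog", 0x3FF4E000, 0x3FF4EFFF),
]

DPORT_BASE = 0x3FF40000
AHB_BASE = 0x60000000
MIRROR_SIZE = 0x40000  # n = 0 ~ 0x3FFFF


def to_dport(address):
    """Return the DPORT form of an address, or None if it is in neither window."""
    if DPORT_BASE <= address < DPORT_BASE + MIRROR_SIZE:
        return address
    if AHB_BASE <= address < AHB_BASE + MIRROR_SIZE:
        return address - AHB_BASE + DPORT_BASE
    return None


def wifi_peripheral(address):
    """Return the name of the Wi-Fi peripheral containing this address, if any."""
    dport = to_dport(address)
    if dport is None:
        return None
    for name, start, end in WIFI_PERIPHERALS:
        if start <= dport <= end:
            return name
    return None


print(wifi_peripheral(0x3FF46094))  # address from the trace example
print(wifi_peripheral(0x60006094))  # the same register via the AHB mirror
```

Normalizing first means a trace that mixes both address forms can still be grouped per peripheral.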

Lifecycle

By writing some minimal firmware that just sends packets in a loop and using the three reverse engineering strategies described earlier, we determined a high-level overview of the Wi-Fi hardware lifecycle for sending a packet:

  1. Call esp_wifi_start(); this indirectly calls esp_phy_enable()
  2. esp_phy_enable() is responsible for initializing the wifi hardware:
    1. Calibrate the PHY hardware: this tries to compensate for imperfections of the hardware. According to the data sheet, this includes at least: I/Q phase matching; antenna matching; compensating carrier leakage, baseband nonlinearities, power amplifier nonlinearities and RF nonlinearities (I’m more of a software person than an electronic engineer, so I don’t exactly know what these terms mean). This calibration can be stored to the non-volatile storage and to memory, so we don’t have to do a full calibration every time the ESP32 wakes up from modem sleep.
    2. Initialize the MAC peripherals: set RX MAC address filters, set the buffers where the packets will be received into, set the auto-ACKing policy, set the chip’s own MAC address.
    3. Set various physical radio properties (TX rate, frequency, TX power, …)
    4. Set up the power management timer: if packets are not sent often enough, the modem power save timer kicks in and de-initializes part of the Wi-Fi hardware to save power.
  3. Now, we’re ready to send a packet:
    1. Wake up some Wi-Fi peripherals from deep sleep and restore their calibration, if we need to
    2. Set some metadata, related to the packet (likely the rate and other PHY settings)
    3. Create a DMA entry, consisting of the length of the packet and the address of the buffer containing the MAC data. The MAC Frame Checksum is automatically calculated by the hardware. DMA stands for Direct Memory Access: that means that we just tell the hardware the address and length of where our packet is, and the hardware will then read that memory and transmit the packet, all on its own.
    4. Write the lowest bits of the DMA entry into a hardware register, then enable it for transmission by setting a bit in the bitmask of that register.
    5. Once the packet is sent, interrupt 0 will fire to notify us how successful the transmission was. We can react to collisions and timeouts (and probably also to ACKs received?). We also have to clear the interrupt bit that indicates a packet was sent.

Implementing transmitting packets

As a (very limited) proof-of-concept, we wanted to send arbitrary 802.11 frames by directly using the memory mapped peripherals, so without using the SDK functions. As you can see in the lifecycle diagram above, before transmitting, we first need to initialize the wifi hardware. Unfortunately, this initialization is a lot more complex than sending packets: to initialize the hardware, about 50000 peripheral memory accesses are needed, compared to about 50 for transmitting a packet (including handling the interrupt). These are not exact numbers at all, but they give an idea about the complexity involved.

For the basic ‘transmitting packets’ proof-of-concept, we are currently still using the proprietary functions to initialize the wifi hardware. We encountered the issue that after initializing, the modem power save timer would kick in and de-initialize the wifi peripherals, preventing us from sending packets. To work around this, we send a single packet using the SDK and then immediately call the undocumented pm_disconnected_stop() function, which disables the modem power save mode timer. After this, we can send arbitrary packets by directly writing to the MAC peripheral addresses. For this PoC, we don’t need to replace the interrupt handler for wifi events: the existing, proprietary handler will handle the ‘packet was sent’ interrupt just fine.

The basic proof of concept works: we can transmit arbitrary packets by directly writing to and reading from memory addresses!

Current roadmap

Now we can transmit packets, but we still have a lot of work ahead of us. This is the to-do list, in rough order of priority:

  • ☑ Send packets
  • ☐ Receive packets: to do this, we will need to do the following:
    • Set the RX policy (this filters packets based on MAC address) / enable promiscuous mode to receive all packets
    • Set the memory address in which we want to receive the packet via DMA
    • Replace the wifi interrupt with our own interrupt; the code indicates that there might be some kind of wifi watchdog, we’ll need to figure out how to pet it.
  • ☐ Send ACK (acknowledgment) packets back if we receive a packet that is destined for us
  • ☐ Implement changing the wifi channel, rate, transmit power, …
  • ☐ Combine our implementation with an existing open source 802.11 MAC stack, so the ESP32 can associate with access points
  • ☐ Implement the hardware initialization (now done by esp_phy_enable()). This will be a hard undertaking, since all calibration routines will need to be implemented, but also has a high payoff: we’ll then have a completely blob-free firmware for the ESP32.

And a list of possible future extensions that are not yet on the roadmap, but are useful to do anyways:

  • ☐ Implement modem power saving: turning off the modem when not in use
  • ☐ AMSDU, AMPDU, HT40, QoS
  • ☐ Do the cryptography needed for WPA2 etc in hardware instead of in software
  • ☐ Bluetooth
  • ☐ Write SVD documentation for all reverse engineered registers. An SVD file is an XML file that describes the hardware features of a microcontroller, this makes it possible to automatically generate an API from the hardware description. Espressif already has an SVD file containing the documented hardware registers; we can document the undocumented registers and (automatically) merge them in.

Code

All code and documentation is available in the esp32-open-mac GitHub organisation. I think especially the QEMU fork can be useful for other reverse engineers because of the memory tracing feature.

Update

Since the beginning of writing this blog post, receiving packets was also implemented. To accomplish this, we needed to implement the Wi-Fi MAC interrupt handler and manage the RX DMA buffers. This means that we now can send and receive packets using only open source code: the hardware initialization is still done with proprietary code, but after this setup is done, only open source code is used to send and receive packets, no more proprietary code is executed. The second part is here

Questions? Want to collaborate?

This is a sizeable project that could definitely use multiple contributors; I’d really like to collaborate with other people to create a fully functional, open-source Wi-Fi stack for the ESP32. If this sounds like something you’d like to work on, contact me via zeusblog@notdevreker.be, maybe we can have a weekly hacking session?

As far as I know, this is the first undertaking to build an open source 802.11 MAC for an affordable microcontroller. If you want to financially support this project, you can wire money via https://zeus.ugent.be/contact/#payment-info, please put “ESP32” in the transaction description, so our treasurer knows what the money is for. Please do not donate if you’re a student or if you’re not financially independent. If you’re a company and would like to donate hardware (for example, a faraday cage or measuring equipment that might be useful), please contact me.

This project was funded through the NGI0 Core Fund, a fund established by NLnet with financial support from the European Commission’s Next Generation Internet programme, under the aegis of DG Communications Networks, Content and Technology under grant agreement No 101092990.

Feel free to send me an email in case you have questions, you think something in this blog post could be worded better or you spotted a mistake.

Board '23-'24

14 May 2023 at 00:00

Last Tuesday, as tradition dictates, we gathered in the student garden to thank our current board for their efforts and, above all, to elect a new board for the coming academic year.

The elections were exciting because there were multiple candidates for almost every position. In the end, everyone who stood on Tuesday was elected, so we proudly present our new board, which will represent Zeus WPI next year:

Role: Name
Chair: Jan Lecoutere
Vice-chair: Tibo Ulens
Treasurer: Vincent Vallaeys
Sysadmins: Laurenz Verherbrugghen, Xander Bil, Francis Klinck
PR: Tybo Verslype
Events: Rune Dyselinck

We wish the elected candidates every success, and look forward to the plans and initiatives of the new board for the coming year.

Great appreciation goes to the outgoing board, which did excellent work over the past year. We would also like to give a special shout-out to Jasper Devreker, who showed his dedication to Zeus as a board member for no less than five years. Soon™, an overview of the events of the past year will be published on our website.

Reverse engineering an e-ink display

4 February 2023 at 00:00

One of our members managed to score some e-ink displays from eBay. These displays are used in shops, where they indicate the price of the items that can be bought. This has two key advantages over regular paper price tags: the text on the e-ink displays can be updated automatically and it’s possible to do fancier graphics. E-ink displays differ from the more common LCD screens in one important way: they only use power when their content changes. This enables the tags to have a small capacity battery and still operate for several years without battery replacement.

This tag, but then only in black and white, no red

The person who bought the pricetags wanted to use them in a project, but didn’t find any documentation on how to communicate with them to display things on the screen. They donated three to Zeus with the challenge to get communication working and to draw something on the screen. This is the perfect number of devices according to bunnie’s book ‘The Hardware Hacker’ 1:

The biggest barrier to hacking is often the fear that you’ll break something while poking around. But you have to break eggs to make an omelet; likewise, you have to be willing to sacrifice devices to hack a system. Fortunately, acquiring multiple copies of a mass-produced piece of hardware is easy. I often do a bit of dumpster diving or check classified advertisements to get sample units for research purposes. I generally try to start with three copies: one to tear apart and never put back together, one to probe, and one to keep relatively pristine.

After gently prying apart the case, we got a closer look at the printed circuit board. This PCB had the e-ink screen and a battery connected to it. The battery was immediately disconnected before further work was done (as a safety precaution, to not accidentally short something). Pictures were then taken of the front and back of the PCB, see picture below for an explanation of the components.

Front of the PCB
Back of the PCB

The product is an SES-imagotag G1 2.7 BW NFC, with product code B27N02003. On the PCB, there is some text: RFRTx002D and KIM 1514 (most likely internal part numbers), along with (most likely) a datecode: 09 17 (so September 2017). The PCB itself looks very professionally designed and made, with plenty of testpoints and programmer pads. Attached to the screen, there is an NFC tag which contains the ID of the board.

The board contains a CC2510 microcontroller. This microcontroller can communicate over 2.4GHz, and it’s clear that the product uses this feature: there is an antenna structure present on the PCB. It’s very likely that the image updates are done wirelessly via that antenna. The only remaining part is then to figure out what the communication protocol looks like. Unfortunately, we don’t have the accompanying device that puts images on the screen, so intercepting the wireless communications wouldn’t be possible. What would be possible, however, is reading the code from the microcontroller and seeing what it expects.

The pictures of the PCB were transformed and overlaid in GIMP. The front and back layers of the PCB were put in different GIMP layers, so that it would be easy to switch between different views without losing your frame of reference. When routing signals between places of the board, PCB designers sometimes have to use so-called ‘vias’ to move from the front of the board to the back of the board. From the markings ‘1 TOP’ and ‘4 BOT’ on the PCB, it became clear that this is a 4-layer PCB: there are two visible layers, and two hidden layers sandwiched between them. Luckily, someone else used acid and sandpaper to make the hidden layers visible. This was on a slightly different revision, however, but most components are in roughly the same position.

By reading the datasheet of the CC2510, it became clear that there is a debug interface. This interface can be used to put code on the board, to step through the code when debugging or even to read out the code. The debug interface consists of two signal pins: a host-to-microcontroller clock pin, and a bi-directional data pin. To enable the debug interface, the clock needs to be pulsed twice while the reset pin is held low. This makes for a total of 5 wires that need to be attached to the PCB: debug clock, debug data, reset; and then ground and 3.3V to power the board.

PCB mounted on 3D-printed holder, debug pins attached to Pi Pico. The battery is in the background, disconnected

The CC2510 microcontroller contains an 8051 processor core. 8051 is a rather old 8-bit instruction set, originally made by Intel, but it is used in a lot of embedded products. The debug interface is quite ingenious in how it implements most features: instead of individually implementing write, read, verify and other features, it has the DEBUG_INSTR instruction. This debug instruction takes one to three bytes of arguments, and executes these as an 8051 instruction. After that, it sends the value of the accumulator (ACC/A) register back over the debug interface. Reading out the memory then is a loop of setting an address we want to read out and moving the value at that address into the ACC register.

Table 45: Debug commands

We attached the pins of the CC2510 to a Raspberry Pi Pico microcontroller and wrote a small application, based on ESP_CC_Flasher, to interface with the debug interface of the microcontroller. Something unfortunate very quickly became clear: the chip was debug-locked. After programming the firmware on the microcontroller, the manufacturer had disabled the debug interface. This means that only a very limited subset of debug instructions was enabled: READ_STATUS (with which we can see, among others, if the chip is debug locked), GET_CHIP_ID (with which we can get the type of chip) and CHIP_ERASE (which erases the debug protection bits, but also all the other code). No instructions with which we can read out the board were found. Our new goal is thus to bypass the code read-out protection.

An initial idea was to issue the CHIP_ERASE command, and then immediately power off the board, hoping that the debug lock bits would be reset before the entire flash was wiped. This unfortunately didn’t work; I suspect that after issuing the CHIP_ERASE command, a bit gets written to flash indicating that an erase was requested, and that the startup sequence on the microcontroller checks this bit and then wipes the firmware until that bit is clear. I was a bit disappointed that I had wiped the board and thus bricked it, until I remembered bunnie’s advice to have a device you’ll never get working again. This wiped device later still came in useful as a development board to test other exploits against: I didn’t have to fear accidentally wiping or breaking the device, since it was already wiped.

After this initial setback, we tried another technique, namely voltage glitching. This is an exploit technique where you, for a very brief amount of time, change the voltage of the chip. This sometimes causes unexpected behaviour such as skipping certain instructions in a program or loading a different value from memory. The hope was that by voltage glitching at just the right time, we could bypass the debug lock and still execute a debug instruction that was disabled. To develop such an exploit, it’s critical to thoroughly read the datasheet of the chip, since this sometimes contains hints about how to proceed. In this case, several interesting parts were found:

Note that after the Debug Lock bit has changed due to a Flash Information Page write or a flash mass erase, a HALT, RESUME, DEBUG_INSTR, STEP_INSTR, or STEP_REPLACE command must be executed so that the Debug Lock value returned by READ_STATUS shows the updated Debug Lock value. For example a dummy NOP DEBUG_INSTR command could be executed. The Debug Lock bit will also be updated after a device reset so an alternative is to reset the chip and reenter debug mode.

This means that for every ‘interesting’ debug instruction, the chip probably first fetches the debug lock bit from flash and then checks if the command is allowed. For READ_STATUS, the value of the last debug instruction will be used. This is very useful for us: if we glitch a debug instruction, we’ll be able to see if we succeeded by issuing a READ_STATUS command. If it displays that the chip is unlocked, we’ll know that the previous instruction executed successfully. This unfortunately also means that for every debug instruction issued, we’ll have to successfully glitch the board. This makes an exploit harder, because a voltage glitch has a high risk of rebooting the microcontroller, resetting our progress in the glitch. This means we’ll have to focus on reading out data with as few consecutive instructions that have to succeed as possible.

Pinout of the CC2510

After inspecting the pinout, one pin, namely the DCOUPL pin, stood out:

DCOUPL: Power decoupling: 1.8 V digital power supply decoupling

The chip itself runs on 3.3V and no external pins have 1.8V logic level, so at first glance, it would be a bit strange for the chip to have a 1.8V power supply. However, this is actually rather common: the internal logic in the microcontroller (the CPU, RAM, flash, …) very likely uses 1.8V. The DCOUPL pin is meant to be attached to an external capacitor to smooth out the internal 1.8V power supply. For us, this is handy, because now we have a direct connection to the internal power supply.

To glitch the microcontroller, we do some micro-surgery: we remove the decoupling capacitor and attach a fast MOSFET (a digital switch) between the DCOUPL pin and ground. When the MOSFET is enabled, the 1.8V power supply is shorted to ground, and the chip glitches. Successfully mounting a voltage glitching attack then becomes a matter of correctly timing the glitch and closing the MOSFET for the right amount of time. Those timings need to be very precise, on the order of nanoseconds. We basically need to pull a pin high for a very short time, at a very precise time in the debug sequence.

Closeup of the microcontroller. The DCOUPL capacitor we need to remove is marked in orange.
Closeup of the microcontroller after the MOSFET was added to the DCOUPL pin

We previously used the Raspberry Pi Pico to communicate with the board; for the glitch timings, we’ll use the PIO (programmable IO) feature of the RP2040 chip on the Pi Pico. This cool piece of hardware allows us to set pins at the clock speed of the chip, so we can have a 125 MHz signal. We can feed the PIO peripheral using DMA (Direct Memory Access), so we can do other things while the glitch signal control runs in the background. The buffer that is used for the glitch pin is thus filled with zeroes, except at one location, where there are a couple of consecutive ones.
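The shape of such a glitch buffer can be sketched on the host side in a few lines of Python. This only illustrates the idea, under the assumptions that one buffer bit corresponds to one output sample and that samples are packed into 32-bit words least-significant bit first; the function names are made up, and the real firmware may pack or shift its data differently.

```python
def make_glitch_samples(total_samples, offset, duration):
    """Return a list of 0/1 samples: all zeroes, except `duration` ones at `offset`."""
    if offset < 0 or duration < 0 or offset + duration > total_samples:
        raise ValueError("glitch does not fit in the buffer")
    return [1 if offset <= i < offset + duration else 0 for i in range(total_samples)]


def pack_words(samples, bits_per_word=32):
    """Pack samples into integers, first sample in the least-significant bit (assumed order)."""
    words = []
    for start in range(0, len(samples), bits_per_word):
        word = 0
        for bit_index, sample in enumerate(samples[start:start + bits_per_word]):
            word |= sample << bit_index
        words.append(word)
    return words


def sample_period_ns(clock_hz):
    """Duration of one sample when one bit is output per clock cycle."""
    return 1e9 / clock_hz


samples = make_glitch_samples(total_samples=64, offset=30, duration=4)
print([hex(word) for word in pack_words(samples)])
print(sample_period_ns(125e6), "ns per sample at 125 MHz")
```

With one sample per clock cycle, the clock frequency directly sets how finely the glitch offset and duration can be chosen.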

A couple of modifications to this idea were needed to make the attack more reliable:

  • Overclocking the Pi Pico to 250 MHz instead of the default 125 MHz. This doubles the precision of the glitch timing and duration.
  • The Pi Pico does its serial over software USB. The interrupts from USB sometimes throw off the timings, so we run the USB stack on core 0, and our code on core 1, with interrupts disabled.
  • We set the drive strength of the power pin to 12 mA, and the slew rate to ‘fast’ (instead of the default ‘slow’). This makes sure that the internal capacitance of the MOSFET gate charges sufficiently fast, so we have a nice sharp edge on the glitch pin.
  • We power-cycle the board after every glitch attempt, to reset it fully. This is done by connecting the 3.3V supply pin to a GPIO pin of the Pi Pico. The CC2510 microcontroller draws so little power that we don’t need a MOSFET for this, we can just directly power the board from that GPIO pin.

Now, a series of debug instructions needed to be constructed, with as few ‘forbidden’ instructions as possible, since every ‘forbidden’ instruction incurs the risk of rebooting the board. This came down to reading the 8051 instruction set, trying to find a series of instructions that load from a certain address into the accumulator register (the register sent back after every instruction). This ultimately was whittled down to two instructions:

MOV DPTR,#data16 (this loads a 16-bit constant into the DPTR/data pointer address)
MOVX A,@DPTR     (this loads the memory at DPTR into the accumulator register)

The full debug sequence is then, byte per byte:

  1. DEBUG_INSTR, 3 bytes opcode argument
  2. MOV DPTR,#data16 opcode
  3. #data16 high byte
  4. #data16 low byte
  5. answer from microcontroller: accumulator register
  6. READ_STATUS
  7. answer from microcontroller: debug locked or not
  8. DEBUG_INSTR, 1 byte opcode argument
  9. MOVX A,@DPTR
  10. answer from microcontroller: accumulator register
  11. READ_STATUS
  12. answer from microcontroller: debug locked or not
View of the debug sequence on a logic analyser. Orange lines separate the different debug instructions, red lines are the timings of the glitches
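The bytes sent to the chip for this sequence can be sketched as follows. The 8051 opcodes (0x90 for MOV DPTR,#data16 and 0xE0 for MOVX A,@DPTR) are standard; the debug command bytes (0x54 plus the argument count for DEBUG_INSTR, 0x34 for READ_STATUS) and the DEBUG_LOCKED status bit (0x04) are quoted from memory of the CC2510 datasheet and should be treated as assumptions to verify against Table 45. The helper names are made up for illustration.

```python
DEBUG_INSTR_BASE = 0x54   # assumed: low two bits carry the number of opcode bytes
READ_STATUS = 0x34        # assumed command byte
STATUS_DEBUG_LOCKED = 0x04  # assumed bit position in the status byte

MOV_DPTR_IMM16 = 0x90     # 8051: MOV DPTR,#data16
MOVX_A_AT_DPTR = 0xE0     # 8051: MOVX A,@DPTR


def debug_instr(*opcode_bytes):
    """Build a DEBUG_INSTR command carrying a one to three byte 8051 instruction."""
    if not 1 <= len(opcode_bytes) <= 3:
        raise ValueError("an 8051 instruction is one to three bytes long")
    return bytes([DEBUG_INSTR_BASE | len(opcode_bytes), *opcode_bytes])


def read_byte_commands(address):
    """Return the four commands of the sequence above for one 16-bit address."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("address must fit in 16 bits")
    return [
        debug_instr(MOV_DPTR_IMM16, address >> 8, address & 0xFF),
        bytes([READ_STATUS]),
        debug_instr(MOVX_A_AT_DPTR),
        bytes([READ_STATUS]),
    ]


def is_debug_locked(status_byte):
    """Interpret a READ_STATUS answer (bit position is an assumption)."""
    return bool(status_byte & STATUS_DEBUG_LOCKED)


for command in read_byte_commands(0x1234):
    print(command.hex())
```

Each command is followed by a one-byte answer from the microcontroller, which is what steps 5, 7, 10 and 12 of the list refer to.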

Using a parameter sweep across the entire timing of the debug sequence and across glitch durations, a vulnerability was found in the DEBUG_INSTR: if we glitch the power supply right after DEBUG_INSTR, sometimes the debug lock is bypassed, resulting in execution of the DEBUG_INSTR and the correct accumulator answer. The next READ_STATUS instruction then also shows that the debug lock is unlocked. However, if we then execute another DEBUG_INSTR, the microcontroller will be locked again and it will fail. We thus need two successful glitches to read out one byte of data, and we can know if our exploit executed successfully by looking at the answers of READ_STATUS.
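A parameter sweep of this kind boils down to trying every combination of glitch offset and duration a number of times and counting the successes. The Python sketch below shows that structure only; `try_glitch` is a hypothetical callback standing in for the code that actually drives the hardware, and the fake implementation at the bottom exists purely to make the example runnable.

```python
import itertools


def sweep(offsets, durations, attempts, try_glitch):
    """Count successful glitches for every (offset, duration) combination."""
    results = {}
    for offset, duration in itertools.product(offsets, durations):
        successes = sum(1 for _ in range(attempts) if try_glitch(offset, duration))
        results[(offset, duration)] = successes
    return results


def best_parameters(results):
    """Return the (offset, duration) pair with the most successes."""
    return max(results, key=results.get)


def fake_try_glitch(offset, duration):
    """Stand-in for real hardware: pretends only one parameter pair ever works."""
    return offset == 3 and duration == 8


results = sweep(range(5), [4, 8], attempts=10, try_glitch=fake_try_glitch)
print(best_parameters(results), results[best_parameters(results)])
```

On real hardware the success counts are noisy, so a coarse sweep followed by a finer one around the best region is a reasonable way to narrow things down.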

The parameters were then more finely tuned to optimize the chance of a successful glitch. This tuning is very sensitive to parasitic elements: if the board is touched or moved even very slightly, the parameter tuning is off and the glitch parameters need to be recalibrated. The best success rate for a single glitch we’ve gotten is about 5%, but this quickly drops after running for an extended time, likely because of heating. Since reading a byte requires two successful glitches, the success rate of reading a byte is 5% * 5% = 0.25% (if we assume both glitch chances are independent). We can do 35 attempts per second, so that results in about 1 byte read out every 20 seconds. Reading out the entire 32K flash would then take about 4 days. This attack was developed and tested against the wiped board and was verified to read out the correct data.
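This kind of back-of-the-envelope estimate can be reproduced with a few lines of Python. The sketch assumes the two glitches are independent, that the 5% best-case rate holds for the whole run, and that ‘32K’ means 32 × 1024 bytes; since the text notes that the success rate drops over time, the idealized numbers it prints are an optimistic bound and will not match rounded or measured figures exactly.

```python
def expected_readout(single_glitch_rate, glitches_per_byte, attempts_per_second, flash_bytes):
    """Estimate per-byte success rate, seconds per byte and total days for a full dump."""
    # Independent glitches: all of them have to succeed in the same attempt.
    byte_rate = single_glitch_rate ** glitches_per_byte
    seconds_per_byte = 1 / (byte_rate * attempts_per_second)
    total_days = seconds_per_byte * flash_bytes / 86400
    return byte_rate, seconds_per_byte, total_days


byte_rate, seconds_per_byte, total_days = expected_readout(
    single_glitch_rate=0.05,
    glitches_per_byte=2,
    attempts_per_second=35,
    flash_bytes=32 * 1024,  # assumption: 32K = 32768 bytes
)
print(f"success rate per byte: {byte_rate:.4%}")
print(f"seconds per byte:      {seconds_per_byte:.1f}")
print(f"days for full flash:   {total_days:.1f}")
```

Plugging in a lower single-glitch rate shows how quickly the total time grows, since the rate enters the estimate squared.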

Graph of the number of successful glitches in 10000 attempts. The time offset of the glitch is on the X-axis, the glitch duration is 8

To make glitching more practical, the scripts were moved to a server in the Zeus WPI basement. Running the glitch scripts was done in a tmux session, so this attack could run in the background and we could remotely monitor it. After a couple of days, the readout rate considerably dropped and the parameters needed to be re-tuned.

Reading out the chip is rather slow; this is because of our priorities for this exploit: first, to be cheap (the total cost of the hardware is about €10); and second, to be simple and easy to reproduce. Some ways the success rate could be improved:

  • More precise glitch timing and duration
  • Shorting the supply to ground (0V) through the MOSFET is likely not ideal; a higher or lower short-circuit voltage is likely more efficient
  • The glitch waveform is now a square wave. In this paper2, the authors propose arbitrary glitch waveforms and seem to get higher success rates as well
  • Instead of voltage glitching, a different fault injection attack might also be successful (for example clock glitching or electromagnetic fault injection)

Some other Texas Instruments RF chips use the same debugging protocol (ChipCon) and also have an embedded 8051 core. These chips have the part number CCxxxx. It’s likely that these chips are also vulnerable to the same attack. A non-exhaustive list:

  • CC1110
  • CC2430
  • CC2431
  • CC2510
  • CC2511

This vulnerability was not explicitly reported to Texas Instruments, because they already have a security advisory that covers fault injection attacks against all chips: in TI-PSIRT-2021-100116 titled ‘Physical Security Attacks Against Silicon Devices’, published on January 31, 2022, it was stated in the ‘Affected products and versions’ section:

  • If a TI product does not have documented mitigations against a specific physical attack, it may be vulnerable.
  • If a TI product does have documented mitigations against a specific physical attack and a related vulnerability for that product is confirmed by TI, TI will publish a specific disclosure for that part.

This feels like a cheap cop-out where they say that basically every device is vulnerable unless they state otherwise, and they don’t even have to show this in the datasheet or in further security advisories, unless there is a mitigation for the exploit. Since it’s unlikely that there is a mitigation against this hardware attack without a silicon revision, TI probably wouldn’t even make a security advisory.

At the time of writing this blog post, 30.19% of the 32K flash memory of the e-ink tag has been read out. This memory was then loaded into Ghidra (a reverse engineering framework), where we confirmed that the dumped code is valid and sensible 8051 code. A next blog post will (hopefully) contain details about the reverse engineered protocol and how to talk with the e-ink tags. All exploit code is available in this repository.

Edit: reading out the entire memory took 6 days, so about 16 seconds per byte, or 0.063 bytes per second. I think this might be the lowest bandwidth I’ve ever encountered.

Thanks pcy for answering my many questions and concerns about voltage glitching attacks.

If you have questions or comments, or if some things are not clear, feel free to email me at zeusblog@notdevreker.be
