0x4588 ≠ 4588: the one-character bug that hid post-quantum from our own dashboard

We turned on post-quantum key exchange for every connection through the agent. The dashboard whose entire job is to count what crypto is in use showed zero. Not low — zero. Here's the bug, why it stayed hidden for weeks, and the one thing that finally caught it.

The setup

The splice injects a post-quantum key-exchange group into the ClientHello on its way to the server, so that even a client which only knows classical crypto can still end up on a PQC handshake (how that works). We had just updated the injected group to the new IANA-final hybrid, X25519MLKEM768.

The bug

In TLS, codepoints are hex by convention — everywhere. Cipher suites (0x1301), named groups (0x001D for X25519), extensions; the RFCs, Wireshark, OpenSSL, and our own code all read them in hex. The line being edited already held the previous codepoint as a hex literal. So "codepoint" meant "write 0x…".

But the IANA registry publishes its values in decimal, and it lists X25519MLKEM768 as 4588. That number is the trap: 4588 is all digits 0–9 — a perfectly valid numeral in both bases, with nothing to mark which one. Primed by the hex-everywhere convention, we wrote the decimal 4588 as 0x4588. The compiler said nothing, because 0x4588 is a completely legal hex literal. It just equals 17800.

IANA registry:    X25519MLKEM768 = 4588    (decimal)
what we shipped:  0x4588         = 17800   (decimal)  ← unassigned
the real value:   0x11EC         = 4588    (decimal)
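
The arithmetic is quick to confirm. A minimal Python sketch (the variable names are illustrative, not taken from the agent's code) shows the two readings of the same digits and the bytes each one puts on the wire:

```python
# Illustrative names; only the numeric values come from the post.
IANA_DECIMAL = 4588   # X25519MLKEM768 as the IANA registry lists it (decimal)
SHIPPED = 0x4588      # the same digits typed behind a hex prefix

print(SHIPPED)                                 # 17800
print(hex(IANA_DECIMAL))                       # 0x11ec

# Named groups travel as two big-endian bytes in the ClientHello.
print(IANA_DECIMAL.to_bytes(2, "big").hex())   # 11ec
print(SHIPPED.to_bytes(2, "big").hex())        # 4588
```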

So every ClientHello the agent sent advertised group 17800 — assigned to nothing. Servers did exactly what the spec says: ignore the unknown group and fall back to whatever the client already offered. No PQC was negotiated. The upgrade silently did nothing, on every connection.
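
A simplified model of that fallback, assuming the server walks the client's list in order and skips anything it does not recognize. Real servers may apply their own preference order, and select_group is a hypothetical helper, not a library call:

```python
X25519 = 0x001D
X25519MLKEM768 = 0x11EC   # 4588 in decimal
BOGUS = 0x4588            # 17800, assigned to nothing

def select_group(client_groups, server_supported):
    """Return the first client-offered group the server supports.

    Unknown codepoints are skipped rather than rejected.
    Returns None when there is no overlap at all.
    """
    for group in client_groups:
        if group in server_supported:
            return group
    return None

server = {X25519MLKEM768, X25519}

# What was shipped: the bogus group is ignored and the handshake stays classical.
print(hex(select_group([BOGUS, X25519], server)))            # 0x1d
# What was intended: the hybrid group is recognized and selected.
print(hex(select_group([X25519MLKEM768, X25519], server)))   # 0x11ec
```

Nothing in this path raises an error, which is why the failure was silent.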

Why it hid for weeks

There was no crash, no error, no red test. Two more bugs even made it look healthy.

First, the name table was wrong in a mirror-image way. It mapped the real codepoint, 0x11EC, to the wrong label — an older Kyber draft — and mapped the fictional 0x4588 to "X25519MLKEM768." So when real servers on the internet negotiated genuine ML-KEM (0x11EC), our inventory filed it under the Kyber draft. The misinjection and the mislabel pointed the same way: X25519MLKEM768 read as zero from both directions.

Second, the test that should have caught it accepted either 0x4588 or 0x11EC as a pass. The real OpenSSL linked into the test emitted 0x11EC, so the test went green while the 0x4588 branch sat there, never exercised, making the whole thing look certified.
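
The shape of that escape hatch, as a hedged sketch. The function names are hypothetical and the real test was structured differently; the point is only that the permissive form accepts the wrong value too:

```python
EXPECTED_GROUP = 4588  # X25519MLKEM768, decimal as published by IANA

def permissive_check(observed):
    # Either value counts as a pass, so a wrong constant can never fail it.
    return observed in (0x4588, 0x11EC)

def strict_check(observed):
    # Exactly one value is correct, so exactly one value passes.
    return observed == EXPECTED_GROUP

for observed in (0x11EC, 0x4588):
    print(hex(observed), permissive_check(observed), strict_check(observed))
# 0x11ec True True
# 0x4588 True False
```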

A wrong constant, a wrong label that hid the right value, and a test with an escape hatch. Each alone might have been caught; together they formed a closed loop that looked correct from the inside.

What caught it

Nothing internal could — every internal source of truth agreed with itself. The only signal was a number that should not have been zero. The fix was to ask the wire, which can't be fooled by your own constants:

openssl s_client -connect cloudflare.com:443 \
    -groups X25519MLKEM768 -trace

The trace prints the group the server actually selected: 4588 in decimal, 0x11EC in hex. With the real value in hand, the rest was mechanical: we single-sourced the codepoint so that the injected value, the name table, and the test all read from one definition, and we replaced the test's "either/or" with a single expected value.
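
One way single-sourcing can look, sketched in Python under the assumption that an enum is the one definition everything else derives from. NamedGroup, INJECTED_GROUP, and group_name are hypothetical names, not the agent's actual code:

```python
from enum import IntEnum

class NamedGroup(IntEnum):
    # Values are copied in decimal, exactly as the IANA registry prints them.
    X25519 = 29
    X25519MLKEM768 = 4588

# The injected value, the name table, and the test expectation all derive from the enum.
INJECTED_GROUP = NamedGroup.X25519MLKEM768

def group_name(codepoint):
    """Label a codepoint seen on the wire; unknown values are flagged, never guessed."""
    try:
        return NamedGroup(codepoint).name
    except ValueError:
        return f"unknown(0x{codepoint:04X})"

print(group_name(0x11EC))            # X25519MLKEM768
print(group_name(0x4588))            # unknown(0x4588)
print(f"0x{INJECTED_GROUP:04X}")     # 0x11EC
```

Copying the registry's decimal value verbatim removes the base conversion where the original slip happened, and an unknown codepoint now shows up as unknown instead of borrowing a plausible label.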

The lessons

Four, and none of them are about cryptography:

Hex and decimal are a trap at registry boundaries. IANA publishes decimal; C code conventionally writes TLS codepoints as hex literals. 0x4588 ≠ 4588 is the kind of bug that compiles, passes review, and ships: syntactically perfect, semantically wrong, precisely because an all-digit value is legal in both bases.
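
This class of mistake lends itself to a mechanical check. The sketch below is an assumption about how such a lint could work, not something from our pipeline: it flags hex literals whose digits are all 0-9 when the decimal reading is a registered codepoint and the hex reading is not. REGISTERED is a tiny illustrative subset of the registry.

```python
import re

# Illustrative subset of registered named groups, in decimal.
REGISTERED = {29, 4588}   # X25519, X25519MLKEM768

HEX_LITERAL = re.compile(r"0[xX]([0-9A-Fa-f]+)")

def suspicious_hex_literals(source):
    """Return hex literals that look like decimal registry values typed with a 0x prefix."""
    hits = []
    for match in HEX_LITERAL.finditer(source):
        digits = match.group(1)
        if not digits.isdigit():
            continue  # contains A-F, so it cannot be a mistyped decimal
        if int(digits) in REGISTERED and int(digits, 16) not in REGISTERED:
            hits.append(match.group(0))
    return hits

print(suspicious_hex_literals("group = 0x4588;"))                # ['0x4588']
print(suspicious_hex_literals("group = 0x11EC; old = 0x001D;"))  # []
```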

A test that accepts "A or B" for a value that's supposed to be exactly one thing isn't a test. It's a rubber stamp with extra steps.

A number that should be nonzero and is zero is not "no data." It's a lead. The dashboard was working perfectly; it was faithfully reporting that nothing was happening.

For anything on the wire, the wire is the only ground truth. Your constants, your labels, and your tests can all agree with each other and all be wrong together. Probe the real thing.
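
Probing the real thing can be as small as decoding the bytes yourself. A minimal sketch of parsing the body of a supported_groups extension (a two-byte list length followed by two-byte big-endian codepoints). The function name is hypothetical, and the sample bytes are hand-built to mirror the bug, not taken from a real capture:

```python
import struct

def parse_supported_groups(ext_data):
    """Decode the body of a supported_groups extension into integer codepoints."""
    if len(ext_data) < 2:
        raise ValueError("truncated supported_groups extension")
    (list_len,) = struct.unpack_from("!H", ext_data, 0)
    body = ext_data[2:]
    if list_len != len(body) or list_len % 2:
        raise ValueError("malformed supported_groups length")
    return [group for (group,) in struct.iter_unpack("!H", body)]

# Hand-built sample: list length 4, then the shipped group and X25519.
sample = bytes.fromhex("0004 4588 001d")
for group in parse_supported_groups(sample):
    print(f"0x{group:04X} = {group}")
# 0x4588 = 17800
# 0x001D = 29
```

Printing both bases side by side makes the mismatch with the registry's 4588 visible at a glance.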

This is what "reading TLS at the byte level" actually means day to day — not memorizing the spec, but distrusting every layer between you and the bytes, including your own. It's also why, when someone tells me their fleet is "doing post-quantum," my first move is a live trace, not a config file. The wire is the only thing that can't lie to you.
