Accumulated Test Vectors

submitted by
Style Pass
2024-10-09 10:30:03

I like tests. I especially like reusable test vector libraries. Sometimes test vectors are lovingly handcrafted to target obscure edge cases. Those vectors belong in Wycheproof or with the upstream specification. Sometimes, though, vectors are produced by sheer brute force. Enumerate every possible input and check the output. Try a million random inputs and see what happens. Combine all possible input sizes for every parameter. Make one very, very large input.

These vectors can be tremendously effective and require no prior insight into the bugs you’re hunting. If you run 300 000 random tests, you have a 99% chance of hitting any 2⁻¹⁶ edge case.[1]
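As a quick sanity check on that figure, here is a minimal sketch of the arithmetic. It assumes the tests are independent and that the edge case is hit with a fixed probability p on each one, so the chance of at least one hit in n tests is 1 - (1 - p)^n. The function names are mine, picked for illustration.

```python
import math

def hit_probability(trials: int, p: float) -> float:
    """Chance that at least one of `trials` independent tests hits an
    event of per-test probability `p`, i.e. 1 - (1 - p)**trials."""
    # log1p/expm1 keep precision when p is tiny.
    return -math.expm1(trials * math.log1p(-p))

def trials_needed(p: float, confidence: float) -> int:
    """Smallest number of independent tests that reaches `confidence`."""
    return math.ceil(math.log1p(-confidence) / math.log1p(-p))

if __name__ == "__main__":
    print(f"300 000 tests, p = 2^-16: {hit_probability(300_000, 2**-16):.3f}")
    print(f"tests for 99% at p = 2^-16: {trials_needed(2**-16, 0.99)}")
```

The same function shows how fast the odds fall for rarer events: at p = 2⁻³², the same 300 000 tests almost never hit.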

For example, you can get pretty good coverage of the internals of the new post-quantum algorithm ML-KEM by generating random key pairs, passing the public key to Encapsulate and the resulting ciphertext to Decapsulate, and then making sure keys, ciphertext, and shared secret are as expected.[2] The reference implementation offers a program to produce a corpus of 10 000 of these known-answer tests.
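The shape of such a test loop might look like the sketch below. To keep it runnable with only the standard library, it uses a toy Diffie-Hellman KEM that I made up as a stand-in: it is not ML-KEM and it is not secure. The names `keygen`, `encapsulate`, `decapsulate`, and `generate_vectors` are hypothetical, and a real harness would call an actual ML-KEM implementation instead.

```python
import hashlib
import random

# Toy stand-in KEM so the loop below runs. NOT ML-KEM, NOT secure.
P = 2**127 - 1  # a Mersenne prime, used as a small toy modulus
G = 3

def _scalar(seed: bytes) -> int:
    # Derive a nonzero exponent from a seed.
    return int.from_bytes(hashlib.sha256(seed).digest(), "big") % (P - 1) or 1

def keygen(seed: bytes):
    sk = _scalar(seed)
    return pow(G, sk, P), sk  # (public key, secret key)

def encapsulate(pk: int, seed: bytes):
    r = _scalar(seed)
    shared = pow(pk, r, P).to_bytes(16, "big")
    return pow(G, r, P), hashlib.sha256(shared).digest()  # (ciphertext, secret)

def decapsulate(sk: int, ct: int) -> bytes:
    return hashlib.sha256(pow(ct, sk, P).to_bytes(16, "big")).digest()

def generate_vectors(n: int, seed: int = 0):
    """Run n random round trips and return (pk, ct, shared secret) records."""
    rng = random.Random(seed)  # fixed seed, so every run sees the same inputs
    vectors = []
    for i in range(n):
        key_seed = rng.getrandbits(256).to_bytes(32, "big")
        enc_seed = rng.getrandbits(256).to_bytes(32, "big")
        pk, sk = keygen(key_seed)
        ct, ss = encapsulate(pk, enc_seed)
        # Self-consistency: both sides must derive the same secret.
        assert decapsulate(sk, ct) == ss, f"round trip failed at test {i}"
        vectors.append((pk, ct, ss))
    return vectors
```

The round-trip assertion alone only proves the implementation agrees with itself. Comparing the returned records against values produced by a reference implementation is what turns them into known-answer tests.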

The catch is that—unless the result is self-evidently correct[3]—you need to actually check in inputs and expected outputs somewhere. Those ML-KEM vectors run into the tens of megabytes, even compressed. Checking in the reference implementation is also undesirable, with its nontrivial size, incompatible build system, and different supported environments.
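To get a feel for where the megabytes come from, here is a rough back-of-the-envelope sketch. It assumes the ML-KEM-768 parameter set and raw binary encoding, neither of which is stated above, and uses the object sizes from FIPS 203. Hex-encoded files, as known-answer tests often are, would be about twice as large before compression.

```python
# Byte sizes for ML-KEM-768 as given in FIPS 203.
# The choice of parameter set is an assumption for illustration.
ML_KEM_768_SIZES = {
    "encapsulation_key": 1184,
    "decapsulation_key": 2400,
    "ciphertext": 1088,
    "shared_secret": 32,
}

def corpus_bytes(num_vectors: int, sizes: dict) -> int:
    """Raw size of a corpus that stores every object for every vector."""
    return num_vectors * sum(sizes.values())

if __name__ == "__main__":
    total = corpus_bytes(10_000, ML_KEM_768_SIZES)
    print(f"{total / 1e6:.1f} MB uncompressed")
```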
