Back in August, I attended the CRYPTO 2006conference in Santa Barbara, where Daniel Bleichenbacher gave aneye-opening talk that highlighted a very common implementation mistakepeople make with the RSA cryptosystem. Since my own background is incryptography I thought I would try to describe not only this commonmistake and its implications, but also some details regarding why thismistake leads to vulnerabilities, in a way that’s hopefully suitablefor a wide audience. For those who don’t recognize the name, Daniel isa well-known and brilliant cryptographer who, among other things, foundcryptographic flaws in SSL v3.0 and also the random number generatorassociated with the Digital Signature Algorithm. Well, he is at itagain!

Before going any further I want to emphasize thatthe flaw Daniel found is not one that is inherent in the RSA algorithmitself; rather, it deals with a specific implementation mistake thatseems to be made widely. Specifically, he saw that many implementationsincorrectly handled the padding associated with the payload in an RSAsignature and as a result, one could present bogus signatures to theseimplementations that would be accepted as valid. (Although Daniel foundthe flaw in the Java cryptographic library, it has been discovered inthe widely-used Open SSL library, as well as the Mozilla Firefox,Konqueror, and Opera Web Browsers, and it will probably be discoveredin other places as well. Note that in many cases, the flaws have beenpatched.)

As I mentioned, libraries that possess this flaw can be duped intoaccepting digital signatures as valid, even though in reality thesesignatures do not satisfy the appropriate mathematical relationsnecessary to be considered valid RSA signatures. For example, inpractice this means that one of the above-mentioned Web browsers couldbe made to accept bogus digital certificates (since these are createdusing digital signatures). As an end user, you might think that you arecommunicating with a legitimate entity over a secure connection. Thebrowser will dutifully display all the appropriate lock icons and theaddress bar (which will take on a friendly and inviting color) willcontain a URL that begins with “https.” Although, in reality, a phisherneeding no more than a pen and paper to do the appropriate calculationscould have been the one who created this digital certificate and no onewould be the wiser.

Now, for those of you who are so inclined, I’ll describe the detailsof why this flaw is really a flaw. I don't believe that you need astrong background in cryptography to at least get a flavor for thedetails. All that is required is a basic, high-level understanding ofcryptographic terminology and a willingness to do some (slightly messy)arithmetic. If you are not already familiar with it, the RSA-signingalgorithm involves so-called “modular arithmetic”. One way to think ofmodular arithmetic is to imagine that you are doing regular arithmetic,but whenever you get to a specific value, you cycle back to zero–kindof like how a clock works, where you go from 12 to 1. The modulus valuedictates when you go back to 0. It turns out that this wrapping aroundprocedure results in the same value you would get when you divide bythe modulus and take a remainder. By *“mod N”* or *“modulo N*” we mean the remainder that would result when one divides by *N* (or equivalently, the value you would get if you wrapped back to 0 every time you got to *N*).

In particular, in RSA you take the message you want to sign, andhash it and pad it appropriately (so it’s the right length). Then, youinterpret the resulting byte string as a number and raise it to thepower of a so-called *signing exponent e*, taking the result modulo the RSA modulus value *N*: *s = (PADDING || HASH(m))^d mod N.* Here, *s* represents the final signature value, *“PADDING”* represents the padding, *HASH() *represents a cryptographic hash function, *d* represents the secret RSA signing key, and *N* represents the RSA modulus. Typically, *N* is equal to the product of two large prime numbers which are often referred to as *p* and *q* (these values of *p* and *q* are also kept secret). Sometimes we’ll refer to the value *PADDING || HASH(m)* as the payload.

To verify an RSA signature, one takes the signature *s* and raises it to the power of the so-called public verification exponent *e*, and takes the remainder when divided by the modulus (that is, *s^e mod N*). If *e* and *d* have a specific property in relation to *N*, it turns out that this process goes backwards and should yield the original payload: *PADDING || HASH(m)*. To complete the verification, *PADDING* is removed and the message *m* is separately hashed to see if it matches up with what is remaining. If everything checks out, the signature is valid.

In practice, *e* is public, and is often a convenientlychosen value that makes the rest of the verification computationeasier. For example, people might pick *e* = 3 or *e* = 17 or *e*=65537 (note that all these values are close to a power of 2, and this allows one to compute the exponentiation to the power *e* quickly). For the remainder of this entry, we’ll just assume that *e*=3 (since this case is the one Daniel described in his talk).

At the heart of the security for RSA is the premise that given just the value *s^e mod N*,it should be hard for someone (who doesn’t know the secret signing keyd or the prime factors p and q) to compute the value s. (Otherwise, onemight be able to start with a “message” value for *s^e* and gobackwards to find the corresponding “signature” value s, and this wouldconstitute a signature forgery.) Now, imagine for a moment that youremoved the *mod N* step from RSA. Then RSA would be trivial tobreak, since it would suffice to take a cube root in order to invert(which allow signature forgeries). Taking the cube root of an integeris pretty trivial. However, adding the “*mod N*” step into the mix seems to make taking such “modular” cube roots much harder (note that for security, it is crucial for *N* to be an appropriately constructed value).

Daniel found that some implementations, when given a signature *s*, will perform the verification step to get a payload string of the form *PADDING || HASH(m)* and remove PADDING. However, they fail to check if *HASH(m)* is the last thing that appears in that string. As a result, one can append a specific garbage byte string after *HASH(m)* that essentially tricks the implementation into bypassing the *mod N*step. Put another way, allowing extra garbage at the end allows us toconstruct a value that is already a perfect cube for which a cube rootcan be computed easily. I will show how the attack works in some moredetail below. (Before proceeding, I want to acknowledge Hal Finney, whogave a nice description of this attack on the Cryptography mailinglist. I’m going to base my explanation very closely on the one he gave,with a couple of extra minor details and steps filled in.)

In practice, the payload for an RSA signature would look somethinglike: 00 01 FF FF … FF 00 ASN.1 HASH. The beginning bytes representso-called PKCS padding (which involves putting a 0x00 byte, followed bya 0x01 byte, then a string of 0xFF bytes, followed by a single 0x00byte, and then something called the ASN.1 description). Don’t worry tooabout what the ASN.1 description is, since it is immaterial for thisdescription. Think of it as a few bytes that appear before the hash totell the parser how to handle HASH. After these initial bytes, theactual hash of the message whose signature is being verified isincluded.

However, as I mentioned, if there is extra garbage at the end ofthis payload, many implementations will not realize something is amissso long as the first part of the padding is correct. (Correctimplementations should not accept payloads when there is garbage at theend.) Now, suppose that the implementation computes the followingstring from the signature: 00 01 FF FF … FF 00 ASN.1 HASH GARBAGE.Incorrect implementations give us free reign on setting GARBAGE towhatever we want. Our goal is to set things up so that the value forGARBAGE will make the numeric value of the whole expression a perfectcube for which we know the cube root. This cube root will appear to bethe correct signature. We now show how to achieve this goal.

Specifically, let *D* be the integer value associated withthe string 00 ASN.1 HASH (by “integer value”, I mean the value youwould get if you converted from the base the string is in to aninteger). For a hash function like SHA-1, this string is 288 bits (36bytes) long. Suppose that our modulus N is 3072 bits long (a commonsize for many RSA moduli). By putting a 2072-bit string for GARBAGE, wewill arrange the string so that the value HASH is positioned at the2072 most significant bit. Then, the entire byte string with theGARBAGE padding at the end takes on numeric value:

2^3057 - 2^2360 + *D**2^2072 + GARBAGE

(The portion 2^3057 - 2^2360 accounts for the leftmost substring 00 01 FF FF … FF at the beginning, the value *D**2^2072accounts for the 00 ASN.1 HASH substring in the middle, and the valueGARBAGE accounts for the rightmost remaining substring on the right).

Letting *T* be 2^288 – *D*, we can simplify the above expression to 2^3057 – *T**2^2072 + GARBAGE. By playing with the message a little, we can also ensure that *T* is a multiple of 3. The cube root of this value turns out to be 2^1019 – (*T**(2^34)/3) when GARBAGE is the byte string associated with the numeric value (2^1087)*3**T*^2 – (2^102)*(*T*^3)/27 (padded with enough zeros at the beginning to make it 2072 bits in length).

Incorrect implementations will accept the value 2^1019 – (*T**(2^34)/3)as a valid RSA signature! Specifically, they will take this value andcube it and parse the resulting integer as 00 01 FF FF … FF 00 ASN.1HASH GARBAGE. Then, they will read off the 0x00, 0x01, 0xFF, and 0x00bytes until they get to the ASN.1 bytes. These bytes will tell them howto interpret HASH. Then, they will check that the original messagehashes to the value in HASH. If so, they will accept the signaturewithout realizing that the GARBAGE bytes are there as well. If you havean RSA implementation, you should definitely check for this paddingerror.

One of the main lessons to be learned here is that implementingcryptographic primitives requires extreme care. Sometimes, an innocuousdeviation from the specification is enough to take something that oughtto be perfectly safe and render it completely insecure.