Attacks on RSA decryption and mitigation

There are attacks on RSA involving the unpad operation. This document describes those attacks, whether the YubiKey and/or SDK is vulnerable, and SDK mitigations.

Chosen ciphertext attack on RSA

Suppose an attacker is able to obtain an RSA ciphertext block from unknown plaintext. The goal is to retrieve the plaintext of this original message.

The attacker creates a different ciphertext block mathematically related to the first. If the attacker is able to somehow get the private key owner to decrypt this new block and return the plaintext, they can deduce the original message.

That is, there exist

  P1  (plaintext 1, the unknown)
  C1  (ciphertext 1, known to the attacker)
  P2  (known)
  C2' (known)(the chosen ciphertext)

If they are mathematically related (in a particular way), the attacker "can solve for P1".

Incidentally, just to be pedantic, C2' (C-two-prime, the chosen ciphertext) is not the ciphertext for P2. Rather, while there is a C2 (the ciphertext for P2), the ciphertext the attacker chooses is mathematically related to both C2 and C1. But it is not C2 exactly.

There are two reasons this attack generally does not succeed in the real world. One, protocols and applications are written so that results of private key operations are not returned to outside querants. And two, even if the plaintext block were to be returned, when the chosen ciphertext is decrypted, the "unpad" operation would undoubtedly fail and an error, rather than the plaintext, would be returned. And even if the decrypted data somehow survived the unpad operation (it looked like a properly padded block), the data returned would be the unpadded data, not the entire block.

Nonetheless, this is the foundation of other attacks. This document describes these attacks, when an application is susceptible (or not), and what mitigations are required. Finally, this document describes some code inside the .NET YubiKey SDK that would appear to be susceptible, and the mitigations employed to reduce its exposure.

❗ Note that this attack only allows the adversary to recover one message. It does not threaten the private key itself.

PKCS 1 padding (aka PKCS #1 v.1.5 and P1.5)

To encrypt using RSA, each plaintext block must be numerically less than the RSA modulus. However, if the plaintext is too small, then the RSA operation is not secure. Hence, to build a block that is less than the modulus, but also not a small value, the block is padded. The oldest standard padding scheme for RSA encryption is known as PKCS 1 (PKCS = Public Key Cryptography Standards, and the first of those is related to RSA).

In this scheme, a block of memory or a byte array the size of the modulus is created. It is filled this way.

00 || 02 || pad bytes || 00 || plaintext

The pad bytes are random, non-zero bytes. The standard specifies that there must be at least 8 pad bytes. For example, if the RSA key is 1024 bits (128 bytes) and the data to encrypt is a 32-byte value (e.g. a 256-bit AES key), the block of data would be

00 || 02 || 93 random non-zero bytes || 00 || 32 plaintext bytes

This block is converted into a number and encrypted.

After decrypting the ciphertext, the private key owner will check to make sure the first byte is 00 and the second is 02. If not, error, don't return any plaintext.

If the first two bytes are correct, search for the first occurrence of the 00 byte. If there is none, error, don't return any plaintext.

If there is a 00 byte, make sure there are at least 8 bytes of pad (i.e. the 00 appears after index 9). If not, error, don't return any plaintext.

If these checks all pass, return all the bytes after the first 00 byte.

Some applications and protocols are written with another check, namely, how big the unpadded data must be. For example, it's possible the encrypted data must be either 16, 24, or 32 bytes (it is an AES key), or it must be 48 bytes (it is a master session key). If not, error, don't return any plaintext.

Bleichenbacher Attack

In 1998, Daniel Bleichenbacher published some results of his research, which included a way to employ the chosen-ciphertext attack on RSA, even without knowing the full plaintext block result. It relied on knowing where, in a decrypted block, the padding scheme failed.

Send the private key owner a chosen ciphertext block. The owner decrypts and then tries to unpad. When that operation fails, the owner sends a response indicating what the failure was. That is, the error message might be, "Decryption failed, block[0] not 0." Or maybe it was "Decryption failed, block[1] not 2". And so on.

Although the attacker does not know all the bytes of the plaintext, they do know some of them based on the error message.

Now send another chosen ciphertext. And another. And keep sending messages. Depending on the private key size, thousands or millions of messages. Now based on the results, solve for the original plaintext message.

This attack is not practical unless two conditions are met. One, the decryptor (private key owner) must be willing to decrypt thousands or millions of messages in a timely manner, no questions asked (the term in cryptography is "oracle"), and two, return descriptive error messages.

These conditions were met in the real world. An SSL server likely responds to all handshake requests and processes them automatically. There were some implementations that did indeed return descriptive error messages. The target message to attack would be the one in which a session key was encrypted using the server's public key. Once the attacker knows the session key, they can read an entire session's messages.

Mitigation

To prevent the attack, SSL server code stopped returning descriptive error messages. All errors (failed decryption because of an unpad error or anything else) triggered a single response message.

Attack updated

Without the detailed error message, Bleichenbacher then timed the responses. How long did it take for the SSL server to respond? If it was very quick, the error was the first byte of padding. If it was a little longer, the error was the second byte. And so on.

Even though the server was not explicitly returning a descriptive error code, the amount of time it was spending on the decryption was enough information to launch the attack.

The number of messages required was generally estimated to be around 1,000,000 for a 2048-bit key.

Mitigation updated

Run all unpad checks, no matter what. If the first byte was wrong, note that there is an error, then check the next byte. And so on. In the end, if there were any errors, simply return the generic error message.

Not enough

Simply performing all checks is not necessarily enough. If there is a variation in the amount of time spent on the unpad operation, information is leaked. For example, if both the first byte and the second byte are incorrect, and there is no 00 marker byte, maybe the amount of time to process is greater than if the problem is in the second byte alone. The original timing attack said a quicker time meant the error was in the first byte. But for some particular implementation, a quicker time could mean the error was in the second byte.

It's not enough to make the computation slower, it really needs to be uniform. No matter what the error, the time to perform the unpad operation is the same.

Another mitigation: OAEP

Another way to solve this problem is to simply not use P1.5 padding. In 1994, Mihir Bellare and Phillip Rogaway had developed a different padding scheme called Optimal Asymmetric Encryption Padding (OAEP). Because of the Bleichenbacher attack, standards and protocols had incentive to adopt this existing algorithm.

With OAEP, the padded data was indistinguishable from random (to help prevent other "side-channel" attacks), and it was much more difficult to launch a Bleichenbacher attack, even if the timing was known. That is, it was still possible an implementation of OAEP unpad would leak information about where the padding went wrong, but because of how the scheme worked (comparing digests of data rather than the data itself), that information did not correlate to what bits in the chosen plaintext were different from the original plaintext. Furthermore, it was much easier to write code that was more uniform anyway.

Attack on OAEP

In 2001, James Manger published his attack on OAEP. In this, the attacker needs to know if the data decrypted from the chosen ciphertext is greater than or less than a particular value (often called "B"). It will take a few thousand chosen-ciphertext messages, but eventually the attacker will be able to recover the original message.

When verifying whether unpadded data is correct, there is a check to see if the most significant byte is zero or not. If it is not zero, that's an error. Furthermore, if it is not zero, the attacker knows the result is greater than B. So if the OAEP unpad code checks the most significant byte, and then exits immediately, that's a quick response. In that case, the attacker knows the result from the chosen ciphertext is greater than B. A longer response means it is less than.

Mitigation

Once again, make sure the OAEP unpad operation performs the entire process every time, and make the total time (error or no error) as uniform as possible.

One more mitigation

There is another possible mitigation: variable times. This would be something similar to "RSA blinding".

In order to thwart timing attacks on the RSA algorithm itself (not the padding scheme), an implementation could add some random amount of time to the process. This is called "blinding". When this happens, a quick response with one ciphertext block does not necessarily mean that the actual computation time is less than the actual computation time of a slow response with another ciphertext block.

Now add in the unpad operation. An attacker likely knows only how long the total RSA operation took (RSA decryption and unpad). If there is too much variation in the RSA decryption time, then there is no way to tell how much of the total time was RSA and how much was unpad.

It is also possible to build implementations of the P1.5 and OAEP unpad algorithms that add a variable amount of time each time it is computed.

While variable-time RSA blinding implementations are used in the real world, variable-length unpad schemes are rare.

Signing

A digital signature using RSA involves performing the padding operation, then encrypting that result using the RSA private key. The owner of the private key does not perform the unpad operation, so the attacks listed here are not relevant.

What the attacker needs

For these timing attacks on the unpad operation, the attacker needs two things:

An oracle, namely, the owner of the private key must be willing to decrypt thousands or millions of messages in a timely manner, no questions asked.
Accurate times for completion of the task.

Susceptibility of the YubiKey to these timing attacks

The YubiKey itself does not perform the unpad operation. If you call on the YubiKey to decrypt, it will perform "raw" RSA and return the still-padded result. It is the responsibility of the calling application to unpad.

Hence, the YubiKey itself is not susceptible to this class of attack.

Susceptibility of the .NET YubiKey SDK to these timing attacks

Because an application calling on the YubiKey to decrypt will need to perform the unpad operation, the SDK provides a class, RsaFormat, that can unpad the result of RSA decryption.

If you use this class to unpad RSA decryption, will your application be susceptible to these timing attacks?

The oracle requirement

A YubiKey will almost certainly not be used in some application as an oracle. That is, the YubiKey will likely not be running automatically, performing decryptions no questions asked. Probably the only situation where this could happen is if someone wants to use a YubiKey as a substitute for an HSM providing cryptographic services for an SSL server. That is not recommended, by the way.

The most likely use case for a YubiKey performing decryption is for an individual user to decrypt messages. In that case, it is extremely unlikely that an attacker will be able to get the user to perform thousands or millions of decrypt operations in a timely manner.

The accurate time requirement

Because the most likely use case for decrypting with a YubiKey involves user interaction, including PIN entry and touch, the time for each RSA decryption is so varied it is virtually useless.

Even though it is highly unlikely an attacker could mount an unpad timing attack on the SDK's RsaFormat class when used in conjunction with the YubiKey, we will examine the operation's time variation.

RsaFormat

With the .NET YubiKey SDK, you have two choices for unpadding. One, use the RsaFormat class in Yubico.YubiKey.Cryptography. Or two, use an alternate implementation, such as one you write yourself. Note that the Unpad code that the .NET Base Class Libraries use is not publicly accessible.

The engineers at Yubico have taken care to make sure the unpadding operations following P1.5 and OAEP are as uniform as possible. Yubico makes no guarantees that this code is completely immune to timing attacks. However, tests that timed how long to unpad correct versus incorrect values showed little variation. See the results below.

RsaFormat timing results

In the following tables, timing numbers are in microseconds:

0.372 microseconds = 0.000000372 seconds (372 nanoseconds, 0x000372 millisecond)

14.1 microseconds = 0.0000141 seconds (14,100 nanoseconds, 0.0141 millisecond)

These are averages over several timing iterations. Where applicable, results are given based on the message size. For example, the baseline measurements (in the "Correct" column) are for no-error unpad operations when the encrypted data (the unpadded message) is 16, 24, 32, or 48 bytes long. For the "First byte wrong" column, the first byte was not valid, but everything after that was correct, including the message of given length.

All timing numbers were taken on a computer with a 1.6 GHz Intel Core i5, 8th Gen chip, running Windows 10.

Start with P1.5.

1024-bit block PKCS 1 v1.5

Correct P1.5 Unpad	First Byte Wrong (1)	Second Byte Wrong (2)	Not Enough Pad (3)	No Zero Byte	1, 2, and 3
16: 0.354	16: 0.346	16: 0.346	16: 0.341	0.339	16: 0.340
24: 0.370	24: 0.340	24: 0.336	24: 0.348		24: 0.345
32: 0.371	32: 0.335	32: 0.335	32: 0.349		32: 0.346
48: 0.379	48: 0.335	48: 0.336	48: 0.347		48: 0.345
Overall average
0.369	0.339	0.338	0.346	0.339	0.344

2048-bit block PKCS 1 v1.5

Correct P1.5 Unpad	First Byte Wrong (1)	Second Byte Wrong (2)	Not Enough Pad (3)	No Zero Byte	1, 2, and 3
16: 0.676	16: 0.675	16: 0.670	16: 0.678	0.684	16: 0.669
24: 0.681	24: 0.672	24: 0.670	24: 0.681		24: 0.679
32: 0.692	32: 0.670	32: 0.668	32: 0.683		32: 0.678
48: 0.692	48: 0.671	48: 0.673	48: 0.685		48: 0.679
Overall average
0.686	0.672	0.670	0.682	0.684	0.676

These numbers indicate that there is very little variance between times based on message size. Secondly, there is very little variance based on error or no error. Lastly, there is very little variance based on the type of error. For example, whether the error is an incorrect first byte, or a combination of the first three errors, the amount of time the RsaFormat method will take is very similar.

Next, let's look at OAEP.

1024-bit block OAEP with SHA-256

Correct OAEP	First Byte Wrong (1)	Incorrect lHash (2)	Wrong Separator (3)	No Separator	1, 2, and 3
16: 16.38	16: 16.48	16: 16.29	16: 16.19	16: 16.21	16: 16.19
24: 16.34	24: 16.36	24: 16.22	24: 16.14	24: 16.32	24: 16.14
32: 16.29	32: 16.68	32: 16.52	32: 16.18	32: 16.20	32: 16.11
48: 16.18	48: 16.24	48: 16.19	48: 16.11	48: 16.19	48: 16.03
Overall average
16.29	16.44	16.30	16.15	16.23	16.11

2048-bit block OAEP with SHA-256

Correct OAEP	First Byte Wrong (1)	Incorrect lHash (2)	Wrong Separator (3)	No Separator	1, 2, and 3
16: 28.15	16: 28.36	16: 28.24	16: 28.07	16: 28.20	16: 28.06
24: 28.36	24: 28.82	24: 28.79	24: 28.19	24: 28.22	24: 28.07
32: 28.11	32: 28.99	32: 28.31	32: 28.18	32: 28.04	32: 28.13
48: 28.11	48: 28.32	48: 28.18	48: 28.31	48: 28.20	48: 28.08
Overall average
28.18	28.62	28.38	27.18	28.16	28.08

Once again, we see very little variance between times based on message size. Secondly, there is very little variance based on error or no error. Lastly, there is very little variance based on the type of error.

The time it takes to perform OAEP is dependent on the digest algorithm chosen. The numbers above are from timing exercises using SHA-256. The following numbers are averages when using the other digest algorithms.

1024-bit block OAEP with SHA-1, SHA-256, and SHA-384

Correct OAEP	First Byte Wrong (1)	Incorrect lHash (2)	Wrong Separator (3)	No Separator	1, 2, and 3
SHA-1
14.27	14.22	14.19	14.10	14.21	14.17
SHA-256
16.29	16.44	16.30	16.15	16.23	16.11
SHA-384
17.41	17.42	17.23	17.24	17.26	17.25

2048-bit block OAEP with SHA-1, SHA-256, SHA-384, and SHA-512

Correct OAEP	First Byte Wrong (1)	Incorrect lHash (2)	Wrong Separator (3)	No Separator	1, 2, and 3
SHA-1
26.29	26.38	26.42	26.8	26.15	26.10
SHA-256
28.18	28.62	28.38	27.18	28.16	28.08
SHA-384
30.13	30.43	30.21	29.87	30.01	29.85
SHA-512
32.26	32.77	32.39	32.21	32.41	32.30

Table of Contents

Attacks on RSA decryption and mitigation

Chosen ciphertext attack on RSA

PKCS 1 padding (aka PKCS #1 v.1.5 and P1.5)

Bleichenbacher Attack

Mitigation

Attack updated

Mitigation updated

Not enough

Another mitigation: OAEP

Attack on OAEP

Mitigation

One more mitigation

Signing

What the attacker needs

Susceptibility of the YubiKey to these timing attacks

Susceptibility of the .NET YubiKey SDK to these timing attacks

The oracle requirement

The accurate time requirement

RsaFormat

RsaFormat timing results

1024-bit block PKCS 1 v1.5

2048-bit block PKCS 1 v1.5

1024-bit block OAEP with SHA-256

2048-bit block OAEP with SHA-256

1024-bit block OAEP with SHA-1, SHA-256, and SHA-384

2048-bit block OAEP with SHA-1, SHA-256, SHA-384, and SHA-512