The RSA algorithm

For performing RSA encryption with Java, you luckily don't need to know all the gory details of how RSA works. But it's worth having an overview, at least so that you can understand the terminology. (Note that outside of Java, you do need to know some of these details to use RSA securely; luckily, at least Sun's implementation solves some potential security issues that you could otherwise run into if you just used RSA "without thinking about it".) OK, hold on to your hat...

Basic RSA equation

The essential idea is as follows:

In other words, if we raise m to the power e, taking the answer modulo n, there exists some other number d, so that raising the result to d (and taking the answer modulo n again) gets us back to the original number. Thus, we have a way to perform encryption then decryption, each using a different number: e in one case and d in the other.

Using the RSA equation in practice

Once we've picked values for p and q (remember, these are random large primes), we need to pick suitable corresponding values for e and d. In practice, it's easiest to just use some small value of e that we know is suitable (65537 is commonly used), and then calculate the corresponding d. Then:

How to calculate d

The method of calculating d actually turns out to be the basis of the security of RSA:

It is mathematically easy to find d if and only if you know p and q, but otherwise, it is extremely difficult.

Remember that p and q are initially chosen by the person creating the key pair, but then these numbers are discarded (or at least, kept secret) once d has been calculated; only the result of multiplying the two numbers together (n = pq) is made public (along with e), and in practice, this multiplication cannot be undone if p and q are large enough.

Terminology

The following terminology is generally used, and is useful to know when working with the Java RSA classes:

Security of RSA

Purely in theory, RSA could be attacked as follows:

To get round the latter problem, it is important to make sure that the message is unpredictable by appending random padding. Sun's implementation handles this (but another implementation might not by default). Adding padding also solves a problem with very short messages: if the message is so short that me is less than n, then an attacker can simply calculate the eth root of m to retrieve the message. (Once it exceeds n, and so the number is actually reduced modulo n, such a calculation isn't feasible.)