For a very long time now, people have been coming up with ingenious ways of exchanging messages securely. From the basic Caesar cipher to the Enigma, however, there has been a universal caveat: how to agree on the encryption/decryption key without meeting face to face?
To put it simply, consider the following situation: Alice and Bob want to communicate with one another using an Enigma machine. Alice adjusts the settings of her machine, tells Bob what those settings are, types a secret message, and sends the encrypted text to Bob, who then adjusts his own machine’s settings and proceeds to decrypt the message.
But hang on a minute. How exactly did Alice give her settings to Bob? Alice can’t send encrypted messages unless Bob has those settings, but at the same time Alice has no way of secretly passing those settings on to Bob. Is there a way for them to somehow come up with a shared secret key (ie the settings for the machines) by exchanging messages visible to anyone listening to their conversation?
The Diffie–Hellman protocol allows us to do exactly that. Let’s dig in and see how it works!
Some modular arithmetic
We are all used to the regular addition and multiplication of numbers. Indeed no one would ever question that $4 + 11 = 15$. In modular arithmetic, we change the rules slightly, by saying two numbers $a$ and $b$ are the same mod ${p}$, or congruent mod ${p}$, if their difference is an integer multiple of $p$, and we write $a \equiv b \bmod{p}$. For example, since $15 – 1 = 2\times7$, we would write $15 \equiv 1 \bmod{7}$. This is sometimes know as clock arithmetic because we kind of wrap around every time we reach $p$.
Let’s think of what happens when we take the integers mod ${5}$. Observe that any integer will be congruent to one of $0,1,2,3,4$, and that none of $0,1,2,3,4$ are congruent to each other. The set $\{0,1,2,3,4\}$ is therefore enough to “represent” all the integers mod ${5}$, and we call this set $\mathbb{Z}_5$.
Just as with the regular integers, we can add and multiply integers mod ${p}$ together. Of particular interest to us is the multiplication, because it gives an interesting structure to the set. Notice that almost every element in the set $\mathbb{Z}_p$ has what we call a multiplicative inverse. For example in $\mathbb{Z}_5$, $2 \times 3 \equiv 1 \bmod{5} $, so we say that $3$ is the inverse of 2 mod ${5}$ (and vice-versa that $2$ is the inverse of 3 mod ${5}$). Similarly, $4\times 4\equiv 1 \bmod{5}$, so we say that $4$ is its own inverse mod ${5}$.
I did say almost every element, because $0$ never has a multiplicative inverse. This could cause some problems down the road, so we decide to do away with zero entirely, and call the resulting set $\mathbb{Z}_p^*$. Now, $\mathbb{Z}_p^*$ together with multiplication has a lot of structure, and it is in fact an example of a group. We will see another example of a group later on in the article.
There is something special about $\mathbb{Z}_p^*$ that makes it very useful in cryptography. Let’s look at $\mathbb{Z}_7^*$ and enumerate the powers of 3 mod ${7}$. We have:
$3^1 \equiv 3 \bmod{7}$
$3^2 \equiv 2 \bmod{7}$
$3^3 \equiv 6 \bmod{7}$
$3^4 \equiv 4 \bmod{7}$
$3^5 \equiv 5 \bmod{7}$
$3^6 \equiv 1 \bmod{7}$
What do we notice? Every one of $1,2,3,\ldots,6 \bmod{7}$ can be expressed as a power of 3 mod ${7}$! Because of this, we say that $3$ is a primitive root mod ${7}$ and that $\mathbb{Z}_7^*$ is a cyclic group. Note that not every element of $\mathbb{Z}_7^*$ is a primitive root: if we instead looked at powers of 2 mod ${7}$ for example, we would only get half of all possible values. Because $7$ is prime, however, we are guaranteed the existence of primitive roots in $\mathbb{Z}_7^*$. This simple fact will be paramount in making sure our Diffie–Hellman protocol is secure, but before we get there, let me give you a small problem: can you list all the primitive roots mod ${7}$?
It shouldn’t take very long to determine that these are $3$ and $5$. But $7$ is a small prime; what about finding all the primitive roots mod ${23}$?
This is a bit tedious if you try every number up to $23$, but you should find exactly $10$ distinct primitive roots. Notice that $10$ is exactly the number of integers smaller than $22$ which are coprime to $22$. This is not a coincidence! In general, there are exactly $\phi(p-1)$ distinct primitive roots in $\mathbb{Z}_p^*$, where $\phi(n)$ is the number of integers between $1$ and $n$ that are coprime to $n$. Although this doesn’t tell us how to find primitive roots, it does give an upper bound on how many numbers we have to try before finding a primitive root. Computers are reasonably good at finding primitive roots, but the larger the prime the longer it takes. Because of this, we tend to reuse primes whose primitive roots we’ve already computed when doing cryptography.
Discrete logarithms and the Diffie–Hellman key exchange
Now let’s think about another problem: if I give you some $k$, say $k=3$, can you compute $5^k \bmod{7}$? This is not too difficult, and computers are quite fast at computing powers of elements mod ${p}$.
The inverse problem, though, is not so straightforward: given that $5^k \equiv 3 \bmod{7}$, can you find $k$? Yet again, because $7$ is a small prime, you shouldn’t have a hard time figuring out the answer. I invite you to think about your process in solving the problem.
My guess is that you raised $5$ to different powers until you found that $5^5 \equiv 3 \bmod{7}$. What if I told you that $5^k \equiv 48 \bmod{53}$? Can you find $k$ then? Going about it the same way as before, it might take you a while. The general problem of finding $k$ such that $g^{k} \equiv m \bmod{p}$, where $g$ is a primitive root, is known as the discrete logarithm problem, and it is a very difficult problem indeed when $p$ is very large. Hence the function
$$
k \mapsto g^{k} \bmod{p}
$$
is an example of a trapdoor function: it’s easy to compute $g^{k} \bmod{p}$, but hard to find $k$ if you only know $g$ and $g^{k} \bmod{p}$.
We can use this trapdoor function to publicly exchange keys. Let’s see how:
That’s it! I personally find it hard not to be taken aback by the simplicity of the procedure: because Alice knows $n$, she can raise $g^{m}$, which she received from Bob, to the $n$th power and get $g^{nm}$, and similarly for Bob. After an exchange of information taking place entirely in plain sight, Alice and Bob now have a secret key that they can use to encrypt all further communication! This is what made the Diffie–Hellman protocol revolutionary.
To understand why this is secure, let us put ourselves in the shoes of an attacker. In the course of the whole exchange, what do we see? Because the communications are not encrypted, we will know the values $p$, $g$, $g^{n} \bmod{p}$, and $g^{m} \bmod{p}$. To obtain $g^{nm} \bmod{p}$, one essentially needs to solve the discrete logarithm problem to find $n$ or $m$.
Ironically enough, there is no formal proof that the discrete log problem is hard to solve. We do have quite a few algorithms that compute discrete logarithms, but none of them performs particularly well. There is an efficient quantum algorithm due to Peter Shor: it is an algorithm that would run on a quantum computer, which could pose a threat to current systems should practical quantum computers ever hit the market.
But quantum computing aside, Diffie–Hellman is not perfect. As mentioned before, choosing such large primes and finding their primitive roots is not something you want to do every single time you initiate a link, and, in practice, the same primes are reused. This is fine in general, as the discrete logarithm problem is considered hard enough for very large groups, even if those groups are known in advance.
In 2015, however, a group of computer scientists reported that pre-computations were feasible for those primes which we know are used—essentially building a lookup table of the powers of primitive roots for those specific primes. These pre-computations would take a lot of resources, and the estimated cost of such an operation is in the hundreds of millions of dollars, which has been pointed out to be well within the budget of some national agencies.
This vulnerability is known as logjam. The ways to circumvent it are to either use larger primes, or to use a variant of the Diffie–Hellman protocol known as elliptic-curve Diffie–Hellman.
What is an elliptic curve?
An elliptic curve is the solution set of an equation of the form: $y^{2} = x^{3} + ax^{2} + bx + c$. For us, the numbers $a$, $b$, and $c$ will be integers. We also require, as a technicality, that the curve has no cusps and does not cross itself. We have to specify which values of $x$ and $y$ we allow: rational numbers? real numbers? complex numbers? We will denote the set (field) we draw our values from by $k$ and the elliptic curve by $\mathcal{C}(k)$. On the next page, there are some examples of elliptic curves over the real numbers.
Later on, we will be considering curves over finite fields, in particular the $\mathbb{Z}_p$ we talked about earlier. Here’s what our curves look like over those fields:
You might be thinking that these don’t exactly look like curves, but they are: they’re the set of points that satisfy the equation over $\mathbb{Z}_p$. What we’re interested in is the structure of the points on those curves, and in practice we may have lots of questions about those points: how many are there? Is there an effective way of finding all of them?
A theorem of Hasse tells us that in general, there should be about $p+1$ points on an elliptic curve over $\mathbb{Z}_p$, and another result by Mordell and Weil tells us that starting with finitely many points on an elliptic curve, we can generate all the points on the curve. We won’t
understand the proof of this theorem by the end of this article, however it would be nice to understand what it is saying.
How can we generate new points from old points? It turns out that’s not particularly complicated. Take two points $P$ and $Q$ on the elliptic curve. Draw a line through $P$ and $Q$. This line will intersect the curve at some third point, which we name $P*Q$.
This point is on the elliptic curve, so we could stop here, but we go one step further, and for good reasons. Draw a vertical line through $P*Q$. It will intersect the curve at one other point, and this point we name $P+Q$.
If $P$ and $Q$ are the same point, then we do the same thing except that we draw a line tangent to $P$ instead of through $P$ and $Q$ at the beginning.
This new “addition” of points is the first step to giving some structure to the set of points on an elliptic curve.
Let’s make another definition: starting from a point $P$, draw a vertical line through it. Just as before this vertical line intersects the curve at one other point, and we call this point $-P$. Now I want you to add $P$ and $-P$ together. What happens? It seems like the whole thing breaks down: the line through $P$ and $-P$ doesn’t intersect the curve anywhere else, so we don’t get a third point at all. We can circumvent this by thinking of there being an extra point on the elliptic curve, a point which exists at infinity and where all vertical lines meet.
To make this notion clear requires some projective geometry, and this might be a good moment to ask for a leap of faith–it works. To give you some intuition, you could think of our flat plane as being embedded on some very large sphere. Then any two vertical lines we draw will look parallel as seen from within the plane, but if we pull back and see them as being on the sphere, we will see that they all intersect at the north pole of the sphere. Therefore we can think of the north pole as our extra point at infinity, and we call it $\mathcal{O}$. This analogy is very limiting and should really be taken with a grain of salt, but I hope it gives some intuition for what $\mathcal{O}$ is. Now it becomes clear that $P + (-P) = \mathcal{O}$. Furthermore $P + \mathcal{O} = P$, so $\mathcal{O}$ serves as an additive identity.
Lastly, addition of points on elliptic curves defined like this is associative. The bottom line is that the set of points on an elliptic curve with addition is quite similar to the set $\mathbb{Z}_p^*$ with multiplication: it’s a group!
Though the geometric explanation for the group law makes sense, it has its shortcomings: it doesn’t tell us how to compute the coordinates of $P+Q$ in a way a computer could follow, and it doesn’t work when we consider our curve over $\mathbb{Z}_p$. These issues are easily taken care of: since all we’re doing is playing with equations of lines and cubics, we can derive formulae for the coordinates of $P+Q$ in terms of the coordinates of $P$ and $Q$ (I invite the motivated reader to try and find these formulae). Once we have the formulae, we can use them to define addition of points and forget altogether about the previous geometric description.
Elliptic curves over finite fields and our trapdoor function
Let’s actually go through an example of an elliptic curve over a finite field to get a sense of what’s going on. Consider the curve $y^{2} = x^{3} + x$ over $\mathbb{Z}_5$, the integers modulo $5$. We are going to compute all the points on this curve. The way to do this is to plug values of $x$ into the right-hand side and to see if we get a square modulo $5$, in which case we get a solution. For example plugging $x=0$, we get $y^{2}\equiv 0 \bmod{5}$, which has solution $y=0$, so $(0,0)$ is a point on our curve. Plugging $x=1$, we get $y^{2} \equiv 2 \bmod{5}$, which doesn’t have any solutions. Plugging $x=2$, we get $y^{2} \equiv 0 \bmod{5}$, so $(2,0)$ is on our curve.
Continuing like this, we find $(3,0)$ and no other points. We also shouldn’t forget $\mathcal{O}$ (it’s there even over finite fields!). Hence in this case $\mathcal{C}(\mathbb{Z}_5) = \{\mathcal{O}, (0,0), (2,0), (3,0)\}$.
A slightly more interesting example is the curve $y^{2} = x^{3} + 2x + 2$ over $\mathbb{Z}_{17}$. I invite you to check that $P = (3,1)$ is a point on this elliptic curve. If we do some computations (most likely with a computer), we find that $19P = \mathcal{O}$ and that $P, 2P, \ldots, 18P \ne \mathcal{O}$. We say that $P$ has order 19. We also find that $P, 2P,\ldots, 18P$ are all distinct. An interesting question is now: given the point $P$ and a point $kP$ on our elliptic curve, can we find $k$?
The problem is analogous to the discrete logarithm problem, and is called the elliptic curve discrete logarithm problem. Even though it is relatively easy to state, it is very hard to solve, and our methods for tackling it are only marginally better than brute-forcing. This is why we can use smaller prime numbers in the protocol without fear of getting hacked. Choosing a point $G$ on our elliptic curve, our trapdoor function is then $$k \mapsto kG,$$ and we can implement Diffie–Hellman as before:
How much better are elliptic curves?
The whole point of using elliptic curve cryptography instead of ‘classical’ algorithms is that we are able to achieve comparable cryptographic strength with much smaller key sizes. For example, a 228-bit elliptic curve key is equivalent to a 3072-bit Diffie–Hellman key. This means that choosing a 69-digit prime $p$ for our $\mathbb{Z}_p$ in elliptic curve Diffie–Hellman gives us roughly the same security as using a 925-digit prime for regular Diffie–Hellman. As you can imagine, computations are much faster with smaller numbers.
Personally, I think the sheer fact that elliptic curves, which are objects used extensively in pure mathematics, can be impactful in everyday applications is amazing in and of itself!