7. Computing Discrete Logs#
Since the security of the Diffie-Hellman key exchange and of the ElGamal public-key cryptosystem can be broken by efficiently computing discrete logs, here we discuss a method for this computation that is more efficient than brute force. (In later sections, we describe methods that are even more efficient in special cases.)
In this chapter we shall use the following notation:
\(p\) is a large prime number;
\(g \in \mathbb{F}_p^{\times}\) an element of large order, say \(N\) (and so \(N \mid (p-1)\), by Euler’s Theorem);
\(h \in \mathbb{F}_p^{\times}\) such that \(g^x = h\) for some \(x\) (which can be taken in \(\mathbb{Z}/N\mathbb{Z}\)).
Hence, we have that the discrete log \(\log_g(h)\) exists and is equal to \(x\). Here we deal with the question of how we can find this exponent \(x\).
7.1. Brute Force Computation#
The most immediate way to compute this discrete log is by “brute force”: we start computing \(g^0\), \(g^1\), \(g^2\), etc., until some power is equal to \(h\). In the worst case scenario, we might have to compute \(N-1\) products, as the set
\[
\{1, g, g^2, \ldots, g^{N-1}\}
\]
has \(N\) distinct elements, since \(g\) has order \(N\).
When \(N\) is very large, this can take a considerable amount of time. If \(N\) is a \(2047\)-bit integer, meaning that \(2^{2046} \leq N < 2^{2047}\), as recommended for Diffie-Hellman and ElGamal, and (arbitrarily) assuming each product takes one billionth of a second, we can compute how many millennia it would take to perform \(N-1\) products:
prod_time = 10^(-9)
seconds = 2^2046 * prod_time
millennia = seconds / (60 * 60 * 24 * 365.25 * 1000)
millennia
2.56016031568552e596
That is more than \(2\) followed by \(596\) zeros! It is certainly not feasible.
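The same estimate can be reproduced in plain Python; since the numbers involved exceed the range of floating point, we use exact integer arithmetic:

```python
# Reproduce the estimate with exact integer arithmetic
# (2^2046 * 10^-9 as a float would overflow)
prods = 2**2046                          # number of products in the worst case
seconds_per_millennium = 86400 * 365250  # 60*60*24 * 365.25 * 1000
millennia = prods // (10**9 * seconds_per_millennium)
print(len(str(millennia)))               # 597 digits, i.e., about 2.56 * 10^596
```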
Homework
You will write an implementation of the brute force method in your homework.
7.2. Shanks’s Baby-Step/Giant-Step Algorithm#
In 1971, Daniel Shanks devised a collision (or meet-in-the-middle) algorithm to compute discrete logs, usually called Baby-Step/Giant-Step, that is a lot more efficient than the brute force method.
Algorithm 7.1 (Shanks’s Baby-Step/Giant-Step)
Let \(g\) be an element of order \(N \geq 2\) in \(\mathbb{F}_p^{\times}\) for some prime \(p\), and let \(h \in \mathbb{F}_p^{\times}\). Then, to compute \(\log_g(h)\) (i.e., to try to find \(x\) such that \(h = g^x\)), we do:
Let \(n = \lfloor \sqrt{N} \rfloor + 1\) (so \(n > \sqrt{N}\)).
Compute the list of elements of \(\mathbb{F}_p^{\times}\): \(\{1, g, g^2, g^3, \ldots , g^n\}\) (the baby-steps).
Compute the inverse \(g^{-n}\) of \(g^n\) (computed in the previous step).
In a loop, start computing \(h, h \cdot g^{-n}, h \cdot g^{-2n}, h \cdot g^{-3n}, \ldots, h \cdot g^{-n^2}\) (the giant-steps) until one of these elements has a match in the list above.
If we get to the last element \(h \cdot g^{-n^2}\) and still find no match, then \(\log_g(h)\) does not exist. If we find a match, say \(g^i = h \cdot g^{-jn}\), then \(\log_g(h) = i + jn\) (in \(\mathbb{Z}/N\mathbb{Z}\)), i.e., we have that \(g^{i+jn} = h\).
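Before analyzing the algorithm, here is the whole procedure as a sketch in plain Python, using integers and `pow` instead of Sage’s `Mod` (the function name `bsgs_log` is ours; `pow(g, -n, p)` requires Python 3.8+):

```python
from math import isqrt

def bsgs_log(g, h, p, N):
    """Baby-step/giant-step: return x with g^x = h (mod p), or None."""
    n = isqrt(N) + 1                   # n > sqrt(N)
    # baby steps: table of g^i -> i for i = 0, 1, ..., n
    baby = {}
    power = 1
    for i in range(n + 1):
        baby[power] = i
        power = power * g % p
    # giant steps: h * g^(-jn) for j = 0, 1, ..., n
    factor = pow(g, -n, p)             # g^(-n) modulo p
    value = h % p
    for j in range(n + 1):
        if value in baby:
            return (baby[value] + j * n) % N
        value = value * factor % p
    return None                        # no match: the log does not exist

# values from the example later in this chapter:
print(bsgs_log(798, 1945, 2819, 1409))   # prints 1231
```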
7.2.1. Why Does It Work?#
Let’s see why the algorithm works!
First, why do we find a match when \(h = g^x\) for some \(x\)? With \(n = \lfloor \sqrt{N} \rfloor + 1\), whatever \(x\) might be, we can perform the long division of \(x\) by \(n\), getting a quotient \(q\) and remainder \(r\), i.e., we get
\[
x = qn + r, \quad \text{with } 0 \leq r < n.
\]
Since \(1 \leq x < N\) and \(n > \sqrt{N}\), we have that
\[
qn \leq x < N < n^2,
\]
i.e., we have that \(q<n\). Now
\[
h \cdot g^{-qn} = g^x \cdot g^{-qn} = g^{x - qn} = g^r.
\]
Hence, the left-side is one of the elements we compute in our loop (as \(q < n\)) and the right-side is in our list of powers of \(g\) (as \(0 \leq r < n\)). Even though we do not know \(q\) and \(r\) (since we do not know \(x\)), we do know that we must get a match!
Now, when we do find a match \(g^i = h \cdot g^{-jn}\), we can solve for \(h\) and get \(g^{i + jn} = h\), so we know that \(i + jn\) (reduced modulo \(N\)) is \(\log_g(h)\).
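We can check this argument numerically in plain Python, with the example values used later in this chapter (\(p = 2{,}819\), \(g = 798\) of order \(N = 1{,}409\), and \(x = 1{,}231\)):

```python
from math import isqrt

# example values used later in this chapter
p, N = 2819, 1409
g, x = 798, 1231
h = pow(g, x, p)

n = isqrt(N) + 1        # n = 38, so n > sqrt(N)
q, r = divmod(x, n)     # x = q*n + r with 0 <= r < n
print(q, r, q < n)      # here q = 32, r = 15, and indeed q < n

# the giant step h * g^(-qn) matches the baby step g^r:
left = h * pow(g, -q * n, p) % p
right = pow(g, r, p)
print(left == right)    # True
```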
7.2.2. Number of Operations#
How many multiplications does the Baby-Step/Giant-Step Algorithm need? To compute our list, we need \(n\) multiplications by \(g\):
\[
g = g \cdot 1, \quad g^2 = g \cdot g, \quad g^3 = g \cdot g^2, \quad \ldots, \quad g^n = g \cdot g^{n-1}.
\]
Computing the inverse \(g^{-n}\) of \(g^n\) is fast, since we can use the Extended Euclidean Algorithm.
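As a plain-Python sketch of this step (the function names `ext_gcd` and `inverse_mod` are ours; Python 3.8+ also provides this directly as `pow(a, -1, p)`):

```python
def ext_gcd(a, b):
    """Return (d, u, v) with d = gcd(a, b) = u*a + v*b."""
    old_r, r = a, b
    old_u, u = 1, 0
    old_v, v = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_u, u = u, old_u - q * u
        old_v, v = v, old_v - q * v
    return old_r, old_u, old_v

def inverse_mod(a, p):
    """Inverse of a modulo p via the Extended Euclidean Algorithm."""
    d, u, _ = ext_gcd(a % p, p)
    if d != 1:
        raise ValueError("a is not invertible modulo p")
    return u % p

print(inverse_mod(798, 2819) == pow(798, -1, 2819))  # True
```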
In our loop, we have at most another \(n\) multiplications by this \(g^{-n}\):
\[
h \cdot g^{-n}, \quad (h \cdot g^{-n}) \cdot g^{-n} = h \cdot g^{-2n}, \quad \ldots, \quad (h \cdot g^{-(n-1)n}) \cdot g^{-n} = h \cdot g^{-n^2}.
\]
So, we’ve computed at most \(2n\) multiplications (and inverted \(g^n\)). Another crucial step is finding, in our loop, if some \(h \cdot g^{-jn}\) is in the list \(\{1, g, g^2, \ldots, g^{n}\}\). This process involves at most \(n\) comparisons for each iteration of our loop, so we would need at most \(n^2 \approx N\) comparisons in total. On the other hand, there are better algorithms for finding matches by sorting the list, which would give us about \(n \cdot \log_2(n)\) comparisons at most.
Therefore, in total, we would need \(2n \approx 2\sqrt{N}\) multiplications and \(n \log_2(n) \approx \sqrt{N} \log_2(\sqrt{N}) = (\sqrt{N}/2) \log_2(N)\) comparisons.
So, how does the number of steps compare? We go from \(N\) multiplications to about \(2 \sqrt{N}\) multiplications and \((\sqrt{N}/2) \log_2(N)\) comparisons. Since both \(\sqrt{N}\) and \(\log_2(N)\) are relatively much smaller than \(N\) for large \(N\), these are great gains! For instance, if \(N =1{,}000{,}000\), then \(2\sqrt{N} = 2{,}000\) and \((\sqrt{N}/2) \log_2(N) \approx 9{,}965\), so a huge gain!
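A quick plain-Python check of the counts for \(N = 1{,}000{,}000\):

```python
from math import isqrt, log2

N = 1_000_000
mults = 2 * isqrt(N)                 # 2*sqrt(N) multiplications
comps = int(isqrt(N) / 2 * log2(N))  # (sqrt(N)/2) * log2(N) comparisons
print(mults, comps)                  # prints 2000 9965
```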
On the other hand, in the case where \(N\) is a \(2047\)-bit integer, these would still take over \(2 \cdot 10^{293}\) millennia, so still not feasible!
Important
Computing Powers
As observed in the section Computing Successive Powers, it is crucial that we do not compute the list of powers above using ^. We should not do something like
powers = []
for i in range(n + 1):
    powers.append(g^i)
or, using list comprehension, with
powers = [g^i for i in range(n + 1)]
Instead, we do something like (but see discussion below for a better suited way for this particular problem):
powers = [Mod(1, p)]
for _ in range(1, n + 1):
    powers.append(g * powers[-1])
7.2.3. Example#
Let’s compare the brute force and baby-step/giant-step methods in a concrete example.
First, here is some information about the computer being used:
!inxi --system --cpu --memory
System:
Host: dell7010 Kernel: 6.17.7-1-siduction-amd64 arch: x86_64 bits: 64
Desktop: KDE Plasma v: 6.5.2 Distro: siduction 22.1.2 Masters_of_War -
kde - (202303220911)
Memory:
System RAM: total: 128 GiB available: 125.51 GiB used: 29.83 GiB (23.8%)
Array-1: capacity: 128 GiB slots: 4 modules: 4 EC: None
Device-1: DIMM1 type: DDR5 size: 32 GiB speed: spec: 4800 MT/s
actual: 3600 MT/s
Device-2: DIMM2 type: DDR5 size: 32 GiB speed: spec: 4800 MT/s
actual: 3600 MT/s
Device-3: DIMM3 type: DDR5 size: 32 GiB speed: spec: 4800 MT/s
actual: 3600 MT/s
Device-4: DIMM4 type: DDR5 size: 32 GiB speed: spec: 4800 MT/s
actual: 3600 MT/s
CPU:
Info: 24-core model: 13th Gen Intel Core i9-13900 bits: 64 type: MCP cache:
L2: 32 MiB
Speed (MHz): avg: 5301 min/max: 800/5300:5600:4200 cores: 1: 5301 2: 5301
3: 5301 4: 5301 5: 5301 6: 5301 7: 5301 8: 5301 9: 5301 10: 5301 11: 5301
12: 5301 13: 5301 14: 5301 15: 5301 16: 5301 17: 5301 18: 5301 19: 5301
20: 5301 21: 5301 22: 5301 23: 5301 24: 5301
Now, let’s first find \(p\) and \(g\): let’s take \(p=2{,}819\), \(N = (p-1) /2 = 1{,}409\) (also prime), and \(g = 798\) (in \(\mathbb{F}_p^{\times}\)).
p = 2819
N = (p - 1) // 2
g = Mod(798, p)
Let’s verify that these work (we should get True, True):
is_prime(p), g.multiplicative_order() == N
(True, True)
Now, take \(h = 1{,}945\) (in \(\mathbb{F}_p^{\times}\)) and let’s try to compute \(\log_g(h)\).
h = 1945
7.2.3.1. Brute Force#
First by brute force:
%%time
power_of_g = 1  # first element
for x in range(N):
    if power_of_g == h:
        print(f"Found: {x = }.")
        break
    else:
        power_of_g *= g
else:
    print("No power found! The log does not exist.")
Found: x = 1231.
CPU times: user 745 µs, sys: 966 µs, total: 1.71 ms
Wall time: 1.72 ms
Let’s check:
h == g^x
True
7.2.3.2. Baby-Step/Giant-Step#
Now, we use Shanks’s algorithm. First, we define \(n\):
n = floor(sqrt(N)) + 1
Now, let’s think about how we create the list of powers of \(g\). In principle, we could just do something like:
powers = [Mod(1, p)]
for _ in range(1, n + 1):
    powers.append(g * powers[-1])
We then can find if something is in the list with in:
if x in powers:
    ...  # do something
But searching in lists is relatively slow. Searching in a set is much faster, since its elements are hashed. In this case, though, a set is not ideal, since we also need the exponent that gives us the resulting power, i.e., the \(i\) in \(g^i\).
An alternative to keep track of both the exponent and the resulting power is to use a dictionary as in
powers = {0: Mod(1, p)}
for i in range(1, n + 1):
    powers[i] = g * powers[i-1]
But this way, we still have a problem when searching if \(g^i\) is in the dictionary, as we have to search for it among the values, and not the keys! Dictionaries are optimized for searching the keys, so we need to turn things around a bit, swapping the key/value pairs above:
powers = {Mod(1, p): 0}
current_power = Mod(1, p)
for i in range(1, n + 1):
    current_power *= g
    powers[current_power] = i
So, now powers has key/value pairs of the form g^i: i and current_power holds \(g^n\).
Our next step is to invert \(g^n\):
factor = current_power^(-1)
Finally we start computing \(h, h \cdot g^{-n}, h \cdot g^{-2n}, \ldots\) until we find a matching power or get to \(h \cdot g^{-n^2}\). Note that we can check if an element is a key in the dictionary with in:
current_value = h
for j in range(n + 1):
    if current_value in powers:
        i = powers[current_value]
        x = (i + j * n) % N
        print(f"Found {x = }.")
        break
    current_value *= factor
else:
    print("The log does not exist.")
Found x = 1231.
Let’s double check that it worked:
h == g^x
True
Now, let’s see how long the whole process takes:
%%time
N = g.multiplicative_order()
n = floor(sqrt(N)) + 1
powers = {Mod(1, p): 0}
current_power = Mod(1, p)
for i in range(1, n + 1):
    current_power *= g
    powers[current_power] = i
factor = current_power^(-1)
current_value = h
for j in range(n + 1):
    if current_value in powers:
        i = powers[current_value]
        x = (i + j * n) % N
        print(f"Found {x = }.")
        break
    current_value *= factor
else:
    print("The log does not exist.")
Found x = 1231.
CPU times: user 268 µs, sys: 40 µs, total: 308 µs
Wall time: 306 µs
Homework
In your homework you will write a discrete log function that takes some \(g\) and \(h\) as arguments and finds \(x\) such that \(h = g^x\), if it exists, and returns \(-1\) if it doesn’t. Most of the work is actually done above; you just have to adapt it into a function.