7. Computing Discrete Logs#
Since the security of the Diffie-Hellman key exchange and of the ElGamal public-key cryptosystem can be broken by efficiently computing discrete logs, here we discuss a method for this computation that is more efficient than brute force. (In later sections, we describe methods that are even more efficient in special cases.)
In this chapter we shall use the following notation:
\(p\) is a large prime number;
\(g \in \mathbb{F}_p^{\times}\) an element of large order, say \(N\) (and so \(N \mid (p-1)\), by Euler’s Theorem);
\(h \in \mathbb{F}_p^{\times}\) such that \(g^x = h\) for some \(x\) (which can be taken in \(\mathbb{Z}/N\mathbb{Z}\)).
Hence, we have that the discrete log \(\log_g(h)\) exists and is equal to \(x\). Here we deal with the question of how we can find this exponent \(x\).
7.1. Brute Force Computation#
The most immediate way to compute this discrete log is by “brute force”: we start computing \(g^0\), \(g^1\), \(g^2\), etc., until some power is equal to \(h\). In the worst case scenario, we might have to compute \(N-1\) products, as the set
\[
\{1, g, g^2, \ldots, g^{N-1}\}
\]
has \(N\) distinct elements, since \(g\) has order \(N\).
When \(N\) is very large, this can take a considerable amount of time. If \(N\) is a \(2047\)-bit integer, meaning that \(2^{2046} \leq N < 2^{2047}\), as recommended for Diffie-Hellman and ElGamal, and (arbitrarily) assuming each product takes one billionth of a second, we can compute how many millennia it would take to perform \(N-1\) products:
prod_time = 10^(-9)
seconds = 2^2046 * prod_time
millennia = seconds / (60 * 60 * 24 * 365.25 * 1000)
millennia
2.56016031568552e596
That is more than \(2\) followed by \(596\) zeros! It is certainly not feasible.
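The same estimate can be reproduced in plain Python; since the numbers involved exceed the range of floating point, we use exact integer arithmetic:

```python
# Reproduce the estimate with exact integer arithmetic
# (2^2046 * 10^-9 as a float would overflow)
prods = 2**2046                          # number of products in the worst case
seconds_per_millennium = 86400 * 365250  # 60*60*24 * 365.25 * 1000
millennia = prods // (10**9 * seconds_per_millennium)
print(len(str(millennia)))               # 597 digits, i.e., about 2.56 * 10^596
```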
Homework
You will write an implementation of the brute force method in your homework.
7.2. Shanks’s Baby-Step/Giant-Step Algorithm#
In 1971, Daniel Shanks devised a collision (or meet-in-the-middle) algorithm to compute discrete logs, usually called Baby-Step/Giant-Step, that is a lot more efficient than the brute force method.
Algorithm 7.1 (Shanks’s Baby-Step/Giant-Step)
Let \(g\) be an element of order \(N \geq 2\) in \(\mathbb{F}_p^{\times}\) for some prime \(p\), and let \(h \in \mathbb{F}_p^{\times}\). Then, to compute \(\log_g(h)\) (i.e., to try to find \(x\) such that \(h = g^x\)), we do:
Let \(n = \lfloor \sqrt{N} \rfloor + 1\) (so \(n > \sqrt{N}\)).
Compute the list of elements of \(\mathbb{F}_p^{\times}\): \(\{1, g, g^2, g^3, \ldots , g^n\}\) (the baby-steps).
Compute the inverse \(g^{-n}\) of \(g^n\) (computed in the previous step).
In a loop, start computing \(h, h \cdot g^{-n}, h \cdot g^{-2n}, h \cdot g^{-3n}, \ldots, h \cdot g^{-n^2}\) (the giant-steps) until one of these elements has a match in the list above.
If we get to the last element \(h \cdot g^{-n^2}\) and still find no match, then \(\log_g(h)\) does not exist. If we find a match, say \(g^i = h \cdot g^{-jn}\), then \(\log_g(h) = i + jn\) (in \(\mathbb{Z}/N\mathbb{Z}\)), i.e., we have that \(g^{i+jn} = h\).
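Before analyzing the algorithm, here is the whole procedure as a sketch in plain Python, using integers and `pow` instead of Sage’s `Mod` (the function name `bsgs_log` is ours; `pow(g, -n, p)` requires Python 3.8+):

```python
from math import isqrt

def bsgs_log(g, h, p, N):
    """Baby-step/giant-step: return x with g^x = h (mod p), or None."""
    n = isqrt(N) + 1                   # n > sqrt(N)
    # baby steps: table of g^i -> i for i = 0, 1, ..., n
    baby = {}
    power = 1
    for i in range(n + 1):
        baby[power] = i
        power = power * g % p
    # giant steps: h * g^(-jn) for j = 0, 1, ..., n
    factor = pow(g, -n, p)             # g^(-n) modulo p
    value = h % p
    for j in range(n + 1):
        if value in baby:
            return (baby[value] + j * n) % N
        value = value * factor % p
    return None                        # no match: the log does not exist

# values from the example later in this chapter:
print(bsgs_log(798, 1945, 2819, 1409))   # prints 1231
```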
7.2.1. Why Does It Work?#
Let’s see why the algorithm works!
First, why do we find a match when \(h = g^x\) for some \(x\)? With \(n = \lfloor \sqrt{N} \rfloor + 1\), whatever \(x\) might be, we can perform the long division of \(x\) by \(n\), getting a quotient \(q\) and remainder \(r\), i.e., we get
\[
x = qn + r, \quad \text{with } 0 \leq r < n.
\]
Since \(1 \leq x < N\) and \(n > \sqrt{N}\), we have that
\[
qn \leq x < N < n^2,
\]
i.e., we have that \(q<n\). Now
\[
h \cdot g^{-qn} = g^x \cdot g^{-qn} = g^{x - qn} = g^r.
\]
Hence, the left-side is one of the elements we compute in our loop (as \(q < n\)) and the right-side is in our list of powers of \(g\) (as \(0 \leq r < n\)). Even though we do not know \(q\) and \(r\) (since we do not know \(x\)), we do know that we must get a match!
Now, when we do find a match \(g^i = h \cdot g^{-jn}\), we can solve for \(h\) and get \(g^{i + jn} = h\), so we know that \(i + jn\) (reduced modulo \(N\)) is \(\log_g(h)\).
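We can check this argument numerically in plain Python, with the example values used later in this chapter (\(p = 2{,}819\), \(g = 798\) of order \(N = 1{,}409\), and \(x = 1{,}231\)):

```python
from math import isqrt

# example values used later in this chapter
p, N = 2819, 1409
g, x = 798, 1231
h = pow(g, x, p)

n = isqrt(N) + 1        # n = 38, so n > sqrt(N)
q, r = divmod(x, n)     # x = q*n + r with 0 <= r < n
print(q, r, q < n)      # here q = 32, r = 15, and indeed q < n

# the giant step h * g^(-qn) matches the baby step g^r:
left = h * pow(g, -q * n, p) % p
right = pow(g, r, p)
print(left == right)    # True
```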
7.2.2. Number of Operations#
How many multiplications does the Baby-Step/Giant-Step Algorithm need? To compute our list, we need \(n\) multiplications by \(g\):
\[
g = g \cdot 1, \quad g^2 = g \cdot g, \quad g^3 = g \cdot g^2, \quad \ldots, \quad g^n = g \cdot g^{n-1}.
\]
Computing the inverse \(g^{-n}\) of \(g^n\) is fast, since we can use the Extended Euclidean Algorithm.
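As a plain-Python sketch of this step (the function names `ext_gcd` and `inverse_mod` are ours; Python 3.8+ also provides this directly as `pow(a, -1, p)`):

```python
def ext_gcd(a, b):
    """Return (d, u, v) with d = gcd(a, b) = u*a + v*b."""
    old_r, r = a, b
    old_u, u = 1, 0
    old_v, v = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_u, u = u, old_u - q * u
        old_v, v = v, old_v - q * v
    return old_r, old_u, old_v

def inverse_mod(a, p):
    """Inverse of a modulo p via the Extended Euclidean Algorithm."""
    d, u, _ = ext_gcd(a % p, p)
    if d != 1:
        raise ValueError("a is not invertible modulo p")
    return u % p

print(inverse_mod(798, 2819) == pow(798, -1, 2819))  # True
```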
In our loop, we have at most another \(n\) multiplications by this \(g^{-n}\):
\[
h \cdot g^{-n}, \quad (h \cdot g^{-n}) \cdot g^{-n} = h \cdot g^{-2n}, \quad \ldots, \quad (h \cdot g^{-(n-1)n}) \cdot g^{-n} = h \cdot g^{-n^2}.
\]
So, we’ve computed at most \(2n\) multiplications (and inverted \(g^n\)). Another crucial step is finding, in our loop, if some \(h \cdot g^{-jn}\) is in the list \(\{1, g, g^2, \ldots, g^{n}\}\). This process involves at most \(n\) comparisons for each iteration of our loop, so we would need at most \(n^2 \approx N\) comparisons in total. On the other hand, there are better algorithms for finding matches by sorting the list, which would give us about \(n \cdot \log_2(n)\) comparisons at most.
Therefore, in total, we would need \(2n \approx 2\sqrt{N}\) multiplications and \(n \log_2(n) \approx \sqrt{N} \log_2(\sqrt{N}) = (\sqrt{N}/2) \log_2(N)\) comparisons.
So, how does the number of steps compare? We go from \(N\) multiplications to about \(2 \sqrt{N}\) multiplications and \((\sqrt{N}/2) \log_2(N)\) comparisons. Since both \(\sqrt{N}\) and \(\log_2(N)\) are relatively much smaller than \(N\) for large \(N\), these are great gains! For instance, if \(N =1{,}000{,}000\), then \(2\sqrt{N} = 2{,}000\) and \((\sqrt{N}/2) \log_2(N) \approx 9{,}965\), so a huge gain!
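A quick plain-Python check of the counts for \(N = 1{,}000{,}000\):

```python
from math import isqrt, log2

N = 1_000_000
mults = 2 * isqrt(N)                 # 2*sqrt(N) multiplications
comps = int(isqrt(N) / 2 * log2(N))  # (sqrt(N)/2) * log2(N) comparisons
print(mults, comps)                  # prints 2000 9965
```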
On the other hand, in the case where \(N\) is a \(2047\)-bit integer, these would still take over \(2 \cdot 10^{293}\) millennia, so still not feasible!
Important
Computing Powers
As observed in the section Computing Successive Powers, it is crucial that we do not compute the list of powers above using ^. We should not do something like
powers = []
for i in range(n + 1):
    powers.append(g^i)
or, using list comprehension, with
powers = [g^i for i in range(n + 1)]
Instead, we do something like (but see discussion below for a better suited way for this particular problem):
powers = [Mod(1, p)]
for _ in range(1, n + 1):
    powers.append(g * powers[-1])
7.2.3. Example#
Let’s compare the brute force and baby-step/giant-step methods in a concrete example.
First, here is some information about the computer being used:
!inxi --system --cpu --memory
System:
Host: dell7010 Kernel: 6.17.7-1-siduction-amd64 arch: x86_64 bits: 64
Desktop: KDE Plasma v: 6.5.2 Distro: siduction 22.1.2 Masters_of_War -
kde - (202303220911)
Memory:
System RAM: total: 128 GiB available: 125.51 GiB used: 29.83 GiB (23.8%)
Array-1: capacity: 128 GiB slots: 4 modules: 4 EC: None
Device-1: DIMM1 type: DDR5 size: 32 GiB speed: spec: 4800 MT/s
actual: 3600 MT/s
Device-2: DIMM2 type: DDR5 size: 32 GiB speed: spec: 4800 MT/s
actual: 3600 MT/s
Device-3: DIMM3 type: DDR5 size: 32 GiB speed: spec: 4800 MT/s
actual: 3600 MT/s
Device-4: DIMM4 type: DDR5 size: 32 GiB speed: spec: 4800 MT/s
actual: 3600 MT/s
CPU:
Info: 24-core model: 13th Gen Intel Core i9-13900 bits: 64 type: MCP cache:
L2: 32 MiB
Speed (MHz): avg: 5301 min/max: 800/5300:5600:4200 cores: 1: 5301 2: 5301
3: 5301 4: 5301 5: 5301 6: 5301 7: 5301 8: 5301 9: 5301 10: 5301 11: 5301
12: 5301 13: 5301 14: 5301 15: 5301 16: 5301 17: 5301 18: 5301 19: 5301
20: 5301 21: 5301 22: 5301 23: 5301 24: 5301
Now, let’s first find \(p\) and \(g\): let’s take \(p=2{,}819\), \(N = (p-1) /2 = 1{,}409\) (also prime), and \(g = 798\) (in \(\mathbb{F}_p^{\times}\)).
p = 2819
N = (p - 1) // 2
g = Mod(798, p)
Let’s verify that these work (we should get True, True):
is_prime(p), g.multiplicative_order() == N
(True, True)
Now, take \(h = 1{,}945\) (in \(\mathbb{F}_p^{\times}\)) and let’s try to compute \(\log_g(h)\).
h = 1945
7.2.3.1. Brute Force#
First by brute force:
%%time
power_of_g = 1  # first element
for x in range(N):
    if power_of_g == h:
        print(f"Found: {x = }.")
        break
    else:
        power_of_g *= g
else:
    print("No power found! The log does not exist.")
Found: x = 1231.
CPU times: user 745 µs, sys: 966 µs, total: 1.71 ms
Wall time: 1.72 ms
Let’s check:
h == g^x
True
7.2.3.2. Baby-Step/Giant-Step#
Now, we use Shanks’s algorithm. First, we define \(n\):
n = floor(sqrt(N)) + 1
Now, let’s think about how we create the list of powers of \(g\). In principle, we could just do something like:
powers = [Mod(1, p)]
for _ in range(1, n + 1):
    powers.append(g * powers[-1])
We then can find if something is in the list with in:
if x in powers:
    ...  # do something
But searching in lists is relatively slow. Searching in a set is much faster, since its elements are hashed. In this case, though, a set is not ideal, since we also need the exponent that gives us the resulting power, i.e., the \(i\) in \(g^i\).
An alternative to keep track of both the exponent and the resulting power is to use a dictionary as in
powers = {0: Mod(1, p)}
for i in range(1, n + 1):
    powers[i] = g * powers[i-1]
But this way, we still have a problem when searching if \(g^i\) is in the dictionary, as we have to search for it among the values, and not the keys! Dictionaries are optimized for searching the keys, so we need to turn things around a bit, swapping the key/value pairs above:
powers = {Mod(1, p): 0}
current_power = Mod(1, p)
for i in range(1, n + 1):
    current_power *= g
    powers[current_power] = i
So, now powers has key/value pairs of the form g^i: i and current_power holds \(g^n\).
Our next step is to invert \(g^n\):
factor = current_power^(-1)
Finally we start computing \(h, h \cdot g^{-n}, h \cdot g^{-2n}, \ldots\) until we find a matching power or get to \(h \cdot g^{-n^2}\). Note that we can check if an element is a key in the dictionary with in:
current_value = h
for j in range(n + 1):
    if current_value in powers:
        i = powers[current_value]
        x = (i + j * n) % N
        print(f"Found {x = }.")
        break
    current_value *= factor
else:
    print("The log does not exist.")
Found x = 1231.
Let’s double check that it worked:
h == g^x
True
Now, let’s see how long the whole process takes:
%%time
N = g.multiplicative_order()
n = floor(sqrt(N)) + 1
powers = {Mod(1, p): 0}
current_power = Mod(1, p)
for i in range(1, n + 1):
    current_power *= g
    powers[current_power] = i
factor = current_power^(-1)
current_value = h
for j in range(n + 1):
    if current_value in powers:
        i = powers[current_value]
        x = (i + j * n) % N
        print(f"Found {x = }.")
        break
    current_value *= factor
else:
    print("The log does not exist.")
Found x = 1231.
CPU times: user 268 µs, sys: 40 µs, total: 308 µs
Wall time: 306 µs
Homework
In your homework you will write a discrete log function that takes some \(g\) and \(h\) as arguments and finds \(x\) such that \(h = g^x\), if it exists, and returns \(-1\) if it doesn’t. Most of the work is actually done above; you just have to adapt it into a function.