Memory Lane: How the Dual_EC_DRBG Backdoor Works

How well do you remember 2014? A bit fuzzy? Me too. I’ll try to recall things from nearly 10 years ago as I walk back through old code I wrote to explore and teach others about the fundamentals of the Dual_EC_DRBG backdoor, Which I often call “Dual_EC” for short. What better way to learn than by breaking a toy version of Dual_EC?

The messy code is all here, updated and tested to work under Python 3.13. Just run server.py in one terminal and attack.py in another, this should set up all 4 challenges and then sequentially solve them.

Huge thanks to Jeremy Kun’s blog post and code on doing math with elliptic curves over finite fields, which these challenges couldn’t have happened without.

Disclaimer: I am not a cryptographer, and my math is a bit rusy. These challenges are flawed in the intended ways, and probably also unintented ways.

Context#

Back in 2014 the Snowden leaks were still fresh. Among the revelations was that the NSA had a program codenamed “BULLRUN”¹ that involved intentionally undermining encryption standards, explicitly the Dual_EC_DRBG² algorithm. It was already a highly suspicious algorithm to people who pay attention to such things, but hard evidence from one of the largest leaks of secret documents in history does wonders for awareness. The Citizenfour documentary is a popcorn-friendly look at the leaks, if you want more high-level background and don’t feel like reading.

I wasn’t following this closer than any other techie. But when I came across a paper in the Notices of the AMS, “The NSA Back Door to NIST”³ by Thomas C. Hales, I realized the math behind the backdoor wasn’t as complex as I first thought. Maybe I could show this to other people with less math experience and demystify this backdoor a bit? Also, what better than learning by doing, and having a series of challenges to demonstrate each building block in the backdoor?

Technical Background#

The Dual_EC_DRBG algorithm is meant to be used as a cryptographically secure pseudo-random number generator (CSPRNG). Every cryptographic scheme requires a source of random bits to function. The Dual_EC_DRBG algorithm can supply those bits. These pseudo-random number generators need to be cryptographically secure to be suitable for cryptographic schemes – you should not be able to predict future output by observing previous output. If an attacker can predict output of the random number generator (RNG) used by a cryptographic scheme, they can break that scheme. This makes the RNG an attractive target for a backdoor. You just have to – ideally – make it a backdoor only usable by you and nobody else.

Dual_EC_DRBG uses elliptic curves, which mostly just matters for the details of the algorithm and the backdoor. There are great intros to Elliptic Curve Cryptography⁴⁵ so I won’t repeat what those guides have already done. The important part for us is that points on the curve – pairs of numbers (x, y) where x and y are solutions for the curve’s equation – form a Group⁶. This allows us to define very “basic” math among these points (addition and subtraction) and have it work overall like we expect addition and subtraction to work. For us, the important thing is that given points P and Q on a curve, P + Q gives us a third point R on the curve. Knowing this, we can define a “scalar multiplication” in this group as repeated addition. So for some integer n and point P, n * P is just P added to itself n times: n * P = P + P + ... + P. We’ll be doing that a lot.

The Algorithm#

In very general terms, Dual_EC_DRBG maintains some internal state s that is used to derive random bits that are output to the user. As more bits are needed, the internal state is mutated, and more output bits can be derived from the new internal state. Repeat as needed. The fact that we’re using an elliptic curve is only important for the details. Here is the core of it as described by Thomas C. Hales in the paper³ mentioned in the context section. In math convention, s' is pronounced “s prime”, and usually denotes something derived from or closely related to another variable, in this case s.

let pseudo_random s =
  let r = x1 (s * P) in
  let s' = x1 (r * P) in
  let t = x1 (r * Q) in
  let b = extract_bits t in
    (s',b);

The x1 function just takes the x-coordinate from the given point, as a simple way of getting an integer from a point on the curve. extract_bits just masks its input, throwing away some of the most significant bits and only giving the user what’s left. An ASCII diagram might help make this clearer. The internal states s, s', and so on are entirely internal to the RNG, and the user only sees bits b derived from these states. Which means all an external attacker could hope to see would be the bits b themselves, or things derived from them.

s --P--> r --P--> s'
         |
         '---Q--> b

s' = x1( x1(s * P) * P )
t  = x1( x1(s * P) * Q )
b  = extract_bits( t )

Note that the internal state is mutated by multiplications against point P, while the random bits we observe also involve multiplication by a point Q. If there was some way to go from b back to r we could break the algorithm. But this would require solving for r in X = r * Q, which is the Discrete Logarithm Problem⁷ which in these circumstances is considered difficult. This problem is the basis of many other cryptographic schemes’ security, so that’s probably not our path forward.

The Backdoor#

For an attacker, everything⁸ about the random number generator is known except for the internal state. If we could somehow recover the internal state, we could simply set up our own Dual_EC_DRBG with that identical state to predict all future output of the targeted Dual_EC instance.

That’s a great goal, but how does the backdoor get us there? I’ll quote extensively from Thomas C. Hales who does an amazing job explaining this to mathematicians. Emphasis and formatting tweaks are mine:

The NIST standard gives a list of explicit mathematical data (E,p,n,f,P,Q) to be used for pseudo-random number generation … The group E(Fp) has order n, which is prime for all of the curves that occur in the NIST standard. … Finally, P and Q are given points on the affine curve. … No explanation is given of the particular choices (E,p,n,f,P,Q). … The standard stipulates that “one of the following NIST approved curves with associated points shall be used in applications requiring certification under FIPS-140 [U.S. government computer security accreditation].”

So we have certification and standardization being used to force adoption and use of DUAL_EC_DRBG with the specific constants that were chosen by the NSA with no explanation. I emphasized the fact that the order of our group – the number of elements it has – is prime because this tips the hand of the backdoor and these magic constants P and Q.

Since P and Q are nonidentity elements of a cyclic group of prime order, each is a multiple of the other. Write P = e*Q for some integer e. We show that, once we have e in hand, it is a simple matter to determine the secret internal state s of the pseudo-random bit generator by observing the output b and thus to compromise the entire system. The function extract_bits discards 16 bits. Given the output b, we take the 2^16 (a small number of) possible preimages t of b under extract_bits. For each t, the coordinate x is known, and solving a quadratic, there are at most two possibilities for the coordinate y of a point A on the elliptic curve such that t = x1(A). One such A is r * Q. For each A, we compute e * A. One of the small number of possibilities for e * A is

(1) e * (r * Q) = r * (e * Q) = r * P

Finally s′ = x1(r * P).

Closer to plain english:

As an outside observer, we just get output b. The output had 16 bits thrown away that we don’t get to see. But 2^16 = 65536, which just isn’t that large of a number for computers. So write down our 65536 possibilities for t and one of them must be correct, we just don’t know which one yet.

Next, the correct t was the x-coordinate for some point. Let’s call that point A. Because of how these elliptic curves work, we can start with x, and solve the equation for two values of y. One of these is the correct y for this point A. So for each of our t guesses, it’s easy to solve the curve’s equation to find possible points A1 and A2. One of these has to be r * Q. 131072 guesses now, still not many.

Because of the properties of these groups, we know there must exist some integer e such that P = e*Q for the points P, Q on the standard’s curves. If we’re a lowly member of the public, we only know P and Q, and finding this e is solving the Discrete Logarithm Problem⁷, which is not known to be feasible in this situation. However if we’re the NSA – or any other outfit that could pick the points P and Q that people use Dual_EC with – then choosing e, P and Q such that P = e*Q is a cakewalk.

Let’s assume we know e.

One of our 131072 points A is the correct one, where A = r*Q. Taking e, and the correct A, we can do e * A = e * (r * Q) and based on equation (1) from the paper, we know that this is r * P⁹, which gives us the target Dual_EC’s current state s'. So do e * A for all of the guesses we have, use each resulting state guess s' to instantiate a different instance of Dual_EC, and one of these will be synchronized with the target Dual_EC. Broken¹⁰.

The Challenges#

Making CTF-style challenges is tricky. They must be intentionally vulnerable and exploitable, but only in the intended way, so you also have to avoid all other vulnerabilities (good luck!). Also, in my opinion they should lead the attacker towards the intended solution so people don’t waste time getting frustrated with other unrelated rabbitholes.

I decided a guessing game with an open-source network server worked best. How the server worked shouldn’t be a mystery for this, and the code would also be reusable by attackers so they wouldn’t have to waste time re-implementing Dual_EC or searching for libraries for doing math on elliptic curves over finite fields.

The only interaction possible was over a network socket, and Python should make simple network communication relatively secure. JSON is used so that network traffic is human-readable and serializing/de-serializing is handled. Only hardcoded elements of the object from the attacker are accessed. SSL was used so teams would at least need to use ARP poisoning (if possible) and some more active interception to steal the flag from other teams, and the cert was available so attackers could hardcode cert checking.

The server gives players output of its random number generator, and accepts a guess for the next number. If the challenger can make 50 correct guesses, the server starts sending the flag. They win! Only a few bad guesses and the player gets disconnected. They have to try again, with a freshly reseeded RNG to guess against. This should make it only possible to win by actually breaking the PRNG.

There’s the issue of knowing e however, and the groups used in the standardized curves are all too large to brute-force this from P and Q. We make this easier by just drastically reducing the group size so that anyone – even someone with a netbook from 2009 – can brute-force to find e. The challenge is supposed to test people’s knowledge of the backdoor, not their wallet.

Throwing even most cybersecurity professionals at a weakened Dual_EC and expecting them to break it isn’t fair, the elliptic curve math is still pretty specialized. We’ll break the full problem down into more digestible steps. This also gives us multiple related challenges with a difficulty progression to pad out the challenge count and hopefully give people at least one achievable challenge.

Level 0#

For devoping & testing the rest of the challenge design besides the PRNG parts, we use a dummy PRNG that always returns 42. I don’t think this was live during the actual CTF, but it’s in the code.

self.prng = type("",(),dict(get_num=lambda: 42))

Level 1#

We use a similar design as Dual_EC, but in a simpler finite field¹¹ and without output masking. This should put the focus on seeing where doing math with the given P and Q in the given finite field gets us. This challenge uses the arbitrarily chosen GF(104729) with P = 7 and Q = 2. 104729 is prime, but the choice of P and Q with that prime was based on vibes and basic trial-and-error that the PRNG itself had a “long enough” run before repeating a value. If I were a cryptographer I could’ve picked all this more intelligently, but I’m not.

Apparently an oversight on my part, but this weak PRNG uses multiplication and not addition. The finite field also gives us more flexibility, and the hardcoded way I have for cracking the PRNG involves using multiplicative inverses in a way that doesn’t directly match what’s used for Dual_EC. Oops!¹²

def get_num(self):
	r = self.state * self.P
	self.state = r * self.P
	t = r * self.Q
	return t.n

The t.n is just us getting a Python integer from the finite field element t so we can do typical arithmetic with it, rather than more limited finite field arithmetic. For attacking, we have this:

class Guesser:
	def __init__(self, output):
		F = FiniteField(104729,1) #use the same finite field at the prng
		P = F(7) #same parameters
		Q = F(2)
		PQ_inverse = (P*Q).inverse() #the trick! we have multiplicative inverses!
		T = F(output)
		oldstate = T*PQ_inverse
		curstate = oldstate*P*P
		self.p = prng(seed=curstate.n)

	def guess(self):
		return self.p.get_num()

This method doesn’t exactly match Dual_EC. Instead of finding the current state s' we can actually go back even farther and get s, which we then fast-forward to s'. We could’ve followed Dual_EC more closely by just finding the inverse for Q, but nothing really forces us to brute-force an e to solve P = e * Q.

Level 2#

Same field, same algorithm, same ‘points’ P and Q. However this time we add in some bit masking.

def get_num(self):
	r = self.state * self.P
	self.state = r * self.P
	t = r * self.Q
	return t.n & 0x7fff # throw 2 leading bits away

The math for breaking this is identical, but now we have to do some guessing to handle the bits that were thrown out before we got the output. Our program and challenge interaction get more complex

class Guesser:
	def __init__(self, output):
		F = FiniteField(104729,1)
		P = F(7)
		Q = F(2)
		PQ_inverse = (P*Q).inverse()
		#recover 'potential' outputs
		outputs = [(i<<15) | output for i in range(4)]
		Ts = [F(o) for o in outputs]
		oldstates = [T*PQ_inverse for T in Ts]
		print("old states:",oldstates)
		curstates = [o*P*P for o in oldstates]
		self.candidates = [prng(seed=c.n) for c in curstates]
		self.refine_step = 0

	def guess(self):
		print("generating number")
		return self.p.get_num()

	def refine(self, num):
		"""
		based on next output (num)
		determine which candidate prng
		is correct.
		"""
		for i in range(len(self.candidates)):
			if self.candidates[i].get_num() == num:
				print("old state found!")
				self.p = self.candidates[i]

Basically we make a number of “candidate” PRNGs like in Level 1. One of them is correct, but we have to find out which. We just make a wrong guess to get another “random” number from the server, then use that for a refinement step to determine which candidate is correct.

Level 3#

Here’s where we move to elliptic curves, and not being a cryptographer really shows. Solely because it’s a cool prime that’s about as large as I want, I chose a prime of 331337 for our underlying field, and the curve y^2 = x^3 + x + 1. The cycle this gives our RNG, based on the comment, is really short but probably long enough to work for the challenge. I probably found Q by picking random points until I found one that generated a decently sized group, then just randomly picked an e to generate P by computing e * Q.

class prng:
	def __init__(self,n=331337,seed=42):
		#don't need to keep field or curve beyond constructor
		F = FiniteField(n,1)
		C = EllipticCurve(a=F(1),b=F(1)) #choice of curve assumes default
		self.state = F(seed).n
		#this gives a 'cycle' of 71 elements...
		self.Q = Point(C,F(153116),F(171795)) # <Q> has order 6257
		self.P = Point(C,F(285710),F(143307))

	def get_num(self):
		r = (self.state * self.P).x.n
		self.state = (r * self.P).x.n
		t = (r * self.Q).x.n
		return t #define SO LONG AS we're in integers mod P

It’s all just good enough to get the point across, but actually knowing the cryptography at play would’ve made it easier to pick a curve and points that would work properly. I don’t actually know if the set of all points on this curve is of prime order, but by choosing P and Q as I have, I at least restrict us to a prime order cyclic subgroup that we know would have P = e * Q, and is small enough to brute-force e.

class Guesser:
	def __init__(self, out):
		"""
		given the initial output,
		init out guesser so we can guess
		all remaining outputs
		"""
		prime=331337
		F = FiniteField(prime,1)
		C = EllipticCurve(a=F(1),b=F(1))
		e = F(3) #backdoor! we'd have to pre-compute this
		val = out*out*out + C.a * out + C.b
		print(time.time(),":","finding points...")
		points = [Point(C,F(out),F(y)) for y in tonelli_shanks(val.n,prime)]
		#print("points: ",points)
		print(time.time(),":","recovering states...")
		states = [(e.n*T).x.n for T in points]
		#as both candidates are additive inverses of
		#one another, they have the same x coordinates
		print(time.time(),":","making prng")
		self.p = prng(seed=states[0])

	def guess(self):
		"""
		by the nature of level 3, we
		don't have to 'modify' our guesses.
		we're guaranteed to get precisely
		the right prng
		"""
		return self.p.get_num()

Aha, the backdoor e for this situation is apparently 3. This relationship between P and Q would be figured out some other way then inserted into this attack script. Once someone started looking for e to do P = e * Q it wouldn’t have taken long. Because the output is the x-coordinate and we know the curve equation, we can easily get to y^2 = n for some number n. Normally you’d just take the square root here, but we’re in a finite field, how would we even do that?

Luckily the Tonelli-Shanks Algorithm¹³ can do this because we’re working in a field of prime order. This is included in the prng library used in the challenge, and is used in a find_point function that returns a point on the curve C that has the given x-coordinate. The player doesn’t have to know about this beforehand or implement it themselves, assuming they read the code for the challenge. This is also very helpful for returning to these challenges nearly 10 years later, I’d completely forgotten about Tonelli-Shanks!

Level 4#

Ok, full strength now! Well, except we’re still in our small and brute-forceable subgroup of points generated by Q. Like in the jump from 1 to 2, we just add in bit masking.

def get_num(self):
	r = (self.state * self.P).x.n
	self.state = (r * self.P).x.n
	t = (r * self.Q).x.n
	return t & (0x7fff)

Which means that the attack from 3 just has to add in generating multiple PRNGs and a refinement step that determines which one is the correct guess and synchronized with the target.

class Guesser:
	def __init__(self, out):
		"""
		given the initial output,
		init out guesser so we can guess
		all remaining outputs
		"""
		prime=331337
		F = FiniteField(prime,1)
		C = EllipticCurve(a=F(1),b=F(1))
		e = F(3) #backdoor! we'd have to pre-compute this
		xs = [(i<<15) | out for i in range(16)]
		print("xs:",xs)
		vals = [x*x*x + C.a * x + C.b for x in xs]
		print("vals:",vals)
		print(time.time(),":","finding roots...")
		# at this point, *some* vals won't be quadratic
		# residues, thus invalid points. But, we must
		# associate each possible preimage with its root
		coords = [] # list of tuples (x,y)
		for i in range(len(xs)):
			try:
				t = tonelli_shanks(vals[i].n,prime)
				coords.append( (xs[i], t[0]) )
				coords.append( (xs[i], t[1]) )
			except Exception:
				# not quadratic residue
				pass

		print("coords:",coords)
		print(time.time(),":","making points...")
		points = [Point(C,F(c[0]),F(c[1])) for c in coords]
		print(time.time(),":","recovering states...")
		states = [(e.n*T).x.n for T in points]
		print("states:",states)
		print(time.time(),":","generating candidates...")
		self.candidates = [prng(seed=s) for s in states]

	def guess(self):
		return self.p.get_num()

	def refine(self, num):
		"""
		given the next output, refine our
		candidates list to only the correct one
		"""
		print("refining...")
		for i in range(len(self.candidates)):
			if self.candidates[i].get_num() == num:
				self.p = self.candidates[i]
				print("successfully refined!")
				break

Conclusion#

Wow! We just broke our own toy version of Dual_EC. Sure it was smaller and simpler, but we dealt with the bit masking, the point arithmetic, and even the extra – and for actual Dual_EC, unfeasible – step of brute-forcing the backdoor e for ourselves. It does take a bit of a dive into math world to grasp, but hopefully with some code and comments to guide the way, it can be less of a mystery to someone than it was before.

https://en.wikipedia.org/wiki/Bullrun_(decryption_program) ↩︎
https://en.wikipedia.org/wiki/Dual_EC_DRBG ↩︎
http://www.ams.org/notices/201402/rnoti-p190.pdf ↩︎ ↩︎
https://blog.cloudflare.com/a-relatively-easy-to-understand-primer-on-elliptic-curve-cryptography/ ↩︎
https://www.jeremykun.com/2014/02/08/introducing-elliptic-curves/ ↩︎
A group is a specific algebraic structure. In simple terms, we have some elements (numbers, points, objects …) that are members of the group, and an addition operation + that works between them. We have an identity element that works like zero, and each element has an inverse. A + -A = 0 and A + 0 = A. For points on an elliptic curve, the trick is defining a + operation so that P + Q makes sense and follows these group rules. The other Elliptic Curve intros get into all that. ↩︎
https://en.wikipedia.org/wiki/Discrete_logarithm#Cryptography ↩︎ ↩︎
the process for deriving random bits and mutating internal state, and the approved elliptic curves and points P and Q on those curves are public as part of the standard. It would generally be safe to assume that an in-the-wild instance of Dual_EC_DRBG is one of these public standardized versions, rather than something on a completely different custom curve, or using different points P and Q. ↩︎

Because our scalar multiplication is just repeated addition, we can do this

e * (r * Q) = e * (Q + Q + ... + Q) //(r many Qs in parens)
	    = Q + Q + ... + Q       //(e*r = r*e many Qs)
	    = r * (Q + Q + ... + Q) //(e many Qs in parens)
	    = r * (e * Q)

It’s basically just mentally re-grouping the Qs in the expression. ↩︎

Broken until the target Dual_EC gets reseeded, that is. But once we have a method worked out to synchronize to a target Dual_EC by observing a few bytes of output, this might not be a huge deal. ↩︎
https://en.wikipedia.org/wiki/Finite_field

Doing math in a finite field seems intimidating at first, but just involves taking remainders when you do arithmetic. You might know this as the modulus operator mod. For a simple example in the finite field of order 5 GF(5), we have elements {0,1,2,3,4}. 1 + 3 = 4 and 4 / 5 = 0 remainder 4, so that works as expected. But 3 + 4 = 7 and 7 isn’t in our group, which isn’t allowed. But 7 / 5 = 1 remainder 2, so our actual answer in GF(5) is 3 + 4 = 2. Using mod this is 3 + 4 = 7 mod 5 = 2.

Multiplication also works! That’s what makes this a field and not just a group. We just ignore the element 0. 2 * 3 = 6 mod 5 = 1. ↩︎
In this finite field there is some number of times Q can be added to itself to get P, so the P = e * Q trick would still work, if addition were used rather than multiplication. Because the field is of order 104729, it’s small enough to easily brute-force e. ↩︎
https://en.wikipedia.org/wiki/Tonelli%E2%80%93Shanks_algorithm

This uses number theory lingo, but basically if p is prime, you know n, and you want to find an x such that x^2 = n (mod p), this algorithm can find x. But p must be prime. ↩︎