-- File: Euclid.hs
-- Author: Keith Schwarz (htiek@cs.stanford.edu)
--
-- An implementation of Euclid's GCD algorithm, along with variants and
-- applications, including
--
-- 1. A function for computing the continued fraction representation of the
-- rational number a / b.
-- 2. The extended Euclidean algorithm, which given a and b produces constants
-- x and y such that ax + by = gcd(a, b)

module Euclid where

-- The core Euclidean algorithm accepts as input integers a and b and returns
-- gcd(a, b), the largest number that divides both a and b. The idea behind the
-- algorithm is to compute the remainder of a divided by b and then to use the
-- identity
--
-- gcd(a, b) = gcd(b, a mod b)
--
-- The proof of this is that if we write a = qb + r using the division
-- algorithm, then any number that divides both a and b must also divide
-- r = a - qb, and conversely any number that divides both b and r must also
-- divide a = qb + r. The pairs (a, b) and (b, r) therefore have exactly the
-- same common divisors, so the largest number dividing both b and r is the
-- largest number dividing both a and b.
--
-- As a base case, we can stop when b = 0 by terminating with a, since this
-- will be the largest number dividing both a and 0. This gives a very natural
-- definition of Euclid's algorithm, which is shown below. Note that the
-- algorithm first checks the signs of a and b to ensure that they're both
-- positive, since otherwise we may produce a negative GCD (for example, if we
-- give as input -3 and 0).
--
-- One of the fascinating results about Euclid's algorithm is that it always
-- terminates in O(lg b) steps. This is based on the remarkable fact that the
-- worst-case inputs to Euclid's algorithm are consecutive Fibonacci numbers.
-- We prove this rigorously here.
--
-- Lemma: For any k > 1, gcd(F(k + 1), F(k)) = 1.
-- Proof: As a base case, we have that when k = 2,
--
-- gcd(F(3), F(2)) = gcd(2, 1) = 1
--
-- Now, for the inductive step, assume that for some k > 1 the claim holds and
-- consider k + 1. We want to compute gcd(F(k + 2), F(k + 1)). Assume for the
-- sake of contradiction that there is some number other than 1 and -1 that
-- divides both F(k + 2) and F(k + 1). This means that there is a number that
-- divides both F(k + 2) = F(k + 1) + F(k) and F(k + 1). This number must
-- therefore divide both F(k) and F(k + 1). But the GCD of F(k) and F(k + 1)
-- is 1, a contradiction. Our assumption must have been wrong, so the only
-- numbers dividing both F(k + 1) and F(k + 2) must be 1 and -1, completing the
-- induction.
--
-- Lemma: For k > 1, computing gcd(F(k+1), F(k)) terminates after k - 1
-- recursive calls.
--
-- Proof: As a base case, if k = 2, we have that
--
-- gcd(F(3), F(2)) = gcd(2, 1)
-- = gcd(1, 0)
-- = 1
--
-- And we make one recursive call, as expected. For the inductive step, if the
-- claim holds for some k, consider gcd(F(k+2), F(k+1)). For k > 1, we have
-- that k + 1 > 2 and k + 2 > 3, so F(k+2) > F(3) = 2 and F(k+1) > F(2) = 1.
-- Thus both F(k + 2) and F(k + 1) must be greater than one. Since their GCD
-- is one, we know that neither is a multiple of the other. Thus after doing
-- one step of the algorithm, we will compute F(k + 2) mod F(k + 1). Since
-- F(k + 2) = F(k + 1) + F(k), this gives us F(k), and so we then try to
-- compute gcd(F(k + 1), F(k)), which by the inductive hypothesis terminates in
-- k - 1 steps. Combined with this first step, we have that this process
-- finishes after k = (k + 1) - 1 steps, completing the induction.
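--
-- For example, with F(6) = 8 and F(7) = 13 (so k = 6), the computation is
--
--     gcd(13, 8) = gcd(8, 5) = gcd(5, 3) = gcd(3, 2) = gcd(2, 1) = gcd(1, 0) = 1
--
-- which makes exactly k - 1 = 5 recursive calls before hitting the base case.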
--
-- Definition: The modified Fibonacci sequence F'(k) is defined recursively as
--
-- F'(0) = 0
-- F'(1) = 1
-- F'(2) = 2
-- F'(n + 2) = F'(n) + F'(n + 1)
--
-- Note that all values in this sequence are unique.
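-- For concreteness, the first few terms are 0, 1, 2, 3, 5, 8, 13, 21, ...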
--
-- Theorem: Consider any input (a, b) to Euclid's algorithm such that the
-- algorithm requires k steps to terminate. Then F'(k) <= b.
--
-- Proof: By induction on k. As base cases, if k = 0, then the algorithm must
-- terminate immediately, so we must have that b = 0. Since F'(0) = 0 <= 0,
-- the claim holds. If the algorithm terminates after one step, then the
-- smallest b for which this is possible is 1, giving
--
-- gcd(a, 1) = gcd(1, 0) = 1
--
-- taking one step, and F'(1) = 1 <= 1. Finally, if k = 2, the algorithm must
-- terminate in two steps, and the smallest b for which this is possible can't
-- be zero or one (otherwise the above cases would cause it to terminate in 0
-- and 1 steps, respectively). However, it is possible for b = 2 to terminate
-- in two steps:
--
-- gcd(3, 2) = gcd(2, 1) = gcd(1, 0) = 1
--
-- does indeed terminate in two recursive calls, and F'(2) = 2 <= 2 as needed.
--
-- Now, for the inductive step, assume that for some k > 2 the claim holds for
-- all k' < k, and consider any (a, b) pair for which the algorithm takes k
-- steps to terminate. This means that (b, a mod b) must take k - 1 steps to
-- terminate, and so we have that a mod b >= F'(k - 1) by the inductive
-- hypothesis. Now suppose for the sake of contradiction that b < F'(k), i.e.
-- that b < F'(k - 2) + F'(k - 1). Since a mod b < b and a mod b >= F'(k - 1),
-- we also have b > F'(k - 1), so we can write b = F'(k - 1) + c for some c
-- with 0 < c < F'(k - 2). Now consider the next two
-- recursive calls we would make to the algorithm. In the first, we would
-- compute gcd(b, a mod b) = gcd(F'(k - 1) + c, a mod b), and the second would
-- have the form gcd(a mod b, (F'(k - 1) + c) mod (a mod b)). Since
-- b = F'(k - 1) + c is larger than a mod b, computing b mod (a mod b)
-- subtracts at least one multiple of a mod b >= F'(k - 1) from b, leaving a
-- value no larger than c, which is less than F'(k - 2). But by the inductive
-- hypothesis we must have
-- that F'(k - 2) <= c, since at this point the algorithm would take k - 2
-- steps to complete. However, we know that c < F'(k - 2), so this is
-- impossible. We have reached a contradiction, so our initial assumption was
-- wrong and therefore F'(k) <= b as required.
--
-- Notice that F'(k) = F(k + 1) for k > 0, so the modified sequence is just the
-- Fibonacci sequence shifted by one. This means that if Euclid's algorithm
-- takes k steps on input (a, b) with b != 0, then F(k + 1) <= b. Since
-- Fibonacci numbers grow exponentially fast, this bounds the number of steps:
-- Euclid's algorithm must take at most O(lg b) steps to complete.

euclidGCD a b = computeGCD (if a < 0 then -a else a) (if b < 0 then -b else b)
    where computeGCD a 0 = a
          computeGCD a b = computeGCD b (a `mod` b)
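
-- As a quick way to see the Fibonacci worst case in action, the helpers below
-- (a small illustrative sketch; euclidSteps and fib are not used by the
-- algorithm above) count the number of recursive calls made on nonnegative
-- inputs. For consecutive Fibonacci numbers F(k + 1) and F(k) with k > 1 this
-- count is exactly k - 1; for example, euclidSteps 89 55 (with 89 = F(11) and
-- 55 = F(10)) evaluates to 9.
euclidSteps :: Integer -> Integer -> Int
euclidSteps _ 0 = 0                               -- base case: no further calls
euclidSteps a b = 1 + euclidSteps b (a `mod` b)   -- one call, then recurse

-- Fibonacci numbers with F(1) = F(2) = 1, F(3) = 2, ...
fib :: Int -> Integer
fib n = fibs !! n
  where fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
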
-- Euclid's algorithm can also be used to compute a continued fraction
-- representation of a rational number a / b. A continued fraction is a value
-- of the form
--
--     n
--
-- for some integer n, or a value of the form
--
--            1
--     n + -------
--            E
--
-- where E is itself a continued fraction. As an example, 4 / 7 can be written
-- as
--
--            1
--     0 + -----------
--                 1
--          1 + -------
--                    1
--               1 + ---
--                    3
--
-- There is a beautiful recursive formulation for where the constants in this
-- sequence come from. It's defined by the following algorithm:
--
-- 1. Write a = qb + r. If r = 0, output q.
-- 2. Otherwise, output q + 1/E, where E is a continued fraction for b / r.
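--
-- For example, running this process on 4 / 7:
--
--     4 = 0*7 + 4, so output 0 and recurse on 7 / 4
--     7 = 1*4 + 3, so output 1 and recurse on 4 / 3
--     4 = 1*3 + 1, so output 1 and recurse on 3 / 1
--     3 = 3*1 + 0, so output 3 and stop
--
-- recovering the coefficients 0, 1, 1, 3 from the expansion shown above.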
--
-- The connection to Euclid's algorithm here is noticeable. In the standard
-- formulation of Euclid's algorithm, we compute gcd(a, b) = gcd(b, a mod b).
-- This terminates when the second term becomes zero. When computing continued
-- fractions, we compute continued fractions for a/b, b/(a mod b), etc. until the
-- numbers are multiples of one another, at each point outputting the quotient
-- of the two numbers.
--
-- The proof of correctness of this algorithm is by induction on the
-- denominator b. As a base case, if b = 1, then for any fraction a / b we
-- write a = a*1 + 0, outputting that a / b = a, which is correct. For the
-- inductive step, assume that for some b > 1 the algorithm is correct for all
-- denominators b' < b, and consider any fraction of the form a / b. If we
-- have that a = qb for some q, then we output q = a/b and are done. Otherwise
-- we write a = qb + r with 0 < r < b and output q + 1/E, where E is a
-- continued fraction for b/r (correct by the inductive hypothesis, since
-- r < b). This expression evaluates to q + 1/(b/r) = q + r/b = qb/b + r/b =
-- (qb + r) / b = a / b as expected.
--
-- Using a similar proof to that for Euclid's algorithm, this means that given
-- a fraction a/b, we can compute a continued fraction for a/b in O(lg b) time.
-- The code below produces this continued fraction as the list of coefficients
-- appearing in that fraction.

continuedFraction a b = if a `mod` b == 0
                            then [a `div` b]
                            else (a `div` b) : continuedFraction b (a `mod` b)
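
-- As a quick sanity check of this construction (evalCF is a small helper added
-- purely for illustration, not part of the algorithm above), we can fold a
-- list of coefficients back into a (numerator, denominator) pair by working
-- from the innermost term outward. For example, evalCF (continuedFraction 4 7)
-- evaluates to (4, 7).
evalCF :: [Integer] -> (Integer, Integer)
evalCF []       = error "evalCF: empty coefficient list"
evalCF [x]      = (x, 1)                    -- a lone coefficient n is just n / 1
evalCF (x : xs) = let (p, q) = evalCF xs    -- the inner fraction evaluates to p / q
                  in (x * p + q, p)         -- x + q/p = (x*p + q) / p
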
-- Our last application of Euclid's algorithm is the extended Euclidean
-- algorithm, which given integers a and b produces coefficients x and y such
-- that
--
-- ax + by = gcd(a, b)
--
-- The idea behind the algorithm is as follows. When computing gcd(a, b), we
-- use the division algorithm to write out
--
-- a = qb + r
--
-- We then find coefficients x and y such that
--
-- ax + by = r
--
-- Notice that as the Euclidean algorithm runs, a and b keep shrinking, and we
-- will eventually find that b is zero. When this happens, a will be the gcd
-- of the two numbers and we will have an (albeit simple) expression of the
-- form ax + by = gcd(a, b). If we can somehow use this information to recover
-- the coefficients x and y for the original values of a and b, then we have an
-- algorithm for the problem as a whole.
--
-- To do this, we need to see if we can find some nice relationship between the
-- various values of r that appear throughout the sequence. Let's consider the
-- following sequence:
--
-- r_0 = a
-- r_1 = b
-- r_2 = a mod b
-- r_3 = b mod (a mod b)
-- ...
-- r_{n+2} = r_n mod r_{n+1}
--
-- It's difficult to work with this recurrence in terms of mods, so let's see
-- if we can make it a bit easier to manipulate by using the division algorithm
-- to give us an explicit formula for these mods. In particular, if we use the
-- division algorithm to write out
--
-- r_n = q_n r_{n+1} + r_{n+2}
--
-- Then we get that
--
-- r_{n+2} = r_n mod r_{n+1}
--
-- so
--
-- r_{n+2} = r_n - q_n r_{n+1}
--
-- How can we use this recurrence? Well, suppose that we know constants x0,
-- x1, y0, and y1 such that we have
--
-- ax0 + by0 = r_i
-- ax1 + by1 = r_{i+1}
--
-- Can we somehow use this information to find an x2, y2 pair such that we have
--
-- ax2 + by2 = r_{i+2}
--
-- Well, let's think about what this would need to look like. Using the above
-- recursive definition for r, we're looking for an x2, y2 such that
--
-- ax2 + by2 = r_i - q_i r_{i+1}
--
-- We know the values of r_{i+1} and r_i in terms of a, b, x0, x1, y0, and y1
-- from above. If we substitute, we get that
--
-- ax2 + by2 = (ax0 + by0) - q_i (ax1 + by1)
--
-- or that
--
-- ax2 + by2 = a(x0 - q_i x1) + b(y0 - q_i y1)
--
-- In other words, as long as we know the quotients of the terms generated by
-- the Euclidean algorithm, we can write out a nice recurrence relating the new
-- coefficients to the old:
--
-- x_{n+2} = x_n - q_n x_{n+1}
-- y_{n+2} = y_n - q_n y_{n+1}
--
-- We can keep repeating this process iteratively until we get that some
-- r_{i+1} = 0, and then return x_i and y_i as the coefficients in question.
--
-- Of course, this means that we need to find initial values of x_0, y_0, x_1,
-- y_1 such that
--
-- ax0 + by0 = a
-- ax1 + by1 = b
--
-- From which we easily see that x0 = y1 = 1, x1 = y0 = 0.
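--
-- As a small worked example, take a = 12 and b = 8. We start with x0 = 1,
-- y0 = 0, x1 = 0, y1 = 1. The first quotient is q = 12 div 8 = 1 with
-- remainder 4, so the new coefficients are x2 = x0 - 1*x1 = 1 and
-- y2 = y0 - 1*y1 = -1, and indeed 12*1 + 8*(-1) = 4. The next remainder is
-- 8 mod 4 = 0, so the algorithm stops and returns (1, -1), with
-- 12*1 + 8*(-1) = 4 = gcd(12, 8).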

extendedEuclid a b = recurse a b 1 0 0 1
    where recurse a 0 x0 y0 x1 y1 = (x0, y0)
          recurse a b x0 y0 x1 y1 = let q = a `div` b in
                                    let r = a `mod` b in
                                        recurse b r x1 y1 (x0 - q * x1) (y0 - q * y1)
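
-- A small sanity check (checkExtendedEuclid is an illustrative helper, not part
-- of the algorithm itself): for nonnegative a and b, the coefficients returned
-- by extendedEuclid should satisfy a*x + b*y == euclidGCD a b. For example,
-- checkExtendedEuclid 12 8 evaluates to True.
checkExtendedEuclid :: Integer -> Integer -> Bool
checkExtendedEuclid a b = let (x, y) = extendedEuclid a b
                          in a * x + b * y == euclidGCD a b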