The Magic of Erasure Codes
$\xmapsto{\mathtt{ReconstructData}}$
Erasure codes, how do they work?
First, turn into a byte-level problem...
cashcat0.jpg: aa b3 3f 2f 13 33 ...
cashcat1.jpg: bb 34 25 36 3f ed ...
cashcat2.jpg: cc 35 d3 3f ff c0 ...
|
|
ComputeParity
|
\|/
cashcats.p00: 14 34 ...
cashcats.p01: 25 53 ...
…and so on.
Record hashes, and treat corrupted files as missing.
cashcat0.jpg: XX XX XX XX XX XX ... (sha1 mismatch)
cashcat1.jpg: bb 34 25 36 3f ed ...
cashcat2.jpg: ?? ?? ?? ?? ?? ?? ... (missing)
cashcats.p00: 14 34 11 50 3f fe ...
cashcats.p01: 25 53 23 36 1f 3d ...
|
|
ReconstructData
|
\|/
cashcat0.jpg: aa b3 ...
cashcat1.jpg: bb 34 ...
cashcat2.jpg: cc 35 ...
…and so on.
$$(p_0, p_1) = \mathtt{ComputeParity}(d_0, d_1, d_2)$$
$\begin{pmatrix} p_0 \\ p_1 \end{pmatrix} = P \cdot \begin{pmatrix} d_0 \\ d_1 \\ d_2 \end{pmatrix}$
$= \begin{pmatrix}
1/3 & 1/2 & 1/1 \\
1/4 & 1/3 & 1/2
\end{pmatrix} \cdot \begin{pmatrix} d_0 \\ d_1 \\ d_2 \end{pmatrix}$
or
$$ \begin{aligned}
p_0 &= 1/3 \cdot d_0 + 1/2 \cdot d_1 + 1/1 \cdot d_2 \\
p_1 &= 1/4 \cdot d_0 + 1/3 \cdot d_1 + 1/2 \cdot d_2
\end{aligned}$$
⚠️ For now, rational numbers (fractions), not bytes. ⚠️
Generate parity matrix with parityCount=2
rows and
dataCount=3
columns.
$$P = \begin{pmatrix}
1/3 & 1/2 & 1/1 \\
1/4 & 1/3 & 1/2
\end{pmatrix}$$
$$P[i, j] = \frac{1}{(\texttt{dataCount} + i) - j}$$
for i, j in range(0, 2) × range(0, 3)
Called a Cauchy matrix.
Reconstruction: Start with the parity computation…
$\begin{pmatrix} p_0 \\ p_1 \end{pmatrix} =
\begin{pmatrix} 1/3 & 1/2 & 1/1 \\
1/4 & 1/3 & 1/2 \end{pmatrix} \cdot \begin{pmatrix} d_0 \\ d_1 \\ d_2 \end{pmatrix}$
Append to identity function…
$\begin{pmatrix} d_0 \\ d_1 \\ d_2 \\ p_0 \\ p_1 \end{pmatrix} =
\begin{pmatrix} 1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1 \\
1/3 & 1/2 & 1/1 \\
1/4 & 1/3 & 1/2
\end{pmatrix} \cdot \begin{pmatrix} d_0 \\ d_1 \\ d_2 \end{pmatrix}$
Remove unknown rows…
$\begin{pmatrix} \xcancel{d_0} \\ d_1 \\ \xcancel{d_2} \\ p_0 \\ p_1 \end{pmatrix} =
\begin{pmatrix} \xcancel{1} & \xcancel{0} & \xcancel{0} \\
0 & 1 & 0 \\
\xcancel{0} & \xcancel{0} & \xcancel{1} \\
1/3 & 1/2 & 1/1 \\
1/4 & 1/3 & 1/2
\end{pmatrix} \cdot \begin{pmatrix} d_0 \\ d_1 \\ d_2 \end{pmatrix}$
Remove unknown rows…
$\begin{pmatrix} d_1 \\ p_0 \\ p_1 \end{pmatrix} =
\begin{pmatrix} 0 & 1 & 0 \\
1/3 & 1/2 & 1/1 \\
1/4 & 1/3 & 1/2
\end{pmatrix} \cdot \begin{pmatrix} d_0 \\ d_1 \\ d_2 \end{pmatrix} $
$= M \cdot \begin{pmatrix} d_0 \\ d_1 \\ d_2 \end{pmatrix}$
$$\begin{pmatrix} d_1 \\ p_0 \\ p_1 \end{pmatrix} = M \cdot \begin{pmatrix} d_0 \\ d_1 \\ d_2 \end{pmatrix}$$
Find $M^{-1}$ such that
$M^{-1} \cdot \begin{pmatrix} d_1 \\ p_0 \\ p_1 \end{pmatrix} = \begin{pmatrix} d_0 \\ d_1 \\ d_2 \end{pmatrix}$
- Always exists, by special properties of Cauchy matrices!
- Use row reduction (a.k.a. Gaussian elimination) to compute.
$$(d_0, d_1, d_2) = \mathtt{ReconstructData}(?, d_1, ?, p_0, p_1)$$
$$
\begin{pmatrix} d_0 \\ d_1 \\ d_2 \end{pmatrix} = M^{-1} \cdot \begin{pmatrix} d_1 \\ p_0 \\ p_1 \end{pmatrix}
= \begin{pmatrix}
-1 & -6 & 12 \\
1 & 0 & 0 \\
-1/6 & 3 & -4
\end{pmatrix}
\cdot \begin{pmatrix} d_1 \\ p_0 \\ p_1 \end{pmatrix}$$
Remaining problem…
How to compute parity bytes instead of parity numbers?
- Matrix elements must obey a particular interface.
- They must belong to a field.
- We want a field with 256 elements.
What is a field?
interface Field<T> {
static Zero: T, One: T
plus(b: T): T // $a \oplus b$
negate(): T // $-a$
times(b: T): T // $a \otimes b$
reciprocate(): T // $a^{-1}$, for $a \ne 0$
equals(b: T): bool // $a = b$
}
// $a \ominus b$
minus(b: T): T => this.plus(b.negate())
// $a \oslash b$
dividedBy(b: T): T => this.times(b.reciprocate())
Field 📜interface contract📜
- Identities: $a \oplus 0 = a \otimes 1 = a$.
- Inverses: $a \oplus -a = 0$, and if $a \ne 0$,
👉$a \otimes a^{-1} = 1$👈.
- Associativity: $(a \oplus b) \oplus c = a \oplus (b \oplus c)$, $(a \otimes b) \otimes c = a \otimes (b \otimes c)$.
- Commutativity: $a \oplus b = b \oplus a$, $a \otimes b = b \otimes a$.
- Distributivity: $a \otimes (b \oplus c) = (a \otimes b) \oplus (a \otimes c)$.
- Rational numbers (i.e., fractions) form a field: $(p/q)^{-1} = q/p$.
- Integers don’t form a field, since only $1$ and $-1$
have reciprocals.
- Integers mod $p$ form a field with $p$ elements, only when $p$ is prime, e.g. field of 257 elements.
🤓 Math fact 🤓: Given an integer $0 \lt a \lt 257$, there is exactly one $0 \lt b \lt 257$ such that $(a \cdot b) \bmod 257 = 1\text{.}$
(Can replace $257$ with any prime number $p$).
reciprocate() => {
if (this === 0) { throw new Error("Division by zero"); }
for (let i = 0; i < 257; i += 1) {
if ((this * i) % 257 === 1) { return i; }
}
throw new Error("Shouldn't get here!");
}
If speed is a problem, build a table.
- Field of 257 elements gets us close…
- Problem: 256 isn’t a prime number!
- Does this mean we cannot make a field of 256 elements?
No!
We can’t just do so based on standard integer arithmetic.
Binary carry-less arithmetic
Let $a = 23$ and $b = 54$.
Then, with binary carry-less addition,
a = 23 = 10111b
^ b = 54 = 110110b
-------
100001b
so $a \oplus b = 100001_{\mathrm{b}} = 33$.
Binary carry-less addition is just bitwise xor!
So is binary carry-less subtraction, since a ^ a = 0
.
Let $a = 23$ and $b = 54$.
Then, with carry-less arithmetic,
a = 23 = 10111b
^* b = 54 = 110110b
------------
10111
^ 10111
^ 10111
^ 10111
------------
1111100010b
so $a \otimes b = 1111100010_{\mathrm{b}} = 994$.
Let $a = 55$ and $b = 19$. Then, with carry-less arithmetic,
11b
--------
b = 19 = 10011b )110111b = 55 = a
^ 10011
-----
10001
^ 10011
-----
10b
so $a \oslash b = 11_{\mathrm{b}} = 3$
with remainder $10_{\mathrm{b}} = 3$.
Important subtlety: we don’t
stop when the remainder is less than the divisor, but when
it has fewer digits than the divisor.
Properties of (carry-less) mod
- mod by $256 \le n \lt 512$ $\Rightarrow$ $n$ possible remainders.
- clmod by $256 \le n \lt 512$ $\Rightarrow$ $256$ possible remainders.
- Very interesting…clmod by prime number $256 \le p \lt 512$?
What are prime numbers?
- An integer $p \gt 1$ that cannot be expressed as $p = a \cdot b$, for $a, b \gt 1$.
- Multiplication operation determines the prime numbers.
- Want a “carry-less” prime number between 256 and 512!
- 283 is such a number. (Coincidentally, it’s also a regular prime.)
Field of 256 elements
…is just binary carry-less arithmetic mod 283.
class Field256Element implements Field<Field256Element> {
static Zero = 0, One = 1
plus(b) => (a ^ b)
negate() => a
times(b): => clmod(clmul(a, b), 283)
reciprocate() => // next slide
equals(b) => (a === b)
}
🤓 Math fact 🤓: Given an integer $0 \lt a \lt 256$, there is a $0 \lt b \lt 256$ such that $(a \otimes b) \mathbin{\mathrm{clmod}} 283 = 1$.
reciprocate() => {
if (this === 0) { throw new Error("Division by zero"); }
for (let i = 0; i < 256; i += 1) {
if (clmod(clmul(this, i), 283) === 1) { return i; }
}
throw new Error("Shouldn't get here!");
}
The full algorithm
- Do everything with
Field256Element
, i.e. bytes with carry-less arithmetic mod 283.
- In particular, compute $P$ like:
$$
\begin{aligned}
P &= \begin{pmatrix}
(3 \oplus 0)^{-1} & (3 \oplus 1)^{-1} & (3 \oplus 2)^{-1} \\
(4 \oplus 0)^{-1} & (4 \oplus 1)^{-1} & (4 \oplus 2)^{-1}
\end{pmatrix} \\
&= \begin{pmatrix}
3^{-1} & 2^{-1} & 1 \\
4^{-1} & 5^{-1} & 6^{-1}
\end{pmatrix}
= \begin{pmatrix}
246 & 141 & 1 \\
203 & 82 & 123
\end{pmatrix}
\end{aligned}
$$
- Then proceed as before.