I once worked on distributed computation of matrix-matrix multiplication.
- [PlutoCharon20] H.-P. Wang and I. Duursma. Parity-Checked Strassen Algorithm. arXiv preprint, 2020.
[PlutoCharon20] deals with distributed matrix multiplication (DMM), where the workers might straggle or
crash, by combining ideas from fast matrix multiplication (FMM). By MM we mean the computation of
the product $A \times B$ of two matrices $A$ and $B$.
Straggling and crashing are real issues in the real world because, spontaneously, the network may become
busy, the CPU may overheat, or the circuit board may be hit by cosmic radiation and never
recover. This slows down the overall computation because we have to wait for the last
worker to tell us its share of the product.
To compensate, we can hire more workers and ask them to carry out redundant computations. A
possible way to create redundancy is to draw a random row vector $g$ and a random column vector $h$
and ask an extra worker to compute the parity $(gA) \times (Bh)$.
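To make this concrete, here is a minimal sketch of my own (an illustration, not the exact scheme of [PlutoCharon20]): split $A$ into two row blocks and $B$ into two column blocks, let four workers compute the four block products, and let a fifth worker compute the parity $(gA) \times (Bh)$ for random $g$ and $h$. Since $(gA)(Bh) = \sum_{i,j} g_i h_j A_i B_j$, any single straggler's block can be solved from the parity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Goal: C = A @ B, with A split into 2 row blocks and B into 2 column blocks.
n = 4
A = rng.standard_normal((2 * n, n))
B = rng.standard_normal((n, 2 * n))
A_blocks = [A[:n], A[n:]]           # row blocks A_1, A_2
B_blocks = [B[:, :n], B[:, n:]]     # column blocks B_1, B_2

# Four workers each compute one block product C_ij = A_i @ B_j.
jobs = [(i, j) for i in range(2) for j in range(2)]
results = {(i, j): A_blocks[i] @ B_blocks[j] for (i, j) in jobs}

# A fifth worker computes the parity (gA)(Bh) for random g and h.
# Note that (gA)(Bh) = sum over i, j of g_i * h_j * (A_i @ B_j).
g = rng.standard_normal(2)
h = rng.standard_normal(2)
gA = g[0] * A_blocks[0] + g[1] * A_blocks[1]
Bh = h[0] * B_blocks[0] + h[1] * B_blocks[1]
parity = gA @ Bh

# Suppose worker (1, 0) straggles; solve its block from the parity.
lost = (1, 0)
acc = parity.copy()
for (i, j) in jobs:
    if (i, j) != lost:
        acc -= g[i] * h[j] * results[(i, j)]
recovered = acc / (g[lost[0]] * h[lost[1]])

assert np.allclose(recovered, A_blocks[1] @ B_blocks[0])
```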
The contribution of [PlutoCharon20] is three-fold.
- One: We observe that the routine computation of $A \times B$ can be carried out by fast matrix multiplication (FMM). This construction is named Pluto codes because the smallest working example uses nine workers and can afford losing one, which reminds us that Pluto used to be the ninth planet. (A Strassen sketch follows this list.)
- Two: Applying Pluto codes recursively, we obtain codes that behave like tensor product codes. Tensor product codes have fast iterative decoders that are parallelism-friendly, which fits the current context of distributed computation. (A toy decoder follows this list.)
- Three: We observe that the computation of $(gA) \times (Bh)$, where $g$ and $h$ are matrices, can be carried out by FMM as well. This is named the Charon construction after Pluto's moon; a numerical check of the underlying identity follows this list. (Fun fact: Charon is the largest moon in the solar system relative to its parent body.) The smallest working example of the Charon construction is when
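As promised in bullet One, here is a minimal sketch of the FMM underlying Pluto codes: one level of Strassen's algorithm on a $2 \times 2$ block partition, which forms seven block products instead of the naive eight. In the DMM setting, each of the seven products could be a separate worker's job; the parity products that Pluto codes add on top are not shown here.

```python
import numpy as np

def strassen_2x2(A, B):
    """One level of Strassen: compute A @ B using 7 block products
    instead of the naive 8. A and B must be square with even size."""
    n = A.shape[0] // 2
    A11, A12, A21, A22 = A[:n, :n], A[:n, n:], A[n:, :n], A[n:, n:]
    B11, B12, B21, B22 = B[:n, :n], B[:n, n:], B[n:, :n], B[n:, n:]

    # The seven Strassen products; in DMM each one is a separate job.
    M1 = (A11 + A22) @ (B11 + B22)
    M2 = (A21 + A22) @ B11
    M3 = A11 @ (B12 - B22)
    M4 = A22 @ (B21 - B11)
    M5 = (A11 + A12) @ B22
    M6 = (A21 - A11) @ (B11 + B12)
    M7 = (A12 - A22) @ (B21 + B22)

    # Recombine the seven products into the four blocks of C.
    C11 = M1 + M4 - M5 + M7
    C12 = M3 + M5
    C21 = M2 + M4
    C22 = M1 - M2 + M3 + M6
    return np.block([[C11, C12], [C21, C22]])

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 6))
B = rng.standard_normal((6, 6))
assert np.allclose(strassen_2x2(A, B), A @ B)
```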
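For bullet Two, here is a toy illustration of the iterative decoding of tensor product codes. It uses the simplest member of the family, the product of two single-parity-check codes, not the codes constructed in the paper: every row and every column carries a parity, and erasures are peeled off one row or column at a time. All rows (or all columns) can be processed in parallel, which is what makes such decoders parallelism-friendly.

```python
import numpy as np

def spc_product_encode(data):
    """Encode a k-by-k data array with the product of two
    single-parity-check codes: append a parity column and a parity
    row so that every row and every column sums to zero."""
    k = data.shape[0]
    word = np.zeros((k + 1, k + 1))
    word[:k, :k] = data
    word[:k, k] = -data.sum(axis=1)        # row parities
    word[k, :] = -word[:k, :].sum(axis=0)  # column parities
    return word

def peel_erasures(word, erased):
    """Iterative decoder: whenever a row or column has exactly one
    erasure, solve it from the parity. All rows (or columns) could
    be scanned in parallel; here we scan them sequentially."""
    word, erased = word.copy(), erased.copy()
    progress = True
    while erased.any() and progress:
        progress = False
        for axis in (0, 1):  # axis 0 scans columns, axis 1 scans rows
            counts = erased.sum(axis=axis)
            for idx in np.nonzero(counts == 1)[0]:
                line = (slice(None), idx) if axis == 0 else (idx, slice(None))
                hole = np.nonzero(erased[line])[0][0]
                known = np.where(erased[line], 0.0, word[line])
                fix = (hole, idx) if axis == 0 else (idx, hole)
                word[fix] = -known.sum()   # the line must sum to zero
                erased[fix] = False
                progress = True
    return word

rng = np.random.default_rng(2)
word = spc_product_encode(rng.integers(-5, 5, size=(3, 3)).astype(float))
erased = np.zeros_like(word, dtype=bool)
erased[0, 0] = erased[0, 1] = erased[2, 0] = True  # three erasures
garbled = np.where(erased, 0.0, word)
assert np.allclose(peel_erasures(garbled, erased), word)
```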
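For bullet Three, here is a quick numerical check of the identity behind the Charon construction: $(gA)(Bh) = g(AB)h$. A worker that multiplies the two small matrices $gA$ and $Bh$ is really computing a compressed view of the true product $C = A \times B$, which can serve as a check on the other workers' results. (This only verifies the identity; the point in the paper is that this computation can itself be carried out by FMM.)

```python
import numpy as np

rng = np.random.default_rng(3)
m = n = p = 8   # sizes of A (m-by-n) and B (n-by-p)
r = 2           # sketch size; g and h are small random matrices

A = rng.standard_normal((m, n))
B = rng.standard_normal((n, p))
g = rng.standard_normal((r, m))   # compresses the rows of A
h = rng.standard_normal((p, r))   # compresses the columns of B

# Multiplying the small matrices gA (r-by-n) and Bh (n-by-r) is much
# cheaper than computing A @ B, yet it yields g @ (A @ B) @ h,
# a compressed view of the true product.
small = (g @ A) @ (B @ h)
assert np.allclose(small, g @ (A @ B) @ h)
```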
Here is a figure I made to explain the tensor structure of Pluto.