This repository contains code to reproduce the numerical experiments for "Randomly pivoted Cholesky: Practical approximation of a kernel matrix with few entry evaluations" by Yifan Chen, Ethan N. Epperly, Joel A. Tropp, and Robert J. Webber.
Randomly pivoted Cholesky (RPCholesky) computes a low-rank approximation to a positive semidefinite matrix.
Although the main purpose of these scripts is scientific reproducibility, they may also be useful for applying RPCholesky in practice.
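For readers who want to see the idea before opening the repository code, the following is a minimal sketch of randomly pivoted Cholesky on a dense NumPy array. It is an illustration written for this README, not the implementation in rp_cholesky.py: the function name rp_cholesky_sketch and its signature are hypothetical, and it assumes the whole matrix fits in memory and that num_pivots does not exceed the rank of the matrix.

```python
import numpy as np


def rp_cholesky_sketch(A, num_pivots, rng=None):
    """Illustrative randomly pivoted Cholesky on a dense PSD NumPy array.

    Returns F (n x num_pivots) with A ~= F @ F.T, and the list of pivots.
    Hypothetical helper; not the repository's rp_cholesky.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = A.shape[0]
    F = np.zeros((n, num_pivots))
    d = np.diag(A).astype(float)  # diagonal of the current residual
    pivots = []
    for i in range(num_pivots):
        # Sample a pivot with probability proportional to the residual diagonal.
        s = rng.choice(n, p=d / d.sum())
        # Column s of the residual A - F @ F.T, using a single column of A.
        g = A[:, s] - F[:, :i] @ F[s, :i]
        F[:, i] = g / np.sqrt(g[s])
        # Update the residual diagonal; clipping guards against round-off.
        d = np.clip(d - F[:, i] ** 2, 0.0, None)
        d[s] = 0.0
        pivots.append(int(s))
    return F, pivots


rng = np.random.default_rng(0)
G = rng.standard_normal((8, 8))
A = G @ G.T + np.eye(8)  # small positive definite test matrix
F, pivots = rp_cholesky_sketch(A, 4, rng)
residual = A - F @ F.T
```

Each step reads only one column of A plus its diagonal, which is why the method needs few entry evaluations. In this sketch the residual vanishes on the selected pivot rows and columns, up to round-off.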
RPCholesky is implemented in the rp_cholesky method in rp_cholesky.py, and can be called as
nystrom_approximation = rp_cholesky(A, num_pivots)
The input matrix A should be an AbstractPSDMatrix object, defined in matrix.py.
A psd matrix stored as a numpy array can be made usable by rp_cholesky by wrapping it in a PSDMatrix object:
nystrom_approximation = rp_cholesky(PSDMatrix(ordinary_numpy_array), num_pivots)
The output of rp_cholesky is a PSDLowRank object (defined in lra.py).
From this object, one can obtain the factor F, the selected pivots, and the corresponding rows of A:
nystrom_approximation = rp_cholesky(A, num_pivots)
F = nystrom_approximation.F # Nystrom approximation is F @ F.T
pivots = nystrom_approximation.idx
rows = nystrom_approximation.rows # rows = A[pivots, :]
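To make the relationship between pivots, rows, and the Nyström approximation concrete, here is a small NumPy-only sketch. It does not use the repository's classes: the pivot set is chosen by hand, and the way F is built here (a Cholesky factorization of the pivot submatrix) is an assumption made for illustration, not necessarily how PSDLowRank stores or computes it.

```python
import numpy as np

rng = np.random.default_rng(0)
G = rng.standard_normal((6, 6))
A = G @ G.T + np.eye(6)      # small positive definite test matrix
pivots = [4, 1, 3]           # hypothetical pivot set, chosen by hand

rows = A[pivots, :]          # same relation as rows = A[pivots, :] above
core = rows[:, pivots]       # the submatrix A[pivots, pivots]

# Factor the core and build F so that F @ F.T = rows.T @ inv(core) @ rows.
L = np.linalg.cholesky(core)
F = np.linalg.solve(L, rows).T

A_hat = F @ F.T              # Nystrom approximation of A
rel_trace_error = np.trace(A - A_hat) / np.trace(A)
```

The approximation A_hat agrees with A exactly on the pivot rows and columns, and the relative trace error gives a simple scalar measure of how much of A remains unexplained.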
The first step to reproducing the experiments from the manuscript is to run the script
./setup.sh
which sets up the file structure, loads RLS and DPP samplers, and downloads the QM9 dataset for the KRR example. The data for the figures in the paper can be produced by running the following scripts, each of which has usage instructions in a comment at the top:
comparison.py: compares the approximation error for different Nyström methods. Used to produce the left displays in Figure 1.
chosen.py: outputs the pivots chosen by different Nyström methods. Used to produce the right displays in Figure 1.
entries.py: outputs the entry evaluations for different Nyström methods. Used to produce Figure 2.
qm9_krr.py: performs kernel ridge regression on the QM9 dataset. Used to produce Figure 3.
cluster_biomolecule.py: performs spectral clustering on the alanine dipeptide dataset. Used to produce Figure 4.
timing.py: compares the timing of different Nyström methods.
Once the relevant Python scripts have been run, the figures from the paper can be generated from the relevant MATLAB scripts in matlab_plotting.
Figure 4 in the manuscript was completely changed in revision. Figure 4 from earlier versions of the manuscript can be generated using the scripts cluster_letters.py and cluster_letters_plot.py.