```python
import oineus as oin
import numpy as np
import torch
```
First, let us do everything by hand. If we want to create a filtration, we need to create simplices first. We have a filtration of a triangle, and we use `double` to store filtration values (the filtration value is the second argument to the constructor):
```python
# vertices
v0 = oin.Simplex([0], 0.1)
v1 = oin.Simplex([1], 0.2)
v2 = oin.Simplex([2], 0.3)
# edges
e1 = oin.Simplex([0, 1], 1.2)
e2 = oin.Simplex([0, 2], 1.4)
e3 = oin.Simplex([1, 2], 2.1)
# triangle
t1 = oin.Simplex([0, 1, 2], 4.0)
```
We now put the simplices into a list. The time at which simplex `simplices[i]` enters the filtration was already supplied as the second argument to its constructor.

```python
simplices = [v0, v1, v2, e1, t1, e2, e3]
```
We put simplices of positive dimension in arbitrary order here. Vertices must always appear in the list first, and in the order prescribed by their index.
Now we create a filtration.
```python
# constructor will sort simplices and assign sorted_ids
fil = oin.Filtration(simplices)
print(fil)
```
Note that:

- Each simplex has an `id`, which equals its index in the list `simplices`. This is precisely why we insist that vertex `i` appears in the `i`-th position in `simplices`: we want the `id` of a vertex to match the index we use when we create positive-dimensional simplices: `oin.Simplex([0, 2])` consists of the vertices `simplices[0]` and `simplices[2]`.
- Each simplex has a `sorted_id`, which equals its index in the filtration order. When we ask `fil` for simplices, they will appear in the order determined by the `sorted_id`, as shown below.
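For example, we can look at both numbers side by side. A minimal sanity check, assuming `id` and `sorted_id` are exposed as simplex attributes (as the assignment `sigma.id = 3` mentioned below suggests):

```python
# compare id (position in the input list) with sorted_id (position in filtration order)
for sigma in fil.simplices():
    print(sigma.id, sigma.sorted_id, sigma)
```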
The constructor of a filtration has some additional arguments:

- `set_ids` is `True` by default; that is why the `id`s are overwritten. Set it to `False` to preserve the original `id`s (see the sketch after this list). Caveat: vertex `i` must still have `id == i`, and the `id`s must be unique. You can specify the `id` as the first argument to the constructor, `oin.Simplex(3, [0, 1], 0.5)`, or assign to it: `sigma.id = 3`.
- `sort_only_by_dimension` is `False` by default. If you know that your simplices are already in the correct order, you can set it to `True`: the simplices will be arranged by dimension, but the order of simplices of the same dimension will be preserved.
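For instance, a minimal sketch of preserving manually assigned `id`s, assuming `set_ids` can be passed as a keyword argument:

```python
# ids are given explicitly as the first constructor argument;
# vertex i must still have id == i
v0 = oin.Simplex(0, [0], 0.1)
v1 = oin.Simplex(1, [1], 0.2)
e = oin.Simplex(2, [0, 1], 0.5)
fil = oin.Filtration([v0, v1, e], set_ids=False)
```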
You can create a VR filtration from a point cloud or from a distance matrix. For a point cloud, only dimensions 1, 2, and 3 are supported. The input can be a NumPy array, a Jax array, or a PyTorch tensor. The shape is `(#points, dimension)`; in other words, each point must be a row in the matrix.
```python
import numpy as np
import oineus as oin

# create 20 random points in space
np.random.seed(1)
n_points = 20
dim = 3
points = np.random.uniform(size=(n_points, dim))

fil = oin.vr_filtration(points, max_dim=3, max_diameter=2)
print(fil)
```
The parameters are:

- `points`: coordinates of the points in the point cloud.
- `max_dim`: the resulting filtration will contain simplices up to and including dimension `max_dim`. If you want to compute persistence diagrams in dimension `d`, you need `max_dim >= d+1` (see the example after this list). Default value: the dimension of the points.
- `max_diameter`: only consider simplices up to this diameter. Default value: a minimax value that guarantees contractibility past that value, as in Ripser.
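For example, if you want the persistence diagram in dimension `d = 1`, then `max_dim=2` is the smallest sufficient value:

```python
# deaths of 1-cycles are caused by triangles, so we need simplices of dimension d + 1 = 2
fil_h1 = oin.vr_filtration(points, max_dim=2)
```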
For a distance matrix you use the same function; just set the parameter `from_pwdists` to `True`. In this case, `max_dim` must be supplied.
```python
import numpy as np
import scipy.spatial
import oineus as oin

# create 20 random points in space
np.random.seed(1)
n_points = 20
dim = 6
points = np.random.uniform(size=(n_points, dim))
distances = scipy.spatial.distance.cdist(points, points, 'euclidean')

fil = oin.vr_filtration(distances, from_pwdists=True, max_dim=3)
print(fil)
```
Lower-star filtrations are supported for functions on a regular D-dimensional grid for D = 1, 2, or 3. Function values are represented as a D-dimensional NumPy array.
```python
# create scalar function on 8x8x8 grid
f = np.random.uniform(size=(8, 8, 8))
fil = oin.freudenthal_filtration(data=f, max_dim=3)
```
If you want an upper-star filtration, set `negate` to `True`:

```python
fil = oin.freudenthal_filtration(data=f, negate=True)
```
If you want periodic boundary conditions (a D-dimensional torus instead of a D-dimensional cube), set `wrap` to `True`:

```python
fil = oin.freudenthal_filtration(data=f, wrap=True)
```
If your data is D-dimensional but you only need lower-dimensional simplices, you can specify the `max_dim` parameter (it defaults to `data.ndim`):

```python
fil = oin.freudenthal_filtration(data=f, max_dim=2)
```
Note that `max_dim` is the dimension of the maximal simplices in the filtration; if you want to compute the persistence diagram in dimension `k`, you need to have simplices of dimension `k+1` (the negative ones).
The persistence diagram is computed from the `R=DV`, `RU=D` decomposition. In fact, we only need the `R` matrix to read off the persistence pairing, but the other matrices are needed in topological optimization. The corresponding class is called `VRUDecomposition`. When we create it, we must specify whether we want homology (`dualize=False`) or cohomology (`dualize=True`).
```python
# no cohomology
dualize = False
# create VRU decomposition object, does not perform reduction yet
dcmp = oin.Decomposition(fil, dualize)
```
In order to perform the reduction, we need to set parameters. This is done through a single object of class `ReductionParams` that encapsulates all of these parameters.
```python
rp = oin.ReductionParams()
```
Some of the parameters are:

- `rp.clearing_opt`: whether you want to use the clearing optimization.
- `rp.n_threads`: the number of threads to use; the default is 1.
- `rp.compute_v`: whether you want to compute the `V` matrix. `True` by default.
- `rp.compute_u`: whether you want to compute the `U` matrix. `False` by default. This cannot be done in multi-threaded mode, so the reduction will return an error if `n_threads > 1` and this option is set.
```python
rp.compute_u = rp.compute_v = True
rp.n_threads = 16
# perform reduction
dcmp.reduce(rp)
# now we can access V, R and U
# indices are sorted_ids of simplices == indices in fil.cells()
V = dcmp.v_data
print(f"Example of a V column: {V[-1]}, this chain contains cells:")
simplices = fil.simplices()
for sigma_idx in V[-1]:
    print(simplices[sigma_idx])
```
Now we can ask for a diagram. The `diagram` method uses the critical values from the filtration that was used to construct the decomposition to get the values of the simplices, and it returns diagrams in all dimensions. By default, diagrams include points at infinity. If we only want the finite part, we can specify that with `include_inf_points`.
```python
dgms = dcmp.diagram(fil, include_inf_points=False)
```
To get the diagram in one specific dimension, we can subscript the object or call the `in_dimension` method. The diagram will be returned as a NumPy array of shape `(n, 2)`:
```python
dim = 2
dgm_2 = dcmp.diagram(fil).in_dimension(dim)
# or
dgm_2 = dcmp.diagram(fil)[dim]
assert type(dgm_2) is np.ndarray
```
Now, e.g., the birth coordinates are simply `dgm_2[:, 0]`.
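For instance, the persistence of every point is just the difference of the two columns:

```python
# per-point persistence in dimension 2: death minus birth
pers_2 = dgm_2[:, 1] - dgm_2[:, 0]
```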
If we want to know the persistence pairing, that is, which birth and death simplices gave us this particular point, we can use `index_diagram_in_dimension`.
```python
dim = 2
ind_dgm_2 = dcmp.diagram(fil).index_diagram_in_dimension(dim)
```
It is also a NumPy array (of integral type). We can also get the zero-persistence diagram:
```python
z_dgm_2 = dcmp.zero_pers_diagram(fil).in_dimension(dim)
```
Finally, sometimes it can be more convenient to have not a NumPy array, but a list of diagram points that have birth/death values and birth/death indices as members. For this, the `in_dimension` method has a second argument, `as_numpy`, which is `True` by default, but we can set it to `False`:
```python
dgm = dcmp.diagram(fil).in_dimension(dim, False)
for p in dgm:
    print(p.birth, p.death, p.birth_index, p.death_index)
```
How do we map this back to the filtration? Let us take a look at a single point in the index diagram:
```python
sigma_sorted_idx, tau_sorted_idx = ind_dgm_2[0, :]
```
`sigma_sorted_idx` is the index of the birth simplex (a triangle) in filtration order, and `tau_sorted_idx` is the index of the death simplex (a tetrahedron) in filtration order.
There are several ways to get the original simplex (a combined sketch follows the list):

- `sigma = fil.simplex(sigma_sorted_idx)` will return the simplex itself. `fil.simplex` takes the `sorted_id` of a simplex and just accesses the vector of simplices at this index, so it is cheap.
- `sigma_idx = fil.id_by_sorted_id(sigma_sorted_idx)` will return the `id` of `sigma`. Recall that it is the index of `sigma` in the original list of simplices that was used to create the filtration. This is convenient if you have a parallel array of some information, one entry per simplex, which you want to access.
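A short sketch combining the two accessors, using the index-diagram point from above:

```python
# map an index-diagram point back to concrete simplices
sigma = fil.simplex(sigma_sorted_idx)              # birth simplex, looked up by sorted_id
tau = fil.simplex(tau_sorted_idx)                  # death simplex, looked up by sorted_id
sigma_idx = fil.id_by_sorted_id(sigma_sorted_idx)  # id: index in the original input list
print(sigma, tau, sigma_idx)
```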
There are two ways to perform topological optimization. We consider two examples, both of which use the following function to generate differentiable data.
```python
def sample_data(num_points: int=50, noise_std_dev=0.1):
    # sample points from the unit circle and add noise
    # num_points: number of points to sample
    # noise_std_dev: standard deviation of Gaussian noise
    # return points as a differentiable torch tensor
    np.random.seed(1)
    angles = np.random.uniform(low=0, high=2*np.pi, size=num_points)
    x = np.cos(angles)
    y = np.sin(angles)
    x += np.random.normal(loc=0, scale=noise_std_dev, size=num_points)
    y += np.random.normal(loc=0, scale=noise_std_dev, size=num_points)
    pts = np.vstack((x, y)).T
    pts = torch.Tensor(pts)
    pts.requires_grad_(True)
    return pts

pts = sample_data()
```
We also set up a PyTorch optimizer that acts on the points:
```python
opt = torch.optim.SGD([pts], lr=0.1)
```
This method requires the `eagerpy` package, which provides wrappers around tensors from PyTorch, Jax, and TensorFlow. Differentiable filtrations are defined in the `oineus.diff` subpackage. To create a differentiable Vietoris-Rips filtration, we can use the function `oineus.diff.vr_filtration`, which has essentially the same signature as `oineus.vr_filtration`.
```python
import oineus as oin
import oineus.diff

fil = oin.diff.vr_filtration(pts)
```
Now `fil` is an object that contains the standard filtration (accessible as `fil.under_fil`) and a differentiable tensor of critical values, `fil.values`. We also need to create a `TopologyOptimizer` object:
```python
top_opt = oin.diff.TopologyOptimizer(fil)
```
Suppose that we want to keep only the most persistent point in the 1-dimensional persistence diagram, and all other points should go to the diagonal vertically (that is, a point `(b, d)` ideally should go to the point `(b, b)`).
```python
dim = 1
n = 2
dgm = top_opt.compute_diagram(include_inf_points=False)
# eps is the threshold: all points whose persistence does not exceed eps will move
# if we want to remove all but one point, the most persistent one, eps must be the
# second biggest persistence
eps = top_opt.get_nth_persistence(dim, n)
```
`TopologyOptimizer` provides a helper function `simplify`, which returns a tuple of indices and values. This is a key concept: this tuple encodes the prescribed targets for individual diagram points. The interpretation is: for every `i`, the simplex `fil[indices[i]]` must take the new critical value `values[i]`. In this case, we want to modify only the death values, so every simplex in `indices` will be a negative simplex that produces an off-diagonal point in the diagram.
```python
indices, values = top_opt.simplify(eps, oin.DenoiseStrategy.BirthBirth, dim)
```
Now, if we want to use the traditional method of back-propagation through the persistence diagram, we simply define the topological loss and use the standard PyTorch approach:
```python
# zeros gradients on pts
opt.zero_grad()
top_loss = torch.mean((fil.values[indices] - values) ** 2)
# populates gradients on pts from top_loss
top_loss.backward()
# updates pts using gradient descent
opt.step()
```
However, if we want to operate on a bigger set of simplices at once, we can do the following (see 'Topological Optimization with Big Steps' for details):
```python
# returns a list of critical sets for each singleton loss defined by indices[i], values[i]
critical_sets = top_opt.singletons(indices, values)
# resolve conflicts between critical sets
crit_indices, crit_values = top_opt.combine_loss(critical_sets, oin.ConflictStrategy.Max)

# convert lists to something torch can understand
crit_indices = np.array(crit_indices, dtype=np.int32)
crit_values = torch.Tensor(crit_values)

opt.zero_grad()
top_loss = torch.mean((fil.values[crit_indices] - crit_values) ** 2)
top_loss.backward()
opt.step()
```
This is a more involved way that gives you more control: you do all the steps by hand. The corresponding loss function is computed by this function:
```python
def topological_loss(pts: torch.Tensor, dim: int=1, n: int=2):
    pts_as_numpy = pts.clone().detach().numpy().astype(np.float64)
    # we need the critical edge of each simplex, so we specify
    # with_critical_edges
    fil, longest_edges = oin.vr_filtration(pts_as_numpy, with_critical_edges=True)
    top_opt = oin.TopologyOptimizer(fil)

    eps = top_opt.get_nth_persistence(dim, n)
    indices, values = top_opt.simplify(eps, oin.DenoiseStrategy.BirthBirth, dim)
    critical_sets = top_opt.singletons(indices, values)
    crit_indices, crit_values = top_opt.combine_loss(critical_sets, oin.ConflictStrategy.Max)

    crit_indices = np.array(crit_indices, dtype=np.int32)
    crit_edges = longest_edges[crit_indices, :]
    crit_edges_x, crit_edges_y = crit_edges[:, 0], crit_edges[:, 1]
    crit_values = torch.Tensor(crit_values)

    if len(crit_edges_x) > 0:
        top_loss = torch.sum(torch.abs(torch.sum((pts[crit_edges_x, :] - pts[crit_edges_y, :])**2, axis=1) - crit_values ** 2))
    else:
        top_loss = torch.zeros(())
        top_loss.requires_grad_(True)
    return top_loss
```
First, we need to convert `pts` to a NumPy array, because that is the type that `oin.vr_filtration` expects as its first argument. Then we create the `TopologyOptimizer` object that provides access to all optimization-related functions.
```python
eps = top_opt.get_nth_persistence(dim, n)
```
Since we want to preserve all but the first `n-1` points of the diagram, we compute the persistence of the `n`-th point in the diagram. All points with persistence at most `eps` will be driven to the diagonal. There are 3 natural choices: a point `(b, d)` can be moved to `(b, b)`, `(d, d)`, or `((b+d)/2, (b+d)/2)`. They correspond to the 3 members of the enum `oin.DenoiseStrategy`: `BirthBirth`, `DeathDeath`, and `Midway`.
```python
indices, values = top_opt.simplify(eps, oin.DenoiseStrategy.BirthBirth, dim)
```
`indices` and `values` encode what we call the matching loss. It is easier to explain the meaning of this by example. Say we want to move two points of the persistence diagram, `(b_1, d_1)` and `(b_2, d_2)`, to the destinations `(target_b_1, target_d_1)` and `(target_b_2, target_d_2)` respectively. Recall that `b_1` is the filtration value of some simplex in the filtration, say `sigma_1`. Similarly, `d_1` corresponds to `sigma_2`, `b_2` corresponds to `sigma_3`, and `d_2` corresponds to `sigma_4`. In this case, `indices = [i_1, i_2, i_3, i_4]` and `values = [target_b_1, target_d_1, target_b_2, target_d_2]`, where `i_1` is the index (`sorted_id`) of `sigma_1` in the filtration, `i_2` is the index of `sigma_2`, etc.
Note that each pair, like `(i_2, target_d_1)`, defines a singleton loss. The line

```python
critical_sets = top_opt.singletons(indices, values)
```

computes all the critical sets at once. `critical_sets` is a list of the same length as `indices`.
Each element of the list is a pair `(value, simplex_indices)`, where `simplex_indices` contains the critical set (the indices of simplices in filtration order) and `value` is the target value that should be assigned to all of them.
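For illustration, we can unpack one element (assuming `critical_sets` is non-empty):

```python
# each critical set is a pair: target value and indices of simplices in filtration order
value, simplex_indices = critical_sets[0]
print(value, simplex_indices)
```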
There can be conflicts: different terms in the matching loss can send the same simplex to different values. In order to resolve them, we use the function `combine_loss`:
```python
crit_indices, crit_values = top_opt.combine_loss(critical_sets, oin.ConflictStrategy.Max)
```
The meaning of the output is the same as in `simplify`: `crit_indices` contains the indices of simplices, and `crit_values` contains the values that we want these simplices to have. The conflicts are resolved according to the `oin.ConflictStrategy` enum: `Max` means that we choose the value farthest from the current filtration value of the given simplex, and `Avg` means that we take the average.
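For example, to take the average instead, only the enum member changes:

```python
# resolve conflicts by averaging the competing target values
crit_indices, crit_values = top_opt.combine_loss(critical_sets, oin.ConflictStrategy.Avg)
```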
In this example we use Vietoris--Rips. The filtration value of the simplex `crit_indices[k]` is the length of its longest edge, and `crit_values[k]` is the length that we want this edge to have. It remains to express this in a differentiable (known to Torch) way.
First, for each critical simplex, let us extract the endpoints of its longest edge.
```python
# convert from a list of ints to np.array, so that subscripting works
crit_indices = np.array(crit_indices, dtype=np.int32)

# we get only the edges of the critical simplices
crit_edges = longest_edges[crit_indices, :]

# split them into start and end points:
# for the critical simplex that appears in position k in crit_indices,
# crit_edges_x[k] and crit_edges_y[k] give the indices (in pts)
# of the endpoints of its longest edge
crit_edges_x, crit_edges_y = crit_edges[:, 0], crit_edges[:, 1]

top_loss = torch.sum(torch.abs(torch.sum((pts[crit_edges_x, :] - pts[crit_edges_y, :])**2, axis=1) - crit_values ** 2))
```
The expression `torch.sum((pts[crit_edges_x, :] - pts[crit_edges_y, :])**2, axis=1)` performs summation over all coordinates, so the resulting tensor contains the squared lengths of the critical edges, computed in a differentiable way. Do not take the square root of these lengths: this will lead to NaNs in your gradients (see pytorch/pytorch#15506; this is not Torch-specific, Jax has the same behavior). Instead, either use the squares of the target lengths, as in this example, or add a small number before taking the square root.
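If you do need the lengths themselves, the stabilized form looks like this (the constant `1e-9` is an arbitrary illustrative choice):

```python
# add a small epsilon before the square root to keep gradients finite at zero length
sq_lengths = torch.sum((pts[crit_edges_x, :] - pts[crit_edges_y, :])**2, axis=1)
lengths = torch.sqrt(sq_lengths + 1e-9)
```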
Remarks:

- We used `simplify` just for convenience. In principle, the matching loss can be defined by the user manually; it is really just two arrays (lists), `indices` and `values`: the simplices and the filtration values we want them to have.
- Another convenience function that `TopologyOptimizer` provides is `match`; it computes the loss that corresponds to a Wasserstein matching to a template diagram.
- If you do not want to use the critical set method, but only backpropagate through the critical simplices, just use `indices` and `values` in place of `crit_indices`, `crit_values` when computing the edge endpoints and the squared lengths. This is what the paper calls the diagram method.
- The real code contains an `if` statement: when there are no critical simplices, the loss should be 0 (say, `eps` was too small). However, `torch` does not handle this corner case the way we need, so we return a differentiable zero tensor manually.
- If you need to work with high-dimensional data, Oineus will only accept a distance matrix to compute the VR filtration and longest edges, not the points themselves. To be able to differentiate the distances, use `torch.cdist` or an analogue to compute your distances in a differentiable way (a sketch follows this list).
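A hedged sketch of the high-dimensional case, assuming that `oineus.diff.vr_filtration` accepts `from_pwdists` and `max_dim` like its non-differentiable counterpart:

```python
# compute pairwise distances differentiably, then build the VR filtration from them
dists = torch.cdist(pts, pts)
fil = oin.diff.vr_filtration(dists, from_pwdists=True, max_dim=2)
```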
Oineus can compute the kernel, image, and cokernel persistence diagrams as in "Persistent Homology for Kernels, Images, and Cokernels" by D. Cohen-Steiner, H. Edelsbrunner, and D. Morozov. We first perform the required reductions using `compute_kernel_image_cokernel_diagrams`, which has the following arguments:

- `K`: the simplicial complex with function values, as a list with one element per simplex in the format `[simplex_id, vertices, value]`, where `vertices` is a list containing the ids of the vertices, and `value` is the value under the function f.
- `L`: the simplicial sub-complex with function values, in the same format, where `value` is the value under the function g.
- `params`: specifies which of the three components (image, kernel, cokernel) you need. Type: `KICRParams`.
- `n_threads`: the number of threads you want to use.

It returns an object which contains the kernel, image, and cokernel diagrams, as well as the reduced matrices.
To obtain the different diagrams, we use `kernel()`, `image()`, `cokernel()`, and then we can use `in_dimension` to get the specific diagram in a specific dimension. Note: aside from the number of threads, all other parameters are already set by default.
Suppose we have a simplicial complex `K` with a sub-complex `L`. We call `compute_kernel_image_cokernel_diagrams`, then access the 3 sets of diagrams using `kernel()`, `image()`, `cokernel()` respectively, after which we can obtain a diagram in a specific dimension with `in_dimension(i)`.
```python
import oineus as oin

n_threads = 4
K = [[0, [0], 10], [1, [1], 50], [2, [2], 10], [3, [3], 10], [4, [0, 1], 50], [5, [1, 2], 50], [6, [0, 3], 10], [7, [2, 3], 10]]
L = [[0, [0], 10], [1, [1], 50], [2, [2], 10], [3, [0, 1], 50], [4, [1, 2], 50]]

# by default, all 3 diagrams are computed
ker_im_cok_dgms = oin.compute_kernel_image_cokernel_diagrams(K, L)
ker_dgms = ker_im_cok_dgms.kernel()
im_dgms = ker_im_cok_dgms.image()
cok_dgms = ker_im_cok_dgms.cokernel()
ker_dgms.in_dimension(0)

# if you only want, e.g., the kernel:
params = oin.KICRParams()
params.image = params.cokernel = False
ker_im_cok_dgms = oin.compute_kernel_image_cokernel_diagrams(K, L, params)
```