-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extend Benchmarks #1
Comments
In terms of benchmarks, i don't know what i should think about random-instances. For many problems this is completely irrelevant (and might bumb low-level performance in contrast to algorithmics). I would focus on finding real-world problems to test against! Additionally: try to beat
I would not be surprised to see that those are faster (despite theoretically slower algorithms). |
Yes you might be right that random instances are rather made up, but at least the dense case is very relevant to me. In motmetrics we are globally matching detections made by a predictor to ground truth detections. Usually humans are visible in 80-90 percent of the frames, so one could nearly match all predictions made be the tracker in test. This leads to a nearly dense matrix (albeit with some cost structure...). I'll persist such a matrix and let you know. Thanks for the links, will give them a try! |
Random cost matrices are definitely not representative for many applications. But I wanted to toss in another toy problem that generates a more difficult cost matrix: https://gist.github.com/kylemcdonald/ac9e2e84552e60543071c33a250111c9 import numpy as np
import lap
from scipy.spatial.distance import cdist
import matplotlib.pyplot as plt
side = 32
n = side * side
lhs = np.random.uniform(low=-1, high=+1, size=(n, 2))
lhs = np.cumsum(lhs, axis=0)
lhs -= lhs.min(axis=0)
lhs /= lhs.max(axis=0)
xv, yv = np.meshgrid(np.linspace(0, 1, side), np.linspace(0, 1, side))
rhs = np.dstack((xv, yv)).reshape(-1, 2)
cost = cdist(rhs, lhs, 'sqeuclidean')
%time min_cost, row_assigns, col_assigns = lap.lapjv(cost)
rhs_lhs = rhs[col_assigns]
plt.figure(figsize=(8, 8), facecolor='white')
plt.scatter(lhs[:,0], lhs[:,1])
for start, end in zip(lhs, rhs_lhs):
dx, dy = end - start
plt.arrow(start[0], start[1], dx, dy,
head_length=0.01, head_width=0.01, lw=0.5, alpha=0.5)
plt.xlim(-0.1, 1.1)
plt.ylim(-0.1, 1.1)
plt.show() I use this technique a lot for making grids from point clouds. This kind of euclidean bipartite matching is similar to motmetrics in theory, but in practice I would expect there are way more long-distance matches in this toy problem. In my experience, random cost matrices can be better served by sparse solvers when n is very large (e.g. the 5000x5000 case). |
Thanks for the suggestion @kylemcdonald. This looks like a promising benchmark to add. Hopefully I find some time in the upcoming holidays to add. To understand this fully let me summarize what you are doing: you generate two sets of points. One is regularily spaced the other is more or less randomly positioned. The cost is the squared euclidean distances between instances of points of the two sets. The |
@cheind that's exactly it. I use it for this sort of thing. A friend has a heuristic for this that provides poor solutions for non-uniform data. For comparison, on my machine, solving a 10,000x10,000 LAP (i.e., matching a cloud of 10,000 points to a grid of 100x100 points):
Solving a 4096x4096 LAP (matching a cloud of 4096 points to a grid of 64x64 points):
If you match two point sets that are more uniform you get closer to a random cost matrix: lhs = np.random.uniform(low=-1, high=+1, size=(n, 2))
rhs = np.random.uniform(low=-1, high=+1, size=(n, 2)) For 4096 points I see:
I should also note, I can only get this performance out of pylapsolver and lapjv if I convert the cost to from lap import lapjv
from lapsolver import solve_dense
from scipy.spatial.distance import cdist
dist = cdist(lhs, rhs, 'euclidean')
dtype = np.uint32
dtype_max = np.iinfo(dtype).max
dist_max = dist.max()
dist = (dist * dtype_max) / dist_max
dist = dist.astype(dtype)
%time cost_jv, rows_jv, cols_jv = lapjv(dist)
%time cols_ls, rows_ls = solve_dense(dist)
cost_jv = (cost_jv * dist_max) / dtype_max |
@kylemcdonald thanks for the feedback. I'm a bit disappointed by the performance of lapsolver in this respect. Did you run lapsolver also with int32 version, I does have some type overloading internally, but I suspect that it helps anyways. with or-tools I always had issues as creating the arcs took overwhelming, especially when having dense problems. I'm suprised by its performance, I have to admit. I wanted to convert the backend of lapsolver to use Eigen C++ template library for SIMD support and co, but I did not yet find the time to do so. |
I just changed to uint32 for pylapsolver and I saw a huge speed increase for the random cost matrix. For the cloud-to-grid problem I saw it drop from 40s to 33s. I am familiar with the or-tools issue, seeing your comment on that repo is how I found your work here :) I've heard that Cython can generate code that will be optimized to SIMD instructions by the compiler, but haven't confirmed this myself yet. If it's true, one other approach could be to try rewriting lapsolver in Cython. If you're interested in lapcsa here is my initial work https://gist.github.com/kylemcdonald/b92377c98e427eabf3726a1e121b05d8 The original code is designed to only use |
@kylemcdonald hope you are doing well! I recently stumbled upon IsoMatch, an algorithm to arrange a collection of objects to a target layout while trying to preserve distances between objects in the collection. The algorithm reminded me very much of what suggested. If I am not mistaken, step 3 in their paper matches the grid approach you mentioned above. Only now (after skimming the paper) I see why computing such an assigment is useful :) |
More libraries to test against: pymatgen, centrosome
See gatagat/lap#9
The text was updated successfully, but these errors were encountered: