Repeated calls to nearest with same y #116
For CPU, we are currently using

Also pinging @mrjel here who promised me some time ago to look into this ;)

I eventually found this project: https://github.com/lxxue/FRNN. The project is in a good state in that I was able to pull it straight off the shelf and use it to accelerate my training. I haven't spoken to the author, but I don't think the project is in a state where it will be well maintained. I don't have enough experience with open-source software or the PyTorch Geometric project to judge, but the obvious question from my perspective is: should torch_cluster have a

Thanks for the pointer. Pinging @mrjel here as well, as he always shared interest in adding this feature to

This issue had no activity for 6 months. It will be closed in 2 weeks unless there is some new activity. Is this issue already resolved?
In my work I find myself making frequent CUDA calls to `torch_cluster.nearest` of the form `nearest(different_every_time, same_every_time)`, without providing a `batch_x` or `batch_y`. `different_every_time` is on the order of, say, `(40000, 3)` and `same_every_time` is `(2000, 3)`. If this could be accelerated by an order of magnitude, that would have significant value to me.

Any suggestions? Does anyone else find themselves in a similar situation? Do they have a solution? Would a solution have significant value to the community?

I assume that the strategy would be to pre-compute some kind of tree data structure, and then provide that: `nearest_with_tree(different_every_time, precomputed_tree_structure)`.

On CPU I guess this would be a kd-tree, and it would be orders and orders of magnitude faster than computing the 40000 × 2000 pairwise distances. On CUDA, I think the current `torch_cluster.nearest` is computing all of those distances, but of course the parallelisation and memory access patterns on CUDA might change the game in terms of gains from a tree structure?
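To make the proposal concrete, here is a minimal NumPy/SciPy sketch of the two-phase CPU path described above. It is not torch_cluster API: `build_tree` and `nearest_with_tree` are hypothetical names taken from this thread, a SciPy `cKDTree` stands in for the precomputed structure, and the array sizes are scaled down from the issue's 40000 × 2000 so the brute-force cross-check stays small.

```python
import numpy as np
from scipy.spatial import cKDTree

def build_tree(y):
    # Phase 1: pay the tree-build cost once for the fixed point set.
    return cKDTree(y)

def nearest_with_tree(x, tree):
    # Phase 2: each query descends the tree instead of scanning all of y.
    _, idx = tree.query(x, k=1)
    return idx

rng = np.random.default_rng(0)
same_every_time = rng.random((200, 3))        # fixed set ((2000, 3) in the issue)
tree = build_tree(same_every_time)            # built once, reused across calls

different_every_time = rng.random((4000, 3))  # fresh queries every call
idx = nearest_with_tree(different_every_time, tree)

# Cross-check against the full pairwise-distance computation that
# torch_cluster.nearest effectively performs today.
d2 = ((different_every_time[:, None, :] - same_every_time[None, :, :]) ** 2).sum(-1)
assert (idx == d2.argmin(axis=1)).all()
```

The point of the split is that the build cost is amortised over every subsequent call with the same `same_every_time`, which is exactly the repeated-`y` pattern this issue is about.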