Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Adding up a secondary cost during sssp calculation #1373

Closed
iceboy910447 opened this issue Feb 2, 2021 · 9 comments · Fixed by #1376
Closed

[ENH] Adding up a secondary cost during sssp calculation #1373

iceboy910447 opened this issue Feb 2, 2021 · 9 comments · Fixed by #1376
Assignees
Labels
improvement Improvement / enhancement to an existing function
Milestone

Comments

@iceboy910447
Copy link

iceboy910447 commented Feb 2, 2021

Describe the solution you'd like
Adding up a secondary weight, that doesn't need to be optimal but itself would be an information of interest during evaluation of the shortest path

Describe alternatives you've considered
The predecessor column of the result of sssp allows to trace back the shortest path to the source. If a secondary edgeweight is provided, it is possible to use a numba function to add up that secondary edgeweight for all paths while tracing back the path to the source. No matter the optimization this 'trace back methode' will allways be less efficent in comparison to a methode, that adds up the secondary cost during sssp

Additional context
If a graph of a streetnetwork is used, a sssp call can provide a shortest path calculation for weight = distance or weigth = travel time. If you would like to compare these, one would like to know the travel time for the shortest path, and the driving distance for the fastest path. Since tracing back each route and adding up a secondary weight of each edge, it would be more efficient to add up the secondary cost during the shortest path calculation.

A feature like that could also be useful to determine the number of hops of the fastest path in a computer network or to determine travel fees of a shortest route, if that route includes toll roads.

@iceboy910447 iceboy910447 added the ? - Needs Triage Need team to review and classify label Feb 2, 2021
@hlinsen
Copy link
Contributor

hlinsen commented Feb 2, 2021

Hey @iceboy910447, this could easily be added indeed but I don't see any performance issues. It looks to me that a cumsum from cudf would suffice. This will also scale better if there are multiple weight columns you want to investigate.

@iceboy910447
Copy link
Author

Hey @hlinsen, i tried it with a my own numba function. For smaller Graphs, the performance was acceptable, but for graphs with more than 100.000 nodes and 200.000 edges, the sssp() was several times faster. Deleting that numba function lowered the execution time from several hours per graph to less than 20 minutes. Therefor i think its a performance issue. Since the algorithm would need to trace back each unique path, find the corresponding secondary weight for each edge in that path and than add up all these weights, it seems to me like that reverse approach should be have a higher runtime complexity than a approach that sums up that weight during the single source shortest path calculation.

I will try to provide a small jupyter notebook in which i can show the time difference of the execution.

@hlinsen
Copy link
Contributor

hlinsen commented Feb 3, 2021

Have you considerd using cudf for this? It seems what you're trying to do is a cudf.merge to match (src, dst) to (vertex, predecessor), get the secondary weights and run cudf.cumsum.

@iceboy910447
Copy link
Author

@hlinsen you are correct that cudf.merge could be used to map the (prevertex,vertex) Pair to (src,dst) of each edge and thereby adding the info about the secondary weight for the last edge of the path to a given vertex. I think were you might be wrong is who the process of adding up those secondary weights should work. To my understanding cumsum will take a cudf.series and create a new series where each value v[i] is the sum of the values of the input series e[0] + e[1]+e[2]+ ... + e[i]. If a graph is linear, cudf.cumsum would return a correct result, but since a graphic representation of all shortest path between a given source and all other nodes as destinations would most likely be closer to a tree with source as the root of that tree, you would have to check first if a previous element in that merged dataframe actually is the predecessor of the current element. Since two different vertex ids or destinations can have the same predecessor it will be impossible to sort that merged dataframe in a way so that an entry is allways the predecessor of the next entry. Therefore to my understanding cumsum would not return a correct result

@hlinsen
Copy link
Contributor

hlinsen commented Feb 3, 2021

Indeed it won't work as is you would have to create the edge path first before using cumsum. I was wondering if it could be faster than the numba function you're using now. The thing is that a cudf solution is convenient if there are multiple edge weight columns you want to sum.

@BradReesWork BradReesWork added improvement Improvement / enhancement to an existing function and removed ? - Needs Triage Need team to review and classify labels Feb 3, 2021
@BradReesWork BradReesWork added this to the 0.19 milestone Feb 3, 2021
@github-actions
Copy link

github-actions bot commented Mar 5, 2021

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@afender
Copy link
Member

afender commented Apr 7, 2021

@iceboy910447 this feature was implemented by @hlinsen in #1454 and merged today in RAPIDS 0.19 release. Please let us know if you have any further feedback. Thanks again!

@iceboy910447
Copy link
Author

@afender @hlinsen Thank you very much for implementing this feature. This has helped me a great deal.

I have used this to replace my own numba-based implementation which was slower in comparison to your implementation. I'm currently using it on a follow-up project for my Kaggle Notebook on Route Calculation using GPU and public data

During this i couldn't help but notice, that for larger graphs the computation time for sssp in comparison to get_traversed_cost is much smaller, similar to what i mentioned in my comment [from 3 Feb 2021](#1373 (comment)). For this reason i was wondering if it would be possible to integrate the secondary cost calculation within the sssp call.

After trying to understand your implementation, my informed guess is that your current calculation reverse-engineers the path structure from the sssp result, and then calculates the secondary cost based on that path structure. For larger graphs with more than 30.000 nodes and 100.000 edges this reverse-engineering of the path structure seems to be a lot slower than generating the path during sssp, which is why I still hope that you might be able to integrate this functionality into sssp directly. So far i have tried to understand the algorithm and implementation in use, but my lack of experience with cuda has stopped my affords at this point.

I understand that the current implementation with cuda is quite complex and not as flexible as it would need to be to implement for an easy solution but maybe if you can help me understand how the sssp implementation works to calculate the first cost, maybe i can come up with a solution, that would help to solve this.

If it could be integrated this could easily reduce my computational cost by more than an order of magnitude, since I'm currently trying to implement a calculation for the trip duration of the shortest path in the network, based on the statistical number of rides for the time of day. So implementing this could not only help me to reduce the computation time for secondary cost calculation, but also reduce the number of calls and kernel generation necessary.

I hope you might be able to help or lead me in the right direction to solve this on my own.

Best Regards
iceboy910447

@ChuckHastings
Copy link
Collaborator

@iceboy910447 - could you please open a new issue to request this change? We generally don't track comments on closed issues like this, it would make it much easier - given our development process - to properly comment and address this request.

We have added some new features since this original issue... including updating the backend for SSSP. If you could note in the new issue where in your notebook the code is that you have that is working around this issue, or otherwise provide some sample code that is slower than you would like it would probably be easier to address your issue. I think I get what you're saying, but a more concise example would be helpful.

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants