[ENH] Adding up a secondary cost during sssp calculation #1373

iceboy910447 · 2021-02-02T01:06:44Z

Describe the solution you'd like
Adding up a secondary weight, that doesn't need to be optimal but itself would be an information of interest during evaluation of the shortest path

Describe alternatives you've considered
The predecessor column of the result of sssp allows to trace back the shortest path to the source. If a secondary edgeweight is provided, it is possible to use a numba function to add up that secondary edgeweight for all paths while tracing back the path to the source. No matter the optimization this 'trace back methode' will allways be less efficent in comparison to a methode, that adds up the secondary cost during sssp

Additional context
If a graph of a streetnetwork is used, a sssp call can provide a shortest path calculation for weight = distance or weigth = travel time. If you would like to compare these, one would like to know the travel time for the shortest path, and the driving distance for the fastest path. Since tracing back each route and adding up a secondary weight of each edge, it would be more efficient to add up the secondary cost during the shortest path calculation.

A feature like that could also be useful to determine the number of hops of the fastest path in a computer network or to determine travel fees of a shortest route, if that route includes toll roads.

The text was updated successfully, but these errors were encountered:

hlinsen · 2021-02-02T22:49:43Z

Hey @iceboy910447, this could easily be added indeed but I don't see any performance issues. It looks to me that a cumsum from cudf would suffice. This will also scale better if there are multiple weight columns you want to investigate.

iceboy910447 · 2021-02-03T00:12:04Z

Hey @hlinsen, i tried it with a my own numba function. For smaller Graphs, the performance was acceptable, but for graphs with more than 100.000 nodes and 200.000 edges, the sssp() was several times faster. Deleting that numba function lowered the execution time from several hours per graph to less than 20 minutes. Therefor i think its a performance issue. Since the algorithm would need to trace back each unique path, find the corresponding secondary weight for each edge in that path and than add up all these weights, it seems to me like that reverse approach should be have a higher runtime complexity than a approach that sums up that weight during the single source shortest path calculation.

I will try to provide a small jupyter notebook in which i can show the time difference of the execution.

hlinsen · 2021-02-03T01:30:17Z

Have you considerd using cudf for this? It seems what you're trying to do is a cudf.merge to match (src, dst) to (vertex, predecessor), get the secondary weights and run cudf.cumsum.

iceboy910447 · 2021-02-03T02:04:38Z

@hlinsen you are correct that cudf.merge could be used to map the (prevertex,vertex) Pair to (src,dst) of each edge and thereby adding the info about the secondary weight for the last edge of the path to a given vertex. I think were you might be wrong is who the process of adding up those secondary weights should work. To my understanding cumsum will take a cudf.series and create a new series where each value v[i] is the sum of the values of the input series e[0] + e[1]+e[2]+ ... + e[i]. If a graph is linear, cudf.cumsum would return a correct result, but since a graphic representation of all shortest path between a given source and all other nodes as destinations would most likely be closer to a tree with source as the root of that tree, you would have to check first if a previous element in that merged dataframe actually is the predecessor of the current element. Since two different vertex ids or destinations can have the same predecessor it will be impossible to sort that merged dataframe in a way so that an entry is allways the predecessor of the next entry. Therefore to my understanding cumsum would not return a correct result

hlinsen · 2021-02-03T02:17:45Z

Indeed it won't work as is you would have to create the edge path first before using cumsum. I was wondering if it could be faster than the numba function you're using now. The thing is that a cudf solution is convenient if there are multiple edge weight columns you want to sum.

github-actions · 2021-03-05T23:21:59Z

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

…output (#1376) Solves: #1373 Authors: - Hugo Linsenmaier (https://github.com/hlinsen) Approvers: - Chuck Hastings (https://github.com/ChuckHastings) - Andrei Schaffer (https://github.com/aschaffer) - Alex Fender (https://github.com/afender) URL: #1376

afender · 2021-04-07T15:39:07Z

@iceboy910447 this feature was implemented by @hlinsen in #1454 and merged today in RAPIDS 0.19 release. Please let us know if you have any further feedback. Thanks again!

iceboy910447 · 2022-06-25T18:44:22Z

@afender @hlinsen Thank you very much for implementing this feature. This has helped me a great deal.

I have used this to replace my own numba-based implementation which was slower in comparison to your implementation. I'm currently using it on a follow-up project for my Kaggle Notebook on Route Calculation using GPU and public data

During this i couldn't help but notice, that for larger graphs the computation time for sssp in comparison to get_traversed_cost is much smaller, similar to what i mentioned in my comment [from 3 Feb 2021](#1373 (comment)). For this reason i was wondering if it would be possible to integrate the secondary cost calculation within the sssp call.

After trying to understand your implementation, my informed guess is that your current calculation reverse-engineers the path structure from the sssp result, and then calculates the secondary cost based on that path structure. For larger graphs with more than 30.000 nodes and 100.000 edges this reverse-engineering of the path structure seems to be a lot slower than generating the path during sssp, which is why I still hope that you might be able to integrate this functionality into sssp directly. So far i have tried to understand the algorithm and implementation in use, but my lack of experience with cuda has stopped my affords at this point.

I understand that the current implementation with cuda is quite complex and not as flexible as it would need to be to implement for an easy solution but maybe if you can help me understand how the sssp implementation works to calculate the first cost, maybe i can come up with a solution, that would help to solve this.

If it could be integrated this could easily reduce my computational cost by more than an order of magnitude, since I'm currently trying to implement a calculation for the trip duration of the shortest path in the network, based on the statistical number of rides for the time of day. So implementing this could not only help me to reduce the computation time for secondary cost calculation, but also reduce the number of calls and kernel generation necessary.

I hope you might be able to help or lead me in the right direction to solve this on my own.

Best Regards
iceboy910447

ChuckHastings · 2022-06-28T22:23:08Z

@iceboy910447 - could you please open a new issue to request this change? We generally don't track comments on closed issues like this, it would make it much easier - given our development process - to properly comment and address this request.

We have added some new features since this original issue... including updating the backend for SSSP. If you could note in the new issue where in your notebook the code is that you have that is working around this issue, or otherwise provide some sample code that is slower than you would like it would probably be easier to address your issue. I think I get what you're saying, but a more concise example would be helpful.

Thanks!

iceboy910447 added the ? - Needs Triage Need team to review and classify label Feb 2, 2021

hlinsen mentioned this issue Feb 3, 2021

Add utility function for computing a secondary cost for BFS and SSSP output #1376

Merged

BradReesWork assigned hlinsen Feb 3, 2021

BradReesWork added improvement Improvement / enhancement to an existing function and removed ? - Needs Triage Need team to review and classify labels Feb 3, 2021

BradReesWork added this to the 0.19 milestone Feb 3, 2021

BradReesWork linked a pull request Feb 12, 2021 that will close this issue

Add utility function for computing a secondary cost for BFS and SSSP output #1376

Merged

github-actions bot added the inactive-30d label Mar 5, 2021

BradReesWork mentioned this issue Mar 17, 2021

[REVIEW] update graph to accept multiple weights #1454

Closed

BradReesWork removed the inactive-30d label Mar 29, 2021

rapids-bot bot closed this as completed in #1376 Apr 7, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ENH] Adding up a secondary cost during sssp calculation #1373

[ENH] Adding up a secondary cost during sssp calculation #1373

iceboy910447 commented Feb 2, 2021 •

edited

Loading

hlinsen commented Feb 2, 2021 •

edited

Loading

iceboy910447 commented Feb 3, 2021

hlinsen commented Feb 3, 2021

iceboy910447 commented Feb 3, 2021

hlinsen commented Feb 3, 2021

github-actions bot commented Mar 5, 2021

afender commented Apr 7, 2021

iceboy910447 commented Jun 25, 2022

ChuckHastings commented Jun 28, 2022

[ENH] Adding up a secondary cost during sssp calculation #1373

[ENH] Adding up a secondary cost during sssp calculation #1373

Comments

iceboy910447 commented Feb 2, 2021 • edited Loading

hlinsen commented Feb 2, 2021 • edited Loading

iceboy910447 commented Feb 3, 2021

hlinsen commented Feb 3, 2021

iceboy910447 commented Feb 3, 2021

hlinsen commented Feb 3, 2021

github-actions bot commented Mar 5, 2021

afender commented Apr 7, 2021

iceboy910447 commented Jun 25, 2022

ChuckHastings commented Jun 28, 2022

iceboy910447 commented Feb 2, 2021 •

edited

Loading

hlinsen commented Feb 2, 2021 •

edited

Loading