A proposal of an all_simple_paths function implementation #1540
base: master
Add a function that finds all simple paths between two nodes in a graph.
Codecov Report
@@           Coverage Diff           @@
##           master   #1540   +/-   ##
=======================================
  Coverage   99.44%   99.44%
=======================================
  Files         106     107     +1
  Lines        5551    5604    +53
=======================================
+ Hits         5520    5573    +53
  Misses         31      31
I'd be interested in this feature. What is the state here? Are there plans to merge it? Or should this go into a standalone package?
@lassepe Thank you for your comment. I'm hoping that this PR will be merged. @sbromberger If anything is missing or is blocking you or other members from starting a review of this PR, please feel free to point it out.
src/traversals/allsimplepaths.jl
Outdated
g::AbstractGraph
source::T # Starting node
targets::Set{T} # Target nodes
cutoff::Union{Int,Nothing} # Max length of resulting paths
Instead of a union, keep this as `T` with the default as `typemax(T)`.
I see, I hadn't thought of that fix.
src/traversals/allsimplepaths.jl
Outdated
[1, 2, 3, 4]
```
"""
function all_simple_paths(g::AbstractGraph, source::T, targets::Vector{T}; cutoff::Union{Int,Nothing}=nothing) where T <: Integer
`targets` is a `Vector` here, but in the struct, it's a `Set`. Perhaps we should not type it here, and construct a `Set` out of it in these functions instead of in the inner constructor of the struct.
I moved the type conversion from the struct to the `all_simple_paths` function. If I misunderstood your point, let me know.
src/traversals/allsimplepaths.jl
Outdated
SimplePathIterator's state.
"""
mutable struct SimplePathIteratorState{T <: Integer}
stack::Stack{Vector{T}} # Store child nodes
This makes the struct `O(|V|^2)` in memory, right?
Yes, if the graph is dense and the path is long, the stack can grow to `O(|V|^2)`.
src/traversals/allsimplepaths.jl
Outdated
A helper function that updates iterator state.
For internal use only.
"""
function stepback!(state::SimplePathIteratorState)
if it's for internal use, perhaps prefix it with an underscore.
I renamed the function to `_stepback!`.
In general, I think we need to scrutinize the addition of these functions in LightGraphs base. The function will fail on large graphs of the type that LightGraphs is generally used to analyze and therefore functions like these have limited utility.
Our precedent has been to encourage the development of "expensive" algorithms outside the core LightGraphs code.
I'd like to get @jpfairbanks's and @simonschoelly 's opinions on this, though.
I think that as long as we are iterating over the paths and not realizing them all in memory, it is a fine addition. I draw that distinction because users can CTRL-C interrupt things that take too long, but things that use too much space hit swap and make the system unresponsive. The docstring can note that the number of paths vastly exceeds the number of edges and that users should be careful when using the function on even medium-sized graphs.
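To give a sense of scale for such a docstring warning, here is a small Python calculation (an illustration, not part of the PR). It assumes the worst case of a complete undirected graph K_n, where a simple path between two fixed nodes is any ordered selection of distinct intermediate nodes. The function names are made up for this example.

```python
from math import factorial


def edge_count_complete(n: int) -> int:
    """Number of edges in the complete undirected graph K_n."""
    return n * (n - 1) // 2


def simple_path_count_complete(n: int) -> int:
    """Number of simple paths between two fixed, distinct nodes of K_n."""
    if n < 2:
        raise ValueError("need at least two nodes")
    m = n - 2  # nodes that may appear as intermediates
    # A path with k intermediates is an ordered choice of k of the m nodes,
    # which gives m! / (m - k)! possibilities.
    return sum(factorial(m) // factorial(m - k) for k in range(m + 1))


if __name__ == "__main__":
    for n in (4, 10, 15):
        print(n, edge_count_complete(n), simple_path_count_complete(n))
```

For example, K_10 has 45 edges but 109,601 simple paths between any pair of nodes, so lazy iteration is the only practical way to consume the result.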
I think this is a good compromise, but I note that this implementation appears to be |V|^2 + c|V| in memory. That seems to be a bit too much. On the other hand, we do have some functions that are |V|^2 (I avoid using these since they break on large graphs). Do you have a specific recommendation with respect to including this code in base?
I think including it with that complexity warning in the docstring is the right move. Since the algorithm can change without breaking the interface (when you iterate the iterator you get all the paths) then I think the memory use can get fixed if people use this and want to scale it up further. The set of paths is really big so I imagine that scaling won't be a priority unless someone's research needs it. I am in favor of inclusion with appropriate warnings.
@sbromberger @jpfairbanks Many thanks for your comments and discussion! I have fixed the code.
It's just an idea at the moment, but I think that replacing the vectors in the stack with iterator state or something similar might reduce the memory footprint. I will look into this further.
To improve memory efficiency, make the stack store only the parent node and an index.
@sbromberger I rewrote the stack usage to improve memory efficiency.
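The idea of keeping only a parent node and an index on the stack can be sketched in Python as follows. This is an illustration under assumptions, not the PR's Julia code: the graph is a plain adjacency dict, there is a single target, and `simple_paths_indexed` is a hypothetical name. Each stack frame is a `[node, next_index]` pair, so the stack holds one small frame per node on the current path instead of a copied vector of children per level.

```python
from typing import Dict, Iterator, List


def simple_paths_indexed(
    adj: Dict[int, List[int]], source: int, target: int
) -> Iterator[List[int]]:
    """Lazily yield simple paths from source to target (illustrative sketch)."""
    if source == target or source not in adj:
        return
    path = [source]          # nodes on the current path, in order
    on_path = {source}       # same nodes, for O(1) membership tests
    stack = [[source, 0]]    # frames: [node, index of next neighbor to try]
    while stack:
        frame = stack[-1]
        node, i = frame
        neighbors = adj.get(node, [])
        if i >= len(neighbors):
            # All neighbors of this node were tried: step back one level.
            stack.pop()
            on_path.discard(path.pop())
            continue
        frame[1] += 1        # advance the index stored in the frame
        child = neighbors[i]
        if child in on_path:
            continue         # would repeat a node, so not a simple path
        if child == target:
            yield path + [target]
        else:
            path.append(child)
            on_path.add(child)
            stack.append([child, 0])
```

With this layout the extra state is proportional to the length of the current path, which is at most |V|, rather than to the sum of the neighbor lists along it.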
@sbromberger Any news about merging this functionality into LightGraphs? It could still be very useful for small graphs. For example, I recently started to port the Dagitty R package for causal graph analysis to Julia, and it depends heavily on enumerating all paths. It would be great to have this logic in LightGraphs instead of reinventing the wheel in other packages.
This PR proposes an implementation of a function that finds all simple paths.
See also the related issue #1521.
I'm not familiar with the coding conventions of this project, so any feedback or suggestions are welcome.
I would appreciate it if you could take a look.
Overview
In this implementation, I followed the same strategy as NetworkX (Python's network analysis package).
Here, I provide the `all_simple_paths()` function, which is intended to be the public interface. This function returns an iterator that generates simple paths (that is, paths with no repeated nodes).
By iterator, I mean an object that complies with the iteration protocol described in the Julia manual: https://docs.julialang.org/en/v1/manual/interfaces/ .
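As a rough illustration of this strategy, here is a simplified Python generator in the spirit of the NetworkX approach. It is neither NetworkX's actual code nor the Julia code in this PR. It assumes a plain adjacency dict, accepts several targets, and treats `cutoff` as the maximum number of edges in a path; when `cutoff` is omitted it falls back to |V| - 1, the longest a simple path can be.

```python
from typing import Dict, Iterable, Iterator, List, Optional


def all_simple_paths(
    adj: Dict[int, List[int]],
    source: int,
    targets: Iterable[int],
    cutoff: Optional[int] = None,
) -> Iterator[List[int]]:
    """Lazily yield simple paths from source to any target (sketch)."""
    target_set = set(targets)          # normalize once, at the entry point
    if cutoff is None:
        cutoff = len(adj) - 1          # a simple path has at most |V| - 1 edges
    if source not in adj or source in target_set or cutoff < 1:
        return
    path = [source]                    # current path
    stack = [iter(adj[source])]        # one lazy child iterator per path node
    while stack:
        child = next(stack[-1], None)
        if child is None:
            # Children exhausted: backtrack one level.
            stack.pop()
            path.pop()
            continue
        if child in path:
            continue                   # skip nodes already on the path
        if child in target_set:
            yield path + [child]
        # Descend only if a longer path would still respect the cutoff.
        if len(path) < cutoff:
            path.append(child)
            stack.append(iter(adj.get(child, ())))


if __name__ == "__main__":
    graph = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
    for p in all_simple_paths(graph, 1, [4]):
        print(p)
```

Because the function is a generator, a caller can stop after the first few paths without enumerating the rest. Unlike NetworkX, this sketch does not prune the search once every target is already on the current path, and the `child in path` check is a linear scan, so it favors brevity over speed.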
In this PR I provide the following iteration methods:
Required methods
Some important optional methods
Note that the main path-search logic lives in the `iterate()` function. `all_simple_paths()` is intended to be the only entry point; the other structs and functions are meant to be used implicitly or internally.
Example
This is a simple example.
Test cases
I took the related test cases from NetworkX and rewrote them to fit LightGraphs usage.
Performance
I compared this implementation with the original NetworkX implementation called through the `PyCall` package. This implementation was approximately 10x faster.
That is encouraging, but when I ran the same NetworkX method directly in a Python environment, it was 3-4x faster than this implementation.
I have little experience with Julia performance tuning and do not know where the difference comes from.
So, I would be happy to hear any advice or ideas for improving performance.
The following is the actual code used for the performance checks.
I got the following result.
The following is Python code that can be run in IPython.
The result is as follows.