lex_BFS method of Graph is not linear as written in the documentation #37642

cyrilbouvier · 2024-03-21T13:43:00Z

Steps To Reproduce

for i in range(100, 1000, 25):
     G = graphs.CompleteGraph(i) # any infinite family of graphs of increasing size should work
     s = timeit("G.lex_BFS()", seconds=True)
     m = G.size()
     n = G.order()
     print (i, s, s/(m+n))

Expected Behavior

A linear time algorithm as written in the documentation: https://doc.sagemath.org/html/en/reference/graphs/sage/graphs/generic_graph.html#sage.graphs.generic_graph.GenericGraph.lex_BFS

The time complexity of this algorithm is in O(n+m)

Actual Behavior

From what I understand (see below) the complexity is at least O(n+m*log(m))

Additional Information

From what I was able to understand, the problem comes from the lex_BFS function. In this function the input graph is converted to a short_digraph by calling init_short_digraph (see here). My claim is that the function init_short_digraph is not linear.
Note that the function lex_BFS_fast_short_digraph doing the actual computation is linear, the problem comes from before, when the graph is converted to the data structure expected by this function.

Looking at the code of init_short_digraph, I found two places where the list of edges is sorted, which imply a complexity of at least m*log(m) [ I only considered the case where the function is called with edge_labelled=False and vertex_list is provided, as in lex_BFS, otherwise more sorting can probably be found ]:

In line 275, G.edge_iterator(labels=False) is called with the default value for sort_vertices, which is True; this one is easy to fix, using G.edge_iterator(labels=False,sort_vertices=False) should be enough.
17 lines below the edges of each vertices are once again sorted; this one is more tricky to remove, because the functions has_edge and triangle_count assumed that the sort is done during initialization.
In the case of lex_BFS, these two functions are not used, so the sort could be remove.

But even with these two changes:

diff --git a/src/sage/graphs/base/static_sparse_graph.pyx b/src/sage/graphs/base/static_sparse_graph.pyx
index 35f3de67c9..36ed7d6a01 100644
--- a/src/sage/graphs/base/static_sparse_graph.pyx
+++ b/src/sage/graphs/base/static_sparse_graph.pyx
@@ -272,7 +272,7 @@ cdef int init_short_digraph(short_digraph g, G, edge_labelled=False, vertex_list
             g.neighbors[i] = g.neighbors[i - 1] + <int> len(G.edges_incident(vertices[i - 1]))
 
     if not edge_labelled:
-        for u, v in G.edge_iterator(labels=False):
+        for u, v in G.edge_iterator(labels=False,sort_vertices=False):
             i = v_to_id[u]
             j = v_to_id[v]
 
@@ -290,8 +290,8 @@ cdef int init_short_digraph(short_digraph g, G, edge_labelled=False, vertex_list
         g.neighbors[0] = g.edges
 
         # Sorting the neighbors
-        for i in range(g.n):
-            qsort(g.neighbors[i], g.neighbors[i + 1] - g.neighbors[i], sizeof(int), compare_uint32_p)
+        #for i in range(g.n):
+        #    qsort(g.neighbors[i], g.neighbors[i + 1] - g.neighbors[i], sizeof(int), compare_uint32_p)
 
     else:
         from operator import itemgetter

the function does not seem to be linear. I am not able to track where the non linear part is coming from (I suspect from the edge_iterator but I was not able to track it down).

An easy fix would be to add a sort parameter to init_short_digraph to disable all sorting inside this function and use it in lex_BFS. Does this sound like a good idea ? If yes, I could propose a PR.

Environment

- **OS**: Debian
- **Sage Version**: 10.3

Checklist

I have searched the existing issues for a bug report that matches the one I want to file, without success.
I have read the documentation and troubleshoot guide

The text was updated successfully, but these errors were encountered:

dcoudert · 2024-03-24T09:11:18Z

You are right, the time complexity of method init_short_digraph is in O(n * log(m) + m) as for each vertex the method sorts the neighbors. Unfortunately, it is not easy to change that. Indeed, some methods assume that the neighbors are sorted. This is for instance the case of method has_edge that performs a binary search in the list of neighbors.
If we stop sorting neighbors, the cost of method has_edge might increase from O(log(m)) to O(m).

I agree with the idea of adding a parameter sort_neighbors to init_short_digraph, with default value True. It must be properly documented and in particular its impact on the complexity of some methods. This parameter must be stored in the data structure in order to use the right method in has_edge (binary search or full scan of the list of neighbors).

Then we will have to check methods using short_digraph. When the order of the neighbors is not taken into account by the algorithm (e.g., no call to has_edge), then we can switch the new parameter to False.

dcoudert · 2024-03-24T13:35:32Z

I did proposed changes in #37662.

cyrilbouvier · 2024-04-02T12:21:21Z

Note that the same problem is also present for

lex_UP method (see source)
lex_DFS method (see source)
lex_DOWN (see source)
diameter method with algorithm='2sweep' (see source)

…t_digraph` Fixes sagemath#37642. We add parameter `sort_edges` to method `init_short_digraph` of the `short_digraph` data structure defined in `src/sage/graphs/base/static_sparse_graph.pxd|pyx`. - When set to `True` (default), for each vertex, the list of neighbors is sorted by increasing vertex labels. This enables to search for a vertex in this list using binary search, and so in time `O(log(n))`, but the overall running time of method `init_short_digraph` is in `O(m + n*log(m))`. - When set to `False`, neighbors are not sorted. The running time of method `init_short_digraph` is reduced to `O(n + m)`, but the running time for searching in the list of neighbors increases to `O(m)`. So this new parameter is particularly useful for linear time algorithms such as `lex_BFS`. ### 📝 Checklist  - [x] The title is concise and informative. - [x] The description explains in detail what this PR is about. - [x] I have linked a relevant issue or discussion. - [ ] I have created tests covering the changes. - [x] I have updated the documentation accordingly. ### ⌛ Dependencies    URL: sagemath#37662 Reported by: David Coudert Reviewer(s):

…neighbors in linear time       As noted in issue sagemath#37642, enumerating the neighbors of a vertex for a graph that use the sparse backend (the default in Sage) is **not** linear on the number of neighbors (there is an extra log factor). The goal of this PR is to fix this. The problem is that to enumerate the neighbors of a vertex, the current code calls iteratively _next_neighbor_unsafe (or one of its out or in variant). But this method return an int corresponding to a neighbor, so to go to the next neighbor, it was first necessary to retrieve this vertex* in the structure storing all the neighbors. As it is stored in a sorted binary tree, the cost of retrieving is O(log(#neighbors)). *(or a close one if it was deleted in the meantime) To obtain a linear complexity, I wrote a new method that do a simple tree traversal which is linear in the number of nodes in the tree (i.e., the number of neighbors). As this new method does not have a similar interface as the previous _next_neighbor_unsafe, some more rewriting was necessary to use it in other parts of the code. 1. I wrote new methods for the SparseGraph class: out_neighbors_unsafe and in_neighbors_unsafe. They overwrite the ones from the base class and rely on the new methods _neighbors_unsafe and _neighbors_BTNode_unsafe. The _neighbors_BTNode_unsafe is a low-level method enumerating the neighbors in linear time. 2. I rewrote the two methods out_neighbors_BTNode_unsafe and in_neighbors_BTNode_unsafe to use the new _neighbors_BTNode_unsafe. 3. I wrote a new method _iterator_edges method for SparseGraphBackend to overwrite the one from the base class CGraphBackend, in order to expose the low-level code to the Graph class. Question: During the writing of this PR, I notice that the methods out_neighbors_BTNode_unsafe and in_neighbors_BTNode_unsafe are only defined for the sparse backend and are not used anywhere in the code. Can I remove them ? Or should they be kept for compatibility ? ### 📝 Checklist  - [x] The title is concise and informative. - [x] The description explains in detail what this PR is about. - [x] I have linked a relevant issue or discussion. - [] I have created tests covering the changes. - [] I have updated the documentation and checked the documentation preview. ### ⌛ Dependencies    URL: sagemath#38167 Reported by: cyrilbouvier Reviewer(s): cyrilbouvier, David Coudert

As noted in sagemath#37642, there is a difference in the complexity of enumerating neighbors or edges between sparse graphs and dense graphs. This PR aims at making the comments about the time complexity of algorithms clearer in regard of this difference. Most of the changes I made in this PR are due to the fact that the function `init_short_digraph` have the following time complexity: - `O(n+m)` when the input graph has a sparse representation and sort_neighbors is False - `O(n+m log(m))` when the input graph has a sparse representation and sort_neighbors is True - `O(n^2)` when the input graph has a dense representation and sort_neighbors is False - `O(n^2 log(m))` when the input graph has a dense representation and sort_neighbors is True with `m` the number of edges and `n` the number of vertices of the graph. I spotted 5 additional complexity claims in the comments that I was not able to verify: 1. for strong_orientations_iterator in orientations.py 2. for yen_k_shortest_simple_paths in path_enumeration.pyx 3. for feng_k_shortest_simple_paths in path_enumeration.pyx 4. for edge_disjoint_spanning_trees in spanning_tree.pyx (there is loop over the sorted edges of the graphs, which should imply a complexity of at least `n+m log(m)` for sparse graphs and `O(n^2 log(m))` for dense graphs 5. for is_partial_cube in partial_cube.py (there is at least one enumeration of neighbors of a vertex (via calling `contracted[root]`, line 318) so `m` should appear in the complexity) ### 📝 Checklist  - [x] The title is concise and informative. - [x] The description explains in detail what this PR is about. - [x] I have linked a relevant issue or discussion. - [ ] I have created tests covering the changes. - [ ] I have updated the documentation and checked the documentation preview. ### ⌛ Dependencies    URL: sagemath#38198 Reported by: cyrilbouvier Reviewer(s): cyrilbouvier, David Coudert

…neighbors in linear time       As noted in issue sagemath#37642, enumerating the neighbors of a vertex for a graph that use the sparse backend (the default in Sage) is **not** linear on the number of neighbors (there is an extra log factor). The goal of this PR is to fix this. The problem is that to enumerate the neighbors of a vertex, the current code calls iteratively _next_neighbor_unsafe (or one of its out or in variant). But this method return an int corresponding to a neighbor, so to go to the next neighbor, it was first necessary to retrieve this vertex* in the structure storing all the neighbors. As it is stored in a sorted binary tree, the cost of retrieving is O(log(#neighbors)). *(or a close one if it was deleted in the meantime) To obtain a linear complexity, I wrote a new method that do a simple tree traversal which is linear in the number of nodes in the tree (i.e., the number of neighbors). As this new method does not have a similar interface as the previous _next_neighbor_unsafe, some more rewriting was necessary to use it in other parts of the code. 1. I wrote new methods for the SparseGraph class: out_neighbors_unsafe and in_neighbors_unsafe. They overwrite the ones from the base class and rely on the new methods _neighbors_unsafe and _neighbors_BTNode_unsafe. The _neighbors_BTNode_unsafe is a low-level method enumerating the neighbors in linear time. 2. I rewrote the two methods out_neighbors_BTNode_unsafe and in_neighbors_BTNode_unsafe to use the new _neighbors_BTNode_unsafe. 3. I wrote a new method _iterator_edges method for SparseGraphBackend to overwrite the one from the base class CGraphBackend, in order to expose the low-level code to the Graph class. Question: During the writing of this PR, I notice that the methods out_neighbors_BTNode_unsafe and in_neighbors_BTNode_unsafe are only defined for the sparse backend and are not used anywhere in the code. Can I remove them ? Or should they be kept for compatibility ? ### 📝 Checklist  - [x] The title is concise and informative. - [x] The description explains in detail what this PR is about. - [x] I have linked a relevant issue or discussion. - [] I have created tests covering the changes. - [] I have updated the documentation and checked the documentation preview. ### ⌛ Dependencies    URL: sagemath#38167 Reported by: cyrilbouvier Reviewer(s): cyrilbouvier, David Coudert

As noted in sagemath#37642, there is a difference in the complexity of enumerating neighbors or edges between sparse graphs and dense graphs. This PR aims at making the comments about the time complexity of algorithms clearer in regard of this difference. Most of the changes I made in this PR are due to the fact that the function `init_short_digraph` have the following time complexity: - `O(n+m)` when the input graph has a sparse representation and sort_neighbors is False - `O(n+m log(m))` when the input graph has a sparse representation and sort_neighbors is True - `O(n^2)` when the input graph has a dense representation and sort_neighbors is False - `O(n^2 log(m))` when the input graph has a dense representation and sort_neighbors is True with `m` the number of edges and `n` the number of vertices of the graph. I spotted 5 additional complexity claims in the comments that I was not able to verify: 1. for strong_orientations_iterator in orientations.py 2. for yen_k_shortest_simple_paths in path_enumeration.pyx 3. for feng_k_shortest_simple_paths in path_enumeration.pyx 4. for edge_disjoint_spanning_trees in spanning_tree.pyx (there is loop over the sorted edges of the graphs, which should imply a complexity of at least `n+m log(m)` for sparse graphs and `O(n^2 log(m))` for dense graphs 5. for is_partial_cube in partial_cube.py (there is at least one enumeration of neighbors of a vertex (via calling `contracted[root]`, line 318) so `m` should appear in the complexity) ### 📝 Checklist  - [x] The title is concise and informative. - [x] The description explains in detail what this PR is about. - [x] I have linked a relevant issue or discussion. - [ ] I have created tests covering the changes. - [ ] I have updated the documentation and checked the documentation preview. ### ⌛ Dependencies    URL: sagemath#38198 Reported by: cyrilbouvier Reviewer(s): cyrilbouvier, David Coudert

…neighbors in linear time       As noted in issue sagemath#37642, enumerating the neighbors of a vertex for a graph that use the sparse backend (the default in Sage) is **not** linear on the number of neighbors (there is an extra log factor). The goal of this PR is to fix this. The problem is that to enumerate the neighbors of a vertex, the current code calls iteratively _next_neighbor_unsafe (or one of its out or in variant). But this method return an int corresponding to a neighbor, so to go to the next neighbor, it was first necessary to retrieve this vertex* in the structure storing all the neighbors. As it is stored in a sorted binary tree, the cost of retrieving is O(log(#neighbors)). *(or a close one if it was deleted in the meantime) To obtain a linear complexity, I wrote a new method that do a simple tree traversal which is linear in the number of nodes in the tree (i.e., the number of neighbors). As this new method does not have a similar interface as the previous _next_neighbor_unsafe, some more rewriting was necessary to use it in other parts of the code. 1. I wrote new methods for the SparseGraph class: out_neighbors_unsafe and in_neighbors_unsafe. They overwrite the ones from the base class and rely on the new methods _neighbors_unsafe and _neighbors_BTNode_unsafe. The _neighbors_BTNode_unsafe is a low-level method enumerating the neighbors in linear time. 2. I rewrote the two methods out_neighbors_BTNode_unsafe and in_neighbors_BTNode_unsafe to use the new _neighbors_BTNode_unsafe. 3. I wrote a new method _iterator_edges method for SparseGraphBackend to overwrite the one from the base class CGraphBackend, in order to expose the low-level code to the Graph class. Question: During the writing of this PR, I notice that the methods out_neighbors_BTNode_unsafe and in_neighbors_BTNode_unsafe are only defined for the sparse backend and are not used anywhere in the code. Can I remove them ? Or should they be kept for compatibility ? ### 📝 Checklist  - [x] The title is concise and informative. - [x] The description explains in detail what this PR is about. - [x] I have linked a relevant issue or discussion. - [] I have created tests covering the changes. - [] I have updated the documentation and checked the documentation preview. ### ⌛ Dependencies    URL: sagemath#38167 Reported by: cyrilbouvier Reviewer(s): cyrilbouvier, David Coudert

As noted in sagemath#37642, there is a difference in the complexity of enumerating neighbors or edges between sparse graphs and dense graphs. This PR aims at making the comments about the time complexity of algorithms clearer in regard of this difference. Most of the changes I made in this PR are due to the fact that the function `init_short_digraph` have the following time complexity: - `O(n+m)` when the input graph has a sparse representation and sort_neighbors is False - `O(n+m log(m))` when the input graph has a sparse representation and sort_neighbors is True - `O(n^2)` when the input graph has a dense representation and sort_neighbors is False - `O(n^2 log(m))` when the input graph has a dense representation and sort_neighbors is True with `m` the number of edges and `n` the number of vertices of the graph. I spotted 5 additional complexity claims in the comments that I was not able to verify: 1. for strong_orientations_iterator in orientations.py 2. for yen_k_shortest_simple_paths in path_enumeration.pyx 3. for feng_k_shortest_simple_paths in path_enumeration.pyx 4. for edge_disjoint_spanning_trees in spanning_tree.pyx (there is loop over the sorted edges of the graphs, which should imply a complexity of at least `n+m log(m)` for sparse graphs and `O(n^2 log(m))` for dense graphs 5. for is_partial_cube in partial_cube.py (there is at least one enumeration of neighbors of a vertex (via calling `contracted[root]`, line 318) so `m` should appear in the complexity) ### 📝 Checklist  - [x] The title is concise and informative. - [x] The description explains in detail what this PR is about. - [x] I have linked a relevant issue or discussion. - [ ] I have created tests covering the changes. - [ ] I have updated the documentation and checked the documentation preview. ### ⌛ Dependencies    URL: sagemath#38198 Reported by: cyrilbouvier Reviewer(s): cyrilbouvier, David Coudert

cyrilbouvier added the t: bug label Mar 21, 2024

mkoeppe added the c: graph theory label Mar 23, 2024

dcoudert mentioned this issue Mar 24, 2024

Add parameter sort_neighbors to method init_short_digraph #37662

Merged

5 tasks

cyrilbouvier mentioned this issue Jun 7, 2024

New methods for sparse graphs backend to enumerate neighbors in linear time #38167

Merged

3 tasks

vbraun closed this as completed in #37662 Jun 9, 2024

mkoeppe added this to the sage-10.4 milestone Jun 9, 2024

cyrilbouvier mentioned this issue Jun 11, 2024

Improve complexity comments for graphs #38198

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

lex_BFS method of Graph is not linear as written in the documentation #37642

lex_BFS method of Graph is not linear as written in the documentation #37642

cyrilbouvier commented Mar 21, 2024 •

edited

Loading

dcoudert commented Mar 24, 2024

dcoudert commented Mar 24, 2024

cyrilbouvier commented Apr 2, 2024

lex_BFS method of Graph is not linear as written in the documentation #37642

lex_BFS method of Graph is not linear as written in the documentation #37642

Comments

cyrilbouvier commented Mar 21, 2024 • edited Loading

Steps To Reproduce

Expected Behavior

Actual Behavior

Additional Information

Environment

Checklist

dcoudert commented Mar 24, 2024

dcoudert commented Mar 24, 2024

cyrilbouvier commented Apr 2, 2024

cyrilbouvier commented Mar 21, 2024 •

edited

Loading