Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Project-to-DAG runtime #205

Open
cnellington opened this issue Jul 14, 2023 · 7 comments
Open

Project-to-DAG runtime #205

cnellington opened this issue Jul 14, 2023 · 7 comments

Comments

@cnellington
Copy link
Owner

Currently we do DAG projection by binary search O(logn) over sorted weights O(nlogn), checking if the graph where weights > threshold is a DAG via igraph is_dag O(n*n). The igraph implementation gets a spanning tree to check for dagness. This is prohibitively slow for large DAGs.

Can we speed this up? Maybe by using spanning trees to construct a new graph.

@cnellington
Copy link
Owner Author

cnellington commented Jul 14, 2023

Toy example to greedily build a topological order by maximizing the sum of outgoing edge weight magnitudes at each level. Projects 1e4 random fully-connected graphs with 50 nodes each in about 1 min.

import numpy as np
n = 10
mat = np.random.normal(0, 1, (n, n))
print(np.sum(np.triu(np.abs(mat))))
nodes_to_order = set(range(n))
order = []
for i in range(n):
    max_score = 0
    max_node = -1
    for v in nodes_to_order:
        score = np.sum(np.abs(mat[v, i:]))
        if score > max_score:
            max_score = score
            max_node = v 
    nodes_to_order.remove(max_node)
    order.append(max_node)
print(order)
print(np.sum(np.triu(np.abs(mat[order]))))

@cnellington
Copy link
Owner Author

While this maximizes the sum of child edges as each level, this algorithm would also place a root node with outdegree 1 last in the topological order when it would be optimal to put this node first. Instead we could greedily select root-like nodes that have the smallest sum of incoming edges, since we know there would be another node that would add more to the total edge weight sum as a child of other nodes.

import numpy as np
n = 10
mat = np.random.normal(0, 1, (n, n))
nodes_to_order = set(range(n))
order = []
for i in range(n):
    min_score = 0
    min_node = -1
    for v in nodes_to_order:
        score = np.sum(np.abs(mat[i:, v]))  # reversed to look at incoming edge sum, how much score do we lose by selecting v
        if score < min_score:
            min_score = score
            min_node = v 
    nodes_to_order.remove(min_node)
    order.append(min_node)
print(np.sum(np.triu(np.abs(mat))))
print(order)
print(np.sum(np.triu(np.abs(mat[order]))))

@szhorvat
Copy link

I came across this by accident, and I'm likely misunderstanding what you are saying ...

The igraph implementation gets a spanning tree to check for dagness. This is prohibitively slow for large DAGs.

But I wanted to point out that the linear-time algorithm igraph uses to test if a graph is a DAG does not involve the construction of any spanning trees

@cnellington
Copy link
Owner Author

Thanks for clarifying @szhorvat!

@blengerich blengerich linked a pull request Jul 15, 2023 that will close this issue
@blengerich blengerich removed a link to a pull request Jul 15, 2023
@blengerich
Copy link
Collaborator

Thanks @szhorvat; btw how did you find this project?

@szhorvat
Copy link

szhorvat commented Jul 15, 2023

Can you clarify if there is an issue with the performance of igraph's DAG testing?

The current implementation is based on removing source (i.e. zero in-degree) vertices until none are left. If the entire graph was removed, then it was a DAG.

An alternative is performing graph traversal (DFS or BFS) to look for cycles. This is not implemented at the moment.

Both are linear-time in the worst case, i.e. when the graph is acyclic, they will both explore all edges. It's unclear which one would be faster in practice. It might depend on the structure of the graph (does it have cycles?). More likely, the performance difference depends on hardware-related non-algorithmic factors, so this needs benchmarking.

At some point I might implement a DFS-based function that doesn't just check if the graph is acyclic, but provides proof, i.e. returns a cycle if there is one.

We always welcome feedback about how people use igraph and what their needs are. Feel free to contact us on the forum https://igraph.discourse.group/

@cnellington
Copy link
Owner Author

Based on what you've said, it sounds like the issue is not igraph's runtime but I'll continue to update the issue this week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants