Fast and memory-efficient library for large read-only graphs.
By exploiting CSR sparse matrices, we can pack and traverse graphs in memory efficiently (as long as we don't add/delete nodes often).
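To illustrate why the CSR layout is so compact, here is a small adjacency matrix packed into scipy's CSR format. This is a sketch using scipy directly, not csrgraph itself:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Dense adjacency matrix for a 3-node directed cycle: 0 -> 1 -> 2 -> 0
dense = np.array([
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
], dtype=np.float32)

A = csr_matrix(dense)

# CSR stores only the nonzero entries, in three flat arrays:
print(A.data)     # edge weights
print(A.indices)  # destination node of each edge
print(A.indptr)   # offsets: edges of node i live in indices[indptr[i]:indptr[i+1]]
```

The zeros never get stored, and all of a node's out-edges sit contiguously in memory, which is what makes traversal cache-friendly.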
```
pip install csrgraph
```
You can read a graph from an edgelist:
```python
import csrgraph as cg

G = cg.read_edgelist("path_to_file.csv", directed=False, sep=',')
# Node names are stored and accessible
G['cat']
```
Other objects that can be passed to the csrgraph constructor:
- NetworkX graph. Edge weights are supported natively.
- NumPy dense matrix. Should be square, with one row/column per node.
- CSR matrix. Same as above. The (data, indices, indptr) array format is also supported.
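For instance, the (data, indices, indptr) triple is the same one scipy uses. A sketch building a CSR matrix from those raw arrays, which could then be handed to the constructor:

```python
import numpy as np
from scipy.sparse import csr_matrix

# A 3-node directed cycle (0 -> 1 -> 2 -> 0) as raw CSR arrays
data = np.array([1., 1., 1.], dtype=np.float32)  # edge weights
indices = np.array([1, 2, 0])                    # destination node of each edge
indptr = np.array([0, 1, 2, 3])                  # per-node edge offsets

A = csr_matrix((data, indices, indptr), shape=(3, 3))
# A can now be passed to cg.csrgraph(A)
```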
Currently, we support very fast random walks through G.random_walks(). The GloVe, GraRep, and GGVec graph embedding algorithms are also accessible directly from the graph object.
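A random walk is just repeated CSR traversal. Here is a minimal numpy sketch of an unweighted walk over raw CSR arrays; it is illustrative only, and csrgraph's own implementation is compiled and supports weighted and biased walks:

```python
import numpy as np

def uniform_walk(indices, indptr, start, walklen, rng):
    """One unweighted random walk: at each step, jump to a
    uniformly chosen out-neighbor of the current node."""
    walk = np.empty(walklen, dtype=np.uint32)
    node = start
    for i in range(walklen):
        walk[i] = node
        neighbors = indices[indptr[node]:indptr[node + 1]]
        if len(neighbors) == 0:
            return walk[:i + 1]  # dead end: truncate the walk
        node = rng.choice(neighbors)
    return walk

# 3-node directed cycle: 0 -> 1 -> 2 -> 0
indices = np.array([1, 2, 0])
indptr = np.array([0, 1, 2, 3])
walk = uniform_walk(indices, indptr, start=0, walklen=5,
                    rng=np.random.default_rng(42))
print(walk)  # each node has exactly one neighbor, so the walk cycles: [0 1 2 0 1]
```

Because each step only slices `indices` between two `indptr` offsets, no matrix rows are ever materialized, which is why walks over CSR graphs are cheap.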
You can access the underlying scipy.sparse.csr_matrix with the .mat accessor:
```python
import csrgraph as cg

G = cg.csrgraph(data)
G.mat  # This is the underlying scipy.sparse.csr_matrix
```
All the procedures in scipy's sparse.csgraph module will function directly on the G.mat object.
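For example, scipy's csgraph routines accept any csr_matrix, so they work on G.mat unchanged. Shown here on a hand-built matrix rather than a csrgraph instance:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path, connected_components

# 3-node directed cycle as a CSR adjacency matrix
A = csr_matrix((np.ones(3, dtype=np.float32),
                np.array([1, 2, 0]),
                np.array([0, 1, 2, 3])), shape=(3, 3))

# Any csgraph routine takes the matrix directly; G.mat works the same way
dist = shortest_path(A, method='D')  # all-pairs Dijkstra
n_comp, labels = connected_components(A, connection='strong')
print(dist[0])  # distances from node 0: [0. 1. 2.]
print(n_comp)   # the cycle is one strongly connected component
```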
All graphs are directed. We support undirected graphs by adding a "return edge" for each edge. The only issue is that this doubles the number of edges. You'll generally find that CSRGraph's efficient memory use and fast methods make up for this design.
4.2B node limit. You can't currently have more nodes than UINT32_MAX (around 4.2 billion) -- you'd run out of node IDs. You can have as many edges as will fit in RAM, however.
Only float edge weights. Eventually we might support complex edge weight objects, but for now only 32-bit floats are supported.
Accessing nodes by name is O(log2(n)). Internally, CSRGraph's map of node names is a sorted array (searched by binary search) indexing into node IDs. Since nodes are accessed by ID far more often than by name, this is a worthwhile tradeoff, but it may be unexpected if you use the G[node_name] operator heavily.
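A sketch of that lookup scheme with numpy (the names and layout here are hypothetical; csrgraph's internal representation may differ):

```python
import numpy as np

# Sorted array of node names; the position in this array is NOT the node ID,
# so a parallel array maps each sorted name back to its ID.
names = np.array(["ant", "bee", "cat"])  # kept sorted for binary search
ids = np.array([2, 0, 1])                # node ID of each sorted name

def name_to_id(name):
    """O(log n) binary search over the sorted name array."""
    pos = np.searchsorted(names, name)
    if pos < len(names) and names[pos] == name:
        return ids[pos]
    raise KeyError(name)

print(name_to_id("cat"))  # -> 1
```

Keeping names in a sorted array instead of a hash map saves memory and preserves cache locality, at the cost of the logarithmic lookup noted above.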
To test, run:
```
clear && python -m unittest discover tests -f
```