Alternative graph data structures #95

Closed
zietzm opened this issue May 11, 2018 · 5 comments

@zietzm
Collaborator

zietzm commented May 11, 2018

We are considering creating another base representation of hetnets. One of the main goals is faster network loading: at present, loading a graph can take over a minute and a half.

The following are under consideration:

  • Mambo, which relies on SNAP.
  • GraphFrames, a graph package for Apache Spark. Its advantages are that metadata and multiple edge types are supported in a logical way, Spark scales well should we eventually make hetmech a back-end server application, and the data structure is widely used. However, we would have to re-implement the DWPC in a way that either computes exact path-level results (each path and its corresponding DWPC) more slowly than the current matrix implementation, or computes path-level DWPC only.
  • HetMat, which would be an internally produced, matrix-first representation of the network. This option would allow us to store the entire Hetionet-v1.0 and five permutations in about 30 MB as scipy sparse matrices in .npz format. Moreover, unlike the other options, we would likely need to change only a few functions in order to load on-disk adjacency matrices.
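
For reference, a minimal sketch of what storing and loading one adjacency matrix as a scipy sparse .npz file could look like. The toy matrix, the file name, and the one-file-per-edge-type layout are assumptions for illustration, not the actual HetMat design; `scipy.sparse.save_npz` and `scipy.sparse.load_npz` are the only library calls relied on.

```python
import os
import tempfile

import numpy as np
from scipy import sparse

# Hypothetical toy adjacency matrix for one edge type
# (3 source nodes x 4 target nodes), not real Hetionet data.
dense = np.array(
    [
        [0, 1, 0, 0],
        [1, 0, 0, 1],
        [0, 0, 0, 0],
    ],
    dtype=np.int8,
)
adjacency = sparse.csc_matrix(dense)

with tempfile.TemporaryDirectory() as directory:
    # Hypothetical file name; one .npz file per edge type is an assumption.
    path = os.path.join(directory, "toy-edges.sparse.npz")
    sparse.save_npz(path, adjacency, compressed=True)
    loaded = sparse.load_npz(path)

# The round trip should preserve the shape and every entry.
same = np.array_equal(loaded.toarray(), dense)
print(loaded.shape, loaded.nnz, same)
```

Loading a single matrix this way reads only the file for the edge type that is needed, which is the sort of on-disk access the faster-loading goal calls for.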
@dhimmel
Collaborator

dhimmel commented May 14, 2018

Another package of interest could be xarray, which provides "N-dimensional variants of the core pandas data structures." I think it could be ideal for matrix-based representations of hetnets. However, it currently supports only numpy arrays as its backend and lacks scipy.sparse support, so it's probably not appropriate at this time.

Check out 2.xarray.ipynb where we encode Hetionet v1.0 as an xarray.
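
To make the dense-backend concern concrete, here is a rough sketch comparing the memory of a dense numpy array against a scipy.sparse matrix holding the same entries. The node count and edge count are made-up values, not Hetionet figures, and xarray itself is not used here.

```python
import numpy as np
from scipy import sparse

n_nodes = 2000  # hypothetical node count, not a Hetionet figure
n_edges = 5000  # hypothetical upper bound on the number of edges

# Random edge endpoints; duplicates simply collapse into one entry.
rng = np.random.default_rng(0)
rows = rng.integers(0, n_nodes, size=n_edges)
cols = rng.integers(0, n_nodes, size=n_edges)

# Dense representation: every cell is stored, including the zeros.
dense = np.zeros((n_nodes, n_nodes), dtype=np.float64)
dense[rows, cols] = 1.0

# Sparse representation: only nonzero values and their indices are stored.
csr = sparse.csr_matrix(dense)

dense_bytes = dense.nbytes
sparse_bytes = csr.data.nbytes + csr.indices.nbytes + csr.indptr.nbytes
print(dense_bytes, sparse_bytes)
```

The dense array costs the same regardless of how many edges exist, while the sparse matrix grows only with the number of nonzero entries.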

@dhimmel
Collaborator

dhimmel commented May 14, 2018

The following products are related and may be good for storing dataframes and potentially matrices on disk:

@dhimmel
Collaborator

dhimmel commented May 15, 2018

For storing nodes in hetmech, I'm thinking we should use sqlite to enable fast lookup of node positions from names. However, I'm not sure whether we should do this now in #97 or later.
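
A minimal sketch of the idea, using Python's built-in sqlite3 module. The table name, column names, node names, and the `node_position` helper are all hypothetical, chosen only to illustrate a name-to-position lookup; this is not an existing hetmech schema.

```python
import sqlite3

# Hypothetical node records: (metanode, name, position in the adjacency matrix).
nodes = [
    ("Gene", "GENE_A", 0),
    ("Gene", "GENE_B", 1),
    ("Disease", "DISEASE_A", 0),
]

# In-memory database for the sketch; a file path would give on-disk storage.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE nodes ("
    "metanode TEXT NOT NULL, "
    "name TEXT NOT NULL, "
    "position INTEGER NOT NULL, "
    "PRIMARY KEY (metanode, name))"  # the primary key is indexed, so lookups avoid a full scan
)
connection.executemany("INSERT INTO nodes VALUES (?, ?, ?)", nodes)
connection.commit()


def node_position(connection, metanode, name):
    """Return the matrix row/column index for a node, or None if it is absent."""
    row = connection.execute(
        "SELECT position FROM nodes WHERE metanode = ? AND name = ?",
        (metanode, name),
    ).fetchone()
    return None if row is None else row[0]


print(node_position(connection, "Gene", "GENE_B"))
```

Keying on (metanode, name) assumes that names are unique within a node type; if they are not, an identifier column would be the safer key.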

@zietzm
Collaborator Author

zietzm commented May 17, 2018

Closed by #97

@dhimmel
Collaborator

dhimmel commented Jun 11, 2018

multinetx

I recently came across multinetx (GitHub), which is a:

"python package for the manipulation and visualization of multilayer networks. The core of this package is a MultilayerGraph, a class that inherits all properties from networkx.Graph()."

I don't think we have any use for this package at the moment, but I wanted to note it here, so we can keep an eye on its development.
