[ENH] Add a datasets API and cleanup existing datasets #1348

Comments
Some tasks for this project: …

I fleshed out the …

Some requirements for this API: …

Example of use: …
Status update: …

In today's sync, we agreed on ensuring that the metadata and configuration …

Today, we decided to revamp the usage of the "path" field within the …

Status update: …

As of now, the following datasets are supported by the …
Status Update:

The download location hierarchy: …
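The details of that comment are elided above, but a "download location hierarchy" usually means a resolution order such as: an explicit directory passed by the caller, then an environment-variable override, then a built-in default. The sketch below illustrates that pattern only; the function name, the environment-variable name, and the default path are assumptions for illustration, not cuGraph's confirmed behavior.

```python
import os
from pathlib import Path

# Hypothetical resolution order for where downloaded dataset files live:
# 1. an explicit directory passed by the caller,
# 2. an environment-variable override,
# 3. a default directory under the user's home.
def resolve_download_dir(explicit=None, env_var="DATASETS_ROOT_DIR"):
    if explicit is not None:
        return Path(explicit)
    env = os.environ.get(env_var)
    if env:
        return Path(env)
    return Path.home() / ".cugraph" / "datasets"

print(resolve_download_dir("/tmp/data"))  # /tmp/data
```

Resolving the location in one place like this lets every dataset share the same download root while still allowing per-call and per-environment overrides.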
…es (#2367)

Addresses issue [#1348](https://nvidia.slack.com/archives/C01SCT7ELMR). A working version of the datasets API has been added under the "experimental" module of cuGraph. This API comes with the ability to import a handful of built-in datasets to create graphs and edge lists. Each dataset comes with its own metadata file in the format of a YAML file. These files contain general information about the dataset, as well as formatting information about their columns and datatypes.

Authors:
- Dylan Chima-Sanchez (https://github.com/betochimas)
- Ralph Liu (https://github.com/oorliu)

Approvers:
- Rick Ratzel (https://github.com/rlratzel)
- Joseph Nke (https://github.com/jnke2016)

URL: #2367
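The PR describes per-dataset YAML metadata files containing general information plus column names and datatypes. The exact schema is not shown in this thread, so the following is only a guess at its general shape; every field name here is illustrative, not the actual cuGraph schema.

```yaml
# Illustrative sketch of a per-dataset metadata file (field names assumed).
name: karate
description: Zachary's karate club social network
file_type: csv
delimiter: " "
header: false
col_names:
  - src
  - dst
col_types:
  - int32
  - int32
is_directed: false
```

Keeping this information beside each dataset means the loader never needs hard-coded per-file parsing logic: it reads the YAML, then reads the data file according to it.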
#2367 closes this issue.
cuGraph could benefit from a new datasets API that allows users to easily create Graph objects from pre-defined datasets. Currently, users have to read a file using cuDF to create an edgelist, then create a Graph instance from the edgelist. This is a multi-step process that is very common, yet not always easy to remember. Streamlining this very common process will benefit users as well as our tests and benchmarks.

The datasets directory also needs to be cleaned up a bit, and the README updated.
For all of the above, the Python and C++ tests and benchmarks need to be checked to see if they reference any dataset that might be moved or renamed as part of the cleanup.
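To make the requested streamlining concrete, here is a minimal, purely illustrative sketch in plain Python of what such a datasets API could look like. The Dataset class, its methods, and the use of a plain csv reader are assumptions for illustration only; the real API would return cuDF DataFrames and cuGraph Graph objects.

```python
import csv
import io

# Hypothetical Dataset wrapper: bundles a dataset's location and column
# metadata so callers get an edgelist in one call, instead of remembering
# the read-file-then-construct-Graph sequence by hand.
class Dataset:
    def __init__(self, name, source, columns):
        self.name = name          # e.g. "karate"
        self.source = source      # file path or file-like object
        self.columns = columns    # column names for the edgelist
        self._edgelist = None     # cached after the first load

    def get_edgelist(self):
        # Lazily read and cache the edgelist (a list of dicts here;
        # the real API would produce a cuDF DataFrame).
        if self._edgelist is None:
            reader = csv.DictReader(self.source,
                                    fieldnames=self.columns,
                                    delimiter=" ")
            self._edgelist = list(reader)
        return self._edgelist

# Tiny in-memory stand-in for a downloaded whitespace-delimited file.
data = io.StringIO("0 1\n1 2\n2 0\n")
karate = Dataset("karate", data, columns=["src", "dst"])

edges = karate.get_edgelist()
print(edges[0])  # {'src': '0', 'dst': '1'}
```

A pre-defined module-level instance per built-in dataset would then reduce the whole workflow to something like `karate.get_edgelist()`, which is the kind of one-liner the issue asks for.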