patrick-g-h/conp-dataset

CONP dataset

The CONP dataset is a repository containing the datasets available in the Canadian Open Neuroscience Platform (CONP). It uses DataLad to store metadata and references to data files that are distributed across various storage spaces and are accessible according to each data owner's policy.

Dataset structure

The dataset is structured as follows:

  • investigators contains sub-datasets for investigators based in Canada.
  • projects contains sub-datasets for projects hosted in Canada.

Investigators and projects are responsible for the management and curation of their own sub-datasets.

Accessing data

Requirements:

  • Git
  • git-annex
  • DataLad: pip install git+https://github.com/datalad/datalad.git
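Before going further, it can help to check that all three tools are on your PATH. The snippet below is a minimal sketch; `missing_tools` is a hypothetical helper written for this README, not part of Git, git-annex or DataLad. It assumes the executables are named `git`, `git-annex` and `datalad` (Git finds `git annex` through the `git-annex` executable).

```shell
#!/bin/sh
# missing_tools is a hypothetical helper, not part of DataLad or git-annex.
# It prints every command given as an argument that cannot be found on PATH
# and returns 1 if at least one of them is missing, 0 otherwise.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            status=1
        fi
    done
    return "$status"
}
```

For example, `missing_tools git git-annex datalad || echo "Install the tools listed above first."` prints nothing when everything is installed.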

To start, install the main CONP dataset on your computer:

datalad install -r https://github.com/CONP-PCNO/conp-dataset

Get the files you are interested in:

datalad get <file_name>

This may require authentication depending on the data owner's configuration.

You can also search for relevant files and sub-datasets as follows:

datalad search T1
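The install and get commands above can be chained in a small script. This is a sketch only: `conp_fetch` and `run` are hypothetical helpers, and the script assumes DataLad installs the dataset into a directory named `conp-dataset` (derived from the URL). Setting `DRY_RUN=1` prints the commands instead of running them, which is handy for checking what would be downloaded.

```shell
#!/bin/sh
# conp_fetch and run are hypothetical helpers wrapping the commands above.
# Set DRY_RUN=1 to print the commands instead of running them.

# run: execute a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# conp_fetch FILE...: install the CONP dataset and all its sub-datasets
# (assumed to land in ./conp-dataset), then download the content of each FILE.
# FILE paths are relative to the root of the installed dataset. The body runs
# in a subshell (parentheses), so the cd does not affect the caller.
conp_fetch() (
    run datalad install -r https://github.com/CONP-PCNO/conp-dataset || exit
    run cd conp-dataset || exit
    run datalad get "$@"
)
```

For example, `DRY_RUN=1 conp_fetch investigators/<username>/README.md` prints the three commands without downloading anything; dropping `DRY_RUN=1` runs them, and `datalad get` may then prompt for credentials as described above.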

Adding data

If you are an investigator or a project manager, you can create a sub-dataset in the CONP repository as follows:

  1. Fork the CONP data repository (https://github.com/CONP-PCNO/conp-dataset) on GitHub.

  2. Install your fork on your computer:

    datalad install git@github.com:<username>/conp-dataset

  3. Create your sub-dataset in your cloned fork, under investigators or projects. For instance:

    datalad create -d . investigators/<username>

  4. Publish your sub-dataset:

    From the main repository (conp-dataset):

    a. Add a sibling for your dataset on GitHub:

    datalad create-sibling-github -d investigators/<username> conp-dataset-<username>

    DataLad will ask for your GitHub user name and password to create the sibling.

    b. Update the .gitmodules file so that your sub-dataset points to this sibling. It should contain a section that looks like this:

    [submodule "investigators/<username>"]
        path = investigators/<username>
        url = https://github.com/<username>/conp-dataset-<username>.git

    Note the Git endpoint in the url.

  5. Add files to your sub-dataset:

    From your sub-dataset (investigators/<username>):

    a. Create a README.md file and add it directly to the Git repository:

    datalad add --to-git ./README.md

    b. Add a file accessible through HTTP (for instance, an image file):

    git annex addurl <url> --file <local_path>

    c. Save and publish the modifications:

    datalad save
    datalad publish --to github

  6. Publish the modifications to your fork of the main dataset:

    From the main repository (conp-dataset):

    datalad save
    datalad publish --to origin

  7. Publish the modifications to the main dataset:

    Create a new pull request from https://github.com/<username>/conp-dataset to https://github.com/CONP-PCNO/conp-dataset.

    TODO: add a screenshot here.
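The .gitmodules edit and the git annex addurl step above can also be scripted instead of done by hand. The sketch below uses two hypothetical helpers. `set_submodule_url` writes the .gitmodules section with `git config -f .gitmodules`, assuming your sibling is the public repository `https://github.com/<username>/conp-dataset-<username>.git`. `add_urls` registers several remote files at once from a list of `<url> <local_path>` lines, a list format chosen only for this sketch.

```shell
#!/bin/sh
# Hypothetical helpers for two of the steps above; neither is part of
# DataLad or git-annex. Set DRY_RUN=1 to print the git annex commands
# instead of running them.

# set_submodule_url USERNAME: run from the root of your conp-dataset fork.
# Writes the [submodule "investigators/USERNAME"] section of .gitmodules,
# assuming the sibling is the public repository conp-dataset-USERNAME.
set_submodule_url() {
    name="investigators/$1"
    git config -f .gitmodules "submodule.$name.path" "$name" &&
        git config -f .gitmodules "submodule.$name.url" \
            "https://github.com/$1/conp-dataset-$1.git"
}

# run: execute a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# add_urls: run from the root of your sub-dataset. Reads "<url> <local_path>"
# pairs from standard input (a format chosen for this sketch), skips blank
# lines, and registers each file with git annex addurl. Everything after the
# first run of spaces is taken as the local path.
add_urls() {
    while read -r url path; do
        [ -n "$url" ] || continue
        run git annex addurl "$url" --file "$path" || return
    done
}
```

Usage sketch: run `set_submodule_url <username>` from the main repository, then `add_urls < urls.txt` from investigators/<username>, where urls.txt is a hypothetical list file. The changes still need to be saved and published as described in the steps above.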

Once the pull request is accepted by the CONP data managers, your sub-dataset becomes part of the CONP repository. It is then up to you to manage its content and to decide whether to create sub-datasets within it. Modifications to your dataset can be propagated to the CONP dataset through pull requests, by repeating the publication steps above: save and publish your sub-dataset, publish the main dataset to your fork, then open a new pull request.
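Because each update repeats the same publication commands, they can be chained. In this sketch, `publish_update`, `pr_url` and `run` are hypothetical helpers. The sub-dataset is saved and published first, so that the commit recorded by the main dataset's `datalad save` is already available on GitHub. `pr_url` prints a GitHub compare page of the form `base...<username>:<branch>`, assuming both repositories use a `master` branch; adjust the branch names if yours differ.

```shell
#!/bin/sh
# Hypothetical helpers chaining the publication commands above.
# Run from the root of your conp-dataset fork; set DRY_RUN=1 to print the
# DataLad commands instead of running them.

# run: execute a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# publish_update SUBDATASET: publish the sub-dataset to its "github" sibling,
# then record and publish its new state in your fork ("origin").
publish_update() (
    (
        # Inner subshell: the cd only applies to the sub-dataset commands.
        run cd "$1" || exit
        run datalad save || exit
        run datalad publish --to github
    ) || exit
    run datalad save || exit
    run datalad publish --to origin
)

# pr_url USERNAME [BRANCH]: print the page for opening the pull request,
# assuming the base branch of CONP-PCNO/conp-dataset is "master".
pr_url() {
    printf 'https://github.com/CONP-PCNO/conp-dataset/compare/master...%s:%s\n' \
        "$1" "${2:-master}"
}
```

For example, `publish_update investigators/<username> && pr_url <username>` publishes everything and prints the link to open in a browser.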

We welcome your feedback! 😃

About

A DataLad dataset for CONP
