DataLad
DataLad makes data management and data distribution more accessible. To do that, it stands on the shoulders of Git and Git-annex to deliver a decentralized system for data exchange. This includes automated ingestion of data from online portals and exposing it in readily usable form as Git(-annex) repositories, so-called datasets. The actual data storage and permission management, however, remains with the original data providers.
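To illustrate the dataset concept, the following sketch uses DataLad's Python API (`datalad.api`) to create a dataset, write a file into it, and record that change in the dataset's history. It assumes that DataLad and git-annex are already installed. The helper name `create_and_save` and the file name `README.txt` are hypothetical and chosen only for this example.

```python
from pathlib import Path


def create_and_save(path: str, filename: str = "README.txt") -> Path:
    """Create a DataLad dataset at `path`, add one file, and save it.

    Hypothetical helper for illustration; requires DataLad and git-annex.
    """
    # Imported lazily so that defining this helper does not require DataLad.
    import datalad.api as dl

    ds = dl.create(path=path)            # initialize a new Git/git-annex dataset
    target = Path(ds.path) / filename
    target.write_text("Hello, DataLad!\n")
    ds.save(message=f"Add {filename}")   # record the new file in the dataset history
    return target
```

Usage (with DataLad installed): `create_and_save("my-dataset")` creates `my-dataset/` and commits the new file.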
The full documentation is available at http://docs.datalad.org, and http://handbook.datalad.org provides a hands-on crash course on DataLad.
A number of extensions are available that provide additional functionality for DataLad. Extensions are separate packages that must be installed in addition to DataLad. To install DataLad customized for a particular domain, one can simply install an extension directly; DataLad itself will then be installed automatically as a dependency. Here is a list of known extensions:
- crawler -- tracking web resources and automated data distributions
- neuroimaging -- neuroimaging research data and workflows
- container -- support for containerized computational environments
- webapp -- support for exposing selected DataLad API as REST API webapp [tech demo]
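To check which extensions are present in the current Python environment, one can inspect installed package entry points. The sketch below assumes that DataLad extensions register themselves under the entry-point group `datalad.extensions`; treat that group name as an assumption to verify against the extension's own packaging. The helper `installed_extensions` is hypothetical.

```python
from __future__ import annotations

from importlib.metadata import entry_points

# Assumption: DataLad extensions register under this entry-point group.
EXTENSION_GROUP = "datalad.extensions"


def installed_extensions(group: str = EXTENSION_GROUP) -> list[str]:
    """Return the sorted names of entry points registered under `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):   # Python 3.10+: supports selection by group
        selected = eps.select(group=group)
    else:                        # Python 3.8/3.9: dict mapping group -> entry points
        selected = eps.get(group, [])
    return sorted(ep.name for ep in selected)


if __name__ == "__main__":
    print(installed_extensions() or "no DataLad extensions found")
```

The version check keeps the sketch working on both the older dict-based and the newer selectable `entry_points()` interfaces.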
The documentation of this project is found here: http://docs.datalad.org
All bugs, concerns and enhancement requests for this software can be submitted here: https://github.com/datalad/datalad/issues
If you have a problem or would like to ask a question about how to use DataLad, please submit a question to NeuroStars.org with a datalad tag. NeuroStars.org is a platform similar to StackOverflow but dedicated to neuroinformatics.
All previous DataLad questions are available here: http://neurostars.org/tags/datalad/
On Debian-based systems, we recommend enabling NeuroDebian, from which we provide recent releases of DataLad. Once it is enabled, just run:
apt-get install datalad
On other Linux distributions, the latest released version can be installed via conda:
conda install -c conda-forge datalad
Release candidates are available via:
conda install -c conda-forge/label/rc datalad
Alternatively, DataLad can be installed via pip. Before you install this package, please make sure that you install a recent version of git-annex. Afterwards, install the latest version of datalad from PyPI. It is recommended to use a dedicated virtualenv:
# create and enter a new virtual environment (optional)
virtualenv --python=python3 ~/env/datalad
. ~/env/datalad/bin/activate
# install from PyPI
pip install datalad
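Because DataLad relies on git-annex being available, a quick pre-flight check can save some confusion. The following sketch locates the `git-annex` executable and parses the first line of `git-annex version` output (of the form `git-annex version: <version>`). The helper names are hypothetical, and the sketch deliberately does not encode a minimum version, since what counts as "recent" depends on the DataLad release.

```python
import shutil
import subprocess
from typing import Optional

PREFIX = "git-annex version:"


def parse_annex_version(output: str) -> Optional[str]:
    """Extract the version string from `git-annex version` output."""
    for line in output.splitlines():
        if line.startswith(PREFIX):
            return line[len(PREFIX):].strip()
    return None


def installed_annex_version() -> Optional[str]:
    """Return the installed git-annex version, or None if it is not on PATH."""
    if shutil.which("git-annex") is None:
        return None
    result = subprocess.run(
        ["git-annex", "version"], capture_output=True, text=True, check=False
    )
    return parse_annex_version(result.stdout)
```

Usage: `print(installed_annex_version() or "git-annex not found")`.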
By default, installation via pip installs the core functionality of DataLad, which allows for managing datasets etc. Additional installation schemes are available, so you can request an enhanced installation via
pip install datalad[SCHEME]
where SCHEME could be:
- tests to also install the dependencies used by DataLad's unit-test battery
- full to install all dependencies.
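The installation schemes above correspond to Python packaging "extras". As a sketch, the extras that an installed DataLad release actually declares can be read from its package metadata (the standard `Provides-Extra` field). The helper `declared_extras` is hypothetical; it returns an empty list when the distribution is not installed.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, metadata


def declared_extras(distribution: str = "datalad") -> list[str]:
    """Return the extras a distribution declares, or [] if it is not installed."""
    try:
        meta = metadata(distribution)
    except PackageNotFoundError:
        return []
    # Provides-Extra may appear several times, once per extra.
    return sorted(meta.get_all("Provides-Extra") or [])


if __name__ == "__main__":
    print(declared_extras())
```

This reflects what the installed version declares, which may differ between DataLad releases.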
There is also a Singularity container available. The latest release version can be obtained by running:
singularity pull shub://datalad/datalad
More details on installation and initial configuration can be found in the DataLad Handbook: Installation.
DataLad is released under the MIT/Expat license.
See CONTRIBUTING.md if you are interested in internals or contributing to the project.
DataLad development is supported by a US-German collaboration in computational neuroscience (CRCNS) project "DataGit: converging catalogues, warehouses, and deployment logistics into a federated 'data distribution'" (Halchenko/Hanke), co-funded by the US National Science Foundation (NSF 1429999) and the German Federal Ministry of Education and Research (BMBF 01GQ1411). Additional support is provided by the German federal state of Saxony-Anhalt and the European Regional Development Fund (ERDF), Project: Center for Behavioral Brain Sciences, Imaging Platform. This work is further facilitated by the ReproNim project (NIH 1P41EB019936-01A1).