Docs fixes and improvements #102

Merged 7 commits on Jun 27, 2020
5 changes: 5 additions & 0 deletions .zenodo.json
Expand Up @@ -37,6 +37,11 @@
},
{
"name": "Guenther, Nick"
},
{
"affiliation": "Center for Adaptive Rationality, Max Planck Institute for Human Development, Berlin, Germany",
"name": "Appelhoff, Stefan",
"orcid": "0000-0001-8002-0877"
}
],
"keywords": [
Expand Down
12 changes: 9 additions & 3 deletions datalad_osf/create_sibling_osf.py
Expand Up @@ -37,18 +37,24 @@

@build_doc
class CreateSiblingOSF(Interface):
"""Create a dataset representation at OSF
"""Create a dataset representation at OSF.

This will create a project on OSF and initialize
an osf special remote to point to it. There are two modes
this can operate in: 'annex' and 'export'.
The former uses the OSF project as a key-value store, that
can be used to by git-annex to copy data to and retrieve
can be used by git-annex to copy data to and retrieve
data from (potentially by any clone of the original dataset).
The latter allows to use 'git annex export' to publish a
snapshot of a particular version of the dataset. Such an OSF
project will - in opposition to the 'annex' - be
human-readable.

For authentication with OSF, you can define environment variables: either
'OSF_TOKEN', or both 'OSF_USERNAME' and 'OSF_PASSWORD'. If neither of these
is defined, the tool will fall back to the datalad credential manager and
inquire for credentials interactively.

Member

FTR: When used with DataLad, it supports queries of DataLad's credential management and makes the definition of environment variables unnecessary.

Member

After #95

Contributor Author

What does FTR mean? Should I replace my sentence with yours?

"""

result_renderer = 'tailored'
Expand All @@ -68,7 +74,7 @@ class CreateSiblingOSF(Interface):
),
name=Parameter(
args=("-s", "--name",),
doc="""name of the to-be initialized osf-special-remote""",
doc="""Name of the to-be initialized osf-special-remote""",
constraints=EnsureStr()
),
mode=Parameter(
Expand Down
2 changes: 1 addition & 1 deletion datalad_osf/utils.py
Expand Up @@ -120,7 +120,7 @@ def get_credentials(allow_interactive=True):
token_auth = Token(name='https://osf.io', url=None)
up_auth = UserPassword(name='https://osf.io', url=None)

# get auth token, form environment, or from datalad credential store
# get auth token, from environment, or from datalad credential store
# if known-- we do not support first-time entry during a test run
token = environ.get(
'OSF_TOKEN',
Expand Down
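The lookup order described in the docstring and in `get_credentials` above can be sketched as follows. This is a minimal illustration, not the extension's implementation: the function name `resolve_osf_auth` and the shape of the returned dictionary are hypothetical, and giving the token precedence over username and password is an assumption.

```python
import os


def resolve_osf_auth(env=None):
    """Decide which OSF credentials would be used (hypothetical helper).

    Assumed order: OSF_TOKEN first, then OSF_USERNAME plus OSF_PASSWORD,
    otherwise defer to an interactive credential manager.
    """
    env = os.environ if env is None else env

    token = env.get("OSF_TOKEN")
    if token:
        return {"method": "token", "token": token}

    username = env.get("OSF_USERNAME")
    password = env.get("OSF_PASSWORD")
    # Both values are required; a username alone is not enough.
    if username and password:
        return {
            "method": "userpassword",
            "username": username,
            "password": password,
        }

    # Nothing usable in the environment: the caller would have to ask
    # a credential manager (not modelled in this sketch).
    return {"method": "credential-manager"}
```

Passing a plain dictionary instead of `os.environ` keeps the sketch easy to try out without touching the real environment.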
25 changes: 10 additions & 15 deletions docs/source/exportdatacode.rst
@@ -1,16 +1,17 @@
.. include:: ./links.inc

Export version-controlled data to OSF and code to GitHub
********************************************************

Imagine you are a PhD student and want to collaborate on a fun little side
project with a student at another institute. It is quite obvious for the two of
you that your code will be hosted on GitHub_. And you also know enough about
DataLad, that using it for the whole project will be really beneficial.
DataLad_, that using it for the whole project will be really beneficial.

But what about the data you are collecting?
The Dropbox is already full (`DataLad third party providers <http://handbook.datalad.org/en/latest/basics/101-138-sharethirdparty.html>`_). And Amazon services don't seem to be
your best alternative.
Suddenly you remember, that you got an OSF_ account recently, and that there is this nice `Datalad extension <https://github.com/datalad/datalad-osf/>`_ to set up a SpecialRemote on OSF_.
The Dropbox is already full (`DataLad third party providers <http://handbook.datalad.org/en/latest/basics/101-138-sharethirdparty.html>`_).
And Amazon services don't seem to be your best alternative.
Suddenly you remember that you recently got an OSF_ account, and that there is this nice `DataLad extension <https://github.com/datalad/datalad-osf/>`_ to set up a `Special Remote`_ on OSF_.

Walk through
------------
Expand All @@ -27,16 +28,16 @@ For installation checkout the installation page of the documentation.
Creating an Example Dataset
^^^^^^^^^^^^^^^^^^^^^^^^^^^

As a very first step you want to set up a DataLad Dataset. For this you should
As a very first step you want to set up a DataLad dataset. For this you should
run. In all examples a `$` in front indicates a new line in the Bash-Shell,
copying it will prevent your code from execution.

.. code-block:: bash

$ datalad create collab_osf

After having created the dataset we want to populate it with some content (just
like in the Handbook). Importantly we don't want to upload this file on GitHub, only on OSF - in the real world this could be your data that is too large to upload to GitHub.
After having created the dataset we want to populate it with some content (just like in the `DataLad Handbook`_).
Importantly we don't want to upload this file on GitHub, only on OSF - in the real world this could be your data that is too large to upload to GitHub.

.. code-block:: bash

Expand All @@ -52,11 +53,10 @@ And we also want to add a text file, which will be saved on GitHub_ - in your ca

$ mkdir code
$ cd code
$ echo "This is just an example file just to show the different ways of saving data in a DataLad Dataset." > example.txt
$ echo "This is just an example file just to show the different ways of saving data in a DataLad dataset." > example.txt
$ datalad save --to-git -m "created an example.txt"

We now have a Dataset with one file that can be worked on using GitHub and one
that should be tracked using `git-annex`.
We now have a dataset with one file that can be worked on using GitHub and one that should be tracked using `git-annex`.

Setting up the OSF Remote
^^^^^^^^^^^^^^^^^^^^^^^^^
Expand Down Expand Up @@ -85,8 +85,3 @@ We can set-up a GitHub Remote with name `github` and include a publish dependenc
$ datalad publish . --to github --transfer-data all

This will publish example.txt in code/ to GitHub and only add the folder structure and symbolic links for all other files; at the same time it will upload the data to OSF - this way you can let OSF handle your data and GitHub your code.



.. _OSF: https://www.osf.io/
.. _GitHub: https://www.github.com/
34 changes: 26 additions & 8 deletions docs/source/exporthumandata.rst
@@ -1,7 +1,9 @@
.. include:: ./links.inc

Export a human-readable dataset to OSF
**************************************

Imagine you have been creating a reproducible workflow using DataLad from the
Imagine you have been creating a reproducible workflow using DataLad_ from the
get go. Everything is finished now, code, data, and paper are ready. Last thing
to do: Publish your data.

Expand All @@ -21,7 +23,7 @@ For installation checkout the installation page of the documentation.

Creating an Example Dataset
^^^^^^^^^^^^^^^^^^^^^^^^^^^
We will create a small example DataLad Dataset to show the functionality.
We will create a small example DataLad dataset to show the functionality.

.. code-block:: bash

Expand All @@ -31,7 +33,7 @@ We will create a small example DataLad Dataset to show the functionality.
# Copying the $ will prevent your code from execution.

After having created the dataset we want to populate it with some content (just
like in the Handbook):
like in the `DataLad Handbook`_):

.. code-block:: bash

Expand All @@ -44,21 +46,37 @@ like in the Handbook):
Setting up the OSF Remote
^^^^^^^^^^^^^^^^^^^^^^^^^

To use OSF as a storage, you need to provide either your OSF credentials or an OSF access token.
You can create such a token in your account settings (`Personal access token` and then `Create token`), make sure to create a `full_write` token to be able to create OSF projects and upload data to OSF.
To use OSF as a storage, you first need to provide either your OSF credentials (username and password) or an OSF access token.

If you choose to use your credentials, proceed as follows:

.. code-block:: bash

export OSF_USERNAME=YOUR_USERNAME_FOR_OSF.IO
export OSF_PASSWORD=YOUR_PASSWORD_FOR_OSF.IO

In this example, we are going to use an OSF access token instead.
You can create such a token in your account settings (`Personal access token` and then `Create token`).
Make sure to create a `full_write` token to be able to create OSF projects and upload data to OSF.

.. code-block:: bash

export OSF_TOKEN=YOUR_TOKEN_FROM_OSF.IO

We are now going to use datalad to create a sibling dataset on OSF with name `osf` - this will create a new project called `OSF_PROJECT_NAME` on the OSF account associated with the OSF token in `$OSF_TOKEN`.
We are now going to use datalad to create a sibling dataset on OSF with name `OSF_PROJECT_NAME`.
This will create a new project called `OSF_PROJECT_NAME` on the OSF account associated with the OSF token in `$OSF_TOKEN`.
Member

What is the rationale for switching from osf to OSF_PROJECT_NAME in the example? Do you envision the need to have multiple different OSF projects for the same dataset as a common case?

Having something simple and uniform, such as osf makes a lot of sense. Especially in the case of a hierarchy of nested datasets, where one would want to be able to do a datalad push --to osf --recursive.

Contributor Author

Do you envision the need to have multiple different OSF projects for the same dataset as a common case?

no, I don't think so,

But I would prefer to let users know (also within the example) that it's up to their discretion which name they want to use for their remote and the OSF project. --> after all, it doesn't matter whether it's called osf or something else.

But reading this again, it seems like I mixed up some things and it rather should be something like:

We are now going to use DataLad to create a sibling dataset on OSF as a "special remote". Within git-annex, we will refer to the special remote with the name $NAME_OF_REMOTE, while the project that will be created on the OSF account associated with the $OSF_TOKEN will be called $OSF_PROJECT_NAME.


Note that the ``-s NAME_OF_REMOTE`` flag is used to specify how ``git`` internally refers to your OSF project with the name `OSF_PROJECT_NAME`.
It would be completely fine to use `OSF_PROJECT_NAME` also as a value for the ``-s`` flag.

You can later on list your remotes from the command line using the ``git remote -v`` command.
Member

ATM this name refers to the special remote, not a Git remote IIRC.

Member

Still listed by git remote -v. Just w/o any details.

Contributor Author

Is there a command to list special remotes that I should refer to instead?

Should I adjust the text to make it more clear that this is not a normal remote, but a special remote?


.. code-block:: bash

$ datalad create-sibling-osf -s osf OSF_PROJECT_NAME --mode export
$ datalad create-sibling-osf -s NAME_OF_REMOTE OSF_PROJECT_NAME --mode export

After that we can export the current state (the `HEAD`) of our dataset in human readable form to OSF:

.. code-block:: bash

git annex export HEAD --to YOUR_OSF_REMOTE_NAME
git annex export HEAD --to NAME_OF_REMOTE
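The review thread above turns on the difference between the remote name (the value of ``-s``) and the OSF project name. The sketch below assembles the two commands from this walkthrough so that the distinction is explicit. The helper names `build_export_commands` and `render` are hypothetical, and the code only builds the command lines; it does not run them.

```python
import shlex


def build_export_commands(remote_name, project_name):
    """Return the two argv lists used in the walkthrough (hypothetical helper).

    remote_name:  how git/git-annex refers to the special remote locally
    project_name: the title of the project created on OSF
    """
    create_sibling = [
        "datalad", "create-sibling-osf",
        "-s", remote_name,
        project_name,
        "--mode", "export",
    ]
    # The export step refers to the remote name, not the project name.
    export_head = ["git", "annex", "export", "HEAD", "--to", remote_name]
    return [create_sibling, export_head]


def render(argv):
    """Render an argv list as a shell-safe command line."""
    return " ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    for argv in build_export_commands("osf", "OSF_PROJECT_NAME"):
        print("$", render(argv))
```

Using one short remote name such as `osf` for every dataset, as the reviewer suggests, keeps recursive operations uniform, while the project name can vary freely.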
12 changes: 7 additions & 5 deletions docs/source/index.rst
@@ -1,19 +1,23 @@
.. include:: ./links.inc

DataLad extension to interface with OSF
***************************************

This extension enables DataLad to work with the Open Science Framework (OSF). Use it to publish your dataset's data to an OSF project to utilize the OSF for dataset data storage and easy dataset sharing.
This extension enables DataLad_ to work with the Open Science Framework (OSF_).
Use it to publish your dataset's data to an OSF project to utilize the OSF for dataset data storage and easy dataset sharing.

The extension was created during the OHBM Hackathon 2020.

If you have any questions, comments, bug fixes or improvement suggestions, feel free to contact us via our `Github page <https://github.com/datalad/datalad-osf>`_. Before contributing, be sure to read the contributing guidelines.
If you have any questions, comments, bug fixes or improvement suggestions, feel free to contact us via our `Github page <https://github.com/datalad/datalad-osf>`_.
Before contributing, be sure to read the `contributing guidelines <https://github.com/datalad/datalad-osf/blob/master/CONTRIBUTING.md>`_.


.. toctree::

Documentation
=============

.. toctree::
.. toctree::
:maxdepth: 2

intro
Expand Down Expand Up @@ -64,5 +68,3 @@ Indices and tables
* :ref:`search`

.. |---| unicode:: U+02014 .. em dash

.. _OSF: http://www.osf.io/
13 changes: 9 additions & 4 deletions docs/source/intro.rst
@@ -1,19 +1,24 @@
.. include:: ./links.inc

Introduction
------------

Goal of the extension
^^^^^^^^^^^^^^^^^^^^^

This extension aims to allow DataLad to work with the Open Science Framework (OSF). This is done by transforming storage on the Open Science Framework (OSF) into a `git-annex <https://git-annex.branchable.com/>`_ repository.
This extension aims to allow DataLad_ to work with the Open Science Framework (OSF_).
This is done by transforming storage on the Open Science Framework (OSF) into a `git-annex`_ repository.

What can I use this extension for?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can use this extension to use the OSF as a special remote to store data in the annex of a dataset. With this, you can `datalad publish` a dataset to GitHub or similar services and the data to the OSF (via a publication dependency).
The extension is most beneficial for easy access to data stored on OSF via GitHub. If you are sharing your dataset via OSF and code via GitHub, this will allow smooth integration of both along with unified version management provided by DataLad.
You can use this extension to use the OSF as a special remote to store data in the annex of a dataset.
With this, you can `datalad publish` a dataset to GitHub or similar services and the data to the OSF (via a publication dependency).
The extension is most beneficial for easy access to data stored on OSF via GitHub.
If you are sharing your dataset via OSF and code via GitHub, this will allow smooth integration of both along with unified version management provided by DataLad.

What can I **not** use this extension for?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This tool does not work for data that is stored in a storage service other than OSF.
Please refer to the `list of special remotes <https://git-annex.branchable.com/special_remotes/>`_ as hosted by the git-annex website for other storage services.
Please refer to the list of `special remotes`_ as hosted by the git-annex website for other storage services.
20 changes: 20 additions & 0 deletions docs/source/links.inc
@@ -0,0 +1,20 @@
.. This (-*- rst -*-) format file contains commonly used link targets
and name substitutions. It may be included in many files,
therefore it should only contain link targets and name
substitutions. Try grepping for "^\.\. _" to find plausible
candidates for this list.

.. NOTE: reST targets are
__not_case_sensitive__, so only one target definition is needed for
nipy, NIPY, Nipy, etc...


.. _DataLad: https://www.datalad.org
.. _DataLad Handbook: http://handbook.datalad.org/en/latest/
.. _GitHub: https://www.github.com/
.. _git-annex: https://git-annex.branchable.com/
.. _git: https://git-scm.com/
.. _OSF: https://www.osf.io/
.. _Python: https://www.python.org/
.. _Special Remote: https://git-annex.branchable.com/special_remotes/
.. _Special Remotes: https://git-annex.branchable.com/special_remotes/
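The note in `links.inc` says reST targets are not case sensitive, which is why references such as `DataLad Handbook`_ and `Datalad Handbook`_ resolve to the same target. The sketch below approximates that matching rule (lowercasing and collapsing whitespace). The helper name `normalize_target_name` is hypothetical, and the code is an illustration rather than a call into docutils.

```python
def normalize_target_name(name):
    """Approximate how reST compares reference names (illustrative only).

    Names are lowercased and runs of whitespace are collapsed, so
    differently capitalised references point at one target.
    """
    return " ".join(name.lower().split())


def same_target(a, b):
    """Return True if two reference names would match the same target."""
    return normalize_target_name(a) == normalize_target_name(b)


if __name__ == "__main__":
    print(same_target("DataLad Handbook", "Datalad Handbook"))
    print(same_target("Special Remote", "Special Remotes"))
```

The singular and plural forms differ after normalization, which is consistent with `links.inc` defining both `Special Remote` and `Special Remotes` as separate targets.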
6 changes: 4 additions & 2 deletions docs/source/settingup.rst
@@ -1,3 +1,5 @@
.. include:: ./links.inc

Setting up
==========

Expand All @@ -6,7 +8,7 @@ Requirements

- DataLad

Before being able to use the extension, you need to have DataLad installed, which relies on `git-annex <git-annex.branchable.com/>`_, `git <git-scm.com/>`_ and `Python <https://www.python.org/>`_.
Before being able to use the extension, you need to have DataLad installed, which relies on `git-annex`_, `git`_ and `Python`_.
If you don't have DataLad installed yet, please follow the instructions from `the datalad handbook <http://handbook.datalad.org/en/latest/intro/installation.html>`_.

- An account on the OSF
Expand All @@ -15,7 +17,7 @@ You need an OSF account to be able to interact with it. If you don't have an acc

- An account on a git repository hosting site

You should consider having an account on one or more repository hosting sites such as `GitHub <https://github.com/join>`_ , `GitLab <https://gitlab.com/users/sign_up>`_, `Bitbucket <https://bitbucket.org/account/signup/>`_ or similar"
You should consider having an account on one or more repository hosting sites such as `GitHub <https://github.com/join>`__, `GitLab <https://gitlab.com/users/sign_up>`_, `Bitbucket <https://bitbucket.org/account/signup/>`_ or similar.

Installation
------------
Expand Down