GH-448 Add data access page
Showing 18 changed files with 22,519 additions and 11 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,56 @@ | ||
Data Access | ||
=========== | ||
|
||
The electrophysiology data files for the PatchSeq experiments released by the | ||
Allen Institute are stored in `Neurodata Without Borders 2.0 <https://nwb.org>`_ (NWB) format. | ||
The files are hosted on the `Distributed Archives for Neurophysiology Data Integration (DANDI) <https://dandiarchive.org>`_. | ||
|
||
The PatchSeq data release is composed of two archives: | ||
|
||
Mouse data archive (114 GB): `<https://dandiarchive.org/dandiset/000020>`_ | ||
|
||
Human data archive (12 GB): `<https://dandiarchive.org/dandiset/000023>`_ | ||
|
||
Each archive is accompanied by the corresponding file manifest and the experiment metadata tables. | ||
|
||
The file manifest table lists the files included in the archive, | ||
their location ("archive_uri" column), and the corresponding cell ("cell_specimen_id" column). | ||
The file manifest combines information about several data modalities (see the "technique" column) | ||
recorded from each cell. Files with intracellular electrophysiological recordings stored on DANDI have | ||
"technique" = intracellular_electrophysiology. | ||
|
||
In turn, the experiment metadata table describes the experimental conditions | ||
for each cell ("specimen_id" column). This table can be used to select the cells | ||
that satisfy particular experimental conditions. Then, given the selected "specimen_id" values, | ||
you can find the corresponding DANDI URLs in the file manifest. | ||
|
||
IPFX includes a utility that provides the file manifest and experiment metadata of the published archives. | ||
|
||
For example, to obtain detailed information about the Human data archive: | ||
|
||
.. code-block:: python | ||
    from ipfx.data_access import get_archive_info | ||
    archive_url, file_manifest, experiment_metadata = get_archive_info(dataset="human") | ||
where ``archive_url`` is the DANDI URL for the Human data, | ||
``file_manifest`` is a pandas.DataFrame containing the file manifest, and | ||
``experiment_metadata`` is a pandas.DataFrame containing the experiment metadata. | ||
To obtain the same information for the Mouse data, pass ``dataset="mouse"`` instead. | ||
|
||
You can download data files by entering a file's "archive_uri" directly in your browser. | ||
Alternatively, a more powerful option is to install DANDI's command-line client: | ||
|
||
.. code-block:: bash | ||
    pip install dandi | ||
With the client installed, you can download individual files or an entire archive with: | ||
|
||
.. code-block:: bash | ||
    dandi download --output-dir <DIRECTORY> <URL> | ||
where <DIRECTORY> is an existing directory on your file system | ||
and <URL> is the URL of a file or an archive. | ||
|
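The selection workflow described in the data access page (filter the experiment metadata, then look up the matching rows in the file manifest) can be sketched with pandas. This is only a minimal sketch on toy tables: the column names "specimen_id", "cell_specimen_id", "technique" and "archive_uri" come from the page, while the "brain_region" column, every value, the example.org URLs, and the helper `find_ephys_uris` are assumptions made for illustration.

```python
import pandas as pd

# Toy stand-ins for the two tables. All values are made up, and
# "brain_region" is a hypothetical experimental-condition column.
experiment_metadata = pd.DataFrame(
    {
        "specimen_id": [101, 102, 103],
        "brain_region": ["A", "B", "A"],
    }
)
file_manifest = pd.DataFrame(
    {
        "cell_specimen_id": [101, 101, 102, 103],
        "technique": [
            "intracellular_electrophysiology",
            "transcriptomics",
            "intracellular_electrophysiology",
            "intracellular_electrophysiology",
        ],
        "archive_uri": [
            "https://example.org/101.nwb",
            "https://example.org/101.csv",
            "https://example.org/102.nwb",
            "https://example.org/103.nwb",
        ],
    }
)


def find_ephys_uris(metadata, manifest, column, value):
    """Hypothetical helper: URIs of electrophysiology files for the selected cells."""
    # Step 1: select the cells that satisfy the experimental condition.
    specimen_ids = metadata.loc[metadata[column] == value, "specimen_id"]
    # Step 2: keep manifest rows for those cells and for the ephys technique only.
    selected = manifest[
        manifest["cell_specimen_id"].isin(specimen_ids)
        & (manifest["technique"] == "intracellular_electrophysiology")
    ]
    return selected["archive_uri"].tolist()


uris = find_ephys_uris(experiment_metadata, file_manifest, "brain_region", "A")
print(uris)
```

With the real tables returned by `get_archive_info`, the same two steps would apply, using whichever metadata column holds the condition of interest.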
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,59 @@ | ||
import pandas as pd | ||
from typing import Tuple | ||
import os | ||
|
||
PARENT_DIR = os.path.dirname(__file__) | ||
DATA_DIR = os.path.join(PARENT_DIR, "data_release") | ||
ARCHIVE_INFO = pd.DataFrame( | ||
    { | ||
        "dataset": ["human", "mouse"], | ||
        "size (GB)": [12, 114], | ||
        "archive_url": [ | ||
            "https://dandiarchive.org/dandiset/000023", | ||
            "https://dandiarchive.org/dandiset/000020", | ||
        ], | ||
        "file_manifest_path": [ | ||
            os.path.join(DATA_DIR, "2020-06-26_human_file_manifest.csv"), | ||
            os.path.join(DATA_DIR, "2020-06-26_mouse_file_manifest.csv"), | ||
        ], | ||
        "experiment_metadata_path": [ | ||
            os.path.join(DATA_DIR, "20200625_patchseq_metadata_human.csv"), | ||
            os.path.join(DATA_DIR, "20200625_patchseq_metadata_mouse.csv"), | ||
        ], | ||
    } | ||
).set_index("dataset") | ||
|
||
|
||
def get_archive_info( | ||
    dataset: str, | ||
    archive_info: pd.DataFrame = ARCHIVE_INFO, | ||
) -> Tuple[str, pd.DataFrame, pd.DataFrame]: | ||
    """ | ||
    Provide information about a released archive. | ||
    Parameters | ||
    ---------- | ||
    dataset : name of the dataset to query. Currently supported options are: | ||
        - human | ||
        - mouse | ||
    archive_info : dataframe of metadata and manifest files for each supported | ||
        dataset. Dataset name is the index. | ||
    Returns | ||
    ------- | ||
    archive_url : DANDI URL of the archive | ||
    file_manifest : dataframe read from the file manifest CSV | ||
    experiment_metadata : dataframe read from the experiment metadata CSV | ||
    """ | ||
|
||
    if dataset in archive_info.index.values: | ||
        file_manifest_path = archive_info.at[dataset, "file_manifest_path"] | ||
        metadata_path = archive_info.at[dataset, "experiment_metadata_path"] | ||
        archive_url = archive_info.at[dataset, "archive_url"] | ||
    else: | ||
        raise ValueError( | ||
            f"No archive for the dataset '{dataset}'. Choose from the known " | ||
            f"datasets: {archive_info.index.values}" | ||
        ) | ||
|
||
    file_manifest = pd.read_csv(file_manifest_path) | ||
    experiment_metadata = pd.read_csv(metadata_path) | ||
    return archive_url, file_manifest, experiment_metadata |
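Because `get_archive_info` takes the lookup table as its `archive_info` parameter, it can be exercised against a substitute table instead of the packaged CSV files. The sketch below illustrates that idea under stated assumptions: it uses a local copy of the lookup logic from this diff (so it runs without ipfx installed), and the "toy" dataset, its temporary CSV files, and the example.org URLs are all made up.

```python
import os
import tempfile
from typing import Tuple

import pandas as pd


def get_archive_info(
    dataset: str, archive_info: pd.DataFrame
) -> Tuple[str, pd.DataFrame, pd.DataFrame]:
    """Local copy of the lookup logic shown in the diff (no default table here)."""
    if dataset not in archive_info.index.values:
        raise ValueError(
            f"No archive for the dataset '{dataset}'. Choose from the known "
            f"datasets: {archive_info.index.values}"
        )
    file_manifest = pd.read_csv(archive_info.at[dataset, "file_manifest_path"])
    experiment_metadata = pd.read_csv(
        archive_info.at[dataset, "experiment_metadata_path"]
    )
    return archive_info.at[dataset, "archive_url"], file_manifest, experiment_metadata


with tempfile.TemporaryDirectory() as tmp_dir:
    # Write two tiny CSV files that stand in for the packaged tables.
    manifest_path = os.path.join(tmp_dir, "manifest.csv")
    metadata_path = os.path.join(tmp_dir, "metadata.csv")
    pd.DataFrame(
        {"cell_specimen_id": [1], "archive_uri": ["https://example.org/1.nwb"]}
    ).to_csv(manifest_path, index=False)
    pd.DataFrame({"specimen_id": [1]}).to_csv(metadata_path, index=False)

    # Substitute lookup table with the same columns and index as ARCHIVE_INFO.
    toy_info = pd.DataFrame(
        {
            "dataset": ["toy"],
            "archive_url": ["https://example.org/toy"],
            "file_manifest_path": [manifest_path],
            "experiment_metadata_path": [metadata_path],
        }
    ).set_index("dataset")

    archive_url, file_manifest, experiment_metadata = get_archive_info("toy", toy_info)

    # An unknown dataset name should raise ValueError with a helpful message.
    try:
        get_archive_info("rat", toy_info)
        error_message = None
    except ValueError as err:
        error_message = str(err)

print(archive_url, len(file_manifest), len(experiment_metadata))
print(error_message)
```

Passing the table in explicitly keeps the lookup independent of the files shipped in the package, which is convenient for unit tests.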