docs(changes-to-home.md,-usage.md,-specs.md,-installation.md): added …
…some more details to installation.md, added some figures to index.md, changed specs.md to be more about all file specs and added links betw usage and specs files
JRWallace committed Aug 20, 2019
1 parent fb7cf06 commit 84ba247
Showing 4 changed files with 128 additions and 62 deletions.
7 changes: 5 additions & 2 deletions docs/index.md
@@ -8,11 +8,14 @@

---

### `RefChef` is a reference management system that includes additional tools to record the provenance of reference sequences, indices, and annotations. It was created to enable reproducible research.

`RefChef` is a reference management system that includes additional tools to record the provenance of reference sequences, indices, and annotations. It was created to enable reproducible research.
---

`RefChef` will:

1. Document the exact steps undertaken in the retrieval and processing of genomic references
2. Maintain the associated metadata
3. Provide a mechanism for automatically reproducing retrieval and creation of an exact copy of genomic references
3. Provide a mechanism for automatically reproducing retrieval and creation of an exact copy of genomic references

![Diagram](assets/refchef_overview.svg)
11 changes: 9 additions & 2 deletions docs/installation.md
@@ -17,7 +17,7 @@ Run unit tests as:
`python setup.py test`

### Set up GitHub Access Token and `.env` file
RefChef uses Git and GitHub for version control of the `master.yaml` file that contains a list of all the references on the system. To use RefChef, create a GitHub account and set up an [access token](https://help.github.com/en/articles/creating-a-personal-access-token-for-the-command-line).
RefChef uses Git and GitHub for version control of the `master.yaml` file, which contains a list of all the references on the system and their provenance. To use RefChef, create a GitHub account and set up an [access token](https://help.github.com/en/articles/creating-a-personal-access-token-for-the-command-line).
![](assets/github_token.png)

Additionally, create a [`.gitignore` file](https://help.github.com/en/articles/ignoring-files)...
@@ -38,7 +38,14 @@ Now create a `.env` file...
touch .env
```

...and paste the GitHub access token into the `.env` file and the `.env.template` file in the `RefChef` home directory.
... and paste the contents of the `.env.template` file in the `RefChef` home directory into the `.env` file, which will now look like this:

```bash
GITHUB_TOKEN=
```

Then, paste the GitHub access token after `GITHUB_TOKEN=` on the line copied over from the `.env.template` file. For example, your `.env` file might now look like this:

```bash
GITHUB_TOKEN=5c25370fcf7db4a676d98d72700e2922654485ed
```
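
To confirm that the token was pasted correctly, the `.env` file can be read back with a few lines of Python. This is a minimal sketch, not RefChef's own loader: the helper names `read_env_file` and `token_is_set` are hypothetical, and the parser assumes plain `KEY=VALUE` lines as in the example above.

```python
def read_env_file(path):
    """Parse KEY=VALUE lines from a .env-style file into a dict."""
    values = {}
    with open(path) as handle:
        for raw_line in handle:
            line = raw_line.strip()
            # Skip blank lines, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip()
    return values


def token_is_set(path):
    """Return True when GITHUB_TOKEN is present and non-empty (hypothetical check)."""
    return bool(read_env_file(path).get("GITHUB_TOKEN"))
```

Calling `token_is_set(".env")` from the `RefChef` home directory should return `False` for the untouched template line and `True` once a token has been pasted in.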
170 changes: 113 additions & 57 deletions docs/specs.md
@@ -1,71 +1,127 @@
# Specifications for `master.yaml`
---
```yaml
reference_test1:
metadata:
name: reference_test1
species: mouse
organization: ucsc
downloader: fgelin
levels:
references:
- component: primary
complete:
status: false
commands:
- wget -nv https://s3.us-east-2.amazonaws.com/refchef-tests/chr1.fa.gz
- md5 *.fa.gz > postdownload_checksums.md5
- gunzip *.gz
- md5 *.fa > final_checksums.md5
```
The `master.yaml` file is the main source of information that RefChef uses to retrieve references, indices, and annotations.

### Specifications
# Specifications for `master.yaml` <a name="master.yaml"></a>

The `master.yaml` file is the main source of information that RefChef uses to retrieve references, indices, and annotations. It is composed of code blocks that each contain three distinct sections:

1. key
2. metadata
3. levels

For example:

![Diagram](assets/yamlsections.svg)

See the [`master.yaml` overview and usage](./usage.md#master.yaml) for more information.

Each block starts with a `key`, which should be <reference_name\> (the name of the reference).

The `metadata` section consists of:

>`metadata.name`
>Expected format: <reference_name\> string, should be the same as the block's `key`
>`metadata.common_name`
>Expected format: string
>`metadata.ncbi_taxon_id`
>Expected format: integer, based on [NCBI taxon ID](https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi)
>`metadata.organism`
>Expected format: string
>`metadata.organization`
>Expected format: string
>`metadata.custom`
>Expected format: string
>`metadata.description`
>Expected format: string
>`metadata.downloader`
>Expected format: string
>`metadata.ensembl_release_number`
>Expected format: integer
>>`metadata.accession.genbank`
>>Expected format: string
>>`metadata.accession.refseq`
>>Expected format: string
The `levels` section consists of:

>`levels.<type>`
>Where <type\>: `references`, `annotations`, or `indices`
>>`levels.<type>.- component`
>>Expected format: string
>>>`levels.<type>.complete.status`
>>>Expected format: boolean (note that if `complete.status` is set to `true` RefChef will skip the current block and not retrieve any file. RefChef automatically changes the status to `true` after retrieving files for the first time.)
>>`levels.<type>.src`
Expected format: UUID string of an existing reference. When adding an index file for a reference, RefChef will create a symlink to the index files in the reference folder.

>>`levels.<type>.commands`
Expected format: list of commands to download and process each reference. Each command should start with `- `.

After RefChef-cook is run and references are downloaded, `levels.<type>.complete.status: false` will change to `levels.<type>.complete.status: true` and the following fields will be added to `master.yaml`:

>>>`levels.<type>.complete.time`
>>>Expected format: RefChef will autopopulate this field with the date and time at which the reference was downloaded if `levels.<type>.complete.status: true`
>>`levels.<type>.location`
Expected format: RefChef will autopopulate this field with the directory where downloaded files are stored if `levels.<type>.complete.status: true`
>>`levels.<type>.files`
Expected format: RefChef will autopopulate this field with a list of files that were downloaded if `levels.<type>.complete.status: true`
>>`levels.<type>.uuid`
Expected format: RefChef will autopopulate this field with a UUID for your reference file if `levels.<type>.complete.status: true`
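
The field list above can be turned into a small check on an already-parsed block. The sketch below is illustrative only and is not RefChef code: the helper names `check_block` and `pending_entries` are hypothetical, the block is given as a plain Python dict rather than loaded from `master.yaml`, and every metadata field other than `name` is assumed to be optional, since the spec does not say which fields are required. `pending_entries` mirrors the rule that entries with `complete.status: true` are skipped.

```python
# Field names and types transcribed from the spec above; the helper names
# (check_block, pending_entries) are hypothetical and not part of RefChef.
METADATA_TYPES = {
    "name": str,
    "common_name": str,
    "ncbi_taxon_id": int,
    "organism": str,
    "organization": str,
    "custom": str,
    "description": str,
    "downloader": str,
    "ensembl_release_number": int,
}
LEVEL_TYPES = ("references", "annotations", "indices")


def check_block(key, block):
    """Return a list of human-readable problems found in one block."""
    problems = []
    metadata = block.get("metadata", {})
    if metadata.get("name") != key:
        problems.append("metadata.name should match the block key")
    for field, expected in METADATA_TYPES.items():
        # Assumption: a missing field is acceptable; only wrong types are flagged.
        if field in metadata and not isinstance(metadata[field], expected):
            problems.append(f"metadata.{field} should be {expected.__name__}")
    for level, entries in block.get("levels", {}).items():
        if level not in LEVEL_TYPES:
            problems.append(f"unknown level type: {level}")
            continue
        for entry in entries:
            if not isinstance(entry.get("complete", {}).get("status"), bool):
                problems.append(f"levels.{level}.complete.status should be boolean")
            if not isinstance(entry.get("commands", []), list):
                problems.append(f"levels.{level}.commands should be a list")
    return problems


def pending_entries(block):
    """List (level, component) pairs whose complete.status is not true."""
    return [
        (level, entry.get("component"))
        for level, entries in block.get("levels", {}).items()
        for entry in entries
        if entry.get("complete", {}).get("status") is not True
    ]


example = {
    "metadata": {"name": "reference_test1", "organization": "ucsc"},
    "levels": {
        "references": [
            {
                "component": "primary",
                "complete": {"status": False},
                "commands": ["gunzip *.gz", "md5 *.fa > final_checksums.md5"],
            }
        ]
    },
}
print(check_block("reference_test1", example))
print(pending_entries(example))
```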
---

Each block has a key with the name of the reference, index, or annotation.
# Specifications for `cfg.yaml` <a name="cfg.yaml"></a>

If using a `cfg.yaml` file, it should follow these specs:

>>`config-yaml.path-settings.reference-directory`
Expected format: String, path to reference storage directory

`reference_name.metadata`
Expected format: key - value mapping
>>`config-yaml.path-settings.git-directory`
Expected format: String, path to local git repository

`reference_name.metadata.name`
Expected format: <reference_name> string, should be the same as the block's key
>>`config-yaml.path-settings.remote-repository`
Expected format: String, remote git repository, should be in the format of `user/repo`

>>`config-yaml.log-settings.log`
Expected format: String, should be either 'yes' or 'no' in single quotes, indicating whether or not log files will be made

Also see the [`cfg.yaml` overview and example.](./usage.md#cfg.yaml)
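
As a rough illustration of these rules, the sketch below checks an already-parsed `cfg.yaml` given as a plain Python dict. It is not RefChef's own validation: `check_cfg` and `REPO_PATTERN` are hypothetical names, and the nesting under a top-level `config-yaml` key is inferred from the field paths listed above.

```python
import re

# Hypothetical pattern for the `user/repo` form: two non-empty parts, one slash.
REPO_PATTERN = re.compile(r"^[^/\s]+/[^/\s]+$")


def check_cfg(cfg):
    """Return a list of problems found in a parsed cfg.yaml mapping."""
    problems = []
    root = cfg.get("config-yaml", {})
    paths = root.get("path-settings", {})
    for field in ("reference-directory", "git-directory", "remote-repository"):
        if not isinstance(paths.get(field), str):
            problems.append(f"path-settings.{field} should be a string")
    repo = paths.get("remote-repository")
    if isinstance(repo, str) and not REPO_PATTERN.match(repo):
        problems.append("path-settings.remote-repository should look like user/repo")
    # The value must be the string 'yes' or 'no', not a boolean.
    if root.get("log-settings", {}).get("log") not in ("yes", "no"):
        problems.append("log-settings.log should be 'yes' or 'no'")
    return problems
```

The single quotes around `'yes'` and `'no'` matter because YAML 1.1 parsers such as PyYAML read an unquoted `yes` or `no` as a boolean, which a check like this one would reject.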

---
# Specifications for `cfg.ini` <a name="cfg.ini"></a>

`reference_name.metadata.species`
Expected format: string
If using a `cfg.ini` file, it should follow these specs:

`reference_name.metadata.organization`
Expected format: string
`[path-settings].reference-directory=`
Expected format: String, path to reference storage directory

`reference_name.metadata.downloader`
Expected format: string
`[path-settings].git-directory=`
Expected format: String, path to local git repository

`reference_name.levels`
Expected format: key - value mapping
`[path-settings].remote-repository=`
Expected format: String, remote git repository, should be in the format of `user/repo`

`reference_name.levels.<type>`
Where <type\>: `references`, `annotations`, or `indices`
Expected format: list of key - value mappings
`[log-settings].log=`
Expected format: String, should be either 'yes' or 'no', indicating whether or not log files will be made

> `reference_name.levels.<type>.-`
`[runtime-settings].break-on-error=`
Expected format: String, should be either 'yes' or 'no', indicating how RefChef should respond when encountering an error

> `component`
Expected format: string
`complete.status`
Expected format: boolean (note that if `complete.status` is set to `true` RefChef will skip the current block and not retrieve any file. RefChef automatically changes the status to true after retrieving files for the first time.)
`src`
Expected format: UUID v4, or string. If a UUID of an existing reference is entered, RefChef will create a symlink to the index files from the reference folder.
`commands`
Expected format: list of strings
`[runtime-settings].verbose=`
Expected format: String, should be either 'yes' or 'no', toggles between verbosity output settings

After RefChef runs and retrieves the files, the following fields will be appended to `master.yaml`:
Also see the [`cfg.ini` overview and example.](./usage.md#cfg.ini)
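
A file with this layout can be read using Python's standard `configparser` module. The sketch below is an assumption about how such a file might be consumed, not RefChef's actual loader: `load_settings` is a hypothetical helper, and the paths in `SAMPLE` are placeholders.

```python
import configparser

# Placeholder values; section and option names follow the spec above.
SAMPLE = """
[path-settings]
reference-directory=/tmp/references
git-directory=/tmp/refchef-git
remote-repository=user/repo

[log-settings]
log=yes

[runtime-settings]
break-on-error=yes
verbose=no
"""


def load_settings(text):
    """Parse cfg.ini-style text into a flat dict (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "reference_directory": parser.get("path-settings", "reference-directory"),
        "git_directory": parser.get("path-settings", "git-directory"),
        "remote_repository": parser.get("path-settings", "remote-repository"),
        # getboolean() accepts yes/no (among others) and returns True/False.
        "log": parser.getboolean("log-settings", "log"),
        "break_on_error": parser.getboolean("runtime-settings", "break-on-error"),
        "verbose": parser.getboolean("runtime-settings", "verbose"),
    }


settings = load_settings(SAMPLE)
print(settings["remote_repository"], settings["log"], settings["verbose"])
```

Unlike `cfg.yaml`, the `yes` and `no` values need no quoting here, because INI values are always read as strings and only converted on request.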

>`reference_name.levels.<type>.-`

> `location`
Expected format: string
`files`
Expected format: list of strings
`uuid`
Expected format: UUID v4
2 changes: 1 addition & 1 deletion mkdocs.yml
@@ -28,7 +28,7 @@ nav:
- Home: 'index.md'
- Installation: 'installation.md'
- Usage: 'usage.md'
- YAML specs: 'specs.md'
- File specs: 'specs.md'
- RefChef serve: 'serve.md'
- Tutorials:
- QuickStart: tutorials/quickstart.md
