This repository has been archived by the owner on Nov 29, 2021. It is now read-only.

Configuration_files

tdayris-perso edited this page Jan 6, 2021 · 3 revisions

Configuration

Two configuration files are required for this pipeline:

- `config.yaml` contains command line arguments, reference paths, and system options. This is a [yaml](https://en.wikipedia.org/wiki/YAML) file.
- `design.tsv` contains sample identifiers and paths.

We suggest that you use the provided scripts to build the configuration files, then modify them if needed. Most of the time, the generated files will be sufficient. See:

- `rna-dge-salmon-deseq2.py config --help`
- `rna-dge-salmon-deseq2.py design --help`

However, you can also build them manually: every part of these files is described below.

Automatic configuration building

config.yaml

Use `rna-dge-salmon-deseq2.py`, located in the main folder, to automatically build the `config.yaml` file. It is a Python 3 command-line script, which you can run as soon as you have activated the provided conda environment.

# Activate conda environment
conda activate rna-dge-salmon-deseq2

# Get help about command line arguments
python3.8 /path/to/rna-dge-salmon-deseq2/rna-dge-salmon-deseq2.py config --help

Please find below some usage examples:

# Whole pipeline
python3 /path/to/rna-dge-salmon-deseq2/rna-dge-salmon-deseq2.py config /path/to/gtf/annotation/

# Test this script
make config-tests

design.tsv

Use `rna-dge-salmon-deseq2.py`, located in the main folder, to automatically build the `design.tsv` file. It is a Python 3 command-line script, which you can run as soon as you have activated the provided conda environment.

# Activate conda environment
conda activate rna-dge-salmon-deseq2

# Get help about command line arguments
python3.8 /path/to/rna-dge-salmon-deseq2/rna-dge-salmon-deseq2.py design --help

Please find below some usage examples:

# Single ended reads example:
python3.8 /path/to/rna-dge-salmon-deseq2/rna-dge-salmon-deseq2.py design /path/to/salmon/quant/results/

Detailed content of the config.yaml

Note: a GTF formatted genome annotation is required, since we start from a salmon quantification.

This is a YAML file. The following keys are required (in any order):

cold_storage:
  - Path to cold storage mount point n°1
  - Path to cold storage mount point n°2
  - ...
config: Path to this config file
design: Path to design file
models:
  <Comparison_name>:
    denominator: Reference condition
    factor: Column in the experimental design file (for plotting convenience)
    formula: R statistical formula (used in DESeq2)
    numerator: Tested condition
params:
  DESeq2_extra: Extra parameters for `DESeq2::DESeq()`
  copy_extra: Extra parameters for bash `cp`
  limmaquickpca2go_extra: Extra parameters for `pcaExplorer::limmaquickpca2go()`
  pca_axes_depth: Maximum number of PCA axes to plot.
  pcaexplorer_distro_expr: Extra parameters for `pcaExplorer::distro_expr()`
  pcaexplorer_pair_corr: Extra parameters for `pcaExplorer::pair_corr()`
  pcaexplorer_pcacorrs: Extra parameters for `pcaExplorer::pcacorrs()`
  pcaexplorer_scree: Extra parameters for `pcaExplorer::pcascree()`
  tximport_extra: Extra parameters for `tximport::tximport()`
pipeline:
  additional_figures: Plot additional figures, or don't
  deseq2: Run DESeq2, or don't
  gseaapp: Subset and prepare TSV files for GSEAapp, or don't.
  multiqc: Run MultiQC with additional integrated figures, or don't.
  pca_explorer: Run pcaExplorer, or don't.
ref:
  gtf: Path to a GTF file
singularity_docker_image: Used docker/singularity image (must contain conda)
threads: Maximum number of threads used
thresholds:
  alpha_threshold: Alpha risk threshold on DESeq2 plots
  fc_threshold: Fold change threshold on DESeq2 plots
workdir: Path to working directory
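
Since every key listed above is required, a quick check before launching the pipeline can catch an incomplete file early. The following is a minimal sketch, not part of `rna-dge-salmon-deseq2`: the helper name `check_config` is hypothetical, and it assumes the configuration has already been loaded into a Python dictionary by the YAML parser of your choice.

```python
# Hypothetical helper, not shipped with rna-dge-salmon-deseq2.
# Key names are taken from the list of required keys documented above.
REQUIRED_KEYS = {
    "cold_storage", "config", "design", "models", "params", "pipeline",
    "ref", "singularity_docker_image", "threads", "thresholds", "workdir",
}
MODEL_KEYS = {"denominator", "factor", "formula", "numerator"}


def check_config(config):
    """Return a list of human-readable problems found in `config`."""
    problems = [
        f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(config))
    ]
    # Each comparison under `models` needs its own four sub-keys.
    for name, model in (config.get("models") or {}).items():
        for key in sorted(MODEL_KEYS - set(model)):
            problems.append(f"model {name}: missing key: {key}")
    return problems


# Deliberately incomplete example; all values are placeholders.
example = {
    "config": "config.yaml",
    "design": "design.tsv",
    "models": {
        "B_vs_A": {"factor": "Factor1", "formula": "~Factor1", "numerator": "B"}
    },
}
for problem in check_config(example):
    print(problem)
```

An empty list means that all documented keys are present; the sketch does not verify that paths exist or that values have the expected type.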

Detailed content of the design.tsv

This is a TSV file describing our analysis. The column order is not relevant. If you want to build it manually, use your favorite tabular-file editor.

It must contain the following columns:

* Sample_id: the name of each sample
* Salmon: path to the quantification directory

The optional columns are:

* Any experimental design factor you might want
* Any other information

A minimal example would be:

| Sample_id | Salmon | Factor1 |
| --- | --- | --- |
| Sample 1 | /path/to/salmon/quant/sample1/ | A |
| Sample 2 | /path/to/salmon/quant/sample2/ | B |
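
If you build the design file by hand, it can help to confirm that it parses as tab-separated text and contains the two mandatory columns. The sketch below uses only the Python standard library; the function name `read_design` is hypothetical, and the sample names, paths, and factor levels are placeholders rather than real data.

```python
import csv
import io

# Mandatory columns, as documented above.
REQUIRED_COLUMNS = ("Sample_id", "Salmon")


def read_design(handle):
    """Parse a tab-separated design and return one dict per sample."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"design is missing column(s): {', '.join(missing)}")
    return list(reader)


# Illustrative content only; an in-memory string stands in for design.tsv.
design_text = (
    "Sample_id\tSalmon\tFactor1\n"
    "Sample1\t/path/to/salmon/quant/sample1/\tA\n"
    "Sample2\t/path/to/salmon/quant/sample2/\tB\n"
)
samples = read_design(io.StringIO(design_text))

# Levels of an optional factor column, e.g. to compare against the
# numerator and denominator of a model in config.yaml.
levels = sorted({row["Factor1"] for row in samples})
print(levels)
```

For a real file, pass an open file handle instead, for example `open("design.tsv", newline="")`.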