Merge pull request #204 from MaxUlysse/ControlFREEC
Control-FREEC improvements
maxulysse authored May 18, 2020
2 parents 5db3d24 + edd53b1 commit fdbdaac
Showing 8 changed files with 690 additions and 526 deletions.
20 changes: 14 additions & 6 deletions .circleci/config.yml
@@ -6,39 +6,42 @@ jobs:
environment:
GENOME: GRCh37
SNPEFF_CACHE_VERSION: "75"
SAREK_TAG: dev
steps:
- checkout
- setup_remote_docker
- run:
command: docker build -t nfcore/sareksnpeff:dev.${GENOME} containers/snpeff/. --build-arg GENOME=${GENOME} --build-arg SNPEFF_CACHE_VERSION=${SNPEFF_CACHE_VERSION}
command: docker build -t nfcore/sareksnpeff:${SAREK_TAG}.${GENOME} containers/snpeff/. --build-arg GENOME=${GENOME} --build-arg SNPEFF_CACHE_VERSION=${SNPEFF_CACHE_VERSION}
- run:
command: |
echo "$DOCKERHUB_PASS" | docker login -u "$DOCKERHUB_USERNAME" --password-stdin
docker push nfcore/sareksnpeff:dev.${GENOME}
command: echo "$DOCKERHUB_PASS" | docker login -u "$DOCKERHUB_USERNAME" --password-stdin ; docker push nfcore/sareksnpeff:${SAREK_TAG}.${GENOME}

snpeffgrch38:
<< : *buildsnpeff
environment:
GENOME: GRCh38
SNPEFF_CACHE_VERSION: "86"
SAREK_TAG: dev

snpeffgrcm38:
<< : *buildsnpeff
environment:
GENOME: GRCm38
SNPEFF_CACHE_VERSION: "86"
SAREK_TAG: dev

snpeffcanfam3_1:
<< : *buildsnpeff
environment:
GENOME: CanFam3.1
SNPEFF_CACHE_VERSION: "86"
SAREK_TAG: dev

snpeffwbcel235:
<< : *buildsnpeff
environment:
GENOME: WBcel235
SNPEFF_CACHE_VERSION: "86"
SAREK_TAG: dev

vepgrch37: &buildvep
docker:
@@ -47,42 +50,47 @@ jobs:
GENOME: GRCh37
SPECIES: homo_sapiens
VEP_VERSION: "99"
SAREK_TAG: dev
steps:
- checkout
- setup_remote_docker
- run:
command: docker build -t nfcore/sarekvep:dev.${GENOME} containers/vep/. --build-arg GENOME=${GENOME} --build-arg SPECIES=${SPECIES} --build-arg VEP_VERSION=${VEP_VERSION}
command: docker build -t nfcore/sarekvep:${SAREK_TAG}.${GENOME} containers/vep/. --build-arg GENOME=${GENOME} --build-arg SPECIES=${SPECIES} --build-arg VEP_VERSION=${VEP_VERSION}
no_output_timeout: 3h
- run:
command: echo "$DOCKERHUB_PASS" | docker login -u "$DOCKERHUB_USERNAME" --password-stdin ; docker push nfcore/sarekvep:dev.${GENOME}
command: echo "$DOCKERHUB_PASS" | docker login -u "$DOCKERHUB_USERNAME" --password-stdin ; docker push nfcore/sarekvep:${SAREK_TAG}.${GENOME}

vepgrch38:
<< : *buildvep
environment:
GENOME: GRCh38
SPECIES: homo_sapiens
VEP_VERSION: "99"
SAREK_TAG: dev

vepgrcm38:
<< : *buildvep
environment:
GENOME: GRCm38
SPECIES: mus_musculus
VEP_VERSION: "99"
SAREK_TAG: dev

vepcanfam3_1:
<< : *buildvep
environment:
GENOME: CanFam3.1
SPECIES: canis_familiaris
VEP_VERSION: "99"
SAREK_TAG: dev

vepwbcel235:
<< : *buildvep
environment:
GENOME: WBcel235
SPECIES: caenorhabditis_elegans
VEP_VERSION: "99"
SAREK_TAG: dev

workflows:
version: 2
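
The jobs above replace the hard-coded `dev` in each image tag with `${SAREK_TAG}`, so every image is tagged `<SAREK_TAG>.<GENOME>`. A minimal sketch of that naming scheme in Python, for illustration only: the helper `image_name` is hypothetical and is not part of Sarek or CircleCI, and the genome list is copied from the jobs shown in this diff.

```python
# Genomes listed in the CircleCI jobs of this diff.
GENOMES = ["GRCh37", "GRCh38", "GRCm38", "CanFam3.1", "WBcel235"]


def image_name(tool: str, sarek_tag: str, genome: str) -> str:
    """Mirror the shell expansion nfcore/sarek<tool>:${SAREK_TAG}.${GENOME}.

    `tool` is assumed to be "snpeff" or "vep", matching the two image
    families built in the config.
    """
    return f"nfcore/sarek{tool}:{sarek_tag}.{genome}"


if __name__ == "__main__":
    # Print every image name the jobs would build with SAREK_TAG=dev.
    for tool in ("snpeff", "vep"):
        for genome in GENOMES:
            print(image_name(tool, "dev", genome))
```

Keeping the tag in one variable means a release build would only need a different `SAREK_TAG` value, with the `docker build` and `docker push` commands left unchanged.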
6 changes: 3 additions & 3 deletions CHANGELOG.md
@@ -30,6 +30,7 @@ Piellorieppe is one of the main massif in the Sarek National Park.
- [#195](https://github.com/nf-core/sarek/pull/195) - Now creating TSV for duplicates marked BAMs in minimal setting
- [#195](https://github.com/nf-core/sarek/pull/195), [#202](https://github.com/nf-core/sarek/pull/202) - Add `--save_bam_mapped` params to save mapped BAMs.
- [#197](https://github.com/nf-core/sarek/pull/197) - Add step `prepare_recalibration` to allow restart from DuplicatesMarked BAMs
- [#204](https://github.com/nf-core/sarek/pull/204) - Add step `Control-FREEC` to allow restart from pileup files

### Changed

@@ -56,11 +57,10 @@ Piellorieppe is one of the main massif in the Sarek National Park.
- [#141](https://github.com/nf-core/sarek/pull/141) - Update `VEP` databases to `99`
- [#143](https://github.com/nf-core/sarek/pull/143) - Revert `snpEff` cache version to `75` for `GRCh37`
- [#143](https://github.com/nf-core/sarek/pull/143) - Revert `snpEff` cache version to `86` for `GRCh38`
- [#152](https://github.com/nf-core/sarek/pull/152), [#158](https://github.com/nf-core/sarek/pull/158), [#164](https://github.com/nf-core/sarek/pull/164), [#174](https://github.com/nf-core/sarek/pull/174), [#194](https://github.com/nf-core/sarek/pull/194) - Update docs
- [#152](https://github.com/nf-core/sarek/pull/152), [#158](https://github.com/nf-core/sarek/pull/158), [#164](https://github.com/nf-core/sarek/pull/164), [#174](https://github.com/nf-core/sarek/pull/174), [#194](https://github.com/nf-core/sarek/pull/194), [#198](https://github.com/nf-core/sarek/pull/198), [#204](https://github.com/nf-core/sarek/pull/204) - Update docs
- [#164](https://github.com/nf-core/sarek/pull/164) - Update `gatk4-spark` from `4.1.4.1` to `4.1.6.0`
- [#180](https://github.com/nf-core/sarek/pull/180), [#195](https://github.com/nf-core/sarek/pull/195) - Improve minimal setting
- [#183](https://github.com/nf-core/sarek/pull/183) - Update `input.md` documentation
- [#198](https://github.com/nf-core/sarek/pull/198) - Update docs
- [#183](https://github.com/nf-core/sarek/pull/183), [#204](https://github.com/nf-core/sarek/pull/204) - Update `input.md` documentation

### Fixed

2 changes: 2 additions & 0 deletions conf/genomes.config
@@ -25,6 +25,7 @@ params {
intervals = "${params.genomes_base}/wgs_calling_regions_Sarek.list"
known_indels = "${params.genomes_base}/{1000G_phase1,Mills_and_1000G_gold_standard}.indels.b37.vcf"
known_indels_index = "${params.genomes_base}/{1000G_phase1,Mills_and_1000G_gold_standard}.indels.b37.vcf.idx"
mappability = "${params.genomes_base}/out100m2_hg19.gem"
snpeff_db = 'GRCh37.75'
species = 'homo_sapiens'
vep_cache_version = '99'
@@ -45,6 +46,7 @@ params {
intervals = "${params.genomes_base}/wgs_calling_regions.hg38.bed"
known_indels = "${params.genomes_base}/{Mills_and_1000G_gold_standard.indels.hg38,beta/Homo_sapiens_assembly38.known_indels}.vcf.gz"
known_indels_index = "${params.genomes_base}/{Mills_and_1000G_gold_standard.indels.hg38,beta/Homo_sapiens_assembly38.known_indels}.vcf.gz.tbi"
mappability = "${params.genomes_base}/out100m2_hg38.gem"
snpeff_db = 'GRCh38.86'
species = 'homo_sapiens'
vep_cache_version = '99'
3 changes: 3 additions & 0 deletions conf/igenomes.config
@@ -25,6 +25,7 @@ params {
intervals = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh37/Annotation/intervals/wgs_calling_regions_Sarek.list"
known_indels = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh37/Annotation/GATKBundle/{1000G_phase1,Mills_and_1000G_gold_standard}.indels.b37.vcf"
known_indels_index = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh37/Annotation/GATKBundle/{1000G_phase1,Mills_and_1000G_gold_standard}.indels.b37.vcf.idx"
mappability = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh37/Annotation/Control-FREEC/out100m2_hg19.gem"
snpeff_db = 'GRCh37.75'
species = 'homo_sapiens'
vep_cache_version = '99'
@@ -45,6 +46,7 @@ params {
intervals = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh38/Annotation/intervals/wgs_calling_regions.hg38.bed"
known_indels = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh38/Annotation/GATKBundle/{Mills_and_1000G_gold_standard.indels.hg38,beta/Homo_sapiens_assembly38.known_indels}.vcf.gz"
known_indels_index = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh38/Annotation/GATKBundle/{Mills_and_1000G_gold_standard.indels.hg38,beta/Homo_sapiens_assembly38.known_indels}.vcf.gz.tbi"
mappability = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh38/Annotation/Control-FREEC/out100m2_hg38.gem"
snpeff_db = 'GRCh38.86'
species = 'homo_sapiens'
vep_cache_version = '99'
@@ -61,6 +63,7 @@ params {
intervals = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/intervals/GRCm38_calling_list.bed"
known_indels = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/MouseGenomeProject/mgp.v5.merged.indels.dbSNP142.normed.vcf.gz"
known_indels_index = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/MouseGenomeProject/mgp.v5.merged.indels.dbSNP142.normed.vcf.gz.tbi"
mappability = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/Control-FREEC/GRCm38_68_mm10.gem"
snpeff_db = 'GRCm38.86'
species = 'mus_musculus'
vep_cache_version = '99'
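
Each new `mappability` entry is the igenomes base path joined with a genome-specific relative path. A rough Python sketch of that lookup, under stated assumptions: the function `mappability_path` and the example base directory are hypothetical, and the table holds only the three entries visible in this diff of `conf/igenomes.config`.

```python
from typing import Optional

# Relative mappability paths copied from conf/igenomes.config in this diff.
MAPPABILITY = {
    "GRCh37": "Homo_sapiens/GATK/GRCh37/Annotation/Control-FREEC/out100m2_hg19.gem",
    "GRCh38": "Homo_sapiens/GATK/GRCh38/Annotation/Control-FREEC/out100m2_hg38.gem",
    "GRCm38": "Mus_musculus/Ensembl/GRCm38/Annotation/Control-FREEC/GRCm38_68_mm10.gem",
}


def mappability_path(genome: str, igenomes_base: str) -> Optional[str]:
    """Return the full mappability path, or None if this diff adds no entry."""
    relative = MAPPABILITY.get(genome)
    if relative is None:
        return None
    # Strip a trailing slash so the joined path has exactly one separator.
    return f"{igenomes_base.rstrip('/')}/{relative}"


if __name__ == "__main__":
    # "/data/igenomes" is a placeholder base directory, not a Sarek default.
    print(mappability_path("GRCh38", "/data/igenomes"))
    print(mappability_path("WBcel235", "/data/igenomes"))
```

Returning `None` for genomes without an entry is a design choice of this sketch; it does not describe how Sarek itself handles a missing `mappability` value.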
22 changes: 21 additions & 1 deletion docs/input.md
@@ -16,8 +16,9 @@ For all possible TSV files, described in the next sections, here is an explanati
- `bam` is the path to the bam file
- `bai` is the path to the bam index file
- `recaltable` is the path to the recalibration table
- `mpileup` is the path to the mpileup file

It is recommended to add the absolute path of the files, but relative path should work also.
It is recommended to use absolute paths for the files, but relative paths should also work.
Note, the delimiter is the tab (`\t`) character.

All examples are given for a normal/tumor pair.
@@ -172,6 +173,25 @@ When starting Sarek from the mapping or recalibrate steps, a TSV file is generat

Additionally, individual TSV files for each sample (`recalibrated_[SAMPLE].tsv`) can be found in the same directory.

## Starting from the mpileup file with the Control-FREEC step

To start from the Control-FREEC step (`--step Control-FREEC`), provide as input a TSV file for a normal/tumor pair that contains the paths to the mpileup files.
The TSV needs to contain the following columns:

- `subject sex status sample mpileup`

For example, for a normal/tumor pair with existing pileup files, use a structure like:

```text
G15511 XX 0 C09DFN pathToFiles/G15511.C09DFN.pileup
G15511 XX 1 D0ENMT pathToFiles/G15511.D0ENMT.pileup
```

When starting Sarek from the Control-FREEC step, a TSV file is generated automatically after the `mpileup` process.
This TSV file is stored under `results/VariantCalling/TSV/control-freec_mpileup.tsv` and can be used to restart Sarek from the mpileup files. Running with `--step Control-FREEC` will automatically take this file as input.

Additionally, individual TSV files for each sample (`control-freec_mpileup_[SAMPLE].tsv`) can be found in the same directory.
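
The five-column layout described above can be checked before launching a run. The following is a minimal sketch, not Sarek's own parser: `PileupRow` and `read_pileup_tsv` are hypothetical names, and restricting `status` to `0` or `1` is an assumption based on the normal/tumor example.

```python
import csv
import io
from typing import List, NamedTuple, TextIO


class PileupRow(NamedTuple):
    subject: str
    sex: str
    status: int  # assumed: 0 or 1, as in the normal/tumor example
    sample: str
    mpileup: str


def read_pileup_tsv(handle: TextIO) -> List[PileupRow]:
    """Parse tab-separated `subject sex status sample mpileup` rows."""
    rows = []
    reader = csv.reader(handle, delimiter="\t")
    for line_number, fields in enumerate(reader, start=1):
        if not fields:
            continue  # skip blank lines
        if len(fields) != 5:
            raise ValueError(
                f"line {line_number}: expected 5 tab-separated columns, "
                f"got {len(fields)}"
            )
        subject, sex, status, sample, mpileup = fields
        if status not in ("0", "1"):
            raise ValueError(f"line {line_number}: unexpected status {status!r}")
        rows.append(PileupRow(subject, sex, int(status), sample, mpileup))
    return rows


if __name__ == "__main__":
    # The two rows from the example above, joined with real tab characters.
    example = (
        "G15511\tXX\t0\tC09DFN\tpathToFiles/G15511.C09DFN.pileup\n"
        "G15511\tXX\t1\tD0ENMT\tpathToFiles/G15511.D0ENMT.pileup\n"
    )
    for row in read_pileup_tsv(io.StringIO(example)):
        print(row)
```

A file that uses spaces instead of tabs fails the column-count check, which is the most likely mistake given that the delimiter must be the tab character.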

## VCF files for annotation

Input files for Sarek can be specified using the path to a VCF directory given to the `--input` command only with the annotation step (`--step annotate`).