Control-FREEC improvements #204

Merged
merged 34 commits into from
May 18, 2020
cb13187
added script to help downstream analysis, adding RankScore for Scout
Oct 17, 2019
8551b2f
nf-core bump-version . 2.5.1
maxulysse Oct 21, 2019
fc1dd2f
reducing CPU usage for Mpileup
Dec 18, 2019
3b11389
First time CF is working with extra pileups
Apr 22, 2020
203b756
yet an other version that somehow worked
Apr 27, 2020
fd80e8c
remove --normal_pileup and --tumor_pileup params
maxulysse May 8, 2020
fad7a6a
update freebayes germline command
ggabernet May 7, 2020
3929a67
update changelog
ggabernet May 7, 2020
5a62d50
fix --save_bam_mapped + small typos
maxulysse May 8, 2020
455fd06
update CHANGELOG
maxulysse May 8, 2020
815c36e
add step controlfreec
maxulysse May 8, 2020
fa09b3d
use nf-core header
maxulysse May 8, 2020
b46f4e2
update config
maxulysse May 8, 2020
5f6eabc
cleanup
maxulysse May 8, 2020
04cb2cd
add mappability files to igenomes
maxulysse May 11, 2020
e5a27b4
better name for tsv file
maxulysse May 11, 2020
c514c3f
fix output directory
maxulysse May 11, 2020
50a584c
enable usage of - in names of tools or steps
maxulysse May 11, 2020
38dd289
better default for coefficientofvariation
maxulysse May 11, 2020
c7aec1a
improve help
maxulysse May 11, 2020
5d9e995
First working version
Apr 24, 2020
1983576
Fixing some null parameters and C&P typos
Apr 24, 2020
1ca1c5a
update and reorder params
maxulysse May 12, 2020
57cfad7
reorder and update params
maxulysse May 12, 2020
1692e5a
update and reorder params
maxulysse May 12, 2020
cda438b
update docs
maxulysse May 12, 2020
4a1cf40
fix prepare_recalibration step
maxulysse May 12, 2020
5b5d1dc
update CHANGELOG
maxulysse May 12, 2020
4e0a5f5
Apply suggestions from code review
maxulysse May 12, 2020
f59bcd0
use only one cpu to generate mpileup for controlfreec
maxulysse May 14, 2020
23c2086
fix conflicts
maxulysse May 15, 2020
8226fd6
spacing
maxulysse May 15, 2020
9a3ef3d
fix merge conflicts
maxulysse May 15, 2020
edd53b1
if step = controlfreec then tools is controlfreec
maxulysse May 18, 2020
20 changes: 14 additions & 6 deletions .circleci/config.yml
@@ -6,39 +6,42 @@ jobs:
environment:
GENOME: GRCh37
SNPEFF_CACHE_VERSION: "75"
SAREK_TAG: dev
steps:
- checkout
- setup_remote_docker
- run:
command: docker build -t nfcore/sareksnpeff:dev.${GENOME} containers/snpeff/. --build-arg GENOME=${GENOME} --build-arg SNPEFF_CACHE_VERSION=${SNPEFF_CACHE_VERSION}
command: docker build -t nfcore/sareksnpeff:${SAREK_TAG}.${GENOME} containers/snpeff/. --build-arg GENOME=${GENOME} --build-arg SNPEFF_CACHE_VERSION=${SNPEFF_CACHE_VERSION}
- run:
command: |
echo "$DOCKERHUB_PASS" | docker login -u "$DOCKERHUB_USERNAME" --password-stdin
docker push nfcore/sareksnpeff:dev.${GENOME}
command: echo "$DOCKERHUB_PASS" | docker login -u "$DOCKERHUB_USERNAME" --password-stdin ; docker push nfcore/sareksnpeff:${SAREK_TAG}.${GENOME}

snpeffgrch38:
<< : *buildsnpeff
environment:
GENOME: GRCh38
SNPEFF_CACHE_VERSION: "86"
SAREK_TAG: dev

snpeffgrcm38:
<< : *buildsnpeff
environment:
GENOME: GRCm38
SNPEFF_CACHE_VERSION: "86"
SAREK_TAG: dev

snpeffcanfam3_1:
<< : *buildsnpeff
environment:
GENOME: CanFam3.1
SNPEFF_CACHE_VERSION: "86"
SAREK_TAG: dev

snpeffwbcel235:
<< : *buildsnpeff
environment:
GENOME: WBcel235
SNPEFF_CACHE_VERSION: "86"
SAREK_TAG: dev

vepgrch37: &buildvep
docker:
@@ -47,42 +50,47 @@ jobs:
GENOME: GRCh37
SPECIES: homo_sapiens
VEP_VERSION: "99"
SAREK_TAG: dev
steps:
- checkout
- setup_remote_docker
- run:
command: docker build -t nfcore/sarekvep:dev.${GENOME} containers/vep/. --build-arg GENOME=${GENOME} --build-arg SPECIES=${SPECIES} --build-arg VEP_VERSION=${VEP_VERSION}
command: docker build -t nfcore/sarekvep:${SAREK_TAG}.${GENOME} containers/vep/. --build-arg GENOME=${GENOME} --build-arg SPECIES=${SPECIES} --build-arg VEP_VERSION=${VEP_VERSION}
no_output_timeout: 3h
- run:
command: echo "$DOCKERHUB_PASS" | docker login -u "$DOCKERHUB_USERNAME" --password-stdin ; docker push nfcore/sarekvep:dev.${GENOME}
command: echo "$DOCKERHUB_PASS" | docker login -u "$DOCKERHUB_USERNAME" --password-stdin ; docker push nfcore/sarekvep:${SAREK_TAG}.${GENOME}

vepgrch38:
<< : *buildvep
environment:
GENOME: GRCh38
SPECIES: homo_sapiens
VEP_VERSION: "99"
SAREK_TAG: dev

vepgrcm38:
<< : *buildvep
environment:
GENOME: GRCm38
SPECIES: mus_musculus
VEP_VERSION: "99"
SAREK_TAG: dev

vepcanfam3_1:
<< : *buildvep
environment:
GENOME: CanFam3.1
SPECIES: canis_familiaris
VEP_VERSION: "99"
SAREK_TAG: dev

vepwbcel235:
<< : *buildvep
environment:
GENOME: WBcel235
SPECIES: caenorhabditis_elegans
VEP_VERSION: "99"
SAREK_TAG: dev

workflows:
version: 2
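
The change above replaces the hard-coded `dev` prefix with `${SAREK_TAG}`, so every image tag is composed as `<repository>:<SAREK_TAG>.<GENOME>`. A minimal Python sketch of that naming scheme follows; the helper name `image_tag` is hypothetical and only illustrates how the tags expand for the genomes listed in this config.

```python
# Genomes taken from the CircleCI jobs shown in this diff.
GENOMES = ["GRCh37", "GRCh38", "GRCm38", "CanFam3.1", "WBcel235"]


def image_tag(repository, sarek_tag, genome):
    """Compose a tag the same way the config does: repo:${SAREK_TAG}.${GENOME}.

    Hypothetical helper, not part of Sarek or CircleCI.
    """
    return f"{repository}:{sarek_tag}.{genome}"


# With SAREK_TAG=dev the result matches the previously hard-coded tags.
snpeff_tags = [image_tag("nfcore/sareksnpeff", "dev", g) for g in GENOMES]
vep_tags = [image_tag("nfcore/sarekvep", "dev", g) for g in GENOMES]

for tag in snpeff_tags + vep_tags:
    print(tag)
```

Keeping the tag in one variable per job means a release build would presumably only need a different `SAREK_TAG` value rather than edits to each `docker build` and `docker push` command.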
10 changes: 5 additions & 5 deletions CHANGELOG.md
@@ -24,12 +24,13 @@ Piellorieppe is one of the main massif in the Sarek National Park.
- [#174](https://github.com/nf-core/sarek/pull/174) - Add `variant_calling.md` documentation
- [#175](https://github.com/nf-core/sarek/pull/175) - Add `Sentieon` documentation
- [#176](https://github.com/nf-core/sarek/pull/176) - Add empty `custom` genome in `genomes.config` to allow genomes that are not in `AWS iGenomes`
- [#179](https://github.com/nf-core/sarek/pull/179) - Add `FreeBayes` germline variant calling
- [#179](https://github.com/nf-core/sarek/pull/179), [#201](https://github.com/nf-core/sarek/pull/201) - Add `FreeBayes` germline variant calling
- [#180](https://github.com/nf-core/sarek/pull/180) - Now saving Mapped BAMs (and creating TSV) in minimal setting
- [#182](https://github.com/nf-core/sarek/pull/182) - Add possibility to run `HaplotypeCaller` without `dbsnp` so it can be used to actually generate vcfs to build a set of known sites (cf [gatkforums](https://gatkforums.broadinstitute.org/gatk/discussion/1247/what-should-i-use-as-known-variants-sites-for-running-tool-x))
- [#195](https://github.com/nf-core/sarek/pull/195) - Now creating TSV for duplicates marked BAMs in minimal setting
- [#195](https://github.com/nf-core/sarek/pull/195) - Add `--save_bam_mapped` params to save mapped BAMs.
- [#195](https://github.com/nf-core/sarek/pull/195), [#202](https://github.com/nf-core/sarek/pull/202) - Add `--save_bam_mapped` params to save mapped BAMs.
- [#197](https://github.com/nf-core/sarek/pull/197) - Add step `prepare_recalibration` to allow restart from DuplicatesMarked BAMs
- [#204](https://github.com/nf-core/sarek/pull/204) - Add step `Control-FREEC` to allow restart from pileup files

### Changed

@@ -56,11 +57,10 @@ Piellorieppe is one of the main massif in the Sarek National Park.
- [#141](https://github.com/nf-core/sarek/pull/141) - Update `VEP` databases to `99`
- [#143](https://github.com/nf-core/sarek/pull/143) - Revert `snpEff` cache version to `75` for `GRCh37`
- [#143](https://github.com/nf-core/sarek/pull/143) - Revert `snpEff` cache version to `86` for `GRCh38`
- [#152](https://github.com/nf-core/sarek/pull/152), [#158](https://github.com/nf-core/sarek/pull/158), [#164](https://github.com/nf-core/sarek/pull/164), [#174](https://github.com/nf-core/sarek/pull/174), [#194](https://github.com/nf-core/sarek/pull/194) - Update docs
- [#152](https://github.com/nf-core/sarek/pull/152), [#158](https://github.com/nf-core/sarek/pull/158), [#164](https://github.com/nf-core/sarek/pull/164), [#174](https://github.com/nf-core/sarek/pull/174), [#194](https://github.com/nf-core/sarek/pull/194), [#198](https://github.com/nf-core/sarek/pull/198), [#204](https://github.com/nf-core/sarek/pull/204) - Update docs
- [#164](https://github.com/nf-core/sarek/pull/164) - Update `gatk4-spark` from `4.1.4.1` to `4.1.6.0`
- [#180](https://github.com/nf-core/sarek/pull/180), [#195](https://github.com/nf-core/sarek/pull/195) - Improve minimal setting
- [#183](https://github.com/nf-core/sarek/pull/183) - Update `input.md` documentation
- [#198](https://github.com/nf-core/sarek/pull/198) - Update docs
- [#183](https://github.com/nf-core/sarek/pull/183), [#204](https://github.com/nf-core/sarek/pull/204) - Update `input.md` documentation

### Fixed

2 changes: 2 additions & 0 deletions conf/genomes.config
@@ -25,6 +25,7 @@ params {
intervals = "${params.genomes_base}/wgs_calling_regions_Sarek.list"
known_indels = "${params.genomes_base}/{1000G_phase1,Mills_and_1000G_gold_standard}.indels.b37.vcf"
known_indels_index = "${params.genomes_base}/{1000G_phase1,Mills_and_1000G_gold_standard}.indels.b37.vcf.idx"
mappability = "${params.genomes_base}/out100m2_hg19.gem"
snpeff_db = 'GRCh37.75'
species = 'homo_sapiens'
vep_cache_version = '99'
@@ -45,6 +46,7 @@ params {
intervals = "${params.genomes_base}/wgs_calling_regions.hg38.bed"
known_indels = "${params.genomes_base}/{Mills_and_1000G_gold_standard.indels.hg38,beta/Homo_sapiens_assembly38.known_indels}.vcf.gz"
known_indels_index = "${params.genomes_base}/{Mills_and_1000G_gold_standard.indels.hg38,beta/Homo_sapiens_assembly38.known_indels}.vcf.gz.tbi"
mappability = "${params.genomes_base}/out100m2_hg38.gem"
snpeff_db = 'GRCh38.86'
species = 'homo_sapiens'
vep_cache_version = '99'
3 changes: 3 additions & 0 deletions conf/igenomes.config
@@ -25,6 +25,7 @@ params {
intervals = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh37/Annotation/intervals/wgs_calling_regions_Sarek.list"
known_indels = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh37/Annotation/GATKBundle/{1000G_phase1,Mills_and_1000G_gold_standard}.indels.b37.vcf"
known_indels_index = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh37/Annotation/GATKBundle/{1000G_phase1,Mills_and_1000G_gold_standard}.indels.b37.vcf.idx"
mappability = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh37/Annotation/Control-FREEC/out100m2_hg19.gem"
snpeff_db = 'GRCh37.75'
species = 'homo_sapiens'
vep_cache_version = '99'
@@ -45,6 +46,7 @@ params {
intervals = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh38/Annotation/intervals/wgs_calling_regions.hg38.bed"
known_indels = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh38/Annotation/GATKBundle/{Mills_and_1000G_gold_standard.indels.hg38,beta/Homo_sapiens_assembly38.known_indels}.vcf.gz"
known_indels_index = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh38/Annotation/GATKBundle/{Mills_and_1000G_gold_standard.indels.hg38,beta/Homo_sapiens_assembly38.known_indels}.vcf.gz.tbi"
mappability = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh38/Annotation/Control-FREEC/out100m2_hg38.gem"
snpeff_db = 'GRCh38.86'
species = 'homo_sapiens'
vep_cache_version = '99'
@@ -61,6 +63,7 @@ params {
intervals = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/intervals/GRCm38_calling_list.bed"
known_indels = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/MouseGenomeProject/mgp.v5.merged.indels.dbSNP142.normed.vcf.gz"
known_indels_index = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/MouseGenomeProject/mgp.v5.merged.indels.dbSNP142.normed.vcf.gz.tbi"
mappability = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/Control-FREEC/GRCm38_68_mm10.gem"
snpeff_db = 'GRCm38.86'
species = 'mus_musculus'
vep_cache_version = '99'
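
Only three genomes gain a `mappability` entry in `conf/igenomes.config` in this change. The Python sketch below mirrors those three paths as a lookup table to make that explicit; the function name `mappability_path` and the base directory used in the example are hypothetical, and the real resolution is done by Nextflow string interpolation of `params.igenomes_base`.

```python
# Relative paths copied from the three lines added to conf/igenomes.config.
IGENOMES_MAPPABILITY = {
    "GRCh37": "Homo_sapiens/GATK/GRCh37/Annotation/Control-FREEC/out100m2_hg19.gem",
    "GRCh38": "Homo_sapiens/GATK/GRCh38/Annotation/Control-FREEC/out100m2_hg38.gem",
    "GRCm38": "Mus_musculus/Ensembl/GRCm38/Annotation/Control-FREEC/GRCm38_68_mm10.gem",
}


def mappability_path(igenomes_base, genome):
    """Return the mappability file for a genome, or None if this diff adds none.

    Hypothetical helper for illustration only.
    """
    relative = IGENOMES_MAPPABILITY.get(genome)
    if relative is None:
        return None
    return f"{igenomes_base.rstrip('/')}/{relative}"


# "/data/igenomes" is a placeholder base directory, not a Sarek default.
print(mappability_path("/data/igenomes", "GRCh38"))
print(mappability_path("/data/igenomes", "WBcel235"))
```

For genomes without an entry, a mappability file would have to be supplied some other way; this diff does not show how the pipeline handles that case.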
22 changes: 21 additions & 1 deletion docs/input.md
@@ -16,8 +16,9 @@ For all possible TSV files, described in the next sections, here is an explanati
- `bam` is the path to the bam file
- `bai` is the path to the bam index file
- `recaltable` is the path to the recalibration table
- `mpileup` is the path to the mpileup file

It is recommended to add the absolute path of the files, but relative path should work also.
It is recommended to use absolute paths to the files, but relative paths should also work.
Note, the delimiter is the tab (`\t`) character.

All examples are given for a normal/tumor pair.
@@ -172,6 +173,25 @@ When starting Sarek from the mapping or recalibrate steps, a TSV file is generat

Additionally, individual TSV files for each sample (`recalibrated_[SAMPLE].tsv`) can be found in the same directory.

## Starting from the mpileup file with the Control-FREEC step

To start from the Control-FREEC step (`--step Control-FREEC`), provide a TSV file for a normal/tumor pair that contains the paths to the mpileup files.
The TSV needs to contain the following columns:

- `subject sex status sample mpileup`

For a normal/tumor pair with their mpileup files, the TSV should have a structure like:

```text
G15511 XX 0 C09DFN pathToFiles/G15511.C09DFN.pileup
G15511 XX 1 D0ENMT pathToFiles/G15511.D0ENMT.pileup
```

When starting Sarek from the Control-FREEC step, a TSV file is generated automatically after the `mpileup` process.
This TSV file is stored under `results/VariantCalling/TSV/control-freec_mpileup.tsv` and can be used to restart Sarek from the mpileup files. Running with `--step Control-FREEC` automatically takes this file as input.

Additionally, individual TSV files for each sample (`control-freec_mpileup_[SAMPLE].tsv`) can be found in the same directory.
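
As a quick sanity check before launching the pipeline, the TSV layout described above can be verified with a short script. This is a minimal Python sketch, not part of Sarek: the function name `read_mpileup_tsv` is hypothetical, and the meaning of `status` (0 for normal, 1 for tumor) is an assumption carried over from the other Sarek TSV examples.

```python
import csv
import io

# Column order as documented for the Control-FREEC step.
COLUMNS = ("subject", "sex", "status", "sample", "mpileup")


def read_mpileup_tsv(handle):
    """Parse a tab-delimited Control-FREEC input TSV into a list of dicts.

    Hypothetical helper for illustration only.
    """
    rows = []
    for line_number, fields in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(COLUMNS):
            raise ValueError(
                f"line {line_number}: expected {len(COLUMNS)} tab-separated "
                f"fields, got {len(fields)}"
            )
        row = dict(zip(COLUMNS, fields))
        # Assumption: status is 0 (normal) or 1 (tumor).
        if row["status"] not in ("0", "1"):
            raise ValueError(f"line {line_number}: status must be 0 or 1")
        rows.append(row)
    return rows


# The example pair from the documentation, with explicit tab delimiters.
example = (
    "G15511\tXX\t0\tC09DFN\tpathToFiles/G15511.C09DFN.pileup\n"
    "G15511\tXX\t1\tD0ENMT\tpathToFiles/G15511.D0ENMT.pileup\n"
)
rows = read_mpileup_tsv(io.StringIO(example))
normal = [r["sample"] for r in rows if r["status"] == "0"]
tumor = [r["sample"] for r in rows if r["status"] == "1"]
print(normal, tumor)
```

A check like this catches the most common mistake, spaces used instead of tabs, because such a line parses as a single field and fails the column count.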

## VCF files for annotation

Input files for Sarek can be specified using the path to a VCF directory given to the `--input` command only with the annotation step (`--step annotate`).