Install snakemake and ensembl-vep first.
Then update the paths to:
- BAM_DIR
- GENOME
- VEP
And the output directories:
- BCFTOOL (bcftools pipeline)
  - WORKSPACE
- GATK (gatk pipeline)
  - WORKSPACE
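Before launching a run, it can help to confirm that every setting listed above is actually present in the config file. The sketch below assumes the settings live in `config.yaml` as plain `KEY:` entries; the file layout is not shown here, so treat that as an assumption. `check_config_keys` is a hypothetical helper, not part of the pipeline.

```shell
# Hypothetical helper: report any expected key missing from a YAML config.
# Assumes each setting appears as "KEY:" at the start of a (possibly indented) line.
# $1: config file; remaining arguments: key names to look for.
check_config_keys() {
    local config="$1"
    shift
    local key
    local missing=0
    for key in "$@"; do
        if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*:" "$config"; then
            echo "missing key: ${key}"
            missing=1
        fi
    done
    # Non-zero exit status if at least one key was not found
    return "$missing"
}
```

It could be called as `check_config_keys config.yaml BAM_DIR GENOME VEP BCFTOOL GATK WORKSPACE`. This only checks that the keys exist, not that the paths they point to are valid.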
# modify the file path in haplomap and run with 12 cores
snakemake -s bcftools.call.smk --configfile config.yaml \
-k -p -j 12
or
# modify the file path in haplomap and run with 12 cores
snakemake -s gatk.call.smk --configfile config.yaml \
-k -p -j 12
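A dry run is a cheap way to preview which jobs would be executed before committing cores to a full run. The sketch below wraps the two commands above in a hypothetical `run_pipeline` function; `-n` is Snakemake's dry-run flag, while the function name and its arguments are illustrative only.

```shell
# Hypothetical wrapper around the two commands above.
# $1: pipeline name (bcftools or gatk)
# $2: number of cores (defaults to 12)
# $3: pass "dry" to preview the jobs without running them
run_pipeline() {
    local pipeline="$1"
    local cores="${2:-12}"
    local mode="${3:-}"
    case "$pipeline" in
        bcftools|gatk) ;;
        *) echo "unknown pipeline: $pipeline" >&2; return 2 ;;
    esac
    if [ "$mode" = "dry" ]; then
        # -n prints the planned jobs (and, with -p, their shell commands) only
        snakemake -s "${pipeline}.call.smk" --configfile config.yaml -n -p
    else
        snakemake -s "${pipeline}.call.smk" --configfile config.yaml -k -p -j "$cores"
    fi
}
```

For example, `run_pipeline gatk 12 dry` previews the GATK pipeline, and `run_pipeline gatk 12` then runs it.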
- bcftools call
  - preferred pipeline for inbred mouse and haplomap input
- GATK best practice
  - designed for human genetics
  - users are responsible for tuning VQSR or hard-filtering parameters when using non-human data
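As an illustration of what hard filtering can look like for non-human data, here is a minimal sketch using GATK's `VariantFiltration`. It is a starting point only: the thresholds are GATK's generic suggested values for SNPs, not values tuned for mouse or taken from this pipeline, and `hard_filter_snps` and its file arguments are hypothetical.

```shell
# Hypothetical helper: tag SNPs that fail generic hard-filter thresholds.
# $1: input VCF (SNPs only); $2: output VCF.
# The thresholds below are generic starting values; tune them per dataset.
hard_filter_snps() {
    local in_vcf="$1"
    local out_vcf="$2"
    gatk VariantFiltration \
        -V "$in_vcf" \
        -O "$out_vcf" \
        --filter-name "QD2"  --filter-expression "QD < 2.0" \
        --filter-name "FS60" --filter-expression "FS > 60.0" \
        --filter-name "MQ40" --filter-expression "MQ < 40.0" \
        --filter-name "SOR3" --filter-expression "SOR > 3.0"
}
```

Records that match an expression get the corresponding filter name in the FILTER column rather than being removed, so a later step still has to select the passing records.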
Caution: Both pipelines take a long time to run.
Alternatively, use conda to install VEP.
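A conda-based install might look like the sketch below. It assumes the `ensembl-vep` package from the bioconda channel and puts it in its own environment; the helper name `install_vep_conda` and the default environment name `vep` are arbitrary choices for illustration.

```shell
# Hypothetical helper: create a dedicated conda environment containing VEP.
# $1: environment name (defaults to "vep"; the name is an arbitrary choice)
install_vep_conda() {
    local env_name="${1:-vep}"
    conda create -y -n "$env_name" -c conda-forge -c bioconda ensembl-vep
}
```

After `install_vep_conda`, activate the environment with `conda activate vep`. As far as I know, the bioconda package exposes the installer as `vep_install` rather than `INSTALL.pl`, so adjust the commands that follow if that is the case on your system.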
Install data for offline mode
INSTALL.pl -a cfp -s mus_musculus -y GRCm38 --CONVERT --PLUGINS CADD,GO,TSSDistance,LoF,SpliceAI
or, for human:
INSTALL.pl -a cfp -s homo_sapiens -y GRCh38 --CONVERT
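Offline annotation only works if the cache is already on disk, so a quick check before starting the pipeline can save a failed run. The sketch assumes VEP's default cache location `$HOME/.vep` and its `<species>/<version>_<assembly>` directory layout; `vep_cache_ready` is a hypothetical helper.

```shell
# Hypothetical helper: succeed if a VEP cache exists for a species/assembly.
# $1: species (e.g. mus_musculus); $2: assembly (e.g. GRCm38)
# $3: cache root (defaults to VEP's default location, $HOME/.vep)
vep_cache_ready() {
    local species="$1"
    local assembly="$2"
    local root="${3:-$HOME/.vep}"
    local dir
    for dir in "$root/$species"/*_"$assembly"; do
        # The glob stays literal when nothing matches, so test for a directory
        if [ -d "$dir" ]; then
            echo "found cache: $dir"
            return 0
        fi
    done
    echo "no cache for $species $assembly under $root" >&2
    return 1
}
```

For example, `vep_cache_ready mus_musculus GRCm38` checks the mouse cache installed by the first command above. It does not verify that the cache is complete or that the plugins were installed.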
For example:
bcftools view -f .,PASS ${input.vcf} | \
${params.VEPBIN}/vep --fasta ${input.reference} ${params.genome_build} \
--format vcf --fork ${threads} --hgvs --force_overwrite \
--uniprot --domains --symbol --regulatory --distance 1000 --biotype \
--gene_phenotype MGI --check_existing --pubmed --numbers \
--offline --cache --variant_class \
--gencode_basic --no_intergenic --individual all \
-o ${output} --tab --compress_output gzip
One of my colleagues, who studies mouse genetics, said:
I tried the haplotype caller from GATK. But it seems that the haplotype caller is designed for heterogeneous genomes like human rather than for mice. Therefore, the result coming out of HC is worse than samtools, as I manually inspected a few regions where the HC calls didn't make sense.
In addition, in one of their mouse genomics papers that we reviewed, they even skipped the second recalibration step. We asked them why and they said it was for the same reason: good for human but not that good for the homogeneous inbred mouse.
However, in my experience, GATK (>v4.0) is as good as bcftools.