From e45ff43c395c332085121f04747dbe7e8d73caa7 Mon Sep 17 00:00:00 2001 From: Daniel Nilsson Date: Mon, 4 Nov 2024 08:11:55 +0100 Subject: [PATCH] Fix #127 - update examples (#144) * Fix #127 - update examples * one more heading * Update changelog * Update CHANGELOG.md Co-authored-by: Felix Lenner <52530259+fellen31@users.noreply.github.com> * Fix #146 - whole_gene was removed with #77 * ok, ok, that would/should be a major. lets do that in a separate PR --------- Co-authored-by: Felix Lenner <52530259+fellen31@users.noreply.github.com> --- CHANGELOG.md | 2 ++ README.md | 5 ----- docs/commands/annotate-models.md | 6 ------ docs/genetic-models.md | 5 +---- examples/readme.md | 30 +++++++++++++++++------------- examples/test_vcf.vcf | 4 ++-- genmod/commands/annotate_models.py | 8 +++++--- 7 files changed, 27 insertions(+), 33 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 26bf897..5c845c5 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -12,6 +12,8 @@ Try to use the following format: ### Fixed - The optional fields Source and Version are allowed in the VCF header([#106](https://github.com/Clinical-Genomics/genmod/pull/106)) - Released the constraint on Python 3.8 (collections, pkg_resources to importlib, tests) ([#142](https://github.com/Clinical-Genomics/genmod/pull/142)) +- Update annotation examples ([#144](https://github.com/Clinical-Genomics/genmod/pull/144)) + ## [3.9] - Fixed wrong models when chromosome X was named `chrX` and not `X` ([#135](https://github.com/Clinical-Genomics/genmod/pull/135)) diff --git a/README.md b/README.md index 988758d..cbc0bd3 100644 --- a/README.md +++ b/README.md @@ -195,7 +195,6 @@ Also a line for logging is added in the vcf header with the id **genmod**, here - Compound heterozygote inheritance pattern will be checked if two variants are exonic (or in canonical splice sites) and if they reside in the same gene. -- If compounds should be checked in the whole gene (including introns) use ```--whole_gene``` - GENMOD supports phased data, use the ```-phased``` flag. Data should follow the [GATK way](http://gatkforums.broadinstitute.org/discussion/45/read-backed-phasing) of phasing. All annotations will be present only if they have a value. @@ -211,7 +210,6 @@ All annotations will be present only if they have a value. - By default the relative cadd scores is annotated with 'CADD=score', there is also an alternative to annotate with the raw cadd scores using the `--cadd_raw` flag. In this case a info field 'CADD_raw=score'. - If your VCF is already annotated with VEP, use `-vep/--vep` - If data is phased use `-phased/--phased` -- If you want to allow compound pairs in intronic regions to use `-gene/--whole_gene` - If you want canonical splice site region to be bigger than 2 base pairs on each side of the exons, use `-splice/--splice_padding ` - The `-strict/--strict` flag tells **genmod** to only annotate genetic models if they are proved by the data. If a variant is not called in a family member it will not be annotated. @@ -263,9 +261,6 @@ For this model individuals can be carriers so healthy individuals can be heteroz ### Autosomal Compound Heterozygote This model includes pairs of exonic variants that are present within the same gene. -**The default behaviour of GENMOD is to look for compounds only in exonic/canonical splice sites**. -The reason for this is that since some genes have huge intronic regions the data will be drowned in compound pairs. -If the user wants all variants in genes checked use the flag -gene/--whole_gene. 1. Non-phased data: * Affected individuals have to be het. for both variants diff --git a/docs/commands/annotate-models.md b/docs/commands/annotate-models.md index 33b6040..13b6030 100644 --- a/docs/commands/annotate-models.md +++ b/docs/commands/annotate-models.md @@ -31,8 +31,6 @@ Options: -p, --processes INTEGER Define how many processes that should be use for annotation. --silent Do not print the variants. - -w, --whole_gene If compounds should be checked over the - whole gene. -k, --keyword TEXT What annotation keyword that should be used when searching for features. @@ -103,10 +101,6 @@ If there are several processes running at the same time the variants will be pri If no variants or headers should be printed to screen. This is mainly for testing. -### whole_gene ### - -If variants in introns should be considered when checking compound pairs. This will lead to a huge number of compound candidates/pairs when dealing with whole genome data. - ### keyword ### This is the keyword in the vcf info field that genmod will look for when trying to find what genes the variant is annotated with. diff --git a/docs/genetic-models.md b/docs/genetic-models.md index 208c6f9..732bcb2 100644 --- a/docs/genetic-models.md +++ b/docs/genetic-models.md @@ -33,10 +33,7 @@ with reduced penetrance we will allow healthy carriers. (i.e. healthy individual ### Autosomal Compound Heterozygote ### -This model includes pairs of exonic variants that are present within the same gene. -**The default behaviour of GENMOD is to look for compounds only in exonic/canonical splice sites**. -The reason for this is that since some genes have huge intronic regions the data will be drowned in compound pairs. -If the user wants all variants in genes checked use the flag -gene/--whole_gene. +This model includes pairs of variants that are present within the same gene. 1. Non-phased data: * Affected individuals have to be het. for both variants diff --git a/examples/readme.md b/examples/readme.md index 373fb71..a6ea66c 100644 --- a/examples/readme.md +++ b/examples/readme.md @@ -1,4 +1,4 @@ -# Examples for GENMOD # +# Examples for GENMOD These are some example files for getting to know **genmod** and the possible alternatives. There are three test families @@ -10,35 +10,39 @@ There are three test families There are annotation files included in the distribution of **genmod** that are made from the [latest](ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/refGene.txt.gz) refGene dataset. These will be used as default if no other annotation is given. If the user want to build own annotations please use **genmod build_annotation**. -##Annotate variants for Recessive Family## +## Annotate variants for Recessive Family ``` - genmod annotate examples/test_vcf.vcf -f examples/recessive_trio.ped -o examples/test_vcf_recessive_annotated.vcf + genmod annotate -o examples/test_vcf_annotated.vcf --annotate_regions examples/test_vcf.vcf + genmod models -f examples/recessive_trio.ped -o examples/test_vcf_recessive_models_annotated.vcf examples/test_vcf_annotated.vcf ``` The vcf file have a couple of variants made up so it will be easy to understand how the genetic inheritance patterns are annotated. With the basic command listed above the output should look like the variants in ```examples/test_vcf_recessive_annotated.vcf``` 1. First variant is a classic Autosomal Recessive case, each parent are carriers and the affected child is homozygous alternative. -2. This variant is annotated as ``AR_hom:AR_hom_dn`` since we miss information from one parent. -3. Here we have a de novo case, notice that we will only annotate this with ``AR_hom_dn`` for more effective sorting in a downstream analysis. +2. This variant is annotated as `AR_hom:AR_hom_dn` since we miss information from one parent. +3. Here we have a de novo case, notice that we will only annotate this with `AR_hom_dn` for more effective sorting in a downstream analysis. 4. Next two variants form a compound pair, that is each parent carry one variant each while the affected child have both. This implies that both variants are exonic(or in canonical splice region) and belong to the same gene. -5. At position 947378 we see an example of a Autosoma Dominant de novo variant. -6. The two following variants could form a compound pair since they are heterozygous in the affected child but they are not exonic. If the flag ``-g/--whole_gene`` is used these two will be annotated as compounds. +5. At position 947378 we see an example of a Autosomal Dominant *de novo* variant. +6. The two following variants could form a compound pair since they are heterozygous in the affected child but they are not exonic. -The following variants are to show how the ``-strict`` flag affects the analysis. When in strict mode we will only annotate a variant to follow a pattern if there is *proof* in the data. So if there are no calls in some individuals it will not follow any pattern. +The following variants are to show how the `--strict` flag affects the analysis. When in strict mode we will only annotate a variant to follow a pattern if there is *proof* in the data. So if there are no calls in some individuals it will not follow any pattern. -##Annotate variants for Dominant Family## +## Annotate variants for Dominant Family ``` - genmod annotate test_data/test_vcf.vcf -f test_data/dominant_trio.ped -o examples/test_vcf_dominant_annotated.vcf + genmod annotate -o examples/test_vcf_annotated.vcf --annotate_regions examples/test_vcf.vcf + genmod models -f examples/dominant_trio.ped -o examples/test_vcf_dominant_annotated.vcf examples/test_vcf_annotated.vcf + ``` We can now see how the conditions change when one of the parents are affected. For example the recessive pattern for the first variant is not followed since all affected needs to be homozygote alternative if the variant should follow the Autosomal Recessive pattern. -##Annotate variants for Multiple Families## +## Annotate variants for Multiple Families ``` - genmod annotate test_data/test_vcf.vcf -f test_data/multi_family.ped -o examples/test_vcf_multi_annotated.vcf + genmod annotate -o examples/test_vcf_annotated.vcf --annotate_regions examples/test_vcf.vcf + genmod models -f examples/multi_family.ped -o examples/test_vcf_multi_annotated.vcf examples/test_vcf_annotated.vcf ``` We can now see how the conditions change when one of the parents are affected. For example the recessive pattern for the first variant is not followed since all affected needs to be homozygote alternative if the variant should follow the Autosomal Recessive pattern. @@ -47,7 +51,7 @@ We can now see how the conditions change when one of the parents are affected. F This is another example of how one can annotate with genmod: ``` - genmod annotate examples/test_vcf.vcf --cadd_file examples/small_CADD.tsv.gz --thousand_g examples/small_1000G.vcf.gz + genmod annotate --cadd_file examples/small_CADD.tsv.gz --thousand_g examples/small_1000G.vcf.gz examples/test_vcf.vcf ``` Please post issues on http://github.com/Clinical-Genomics/genmod in case of any problems. \ No newline at end of file diff --git a/examples/test_vcf.vcf b/examples/test_vcf.vcf index d24f12c..fc9fbd4 100644 --- a/examples/test_vcf.vcf +++ b/examples/test_vcf.vcf @@ -3,8 +3,8 @@ ##contig= ##reference=file:///humgen/gsa-hpprojects/GATK/bundle/current/b37/human_g1k_v37.fasta #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT father mother proband father_2 mother_2 proband_2 -1 879537 . T C 100 PASS MQ=1 GT:AD:GQ 0/1:10,10:60 0/1:10,10:60 1/1:10,10:60 0/0:10,10:60 0/1:10,10:60 1/1:10,10:60 -1 879541 . G A 100 PASS MQ=1 GT:AD:GQ ./. 0/1:10,10:60 1/1:10,10:60 ./. 0/1:10,10:60 0/1:10,10:60 +1 879537 . T C 100 PASS MQ=1 GT:AD:GQ 0/1:10,10:60 ./. 0/1:10,10:60 0/0:10,10:60 0/1:10,10:60 1/1:10,10:60 +1 879541 . G A 100 PASS MQ=1 GT:AD:GQ ./. 0/1:10,10:60 0/1:10,10:60 0/1:10,10:60 0/1:10,10:60 0/1:10,10:60 1 879595 . C T 100 PASS MQ=1 GT:AD:GQ 0/1:10,10:60 0/0:10,10:60 1/1:10,10:60 0/1:10,10:60 0/0:10,10:60 0/1:10,10:60 1 879676 . G A 100 PASS MQ=1 GT:AD:GQ 0/1:10,10:60 1/1:10,10:60 1/1:10,10:60 0/1:10,10:60 0/1:10,10:60 0/1:10,10:60 1 879911 . G A 100 PASS MQ=1 GT:AD:GQ 0/1:10,10:60 0/0:10,10:60 0/1:10,10:60 0/1:10,10:60 0/0:10,10:60 0/1:10,10:60 diff --git a/genmod/commands/annotate_models.py b/genmod/commands/annotate_models.py index 4850b51..18746e3 100755 --- a/genmod/commands/annotate_models.py +++ b/genmod/commands/annotate_models.py @@ -63,9 +63,9 @@ is_flag=True, help='If strict model annotations should be used(see documentation).' ) -@click.option('-w' ,'--whole_gene','--whole-gene', +@click.option('-w' ,'--whole_gene','--whole-gene', is_flag=True, - help='If compounds should be checked for all variants in a gene. DEPRECATED' + help='DEPRECATED FLAG - on by default' ) @processes @silent @@ -86,8 +86,10 @@ def models(context, variant_file, family_file, family_type, reduced_penetrance, Checks what patterns of inheritance that are followed in a VCF file. The analysis is family based so each family that are specified in the family file and exists in the variant file will get it's own annotation. + + Note that the "whole_gene" flag has been disabled and will be removed in a later version. """ - + ######### This is for logging the command line string ######### frame = inspect.currentframe() args, _, _, values = inspect.getargvalues(frame)