Skip to content

Commit

Permalink
Fix #127 - update examples (#144)
Browse files Browse the repository at this point in the history
* Fix #127 - update examples

* one more heading

* Update changelog

* Update CHANGELOG.md

Co-authored-by: Felix Lenner <[email protected]>

* Fix #146 - whole_gene was removed with #77

* ok, ok, that would/should be a major. lets do that in a separate PR

---------

Co-authored-by: Felix Lenner <[email protected]>
  • Loading branch information
dnil and fellen31 authored Nov 4, 2024
1 parent a23e8d7 commit e45ff43
Show file tree
Hide file tree
Showing 7 changed files with 27 additions and 33 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,8 @@ Try to use the following format:
### Fixed
- The optional fields Source and Version are allowed in the VCF header([#106](https://github.com/Clinical-Genomics/genmod/pull/106))
- Released the constraint on Python 3.8 (collections, pkg_resources to importlib, tests) ([#142](https://github.com/Clinical-Genomics/genmod/pull/142))
- Update annotation examples ([#144](https://github.com/Clinical-Genomics/genmod/pull/144))


## [3.9]
- Fixed wrong models when chromosome X was named `chrX` and not `X` ([#135](https://github.com/Clinical-Genomics/genmod/pull/135))
Expand Down
5 changes: 0 additions & 5 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -195,7 +195,6 @@ Also a line for logging is added in the vcf header with the id **genmod**, here

- Compound heterozygote inheritance pattern will be checked if two variants are exonic (or in canonical splice sites) and if they reside in the same gene.

- If compounds should be checked in the whole gene (including introns) use ```--whole_gene```
- GENMOD supports phased data, use the ```-phased``` flag. Data should follow the [GATK way](http://gatkforums.broadinstitute.org/discussion/45/read-backed-phasing) of phasing.

All annotations will be present only if they have a value.
Expand All @@ -211,7 +210,6 @@ All annotations will be present only if they have a value.
- By default the relative cadd scores is annotated with 'CADD=score', there is also an alternative to annotate with the raw cadd scores using the `--cadd_raw` flag. In this case a info field 'CADD_raw=score'.
- If your VCF is already annotated with VEP, use `-vep/--vep`
- If data is phased use `-phased/--phased`
- If you want to allow compound pairs in intronic regions to use `-gene/--whole_gene`
- If you want canonical splice site region to be bigger than 2 base pairs on each side of the exons, use `-splice/--splice_padding <integer>`
- The `-strict/--strict` flag tells **genmod** to only annotate genetic models if they are proved by the data. If a variant is not called in a family member it will not be annotated.

Expand Down Expand Up @@ -263,9 +261,6 @@ For this model individuals can be carriers so healthy individuals can be heteroz
### Autosomal Compound Heterozygote

This model includes pairs of exonic variants that are present within the same gene.
**The default behaviour of GENMOD is to look for compounds only in exonic/canonical splice sites**.
The reason for this is that since some genes have huge intronic regions the data will be drowned in compound pairs.
If the user wants all variants in genes checked use the flag -gene/--whole_gene.

1. Non-phased data:
* Affected individuals have to be het. for both variants
Expand Down
6 changes: 0 additions & 6 deletions docs/commands/annotate-models.md
Original file line number Diff line number Diff line change
Expand Up @@ -31,8 +31,6 @@ Options:
-p, --processes INTEGER Define how many processes that should be use
for annotation.
--silent Do not print the variants.
-w, --whole_gene If compounds should be checked over the
whole gene.
-k, --keyword TEXT What annotation keyword that should be used
when
searching for features.
Expand Down Expand Up @@ -103,10 +101,6 @@ If there are several processes running at the same time the variants will be pri
If no variants or headers should be printed to screen. This is mainly for testing.
### whole_gene ###
If variants in introns should be considered when checking compound pairs. This will lead to a huge number of compound candidates/pairs when dealing with whole genome data.
### keyword ###
This is the keyword in the vcf info field that genmod will look for when trying to find what genes the variant is annotated with.
Expand Down
5 changes: 1 addition & 4 deletions docs/genetic-models.md
Original file line number Diff line number Diff line change
Expand Up @@ -33,10 +33,7 @@ with reduced penetrance we will allow healthy carriers. (i.e. healthy individual

### Autosomal Compound Heterozygote ###

This model includes pairs of exonic variants that are present within the same gene.
**The default behaviour of GENMOD is to look for compounds only in exonic/canonical splice sites**.
The reason for this is that since some genes have huge intronic regions the data will be drowned in compound pairs.
If the user wants all variants in genes checked use the flag -gene/--whole_gene.
This model includes pairs of variants that are present within the same gene.

1. Non-phased data:
* Affected individuals have to be het. for both variants
Expand Down
30 changes: 17 additions & 13 deletions examples/readme.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# Examples for GENMOD #
# Examples for GENMOD

These are some example files for getting to know **genmod** and the possible alternatives.
There are three test families
Expand All @@ -10,35 +10,39 @@ There are three test families
There are annotation files included in the distribution of **genmod** that are made from the [latest](ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/refGene.txt.gz) refGene dataset. These will be used as default if no other annotation is given.
If the user want to build own annotations please use **genmod build_annotation**.

##Annotate variants for Recessive Family##
## Annotate variants for Recessive Family

```
genmod annotate examples/test_vcf.vcf -f examples/recessive_trio.ped -o examples/test_vcf_recessive_annotated.vcf
genmod annotate -o examples/test_vcf_annotated.vcf --annotate_regions examples/test_vcf.vcf
genmod models -f examples/recessive_trio.ped -o examples/test_vcf_recessive_models_annotated.vcf examples/test_vcf_annotated.vcf
```
The vcf file have a couple of variants made up so it will be easy to understand how the genetic inheritance patterns are annotated.

With the basic command listed above the output should look like the variants in ```examples/test_vcf_recessive_annotated.vcf```

1. First variant is a classic Autosomal Recessive case, each parent are carriers and the affected child is homozygous alternative.
2. This variant is annotated as ``AR_hom:AR_hom_dn`` since we miss information from one parent.
3. Here we have a de novo case, notice that we will only annotate this with ``AR_hom_dn`` for more effective sorting in a downstream analysis.
2. This variant is annotated as `AR_hom:AR_hom_dn` since we miss information from one parent.
3. Here we have a de novo case, notice that we will only annotate this with `AR_hom_dn` for more effective sorting in a downstream analysis.
4. Next two variants form a compound pair, that is each parent carry one variant each while the affected child have both. This implies that both variants are exonic(or in canonical splice region) and belong to the same gene.
5. At position 947378 we see an example of a Autosoma Dominant de novo variant.
6. The two following variants could form a compound pair since they are heterozygous in the affected child but they are not exonic. If the flag ``-g/--whole_gene`` is used these two will be annotated as compounds.
5. At position 947378 we see an example of a Autosomal Dominant *de novo* variant.
6. The two following variants could form a compound pair since they are heterozygous in the affected child but they are not exonic.

The following variants are to show how the ``-strict`` flag affects the analysis. When in strict mode we will only annotate a variant to follow a pattern if there is *proof* in the data. So if there are no calls in some individuals it will not follow any pattern.
The following variants are to show how the `--strict` flag affects the analysis. When in strict mode we will only annotate a variant to follow a pattern if there is *proof* in the data. So if there are no calls in some individuals it will not follow any pattern.


##Annotate variants for Dominant Family##
## Annotate variants for Dominant Family
```
genmod annotate test_data/test_vcf.vcf -f test_data/dominant_trio.ped -o examples/test_vcf_dominant_annotated.vcf
genmod annotate -o examples/test_vcf_annotated.vcf --annotate_regions examples/test_vcf.vcf
genmod models -f examples/dominant_trio.ped -o examples/test_vcf_dominant_annotated.vcf examples/test_vcf_annotated.vcf
```
We can now see how the conditions change when one of the parents are affected. For example the recessive pattern for the first variant is not followed since all affected needs to be homozygote alternative if the variant should follow the Autosomal Recessive pattern.


##Annotate variants for Multiple Families##
## Annotate variants for Multiple Families
```
genmod annotate test_data/test_vcf.vcf -f test_data/multi_family.ped -o examples/test_vcf_multi_annotated.vcf
genmod annotate -o examples/test_vcf_annotated.vcf --annotate_regions examples/test_vcf.vcf
genmod models -f examples/multi_family.ped -o examples/test_vcf_multi_annotated.vcf examples/test_vcf_annotated.vcf
```
We can now see how the conditions change when one of the parents are affected. For example the recessive pattern for the first variant is not followed since all affected needs to be homozygote alternative if the variant should follow the Autosomal Recessive pattern.

Expand All @@ -47,7 +51,7 @@ We can now see how the conditions change when one of the parents are affected. F

This is another example of how one can annotate with genmod:
```
genmod annotate examples/test_vcf.vcf --cadd_file examples/small_CADD.tsv.gz --thousand_g examples/small_1000G.vcf.gz
genmod annotate --cadd_file examples/small_CADD.tsv.gz --thousand_g examples/small_1000G.vcf.gz examples/test_vcf.vcf
```

Please post issues on http://github.com/Clinical-Genomics/genmod in case of any problems.
4 changes: 2 additions & 2 deletions examples/test_vcf.vcf
Original file line number Diff line number Diff line change
Expand Up @@ -3,8 +3,8 @@
##contig=<ID=1>
##reference=file:///humgen/gsa-hpprojects/GATK/bundle/current/b37/human_g1k_v37.fasta
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT father mother proband father_2 mother_2 proband_2
1 879537 . T C 100 PASS MQ=1 GT:AD:GQ 0/1:10,10:60 0/1:10,10:60 1/1:10,10:60 0/0:10,10:60 0/1:10,10:60 1/1:10,10:60
1 879541 . G A 100 PASS MQ=1 GT:AD:GQ ./. 0/1:10,10:60 1/1:10,10:60 ./. 0/1:10,10:60 0/1:10,10:60
1 879537 . T C 100 PASS MQ=1 GT:AD:GQ 0/1:10,10:60 ./. 0/1:10,10:60 0/0:10,10:60 0/1:10,10:60 1/1:10,10:60
1 879541 . G A 100 PASS MQ=1 GT:AD:GQ ./. 0/1:10,10:60 0/1:10,10:60 0/1:10,10:60 0/1:10,10:60 0/1:10,10:60
1 879595 . C T 100 PASS MQ=1 GT:AD:GQ 0/1:10,10:60 0/0:10,10:60 1/1:10,10:60 0/1:10,10:60 0/0:10,10:60 0/1:10,10:60
1 879676 . G A 100 PASS MQ=1 GT:AD:GQ 0/1:10,10:60 1/1:10,10:60 1/1:10,10:60 0/1:10,10:60 0/1:10,10:60 0/1:10,10:60
1 879911 . G A 100 PASS MQ=1 GT:AD:GQ 0/1:10,10:60 0/0:10,10:60 0/1:10,10:60 0/1:10,10:60 0/0:10,10:60 0/1:10,10:60
Expand Down
8 changes: 5 additions & 3 deletions genmod/commands/annotate_models.py
Original file line number Diff line number Diff line change
Expand Up @@ -63,9 +63,9 @@
is_flag=True,
help='If strict model annotations should be used(see documentation).'
)
@click.option('-w' ,'--whole_gene','--whole-gene',
@click.option('-w' ,'--whole_gene','--whole-gene',
is_flag=True,
help='If compounds should be checked for all variants in a gene. DEPRECATED'
help='DEPRECATED FLAG - on by default'
)
@processes
@silent
Expand All @@ -86,8 +86,10 @@ def models(context, variant_file, family_file, family_type, reduced_penetrance,
Checks what patterns of inheritance that are followed in a VCF file.
The analysis is family based so each family that are specified in the family
file and exists in the variant file will get it's own annotation.
Note that the "whole_gene" flag has been disabled and will be removed in a later version.
"""

######### This is for logging the command line string #########
frame = inspect.currentframe()
args, _, _, values = inspect.getargvalues(frame)
Expand Down

0 comments on commit e45ff43

Please sign in to comment.