-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update consequence severity filtering #326
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Just a few comments out of curiosity.
If I understood correctly, 87 new evidence strings (of overlapping genes) would be reported as of today with this modification, correct?
VEP_SHORT_QUERY_DISTANCE = 5000 | ||
VEP_LONG_QUERY_DISTANCE = 500000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What was the reason behind no longer querying VEP twice if the short distance doesn't find any gene? Just curious
@@ -97,54 +88,58 @@ def load_consequence_severity_rank(): | |||
return {term: index for index, term in enumerate(get_severity_ranking())} | |||
|
|||
|
|||
def extract_consequences(vep_results, acceptable_biotypes, only_closest, results_by_variant, report_distance=False): | |||
def extract_consequences(vep_results, acceptable_biotypes): | |||
"""Given VEP results, return a list of consequences matching certain criteria. | |||
|
|||
Args: | |||
vep_results: results obtained from VEP for a list of variants, in JSON format. | |||
acceptable_biotypes: a list of transcript biotypes to consider (as defined in Ensembl documentation, see | |||
https://www.ensembl.org/info/genome/genebuild/biotypes.html). Consequences for other transcript biotypes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why, among all biotypes, we only consider miRNAs and protein coding genes? Sorry again, has not much to do with this PR but I'm curious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a good question and it seems to have been decided long ago (before the birth of this repo).
There are a few others that we could consider in the non-coding genes, the IG genes and the TR genes.
Probably. a topic to raise with OpenTarget
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created #328 for this, we can discuss at our next meeting.
Actually that's just 87 new evidence strings for the set of 2000 variants used in the test for the original VEP pipeline (based on coordinates, mostly SNPs / short indels). You can see this test for a much smaller set used in testing the current structural pipeline. For your other questions, they're from before my time as well (and actually before the time of this repo, so not even in the history!) so we'd have to ask @tskir or @tcezard... I'm also curious about this. |
@@ -97,54 +88,58 @@ def load_consequence_severity_rank(): | |||
return {term: index for index, term in enumerate(get_severity_ranking())} | |||
|
|||
|
|||
def extract_consequences(vep_results, acceptable_biotypes, only_closest, results_by_variant, report_distance=False): | |||
def extract_consequences(vep_results, acceptable_biotypes): | |||
"""Given VEP results, return a list of consequences matching certain criteria. | |||
|
|||
Args: | |||
vep_results: results obtained from VEP for a list of variants, in JSON format. | |||
acceptable_biotypes: a list of transcript biotypes to consider (as defined in Ensembl documentation, see | |||
https://www.ensembl.org/info/genome/genebuild/biotypes.html). Consequences for other transcript biotypes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a good question and it seems to have been decided long ago (before the birth of this repo).
There are a few others that we could consider in the non-coding genes, the IG genes and the TR genes.
Probably. a topic to raise with OpenTarget
Closes #321
To get a sense of the additional consequences the new strategy gives us, you can look at this commit which shows the diff in the VEP test file for SNPs, as well as some of the other tests.