Update consequence severity filtering #326

apriltuesday · 2022-05-09T08:55:15Z

Closes #321

Update the strategy for filtering consequences based on severity:
- All genes overlapping the variant are reported, each with its most severe consequence
- If no genes overlap, use the overall most severe consequence and report all genes with that consequence (previous behaviour)

Refactor to remove unused code (distant VEP querying) and dataframe columns for consequence prediction.
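
The filtering strategy described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the dict keys `gene`, `terms` and `overlaps`, the function names, and the abbreviated severity ranking are all assumptions made for the example (the pipeline builds its ranking from `get_severity_ranking()`).

```python
# Illustrative, abbreviated ranking (most severe first). Assumption: the real
# pipeline derives its ranking from get_severity_ranking().
SEVERITY_RANKING = [
    "stop_gained",
    "missense_variant",
    "synonymous_variant",
    "intron_variant",
    "upstream_gene_variant",
]
SEVERITY_RANK = {term: index for index, term in enumerate(SEVERITY_RANKING)}


def most_severe(terms):
    """Return the term with the lowest rank index, i.e. the most severe one."""
    return min(terms, key=SEVERITY_RANK.__getitem__)


def filter_consequences(consequences):
    """Apply the two-step strategy to the consequences of a single variant.

    Each consequence is a dict with hypothetical keys: 'gene' (str),
    'terms' (list of str) and 'overlaps' (bool: the gene overlaps the variant).
    Returns a sorted list of (gene, term) pairs.
    """
    overlapping = [c for c in consequences if c["overlaps"]]
    if overlapping:
        # Report every overlapping gene with its own most severe consequence.
        terms_by_gene = {}
        for c in overlapping:
            terms_by_gene.setdefault(c["gene"], []).extend(c["terms"])
        return sorted((gene, most_severe(terms)) for gene, terms in terms_by_gene.items())

    # No overlapping genes: fall back to the overall most severe consequence
    # and report every gene that has it (the previous behaviour).
    all_terms = [term for c in consequences for term in c["terms"]]
    if not all_terms:
        return []
    top = most_severe(all_terms)
    return sorted({(c["gene"], top) for c in consequences if top in c["terms"]})


example = [
    {"gene": "GENE_A", "terms": ["missense_variant"], "overlaps": True},
    {"gene": "GENE_B", "terms": ["intron_variant", "synonymous_variant"], "overlaps": True},
    {"gene": "GENE_C", "terms": ["upstream_gene_variant"], "overlaps": False},
]
print(filter_consequences(example))
# [('GENE_A', 'missense_variant'), ('GENE_B', 'synonymous_variant')]
```

In this toy input, both overlapping genes are reported, each with its own most severe term, while the non-overlapping `GENE_C` is dropped. Under the fallback rule alone, only `GENE_A` would have been reported.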

To get a sense of the additional consequences the new strategy produces, you can look at this commit, which shows the diff in the VEP test file for SNPs, as well as some of the other tests.

M-casado

LGTM. Just a few comments out of curiosity.

If I understood correctly, 87 new evidence strings (of overlapping genes) would be reported as of today with this modification, correct?

M-casado · 2022-05-10T17:15:13Z

consequence_prediction/vep_mapping_pipeline/consequence_mapping.py

 VEP_SHORT_QUERY_DISTANCE = 5000
-VEP_LONG_QUERY_DISTANCE = 500000


What was the reason behind no longer querying VEP a second time when the short-distance query doesn't find any gene? Just curious.
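
For context, the two-pass lookup that this question refers to can be pictured roughly as below. This is only a sketch under assumptions: `query_genes` is a hypothetical stand-in for the real VEP call, `fake_query` is toy data, and the code actually removed by the PR may have been structured differently. Only the two distance constants come from the diff.

```python
VEP_SHORT_QUERY_DISTANCE = 5000
VEP_LONG_QUERY_DISTANCE = 500000  # the constant removed in this PR


def genes_with_fallback(variants, query_genes):
    """Look up genes per variant, retrying the misses at the long distance.

    query_genes is a hypothetical callable (variants, distance) -> dict that
    maps each variant to a list of gene names; it stands in for the VEP query.
    """
    results = query_genes(variants, VEP_SHORT_QUERY_DISTANCE)
    # Variants with no gene at the short distance get a second, wider query.
    missing = [v for v in variants if not results.get(v)]
    if missing:
        results.update(query_genes(missing, VEP_LONG_QUERY_DISTANCE))
    return results


def fake_query(variants, distance):
    """Toy replacement for VEP: var2 only finds a gene at the long distance."""
    near = {"var1": ["GENE_A"]}
    far = {"var2": ["GENE_B"]}
    table = near if distance == VEP_SHORT_QUERY_DISTANCE else {**near, **far}
    return {v: table.get(v, []) for v in variants}


print(genes_with_fallback(["var1", "var2", "var3"], fake_query))
# {'var1': ['GENE_A'], 'var2': ['GENE_B'], 'var3': []}
```

With the long-distance constant gone, only the first query remains, so `var2` in this toy example would end up with no gene.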

M-casado · 2022-05-10T17:25:00Z

consequence_prediction/vep_mapping_pipeline/consequence_mapping.py

@@ -97,54 +88,58 @@ def load_consequence_severity_rank():
    return {term: index for index, term in enumerate(get_severity_ranking())}


-def extract_consequences(vep_results, acceptable_biotypes, only_closest, results_by_variant, report_distance=False):
+def extract_consequences(vep_results, acceptable_biotypes):
    """Given VEP results, return a list of consequences matching certain criteria.

    Args:
        vep_results: results obtained from VEP for a list of variants, in JSON format.
        acceptable_biotypes: a list of transcript biotypes to consider (as defined in Ensembl documentation, see
            https://www.ensembl.org/info/genome/genebuild/biotypes.html). Consequences for other transcript biotypes


Why, among all biotypes, do we only consider miRNAs and protein-coding genes? Sorry again, this doesn't have much to do with this PR, but I'm curious.

tcezard · 2022-05-12T08:25:06Z

It's a good question, and it seems to have been decided long ago (before the birth of this repo).
There are a few others that we could consider in the non-coding genes, the IG genes and the TR genes.
Probably a topic to raise with Open Targets.

apriltuesday · May 13, 2022

Created #328 for this; we can discuss at our next meeting.
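
To make the biotype question concrete, here is a small sketch of how a biotype filter of this kind might look. It is an illustration only: the helper name `keep_acceptable` and the sample records are made up, and the set of two biotypes simply reflects what is mentioned in this thread. The `biotype` key and the biotype strings follow the Ensembl naming.

```python
# Assumption: the two biotypes named in the discussion above.
ACCEPTABLE_BIOTYPES = {"protein_coding", "miRNA"}


def keep_acceptable(transcript_consequences, acceptable_biotypes):
    """Keep only consequences whose transcript biotype is in the accepted set."""
    return [c for c in transcript_consequences if c.get("biotype") in acceptable_biotypes]


# Made-up records for illustration.
sample = [
    {"gene_symbol": "GENE_A", "biotype": "protein_coding"},
    {"gene_symbol": "GENE_B", "biotype": "lncRNA"},
    {"gene_symbol": "GENE_C", "biotype": "IG_V_gene"},
]

print([c["gene_symbol"] for c in keep_acceptable(sample, ACCEPTABLE_BIOTYPES)])
# ['GENE_A']

# Widening the set (for example with an IG gene biotype) keeps more records.
wider = ACCEPTABLE_BIOTYPES | {"IG_V_gene"}
print([c["gene_symbol"] for c in keep_acceptable(sample, wider)])
# ['GENE_A', 'GENE_C']
```

Keeping the accepted set as a parameter means a change of policy, such as the one proposed for #328, would only touch the set and not the filtering logic.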

apriltuesday · 2022-05-11T07:56:02Z

Actually that's just 87 new evidence strings for the set of 2000 variants used in the test for the original VEP pipeline (based on coordinates, mostly SNPs / short indels). You can see this test for a much smaller set used in testing the current structural pipeline.

For your other questions, they're from before my time as well (and actually before the time of this repo, so not even in the history!) so we'd have to ask @tskir or @tcezard... I'm also curious about this.

apriltuesday added 7 commits May 4, 2022 10:51

remove VEP long query distance option

7eb3e33

remove unnecessary columns from consequence dataframes

621c975

implement new consequence filtering approach

77daa04

update snp2gene file for tests

b9a873a

fix tests

a22aa54

fix output_mappings for VEP test

6ce5c46

add new consequences for VEP test

0763189

apriltuesday marked this pull request as ready for review May 9, 2022 11:06

apriltuesday self-assigned this May 9, 2022

apriltuesday requested review from M-casado and tcezard May 9, 2022 11:07

M-casado approved these changes May 10, 2022

tcezard reviewed May 12, 2022

extract consequence filtering strategies into separate methods

9475ae3

apriltuesday mentioned this pull request May 13, 2022

Revisit acceptable biotypes for consequences #328

Open

apriltuesday requested a review from tcezard May 13, 2022 13:21

tcezard approved these changes May 13, 2022

apriltuesday merged commit a11b9cf into EBIvariation:master May 13, 2022

apriltuesday deleted the issue-321 branch May 13, 2022 15:13
