Ratio of parental alleles to intermissions #195

Closed
ktmeaton opened this issue Nov 4, 2022 · 5 comments


ktmeaton commented Nov 4, 2022

I wonder if the ratio of intermissions to diagnostic alleles could be useful to rule out false positives. The filter could be that there must be fewer intermissions than alleles from the "minor" parent.

In this example, there are 3 alleles that could be coming from a "minor" parent, BA.2.3.20 (12310G, 16616C, 17678T), and most strains have 3 intermissions (6979T, 27012C, 27513C).

image
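
The proposed rule can be sketched in a few lines of Python. This is only an illustration of the comparison described above, not the code used in ncov-recombinant; the function name `passes_intermission_allele_ratio` and its list-based inputs are assumptions made for the example.

```python
from typing import Sequence


def passes_intermission_allele_ratio(
    minor_parent_alleles: Sequence[str],
    intermissions: Sequence[str],
) -> bool:
    """Hypothetical helper: pass only when intermissions are strictly
    fewer than the alleles attributed to the "minor" parent."""
    return len(intermissions) < len(minor_parent_alleles)


# Allele names taken from the example in this issue.
minor_alleles = ["12310G", "16616C", "17678T"]
intermissions = ["6979T", "27012C", "27513C"]

# 3 intermissions is not fewer than 3 minor-parent alleles,
# so this candidate would be filtered out.
example_result = passes_intermission_allele_ratio(minor_alleles, intermissions)
```

With a strict "fewer than" comparison, a tie counts as a failure, which matches the wording of the proposal.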


ktmeaton commented Nov 4, 2022

On my first run-through of validation, no positive or negative controls in controls or controls-gisaid fail this filter.

@ktmeaton ktmeaton self-assigned this Nov 4, 2022
@ktmeaton ktmeaton added the enhancement New feature or request label Nov 4, 2022
@ktmeaton ktmeaton moved this to In Progress in ncov-recombinant Nov 4, 2022
@ktmeaton ktmeaton added this to the v0.6.0 milestone Nov 4, 2022
@ktmeaton ktmeaton moved this from In Progress to Done in ncov-recombinant Nov 4, 2022
@ktmeaton ktmeaton closed this as completed Nov 4, 2022

ktmeaton commented Nov 4, 2022

In some more expanded testing, this is helping to remove some delta/delta false positives.


ktmeaton commented Nov 8, 2022

I came across a large number of sequences that came back as highly confident BA.5.2/BA.5.3 recombinants. Except, there is substantial allele conflict (intermissions) in the 3' end of the genome (16935 onwards). I realized that I didn't implement logic to use alleles outside the identified regions.

I think these should be considered intermissions, in the sense that they conflict with the evidence for recombination. They are not as direct a conflict as a mismatched allele inside a parental region, but they are still "noisy".

image
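
One way to picture the revised counting is below. It is a rough sketch under assumed data structures, not the actual sc2rf or ncov-recombinant logic: regions are hypothetical `(start, end, parent)` tuples, alleles are hypothetical `(position, parent)` tuples, and the coordinates and parent names are made up for illustration.

```python
from typing import List, Tuple

Region = Tuple[int, int, str]  # (start, end, parent), inclusive; hypothetical layout
Allele = Tuple[int, str]       # (genome position, parent the allele matches)


def count_intermissions(regions: List[Region], alleles: List[Allele]) -> int:
    """Count alleles that conflict with the recombination evidence.

    An allele counts as an intermission when it either
    (a) sits inside a region assigned to a different parent, or
    (b) falls outside every identified region (the new case).
    """
    count = 0
    for position, parent in alleles:
        containing = [r for r in regions if r[0] <= position <= r[1]]
        if not containing:
            count += 1  # outside all regions: treated as noise
        elif all(r[2] != parent for r in containing):
            count += 1  # inside a region that belongs to another parent
    return count


# Illustrative values only.
regions = [(100, 5000, "A"), (6000, 16000, "B")]
alleles = [(200, "A"), (3000, "B"), (7000, "B"), (17000, "A"), (18000, "B")]
n_intermissions = count_intermissions(regions, alleles)
```

Here one allele conflicts inside a parental region and two fall past the last region, so all three are counted.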

@ktmeaton ktmeaton reopened this Nov 8, 2022

ktmeaton commented Nov 8, 2022

So far, all designated recombinants pass this new logic EXCEPT XAV (Issue #104). Previously, the ref allele 21789C lengthened the BA.2 section. Now, that allele is no longer BA.2 diagnostic (maybe BA.2.75 has thrown that off?).

image

However, if we set the populations to BA.2 and BA.5.2, the BA.2 signal is strengthened, but the noise also increases slightly.

image

I'm weighing two options:

  1. Tweak modes to have BA.5.2 be a candidate parent.
  2. Set XAV as an auto-pass based on the 3' noise and numerous reversions.

@ktmeaton
Owner Author

There is an edge case where this will cause false negatives: when additional spurious parents are reported by sc2rf, as with XBL. My proposed solution is to disable the intermission_allele_ratio filter when the number of parents originally reported is greater than the number of filtered parents.

image
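
The proposed exemption could look something like the following. This is a sketch of the rule as worded above, with a hypothetical function name and hypothetical inputs (lists of parent names plus pre-computed counts); the real pipeline may structure this differently.

```python
from typing import Sequence


def passes_with_parent_exemption(
    original_parents: Sequence[str],
    filtered_parents: Sequence[str],
    minor_allele_count: int,
    intermission_count: int,
) -> bool:
    """Hypothetical helper: skip the ratio filter when spurious parents
    were dropped, otherwise require fewer intermissions than
    minor-parent alleles."""
    if len(set(original_parents)) > len(set(filtered_parents)):
        # Extra parents were reported originally, so their alleles would
        # inflate the intermission count; the filter is not applied.
        return True
    return intermission_count < minor_allele_count


# Placeholder parent names; three reported, two kept after filtering.
exempt = passes_with_parent_exemption(["P1", "P2", "P3"], ["P1", "P2"], 2, 5)
# Same parents before and after filtering, so the ratio filter applies.
not_exempt = passes_with_parent_exemption(["P1", "P2"], ["P1", "P2"], 2, 5)
```

The trade-off is that any sample with a dropped parent bypasses the ratio check entirely, so other filters would have to catch false positives in that group.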

This was referenced Feb 22, 2023