Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: allow for IUPAC codes in GetCanonicalMotif() #239

Closed
wants to merge 6 commits into from

Conversation

aryarm
Copy link
Member

@aryarm aryarm commented Oct 30, 2024

partially addresses #238

This will allow TRTools's GetCanonicalMotif() to work for sequences that have IUPAC codes. However, this will not entirely solve the problem described in #238, since any function that calls GetCanonicalMotif() will now have to handle the IUPAC codes somehow.

@aryarm aryarm marked this pull request as ready for review October 30, 2024 19:41
Copy link
Contributor

@LiterallyUniqueLogin LiterallyUniqueLogin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from the reverse complement issue, whether or not this should be done depends on how it is used or intended to be used. I see no uses of GetCanonicalMotif or ReverseComplement in TRTools, so I cannot evaluate that and instead leave that choice to you.

trtools/utils/utils.py Show resolved Hide resolved
trtools/utils/tests/test_utils.py Show resolved Hide resolved
@aryarm
Copy link
Member Author

aryarm commented Nov 2, 2024

Thanks for the suggestions! I tried my best to implement them in c1f78ee. Does that look better?

I see no uses of GetCanonicalMotif or ReverseComplement in TRTools, so I cannot evaluate that and instead leave that choice to you.

In this case, I think GetCanonicalMotif() is being used to discover whether variant records from different callers are mergeable in EnsembleTR:
https://github.com/gymrek-lab/EnsembleTR/blob/43104bfac4f98e5ea7500e2220500c21e6f337c5/ensembletr/vcfio.py#L230
(There might also be other uses, but from what I can gather, that's the main one.)
So once this PR is merged, it would behoove us to handle IUPAC codes in EnsembleTR, as well. We will have to ensure that AGRG is marked equivalent to AGAG, for example.

@LiterallyUniqueLogin
Copy link
Contributor

This looks good!

(Though I haven't personally checked your complement table is fully correct :) )

@aryarm
Copy link
Member Author

aryarm commented Nov 25, 2024

great, thanks for the review, Jonathan!

@aryarm
Copy link
Member Author

aryarm commented Nov 27, 2024

It turns out that this PR actually isn't helpful for fixing the issue in EnsembleTR anyway, so I'm just going to close it for now.

Feel free to reopen if it's ever needed!

In the meantime, you can follow gymrek-lab/EnsembleTR#32 for the next chapter of this story

@aryarm aryarm closed this Nov 27, 2024
@aryarm aryarm deleted the feat/iupac-GetCanonicalMotif branch December 1, 2024 20:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants