Does, or can, piscem allow for mismatches between query and index? #17

Open
jeremymsimon opened this issue Jan 5, 2024 · 4 comments

@jeremymsimon

Hey @rob-p - Pretty much as the title suggests.

I'm doing some testing for a non-canonical application of piscem and am noticing that I seemingly only get tpm and ecount > 0 from piscem -> piscem-infer when there is an exact match between the query and index sequences.

Is there a parallel to pufferfish's --minScoreFraction here? Or is there some other parameter tuning that I'm missing such that mismatches are allowable?

Thanks!

@rob-p
Contributor

rob-p commented Jan 6, 2024

Hi @jeremymsimon,

Piscem performs pseudoalignment (optionally with structural constraints). This means that at least one k-mer of the query must match exactly for a match to be reported. Beyond that, there is no restriction on the query as a whole, so there can be mismatches, gaps, etc. That tolerance only applies if your query is longer than k, though.

—Rob
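To make the "at least one matching k-mer" rule concrete, here is a minimal toy sketch in Python. It is not piscem's implementation: the helper names `kmers` and `has_kmer_hit` and the sequences are made up for illustration, and it ignores reverse complements, canonical k-mers, and the structural constraints mentioned above.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq (empty if len(seq) < k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def has_kmer_hit(query, reference, k):
    """True if at least one k-mer of the query occurs exactly in the reference."""
    return not kmers(query, k).isdisjoint(kmers(reference, k))


# Hypothetical toy reference and k, chosen only for illustration.
reference = "ACGTTGCAAGGCTTACGATC"
k = 7

# 12 bp query with one substitution at its last base: the leading 7-mers still match.
print(has_kmer_hit("ACGTTGCAAGGA", reference, k))  # True
# 7 bp query (length == k) with one substitution: its only 7-mer no longer matches.
print(has_kmer_hit("ACGTTGA", reference, k))  # False
```

The second call shows why the tolerance only exists for queries longer than k: a query of exactly k bases has a single k-mer, so any difference removes the only possible hit.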

@jeremymsimon
Author

So - excuse the naive question - does that mean in practice that if my query is 25bp and my index is built with k=23, I could in effect have 0, 1, or 2 mismatches?

And relatedly, how does piscem handle a case if a given query aligns perfectly to one location but with 1 mismatch to a secondary location?

@rob-p
Contributor

rob-p commented Jan 9, 2024

Hi @jeremymsimon,

No need for apologies!

> does that mean in practice that if my query is 25bp and my index is built with k=23, I could in effect have 0, 1, or 2 mismatches?

It depends somewhat on the details of what happens with the non-matching k-mers and where the mismatch occurs (see my answer to your other question below). But in such a case, a 25bp query contains three 23-mers, and any one of them that is present in the index would be sufficient to map the read to the reference.
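As a rough illustration of how much the mismatch position matters for a 25bp query with k=23, the sketch below counts which k-mer windows avoid a given set of mismatch positions. It only models substitutions and window overlap; `surviving_kmers` is a hypothetical helper, not part of piscem.

```python
def surviving_kmers(query_len, k, mismatch_positions):
    """Count k-mer windows of the query that contain none of the mismatch positions (0-based)."""
    bad = set(mismatch_positions)
    return sum(
        1
        for start in range(query_len - k + 1)
        if not any(start <= p < start + k for p in bad)
    )


# Single mismatch positions in a 25 bp query that still leave a clean 23-mer.
tolerated = [p for p in range(25) if surviving_kmers(25, 23, [p]) > 0]
print(tolerated)  # [0, 1, 23, 24]

# Two mismatches can also be tolerated, but only near the ends.
print(surviving_kmers(25, 23, [0, 24]))  # 1 (the 23-mer at positions 1..23)
print(surviving_kmers(25, 23, [0, 12]))  # 0
```

Under these assumptions, 0, 1, or 2 mismatches are all possible in principle, but only when they fall in the first or last two bases of the query.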

> And relatedly, how does piscem handle a case if a given query aligns perfectly to one location but with 1 mismatch to a secondary location?

Currently, only co-optimal mappings are reported. That is, if there is some set of targets P that account for the maximum number (say m) of matched k-mers, then any target Q having fewer than m matches will not be reported. Note that in this case, since we're just talking about k-mer matches, even a single mismatch could cause many k-mers not to match (e.g. a mismatch in the middle of a target could cause up to k k-mers to fail to match). For example, if you have a query of length 50 and a mismatch right in the "middle" (not exactly the middle since 50 is even, but you get the idea), then all but one of the 26 possible 25-mers overlap the mismatch; with a query of length 49 or shorter, it's possible that no 25-mer could actually match exactly.
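The co-optimal rule described above can be sketched as follows. This is a simplified model that assumes hits are counted as the number of query k-mers found in each target; the function `co_optimal_targets`, the target names, and the sequences are hypothetical, and piscem's actual index and hit accounting may differ.

```python
def kmers(seq, k):
    """Return all length-k substrings of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def co_optimal_targets(query, targets, k):
    """Return (m, names): the best k-mer hit count and the targets achieving it."""
    hits = {}
    for name, seq in targets.items():
        target_kmers = set(kmers(seq, k))
        hits[name] = sum(1 for km in kmers(query, k) if km in target_kmers)
    m = max(hits.values(), default=0)
    if m == 0:
        return 0, []  # no k-mer matched anywhere, so nothing is reported
    return m, sorted(name for name, count in hits.items() if count == m)


# Toy targets (assumed for illustration only).
targets = {
    "t1": "ACGTTGCAAGGCT",      # identical to the query
    "t2": "ACGTTGTAAGGCT",      # one substitution in the middle
    "t3": "GGACGTTGCAAGGCTTT",  # contains the query exactly
}
print(co_optimal_targets("ACGTTGCAAGGCT", targets, 5))  # (9, ['t1', 't3'])
```

Here `t2` still shares 4 of the 9 query 5-mers, but because `t1` and `t3` reach the maximum of 9, `t2` is dropped as sub-optimal.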

It's worth mentioning that the behavior of what to do with sub-optimal mappings is something that could be adjusted / modified. That is, if there's a need to report such things it would likely be possible to add such functionality to piscem. However, as a k-mer based method, the constraint that at least a single k-mer must match the query is pretty fundamental.

@jeremymsimon
Author

Thanks for that description. I think this is the issue I'm facing, given an already-short k (k < 20) and my short queries. If I want to be tolerant of a potential mismatch in the middle of the query, then it seems I would actually need an exceedingly small k (9 or shorter), and tolerating up to 3 mismatches that could occur anywhere may simply not be feasible. Am I understanding that all correctly? If so, this may just represent a fundamental limitation of a k-mer approach for the type of non-canonical application I'm currently looking at.
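One way to check this reasoning is a pigeonhole bound: m substitutions split the remaining L - m bases of a length-L query into m + 1 clean runs, so a clean k-mer is guaranteed for every placement only when k <= ceil((L - m) / (m + 1)). The sketch below computes that bound and verifies it by brute force. It assumes substitutions only, uses the query lengths mentioned earlier in the thread (the actual query length in this application is not stated), and says nothing about whether such a small k is specific enough to be useful.

```python
from itertools import combinations


def max_guaranteed_k(query_len, mismatches):
    """Largest k for which some k-mer is clean no matter where the substitutions fall."""
    clean = query_len - mismatches
    if clean <= 0:
        return 0
    return -(-clean // (mismatches + 1))  # ceiling division


def worst_case_longest_clean_run(query_len, mismatches):
    """Brute-force check: minimum over all placements of the longest mismatch-free run."""
    best = query_len
    for placement in combinations(range(query_len), mismatches):
        edges = [-1, *placement, query_len]
        longest = max(b - a - 1 for a, b in zip(edges, edges[1:]))
        best = min(best, longest)
    return best


for length, m in [(25, 1), (25, 3), (50, 1), (49, 1)]:
    print(length, m, max_guaranteed_k(length, m))
# 25 1 12
# 25 3 6
# 50 1 25
# 49 1 24
```

So for a 25bp query, tolerating one substitution anywhere needs k <= 12, and tolerating three needs k <= 6, which supports the concern that very short queries leave little room for a k-mer based approach.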

2 participants