Discussion / possible features #19

Open
imartayan opened this issue Mar 1, 2024 · 2 comments

Comments

@imartayan

Hi there!
I'd like to discuss the state of this project to see if it would be suitable for larger Rust libraries.

First I'd like to say that I really like the design of this library, especially the separation of encoding and Kmer, and the fact that it scales with large K using bitfields.

Still, I think that it would be nice to support more elementary operations to make the library more convenient, and that further optimizations could be explored.
In no particular order, I'm thinking of the following features:

  • navigational methods computing the predecessor/successor of a k-mer by prepending/appending a given base
  • an iterator producing k-mers from an iterator of u8
  • a method to compute the canonical version of a k-mer / to test if a k-mer is canonical
  • optimizing some operations with SIMD if we use multiple integers to store a k-mer

Do you think these features are appropriate for this library, or would you prefer to keep it simple?
I've already implemented some of these features in a less generic way (see this module), so I could try to adapt the code to this library.
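
To make the first three items in the list above concrete, here is a minimal sketch of what successor/predecessor and canonical operations could look like on a 2-bit packed k-mer. Everything in it is an assumption made for illustration: `PackedKmer`, its method names, the A/C/G/T = 0/1/2/3 encoding, and the single `u64` backing store (so 1 <= K <= 32) are hypothetical and are not the API of this crate or of the module linked above.

```rust
/// Hypothetical 2-bit packed k-mer stored in one u64 (assumes 1 <= K <= 32).
/// This is an illustrative type, not the crate's actual representation.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct PackedKmer<const K: usize>(u64);

impl<const K: usize> PackedKmer<K> {
    /// Mask keeping only the low 2*K bits.
    const MASK: u64 = if K >= 32 { u64::MAX } else { (1u64 << (2 * K)) - 1 };

    /// Assumed encoding: A=0, C=1, G=2, T=3, so the complement is `3 - code`.
    fn encode(base: u8) -> Option<u64> {
        match base {
            b'A' | b'a' => Some(0),
            b'C' | b'c' => Some(1),
            b'G' | b'g' => Some(2),
            b'T' | b't' => Some(3),
            _ => None,
        }
    }

    /// Builds a k-mer from exactly K ASCII bases; the first base ends up
    /// in the most significant position.
    fn from_bytes(seq: &[u8]) -> Option<Self> {
        if seq.len() != K {
            return None;
        }
        let mut bits = 0u64;
        for &b in seq {
            bits = (bits << 2) | Self::encode(b)?;
        }
        Some(Self(bits))
    }

    /// Successor: drop the first base and append `code` (a 2-bit value).
    fn successor(self, code: u64) -> Self {
        Self(((self.0 << 2) | (code & 3)) & Self::MASK)
    }

    /// Predecessor: drop the last base and prepend `code` (a 2-bit value).
    fn predecessor(self, code: u64) -> Self {
        Self((self.0 >> 2) | ((code & 3) << (2 * (K - 1))))
    }

    /// Reverse complement, computed one base at a time for clarity.
    fn reverse_complement(self) -> Self {
        let mut fwd = self.0;
        let mut rc = 0u64;
        for _ in 0..K {
            rc = (rc << 2) | (3 - (fwd & 3));
            fwd >>= 2;
        }
        Self(rc)
    }

    /// Canonical form: the smaller of the k-mer and its reverse complement.
    fn canonical(self) -> Self {
        self.min(self.reverse_complement())
    }

    /// True if the k-mer is already its own canonical form.
    fn is_canonical(self) -> bool {
        self.0 <= self.reverse_complement().0
    }
}
```

The per-base loop in `reverse_complement` is the simplest correct form; a bit-twiddling or SIMD variant would be the natural place for the optimizations mentioned in the last item.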

@theJasonFan
Member

@rob-p can chime in. I wonder if the solution right now is to provide some examples of how the kmer library can be used. We've written kmer iterators and canonical iterators for the naive implementation in the past.

i.e.: https://github.com/COMBINE-lab/kmers/blob/main/src/naive_impl/canonical_kmer_iterator.rs
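
As a rough idea of what such a k-mer iterator might look like when written against a plain iterator of `u8`, here is a small self-contained sketch. It is not the code in the linked file: `KmerIter`, its fields, the `u64` item type, and the choice to restart the window on any non-ACGT byte are all assumptions made for illustration.

```rust
/// Hypothetical adapter yielding 2-bit packed k-mers (as u64) from bytes.
/// Assumed encoding: A=0, C=1, G=2, T=3, first base most significant.
struct KmerIter<I: Iterator<Item = u8>> {
    bases: I,
    k: usize,
    mask: u64,
    current: u64,
    filled: usize, // number of valid bases currently in the window
}

impl<I: Iterator<Item = u8>> KmerIter<I> {
    fn new(bases: I, k: usize) -> Self {
        assert!((1..=32).contains(&k), "k must be in 1..=32 to fit a u64");
        let mask = if k == 32 { u64::MAX } else { (1u64 << (2 * k)) - 1 };
        Self { bases, k, mask, current: 0, filled: 0 }
    }
}

impl<I: Iterator<Item = u8>> Iterator for KmerIter<I> {
    type Item = u64;

    fn next(&mut self) -> Option<u64> {
        while let Some(byte) = self.bases.next() {
            let code = match byte {
                b'A' | b'a' => 0,
                b'C' | b'c' => 1,
                b'G' | b'g' => 2,
                b'T' | b't' => 3,
                _ => {
                    // Assumed policy: an invalid base (e.g. N) restarts the window.
                    self.filled = 0;
                    self.current = 0;
                    continue;
                }
            };
            // Roll the window: shift in the new base, drop the oldest one.
            self.current = ((self.current << 2) | code) & self.mask;
            self.filled = (self.filled + 1).min(self.k);
            if self.filled == self.k {
                return Some(self.current);
            }
        }
        None
    }
}
```

For example, `KmerIter::new(b"ACGTA".iter().copied(), 3)` would yield the packed forms of ACG, CGT and GTA in turn. A canonical variant could track the reverse complement in a second rolling word and return the smaller of the two at each step.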

@rob-p I hope you don't mind me speaking on your behalf, but I'm 99% sure PRs are always welcome!

@rob-p
Contributor

rob-p commented Mar 2, 2024

Hi @imartayan,

Thanks for opening this issue / discussion! I would very much like to have some community momentum pick up behind this repo & crate. Basically, I created it quite a while ago when I thought it would be very useful for the Rust bioinformatics community to have a flexible and relatively standard "go-to" crate for basic k-mer representation and manipulation (it was also right around the time that MVP const generics hit, so it was an opportunity to see how they could be used for k-mer representation).

There was some initial help from folks who pitched in, but then momentum sort of fell away. Even so, this repo largely remained the place where we added k-mer functionality we needed internally for different projects. As @theJasonFan notes, though, we really only focused on adding things where we needed them, so a lot of functionality was added only to the naive_impl rather than, as would have been ideal, generically for all representations.

Anyway, long story short, I'd be very happy to have development here pick up again! I think there are a lot of obvious things that could go into this crate, and I'd be excited to help build it up. Any PRs you'd want to make are certainly welcome!

Best,
Rob
