Feature request - module for generating coverage data #68

Open
lennijusten opened this issue Oct 9, 2024 · 2 comments
Labels
enhancement New feature or request priority_3

Comments

@lennijusten
Contributor

It would be great to have an optional module in the pipeline that generates coverage data for species of interest (e.g., all human-infecting pathogens with more than X hits).

Ideally, the resulting coverage data would let you plot individual read alignments and look at coverage depth as a function of position along a reference genome.

The goal I have in mind for a module like this is to build confidence in species assignments by checking that read coverage is approximately uniform across the genome.

I'm unsure of the best implementation, but I've sketched out a rough idea below. Others (CC @mikemc, @jeffkaufman) definitely have more context and probably have better ideas!

Input

  • Reference genome(s) to align to

Output

  • A dataframe with one row per aligned read and four columns: read ID, reference genome ID, start position of the read along the genome, and end position of the read along the genome.
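
As a rough illustration of how a table like this could feed the uniformity check described above, here is a minimal sketch in plain Python. It is not part of the pipeline: the function names (`coverage_depth`, `uniformity_summary`), the tuple layout, and the coordinate convention (0-based start, exclusive end) are all assumptions made for the example, and the reference lengths are assumed to be known separately.

```python
from statistics import mean, pstdev


def coverage_depth(alignments, genome_lengths):
    """Return per-position read depth for each reference genome.

    alignments: iterable of (read_id, ref_id, start, end) tuples.
        ASSUMPTION: start is 0-based and end is exclusive.
    genome_lengths: dict mapping ref_id -> genome length in bases.
    """
    # Difference array: +1 where a read starts, -1 where it ends.
    diffs = {ref: [0] * (length + 1) for ref, length in genome_lengths.items()}
    for _read_id, ref_id, start, end in alignments:
        if ref_id not in diffs:
            continue  # skip reads aligned to references we were not given
        start = max(start, 0)
        end = min(end, genome_lengths[ref_id])
        if start >= end:
            continue  # empty or out-of-range interval
        diffs[ref_id][start] += 1
        diffs[ref_id][end] -= 1

    # A running sum over the difference array gives depth at each position.
    depth = {}
    for ref, diff in diffs.items():
        running = 0
        per_position = []
        for delta in diff[:-1]:
            running += delta
            per_position.append(running)
        depth[ref] = per_position
    return depth


def uniformity_summary(per_position):
    """Summarise how evenly reads cover one reference (hypothetical metrics)."""
    if not per_position:
        return None
    mu = mean(per_position)
    covered = sum(1 for d in per_position if d > 0)
    return {
        "breadth": covered / len(per_position),  # fraction of positions with depth > 0
        "mean_depth": mu,
        # Coefficient of variation: lower means more even coverage.
        "cv": pstdev(per_position) / mu if mu > 0 else float("inf"),
    }


# Toy data: three reads on a 10-base reference called "refA" (made up).
example_alignments = [
    ("r1", "refA", 0, 4),
    ("r2", "refA", 2, 6),
    ("r3", "refA", 8, 10),
]
example_depth = coverage_depth(example_alignments, {"refA": 10})
example_summary = uniformity_summary(example_depth["refA"])
```

With real data, the per-position depth list could be plotted directly against genome position, and a low breadth or a high coefficient of variation would flag assignments where reads pile up in a small region rather than spreading across the genome. The thresholds for "approximately uniform" would still need to be chosen.
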

@willbradshaw
Contributor

I don't currently see a great way of implementing this within the constraints of the pipeline's current functionality, which makes me think it's a better fit for post-pipeline downstream analysis. But I'm open to changing that view if someone can suggest a concrete implementation that fits well within the current pipeline.

@willbradshaw willbradshaw added enhancement New feature or request priority_3 labels Oct 21, 2024
@willbradshaw
Contributor

Having sat on this for a while, I'm now more confident that this won't be implemented in the pipeline as currently conceived anytime soon. However, this might be a better fit for whatever downstream pipeline things like duplication analysis and clade counts end up in after we've implemented #122.

2 participants