Hi Jennifer and Florian,
Thank you for developing Bracken; I've found it very useful in my work, and I think it's a great addition to the Kraken classifier. I've found what may be a small bug, and I thought I'd contribute back to your project with a minor pull request.
I noticed that Bracken represents redistributed reads as floating-point numbers, which are cast to integers without rounding when the output is written. Because the cast simply drops the fractional part, this can cause off-by-one errors in the reported count for a taxon. The effect is amplified in the Kraken-style report, where these off-by-one inaccuracies are summed through the taxonomic tree and can produce larger discrepancies between the number of reads assigned to a clade (inclusive of all child nodes) and the sum of the read counts assigned to the individual nodes of that clade. Below is an excerpt from a report generated with Bracken that is affected by the described issue:
Here, the read count assigned to the Prevotella clade (3,183,985) does not equal the sum of the reads assigned to each of its child nodes (3,183,979).
This pull request patches `src/est_abundance.py` to fix this behaviour by rounding the redistributed read counts before writing both the default Bracken report and the Kraken-style report.
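The Python sketch below illustrates the mismatch and the fix on a toy clade. The species names, read counts, and the helpers `report_counts_truncated` and `report_counts_rounded` are hypothetical and are not taken from `src/est_abundance.py`. It assumes that, in the truncating case, the clade total is accumulated from the unrounded values and cast once, and that, in the rounding case, the clade total is built from the already-rounded child counts.

```python
# Hypothetical fractional read counts for the species of one genus after
# redistribution (illustrative values, not real Bracken output).
species_reads = {
    "species_a": 1200.9,
    "species_b": 845.7,
    "species_c": 310.6,
    "species_d": 52.4,
    "species_e": 7.8,
}


def report_counts_truncated(leaf_reads):
    """Truncating case (hypothetical helper).

    The clade total is accumulated from the unrounded floats and cast once,
    while each child is cast separately; int() drops the fractional part,
    so every child is biased downward and the shortfall adds up.
    """
    clade = int(sum(leaf_reads.values()))
    children = {name: int(n) for name, n in leaf_reads.items()}
    return clade, children


def report_counts_rounded(leaf_reads):
    """Rounding case (hypothetical helper).

    Each child is rounded to the nearest integer first (Python's round()
    uses round-half-to-even on exact .5 ties), and the clade total is then
    built by summing those integers.
    """
    children = {name: round(n) for name, n in leaf_reads.items()}
    return sum(children.values()), children


clade_t, children_t = report_counts_truncated(species_reads)
print("truncated:", clade_t, "vs sum of children", sum(children_t.values()))

clade_r, children_r = report_counts_rounded(species_reads)
print("rounded:  ", clade_r, "vs sum of children", sum(children_r.values()))
```

With these values, the truncated clade total is 2417 while its children sum to only 2414; with rounding, both are 2418. Rounding each child but separately rounding a float-accumulated total could still disagree by a read or so, so deriving the parent count from the rounded children, as assumed here, is what keeps the two consistent by construction.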