Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

debug: fixed --keep-zeros skip/dup loci #73

Merged
merged 1 commit into from
Jul 29, 2024
Merged

debug: fixed --keep-zeros skip/dup loci #73

merged 1 commit into from
Jul 29, 2024

Conversation

biermanr
Copy link
Contributor

When using perbase base-depth with --keep-zeros I sometimes get missing and duplicate loci in the output, which I think is the same as issue #71.

I believe this small change to the process_regions function in base_depth.rs fixes the issue which occurs when there are no-depth loci skipped in the result:Vector<PileupPosition>.

The previous code created new_result from result assuming the following situation:

                                                      |End
    -------------xxxxxxxxxxxxxxxxx---------------------
    |
Start

where the x's are where the result vector recorded non-zero depth positions.

The prior code first fills in the "left" of x's, then extends new_result with the x's from result, then
fills in the "right" of the x's.

But the code produces incorrect output when the situation instead looks like:

                                                      |End
    -------------xxxxx-xxx-xx-xxxx---------------------
    |
Start

I've added two new tests cases to check_depths by creating two new rstest #[fixture] called non_mate_aware_keep_zeros_positions and mate_aware_keep_zeros_positions. I'm not sure if this was the best approach and I'm definitely open to suggestions.

Besides these two test cases I've also tested on a small .bam file wget https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00096/alignment/HG00096.chrom11.ILLUMINA.bwa.GBR.low_coverage.20120522.bam with the following command which was producing dups/skips before the code changes but now lists each locus once.

cargo build --release

echo "11    4702317 4704317" > test.bed #tab sep

./target/release/perbase base-depth \
    --threads 4 \ 
    --keep-zeros \
    --bed-file test.bed \
    --output counts.tsv \
    HG00096.chrom11.ILLUMINA.bwa.GBR.low_coverage.20120522.bam

@biermanr
Copy link
Contributor Author

I forgot to write that I really enjoy using this tool, thank you for making it!

@sstadick
Copy link
Owner

This looks fantastic! I'll try to take a look at it tomorrow and get it merged and released.

Thanks for taking the time to put this together!

@sstadick sstadick merged commit 0607422 into sstadick:master Jul 29, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants