Amalgamate multiple CIGAR ops into single entry. (#1607)
Multiple consecutive matching (or sequence (mis)matching) ops (e.g. 10M40M) give a different VCF when BAQ is used than a single operation of the same total length (e.g. 50M). This change merges such consecutive operations into one before the BAQ calculation.
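The merging described above can be sketched in isolation. This is a minimal illustration, not the code from this commit: `merge_match_runs` and the `CIG_*` names are hypothetical, the op codes (M = 0, = = 7, X = 8) follow the BAM encoding, and emitting each merged run as a single `M` op is an assumption made only for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CIGAR op codes as packed in BAM (low 4 bits of each 32-bit entry). */
enum { CIG_MATCH = 0, CIG_EQUAL = 7, CIG_DIFF = 8 };

/* True for M, = and X, which all consume one read base and one reference base. */
static int is_match_like(uint32_t op) {
    return op == CIG_MATCH || op == CIG_EQUAL || op == CIG_DIFF;
}

/* Hypothetical helper: copy cigar[0..n) to out (capacity >= n), collapsing
 * each run of adjacent M/=/X ops into one M op of the summed length.
 * Returns the number of entries written. */
static size_t merge_match_runs(const uint32_t *cigar, size_t n, uint32_t *out) {
    size_t k, n_out = 0;
    uint32_t run = 0;               /* accumulated length of the current run */

    assert(cigar != NULL && out != NULL);
    for (k = 0; k < n; ++k) {
        uint32_t op = cigar[k] & 0xf, len = cigar[k] >> 4;

        if (!is_match_like(op)) {   /* other ops are copied unchanged */
            out[n_out++] = cigar[k];
            continue;
        }
        run += len;
        /* Keep accumulating while the next op is also M/=/X. */
        if (k + 1 < n && is_match_like(cigar[k + 1] & 0xf))
            continue;
        out[n_out++] = (run << 4) | CIG_MATCH;  /* last op of the run */
        run = 0;
    }
    return n_out;
}
```

Under these assumptions, `5=1X5=` collapses to a single entry of length 11, while `10M2I40M` is left as three entries because the insertion separates the two match runs.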
Showing 7 changed files with 36 additions and 2 deletions.
@@ -1,6 +1,6 @@
 /*  realn.c -- BAQ calculation and realignment.
-    Copyright (C) 2009-2011, 2014-2016, 2018, 2021 Genome Research Ltd.
+    Copyright (C) 2009-2011, 2014-2016, 2018, 2021, 2023 Genome Research Ltd.
     Portions copyright (C) 2009-2011 Broad Institute.
     Author: Heng Li <[email protected]>
@@ -268,8 +268,28 @@ int sam_prob_realn(bam1_t *b, const char *ref, hts_pos_t ref_len, int flag) {
     // tseq,tref are no longer needed, so we can steal them to avoid mallocs
     uint8_t *left = tseq;
     uint8_t *rght = tref;
+    int len = 0;
+
     for (k = 0, x = c->pos, y = 0; k < c->n_cigar; ++k) {
         int op = cigar[k]&0xf, l = cigar[k]>>4;
+
+        // concatenate alignment matches (including sequence (mis)matches)
+        // otherwise 50M50M gives a different result to 100M
+        if (op == BAM_CMATCH || op == BAM_CEQUAL || op == BAM_CDIFF) {
+            if ((k + 1) < c->n_cigar) {
+                int next_op = bam_cigar_op(cigar[k + 1]);
+
+                if (next_op == BAM_CMATCH || next_op == BAM_CEQUAL || next_op == BAM_CDIFF) {
+                    len += l;
+                    continue;
+                }
+            }
+
+            // last of M/X/= ops
+            l += len;
+            len = 0;
+        }
+
         if (l == 0) continue;
         if (op == BAM_CMATCH || op == BAM_CEQUAL || op == BAM_CDIFF) {
             // Sanity check running off the end of the sequence
@@ -0,0 +1,2 @@
+>MX
+CGTCTACTACG
@@ -0,0 +1 @@
+MX 11 4 11 12
@@ -0,0 +1,4 @@
+@HD VN:1.6 SO:coordinate
+@SQ SN:MX LN:11
+M 64 MX 1 60 11M * 0 0 CGTCTCCTACG IIIIIIIIIII
+X 64 MX 1 60 5=1X5= * 0 0 CGTCTCCTACG IIIIIIIIIII
@@ -0,0 +1,4 @@
+@HD VN:1.6 SO:coordinate
+@SQ SN:MX LN:11
+M 64 MX 1 60 11M * 0 0 CGTCTCCTACG IIIIIIIIIII BQ:Z:D@@@@@@@@@D
+X 64 MX 1 60 5=1X5= * 0 0 CGTCTCCTACG IIIIIIIIIII BQ:Z:D@@@@@@@@@D
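Both reads in the expected output carry the same `BQ:Z` string, which is the point of the test: `11M` and `5=1X5=` should now yield identical BAQ values. As a rough sketch of how to read that tag, the helper below assumes the convention given in the SAM optional-fields specification, BAQ_i = Q_i - (BQ_i - 64), with qualities stored as Phred+33 text. The function name `apply_bq` is hypothetical and not part of htslib.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: combine a Phred+33 quality string with a BQ:Z offset
 * string of the same length, writing numeric BAQ values to baq[].
 * Assumes BAQ_i = Q_i - (BQ_i - 64). Returns 0 on success, -1 if the two
 * strings differ in length. */
static int apply_bq(const char *qual33, const char *bq, int *baq) {
    size_t i, n;

    assert(qual33 != NULL && bq != NULL && baq != NULL);
    n = strlen(qual33);
    if (strlen(bq) != n)
        return -1;

    for (i = 0; i < n; ++i) {
        int q   = qual33[i] - 33;   /* Phred+33 text to numeric quality */
        int off = bq[i] - 64;       /* '@' (64) means no adjustment */
        baq[i] = q - off;
    }
    return 0;
}
```

Applied to the quality string `IIIIIIIIIII` (Q40 throughout) and `D@@@@@@@@@D`, this reading lowers the first and last bases to 36 and leaves the nine bases in between at 40.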