Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Several improvements.
implicit
, so gave incorrect results on explicit vs implicit on mixed data types within a single alignment.bam_mods_queryi
API to go along side the existingbam_mods_query_type
one. It returns data on the i^th modification type instead of looking up by 'code'.bam_parse_basemod2
API with an additional flag field. The only flag at present isHTS_MOD_REPORT_UNCHECKED
bam_next_basemod
andbam_mods_at_next_pos
(and implicitlybam_mods_at_qpos
) to honour the new flag.The purpose of the flag is to allow us to distinguish the three states of modified, unmodified, and unchecked.
Imagine a scenario of a processing a pileup where we're trying to gather statistics on what percentage of data in the pileup is modified. If all sequences are using implicit mode where every base is checked and we record the likely modifications in MM and (implicitly) the unrecorded bases have been deemed to be unmodified, then the statistics are a simple binary yes/no.
If we only have explicit-mode data present, where the mod-caller checked some places only and the QUAL field is used to distinguish between likely modified and unlikely modified, and uncalled positions have not been checked, then we need to count differently. We gather statistics only on the called bases and make a binary split on QUAL instead. Uncalled bases (those not in the MM tag) have no information on modification status so are ignored.
However if we have a mix of the two, we get in a pickle. We have to start using the
bam_mods_queryi
/bam_mods_query_type
API for each call list to determine whether the absence of a mod at a particular site means no-mod-detected or did-not-look.With the
HTS_MOD_REPORT_UNCHECKED
flag set, the explicit-mode will produce a modification call even on the sites that aren't listed, but with a specific qual ofHTS_MOD_UNCHECKED
. This means code can handle implicit, explicit and mixed data with a single loop without needing to do additional queries. No mod or low qual = not found. Mod with high qual=found. Mod witth unchecked qual => ignore for stats.Fixes #1550 hopefully!