Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Synced reader does not support regions with contigs including colons #1620

Closed
brenochoa opened this issue May 19, 2023 · 4 comments · Fixed by #1630
Closed

Synced reader does not support regions with contigs including colons #1620

brenochoa opened this issue May 19, 2023 · 4 comments · Fixed by #1630
Assignees

Comments

@brenochoa
Copy link

The synced_bcf_reader currently (as of version 1.17) fails with an error when it is initialised via bcf_sr_regions_init using region strings containing contig names with colons, despite the fact that colons are no longer disallowed characters in contig names since VCFv4.3:

The VCF specification previously disallowed colons (‘:’) in contig names to avoid confusion when parsing
breakends, but this was unnecessary. Even with contig names containing colons, the breakend mate position
notation can be unambiguously parsed because the “:pos” part is always present.

The underlying _regions_init_string function will likely need to be updated to scan for colons from the back instead of the front to accommodate this change of spec.

@pd3
Copy link
Member

pd3 commented May 22, 2023

This is not easy to support. The function accepts regions in multiple formats (chr, chr:pos, chr:beg-end, chr:beg-) and with colons in contig names it is impossible to tell if x:1234 refers to the contig 'x' at position 1234 or to a contig named x:1234.

The only solution I can think of is to move the responsibility for resolving ambiguities like this to the user and require full intervals chr:beg-end in cases where chr is of the form STR:NUM, STR:NUM-, or STR:NUM-NUM.

@brenochoa
Copy link
Author

Guess they should have thought better before allowing colons in contig names... Looks kinda necessary after all! 😅

@jkbonfield
Copy link
Contributor

We solved this already though elsewhere, with additional notation such as {chra:b}:10-20, or even just parsing from the other end so the last colon is the one that is used. This doesn't work if you rather foolishly created a contig named "chr10:100-200" as that's ambiguous, but that's why we adding the curly brace notation.

The htslib APIs correct support this, so it's possibly simply an issue of synced reader doing its own parsing rather than using the official region parsing API. (I haven't looked.)

@jmarshall
Copy link
Member

jmarshall commented May 22, 2023

bcf_sr_regions_init is an unfortunate API that does not have the set of valid contig names available, so the algorithm in Appendix A of the SAM spec is not directly applicable.

However a simplified version of it could be used, and as @jkbonfield noted, the explicit delimiter notation suggested in the appendix could also be supported.

Or perhaps it should be superseded by an API function that is provided with the set of contigs in play in the file(s) to be read.

pd3 added a commit to pd3/htslib that referenced this issue Jun 5, 2023
Note hts_parse_region() cannot be used because it requires the header
and without the header the caller does not learn the contig name.

Resolves samtools#1620
pd3 added a commit to samtools/bcftools that referenced this issue Jun 5, 2023
jkbonfield pushed a commit that referenced this issue Jun 27, 2023
Note hts_parse_region() cannot be used because it requires the header
and without the header the caller does not learn the contig name.

Resolves #1620
vasudeva8 pushed a commit to vasudeva8/htslib that referenced this issue Aug 17, 2023
Note hts_parse_region() cannot be used because it requires the header
and without the header the caller does not learn the contig name.

Resolves samtools#1620
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants