[IO] What is the desired behavior when writing an empty BAM file? #2497
Hi @tloka, this is currently not possible. Thank you for this use case. When re-discussing the design, I had the feeling that it would make sense to explicitly allow writing the alignment header, something along these lines:

```cpp
seqan3::sam_file_output fout{...};
seqan3::sam_file_header header{...};
seqan3::sam_file_record record1{...};
seqan3::sam_file_record record2{...};

fout.push_back(header);  // write out the header
fout.push_back(record1); // write out record1
fout.push_back(record2); // write out record2

// This will throw, because the header can only be written once.
fout.push_back(header);
```

This seems to be a BAM specification violation on our side, because BAM requires a header. The bottom line is that, in any case, we should write a valid empty header on destruction if no records were provided.
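The semantics proposed here (an explicit header that may be written only once, plus a fallback header written on destruction) can be modelled with a small RAII class. The sketch below is hypothetical: `header_once_writer`, `push_header()`, `push_record()` and the placeholder header line are illustrative names and values, not SeqAn3 API. The proposal instead overloads `push_back()` for headers and records. The class writes plain text lines to a `std::ostream` rather than BGZF-compressed BAM, purely to show the control flow.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical illustration of the "header may be pushed once" proposal.
// Not SeqAn3's sam_file_output: it writes plain text lines to a std::ostream.
class header_once_writer
{
public:
    explicit header_once_writer(std::ostream & out) : out_{out} {}

    // Explicitly write the header; pushing a second header is an error.
    void push_header(std::string const & header)
    {
        if (header_written_)
            throw std::logic_error{"header can only be written once"};
        out_ << header << '\n';
        header_written_ = true;
    }

    // Write a record, emitting the placeholder header first if none was pushed.
    void push_record(std::string const & record)
    {
        if (!header_written_)
            push_header(placeholder_header);
        out_ << record << '\n';
    }

    // If nothing at all was written, still emit a header on destruction.
    ~header_once_writer()
    {
        if (!header_written_)
            out_ << placeholder_header << '\n';
    }

private:
    // Assumed minimal header line; a real writer would build it from the references.
    static constexpr char const * placeholder_header = "@HD\tVN:1.6";

    std::ostream & out_;
    bool header_written_{false};
};
```

With this design, an output object that is destroyed without any records still leaves a header behind, and a second explicit header is rejected with an exception instead of silently corrupting the file.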
I agree. Apart from that, I am not sure it is necessary or intuitive to construct the header separately and overload `push_back()` for it. Personally, I would rather think of a function like `flush_header()`, something like this:

```cpp
// Example 1: The header is implicitly written with the first record.
{
    seqan3::sam_file_output fout{"/home/tobias/livetools_test/test.bam", names, lengths};
    seqan3::sam_file_record record1{...};
    fout.push_back(record1);
}
// Example 2: The header is explicitly flushed. Same result as Example 1.
{
    seqan3::sam_file_output fout{"/home/tobias/livetools_test/test.bam", names, lengths};
    fout.flush_header();
    seqan3::sam_file_record record1{...};
    fout.push_back(record1);
}
// Example 3: Empty output file. A valid empty header needs to be written on destruction.
{
    seqan3::sam_file_output fout{"/home/tobias/livetools_test/test.bam", names, lengths};
}
// Example 4: Empty output file. However, the full header is included because it was explicitly flushed.
{
    seqan3::sam_file_output fout{"/home/tobias/livetools_test/test.bam", names, lengths};
    fout.flush_header();
}
// Example 5: A redundant call of flush_header() does nothing. Same result as Examples 1 and 2.
{
    seqan3::sam_file_output fout{"/home/tobias/livetools_test/test.bam", names, lengths};
    seqan3::sam_file_record record1{...};
    fout.push_back(record1);
    fout.flush_header(); // No effect; the header was already flushed with the first record.
}
```
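The five examples above boil down to an idempotent `flush_header()` that is also triggered by the first record and by the destructor. Below is a minimal sketch of that logic under stated assumptions: `lazy_header_output` is a hypothetical stand-in for `seqan3::sam_file_output`, records are plain strings, and the header is emitted as SAM-style `@SQ` text lines so the ordering is easy to inspect. For simplicity the destructor writes the same full header; Example 3 only asks for *a* valid header, so emitting a minimal one there would be an equally reasonable choice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch of the flush_header() proposal; not a SeqAn3 class.
// It emits SAM-style text so the ordering of header and records is easy to inspect.
class lazy_header_output
{
public:
    lazy_header_output(std::ostream & out,
                       std::vector<std::string> names,
                       std::vector<std::uint32_t> lengths) :
        out_{out}, names_{std::move(names)}, lengths_{std::move(lengths)}
    {}

    // Write the header now; later calls are no-ops (Example 5).
    void flush_header()
    {
        if (header_written_)
            return;
        for (std::size_t i = 0; i < names_.size(); ++i)
            out_ << "@SQ\tSN:" << names_[i] << "\tLN:" << lengths_[i] << '\n';
        header_written_ = true;
    }

    // The first record implicitly flushes the header (Example 1).
    void push_back(std::string const & record)
    {
        flush_header();
        out_ << record << '\n';
    }

    // An output that never saw a record still gets a header (Examples 3 and 4).
    ~lazy_header_output() { flush_header(); }

private:
    std::ostream & out_;
    std::vector<std::string> names_;
    std::vector<std::uint32_t> lengths_;
    bool header_written_{false};
};
```

Because every path funnels through the same guarded `flush_header()`, the header can never be written twice, and no code path can produce a file without one.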
Agreed! We still need some way to modify a header that wasn't constructed beforehand.
Hello, is there an update on this? I'd be happy to work on it myself if nobody else is currently doing so. I'd also like "valid" BAM files to be written when no records are passed.
Great to see this resolved, really appreciate it!
Platform
Question
Just a quick question:
I am using SeqAn3's BAM output. I need to create a `seqan3::sam_file_output` object even before I know whether there will be valid alignments to write. Thus, in some cases, no entry is written at all. However, the resulting file is then invalid (at least for samtools), because the header is only written with the first entry. So while the file exists and has a valid EOF marker, samtools returns an error when trying to view such an empty BAM file:
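For reference, the SAM/BAM format specification (SAMv1) defines the start of the decompressed BAM stream as the magic string `BAM\1`, then `l_text` and the SAM header text, then `n_ref`, and then for each reference its name length `l_name` (counting the trailing NUL), the NUL-terminated name, and its length `l_ref`, all integers little-endian. The whole stream is BGZF-compressed and followed by the BGZF EOF block, so a BAM file without alignments would consist of this header block plus the EOF marker, not the EOF marker alone. The helper below is a hypothetical sketch that builds only the uncompressed header bytes; it does not perform BGZF compression.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Append a 32-bit integer in little-endian byte order, as BAM requires.
void append_le32(std::vector<std::uint8_t> & buf, std::uint32_t value)
{
    for (int shift = 0; shift < 32; shift += 8)
        buf.push_back(static_cast<std::uint8_t>((value >> shift) & 0xFFu));
}

// Hypothetical helper: builds the *uncompressed* BAM header block
// (magic, SAM header text, reference dictionary). BGZF compression and the
// EOF marker block that a real BAM file also needs are deliberately omitted.
std::vector<std::uint8_t> bam_header_bytes(std::string const & sam_text,
                                           std::vector<std::string> const & names,
                                           std::vector<std::uint32_t> const & lengths)
{
    std::vector<std::uint8_t> buf{'B', 'A', 'M', 1};                 // magic "BAM\1"
    append_le32(buf, static_cast<std::uint32_t>(sam_text.size()));  // l_text
    buf.insert(buf.end(), sam_text.begin(), sam_text.end());        // header text
    append_le32(buf, static_cast<std::uint32_t>(names.size()));     // n_ref
    for (std::size_t i = 0; i < names.size(); ++i)
    {
        append_le32(buf, static_cast<std::uint32_t>(names[i].size() + 1)); // l_name incl. NUL
        buf.insert(buf.end(), names[i].begin(), names[i].end());            // reference name
        buf.push_back(0);                                                   // NUL terminator
        append_le32(buf, lengths[i]);                                       // l_ref
    }
    return buf;
}
```

With no header text and no references, the block is only 12 bytes (magic, `l_text = 0`, `n_ref = 0`), which is presumably the kind of "valid empty header" the thread asks to be written on destruction.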
I am using two different samtools versions, and the error only shows up with the newer one (1.10 vs. 1.07). I didn't test the most recent version, but I guess the error comes from the new HTSlib header API, which was introduced in samtools 1.10 and performs additional validity checks on the header (see here: "Samtools now uses the new HTSlib header API. As this adds more checks for invalid headers, it is possible that some illegal files will now be rejected when they would have been allowed by earlier versions.")
I am wondering whether this is the expected behavior, or whether the header should also be written for empty files so that the output is a valid file.
Minimal example:
Writing SAM in the same way also produces an empty file without a header.
Since the resulting file does not seem to be valid according to samtools, I think it should be possible to create a valid empty file. This would also be a nice indicator that an empty result is not due to some error, but that the file really contains no alignments.