-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add Path options for SAMFileWriterFactory #840
add Path options for SAMFileWriterFactory #840
Conversation
Codecov Report
@@ Coverage Diff @@
## master #840 +/- ##
===============================================
- Coverage 66.003% 65.966% -0.037%
- Complexity 7460 7470 +10
===============================================
Files 529 529
Lines 31962 31992 +30
Branches 5463 5464 +1
===============================================
+ Hits 21096 21104 +8
- Misses 8724 8742 +18
- Partials 2142 2146 +4
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very useful for me! In addition, I would appreciate if:
- Code paths related with filenames use
toURI().toString()
instead oftoString()
. In my case, I find it more useful for debugging that the returned string representtion. - When I was working in other parts of the code to support
Path
, I found quite useful the Jmfs to test non-file paths. It will be cool to have that for SAM/BAM/CRAM writing too!
} | ||
|
||
// write the BAM records to a temporary CRAM | ||
final File tempCRAMFile = File.createTempFile("testBAMThroughCRAMRoundTrip", CramIO.CRAM_FILE_EXTENSION); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe for the Path tests will be better to use the Jimfs (in-memory filesystem).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent idea! I see we already depend on jimfs.
I'm also adding it to the test of IOUtil.addExtension
, so we can test non-default-filesystem paths.
I also changed testCRAMThroughBAMRoundTrip so it saves to jimfs instead of the filesystem, that should speed things up a little bit.
This leaves only one test using files. I'm thinking of leaving it so we exercise that path through the code, though we can remove it if you feel strongly about it (these methods just call .toPath()
and then redirect to the path code anyways).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My suggestion was related to use at least in the changed test the jims filesystem, but anyway I think that some of the tests should still have File
, which is still quite common.
final File referenceFasta) { | ||
OutputStream cramOS = null; | ||
OutputStream indexOS = null ; | ||
|
||
if (createIndex) { | ||
if (!IOUtil.isRegularPath(outputFile)) { | ||
log.warn("Cannot create index for CRAM because output file is not a regular file: " + outputFile.getAbsolutePath()); | ||
log.warn("Cannot create index for CRAM because output file is not a regular file: " + outputFile); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would not be better to use outputFile.toURI()
for easier debugging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, done!
} | ||
catch (final IOException ioe) { | ||
throw new RuntimeIOException("Error creating index file for: " + outputFile.getAbsolutePath()+ BAMIndex.BAMIndexSuffix); | ||
throw new RuntimeIOException("Error creating index file for: " + indexPath); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the URI will be useful also here...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, done.
} | ||
catch (final IOException ioe) { | ||
throw new RuntimeIOException("Error creating CRAM file: " + outputFile.getAbsolutePath()); | ||
throw new RuntimeIOException("Error creating CRAM file: " + outputFile.toString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the URI will be useful also here...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed!
@@ -303,7 +342,7 @@ public SAMFileWriter makeSAMWriter(final SAMFileHeader header, final boolean pre | |||
if (this.useAsyncIo) return new AsyncSAMFileWriter(ret, this.asyncOutputBufferSize); | |||
else return ret; | |||
} catch (final IOException ioe) { | |||
throw new RuntimeIOException("Error opening file: " + outputFile.getAbsolutePath()); | |||
throw new RuntimeIOException("Error opening file: " + outputPath); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
URI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You got it!
final boolean createIndex = this.createIndex && IOUtil.isRegularPath(outputFile); | ||
OutputStream os = IOUtil.maybeBufferOutputStream(Files.newOutputStream(outputPath), bufferSize); | ||
if (createMd5File) os = new Md5CalculatingOutputStream(os, IOUtil.addExtension(outputPath,".md5")); | ||
final BAMFileWriter ret = new BAMFileWriter(os, outputPath.toString(), compressionLevel, deflaterFactory); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
URI string better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Done!
Thank you, excellent suggestions! I'm glad this code will be useful to you. Let me know what you think of this revised version! |
I see that storing the file name as a URL broke the tests; but that's a good thing because it looks like this uncovered usage that wouldn't have worked when using other filesystems. I've added a test for |
@@ -26,7 +27,7 @@ public StreamInflatingIndexingOutputStream(final OutputStream s1, final File ind | |||
this.s1 = s1; | |||
this.s2 = new PipedOutputStream(); | |||
final PipedInputStream pin = new PipedInputStream(this.s2, Defaults.NON_ZERO_BUFFER_SIZE); | |||
this.thread = new Thread(new Indexer(indexFile, pin), "BamIndexingThread"); | |||
this.thread = new Thread(new Indexer(indexFile.toPath(), pin), "BamIndexingThread"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could it be possible to add a constructor for Path
also here, and delegate the File
one to that one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure!
Assert.assertEquals(IOUtil.addExtension(p, ".ext"), IOUtil.getPath("/folder/file.ext")); | ||
p = IOUtil.getPath("folder/file"); | ||
Assert.assertEquals(IOUtil.addExtension(p, ".ext"), IOUtil.getPath("folder/file.ext")); | ||
FileSystem jimfs = Jimfs.newFileSystem(Configuration.unix()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The jmfs should be closed. I guess that try-with-resources is the better option for that...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
The I added two more comments that I found while reviewing the latest changes, one related with closing jimfs and the other regarding another ctor for |
You're welcome @magicDGS, thank you for your helpful comments! |
If I understand correctly we're waiting for a thumbs-up from someone at Broad. @droazen or @lbergelson perhaps? |
I'll have a look later this week. |
@droazen gentle ping. This is needed for NIO writes. |
@droazen weekly ping. We need this to be able to make progress. |
@droazen ? @lbergelson ? @tfenne ? Is there anyone here? Hello? I can't add good cloud support support to GATK if every commit takes 25 days. |
@jean-philippe-martin Sorry for the delay on this review -- we've been dealing with some unexpected and urgent internal deadlines lately, as I discussed at our last group meeting, and have not had much time for htsjdk reviews. Have to ask you to be patient a bit longer, but we'll get to this. |
this needs a review.... |
@droazen - customary reminder that this code is waiting for review. I see conflicts are starting to creep in. |
@lbergelson if you start this week we may get a few rounds of review before I leave for my vacation. |
@droazen @lbergelson Customary reminder that this code is still waiting for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jean-philippe-martin Finally, after much delay.... a review!
Looks pretty good overall. A few nitpicks and requests for more tests.
I do have one serious worry about introducing new NPEs with the new method overloads. I think those should be made to pass through nulls, or we should add tests that show that they can't accept nulls in the first place. We definitely introduced bugs during earlier path expansions because we didn't think about nulls as a viable input for a file.
@@ -75,7 +75,13 @@ protected BAMFileWriter(final OutputStream os, final File file, final int compre | |||
outputBinaryCodec.setOutputFileName(getPathString(file)); | |||
} | |||
|
|||
private void prepareToWriteAlignments() { | |||
protected BAMFileWriter(final OutputStream os, final String aboluteFilename, final int compressionLevel, final DeflaterFactory deflaterFactory) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo, aboluteFilename -> absoluteFilename
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks! Fixed.
if (indexFile.exists()) { | ||
if (!indexFile.canWrite()) { | ||
throw new SAMException("Not creating BAM index since unable to write index file " + indexFile); | ||
final String indexFileBase = pathURI.toString().endsWith(BamFileIoUtils.BAM_FILE_EXTENSION) ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a relic of a previous refactoring, pathUri
is already a String
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed toString()
@@ -59,6 +60,24 @@ public BinaryBAMIndexWriter(final int nRef, final File output) { | |||
} | |||
|
|||
/** | |||
* constructor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to write in constructor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing the word "constructor" (I think that's what you mean?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that is what I mean
* @param nRef Number of reference sequences | ||
* @param output BAM Index output file | ||
*/ | ||
public BinaryBAMIndexWriter(final int nRef, final Path output) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the existing file constructor delegate to this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, done!
* @param fileHeader header for the corresponding bam file | ||
*/ | ||
public BAMIndexer(final File output, final SAMFileHeader fileHeader) { | ||
this(output.toPath(), fileHeader); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes me uncomfortable that these can all throw NullPointerException
now. We've been bitten by newly introduced NPR's in other places we've done this conversion.
In most cases it will probably NPR later if a null is passed in, but HTSJDK isn't very consistent about how nulls are handled and null is sometimes a valid value in surprising places.
I think we'd be safer if we wrapped all of these in a ternary expression. Maybe we should add IOUtil.toPath(File) that returns a path or null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Changing it here.
final SAMFileHeader samHeader) { | ||
|
||
// NOTE: even when the input is coord-sorted, using assumePresorted=false will cause some | ||
// tests to fail since it can change the order of some unmapped reads - AFAICT this is allowed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right about the ordering issue I believe.
// index only created if coordinate sorted | ||
header.setSortOrder(SAMFileHeader.SortOrder.coordinate); | ||
header.addSequence(new SAMSequenceRecord("chr1", 123)); | ||
final SAMFileWriter writer = factory.makeBAMWriter(header, false, outputPath); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems like it should be a try() with resources
@@ -150,6 +168,18 @@ private void createSmallBam(final File outputFile) { | |||
writer.close(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should probably be a try with resources
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed here and below.
if (verifySupplementalFiles) { | ||
final Path index = SamFiles.findIndex(output); | ||
final Path md5File = IOUtil.addExtension(output, ".md5"); | ||
Assert.assertTrue(Files.size(index) > 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe extract this out as a helper method since it's duplicated. Although there's a ton of duplicated code in these test files anyway...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The text is there twice, but one of them uses a File and other other uses a Path. Not much there to deduplicate.
@@ -170,6 +175,23 @@ public void testFileTypeUnix(final String path, boolean expectedIsRegularFile) { | |||
Assert.assertEquals(IOUtil.isRegularPath(file), expectedIsRegularFile); | |||
} | |||
|
|||
@Test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need a unit test for isRegularPath(path)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
Addressed change requests. For some reason I can't compile this in intellij anymore; ./gradlew test running as we speak. |
|
@lbergelson it's been a week. I'm going to start rebasing to the new master and resolving conflicts. |
Also: - use jimfs for most CRAM compliance tests - handle codec filenames starting with "file://" - add jimfs test for bam/cram writer - protect against null path
Done. Pushing the new code.
|
3118627
to
a8a680f
Compare
@jean-philippe-martin It looks like there was an issue with the rebase:
Good to merge once that's resolved and test pass! |
Oh indeed, I forgot to add that fix to the commit. Good on Travis! Fixed now. I don't have a "merge" button here (on GitHub) so @lbergelson I'll let you do the merging once Travis comes up green (unless you want me to do that on the command line? I seem to remember some prohibition against that). |
@jean-philippe-martin Your long nightmare is finally over! |
Champagne! Thank you Louis! We can celebrate the submission of my longest-ever pull request. From April 3 to September 1st, so two days short of 5 months long! 🎉 |
Description
Add overloads to SAMFileWriterFactory that take
Path
as input instead ofFile
. The old version that tookFile
now just redirect to theirPath
equivalent, so there's the same amount of code to test.The motivation for this change is to support Java NIO's pluggable file systems. This change (in combination with Google's NIO provider) allows one to directly write a SAM, BAM, or CRAM file to a Google Cloud Storage bucket. See eg. GATK's matching issue.
Checklist