Add a command-line argument to toggle using NIO on reading for Spark #6010
Codecov Report
@@ Coverage Diff @@
## master #6010 +/- ##
===============================================
+ Coverage 86.929% 87.204% +0.274%
+ Complexity 32765 32712 -53
===============================================
Files 2016 2011 -5
Lines 151460 150925 -535
Branches 16628 16132 -496
===============================================
- Hits 131663 131612 -51
+ Misses 13732 13701 -31
+ Partials 6065 5612 -453
Two quick comments.
@@ -97,6 +98,10 @@
optional = true)
protected long bamPartitionSplitSize = 0;

@Argument(doc = "Whether to use NIO or the Hadoop filesystem (default) for reading files.",
This should probably have a note that it doesn't work for writing files currently? What happens if you try to write to a bucket with use-nio mode enabled? One hopes it fails gracefully without throwing some nasty authentication error. We should probably at least have a test of that behavior for the time being.
It would use the GCS Hadoop connector.
Alright, that makes sense. I would still add a note saying that NIO is currently not supported for writing. I would also add a test asserting that the round trip (NIO in -> GCS out) is functioning. That shouldn't be too hard, I don't think.
Added a note saying NIO is not supported for writing, and added a test to assert the round trip is working.
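For readers following the thread, here is a minimal sketch of the pattern under discussion: a boolean flag that selects the read path only. It does not use GATK or Hadoop; the class name `ReadPathToggle`, the method `readAll`, and the use of `java.io.FileInputStream` as a stand-in for the Hadoop filesystem are all assumptions made for illustration.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadPathToggle {
    // Hypothetical helper: useNio plays the role of the command-line toggle.
    // false mirrors the documented default (the non-NIO read path).
    static String readAll(Path input, boolean useNio) throws IOException {
        try (InputStream in = useNio
                ? Files.newInputStream(input)              // NIO read path
                : new FileInputStream(input.toFile())) {   // stand-in for the Hadoop read path
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("toggle", ".txt");
        Files.write(tmp, "reads".getBytes(StandardCharsets.UTF_8));
        // Both read paths should return the same content for the same file.
        System.out.println(readAll(tmp, true).equals(readAll(tmp, false)));
        Files.delete(tmp);
    }
}
```

The flag affects reading only, which matches the note above: writing would go through a separate path regardless of the flag's value.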
argBuilder.addArgument("input", gcsInputPath)
        .addArgument("output", outputPath)
        .addBooleanArgument(GATKSparkTool.USE_NIO, true);
runCommandLine(argBuilder);
This test is probably sufficient, but most of our tests should at least assert something specific. To that end, I think you probably want this test to assert equality of the output.
This test is a copy of the non-Spark one. I think the idea is to check that it doesn't blow up.
Done
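As an illustration of the difference between "doesn't blow up" and "asserts something specific", the following is a rough sketch of a round-trip check that compares output against input. It is not the test from this PR: `RoundTripCheck`, `runTool`, and `sameContent` are hypothetical names, and `runTool` is a plain file copy standing in for the actual command-line run.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class RoundTripCheck {
    // Hypothetical stand-in for running the tool: copies input to output unchanged.
    static void runTool(Path input, Path output) throws IOException {
        Files.write(output, Files.readAllBytes(input));
    }

    // Returns true when both files hold identical bytes.
    static boolean sameContent(Path a, Path b) throws IOException {
        return Arrays.equals(Files.readAllBytes(a), Files.readAllBytes(b));
    }

    public static void main(String[] args) throws IOException {
        Path input = Files.createTempFile("roundtrip-in", ".txt");
        Path output = Files.createTempFile("roundtrip-out", ".txt");
        Files.write(input, "read1\nread2\n".getBytes(StandardCharsets.UTF_8));

        runTool(input, output);

        // Fail loudly instead of only checking that nothing threw.
        if (!sameContent(input, output)) {
            throw new AssertionError("round trip changed the output");
        }
        Files.delete(input);
        Files.delete(output);
    }
}
```

For real BAM files a byte-for-byte comparison may be too strict, since headers and compression can differ between writers; comparing records is likely the safer assertion there.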
This looks good. Ignore the CodeCov complaints and go ahead and merge, I think.
Fixes #6008