Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a command-line argument to toggle using NIO on reading for Spark #6010

Merged
merged 3 commits into from
Jul 15, 2019

Conversation

tomwhite
Copy link
Contributor

Fixes #6008

@codecov
Copy link

codecov bot commented Jun 18, 2019

Codecov Report

Merging #6010 into master will increase coverage by 0.274%.
The diff coverage is 41.176%.

@@               Coverage Diff               @@
##              master     #6010       +/-   ##
===============================================
+ Coverage     86.929%   87.204%   +0.274%     
+ Complexity     32765     32712       -53     
===============================================
  Files           2016      2011        -5     
  Lines         151460    150925      -535     
  Branches       16628     16132      -496     
===============================================
- Hits          131663    131612       -51     
+ Misses         13732     13701       -31     
+ Partials        6065      5612      -453
Impacted Files Coverage Δ Complexity Δ
...park/pipelines/PrintReadsSparkIntegrationTest.java 56.522% <0%> (-36.335%) 2 <0> (ø)
...der/engine/spark/datasources/ReadsSparkSource.java 63.158% <100%> (+1.825%) 18 <1> (+1) ⬆️
...stitute/hellbender/engine/spark/GATKSparkTool.java 82.873% <100%> (+0.095%) 78 <0> (ø) ⬇️
...ellbender/tools/spark/pathseq/PathSeqBwaSpark.java 67.391% <100%> (ø) 7 <0> (ø) ⬇️
...tools/spark/validation/CompareDuplicatesSpark.java 89.63% <100%> (ø) 40 <0> (ø) ⬇️
...lbender/tools/spark/pathseq/PathSeqScoreSpark.java 58.491% <100%> (+1.083%) 7 <0> (ø) ⬇️
...lkers/varianteval/evaluators/VariantEvaluator.java 70% <0%> (-12.353%) 12% <0%> (ø)
...ls/walkers/varianteval/util/EvaluationContext.java 75.676% <0%> (-4.88%) 12% <0%> (ø)
...er/utils/runtime/CapturedStreamOutputSnapshot.java 76.923% <0%> (-1.648%) 4% <0%> (ø)
...t/java/org/broadinstitute/hellbender/MainTest.java 84.746% <0%> (-0.969%) 15% <0%> (ø)
... and 314 more

Copy link
Collaborator

@jamesemery jamesemery left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two quick comments.

@@ -97,6 +98,10 @@
optional = true)
protected long bamPartitionSplitSize = 0;

@Argument(doc = "Whether to use NIO or the Hadoop filesystem (default) for reading files.",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably have a note that it doesn't work for writing files currently? What happens if you try to write to a bucket with use-nio mode enabled? One hopes it fails gracefully without throwing some nasty authentication error. We should probably at least have a test of that behavior for the time being.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would use the GCS Hadoop connector.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright that makes sense, i would still add a note saying that NIO is currently not supported for writing. I would also add a test asserting that the round trip (NIO in -> GCS Out) is functioning. That shouldn't be to hard I don't think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a note saying NIO is not supported for writing, and added a test to assert the round trip is working.

argBuilder.addArgument("input", gcsInputPath)
.addArgument("output", outputPath)
.addBooleanArgument(GATKSparkTool.USE_NIO, true);
runCommandLine(argBuilder);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is probably sufficient but really most of our tests should at least assert something specific. To that end I would think you probably want to make this test asserting equality of the output.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is a copy of the one from the non-Spark test. I think the idea is to check that it doesn't blow up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Collaborator

@jamesemery jamesemery left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, ignore the CodeCov complaints and go ahead and merge I think.

@tomwhite tomwhite merged commit f060e76 into master Jul 15, 2019
@tomwhite tomwhite deleted the tw_spark_nio branch July 15, 2019 14:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a command-line argument to toggle using NIO on reading for Spark
2 participants