-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-21475][Core] Use NIO's Files API to replace FileInputStream/FileOutputStream in some critical paths #18684
Conversation
Change-Id: I0f11b9e0cbe62ca3d0bac7bfe0e2df838da80b48
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose using NIO, yeah. Are there more places that could be updated that could make a difference?
@@ -188,17 +189,20 @@ public void write(Iterator<Product2<K, V>> records) throws IOException { | |||
return lengths; | |||
} | |||
|
|||
final FileOutputStream out = new FileOutputStream(outputFile, true); | |||
final FileChannel out = FileChannel.open(outputFile.toPath(), | |||
ImmutableSet.of(WRITE, APPEND, CREATE)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indent 4 spaces?
boolean copyThrewException = true; | ||
try { | ||
lengths[i] = Utils.copyStream(in, out, false, transferToEnabled); | ||
final long size = in.size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why final? we generally don't write that unless it's required. I know the surrounding code does though anyway
Thanks @srowen for your review. I generally focused on shuffle related parts to avoid using And for other modules like SQL I don't address the issues there, not sure if it is also needed. |
Change-Id: I883b36420471a20615dbb1e7e10421884fa0b690
Test build #79769 has finished for PR 18684 at commit
|
Test build #79771 has finished for PR 18684 at commit
|
@@ -132,7 +134,8 @@ public Object convertToNetty() throws IOException { | |||
if (conf.lazyFileDescriptor()) { | |||
return new DefaultFileRegion(file, offset, length); | |||
} else { | |||
FileChannel fileChannel = new FileInputStream(file).getChannel(); | |||
FileChannel fileChannel = FileChannel.open(file.toPath(), | |||
ImmutableSet.of(StandardOpenOption.READ)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need a new set for this? should we call:
FileChannel fileChannel = FileChannel.open(file.toPath(), StandardOpenOption.READ)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They're actually the same, the one you mentioned is just a simple wrapper of the former one. I will change to yours for the simplicity.
@@ -188,17 +189,20 @@ public void write(Iterator<Product2<K, V>> records) throws IOException { | |||
return lengths; | |||
} | |||
|
|||
final FileOutputStream out = new FileOutputStream(outputFile, true); | |||
final FileChannel out = FileChannel.open(outputFile.toPath(), | |||
ImmutableSet.of(WRITE, APPEND, CREATE)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
final FileChannel out = FileChannel.open(outputFile.toPath(), WRITE, APPEND, CREATE)
?
lengths[i] = Utils.copyStream(in, out, false, transferToEnabled); | ||
long size = in.size(); | ||
Utils.copyFileStreamNIO(in, out, 0, size); | ||
lengths[i] = size; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we modify Utils.copyStream()
to support this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There're lot of other places using Utils#copyStream
, it would be better not to change this API.
Change-Id: Ibb0036d0ac88c01310cba817da0bb40535c12351
Test build #79801 has finished for PR 18684 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@srowen any further comment? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's worth updating, yes, if you can see a performance improvement
@cloud-fan @JoshRosen can you please help to review, thanks! |
Merged to master |
…ternal shuffle service ## What changes were proposed in this pull request? This PR is the second attempt of #18684 , NIO's Files API doesn't override `skip` method for `InputStream`, so it will bring in performance issue (mentioned in #20119). But using `FileInputStream`/`FileOutputStream` will also bring in memory issue (https://dzone.com/articles/fileinputstream-fileoutputstream-considered-harmful), which is severe for long running external shuffle service. So here in this proposal, only fixing the external shuffle service related code. ## How was this patch tested? Existing tests. Author: jerryshao <[email protected]> Closes #20144 from jerryshao/SPARK-21475-v2. (cherry picked from commit 93f92c0) Signed-off-by: Shixiong Zhu <[email protected]>
…ternal shuffle service ## What changes were proposed in this pull request? This PR is the second attempt of #18684 , NIO's Files API doesn't override `skip` method for `InputStream`, so it will bring in performance issue (mentioned in #20119). But using `FileInputStream`/`FileOutputStream` will also bring in memory issue (https://dzone.com/articles/fileinputstream-fileoutputstream-considered-harmful), which is severe for long running external shuffle service. So here in this proposal, only fixing the external shuffle service related code. ## How was this patch tested? Existing tests. Author: jerryshao <[email protected]> Closes #20144 from jerryshao/SPARK-21475-v2.
What changes were proposed in this pull request?
Java's
FileInputStream
andFileOutputStream
overrides finalize(), even this file input/output stream is closed correctly and promptly, it will still leave some memory footprints which will only get cleaned in Full GC. This will introduce two side effects:https://bugs.openjdk.java.net/browse/JDK-8080225
https://www.cloudbees.com/blog/fileinputstream-fileoutputstream-considered-harmful
So to fix this potential issue, here propose to use NIO's Files#newInput/OutputStream instead in some critical paths like shuffle.
Left unchanged FileInputStream in core which I think is not so critical:
Left unchanged FileOutputStream in core:
Here in
DiskBlockObjectWriter
, it usesFileDescriptor
so it is not easy to change to NIO Files API.For the
FileInputStream
andFileOutputStream
in common/shuffle* I changed them all.How was this patch tested?
Existing tests and manual verification.