[CELEBORN-1530] support MPU for S3 #2830
Thanks for this PR. Are there any test results?
Thanks for this PR but there are some points to polish.
</dependency>
<dependency>
  <groupId>org.apache.logging.log4j</groupId>
  <artifactId>log4j-1.2-api</artifactId>
This dependency is duplicated.
    <name>aws-mpu-deps</name>
  </property>
</activation>
<dependencies>
I think these dependencies can be moved to the module's regular dependencies section, because this module is only loaded when the aws-mpu profile is activated.
<profiles>
  <profile>
    <id>aws-mpu</id>
The profile name can be changed to aws.
package org.apache.celeborn.server.common.service.mpu.bean;

public class AWSCredentials {
This class should not be in the common module.
  <property>
    <name>aws-mpu-deps</name>
  </property>
</activation>
This segment is not needed.
<activation>
<property>
<name>aws-mpu-deps</name>
</property>
</activation>
DynConstructors.builder()
    .impl(
        "org.apache.celeborn.S3MultipartUploadHandler",
        awsCredentials.getClass(),
Passing the arguments to S3MultipartUploadHandler should be enough for this scenario.
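One way to read this suggestion is sketched below: the handler class is looked up by name and receives plain constructor arguments instead of a credentials bean. This is only an illustration that uses `java.lang.reflect` rather than `DynConstructors`; the class `DemoUploadHandler` and its `(bucket, key)` constructor are hypothetical stand-ins, not the PR's actual signature.

```java
import java.lang.reflect.Constructor;

public class ReflectiveHandlerSketch {
  // Hypothetical stand-in for a handler that lives in an optional module.
  public static class DemoUploadHandler {
    private final String bucket;
    private final String key;

    public DemoUploadHandler(String bucket, String key) {
      this.bucket = bucket;
      this.key = key;
    }

    public String describe() {
      return "s3://" + bucket + "/" + key;
    }
  }

  // Loads the class by name and passes plain arguments straight to its constructor,
  // so no shared bean type is needed between the caller and the optional module.
  public static Object newHandler(String className, String bucket, String key)
      throws Exception {
    Class<?> clazz = Class.forName(className);
    Constructor<?> ctor = clazz.getConstructor(String.class, String.class);
    return ctor.newInstance(bucket, key);
  }

  public static void main(String[] args) throws Exception {
    Object handler =
        newHandler(DemoUploadHandler.class.getName(), "demo-bucket", "shuffle/part-0");
    System.out.println(((DemoUploadHandler) handler).describe());
  }
}
```

With only standard types in the constructor signature, a wrapper class such as AWSCredentials would not need to live in the common module.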
- task = new S3FlushTask(flushBuffer, diskFileInfo.getDfsPath(), notifier, true);
+ task =
+     new S3FlushTask(
+         flushBuffer, notifier, true, s3MultipartUploadHandler, partNumber);
- flushBuffer, notifier, true, s3MultipartUploadHandler, partNumber);
+ flushBuffer, notifier, true, s3MultipartUploadHandler, partNumber++);
@@ -273,6 +310,7 @@ public void flush(boolean finalFlush, boolean fromEvict) throws IOException {
    if (task != null) {
      addTask(task);
      flushBuffer = null;
      partNumber++;
This line can be removed
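The two comments above rely on post-increment semantics: `partNumber++` as an argument passes the current value and then advances the counter, so a separate increment statement becomes redundant. A minimal sketch, assuming the counter starts at 1 (S3 part numbers range from 1 to 10,000); `submit` is a hypothetical stand-in for creating and queueing the flush task.

```java
import java.util.ArrayList;
import java.util.List;

public class PartNumberSketch {
  // Assumed starting value; S3 part numbers begin at 1.
  private int partNumber = 1;
  private final List<Integer> submitted = new ArrayList<>();

  // Hypothetical stand-in for building and enqueueing a flush task.
  private void submit(int part) {
    submitted.add(part);
  }

  public void flush() {
    // Post-increment: the task sees the current number, then the counter advances.
    submit(partNumber++);
  }

  public List<Integer> submitted() {
    return submitted;
  }

  public static void main(String[] args) {
    PartNumberSketch sketch = new PartNumberSketch();
    sketch.flush();
    sketch.flush();
    sketch.flush();
    System.out.println(sketch.submitted());
  }
}
```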
  logger.warn("Abort s3 multipart upload for {}", diskFileInfo.getFilePath());
  s3MultipartUploadHandler.complete();
}

if (notifier.hasException()) {
These two if blocks can be merged.
import java.lang.{Long => JLong}
import java.util.{List => JList}

case class MultipartUploadRequestParam(
Unused class. Can be removed.
@zhaohehuhu @FMX @WillemJiang
I still need more time to fully test it as S3 has some limitations related to MPU.
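The comment does not say which limitations are meant. Two documented S3 limits that commonly affect flush sizing are that every part except the last must be at least 5 MiB and that an upload can have at most 10,000 parts. A small sketch of checking a planned part layout against those two limits; the class and method names are illustrative only and not part of the PR.

```java
public class MpuLimitsSketch {
  // Minimum size for every part except the last one.
  static final long MIN_PART_BYTES = 5L * 1024 * 1024;
  // Maximum number of parts in one multipart upload.
  static final int MAX_PARTS = 10_000;

  // Returns true if the planned part sizes respect the two limits above.
  public static boolean isValidLayout(long[] partSizes) {
    if (partSizes.length == 0 || partSizes.length > MAX_PARTS) {
      return false;
    }
    // The last part is exempt from the minimum, so stop one short of the end.
    for (int i = 0; i < partSizes.length - 1; i++) {
      if (partSizes[i] < MIN_PART_BYTES) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    long mib = 1024L * 1024;
    System.out.println(isValidLayout(new long[] {5 * mib, 5 * mib, 1 * mib}));
    System.out.println(isValidLayout(new long[] {1 * mib, 5 * mib}));
  }
}
```

If flush buffers are smaller than the minimum part size, they would have to be aggregated before being uploaded as a part.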
What changes were proposed in this pull request?
As the title says: support multipart upload (MPU) for S3.
Why are the changes needed?
AWS S3 doesn't support append, so Celeborn had to copy the previously written data from S3 back to the worker and write it to S3 again on every flush, which heavily amplifies writes. This PR implements a better solution using multipart upload (MPU) to avoid the copy-and-write.
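To make the difference concrete, here is a rough in-memory sketch (not the AWS SDK and not the PR's handler) that models an upload as numbered parts stitched together on completion, and compares the bytes sent with a copy-and-rewrite approach. All names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.Map;
import java.util.TreeMap;

public class MultipartUploadSketch {
  // In-memory stand-in for a multipart upload: parts keyed by part number.
  private final Map<Integer, byte[]> parts = new TreeMap<>();
  private long bytesSent = 0;

  // Each flush uploads only its own buffer as a new part.
  public void uploadPart(int partNumber, byte[] data) {
    parts.put(partNumber, data.clone());
    bytesSent += data.length;
  }

  // Completion concatenates the parts in ascending part-number order.
  public byte[] complete() {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (byte[] part : parts.values()) {
      out.write(part, 0, part.length);
    }
    return out.toByteArray();
  }

  public long bytesSent() {
    return bytesSent;
  }

  // Bytes written when every flush rewrites the whole object (old data plus new data).
  public static long copyAndRewriteBytes(int[] flushSizes) {
    long total = 0;
    long objectSize = 0;
    for (int size : flushSizes) {
      objectSize += size;
      total += objectSize;
    }
    return total;
  }

  public static void main(String[] args) {
    int[] sizes = {4, 4, 4};
    MultipartUploadSketch mpu = new MultipartUploadSketch();
    for (int i = 0; i < sizes.length; i++) {
      mpu.uploadPart(i + 1, new byte[sizes[i]]);
    }
    System.out.println("MPU bytes sent: " + mpu.bytesSent());
    System.out.println("Copy-and-rewrite bytes written: " + copyAndRewriteBytes(sizes));
    System.out.println("Final object size: " + mpu.complete().length);
  }
}
```

In this model the bytes sent with MPU grow linearly with the data flushed, while copy-and-rewrite grows roughly quadratically with the number of flushes.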
Does this PR introduce any user-facing change?
How was this patch tested?