Implement parallelism for the distributedLoad command #9895
Signed-off-by: liuhongtong <[email protected]>
Automated checks report:
Some checks failed. Please fix the reported issues and reply 'alluxio-bot, check this please' to re-run checks.
@liuhongtong Thanks for this improvement! Could you provide some more information in the PR description? Thanks!
Merged build finished. Test FAILed.
Test FAILed.
@liuhongtong Thanks for the substantial improvement to this call. Do you mind taking a look at PersistCommand and seeing if that pattern can apply to distributedLoad?
Unrelated to this PR: we have a few implementations of multi-threaded CLI commands, and it would be good to consolidate them into a single framework.
.hasArg(true)
.desc("number of replicas to have for each block of the loaded file")
.build();
public static final Option THREAD_OPTION =
nit: THREADS_OPTION, to be consistent
Do you mean a constant value, as in the C language?
@@ -35,7 +43,28 @@
 */
@ThreadSafe
public final class DistributedLoadCommand extends AbstractFileSystemCommand {
private static final String REPLICATION = "replication";
public static final Option REPLICATION =
nit: REPLICATION_OPTION, to be consistent
@gpang distributedLoad traverses the specified path, submits a load job for each file to the job master, and waits for that job to complete before submitting the next one. The whole process is serial, so its performance may be unacceptable.
Signed-off-by: liuhongtong <[email protected]>
Test setup: loading 5 TiB of data from a large HDFS cluster with 30 Alluxio workers and job workers.
[Screenshots of three runs: load test, distributedLoad test, new distributedLoad test]
Merged build finished. Test PASSed.
Test PASSed.
@apc999 @calvinjia updated. PTAL. Thanks.
Automated checks report:
All checks passed!
@liuhongtong Could you open a GitHub issue for consolidating the optionally multi-threaded CLIs into a general framework?
@calvinjia OK. I will open a new issue for consolidating the multi-threaded CLIs into a general framework.
Signed-off-by: liuhongtong <[email protected]>
New issue: #9905
Merged build finished. Test PASSed.
Test PASSed.
Signed-off-by: liuhongtong <[email protected]>
Merged build finished. Test PASSed.
Test PASSed.
Merged build finished. Test FAILed.
Test FAILed.
Merged build finished. Test PASSed.
Test PASSed.
alluxio-bot, merge this please |
@apc999 Thanks for merging this PR.
distributedLoad traverses the specified path, submits a load job for one file to the job master, and waits for that job to complete before moving on to the next file. This is a serial process, so its performance may be unacceptable. Now distributedLoad traverses the specified path, submits a batch of load jobs to the job master, and submits a new job whenever one completes, which makes the process concurrent.
Signed-off-by: liuhongtong <[email protected]>
Fix Alluxio#9791
pr-link: Alluxio#9895
change-id: cid-f7ece37fda7dd3bd2a6c38783b77edc410e22fab
distributedLoad traverses the specified path, submits a load job for one file to the job master, and waits for that job to complete before moving on to the next file. This is a serial process, so its performance may be unacceptable.
Now distributedLoad traverses the specified path, submits a batch of load jobs to the job master, and submits a new job whenever one completes, which makes the process concurrent.
Signed-off-by: liuhongtong <[email protected]>
Fix #9791
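
The scheduling pattern described above (keep a bounded number of load jobs in flight and submit the next one as soon as any job completes) can be sketched roughly as follows. This is only an illustration and not the code merged in this PR: the class `BoundedLoadSketch`, the method `loadAll`, and the `loadJob` callback are hypothetical names, and a local thread pool stands in for the job master.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

/** Hypothetical sketch of bounded-concurrency job submission; not the Alluxio implementation. */
public final class BoundedLoadSketch {

  /**
   * Runs loadJob once per path while keeping at most maxActiveJobs jobs in flight.
   * Results are returned in completion order, not submission order.
   */
  public static <R> List<R> loadAll(List<String> paths, int maxActiveJobs,
      Function<String, R> loadJob) throws InterruptedException, ExecutionException {
    // Assumption: a local pool plays the role of the job master.
    ExecutorService pool = Executors.newFixedThreadPool(maxActiveJobs);
    CompletionService<R> completion = new ExecutorCompletionService<>(pool);
    List<R> results = new ArrayList<>();
    int active = 0;
    try {
      for (String path : paths) {
        if (active == maxActiveJobs) {
          // The window is full: block until any one job finishes, then free its slot.
          results.add(completion.take().get());
          active--;
        }
        completion.submit(() -> loadJob.apply(path));
        active++;
      }
      // Drain the jobs that are still running after the last submission.
      while (active > 0) {
        results.add(completion.take().get());
        active--;
      }
    } finally {
      pool.shutdownNow();
    }
    return results;
  }

  public static void main(String[] args) throws Exception {
    List<String> paths = Arrays.asList("/data/a", "/data/b", "/data/c", "/data/d", "/data/e");
    // The callback here only builds a string; a real job would load the file.
    List<String> done = loadAll(paths, 2, p -> "loaded " + p);
    System.out.println(done.size() + " files loaded");
  }
}
```

Compared with the serial version, total wall time is bounded by the slowest chain of jobs per slot rather than the sum of all job durations, while the cap on in-flight jobs keeps the submitter from flooding the job master.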