Low performance of distributedLoad #9791
apc999 added the type-bug label (This issue is about a bug) and removed the type-feature label (This issue is a feature request) on Aug 26, 2019
apc999 changed the title from "How to improve the performance of distributedLoad" to "Low performance of distributedLoad" on Aug 27, 2019
@liuhongtong Are you loading multiple 600MB files for a total of 5TB of data? Also, can you verify whether all the nodes are being utilized? You can generally tell by checking that every worker shows Alluxio space in use.
@calvinjia 5TB of data in total; each file is 600MB.
liuhongtong added a commit to liuhongtong/alluxio that referenced this issue on Sep 16, 2019
Signed-off-by: liuhongtong <[email protected]>
alluxio-bot pushed a commit that referenced this issue on Sep 20, 2019
Previously, distributedLoad traversed the specified path, submitted a load job for one file to the job master, and waited for that job to complete before submitting the next. The process was serial, so performance could be unacceptable. Now distributedLoad traverses the specified path, submits a batch of file load jobs to the job master, and submits a new job whenever one completes, so the process is concurrent. Signed-off-by: liuhongtong <[email protected]> Fix #9791 pr-link: #9895 change-id: cid-f7ece37fda7dd3bd2a6c38783b77edc410e22fab
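The difference between the two dispatch strategies in this commit message can be sketched as follows. This is only an illustration in Python, not the Alluxio implementation: `run_load_job`, `load_serial`, `load_batched`, and `max_active_jobs` are hypothetical names, and a short sleep stands in for submitting a load job to the job master and waiting for it.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_load_job(path, seconds=0.01):
    """Hypothetical stand-in for one load job: submit it and block until done."""
    time.sleep(seconds)
    return path


def load_serial(paths):
    """Old behaviour: submit one job, wait for it, then move to the next file."""
    return [run_load_job(p) for p in paths]


def load_batched(paths, max_active_jobs=4):
    """New behaviour: keep up to max_active_jobs in flight at once and
    submit a new job as soon as one of the active jobs completes."""
    completed = []
    pending = set()
    with ThreadPoolExecutor(max_workers=max_active_jobs) as pool:
        for path in paths:
            if len(pending) >= max_active_jobs:
                # Block until at least one active job finishes, freeing a slot.
                finished, pending = wait(pending, return_when=FIRST_COMPLETED)
                completed.extend(f.result() for f in finished)
            pending.add(pool.submit(run_load_job, path))
        # Drain whatever is still running after the traversal ends.
        finished, _ = wait(pending)
        completed.extend(f.result() for f in finished)
    return completed


if __name__ == "__main__":
    files = [f"/data/part-{i}" for i in range(8)]  # hypothetical paths
    print(load_serial(files))
    print(sorted(load_batched(files)))
```

In the batched version the completion order is not guaranteed to match the traversal order, which is why the usage example sorts the result before printing.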
bf8086 pushed a commit to bf8086/alluxio that referenced this issue on Oct 1, 2019
apc999 pushed a commit to apc999/alluxio that referenced this issue on Oct 2, 2019
apc999 pushed a commit to apc999/alluxio that referenced this issue on Oct 2, 2019
apc999 pushed a commit to apc999/alluxio that referenced this issue on Oct 23, 2019
Alluxio Version: 2.0.0
Describe the bug
I compared the performance of load and distributedLoad on 5TiB of data. Each file is 600MiB, and the cluster has 30 nodes.
load: it takes 14 hours and 5 minutes.
distributedLoad: it takes 9 hours and 10 minutes.
distributedLoad shows no dramatic improvement. Is this a bug, or is there a configuration that would improve its performance?
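To put the two timings in perspective, the reported figures can be turned into throughput numbers. The sketch below assumes the first timing (14 hours 5 minutes) belongs to load and the second (9 hours 10 minutes) to distributedLoad, which follows the order in which the report names them. All variable and function names are illustrative.

```python
TOTAL_MIB = 5 * 1024 * 1024   # 5 TiB expressed in MiB
FILE_MIB = 600                # size of each file, as reported
NODES = 30                    # cluster size, as reported


def to_seconds(hours, minutes):
    """Convert an hours-and-minutes duration to seconds."""
    return hours * 3600 + minutes * 60


def throughput_mib_s(total_mib, elapsed_s):
    """Aggregate throughput in MiB per second."""
    return total_mib / elapsed_s


# Assumption: first timing is load, second is distributedLoad.
load_s = to_seconds(14, 5)
dist_s = to_seconds(9, 10)

load_rate = throughput_mib_s(TOTAL_MIB, load_s)
dist_rate = throughput_mib_s(TOTAL_MIB, dist_s)
speedup = load_s / dist_s
per_node_rate = dist_rate / NODES
approx_files = TOTAL_MIB // FILE_MIB

print(f"load:            {load_rate:.1f} MiB/s")
print(f"distributedLoad: {dist_rate:.1f} MiB/s ({per_node_rate:.1f} MiB/s per node)")
print(f"speedup:         {speedup:.2f}x over about {approx_files} files")
```

Under these assumptions the speedup is only about 1.5x, and the aggregate rate spread over 30 nodes comes to a few MiB/s per node, which is consistent with the report that distributedLoad was not making full use of the cluster.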
To Reproduce
Refer to the section above.
Expected behavior
distributedLoad should deliver high performance.
Urgency
N/A