Fix parallelism of als and movie-lens to 4 threads per executor #147
Conversation
LGTM!
I'm not sure we should change the parallelism of all the benchmarks (we are a parallelism suite, after all). For the rest of them, we should first have a meeting to discuss this, and we should investigate the scalability of all the benchmarks further.
I think we should only change als and movie-lens now, and investigate the other benchmarks separately.
I support Alex's view. Such a change should be based on evidence that everyone can review. I think it should be discussed in a separate issue that we can refer to, and we should keep a separate changelog file in the project that records such changes to the benchmarks. Edit: And yes, I'm aware of #145, but I would first expect a resolution there.
I'm happy to get more precise numbers for all Spark benchmarks, but independently of what the actual numbers are, setting the number of threads this high is a bug: it does not represent what an actual Spark workload looks like. A Spark app writer would scale the app by increasing the number of physical nodes and/or the number of executors per node, but would not go beyond 5 threads per executor, because more threads give no extra throughput and create contention [1]. So what we are doing here is clearly wrong. The solution proposed in this PR still runs a single executor, and it removes this bug.

I would dispute the claim that using all cores as Spark threads makes these parallel benchmarks; it only makes them buggy benchmarks, since there is just a single executor anyway. Another reasonable approach, one that would genuinely make these benchmarks parallel across all available cores, would be to increase the number of executors (spark.executor.instances) and give each executor 2 to 4 threads. Basically, the best way to maximize resource usage and throughput on a single Spark node would be to do something like: NUM_THREADS could differ from one benchmark to another depending on how much data movement there is, so it could be fine-tuned per benchmark, and NUM_EXECUTORS would simply be computed from NUM_THREADS and NUM_CORES. A maximum value could also be set.

I gave that approach a quick try: I can get 9% better throughput for page-rank, for instance, if I fix 4 threads and 2 executors on my 6-core MacBook. However, movie-lens becomes much jumpier, probably because of data movement and maybe more GC. Nevertheless, some iterations give a better score than the best score achieved with the old setup.

Anyway, I don't want to dive too deep into this analysis now, but I think it is the way to go. My suggestion here would be to :
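For illustration, a minimal sketch of the sizing arithmetic described in the comment above. The names (`executor_layout`, `NUM_THREADS`-style parameters) are hypothetical, not from the PR; it only shows how NUM_EXECUTORS could be derived from NUM_CORES and a per-benchmark thread count, with an optional cap.

```python
# Hypothetical sketch: derive the executor count from the core count and a
# per-benchmark thread count, with an optional maximum. Not the PR's code.

def executor_layout(num_cores, threads_per_executor=4, max_executors=None):
    """Return (executors, threads_per_executor) for a single Spark node."""
    executors = max(1, -(-num_cores // threads_per_executor))  # ceiling division
    if max_executors is not None:
        executors = min(executors, max_executors)
    return executors, threads_per_executor

# On a 6-core machine with 4 threads per executor this gives 2 executors,
# the combination the comment reports trying for page-rank.
print(executor_layout(6))  # -> (2, 4)
```

The resulting pair would then feed the standard Spark properties, e.g. spark.executor.instances and spark.executor.cores.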
As discussed, I updated the PR to only touch
LGTM
Fixes #145