
Fix parallelism of als and movie-lens to 4 threads per executor #147

Merged
farquet merged 2 commits into master from fix-apache-parallelism on Jun 15, 2019

Conversation

@farquet (Collaborator) commented on Jun 7, 2019

Fixes #145

@Fithos (Collaborator) left a comment

LGTM!

@axel22 (Member) commented on Jun 7, 2019

I'm not sure we should change the parallelism of all the benchmarks (we are a parallelism suite after all).
For als and movie-lens, I agree, since there is contention there.

For the rest of them, we should first have a meeting to discuss this, and we should investigate the scalability of all the benchmarks further.

@axel22 (Member) left a comment

I think we should only change als and movie-lens now, and investigate the other benchmarks separately.

@lbulej (Member) commented on Jun 7, 2019

> I think we should only change als and movie-lens now, and investigate the other benchmarks separately.

I support Alex's view. Such a change should be based on evidence that everyone can review. I think it should be discussed in a separate issue that we can refer to, and we should keep a separate changelog file in the project that records such changes to the benchmarks.

Edit: And yes, I'm aware of #145, but I would first expect a resolution there.

@farquet (Collaborator, Author) commented on Jun 7, 2019

I'm happy to get more precise numbers for all Spark benchmarks, but independently of what the actual numbers are, setting the number of threads this high is a bug and does not represent what an actual Spark workload looks like.

Indeed, a Spark application writer would scale the app by increasing the number of physical nodes and/or the number of executors per node, but would not go beyond about 5 threads per executor, because doing so does not yield more throughput and creates contention [1]. So what we are doing here is clearly wrong.

The solution proposed in this PR still uses a single executor and removes this bug. I would argue that using all cores as Spark threads does not make it a parallel benchmark; it only makes it a buggy benchmark, since there is just a single executor anyway.
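For illustration, here is a minimal sketch of what the single-executor fix amounts to, assuming the benchmark builds a local-mode SparkContext. The 4-thread cap matches this PR, but the variable names and master-URL style are my assumptions, not the actual patch:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical sketch: run a single local executor, but cap its thread count
// at 4 instead of handing it every available core.
val threadLimit = 4
val threadCount = math.min(Runtime.getRuntime.availableProcessors, threadLimit)

val conf = new SparkConf()
  .setAppName("movie-lens")
  .setMaster(s"local[$threadCount]") // single executor, at most 4 worker threads

val sc = new SparkContext(conf)
```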

Another reasonable approach that would really make those benchmarks parallel while using all the available cores would be to increase the number of executors (spark.executor.instances) and give each executor 2 to 4 threads.

Basically, the best way to maximize resource usage and throughput on a single Spark node would be something like: NUM_THREADS * NUM_EXECUTORS = NUM_CORES - 1.

NUM_THREADS could differ from one benchmark to another depending on how much data movement there is. So it could be fine-tuned per benchmark, and NUM_EXECUTORS would simply be computed from NUM_THREADS and NUM_CORES. Maybe a maximum value could also be set. A sketch of that computation follows below.
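As a rough, illustrative sketch of that computation (the per-benchmark thread count and the app name are placeholders; spark.executor.instances and spark.executor.cores are standard Spark properties, but they take effect when running under a cluster manager rather than in a single local executor):

```scala
import org.apache.spark.SparkConf

// Illustrative only: derive the executor count from a per-benchmark thread
// count so that NUM_THREADS * NUM_EXECUTORS stays around NUM_CORES - 1.
val numCores     = Runtime.getRuntime.availableProcessors
val numThreads   = 4 // could be tuned per benchmark
val numExecutors = math.max(1, (numCores - 1) / numThreads)

val conf = new SparkConf()
  .setAppName("page-rank")
  .set("spark.executor.instances", numExecutors.toString)
  .set("spark.executor.cores", numThreads.toString)
```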

I gave that approach a quick try: I can get 9% better throughput on page-rank, for instance, if I fix 4 threads and 2 executors on my 6-core MacBook. However, movie-lens becomes much jumpier, probably because of data movement and maybe more GC. Nevertheless, some iterations give a better score than the best score achieved with the old setup.

Anyway, I don't want to dive too deep into this analysis now, but I think it is the way to go. My suggestion here would be to:

  1. Merge this PR as a bugfix to make the suite usable and correct on big machines.
  2. Do proper parallelism of Spark benchmarks in a subsequent PR. That would imply a deeper analysis of each benchmark to find the right tradeoffs for each workload.

[1] https://stackoverflow.com/questions/24622108/apache-spark-the-number-of-cores-vs-the-number-of-executors

@farquet changed the title from "Fix parallelism of Apache Spark benchmarks to 4 threads per executor" to "Fix parallelism of als and movie-lens to 4 threads per executor" on Jun 14, 2019
@farquet (Collaborator, Author) commented on Jun 14, 2019

As discussed, I updated the PR to only touch als and movie-lens parallelism.

@axel22 (Member) left a comment

LGTM

@farquet merged commit fc658f4 into master on Jun 15, 2019
@farquet deleted the fix-apache-parallelism branch on June 15, 2019 at 07:57
Successfully merging this pull request may close the following issue: Contention on Spark benchmarks on machines with lots of CPUs (#145)