Fix parallelism of als and movie-lens to 4 threads per executor #147
Conversation
LGTM!
I'm not sure we should change the parallelism of all the benchmarks (we are a parallelism suite, after all). For the rest of them, we should first have a meeting to discuss this, and we should investigate the scalability of all the benchmarks further.
I think we should only change als and movie-lens now, and investigate the other benchmarks separately.
I support Alex's view. Such a change should be based on evidence that everyone can review. I think it should be discussed in a separate issue that we can refer to, and we should keep a separate changelog file in the project that records such changes to the benchmarks. Edit: And yes, I'm aware of #145, but I would first expect a resolution there.
I'm happy to get more precise numbers for all Spark benchmarks, but independently of what the actual numbers are, setting the number of threads this high is a bug: it does not represent what an actual Spark workload looks like. A Spark app writer would scale the app by increasing the number of physical nodes and/or the number of executors per node, but would not go beyond 5 threads per executor, because more threads give no extra throughput and create contention [1]. So what we are doing here is clearly wrong. The solution proposed in this PR still runs a single executor, and it removes this bug.

I would dispute the claim that using all cores as Spark threads makes these parallel benchmarks; it only makes them buggy benchmarks, since there is just a single executor anyway. Another reasonable approach, one that would genuinely make these benchmarks parallel across all available cores, would be to increase the number of executors (spark.executor.instances) and give each executor 2 to 4 threads. Basically, the best way to maximize resource usage and throughput on a single Spark node would be to do something like: NUM_THREADS could differ from one benchmark to another depending on how much data movement there is, so it could be fine-tuned per benchmark, and NUM_EXECUTORS would simply be computed from NUM_THREADS and NUM_CORES. A maximum value could also be set.

I gave that approach a quick try: I can get 9% better throughput for page-rank, for instance, if I fix 4 threads and 2 executors on my 6-core MacBook. However, movie-lens becomes much jumpier, probably because of data movement and maybe more GC. Nevertheless, some iterations give a better score than the best score achieved with the old setup.

Anyway, I don't want to dive too deep into this analysis now, but I think it is the way to go. My suggestion here would be to :
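For illustration, a minimal sketch of the sizing arithmetic described in the comment above. The names (`executor_layout`, `NUM_THREADS`-style parameters) are hypothetical, not from the PR; it only shows how NUM_EXECUTORS could be derived from NUM_CORES and a per-benchmark thread count, with an optional cap.

```python
# Hypothetical sketch: derive the executor count from the core count and a
# per-benchmark thread count, with an optional maximum. Not the PR's code.

def executor_layout(num_cores, threads_per_executor=4, max_executors=None):
    """Return (executors, threads_per_executor) for a single Spark node."""
    executors = max(1, -(-num_cores // threads_per_executor))  # ceiling division
    if max_executors is not None:
        executors = min(executors, max_executors)
    return executors, threads_per_executor

# On a 6-core machine with 4 threads per executor this gives 2 executors,
# the combination the comment reports trying for page-rank.
print(executor_layout(6))  # -> (2, 4)
```

The resulting pair would then feed the standard Spark properties, e.g. spark.executor.instances and spark.executor.cores.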
As discussed, I updated the PR to only touch
LGTM
Fixes #145