Is your feature request related to a problem? Please describe.
The Qual tool should recommend spark.executor.cores based on the value that gives the best TCO in internal benchmarks, instead of inheriting the value from the CPU run.
For example, in an extreme case where the CPU run uses spark.executor.cores=1, the Qual tool currently recommends spark.executor.cores=1 for the GPU run as well, which does not seem right from a TCO point of view.
Imagine the CPU run uses 160 x spark.executor.cores=1. Should the GPU run also use:
160 x (spark.executor.cores=1, 1 GPU), which is the current Qual tool recommendation,
or 10 x (spark.executor.cores=16, 1 GPU), which is the proposal here?
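To make the difference concrete, here is a minimal sketch of the arithmetic behind the two layouts. It assumes the total core count of the CPU run is kept constant and each executor gets exactly 1 GPU; the `layout` helper is hypothetical and is not part of the Qual tool.

```python
def layout(total_cores, cores_per_executor, gpus_per_executor=1):
    """Return (num_executors, total_gpus) for a fixed total core count.

    Hypothetical helper for illustration only.
    """
    # Ceiling division so leftover cores still get an executor.
    num_executors = -(-total_cores // cores_per_executor)
    return num_executors, num_executors * gpus_per_executor


total_cores = 160 * 1  # CPU run: 160 executors x 1 core each

inherited = layout(total_cores, cores_per_executor=1)   # current recommendation
proposed = layout(total_cores, cores_per_executor=16)   # proposal in this issue

print("inherited (executors, GPUs):", inherited)
print("proposed  (executors, GPUs):", proposed)
```

Under these assumptions, inheriting spark.executor.cores=1 asks for 160 GPUs, while 16 cores per executor asks for 10.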
Describe the solution you'd like
My proposal is:
For on-prem clusters at least, set the recommended spark.executor.cores for the GPU run to 16 or 8 instead of inheriting the CPU run’s value.
Of course, the exact value (8, 16, or something else) should come from internal benchmark results that show this setting achieves the best TCO.
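As a rough illustration of the proposal, the recommendation could look something like the sketch below. The function name `recommend_gpu_conf` and the `target_cores` parameter are hypothetical, and the default of 16 is only the placeholder from this issue, not a benchmarked value. It also assumes the total number of cores stays the same as in the CPU run and that each executor is assigned 1 GPU.

```python
import math


def recommend_gpu_conf(cpu_executor_cores, cpu_executor_instances, target_cores=16):
    """Sketch: derive GPU-run executor settings from a CPU run.

    target_cores is assumed to come from internal benchmark results;
    16 is just the placeholder value mentioned in this issue.
    """
    if min(cpu_executor_cores, cpu_executor_instances, target_cores) < 1:
        raise ValueError("cores and instances must be >= 1")

    # Keep the total core count of the CPU run (assumption).
    total_cores = cpu_executor_cores * cpu_executor_instances
    # Never recommend more cores per executor than the job had in total.
    cores = min(target_cores, total_cores)
    instances = math.ceil(total_cores / cores)

    return {
        "spark.executor.cores": str(cores),
        "spark.executor.instances": str(instances),
        # One GPU per executor, as in the example above (assumption).
        "spark.executor.resource.gpu.amount": "1",
    }


# CPU run from the example: 160 executors with 1 core each.
print(recommend_gpu_conf(cpu_executor_cores=1, cpu_executor_instances=160))
```

Passing `target_cores=8` instead would give 20 executors with 8 cores each, so the benchmarked value can be swapped in without changing the rest of the logic.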