-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
change default spark image mechanics #194
Conversation
This change makes it possible to set the default spark image through an environment variable in the operator container. It also changes the default image specified inside the code to `quay.io/radanalyticsio/openshift-spark:latest` and removes the default image ref from the schema spec.
i think there might need to be some cleanup on the tests, and i'm not sure where to document these changes. any suggestions @Jiri-Kremser ? |
this also does not affect SparkApplications yet |
@@ -2,8 +2,16 @@ | |||
|
|||
public class Constants { | |||
|
|||
public static String DEFAULT_SPARK_IMAGE = "quay.io/jkremser/openshift-spark:2.4.0"; | |||
public static String DEFAULT_SPARK_IMAGE = "quay.io/radanalyticsio/openshift-spark:latest"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on using radanalyticsio
instead of jkremser
, actually it looks better. The reason I switched back to jkremser
org. was the image size, the images under radanalyticsio org are 250 megs larger :( The reason is the dogen tool that generates suboptimal (and ugly) Dockerfiles that create unnecessary layers during the auto-build. Solution would be using docker-squash somewhere in the process, or using better dockerfile.
λ docker images --format 'table {{.Repository}}\t{{.Tag}}\t{{.Size}}' | grep openshift-spark
quay.io/jkremser/openshift-spark 2.4.0 653 MB
docker.io/jkremser/openshift-spark 2.4.0 653 MB
quay.io/radanalyticsio/openshift-spark latest 900 MB
docker.io/radanalyticsio/openshift-spark latest 900 MB
quay.io/jkremser/openshift-spark 2.3-latest 650 MB
docker.io/radanalyticsio/openshift-spark 2.2.0-4 656 MB
I am against using :latest
and also this would be step backward because afaik quay.io/radanalyticsio/openshift-spark:latest
represents spark 2.3. See, we have no idea what the image actually is :)
Let's use some explicit version here. I suggest using the quay.io/jkremser/openshift-spark:2.4.0
image and tag it as quay.io/radanalyticsio/openshift-spark:2.4.0
. Drawback is that there isn't any dockerfile for this image.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw. the default spark (2.4.0
) image that is used in gcp operator is 348 megs
. 900 vs 348 is just insane. They use alpine though, with some os like centos, with python 2.7 installed and some other libs they would be close to 650 megs.
Btw. I was looking into our images and found this:
it's probably this http://math-atlas.sourceforge.net/ ATLAS. I doubt it's needed for pure Spark to run. Does it ring a bell with you? It also may be some transitive dependency, my guess is that it was needed for some jupyter notebook use-case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in general i agree with your reasoning @Jiri-Kremser, let me explain my choices a little.
i added that specific radanalytics image because we are transitioning to using quay as the upstream for that stuff. but, the downside is that we only have 2 tags there currently. i think the best option would be to go back to using the docker hub image for now and then we can pick 2.3-latest
as the tag while we wait for more stuff to show up in quay.
as for the image size, i think we need to push the issue for making the openshift-spark images smaller. if your personal image is that much small then i think we should get your changes into the upstream repository.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this PR does two things:
- change default spark image mechanics
- change the "default for the default image mechanics"
Can we do only the first thing here in the PR and address the second one later. Because it's not only simple "innocent" s/quay.io/jkremser/openshift-spark:2.4.0/quay.io/radanalyticsio/openshift-spark:latest/g
. It also silently downgrades the default spark image from 2.4 to 2.3 and makes the images almost 50% larger.
Ideally, we will be using images from radanalyticsio
that are small and up to date.
i think we need to push the issue for making the openshift-spark images smaller
Agreed, as for the image size, it's not that hard. We can let travis do the image push after running docker-squash
tool on them, instead of running the docker auto builds, wdyt?
Something similar as in here: https://github.com/radanalyticsio/spark-operator/blob/master/.travis/.travis.release.images.sh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm good with this change, i will revert the default image.
i think we really need to get your changes into the upstream though, not only for the upgrade to 2.4.0 but also the reduced image size. i just think it would be better to be grabbing images from the community org in the registry, i guess that's a hang up for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^ totally agree
@@ -36,6 +37,11 @@ | |||
private RunningClusters clusters; | |||
private KubernetesSparkClusterDeployer deployer; | |||
|
|||
@Override | |||
protected void onInit() { | |||
log.info("{} operator default spark image = {}", this.entityName, Constants.getDefaultSparkImage()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@@ -83,7 +83,7 @@ public void testParseYaml1() { | |||
|
|||
assertEquals(clusterInfo.getName(), "foo"); | |||
assertEquals(clusterInfo.getWorker().getInstances().intValue(), 2); | |||
assertEquals(clusterInfo.getCustomImage(), DEFAULT_SPARK_IMAGE); | |||
assertEquals(clusterInfo.getCustomImage(), null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you pls also remove the unused import (DEFAULT_SPARK_IMAGE
) in this class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🌰 lgtm
Description
This change makes it possible to set the default spark image through an
environment variable in the operator container. It also changes the
default image specified inside the code to
quay.io/radanalyticsio/openshift-spark:latest
and removes the defaultimage ref from the schema spec.
Related Issue
#186
Types of changes