Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

change default spark image mechanics #194

Merged
merged 4 commits into from
Feb 18, 2019

Conversation

elmiko
Copy link
Collaborator

@elmiko elmiko commented Feb 15, 2019

Description

This change makes it possible to set the default spark image through an
environment variable in the operator container. It also changes the
default image specified inside the code to
quay.io/radanalyticsio/openshift-spark:latest and removes the default
image ref from the schema spec.

Related Issue

#186

Types of changes

  • Updated docs / Refactor code / Added a tests case / Automation (non-breaking change)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

This change makes it possible to set the default spark image through an
environment variable in the operator container. It also changes the
default image specified inside the code to
`quay.io/radanalyticsio/openshift-spark:latest` and removes the default
image ref from the schema spec.
@elmiko elmiko requested a review from jkremser as a code owner February 15, 2019 20:18
@elmiko
Copy link
Collaborator Author

elmiko commented Feb 15, 2019

i think there might need to be some cleanup on the tests, and i'm not sure where to document these changes. any suggestions @Jiri-Kremser ?

@elmiko
Copy link
Collaborator Author

elmiko commented Feb 15, 2019

this also does not affect SparkApplications yet

@@ -2,8 +2,16 @@

public class Constants {

public static String DEFAULT_SPARK_IMAGE = "quay.io/jkremser/openshift-spark:2.4.0";
public static String DEFAULT_SPARK_IMAGE = "quay.io/radanalyticsio/openshift-spark:latest";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on using radanalyticsio instead of jkremser, actually it looks better. The reason I switched back to jkremser org. was the image size, the images under radanalyticsio org are 250 megs larger :( The reason is the dogen tool that generates suboptimal (and ugly) Dockerfiles that create unnecessary layers during the auto-build. Solution would be using docker-squash somewhere in the process, or using better dockerfile.

λ docker images --format 'table {{.Repository}}\t{{.Tag}}\t{{.Size}}' | grep openshift-spark
quay.io/jkremser/openshift-spark             2.4.0               653 MB
docker.io/jkremser/openshift-spark           2.4.0               653 MB
quay.io/radanalyticsio/openshift-spark       latest              900 MB
docker.io/radanalyticsio/openshift-spark     latest              900 MB
quay.io/jkremser/openshift-spark             2.3-latest          650 MB
docker.io/radanalyticsio/openshift-spark     2.2.0-4             656 MB

I am against using :latest and also this would be step backward because afaik quay.io/radanalyticsio/openshift-spark:latest represents spark 2.3. See, we have no idea what the image actually is :)

Let's use some explicit version here. I suggest using the quay.io/jkremser/openshift-spark:2.4.0 image and tag it as quay.io/radanalyticsio/openshift-spark:2.4.0. Drawback is that there isn't any dockerfile for this image.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw. the default spark (2.4.0) image that is used in gcp operator is 348 megs. 900 vs 348 is just insane. They use alpine though, with some os like centos, with python 2.7 installed and some other libs they would be close to 650 megs.

Btw. I was looking into our images and found this:
screenshot-20190218135755-1920x1061

it's probably this http://math-atlas.sourceforge.net/ ATLAS. I doubt it's needed for pure Spark to run. Does it ring a bell with you? It also may be some transitive dependency, my guess is that it was needed for some jupyter notebook use-case.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in general i agree with your reasoning @Jiri-Kremser, let me explain my choices a little.

i added that specific radanalytics image because we are transitioning to using quay as the upstream for that stuff. but, the downside is that we only have 2 tags there currently. i think the best option would be to go back to using the docker hub image for now and then we can pick 2.3-latest as the tag while we wait for more stuff to show up in quay.

as for the image size, i think we need to push the issue for making the openshift-spark images smaller. if your personal image is that much small then i think we should get your changes into the upstream repository.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this PR does two things:

  • change default spark image mechanics
  • change the "default for the default image mechanics"

Can we do only the first thing here in the PR and address the second one later. Because it's not only simple "innocent" s/quay.io/jkremser/openshift-spark:2.4.0/quay.io/radanalyticsio/openshift-spark:latest/g. It also silently downgrades the default spark image from 2.4 to 2.3 and makes the images almost 50% larger.

Ideally, we will be using images from radanalyticsio that are small and up to date.

i think we need to push the issue for making the openshift-spark images smaller

Agreed, as for the image size, it's not that hard. We can let travis do the image push after running docker-squash tool on them, instead of running the docker auto builds, wdyt?
Something similar as in here: https://github.com/radanalyticsio/spark-operator/blob/master/.travis/.travis.release.images.sh

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm good with this change, i will revert the default image.

i think we really need to get your changes into the upstream though, not only for the upgrade to 2.4.0 but also the reduced image size. i just think it would be better to be grabbing images from the community org in the registry, i guess that's a hang up for me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ totally agree

@@ -36,6 +37,11 @@
private RunningClusters clusters;
private KubernetesSparkClusterDeployer deployer;

@Override
protected void onInit() {
log.info("{} operator default spark image = {}", this.entityName, Constants.getDefaultSparkImage());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -83,7 +83,7 @@ public void testParseYaml1() {

assertEquals(clusterInfo.getName(), "foo");
assertEquals(clusterInfo.getWorker().getInstances().intValue(), 2);
assertEquals(clusterInfo.getCustomImage(), DEFAULT_SPARK_IMAGE);
assertEquals(clusterInfo.getCustomImage(), null);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you pls also remove the unused import (DEFAULT_SPARK_IMAGE) in this class?

Copy link
Member

@jkremser jkremser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🌰 lgtm

@jkremser jkremser merged commit 2e67dac into radanalyticsio:master Feb 18, 2019
@elmiko elmiko deleted the issue/186 branch March 6, 2019 19:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants