Create a Spark image using the binary spark distribution tars #33

Open
Dandandan opened this issue Sep 24, 2021 · 0 comments

Dandandan commented Sep 24, 2021

Currently, the Spark distribution and Hadoop libraries in the image are installed using conda/pip, which has a few implications:

  • Because pip is used, some parts of the distribution are left out (such as the start-thriftserver.sh script).
  • The distribution ends up in an unusual location inside the conda directory (/opt/miniconda3/lib/python3.8/site-packages/pyspark).
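Installing from a binary tarball instead, as the title proposes, should give the complete distribution (including sbin/) in a directory of one's choosing. Below is a minimal Python sketch of that download-and-unpack step. It assumes the file naming used on the Apache archive (spark-<version>-bin-hadoop<profile>.tgz, unpacking to a top-level directory of the same name). The helper names and example versions are hypothetical. In a Dockerfile this would more likely be a RUN step using curl and tar, and a real build should also verify the published checksum, which is omitted here.

```python
import os
import tarfile
import urllib.request

# Assumed base URL of the Apache release archive for Spark.
ARCHIVE_BASE = "https://archive.apache.org/dist/spark"


def spark_dist_name(spark_version: str, hadoop_profile: str) -> str:
    # e.g. "spark-3.1.2-bin-hadoop3.2"; assumed to also be the
    # top-level directory inside the tarball.
    return f"spark-{spark_version}-bin-hadoop{hadoop_profile}"


def spark_tarball_url(spark_version: str, hadoop_profile: str) -> str:
    # Hypothetical helper: builds the archive URL for one binary distribution.
    name = spark_dist_name(spark_version, hadoop_profile)
    return f"{ARCHIVE_BASE}/spark-{spark_version}/{name}.tgz"


def install_spark(spark_version: str, hadoop_profile: str, dest: str = "/opt") -> str:
    """Download and unpack the tarball; return the path to use as SPARK_HOME.

    Hypothetical helper; no checksum verification is done here.
    """
    url = spark_tarball_url(spark_version, hadoop_profile)
    archive, _ = urllib.request.urlretrieve(url)  # saves to a temporary file
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)
    return os.path.join(dest, spark_dist_name(spark_version, hadoop_profile))


if __name__ == "__main__":
    print(spark_tarball_url("3.1.2", "3.2"))
```

The returned directory is what SPARK_HOME would then point to, instead of the pyspark folder under site-packages.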

Other findings:

  • Environment variables such as SPARK_HOME aren't set.
  • The container runs as the root user.
  • A multi-stage build could reduce the image size and avoid having to uninstall dependencies in the Dockerfile.
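These findings could be turned into a quick check run inside the built image. Below is a minimal Python sketch. It assumes the standard binary distribution layout, where the Thrift server script lives at $SPARK_HOME/sbin/start-thriftserver.sh. The function name check_spark_image is hypothetical, and the environment and effective user ID can be injected so the check is testable outside a container.

```python
import os
from pathlib import Path


def check_spark_image(env=None, euid=None):
    """Return a list of problems found; an empty list means every check passed.

    env defaults to os.environ and euid to os.geteuid() (Unix only).
    """
    env = os.environ if env is None else env
    euid = os.geteuid() if euid is None else euid
    problems = []

    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        problems.append("SPARK_HOME is not set")
    elif not (Path(spark_home) / "sbin" / "start-thriftserver.sh").is_file():
        # Assumed location of the script in a full binary distribution.
        problems.append("sbin/start-thriftserver.sh missing from SPARK_HOME")

    if euid == 0:
        problems.append("running as root")
    return problems


if __name__ == "__main__":
    for problem in check_spark_image():
        print("FAIL:", problem)
```

Run against the current image, this would be expected to report at least the unset SPARK_HOME and the root user.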

It might also be worth using a Spark base image such as https://github.com/bitnami/bitnami-docker-spark, which addresses all of these points.

Dandandan changed the title from "Create a spark image using the binary spark distribution tars" to "Create a Spark image using the binary spark distribution tars" on Sep 24, 2021