Does this package work with Spark local? #24

Open
AbdealiLoKo opened this issue Mar 4, 2024 · 2 comments

AbdealiLoKo commented Mar 4, 2024

I see documentation about Spark on YARN.

Does this also work with Spark local mode? I sometimes use Spark local for small jobs, and I would rather keep my environments consistent between small and large jobs.

Some documentation would be useful. If I copy the same settings from the YARN documentation, it does not seem to pick up the venv-pack environment.


lek18 commented May 1, 2024

I am able to get it working locally, but I cannot make it work by following the YARN documentation. I tried:

  1. conda pack
  2. venv-pack, with and without Poetry
gcloud dataproc jobs submit pyspark "gs://hello_world.py" \
--project wmt-bfdms-dvhorizprod \
--cluster=ipi-cluster-prod \
--region=us-east4 \
--archives 'gs://env/environment.tar.gz#environment' \
--properties="spark.submit.deployMode=cluster,\
spark.yarn.appMasterEnv.PYSPARK_PYTHON=./environment/bin/python,\
spark.appMasterEnv.PYSPARK_DRIVER_PYTHON=./environment/bin/python"

and I am getting: ./environment/bin/python not found
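
One detail worth checking in the command above: the second Python property uses the prefix spark.appMasterEnv, while Spark documents the YARN application master environment prefix as spark.yarn.appMasterEnv. The sketch below builds the --properties value with the documented prefix for both variables. The helper name build_properties and its parameters are hypothetical, and the thread does not establish whether this change alone resolves the "not found" error.

```python
def build_properties(env_alias: str = "environment", deploy_mode: str = "cluster") -> str:
    """Build a comma-separated value for `--properties`.

    `env_alias` is the name after `#` in the `--archives` argument,
    i.e. the directory the archive is unpacked into on YARN.
    """
    python_path = f"./{env_alias}/bin/python"
    props = {
        "spark.submit.deployMode": deploy_mode,
        # Documented prefix for application master environment variables.
        "spark.yarn.appMasterEnv.PYSPARK_PYTHON": python_path,
        "spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON": python_path,
    }
    return ",".join(f"{key}={value}" for key, value in props.items())


if __name__ == "__main__":
    print(build_properties())
```

The printed string can be passed as the value of --properties in place of the hand-written one.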

@kbzowski

This is because of symlinks. The archive contains a symlink to the local Python executable. On the Spark cluster, Python is probably located somewhere else, so the symlink is invalid. You can change the target with --python-prefix, but in the end it produces a very strange path. I was not able to force it to point to the correct one.
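
To check whether a packed archive is affected, one option is to list the symlinks inside it and look at where they point. This is a minimal sketch using only the Python standard library; the function name find_absolute_symlinks and the archive path in the usage line are hypothetical. It reports symlinks with absolute targets, which only resolve on a machine that has the same path.

```python
import tarfile
from typing import List, Tuple


def find_absolute_symlinks(archive_path: str) -> List[Tuple[str, str]]:
    """Return (member name, link target) for every symlink whose target is absolute."""
    found = []
    # "r:*" lets tarfile detect the compression (gz, bz2, xz, or none).
    with tarfile.open(archive_path, "r:*") as archive:
        for member in archive.getmembers():
            if member.issym() and member.linkname.startswith("/"):
                found.append((member.name, member.linkname))
    return found


if __name__ == "__main__":
    # Hypothetical local path to the packed environment.
    for name, target in find_absolute_symlinks("environment.tar.gz"):
        print(f"{name} -> {target}")
```

If bin/python shows up with a target such as an interpreter path from the build machine, the cluster nodes need Python at that same path for the link to resolve.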
