
Start using spark4-preview versions #2159

Merged · 5 commits · Oct 22, 2024

Conversation

mathbunnyru
Member

Describe your changes

Issue ticket if applicable

Checklist (especially for first-time contributors)

  • I have performed a self-review of my code
  • If it is a core feature, I have added thorough tests
  • I will try not to use force-push to make the review process easier for reviewers
  • I have updated the documentation for significant changes

@mathbunnyru
Member Author

I checked the build logs, and the expected version of spark is installed:

make build/pyspark-notebook

 => => # INFO:__main__:Latest version: 4.0.0-preview2
 => => # INFO:__main__:Downloading and unpacking Spark
 => => # INFO:__main__:Spark directory name: spark-4.0.0-preview2-bin-hadoop3

Then I checked the image itself, and the directories are named properly:

docker run -it --rm quay.io/jupyter/pyspark-notebook bash

(base) jovyan@b6f0e3c1463d:~$ ll /usr/local | grep spark
lrwxrwxrwx  1 root root   43 Oct 20 12:51 spark -> /usr/local/spark-4.0.0-preview2-bin-hadoop3/
drwxr-xr-x 14 root root 4096 Sep 16 04:02 spark-4.0.0-preview2-bin-hadoop3/
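
The manual `ll /usr/local | grep spark` check above can also be expressed programmatically. A minimal sketch, assuming a helper name of my own choosing (`symlink_targets_version` is not part of the images' tooling), demonstrated against a throwaway symlink instead of a real image filesystem:

```python
import os
import tempfile

def symlink_targets_version(link_path: str, version: str) -> bool:
    # True when link_path is a symlink whose target mentions the version,
    # e.g. /usr/local/spark -> /usr/local/spark-4.0.0-preview2-bin-hadoop3/
    return os.path.islink(link_path) and version in os.readlink(link_path)

# Demonstrate with a temporary directory rather than the container's /usr/local.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "spark-4.0.0-preview2-bin-hadoop3")
    os.mkdir(target)
    link = os.path.join(tmp, "spark")
    os.symlink(target, link)
    print(symlink_targets_version(link, "4.0.0-preview2"))  # True
```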

And finally, the image tag:

make hook/pyspark-notebook

INFO:__main__:Calculated tag, tagger_name: SparkVersionTagger tag_value: spark-4.0.0-preview2

docker image ls | grep aarch64-spark
quay.io/jupyter/pyspark-notebook   aarch64-spark-4.0.0-preview2   116a1ce7d803   2 minutes ago   4.51GB
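
For illustration, the `spark-4.0.0-preview2` tag above could be derived from the Spark directory name roughly like this. This is a hedged sketch of the idea, not the actual `SparkVersionTagger` implementation, and the regex is my own assumption about the naming scheme:

```python
import re

def spark_tag_from_dirname(dirname: str) -> str:
    # Illustrative only: extract "spark-<version>" from a directory name
    # such as "spark-4.0.0-preview2-bin-hadoop3".
    match = re.match(r"(spark-\d+\.\d+\.\d+(?:-preview\d+)?)-bin-", dirname)
    if match is None:
        raise ValueError(f"unexpected Spark directory name: {dirname}")
    return match.group(1)

print(spark_tag_from_dirname("spark-4.0.0-preview2-bin-hadoop3"))
# spark-4.0.0-preview2
print(spark_tag_from_dirname("spark-3.5.3-bin-hadoop3"))
# spark-3.5.3
```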

So, everything works exactly as expected.

@@ -63,7 +63,7 @@ USER ${NB_UID}
 RUN mamba install --yes \
     'grpcio-status' \
     'grpcio' \
-    'pandas=2.0.3' \
+    'pandas=2.2.2' \
Member Author


Went with 2.2.2, because preview2 supports this version.

@@ -1,5 +1,13 @@
 # Changelog

+## 2024-10-22
Member Author


I will describe the release schedule in this PR: #2072

@@ -36,7 +36,7 @@ def get_latest_spark_version() -> str:
     stable_versions = [
         ref.removeprefix("spark-").removesuffix("/")
         for ref in all_refs
-        if ref.startswith("spark-") and "incubating" not in ref and "preview" not in ref
+        if ref.startswith("spark-") and "incubating" not in ref
Member Author


This is the only line we will have to revert to start using stable versions again (that's why I made a separate commit improving spark setup scripts)
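
To make the effect of that one-line change concrete, here is a hedged, self-contained sketch of the filtering logic. The ref list is a hypothetical sample (not fetched from the Apache listing), the `allow_previews` flag is my own addition for contrast, and I assume the listing is ordered oldest-to-newest:

```python
def latest_spark_version(all_refs: list[str], allow_previews: bool) -> str:
    # Mirrors the diff above: dropping the `"preview" not in ref` clause
    # is what lets preview releases become candidates.
    candidates = [
        ref.removeprefix("spark-").removesuffix("/")
        for ref in all_refs
        if ref.startswith("spark-")
        and "incubating" not in ref
        and (allow_previews or "preview" not in ref)
    ]
    return candidates[-1]  # assumes the listing is sorted oldest-to-newest

refs = ["spark-2.0.0-incubating/", "spark-3.5.3/", "spark-4.0.0-preview2/"]
print(latest_spark_version(refs, allow_previews=False))  # 3.5.3
print(latest_spark_version(refs, allow_previews=True))   # 4.0.0-preview2
```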

)
warnings = TrackedContainer.get_warnings(logs)
assert len(warnings) == 1
assert "Using incubator modules: jdk.incubator.vector" in warnings[0]
Member Author


I guess this might disappear when we switch to JDK 21, but that's a separate story and won't be a part of the switch to Python 3.12

@mathbunnyru
Member Author

mathbunnyru commented Oct 20, 2024

Unfortunately, sparklyr doesn't seem to support spark v4 yet.

When I run spark_available_versions(), it only gives me versions up to 3.5.

I created an upstream issue: sparklyr/sparklyr#3468

@mathbunnyru mathbunnyru merged commit b744182 into jupyter:main Oct 22, 2024
81 checks passed