Include avro test by using '--packages' option [skip ci] #6505
Signed-off-by: zhanga5 <[email protected]>
build
@@ -284,6 +300,7 @@ export -f get_tests_by_tags
# - DEFAULT: all tests except cudf_udf tests
# - CUDF_UDF_ONLY: cudf_udf tests only, requires extra conda cudf-py lib
# - ICEBERG_ONLY: iceberg tests only
# - AVRO_ONLY: avro tests only (with --packages option instead of --jars)
We should document this option in the README and mention that it uses --packages, since it won't work without internet access.
Which README? I didn't find any documentation for the existing options DELTA_LAKE_ONLY or ICEBERG_ONLY, which use --packages in a similar way.
...
It looks like enabling the avro tests works quite differently from iceberg or delta lake. I'm not sure whether we should make them consistent first and then document them in the same manner.
Yeah, I think this should be a new issue to make the avro tests consistent with the other ones. Filed #6532 to track this.
jenkins/spark-tests.sh
Outdated
@@ -211,6 +211,18 @@ run_iceberg_tests() {
  fi
}

# Test spark-avro with documented way of deploying at run time via --packages option from Maven
run_avro_tests() {
  unset TEST_PARALLEL # Enable auto spark local parallel in run_pyspark_from_build.sh
SPARK_SUBMIT_FLAGS and TEST_PARALLEL > 0 are not compatible; see the thread on #6403.
refined
We do not require this to run in parallel in this PR, since we are only testing --packages with avro.
Letting run_pyspark_from_build.sh set the parallelism here may not always be a good idea: it tries to set the parallelism as high as possible, which can actually decrease performance beyond a certain value and could cause host memory OOM with some GPU types. I guess we could hardcode it to 4 or 5 here.
We can discuss how to better parallelize tests like iceberg and avro in #6403, but for this PR let's keep it simple.
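For illustration only, the "hardcode it to 4 or 5" idea could be generalized into a small clamp, sketched below. The names MAX_AVRO_TEST_PARALLEL and cap_parallel are hypothetical and do not appear in the PR, and the sketch does not address the SPARK_SUBMIT_FLAGS incompatibility tracked in #6403. The PR itself ended up dropping the TEST_PARALLEL change entirely.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical upper bound; the review comment suggests 4 or 5.
MAX_AVRO_TEST_PARALLEL=4

# cap_parallel [REQUESTED]
# Prints REQUESTED clamped to the range 1..MAX_AVRO_TEST_PARALLEL.
# An empty or missing argument falls back to the cap itself.
cap_parallel() {
  local requested="${1:-$MAX_AVRO_TEST_PARALLEL}"
  if [ "$requested" -gt "$MAX_AVRO_TEST_PARALLEL" ]; then
    echo "$MAX_AVRO_TEST_PARALLEL"
  elif [ "$requested" -lt 1 ]; then
    echo 1
  else
    echo "$requested"
  fi
}

# Clamp whatever the caller exported (if anything) before running the tests.
TEST_PARALLEL="$(cap_parallel "${TEST_PARALLEL:-}")"
export TEST_PARALLEL
echo "TEST_PARALLEL=$TEST_PARALLEL"
```

A fixed ceiling like this trades some throughput on large hosts for predictable memory use, which is the concern raised in the comment above.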
Agreed. We need to drop unset TEST_PARALLEL in this PR.
removed
Signed-off-by: zhanga5 <[email protected]>
jenkins/spark-tests.sh
Outdated
@@ -211,6 +211,18 @@ run_iceberg_tests() {
  fi
}

# Test spark-avro with documented way of deploying at run time via --packages option from Maven
run_avro_tests() {
  unset TEST_PARALLEL # Enable auto spark local parallel in run_pyspark_from_build.sh
  export PYSP_TEST_spark_jars_packages="org.apache.spark:spark-avro_2.12:${SPARK_VER}"

  # Workaround to avoid appending avro jar file by '--jars'
  rm -vf $LOCAL_JAR_PATH/spark-avro*.jar
Maybe move the jar out of the way (e.g. rename spark-avro.jar to spark-avro.jar%) before calling the avro tests and restore it afterwards, to avoid side effects and a dependence on the order of the tests?
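The rename-and-restore suggestion might look roughly like the sketch below. It is not the code from the PR: the helper names (hide_avro_jars, restore_avro_jars, run_avro_tests_stub), the temporary directory and the jar file name are all assumptions made so the example can run on its own.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for the real jar directory, so the sketch runs anywhere.
LOCAL_JAR_PATH="$(mktemp -d)"
touch "$LOCAL_JAR_PATH/spark-avro_2.12-3.3.0.jar"   # made-up file name

# Rename every spark-avro jar to "<name>%" so a spark-avro*.jar glob no longer matches it.
hide_avro_jars() {
  local jar
  for jar in "$LOCAL_JAR_PATH"/spark-avro*.jar; do
    [ -e "$jar" ] || continue   # the glob matched nothing
    mv -v "$jar" "$jar%"
  done
}

# Undo hide_avro_jars by stripping the trailing '%'.
restore_avro_jars() {
  local hidden
  for hidden in "$LOCAL_JAR_PATH"/spark-avro*.jar%; do
    [ -e "$hidden" ] || continue
    mv -v "$hidden" "${hidden%\%}"
  done
}

# Placeholder for the real test invocation (assumption): just list the directory.
run_avro_tests_stub() {
  ls "$LOCAL_JAR_PATH"
}

hide_avro_jars
HIDDEN_LISTING="$(run_avro_tests_stub)"
restore_avro_jars
RESTORED_LISTING="$(ls "$LOCAL_JAR_PATH")"

echo "during tests: $HIDDEN_LISTING"
echo "after tests:  $RESTORED_LISTING"
```

In a real script, registering the restore step with trap on EXIT would also put the jar back when the tests fail part-way through.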
refined
Let's take care of all the avro_test inconsistencies in #6532.
LGTM. I will upmerge this and update a few later in #6522
build
jenkins/spark-tests.sh
Outdated

# Workaround to avoid appending avro jar file by '--jars'
AVRO_JAR_FILE=`cd $LOCAL_JAR_PATH && ls spark-avro*.jar`
mv $LOCAL_JAR_PATH/$AVRO_JAR_FILE $LOCAL_JAR_PATH/$AVRO_JAR_FILE%
One thing that could cause an issue: mv will fail if there is no avro jar in LOCAL_JAR_PATH. Maybe simply append || true so the script does not error out when people set the env INCLUDE_SPARK_AVRO_JAR=false.
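A minimal sketch of that guard follows, assuming an empty temporary directory stands in for a LOCAL_JAR_PATH populated with INCLUDE_SPARK_AVRO_JAR=false. MOVE_RESULT is a hypothetical variable added only to make the outcome visible, and like the original snippet this assumes at most one matching jar.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical empty jar directory, mimicking a setup without the avro jar.
LOCAL_JAR_PATH="$(mktemp -d)"

# ls exits non-zero when the glob matches nothing; '|| true' keeps 'set -e'
# from aborting the script, and the variable is simply left empty.
AVRO_JAR_FILE="$(cd "$LOCAL_JAR_PATH" && ls spark-avro*.jar 2>/dev/null || true)"

if [ -n "$AVRO_JAR_FILE" ]; then
  # A jar exists: move it out of the way as in the diff above.
  mv -v "$LOCAL_JAR_PATH/$AVRO_JAR_FILE" "$LOCAL_JAR_PATH/$AVRO_JAR_FILE%"
  MOVE_RESULT="moved"
else
  # Nothing to hide, so there is nothing for mv to fail on.
  MOVE_RESULT="skipped"
fi

echo "avro jar: $MOVE_RESULT"
```

Checking for the file explicitly, rather than appending || true to mv itself, keeps genuine mv failures (such as permission errors) visible.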
Reverted to the previous simple workaround. That should be fine, since the avro tests are executed at the end of the script and the avro jar file is downloaded automatically each time.
build
Closes #5657