[SPARK-23971] [BACKPORT-2.3] Should not leak Spark sessions across test suites #21197
## What changes were proposed in this pull request?

Many suites currently leak Spark sessions (sometimes with stopped SparkContexts) via the thread-local active Spark session and default Spark session. We should attempt to clean these up and detect when this happens to improve the reproducibility of tests.

## How was this patch tested?

Existing tests

Author: Eric Liang <[email protected]>

Closes apache#21058 from ericl/clear-session.
+1, LGTM.
ok to test
test this please
Retest this please.
Test build #89983 has finished for PR 21197 at commit
Retest this please.
ok to test
test this please
Currently, the SparkR test fails consistently in many PRs. I can see the following error message, and the SparkR test seems to run twice according to the log. Could you take a look, please, @shivaram and @felixcheung?
Test build #90016 has finished for PR 21197 at commit
retest this please
#21197 (comment) This looks pretty much the same as the past one. We should probably ask the R devs for help again if it fails consistently. Looks fine now, though.
Test build #90025 has finished for PR 21197 at commit
retest this please
It seems this patch doesn't fix the problem... the test still hangs
Is this the error? It seems like an intermittent problem from CRAN. Let me know if you see this again.
Also, it's just the log text that is repeated, not the test run.
checking CRAN incoming feasibility ...Error in .check_package_CRAN_incoming(pkgdir) :
dims [product 24] do not match the length of object [0]
Execution halted
Thank you for confirming, @felixcheung!
@felixcheung, the same failure occurs again in #21210, too.
I filed this flakiness as SPARK-24152.
Test build #90065 has finished for PR 21197 at commit
Thanks! Merged to 2.3
…t suites

This PR backports PR #21058 to the Apache Spark 2.3 branch. This should explain the test regressions we saw in the Apache Spark 2.3 branches:
https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-2.3-test-sbt-hadoop-2.6/317/testReport/org.apache.spark.sql.execution.datasources.parquet/ParquetQuerySuite/SPARK_15678__not_use_cache_on_overwrite/history/
https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-2.3-test-sbt-hadoop-2.7/318/testReport/junit/org.apache.spark.sql/DataFrameSuite/inputFiles/history/

---

## What changes were proposed in this pull request?

Many suites currently leak Spark sessions (sometimes with stopped SparkContexts) via the thread-local active Spark session and default Spark session. We should attempt to clean these up and detect when this happens to improve the reproducibility of tests.

## How was this patch tested?

Existing tests

Author: Eric Liang <[email protected]>

Closes #21197 from gatorsmile/backportSPARK-23971.
1) Use host/domain from dfsCluster
2) Use semaphores to remove timing-based flakiness
3) Ensure spark context is closed
4) Ignore ParquetAvroCompatibilitySuite temporarily
5) Adaptation of SPARK-28247 to 2.3
6) Disable codegen tests - these are fixed in 2.4 and 3.x, but require sql backports to 2.3
7) PR apache#21197 to handle test failures due to leaking spark session
8) PR#20926: Set default Spark session in test-only spark sessions
9) PR apache#20971: Active SparkSession should be set by getOrCreate
10) PR apache#21446: Random.nextString is not safe for directory namePrefix
11) Disabling QueryStageSuite#'adaptive skewed join'
12) SPARK-24318: Fix flakey SortShuffleSuite
13) Fix ForeachSinkSuite test failure
14) Disabling StreamingQuerySuite.'status, lastProgress, and recentProgress'
15) Disabling ResolvedDataSourceSuite.'avro: show deploy guide for loading the external avro module'
16) PR#23405: Avoid to use Random.nextString in StreamingInnerJoinSuite
17) Adaptation of PR#25849, add LinkedIn repo as default to spark.sql.maven.additionalRemoteRepositories
18) Disabling HiveExternalCatalogVersionsSuite - only 2.4.6 and 3.0.0 are available from apache mirrors
19) Disabling INFER_AND_SAVE test in HiveSchemaInferenceSuite
- Address Erik's review comments
This PR backports PR #21058 to the Apache Spark 2.3 branch. This should explain the test regressions we saw in the Apache Spark 2.3 branches:
https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-2.3-test-sbt-hadoop-2.6/317/testReport/org.apache.spark.sql.execution.datasources.parquet/ParquetQuerySuite/SPARK_15678__not_use_cache_on_overwrite/history/
https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-2.3-test-sbt-hadoop-2.7/318/testReport/junit/org.apache.spark.sql/DataFrameSuite/inputFiles/history/
## What changes were proposed in this pull request?
Many suites currently leak Spark sessions (sometimes with stopped SparkContexts) via the thread-local active Spark session and default Spark session. We should attempt to clean these up and detect when this happens to improve the reproducibility of tests.
## How was this patch tested?
Existing tests
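
To illustrate the idea in the description (clean up leaked sessions and detect when a suite starts with one already set), here is a minimal, self-contained sketch. It is a toy model only, not the code in this patch: `ToySession`, `find_leaked_session`, and `run_suite` are hypothetical names, and the class merely imitates the two global slots the description mentions (a thread-local "active" session and a process-wide "default" session).

```python
import threading


class ToySession:
    """Hypothetical stand-in for a session; it only models the two global slots."""

    _active = threading.local()  # thread-local "active" session
    _default = None              # process-wide "default" session

    def __init__(self, name):
        self.name = name
        self.stopped = False

    @classmethod
    def create(cls, name):
        """Create a session and register it in both global slots."""
        session = cls(name)
        cls._active.session = session
        cls._default = session
        return session

    @classmethod
    def get_active(cls):
        return getattr(cls._active, "session", None)

    @classmethod
    def get_default(cls):
        return cls._default

    @classmethod
    def clear_all(cls):
        """Drop both references for the calling thread and the process."""
        cls._active.session = None
        cls._default = None

    def stop(self):
        self.stopped = True


def find_leaked_session():
    """Return a session left behind by an earlier suite, or None."""
    return ToySession.get_active() or ToySession.get_default()


def run_suite(name, body):
    """Run one suite: detect leaks before it starts, always clean up afterwards."""
    leaked = find_leaked_session()
    if leaked is not None:
        raise AssertionError(
            f"suite {name!r} started with leaked session {leaked.name!r}"
        )
    try:
        body()
    finally:
        session = find_leaked_session()
        if session is not None:
            session.stop()
        ToySession.clear_all()


# A suite that creates a session and never clears it no longer affects the next one.
run_suite("first", lambda: ToySession.create("first-session"))
run_suite("second", lambda: None)
```

In this sketch the cleanup lives in a `finally` block so it runs even when the suite body fails, and the check at the start turns a silent cross-suite dependency into an immediate, named failure.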