[SPARK-23971] [BACKPORT-2.3] Should not leak Spark sessions across test suites #21197

gatorsmile · 2018-04-30T19:46:38Z

This PR backports PR #21058 to Apache Spark 2.3. It should be the cause of the test regressions we saw in the Apache Spark 2.3 branches:

https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-2.3-test-sbt-hadoop-2.6/317/testReport/org.apache.spark.sql.execution.datasources.parquet/ParquetQuerySuite/SPARK_15678__not_use_cache_on_overwrite/history/

https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-2.3-test-sbt-hadoop-2.7/318/testReport/junit/org.apache.spark.sql/DataFrameSuite/inputFiles/history/

What changes were proposed in this pull request?

Many suites currently leak Spark sessions (sometimes with stopped SparkContexts) via the thread-local active Spark session and default Spark session. We should attempt to clean these up and detect when this happens to improve the reproducibility of tests.
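The leak described above can be illustrated with a small, self-contained model. This is a hypothetical Python sketch, not Spark's code: `Session`, `cleanup_any_existing_session` and `SharedSessionSuite` are illustrative names. The two slots only loosely mimic the thread-local active session and the process-wide default session that `SparkSession` tracks in Scala. A suite that never cleans up leaves its session in both slots; the next suite detects it, stops it, and clears both slots before creating its own session.

```python
import threading
import warnings


class Session:
    """Hypothetical stand-in for a SparkSession (not the PySpark API)."""

    def __init__(self, name):
        self.name = name
        self.stopped = False

    def stop(self):
        self.stopped = True


# Per-thread "active" slot and process-wide "default" slot (assumed model).
_active = threading.local()
_default = None


def set_active(session):
    _active.session = session


def get_active():
    return getattr(_active, "session", None)


def clear_active():
    _active.session = None


def set_default(session):
    global _default
    _default = session


def get_default():
    return _default


def clear_default():
    global _default
    _default = None


def cleanup_any_existing_session():
    """Detect a session leaked by an earlier suite, stop it, clear both slots."""
    leaked = get_active() or get_default()
    if leaked is not None:
        warnings.warn(f"session {leaked.name!r} leaked from a previous suite; stopping it")
        leaked.stop()
    clear_active()
    clear_default()
    return leaked


class SharedSessionSuite:
    """Hypothetical suite base: fresh session before, full cleanup after."""

    def before_all(self):
        cleanup_any_existing_session()  # detection step
        self.session = Session(type(self).__name__)
        set_active(self.session)
        set_default(self.session)

    def after_all(self):
        try:
            self.session.stop()
        finally:
            # Clear both slots even if stop() raises, so nothing leaks onward.
            clear_active()
            clear_default()
```

Clearing only one slot is not enough in this model: a stale default session would still be returned by `get_active() or get_default()` in the next suite, which is the kind of cross-suite coupling that makes test results depend on execution order.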

How was this patch tested?

Existing tests

Original commit (apache#21058, description as above): Author: Eric Liang <[email protected]>. Closes apache#21058 from ericl/clear-session.

dongjoon-hyun

+1, LGTM.

gatorsmile · 2018-04-30T23:49:48Z

ok to test

gatorsmile · 2018-04-30T23:49:53Z

test this please

dongjoon-hyun · 2018-05-01T16:28:57Z

Retest this please.

SparkQA · 2018-05-01T19:49:49Z

Test build #89983 has finished for PR 21197 at commit c1c6377.

This patch fails SparkR unit tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2018-05-01T20:49:00Z

Retest this please.

gatorsmile · 2018-05-01T22:46:01Z

ok to test

gatorsmile · 2018-05-01T22:46:07Z

test this please

dongjoon-hyun · 2018-05-02T00:17:23Z

Currently, the SparkR tests fail consistently in many PRs. I see the following error message, and according to the log the SparkR tests seem to run twice. Could you take a look, @shivaram and @felixcheung?

* checking CRAN incoming feasibility ...Error in .check_package_CRAN_incoming(pkgdir) : 
  dims [product 24] do not match the length of object [0]
Execution halted

SparkQA · 2018-05-02T01:14:16Z

Test build #90016 has finished for PR 21197 at commit c1c6377.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

kiszk · 2018-05-02T02:38:17Z

retest this please

HyukjinKwon · 2018-05-02T02:50:16Z

#21197 (comment): This looks very similar to the past one. If it fails consistently, we should probably ask the R developers for help again. It looks fine now, though.

SparkQA · 2018-05-02T06:53:02Z

Test build #90025 has finished for PR 21197 at commit c1c6377.

This patch fails from timeout after a configured wait of `250m`.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2018-05-02T15:28:15Z

retest this please

cloud-fan · 2018-05-02T15:29:07Z

It seems this patch doesn't fix the problem... the tests still hang.

felixcheung · 2018-05-02T15:49:11Z (via email)

Is this the error?

* checking CRAN incoming feasibility ...Error in .check_package_CRAN_incoming(pkgdir) : 
  dims [product 24] do not match the length of object [0]
Execution halted

It seems like an intermittent problem from CRAN. Let me know if you see this again. Also, it's just the log text that is repeated; the tests themselves run only once.

dongjoon-hyun · 2018-05-02T17:06:33Z

Thank you for confirming, @felixcheung !

dongjoon-hyun · 2018-05-02T17:11:18Z

@felixcheung, the same failure occurs again in #21210, too.

* this is package 'SparkR' version '2.4.0'
* checking CRAN incoming feasibility ...Error in .check_package_CRAN_incoming(pkgdir) : 
  dims [product 24] do not match the length of object [0]
Execution halted

dongjoon-hyun · 2018-05-02T17:15:59Z

I filed this flakiness as SPARK-24152.

SparkQA · 2018-05-02T18:54:12Z

Test build #90065 has finished for PR 21197 at commit c1c6377.

This patch fails SparkR unit tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2018-05-02T19:01:37Z

Thanks! Merged to 2.3

…t suites (backport of #21058 to Apache 2.3; description and Jenkins links as above). Author: Eric Liang <[email protected]>. Closes #21197 from gatorsmile/backportSPARK-23971.

1) Use host/domain from dfsCluster
2) Use semaphores to remove timing-based flakiness
3) Ensure the Spark context is closed
4) Ignore ParquetAvroCompatibilitySuite temporarily
5) Adaptation of SPARK-28247 to 2.3
6) Disable codegen tests - these are fixed in 2.4 and 3.x, but require SQL backports to 2.3
7) PR apache#21197 to handle test failures due to leaking Spark sessions
8) PR#20926: Set default Spark session in test-only spark sessions
9) PR apache#20971: Active SparkSession should be set by getOrCreate
10) PR apache#21446: Random.nextString is not safe for directory namePrefix
11) Disabling QueryStageSuite#'adaptive skewed join'
12) SPARK-24318: Fix flaky SortShuffleSuite
13) Fix ForeachSinkSuite test failure
14) Disabling StreamingQuerySuite.'status, lastProgress, and recentProgress'
15) Disabling ResolvedDataSourceSuite.'avro: show deploy guide for loading the external avro module'
16) PR# 23405: Avoid to use Random.nextString in StreamingInnerJoinSuite
17) Adaptation of PR#25849, add LinkedIn repo as default to spark.sql.maven.additionalRemoteRepositories
18) Disabling HiveExternalCatalogVersionsSuite - only 2.4.6 and 3.0.0 are available from Apache mirrors
19) Disabling INFER_AND_SAVE test in HiveSchemaInferenceSuite
- Address Erik's review comments

dongjoon-hyun approved these changes Apr 30, 2018


HyukjinKwon approved these changes May 1, 2018


gatorsmile closed this May 2, 2018
