[SPARK-23971] [BACKPORT-2.3] Should not leak Spark sessions across test suites #21197

Closed


gatorsmile
Member

This PR backports PR #21058 to the Apache Spark 2.3 branch. The leaked sessions it addresses are probably the cause of the test regressions we saw in the 2.3 branches:

https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-2.3-test-sbt-hadoop-2.6/317/testReport/org.apache.spark.sql.execution.datasources.parquet/ParquetQuerySuite/SPARK_15678__not_use_cache_on_overwrite/history/

https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-2.3-test-sbt-hadoop-2.7/318/testReport/junit/org.apache.spark.sql/DataFrameSuite/inputFiles/history/


## What changes were proposed in this pull request?

Many suites currently leak Spark sessions (sometimes with stopped SparkContexts) via the thread-local active Spark session and default Spark session. We should attempt to clean these up and detect when this happens to improve the reproducibility of tests.
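The leak described above can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions, not the code from the patch: `ToySession`, `leaked_sessions`, `cleanup_any_existing_session`, and `run_leaky_suite` are hypothetical names invented for illustration, and the model uses plain Python instead of Spark. It shows how a session that is stopped but never unregistered stays visible through a thread-local "active" slot and a process-wide "default" slot, and how a cleanup step can both detect and clear it.

```python
import threading
from typing import List, Optional


class ToySession:
    """Hypothetical stand-in for a session object; this is not the Spark API."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.stopped = False

    def stop(self) -> None:
        # Stopping does not unregister the session, which is how a stale
        # reference can outlive the suite that created it.
        self.stopped = True


_active = threading.local()             # per-thread "active" session slot
_default: Optional[ToySession] = None   # process-wide "default" session slot


def set_active(session: Optional[ToySession]) -> None:
    _active.session = session


def get_active() -> Optional[ToySession]:
    return getattr(_active, "session", None)


def set_default(session: Optional[ToySession]) -> None:
    global _default
    _default = session


def get_default() -> Optional[ToySession]:
    return _default


def leaked_sessions() -> List[str]:
    """Describe any session still registered on this thread or globally."""
    found = []
    for slot, session in (("active", get_active()), ("default", get_default())):
        if session is not None:
            state = "stopped" if session.stopped else "running"
            found.append(f"{slot}:{session.name}:{state}")
    return found


def cleanup_any_existing_session() -> List[str]:
    """Stop and clear leftovers; return what was found so it can be reported."""
    leaked = leaked_sessions()
    for session in (get_active(), get_default()):
        if session is not None and not session.stopped:
            session.stop()
    set_active(None)
    set_default(None)
    return leaked


def run_leaky_suite() -> None:
    """Simulate a suite that stops its session but never clears the slots."""
    session = ToySession("suite-a")
    set_active(session)
    set_default(session)
    session.stop()
```

A test base class could call something like `cleanup_any_existing_session()` before a suite starts and report any non-empty result, so that the leaking suite can be identified instead of silently affecting the next one. Note that the thread-local slot is only cleared for the calling thread. In Spark's Scala API, the public calls that clear the two registrations are `SparkSession.clearActiveSession()` and `SparkSession.clearDefaultSession()`; how the actual patch wires them into the test suites is not shown on this page.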

## How was this patch tested?

Existing tests

Author: Eric Liang <[email protected]>

Closes apache#21058 from ericl/clear-session.
Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM.

@gatorsmile
Member Author

ok to test

@gatorsmile
Member Author

test this please

@dongjoon-hyun
Member

Retest this please.

@SparkQA

SparkQA commented May 1, 2018

Test build #89983 has finished for PR 21197 at commit c1c6377.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member Author

Retest this please.

@gatorsmile
Member Author

ok to test

@gatorsmile
Member Author

gatorsmile commented May 1, 2018

test this please

@dongjoon-hyun
Member

Currently, the SparkR tests are failing consistently in many PRs. I can see the following error message, and according to the log the SparkR tests seem to run twice. Could you take a look, please, @shivaram and @felixcheung?

* checking CRAN incoming feasibility ...Error in .check_package_CRAN_incoming(pkgdir) : 
  dims [product 24] do not match the length of object [0]
Execution halted

@SparkQA

SparkQA commented May 2, 2018

Test build #90016 has finished for PR 21197 at commit c1c6377.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Member

kiszk commented May 2, 2018

retest this please

@HyukjinKwon
Member

#21197 (comment) This looks very similar to the past one. We should probably ask the R devs for help again if it fails consistently. Looks fine now, though.

@SparkQA

SparkQA commented May 2, 2018

Test build #90025 has finished for PR 21197 at commit c1c6377.

  • This patch fails from timeout after a configured wait of `250m`.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@cloud-fan
Contributor

It seems this patch doesn't fix the problem... the test still hangs.

@felixcheung
Member

felixcheung commented May 2, 2018 via email

@dongjoon-hyun
Member

Thank you for confirming, @felixcheung !

@dongjoon-hyun
Member

@felixcheung, the same failure occurs again in #21210, too.

* this is package 'SparkR' version '2.4.0'
* checking CRAN incoming feasibility ...Error in .check_package_CRAN_incoming(pkgdir) : 
  dims [product 24] do not match the length of object [0]
Execution halted

@dongjoon-hyun
Member

I filed this flakiness as SPARK-24152.

@SparkQA

SparkQA commented May 2, 2018

Test build #90065 has finished for PR 21197 at commit c1c6377.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member Author

Thanks! Merged to 2.3

asfgit pushed a commit that referenced this pull request May 2, 2018
…t suites

Closes #21197 from gatorsmile/backportSPARK-23971.
@gatorsmile gatorsmile closed this May 2, 2018
otterc pushed a commit to linkedin/spark that referenced this pull request Mar 22, 2023
1) Use host/domain from dfsCluster
2) Use semaphores to remove timing-based flakiness
3) Ensure spark context is closed
4) Ignore ParquetAvroCompatibilitySuite temporarily
5) Adaptation of SPARK-28247 to 2.3
6) Disable codegen tests - these are fixed in 2.4 and 3.x, but require sql backports to 2.3
7) PR apache#21197 to handle test failures due to leaking spark session
8) PR#20926: Set default Spark session in test-only spark sessions
9) PR apache#20971: Active SparkSession should be set by getOrCreate
10) PR apache#21446: Random.nextString is not safe for directory namePrefix
11) Disabling QueryStageSuite#'adaptive skewed join'
12) SPARK-24318: Fix flaky SortShuffleSuite
13) Fix ForeachSinkSuite test failure
14) Disabling StreamingQuerySuite.'status, lastProgress, and recentProgress'
15) Disabling ResolvedDataSourceSuite.'avro: show deploy guide for loading the external avro module'
16) PR# 23405: Avoid to use Random.nextString in StreamingInnerJoinSuite
17) Adaptation of PR#25849, add LinkedIn repo as default to spark.sql.maven.additionalRemoteRepositories
18) Disabling HiveExternalCatalogVersionsSuite - only 2.4.6 and 3.0.0 are available from apache mirrors
19) Disabling INFER_AND_SAVE test in HiveSchemaInferenceSuite

- Address Erik's review comments