-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-23809][SQL][backport] Active SparkSession should be set by getOrCreate #20971
Conversation
## What changes were proposed in this pull request? Currently, the active spark session is set inconsistently (e.g., in createDataFrame, prior to query execution). Many places in spark also incorrectly query active session when they should be calling activeSession.getOrElse(defaultSession) and so might get None even if a Spark session exists. The semantics here can be cleaned up if we also set the active session when the default session is set. Related: https://github.com/apache/spark/pull/20926/files ## How was this patch tested? Unit test, existing test. Note that if apache#20926 merges first we should also update the tests there. Author: Eric Liang <[email protected]> Closes apache#20927 from ericl/active-session-cleanup.
@gatorsmile here's the patch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM pending Jenkins
Test build #88865 has finished for PR 20971 at commit
|
Test build #88867 has finished for PR 20971 at commit
|
retest this please |
Test build #88870 has finished for PR 20971 at commit
|
retest this please |
1 similar comment
retest this please |
ok to test |
Test build #88961 has finished for PR 20971 at commit
|
Thanks! Merged to 2.3 |
…OrCreate This backports #20927 to branch-2.3 ## What changes were proposed in this pull request? Currently, the active spark session is set inconsistently (e.g., in createDataFrame, prior to query execution). Many places in spark also incorrectly query active session when they should be calling activeSession.getOrElse(defaultSession) and so might get None even if a Spark session exists. The semantics here can be cleaned up if we also set the active session when the default session is set. Related: https://github.com/apache/spark/pull/20926/files ## How was this patch tested? Unit test, existing test. Note that if #20926 merges first we should also update the tests there. Author: Eric Liang <[email protected]> Closes #20971 from ericl/backport-23809.
…OrCreate This backports apache#20927 to branch-2.3 Currently, the active spark session is set inconsistently (e.g., in createDataFrame, prior to query execution). Many places in spark also incorrectly query active session when they should be calling activeSession.getOrElse(defaultSession) and so might get None even if a Spark session exists. The semantics here can be cleaned up if we also set the active session when the default session is set. Related: https://github.com/apache/spark/pull/20926/files Unit test, existing test. Note that if apache#20926 merges first we should also update the tests there. Author: Eric Liang <[email protected]> Closes apache#20971 from ericl/backport-23809. Change-Id: Iee11470899bc67d0ad9f096d37f8705174593f3e
1) Use host/domain from dfsCluster 2) Use semaphores to remove timing based flakeyness 3) Ensure spark context is closed 4) Ignore ParquetAvroCompatibilitySuite temporarily 5) Adaptation of SPARK-28247 to 2.3 6) Disable codegen tests - these are fixed in 2.4 and 3.x, but require sql backports to 2.3 7) PR apache#21197 to handle test failures due to leaking spark session 8) PR#20926: Set default Spark session in test-only spark sessions 9) PR apache#20971: Active SparkSession should be set by getOrCreate 10) PR apache#21446: Random.nextString is not safe for directory namePrefix 11) Disabling QueryStageSuite#'adaptive skewed join' 12) SPARK-24318: Fix flakey SortShuffleSuite 13) Fix ForeachSinkSuite test failure 14) Diabling StreamingQuerySuite.'status, lastProgress, and recentProgress' 15) Disabling ResolvedDataSourceSuite.'avro: show deploy guide for loading the external avro module' 16) PR# 23405: Avoid to use Random.nextString in StreamingInnerJoinSuite 17) Adaptation of PR#25849, add LinkedIn repo as default to spark.sql.maven.additionalRemoteRepositories 18) Disabling HiveExternalCatalogVersionsSuite - only 2.4.6 and 3.0.0 are available from apache mirrors 19) Disabling INFER_AND_SAVE test in HiveSchemaInferenceSuite - Address Erik's review comments
Currently, the active spark session is set inconsistently (e.g., in createDataFrame, prior to query execution). Many places in spark also incorrectly query active session when they should be calling activeSession.getOrElse(defaultSession) and so might get None even if a Spark session exists. The semantics here can be cleaned up if we also set the active session when the default session is set. Related: https://github.com/apache/spark/pull/20926/files Unit test, existing test. Note that if apache#20926 merges first we should also update the tests there. Author: Eric Liang <[email protected]> Closes apache#20927 from ericl/active-session-cleanup. (cherry picked from commit 359375e) NOTE: This cherry-pick includes only some of the original changes, because LIHADOOP-54684 already made some identical changes as part of test resolution. This now ensures the entirety of the original SPARK-23809 code is backported. Also note that when SPARK-23809 was originally backported to branch-2.3 in PR apache#20971, the new `active` API was _not_ included since new APIs shouldn't generally be added in patch releases. That new API is exactly what is needed, so we backport directly from the original commit. RB=2575134 BUG=BDP-6088 G=spark-reviewers R=mmuralid,wyzhang,smahadik A=mmuralid,wyzhang
This backports #20927 to branch-2.3
What changes were proposed in this pull request?
Currently, the active spark session is set inconsistently (e.g., in createDataFrame, prior to query execution). Many places in spark also incorrectly query active session when they should be calling activeSession.getOrElse(defaultSession) and so might get None even if a Spark session exists.
The semantics here can be cleaned up if we also set the active session when the default session is set.
Related: https://github.com/apache/spark/pull/20926/files
How was this patch tested?
Unit test, existing test. Note that if #20926 merges first we should also update the tests there.