update proportion of memory #66
Closed
The default value of "spark.storage.memoryFraction" has been changed from 0.66 to 0.6, so 60% of the memory is used for caching, while the remaining 40% is available for task execution.
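As a rough illustration of the 60/40 split described above, the Python sketch below computes how a given executor heap would be divided under this static model. The helper `split_executor_memory` is hypothetical and not part of Spark. It also ignores any internal safety margins Spark applies on top of the configured fraction, so the real cache capacity may be somewhat smaller. In an actual job, the fraction itself is set through the `spark.storage.memoryFraction` configuration key, for example via `SparkConf.set`.

```python
# Sketch of the static split described in this PR: a fraction of executor
# memory is reserved for cached data, and the rest is left for objects
# created during task execution. Internal safety margins are ignored.

DEFAULT_STORAGE_MEMORY_FRACTION = 0.6  # new default of spark.storage.memoryFraction


def split_executor_memory(executor_memory_mb, storage_fraction=DEFAULT_STORAGE_MEMORY_FRACTION):
    """Return (cache_mb, execution_mb) for an executor heap size in MB.

    Hypothetical helper for illustration only; it is not a Spark API.
    """
    if not 0.0 <= storage_fraction <= 1.0:
        raise ValueError("storage_fraction must be between 0 and 1")
    cache_mb = executor_memory_mb * storage_fraction
    execution_mb = executor_memory_mb - cache_mb
    return cache_mb, execution_mb


if __name__ == "__main__":
    # Assumed example: a 10 GiB (10240 MB) executor heap.
    cache, execution = split_executor_memory(10240)
    print(f"cache: {cache:.0f} MB, execution: {execution:.0f} MB")
```

For comparison, calling `split_executor_memory(10240, 0.66)` with the old default would reserve about 6758 MB for caching instead of 6144 MB.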
Can one of the admins verify this patch?
Thanks. I've merged this.