[SPARK-15620][SQL] Fix transformed dataset attributes resolve failure #13399
Conversation
Test build #59614 has finished for PR 13399 at commit
@cloud-fan, would you please help review this patch? I'm not sure it is the right fix. Thanks a lot.
```
@@ -783,6 +783,16 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
     ds.filter(_.b > 1).collect().toSeq
   }
 }

 test("transformed dataset correctly solve the attributes") {
```
can you double-check this test? I tried this on the latest master branch and it passed...
Thanks a lot @cloud-fan for your review. I think the issue still exists; the problem is that the unit test doesn't actually reflect it. Let me change the unit test.
You can try the snippet below to see the issue:

```
val dataset = Seq(1, 2, 3).toDS
val mappedDataset = dataset.map(_ + 1)
mappedDataset.as("t1").joinWith(mappedDataset.as("t2"), $"t1.value" === $"t2.value").show()
```
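For reference, `Seq(1, 2, 3).toDS.map(_ + 1)` holds the values 2, 3, and 4, so once attribute resolution succeeds the self join above should print the three matching pairs. A sketch of the expected `show()` output (`joinWith` yields a tuple dataset, hence the `_1`/`_2` columns):

```
+---+---+
| _1| _2|
+---+---+
|  2|  2|
|  3|  3|
|  4|  4|
+---+---+
```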
Unit test updated; please review again. Thanks a lot :)
Test build #59727 has finished for PR 13399 at commit
test("transformed dataset correctly solve the attributes") { | ||
val dataset = Seq(1, 2, 3).toDS() | ||
val ds1 = dataset.map(_ + 1) |
nit: I'd like it to be:

```
val ds = Seq(1, 2, 3).toDS().map(_ + 1)
val d1 = ds.as("d1")
val d2 = ds.as("d2")
...
```
and the test name can be: `mapped dataset should resolve duplicated attributes for self join and set operations`. A sketch of what the updated test could look like follows.
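A sketch of what the updated test could look like under this suggestion; `checkDataset` is assumed to be the usual `DatasetSuite` assertion helper, and the expected rows follow from mapping 1, 2, 3 through `_ + 1` (the merged test may differ):

```scala
test("mapped dataset should resolve duplicated attributes for self join and set operations") {
  val ds = Seq(1, 2, 3).toDS().map(_ + 1)
  val d1 = ds.as("d1")
  val d2 = ds.as("d2")

  // Self join: both sides come from the same mapped plan, so the analyzer
  // must deduplicate the conflicting `value` attributes before resolving
  // `d1.value` and `d2.value`.
  checkDataset(d1.joinWith(d2, $"d1.value" === $"d2.value"), (2, 2), (3, 3), (4, 4))

  // Set operations on the same mapped dataset hit the same conflict.
  checkDataset(d1.intersect(d2), 2, 3, 4)
  checkDataset(d1.except(d1))
}
```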
Sure, let me update it.
LGTM except for a minor style comment; thanks for working on it!
Test build #59802 has finished for PR 13399 at commit
## What changes were proposed in this pull request?

A join on a transformed dataset has attribute conflicts, which makes query execution fail. For example:

```
val dataset = Seq(1, 2, 3).toDS
val mappedDs = dataset.map(_ + 1)
mappedDs.as("t1").joinWith(mappedDs.as("t2"), $"t1.value" === $"t2.value").show()
```

will throw an exception:

```
org.apache.spark.sql.AnalysisException: cannot resolve '`t1.value`' given input columns: [value];
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:62)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:59)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:287)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:287)
```

## How was this patch tested?

Unit test.

Author: jerryshao <[email protected]>

Closes #13399 from jerryshao/SPARK-15620.

(cherry picked from commit 8288e16)
Signed-off-by: Wenchen Fan <[email protected]>
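The actual diff is not quoted in this thread. Given the symptom (the same mapped plan supplies identical `value` attributes to both sides of the self join), the fix plausibly teaches the analyzer's self-join deduplication to re-instance attributes produced by `SerializeFromObject`. A hypothetical sketch of that kind of change, not the verified merged diff (the surrounding match and `conflictingAttributes` name are assumed from Catalyst's analyzer conventions):

```scala
// Hypothetical: inside Analyzer.ResolveReferences, where a self join's right
// side is rewritten when its output clashes with the left side's attributes.
case oldVersion: SerializeFromObject
    if oldVersion.outputSet.intersect(conflictingAttributes).nonEmpty =>
  // Re-create the serializer's named expressions with fresh ExprIds so the
  // two sides of the join no longer share attribute IDs.
  (oldVersion,
    oldVersion.copy(serializer = oldVersion.serializer.map(_.newInstance())))
```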
thanks, merging to master and 2.0!