Support save WithCTE for insertRepartitionBeforeWrite #6783
🔍 Description
Issue References 🔗
First, I'd like to thank @wForget for the help with this issue.
When using the "save to HDFS" feature, queries ending with an `ORDER BY` sometimes lose their sort order in the results. While investigating the code, I found that when a query uses `WITH` statements and its result is saved with `toDF.write.save`, a `WithCTE` node is generated after the `Sort` node. This causes the `canInsertRepartitionByExpression` check to miss the `Sort`, so a `Repartition` node is incorrectly inserted after the `Sort` node, which ultimately disrupts the sort order.

However, this issue does not occur when using `INSERT INTO TABLE` with `WithCTE` nodes.

The provided unit test can reproduce this issue, but after calling `toDF.write.save` I am unable to access the complete execution plan to assert whether a `Repartition` node is present. Therefore, the current test is ineffective.

I hope someone can help figure out how to write this unit test.
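To make the failure mode concrete, here is a minimal toy model of the check. It is a sketch under stated assumptions, not the code in this PR: the `Plan` classes below are hypothetical stand-ins for Catalyst nodes, `canInsertRepartition` only loosely mirrors `canInsertRepartitionByExpression`, and the `seeThroughCte` flag is invented purely to contrast the two behaviors.

```scala
// Toy stand-ins for Catalyst plan nodes. These are NOT Spark's classes;
// they exist only to illustrate the shape of the problem.
sealed trait Plan
case class Relation(name: String) extends Plan
case class Project(child: Plan) extends Plan
case class Sort(child: Plan) extends Plan
case class Repartition(child: Plan) extends Plan
// Assumption: WithCTE wraps the main query plan and carries the CTE definitions.
case class WithCTE(plan: Plan, cteDefs: Seq[Plan]) extends Plan

object RepartitionSketch {
  // Returns true when a repartition may be placed on top of `plan`.
  // `seeThroughCte` is a hypothetical switch: false models the behavior
  // described above, true models a check that also inspects WithCTE's child.
  def canInsertRepartition(plan: Plan, seeThroughCte: Boolean): Boolean = plan match {
    case Sort(_)        => false // a shuffle on top of a sort would lose the ordering
    case Repartition(_) => false // already repartitioned
    case Project(child) => canInsertRepartition(child, seeThroughCte)
    case WithCTE(inner, _) if seeThroughCte =>
      canInsertRepartition(inner, seeThroughCte)
    case _ => true // any other node, including an unhandled WithCTE
  }

  // Wraps the query in a Repartition only when the check allows it.
  def insertBeforeWrite(query: Plan, seeThroughCte: Boolean): Plan =
    if (canInsertRepartition(query, seeThroughCte)) Repartition(query) else query
}
```

With `WithCTE(Sort(Relation("t")), Seq(Relation("cte")))` as input, the variant that does not look inside `WithCTE` falls through to the default case and wraps the plan in `Repartition`, while the variant that recurses into the child finds the `Sort` and leaves the plan unchanged.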
Describe Your Solution 🔧
Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.
Types of changes 🔖
Test Plan 🧪
Behavior Without This Pull Request ⚰️
Behavior With This Pull Request 🎉
Related Unit Tests
Checklist 📝