[SPARK-9143] [SQL] Add planner rule for automatically inserting Unsafe <-> Safe row format converters #7482
Conversation
/cc @mambrus for review.
Test build #37671 has finished for PR 7482 at commit …
Ah, I guess I should add a test to show that the …
Test build #37669 has finished for PR 7482 at commit …
@@ -306,6 +306,8 @@ case class UnsafeExternalSort(
  override def output: Seq[Attribute] = child.output

  override def outputOrdering: Seq[SortOrder] = sortOrder

  override def outputsUnsafeRows: Boolean = true
What about filter?
Yep, meant to change this. Thanks for reminding me.
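For context, a minimal toy sketch of how a pass-through operator like `Filter` could propagate the flag (simplified stand-in classes, not the actual patch):

```scala
// Toy stand-ins, not Spark's real classes.
abstract class SparkPlan {
  // By default, operators are assumed to produce safe (Java-object-based) rows.
  def outputsUnsafeRows: Boolean = false
}

// Filter neither creates nor reformats rows, so it can simply mirror
// whatever row format its child produces.
case class Filter(child: SparkPlan) extends SparkPlan {
  override def outputsUnsafeRows: Boolean = child.outputsUnsafeRows
}
```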
Should we also add an assertion to our set operations that the inputs are the same type of row?
Or actually, maybe we can add a general assertion to the `execute` method of `SparkPlan`?

assert(children.map(_.outputsUnsafeRows).distinct.size <= 1)
Yeah, let's add it to `execute`, I think. I'll do this shortly.
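For reference, a rough toy sketch of where such an assertion could live (simplified stand-ins, not Spark's actual `SparkPlan`):

```scala
// Toy model: execute() is the single entry point for running an operator,
// so a check here covers every operator in the plan.
abstract class SparkPlan {
  def children: Seq[SparkPlan]
  def outputsUnsafeRows: Boolean = false

  protected def doExecute(): Unit

  final def execute(): Unit = {
    // All children must agree on their output row format: once the planner
    // rule has run, a mix of safe and unsafe inputs indicates a planning bug.
    assert(children.map(_.outputsUnsafeRows).distinct.size <= 1,
      s"Children of $this produce a mix of safe and unsafe rows")
    doExecute()
  }
}
```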
LGTM pending filter fix and assertion addition!
Alright, I've updated this to address your review feedback and also added a lot more assertions and test cases.
Test build #37681 has finished for PR 7482 at commit …
if (operator.children.map(_.outputsUnsafeRows).toSet.size != 1) {
  // If this operator's children produce both unsafe and safe rows, then convert everything
  // to safe rows
  operator.withNewChildren {
Wouldn't it make more sense to convert to unsafe instead?
Yeah, I think so. Resolving this type of conflict in favor of UnsafeRow should be fine: if unsafe operators are disabled via a feature flag, then the plan shouldn't contain any operators that claim to output unsafe rows, so this branch will never be triggered.
I'll update this patch to change this logic.
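A minimal sketch of what the updated branch might look like, assuming the patch's `ConvertToUnsafe` operator and Catalyst-style `withNewChildren` (toy stand-in classes below; the helper name `ensureUnsafeInputs` is purely illustrative):

```scala
// Toy stand-ins, just enough to show the conflict-resolution logic.
abstract class SparkPlan {
  def children: Seq[SparkPlan] = Nil
  def outputsUnsafeRows: Boolean = false
  def withNewChildren(newChildren: Seq[SparkPlan]): SparkPlan = this // simplified
}
case class ConvertToUnsafe(child: SparkPlan) extends SparkPlan {
  override def children: Seq[SparkPlan] = Seq(child)
  override def outputsUnsafeRows: Boolean = true
}

// When children disagree on row format, upgrade the safe-row children to
// unsafe rather than downgrading everything to safe.
def ensureUnsafeInputs(operator: SparkPlan): SparkPlan =
  if (operator.children.map(_.outputsUnsafeRows).toSet.size != 1) {
    operator.withNewChildren(operator.children.map { c =>
      if (c.outputsUnsafeRows) c else ConvertToUnsafe(c)
    })
  } else {
    operator
  }
```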
Test build #37703 has finished for PR 7482 at commit …
Thanks - merging this in.
Now that we have two different internal row formats, UnsafeRow and the old Java-object-based row format, we end up having to perform conversions between these two formats. These conversions should not be performed by the operators themselves; instead, the planner should be responsible for inserting appropriate format conversions when they are needed.
This patch makes the following changes (a toy sketch of the resulting pieces follows the list):
- Adds two converter physical operators, `ConvertToUnsafe` and `ConvertFromUnsafe`.
- Adds new methods to `SparkPlan` to allow operators to express whether they output UnsafeRows and whether they can handle safe or unsafe rows as inputs.
- Adds an `EnsureRowFormats` rule to automatically insert converter operators where necessary.
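To make the shape of these pieces concrete, here is a toy sketch (simplified stand-ins, not the real classes; the `canProcessUnsafeRows`/`canProcessSafeRows` names are assumptions, since only `outputsUnsafeRows` appears in the discussion above):

```scala
// Toy sketch of the interface described above.
abstract class SparkPlan {
  def children: Seq[SparkPlan] = Nil

  // Does this operator emit UnsafeRows?
  def outputsUnsafeRows: Boolean = false

  // Which input row formats can this operator consume? (assumed names)
  def canProcessUnsafeRows: Boolean = false
  def canProcessSafeRows: Boolean = true
}

// The two converter operators that the EnsureRowFormats rule inserts
// between operators whose row formats don't line up.
case class ConvertToUnsafe(child: SparkPlan) extends SparkPlan {
  override def children: Seq[SparkPlan] = Seq(child)
  override def outputsUnsafeRows: Boolean = true   // safe rows in, UnsafeRows out
}
case class ConvertFromUnsafe(child: SparkPlan) extends SparkPlan {
  override def children: Seq[SparkPlan] = Seq(child)
  override def outputsUnsafeRows: Boolean = false  // UnsafeRows in, safe rows out
}
```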