
[SPARK-9143] [SQL] Add planner rule for automatically inserting Unsafe <-> Safe row format converters #7482

Closed
JoshRosen wants to merge 13 commits

Conversation

@JoshRosen (Contributor)

Now that we have two different internal row formats, UnsafeRow and the old Java-object-based row format, we end up having to perform conversions between these two formats. These conversions should not be performed by the operators themselves; instead, the planner should be responsible for inserting appropriate format conversions when they are needed.

This patch makes the following changes:

  • Add two new physical operators for performing row format conversions, ConvertToUnsafe and ConvertToSafe (initially named ConvertFromUnsafe; renamed during review).
  • Add new methods to SparkPlan to allow operators to express whether they output UnsafeRows and whether they can handle safe or unsafe rows as inputs.
  • Implement an EnsureRowFormats rule to automatically insert converter operators where necessary.
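The two converter operators described above can be sketched roughly as follows. This is a simplified model based on the discussion in this PR, not the exact merged source; `SparkPlan`, `UnaryNode`, `UnsafeProjection`, and `FromUnsafeProjection` are Spark internals assumed here, and the final operator name `ConvertToSafe` comes from the later test builds in this thread.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.{Attribute, UnsafeProjection, FromUnsafeProjection}

// Sketch: wraps a child plan and re-encodes its safe rows as UnsafeRows.
case class ConvertToUnsafe(child: SparkPlan) extends UnaryNode {
  override def output: Seq[Attribute] = child.output
  override def outputsUnsafeRows: Boolean = true
  override protected def doExecute(): RDD[InternalRow] = {
    child.execute().mapPartitions { iter =>
      // One projection per partition; it copies each row into unsafe format.
      val toUnsafe = UnsafeProjection.create(child.output.map(_.dataType).toArray)
      iter.map(toUnsafe)
    }
  }
}

// Sketch: the inverse direction, materializing UnsafeRows as Java-object rows.
case class ConvertToSafe(child: SparkPlan) extends UnaryNode {
  override def output: Seq[Attribute] = child.output
  override def outputsUnsafeRows: Boolean = false
  override protected def doExecute(): RDD[InternalRow] = {
    child.execute().mapPartitions { iter =>
      val toSafe = FromUnsafeProjection(child.output.map(_.dataType))
      iter.map(toSafe)
    }
  }
}
```

Because both operators preserve `output` and only change the physical row encoding, the planner can insert them anywhere in a physical plan without affecting query semantics.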

@JoshRosen (Contributor, Author)

/cc @marmbrus for review.

@SparkQA

SparkQA commented Jul 17, 2015

Test build #37671 has finished for PR 7482 at commit 3b11ce3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class ConvertToUnsafe(child: SparkPlan) extends UnaryNode
    • case class ConvertFromUnsafe(child: SparkPlan) extends UnaryNode

@JoshRosen (Contributor, Author)

Ah, I guess I should add a test to show that the Filter case that I described will work properly.

@SparkQA

SparkQA commented Jul 17, 2015

Test build #37669 has finished for PR 7482 at commit ae2195a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class ConvertToUnsafe(child: SparkPlan) extends UnaryNode
    • case class ConvertFromUnsafe(child: SparkPlan) extends UnaryNode

@@ -306,6 +306,8 @@ case class UnsafeExternalSort(
override def output: Seq[Attribute] = child.output

override def outputOrdering: Seq[SortOrder] = sortOrder

override def outputsUnsafeRows: Boolean = true
Contributor:

What about filter?

Contributor Author:

Yep, meant to change this. Thanks for reminding me.

Contributor:

Should we also add an assertion to our set operations that the inputs are the same type of row?

Contributor:

or actually, maybe we can add a general assertion to the execute method of SparkPlan?

assert(children.map(_.outputsUnsafeRows).distinct.size <= 1)

Contributor Author:

Yeah, let's add it to execute I think. I'll do this shortly.
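Placed in `execute`, the suggested invariant check might look like the sketch below. This is a simplified illustration of the idea from this thread, not Spark's actual `execute` method (which also handles preparation and other bookkeeping); the error message is an addition for debuggability.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.catalyst.InternalRow

// Sketch: guard in SparkPlan.execute() enforcing that all children agree
// on their output row format. A node should never receive a mix of safe
// and unsafe rows across its inputs; EnsureRowFormats is responsible for
// inserting converters so this assertion holds by construction.
final def execute(): RDD[InternalRow] = {
  assert(
    children.map(_.outputsUnsafeRows).distinct.size <= 1,
    s"Children have mixed row formats: ${children.map(_.outputsUnsafeRows)}")
  doExecute()
}
```

Note the `.size` call: `distinct` returns a `Seq[Boolean]`, so comparing it directly to `1` would not compile.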

@marmbrus (Contributor)

LGTM pending filter fix and assertion addition!

@JoshRosen (Contributor, Author)

Alright, I've updated this to address your review feedback and also added a lot more assertions and test cases.

@SparkQA

SparkQA commented Jul 18, 2015

Test build #37681 has finished for PR 7482 at commit 5220cce.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class ConvertToUnsafe(child: SparkPlan) extends UnaryNode
    • case class ConvertToSafe(child: SparkPlan) extends UnaryNode

if (operator.children.map(_.outputsUnsafeRows).toSet.size != 1) {
  // If this operator's children produce both unsafe and safe rows, then convert everything
  // to safe rows
  operator.withNewChildren {
Contributor:

wouldn't it make more sense to convert to unsafe instead?

Contributor Author:

Yeah, I think so. Resolving this type of conflict in favor of UnsafeRow should be fine: if unsafe operators are disabled via a feature flag, then the plan shouldn't contain any operators that claim to output unsafe rows, so this branch will never be triggered.

I'll update this patch to change this logic.
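After this change, the mixed-children branch of the rule would unify toward UnsafeRow rather than safe rows. The following is a hedged sketch of that logic, reconstructed from the diff excerpt and discussion above; it is not the exact merged code, and it only shows the mixed-format case (the real rule also has to consider which formats each operator can consume).

```scala
import org.apache.spark.sql.catalyst.rules.Rule

// Sketch of EnsureRowFormats: when an operator's children disagree on row
// format, insert ConvertToUnsafe above the safe-row children so that every
// input arrives in the unsafe format.
private object EnsureRowFormats extends Rule[SparkPlan] {
  override def apply(plan: SparkPlan): SparkPlan = plan.transformUp {
    case operator: SparkPlan
        if operator.children.map(_.outputsUnsafeRows).toSet.size > 1 =>
      // Children produce both safe and unsafe rows; resolve in favor of
      // UnsafeRow (see discussion above: this branch is unreachable when
      // unsafe operators are disabled).
      operator.withNewChildren(operator.children.map {
        case child if child.outputsUnsafeRows => child
        case child => ConvertToUnsafe(child)
      })
  }
}
```

Using `transformUp` means converters are inserted bottom-up, so each operator's children are already format-consistent by the time the rule examines it.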

@SparkQA

SparkQA commented Jul 18, 2015

Test build #37703 has finished for PR 7482 at commit 7450fa5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class ConvertToUnsafe(child: SparkPlan) extends UnaryNode
    • case class ConvertToSafe(child: SparkPlan) extends UnaryNode

@rxin (Contributor)

rxin commented Jul 18, 2015

Thanks - merging this in.

@asfgit asfgit closed this in b8aec6c Jul 18, 2015
@JoshRosen JoshRosen deleted the unsafe-converter-planning branch August 29, 2016 19:28