[SPARK-9143] [SQL] Add planner rule for automatically inserting Unsafe <-> Safe row format converters #7482
Conversation
/cc @mambrus for review.
Test build #37671 has finished for PR 7482 at commit …
Ah, I guess I should add a test to show that the …
Test build #37669 has finished for PR 7482 at commit …
@@ -306,6 +306,8 @@ case class UnsafeExternalSort(
  override def output: Seq[Attribute] = child.output

  override def outputOrdering: Seq[SortOrder] = sortOrder

  override def outputsUnsafeRows: Boolean = true
What about filter?
Yep, meant to change this. Thanks for reminding me.
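For context, a minimal toy sketch of how a pass-through operator like `Filter` could propagate the flag (simplified stand-in classes, not the actual patch):

```scala
// Toy stand-ins, not Spark's real classes.
abstract class SparkPlan {
  // By default, operators are assumed to produce safe (Java-object-based) rows.
  def outputsUnsafeRows: Boolean = false
}

// Filter neither creates nor reformats rows, so it can simply mirror
// whatever row format its child produces.
case class Filter(child: SparkPlan) extends SparkPlan {
  override def outputsUnsafeRows: Boolean = child.outputsUnsafeRows
}
```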
Should we also add an assertion to our set operations that the inputs are the same type of row?
Or actually, maybe we can add a general assertion to the `execute` method of `SparkPlan`?

assert(children.map(_.outputsUnsafeRows).distinct.size <= 1)
Yeah, let's add it to `execute`, I think. I'll do this shortly.
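For reference, a rough toy sketch of where such an assertion could live (simplified stand-ins, not Spark's actual `SparkPlan`):

```scala
// Toy model: execute() is the single entry point for running an operator,
// so a check here covers every operator in the plan.
abstract class SparkPlan {
  def children: Seq[SparkPlan]
  def outputsUnsafeRows: Boolean = false

  protected def doExecute(): Unit

  final def execute(): Unit = {
    // All children must agree on their output row format: once the planner
    // rule has run, a mix of safe and unsafe inputs indicates a planning bug.
    assert(children.map(_.outputsUnsafeRows).distinct.size <= 1,
      s"Children of $this produce a mix of safe and unsafe rows")
    doExecute()
  }
}
```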
LGTM pending filter fix and assertion addition!
Alright, I've updated this to address your review feedback and also added a lot more assertions and test cases.
Test build #37681 has finished for PR 7482 at commit …
if (operator.children.map(_.outputsUnsafeRows).toSet.size != 1) {
  // If this operator's children produce both unsafe and safe rows, then convert everything
  // to safe rows
  operator.withNewChildren {
Wouldn't it make more sense to convert to unsafe instead?
Yeah, I think so. Resolving this type of conflict in favor of UnsafeRow should be fine: if unsafe operators are disabled via a feature flag, then the plan shouldn't contain any operators that claim to output unsafe rows, so this branch will never be triggered.
I'll update this patch to change this logic.
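A minimal sketch of what the updated branch might look like, assuming the patch's `ConvertToUnsafe` operator and Catalyst-style `withNewChildren` (toy stand-in classes below; the helper name `ensureUnsafeInputs` is purely illustrative):

```scala
// Toy stand-ins, just enough to show the conflict-resolution logic.
abstract class SparkPlan {
  def children: Seq[SparkPlan] = Nil
  def outputsUnsafeRows: Boolean = false
  def withNewChildren(newChildren: Seq[SparkPlan]): SparkPlan = this // simplified
}
case class ConvertToUnsafe(child: SparkPlan) extends SparkPlan {
  override def children: Seq[SparkPlan] = Seq(child)
  override def outputsUnsafeRows: Boolean = true
}

// When children disagree on row format, upgrade the safe-row children to
// unsafe rather than downgrading everything to safe.
def ensureUnsafeInputs(operator: SparkPlan): SparkPlan =
  if (operator.children.map(_.outputsUnsafeRows).toSet.size != 1) {
    operator.withNewChildren(operator.children.map { c =>
      if (c.outputsUnsafeRows) c else ConvertToUnsafe(c)
    })
  } else {
    operator
  }
```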
Test build #37703 has finished for PR 7482 at commit …
Thanks - merging this in.
Now that we have two different internal row formats, UnsafeRow and the old Java-object-based row format, we end up having to perform conversions between these two formats. These conversions should not be performed by the operators themselves; instead, the planner should be responsible for inserting appropriate format conversions when they are needed.
This patch makes the following changes (a toy sketch of the resulting pieces follows the list):
- Adds two converter physical operators, `ConvertToUnsafe` and `ConvertFromUnsafe`.
- Adds new methods to `SparkPlan` to allow operators to express whether they output UnsafeRows and whether they can handle safe or unsafe rows as inputs.
- Adds an `EnsureRowFormats` rule to automatically insert converter operators where necessary.
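To make the shape of these pieces concrete, here is a toy sketch (simplified stand-ins, not the real classes; the `canProcessUnsafeRows`/`canProcessSafeRows` names are assumptions, since only `outputsUnsafeRows` appears in the discussion above):

```scala
// Toy sketch of the interface described above.
abstract class SparkPlan {
  def children: Seq[SparkPlan] = Nil

  // Does this operator emit UnsafeRows?
  def outputsUnsafeRows: Boolean = false

  // Which input row formats can this operator consume? (assumed names)
  def canProcessUnsafeRows: Boolean = false
  def canProcessSafeRows: Boolean = true
}

// The two converter operators that the EnsureRowFormats rule inserts
// between operators whose row formats don't line up.
case class ConvertToUnsafe(child: SparkPlan) extends SparkPlan {
  override def children: Seq[SparkPlan] = Seq(child)
  override def outputsUnsafeRows: Boolean = true   // safe rows in, UnsafeRows out
}
case class ConvertFromUnsafe(child: SparkPlan) extends SparkPlan {
  override def children: Seq[SparkPlan] = Seq(child)
  override def outputsUnsafeRows: Boolean = false  // UnsafeRows in, safe rows out
}
```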