Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-3085] [SQL] Use compact data structures in SQL joins #1993

Closed
wants to merge 2 commits into from

Conversation

mateiz
Copy link
Contributor

@mateiz mateiz commented Aug 17, 2014

This reuses the CompactBuffer from Spark Core to save memory and pointer
dereferences. I also tried AppendOnlyMap instead of java.util.HashMap
but unfortunately that slows things down because it seems to do more
equals() calls and the equals on GenericRow, and especially JoinedRow,
is pretty expensive.

This reuses the CompactBuffer from Spark Core to save memory and pointer
dereferences. I also tried AppendOnlyMap instead of java.util.HashMap
but unfortunately that slows things down because it seems to do more
equals() calls and the equals on GenericRow, and especially JoinedRow,
is pretty expensive.
@SparkQA
Copy link

SparkQA commented Aug 17, 2014

QA tests have started for PR 1993 at commit 5f903ee.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Aug 17, 2014

QA tests have finished for PR 1993 at commit 5f903ee.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mateiz
Copy link
Contributor Author

mateiz commented Aug 17, 2014

Jenkins, retest this please

@SparkQA
Copy link

SparkQA commented Aug 17, 2014

QA tests have started for PR 1993 at commit 5f903ee.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Aug 17, 2014

QA tests have finished for PR 1993 at commit 5f903ee.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.catalyst.plans._
import org.apache.spark.sql.catalyst.plans.physical._
import org.apache.spark.util.collection.{AppendOnlyMap, CompactBuffer}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We aren't actually using AppendOnlyMap though right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah true, I'll remove the import

@SparkQA
Copy link

SparkQA commented Aug 18, 2014

QA tests have started for PR 1993 at commit 188221e.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Aug 18, 2014

QA tests have finished for PR 1993 at commit 188221e.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@marmbrus
Copy link
Contributor

Looking at the various runs, this patch only failed the thrift server tests consistently, so I'm going to go ahead and merge it. Thanks matei.

asfgit pushed a commit that referenced this pull request Aug 18, 2014
This reuses the CompactBuffer from Spark Core to save memory and pointer
dereferences. I also tried AppendOnlyMap instead of java.util.HashMap
but unfortunately that slows things down because it seems to do more
equals() calls and the equals on GenericRow, and especially JoinedRow,
is pretty expensive.

Author: Matei Zaharia <[email protected]>

Closes #1993 from mateiz/spark-3085 and squashes the following commits:

188221e [Matei Zaharia] Remove unneeded import
5f903ee [Matei Zaharia] [SPARK-3085] [SQL] Use compact data structures in SQL joins

(cherry picked from commit 4bf3de7)
Signed-off-by: Michael Armbrust <[email protected]>
@asfgit asfgit closed this in 4bf3de7 Aug 18, 2014
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
This reuses the CompactBuffer from Spark Core to save memory and pointer
dereferences. I also tried AppendOnlyMap instead of java.util.HashMap
but unfortunately that slows things down because it seems to do more
equals() calls and the equals on GenericRow, and especially JoinedRow,
is pretty expensive.

Author: Matei Zaharia <[email protected]>

Closes apache#1993 from mateiz/spark-3085 and squashes the following commits:

188221e [Matei Zaharia] Remove unneeded import
5f903ee [Matei Zaharia] [SPARK-3085] [SQL] Use compact data structures in SQL joins
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants