
numPartitions are not the same #329

Closed
rui-mo opened this issue May 19, 2021 · 5 comments · Fixed by #335
Labels
bug Something isn't working

Comments

@rui-mo
Collaborator

rui-mo commented May 19, 2021

We found this error on v2 in q94 and q95 with AQE enabled:

java.lang.IllegalArgumentException: requirement failed: PartitioningCollection requires all of its partitionings have the same numPartitions.
  at scala.Predef$.require(Predef.scala:281)
  at org.apache.spark.sql.catalyst.plans.physical.PartitioningCollection.<init>(partitioning.scala:307)
  at org.apache.spark.sql.execution.joins.ShuffledJoin.outputPartitioning(ShuffledJoin.scala:35)
  at org.apache.spark.sql.execution.joins.ShuffledJoin.outputPartitioning$(ShuffledJoin.scala:33)
  at com.intel.oap.execution.ColumnarBroadcastHashJoinExec.outputPartitioning(ColumnarBroadcastHashJoinExec.scala:48)
  at org.apache.spark.sql.execution.adaptive.CustomShuffleReaderExec.outputPartitioning$lzycompute(CustomShuffleReaderExec.scala:61)
  at org.apache.spark.sql.execution.adaptive.CustomShuffleReaderExec.outputPartitioning(CustomShuffleReaderExec.scala:51)
  at org.apache.spark.sql.execution.exchange.EnsureRequirements$
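
For context, the failing requirement can be tripped in isolation whenever the partitionings handed to PartitioningCollection report different partition counts. A minimal sketch (e.g. in a spark-shell) using only the catalyst classes; the attribute names and partition counts below are made up:

import org.apache.spark.sql.catalyst.expressions.AttributeReference
import org.apache.spark.sql.catalyst.plans.physical.{HashPartitioning, PartitioningCollection}
import org.apache.spark.sql.types.IntegerType

// Hypothetical join keys; only the mismatched numPartitions values matter here.
val leftKey  = AttributeReference("key1", IntegerType)()
val rightKey = AttributeReference("key2", IntegerType)()

val left  = HashPartitioning(Seq(leftKey), 200) // e.g. the original shuffle partition count
val right = HashPartitioning(Seq(rightKey), 13) // e.g. a partition count changed on one side only

// Throws java.lang.IllegalArgumentException: requirement failed:
// PartitioningCollection requires all of its partitionings have the same numPartitions.
PartitioningCollection(Seq(left, right))
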
@rui-mo
Collaborator Author

rui-mo commented May 19, 2021

This error can also be reproduced in the unit test below:

  test("Columnar exchange reuse") {
    withSQLConf(
      SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true",
      SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "80") {
      val (plan, adaptivePlan) = runAdaptiveAndVerifyResult(
        "SELECT value FROM testData join testData2 ON key = a " +
          "join (SELECT value v from testData join testData3 ON key = a) on value = v")
      val ex = findReusedExchange(adaptivePlan)
      assert(ex.size == 1)
    }
  }
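
Outside the test harness, the two settings this test overrides correspond to spark.sql.adaptive.enabled and spark.sql.autoBroadcastJoinThreshold. A standalone sketch only; it assumes the columnar plugin is configured on the session and that the suite's testData/testData2/testData3 datasets are registered as temp views:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("numPartitions-mismatch-repro")
  .config("spark.sql.adaptive.enabled", "true")
  .config("spark.sql.autoBroadcastJoinThreshold", "80")
  .getOrCreate()

// Same query as the unit test above.
spark.sql(
  "SELECT value FROM testData JOIN testData2 ON key = a " +
    "JOIN (SELECT value v FROM testData JOIN testData3 ON key = a) ON value = v"
).collect()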

@rui-mo
Collaborator Author

rui-mo commented May 20, 2021

In Spark:

/**
 * A collection of [[Partitioning]]s that can be used to describe the partitioning
 * scheme of the output of a physical operator. It is usually used for an operator
 * that has multiple children. In this case, a [[Partitioning]] in this collection
 * describes how this operator's output is partitioned based on expressions from
 * a child. For example, for a Join operator on two tables `A` and `B`
 * with a join condition `A.key1 = B.key2`, assuming we use HashPartitioning schema,
 * there are two [[Partitioning]]s can be used to describe how the output of
 * this Join operator is partitioned, which are `HashPartitioning(A.key1)` and
 * `HashPartitioning(B.key2)`. It is also worth noting that `partitionings`
 * in this collection do not need to be equivalent, which is useful for
 * Outer Join operators.
 */
case class PartitioningCollection(partitionings: Seq[Partitioning])
  extends Expression with Partitioning with Unevaluable {

  require(
    partitionings.map(_.numPartitions).distinct.length == 1,
    s"PartitioningCollection requires all of its partitionings have the same numPartitions.")

rui-mo changed the title from "error relating to numPartitions" to "numPartitions are not the same" on May 20, 2021
@zhouyuan
Collaborator

Is this issue already fixed?
apache/spark#20041

@rui-mo
Collaborator Author

rui-mo commented May 20, 2021

Yes, but this check still exists in the Spark code. Is it possible we triggered it for a similar reason?

case class PartitioningCollection(partitionings: Seq[Partitioning])
  extends Expression with Partitioning with Unevaluable {

  require(
    partitionings.map(_.numPartitions).distinct.length == 1,
    s"PartitioningCollection requires all of its partitionings have the same numPartitions.")
