This repository has been archived by the owner on Dec 28, 2017. It is now read-only.

Integer overflow in some case #142

Open
Novemser opened this issue Nov 15, 2017 · 7 comments

@Novemser
Contributor

SQL:

select A.tp_bigint,B.id_dt from full_data_type_table A join full_data_type_table B on A.id_dt > B.id_dt * 16 where A.tp_bigint = B.id_dt order by A.id_dt

Throws:

Caused by: com.pingcap.tikv.exception.TiClientInternalException: Error reading region
  at com.pingcap.tikv.operation.SelectIterator.readNextRegion(SelectIterator.java:148)
  at com.pingcap.tikv.operation.SelectIterator.hasNext(SelectIterator.java:161)
  at org.apache.spark.sql.tispark.TiRDD$$anon$2.hasNext(TiRDD.scala:75)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
  at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
  at org.apache.spark.scheduler.Task.run(Task.scala:99)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.ExecutionException: com.pingcap.tikv.exception.SelectException: unknown error Codec(Other(StringError("I64(4355836469450447576) * I64(16) overflow")))
  at java.util.concurrent.FutureTask.report(FutureTask.java:122)
  at java.util.concurrent.FutureTask.get(FutureTask.java:192)
  at com.pingcap.tikv.operation.SelectIterator.readNextRegion(SelectIterator.java:145)
  ... 13 more
Caused by: com.pingcap.tikv.exception.SelectException: unknown error Codec(Other(StringError("I64(4355836469450447576) * I64(16) overflow")))
  at com.pingcap.tikv.region.RegionStoreClient.coprocessorHelper(RegionStoreClient.java:192)
  at com.pingcap.tikv.region.RegionStoreClient.coprocess(RegionStoreClient.java:185)
  at com.pingcap.tikv.operation.SelectIterator.createClientAndSendReq(SelectIterator.java:130)
  at com.pingcap.tikv.operation.SelectIterator.lambda$submitTasks$2(SelectIterator.java:113)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  ... 3 more

Seems there's an overflow issue here.

Note that if we remove the * 16 from the SQL, the above exception is not thrown.
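For reference, the reported product really does exceed the signed 64-bit range; a quick Java check using the value from the error message:

```java
public class OverflowDemo {
    public static void main(String[] args) {
        long v = 4355836469450447576L; // value from the TiKV error above

        // Plain long multiplication silently wraps around:
        System.out.println(v * 16L); // a wrapped, incorrect value

        // Math.multiplyExact detects the overflow instead of wrapping:
        try {
            Math.multiplyExact(v, 16L);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected: " + e.getMessage());
        }
    }
}
```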

@Novemser Novemser changed the title from "Error comparing integer with bigint in some case" to "Integer overflow in some case" on Nov 15, 2017
@Novemser
Contributor Author

Spark plan:

   :- Project [id_dt#0L, tp_bigint#8L]
   :  +- Filter ((isnotnull(id_dt#0L) && (id_dt#0L > (tp_bigint#8L * 16))) && isnotnull(tp_bigint#8L))

tp_bigint#8L * 16 can overflow, but we didn't validate this filter before pushing it down to TiKV, which caused the above problem.
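A pre-pushdown validation could look roughly like this. This is only a sketch; PushdownGuard and its method names are hypothetical, not part of tispark:

```java
// Hypothetical pre-pushdown check: a "col * constant" filter is only
// pushed down to TiKV when the multiplication cannot overflow.
public final class PushdownGuard {
    // Safe for *every* possible BIGINT column value only when the constant
    // is 0 or 1 (-1 is excluded because Long.MIN_VALUE * -1 overflows).
    static boolean multiplyAlwaysSafe(long constant) {
        return constant == 0L || constant == 1L;
    }

    // Safe for one concrete value: use overflow-checked multiplication.
    static boolean multiplySafeFor(long value, long constant) {
        try {
            Math.multiplyExact(value, constant);
            return true;
        } catch (ArithmeticException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The filter from this issue (constant 16) would not be pushed down:
        System.out.println(multiplyAlwaysSafe(16L));                    // false
        System.out.println(multiplySafeFor(4355836469450447576L, 16L)); // false
    }
}
```

Filters that fail the check would be kept in Spark and evaluated there instead.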

@Novemser
Contributor Author

I think the Spark plan generated here may not be appropriate; a CheckOverflow should have been added to the above filter, as in the following plan:

   :- Project [id_dt#0L, tp_bigint#8L]
   :  +- Filter (((cast(id_dt#0L as decimal(24,2)) > CheckOverflow((cast(cast(tp_bigint#8L as decimal(20,0)) as decimal(22,2)) * 2.22), DecimalType(24,2))) && isnotnull(id_dt#0L)) && isnotnull(tp_bigint#8L))

Related SQL:

select A.tp_bigint,B.id_dt from full_data_type_table A join full_data_type_table B on (A.id_dt > B.id_dt * 12.6) where A.tp_bigint = B.id_dt order by A.id_dt

@birdstorm
Contributor

birdstorm commented Nov 15, 2017

tispark:

scala> testsql.explain
== Physical Plan ==
*Project [id_bigint#1L, id_int#26L]
+- *Sort [id_int#0L ASC NULLS FIRST], true, 0
   +- Exchange rangepartitioning(id_int#0L ASC NULLS FIRST, 200)
      +- *Project [id_bigint#1L, id_int#26L, id_int#0L]
         +- *SortMergeJoin [id_bigint#1L], [id_int#26L], Inner, (id_int#0L > (id_int#26L * 2))
            :- *Sort [id_bigint#1L ASC NULLS FIRST], false, 0
            :  +- Exchange hashpartitioning(id_bigint#1L, 200)
            :     +- TiDB CoprocessorRDD{
 Table: a
 Ranges: Start:[-9223372036854775808], End: [9223372036854775807]
 Columns: [id_int], [id_bigint]
 Filter: Not(IsNull([id_int])), Not(IsNull([id_bigint])), ([id_int] > ([id_bigint] Multiply 2))
}
            +- *Sort [id_int#26L ASC NULLS FIRST], false, 0
               +- Exchange hashpartitioning(id_int#26L, 200)
                  +- TiDB CoprocessorRDD{
 Table: a
 Ranges: Start:[-9223372036854775808], End: [9223372036854775807]
 Columns: [id_int]
 Filter: Not(IsNull([id_int]))
}

spark:

scala> testsql.explain
== Physical Plan ==
*Project [id_bigint#1L, id_int#50]
+- *Sort [id_int#0 ASC NULLS FIRST], true, 0
   +- Exchange rangepartitioning(id_int#0 ASC NULLS FIRST, 200)
      +- *Project [id_bigint#1L, id_int#50, id_int#0]
         +- *SortMergeJoin [id_bigint#1L], [cast(id_int#50 as bigint)], Inner, (id_int#0 > (id_int#50 * 2))
            :- *Sort [id_bigint#1L ASC NULLS FIRST], false, 0
            :  +- Exchange hashpartitioning(id_bigint#1L, 200)
            :     +- *Scan JDBCRelation(a) [numPartitions=1] [id_int#0,id_bigint#1L] PushedFilters: [*IsNotNull(id_int), *IsNotNull(id_bigint)], ReadSchema: struct<id_int:int,id_bigint:bigint>
            +- *Sort [cast(id_int#50 as bigint) ASC NULLS FIRST], false, 0
               +- Exchange hashpartitioning(cast(id_int#50 as bigint), 200)
                  +- *Scan JDBCRelation(a) [numPartitions=1] [id_int#50] PushedFilters: [*IsNotNull(id_int)], ReadSchema: struct<id_int:int>

We missed the cast(id_int#50 as bigint) inside SortMergeJoin; the missing piece is the cast, not CheckOverflow(). @Novemser
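For what it's worth, the cast matters because multiplying in the narrower type wraps before any widening happens. A small Java illustration (the column value is made up):

```java
public class CastBeforeMultiply {
    public static void main(String[] args) {
        int id = 1_500_000_000; // a made-up INT column value near INT's max

        // Multiplying first: the product is computed in 32 bits, wraps,
        // and only then is widened to long -- the damage is already done.
        long wrapped = id * 2;

        // Casting first (what Spark's cast(id_int as bigint) provides):
        // the product is computed in 64 bits and is correct.
        long widened = (long) id * 2;

        System.out.println(wrapped); // a negative, wrapped value
        System.out.println(widened); // 3000000000
    }
}
```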

@ilovesoup
Contributor

Pushing it back to Spark might solve the problem. Or promote it to a larger type and push that down. But this implicit conversion is likely not supported by the old TiKV interface. In any case, we need a check before pushing down, and a fallback for predicates that are not valid. We talked this through this afternoon. @birdstorm
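The "promote it to a larger type" idea can be sketched as follows (illustrative names, not tispark API): evaluate the product in an arbitrary-precision type, then check whether the result still fits in a signed 64-bit BIGINT before using it, falling back to Spark otherwise.

```java
import java.math.BigInteger;

// Sketch only: do the multiply in arbitrary precision, then check whether
// the result is still representable as a signed 64-bit BIGINT.
public final class WideningMultiply {
    static BigInteger multiplyWide(long a, long b) {
        return BigInteger.valueOf(a).multiply(BigInteger.valueOf(b));
    }

    // True iff v is representable as a signed 64-bit long.
    // BigInteger.bitLength() excludes the sign bit, so <= 63 covers
    // the full range Long.MIN_VALUE..Long.MAX_VALUE.
    static boolean fitsInBigint(BigInteger v) {
        return v.bitLength() <= 63;
    }

    public static void main(String[] args) {
        BigInteger p = multiplyWide(4355836469450447576L, 16L);
        System.out.println(p + " fits in BIGINT: " + fitsInBigint(p));
    }
}
```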

@ilovesoup ilovesoup added the P1 label Nov 21, 2017
@ilovesoup
Contributor

This needs to be fixed after the DAG interface is in place.

@Novemser
Contributor Author

Novemser commented Dec 1, 2017

Another case:

select A.id_dt,A.tp_bigint,B.id_dt from full_data_type_table A join full_data_type_table B on A.id_dt > B.id_dt * 16 where A.tp_bigint = B.id_dt order by A.id_dt, B.id_dt 

Exception:

Caused by: com.pingcap.tikv.exception.SelectException: unknown error Overflow
	at com.pingcap.tikv.region.RegionStoreClient.coprocessorHelper(RegionStoreClient.java:266)

@Novemser Novemser self-assigned this Dec 8, 2017
@Novemser
Contributor Author

Novemser commented Dec 8, 2017

This issue is caused by a bigint overflow in TiKV's computation stage. To prevent it, we could keep bigint calculations in Spark instead of pushing them down to TiKV.

However, the same issue occurs in TiDB and MySQL:
SQL:

select tp_int from full_data_type_table where tp_bigint * 20 > 0

TiDB:

ERROR 1105 (HY000): other error: unknown error Overflow

MySQL:

ERROR 1690 (22003): BIGINT value is out of range in '(`tispark_test`.`full_data_type_table`.`tp_bigint` * 20)'

It seems that neither of them has a fallback path to handle this scenario.

But Spark with JDBC does not push down filters involving calculations that could overflow.
Like this:

== Physical Plan ==
*Project [tp_int#84]
+- *Filter ((tp_bigint#80L * 20) > 0)
   +- *Scan JDBCRelation(tispark_test.full_data_type_table) [numPartitions=1] [tp_int#84,tp_bigint#80L] PushedFilters: [*IsNotNull(tp_bigint)], ReadSchema: struct<tp_int:int>

So here's the question: should we make our behavior consistent with TiDB/MySQL, or with Spark over JDBC? 🤥
