Integer overflow in some cases #142
Spark plan:

```
:- Project [id_dt#0L, tp_bigint#8L]
:  +- Filter ((isnotnull(id_dt#0L) && (id_dt#0L > (tp_bigint#8L * 16))) && isnotnull(tp_bigint#8L))
```
I think the Spark plan generated here may not be appropriate; a more appropriate one would be:

```
:- Project [id_dt#0L, tp_bigint#8L]
:  +- Filter (((cast(id_dt#0L as decimal(24,2)) > CheckOverflow((cast(cast(tp_bigint#8L as decimal(20,0)) as decimal(22,2)) * 2.22), DecimalType(24,2))) && isnotnull(id_dt#0L)) && isnotnull(tp_bigint#8L))
```

Related SQL:

```sql
select A.tp_bigint, B.id_dt
from full_data_type_table A join full_data_type_table B
  on (A.id_dt > B.id_dt * 12.6)
where A.tp_bigint = B.id_dt
order by A.id_dt
```
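For reference, the decimal promotion can be reproduced with plain Spark, no TiSpark attached. A minimal sketch, assuming a local session and a toy two-column DataFrame standing in for full_data_type_table; `explain(true)` should print the same cast/CheckOverflow shape as the plan above:

```scala
// Minimal repro of Spark's decimal promotion for bigint * 12.6.
// Assumption: a local Spark session; the toy data only stands in for
// full_data_type_table so the analyzer sees the right column types.
import org.apache.spark.sql.SparkSession

object DecimalPromotionRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("decimal-promotion-repro")
      .getOrCreate()
    import spark.implicits._

    // Two bigint (long) columns, mirroring id_dt and tp_bigint.
    Seq((1L, 2L), (100L, 3L))
      .toDF("id_dt", "tp_bigint")
      .createOrReplaceTempView("full_data_type_table")

    // The decimal literal forces promotion: the analyzer casts both sides
    // to decimal and wraps the multiplication in CheckOverflow.
    spark.sql(
      """select A.tp_bigint, B.id_dt
        |from full_data_type_table A join full_data_type_table B
        |  on (A.id_dt > B.id_dt * 12.6)
        |where A.tp_bigint = B.id_dt
        |order by A.id_dt""".stripMargin
    ).explain(true)

    spark.stop()
  }
}
```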
tispark: (result screenshot)

spark: (result screenshot)

we missed …
Pushing it back to Spark might solve the problem. Or promote it to a larger type and push. But this implicit conversion is likely not supported by the old TiKV interface. Either way, we need a check before pushing down, and a fallback when predicates are not valid; see the sketch below. We talked it through this afternoon. @birdstorm
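A minimal sketch of that check, using Catalyst expression types. `PushdownValidator` is a hypothetical helper, not the actual TiSpark code, and the real capability list would have to come from the TiKV interface:

```scala
// Sketch of "check before push": refuse to push any predicate that involves
// implicit widening (e.g. bigint promoted to decimal) or 64-bit arithmetic
// that could overflow in the coprocessor; such filters stay in Spark.
import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.types._

object PushdownValidator {
  /** True if every node in the predicate is something the coprocessor can
    * evaluate without implicit widening; otherwise keep it in Spark. */
  def canPushDown(pred: Expression): Boolean = pred match {
    case _: AttributeReference | _: Literal => true
    case _: Cast => false // implicit widening (e.g. bigint -> decimal): keep in Spark
    case b: BinaryArithmetic =>
      // Arithmetic producing a 64-bit integer can overflow in TiKV; be conservative.
      b.dataType != LongType && b.children.forall(canPushDown)
    case b: BinaryComparison => b.children.forall(canPushDown)
    case And(l, r) => canPushDown(l) && canPushDown(r)
    case Or(l, r) => canPushDown(l) && canPushDown(r)
    case Not(child) => canPushDown(child)
    case IsNotNull(child) => canPushDown(child)
    case _ => false
  }

  /** Split predicates into (pushed, residual); the residual is evaluated by Spark. */
  def split(preds: Seq[Expression]): (Seq[Expression], Seq[Expression]) =
    preds.partition(canPushDown)
}
```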
Need to fix after the DAG interface.
Another case:

```sql
select A.id_dt, A.tp_bigint, B.id_dt
from full_data_type_table A join full_data_type_table B
  on A.id_dt > B.id_dt * 16
where A.tp_bigint = B.id_dt
order by A.id_dt, B.id_dt
```

Exception:

```
Caused by: com.pingcap.tikv.exception.SelectException: unknown error Overflow
	at com.pingcap.tikv.region.RegionStoreClient.coprocessorHelper(RegionStoreClient.java:266)
```
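For intuition, the failure is ordinary 64-bit overflow: `id_dt * 16` can exceed the long range inside the coprocessor computation. A standalone illustration (plain Scala, not TiKV code; `Math.multiplyExact` just surfaces the overflow instead of silently wrapping):

```scala
// Shows when bigint * 16 exceeds the 64-bit range that the coprocessor
// computes in. Math.multiplyExact throws ArithmeticException on overflow,
// roughly the condition TiKV reports as "unknown error Overflow".
object OverflowDemo extends App {
  val values = Seq(1L, Long.MaxValue / 16, Long.MaxValue / 16 + 1)
  for (v <- values) {
    try {
      println(s"$v * 16 = ${Math.multiplyExact(v, 16L)}")
    } catch {
      case _: ArithmeticException => println(s"$v * 16 overflows a 64-bit long")
    }
  }
}
```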
This issue is caused by a bigint overflow in the TiKV computation stage. To prevent it, we could keep bigint calculations in Spark and not push them down to TiKV. However, the same issue occurs in TiDB and MySQL:

```sql
select tp_int from full_data_type_table where tp_bigint * 20 > 0
```

TiDB: (error screenshot)

MySQL: (error screenshot)
It seems that neither of them has a fallback path for this scenario. But Spark over JDBC will not push down a filter that involves a potentially overflowing calculation:

```
== Physical Plan ==
*Project [tp_int#84]
+- *Filter ((tp_bigint#80L * 20) > 0)
   +- *Scan JDBCRelation(tispark_test.full_data_type_table) [numPartitions=1] [tp_int#84,tp_bigint#80L] PushedFilters: [*IsNotNull(tp_bigint)], ReadSchema: struct<tp_int:int>
```

So here's the question: should we make our behavior consistent with TiDB/MySQL, or with Spark over JDBC? 🤥
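One reason Spark-over-JDBC behaves this way: pushdown goes through the `org.apache.spark.sql.sources.Filter` algebra, which can only pair a bare column with a literal, so a filter containing arithmetic is never translated and stays in Spark. A sketch of what is and isn't representable (column name taken from the plan above):

```scala
import org.apache.spark.sql.sources._

object JdbcPushdownShape {
  // Representable, so it is pushed to the data source (see PushedFilters above).
  val pushable: Filter = IsNotNull("tp_bigint")

  // A plain column-vs-literal comparison would also be representable.
  val alsoPushable: Filter = GreaterThan("tp_bigint", 0L)

  // Not representable: GreaterThan takes (attribute, value), so there is no
  // way to encode `tp_bigint * 20` on the left-hand side. Spark therefore
  // evaluates the comparison itself after scanning.
  val keptInSpark = "tp_bigint * 20 > 0"
}
```

That structural limit, rather than an explicit overflow check, is what keeps the risky arithmetic on the Spark side.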
SQL: …

Throws: …

Seems there's an overflow issue here. Note that if we remove `* 16` in the SQL, the above exception won't be thrown.