Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TiKV panicked during query in a transaction #20027

Closed
zyguan opened this issue Sep 16, 2020 · 10 comments
Closed

TiKV panicked during query in a transaction #20027

zyguan opened this issue Sep 16, 2020 · 10 comments
Assignees
Labels
severity/major sig/execution SIG execution type/bug The issue is confirmed as a bug.

Comments

@zyguan
Copy link
Contributor

zyguan commented Sep 16, 2020

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

drop table if exists t1, t2;
-- init
create table t1 (c_int int, c_str varchar(40), c_datetime datetime, c_timestamp timestamp, primary key (c_int, c_datetime)  , key(c_str), key(c_timestamp)) partition by range (to_days(c_datetime)) ( partition p0 values less than (to_days('2020-02-01')), partition p1 values less than (to_days('2020-04-01')), partition p2 values less than (to_days('2020-06-01')), partition p3 values less than maxvalue) ;
create table t2 like t1 ;
insert into t1 values (1, 'elastic borg', '2020-04-11 01:01:02', '2020-01-13 17:17:47'), (2, 'hopeful swanson', '2020-06-01 21:48:06', '2020-01-14 11:55:42'), (3, 'fervent nightingale', '2020-01-25 05:03:12', '2020-01-15 20:24:00');
insert into t2 values (1, 'agitated chandrasekhar', '2020-03-26 00:38:16', '2020-06-13 07:44:39'), (2, 'optimistic mclaren', '2020-01-17 11:06:35', '2020-03-27 12:53:29'), (3, 'clever nobel', '2020-03-28 10:06:23', '2020-03-26 18:29:45');
-- txn
begin;
delete from t2 where c_int = 2;
select * from t1 where c_timestamp in (select c_timestamp from t2 where t1.c_int = t2.c_int or t1.c_int = t2.c_int and t1.c_str <= t2.c_str); -- ERROR 9002 (HY000): TiKV server timeout
commit;

2. What did you expect to see? (Required)

No error reported.

3. What did you see instead (Required)

mysql> begin;
Query OK, 0 rows affected (0.05 sec)

mysql> delete from t2 where c_int = 2;
Query OK, 1 row affected (0.06 sec)

mysql> select * from t1 where c_timestamp in (select c_timestamp from t2 where t1.c_int = t2.c_int or t1.c_int = t2.c_int and t1.c_str <= t2.c_str) ;
ERROR 9002 (HY000): TiKV server timeout

tikv panicked with

[2020/09/15 12:27:04.318 +00:00] [FATAL] [lib.rs:483] ["Cannot call `as_date_time_slice` over a Int column"] [backtrace="stack backtrace:\n   0: tikv_util::set_panic_hook::{{closure}}\n             at components/tikv_util/src/lib.rs:482\n   1: std::panicking::rust_panic_with_hook\n             at src/libstd/panicking.rs:475\n   2: rust_begin_unwind\n             at src/libstd/panicking.rs:375\n   3: std::panicking::begin_panic_fmt\n             at src/libstd/panicking.rs:326\n   4: <tidb_query::rpn_expr::types::function::ArgConstructor<A,E> as tidb_query::rpn_expr::types::function::Evaluator>::eval\n             at components/tidb_query/src/codec/data_type/vector.rs:0\n      tidb_query::rpn_expr::impl_op::is_null_fn_meta::run\n             at components/tidb_query/src/rpn_expr/impl_op.rs:104\n   5: tidb_query::rpn_expr::types::expr_eval::<impl tidb_query::rpn_expr::types::expr::RpnExpression>::eval_decoded\n             at components/tidb_query/src/rpn_expr/types/expr_eval.rs:257\n   6: tidb_query::batch::executors::selection_executor::BatchSelectionExecutor<Src>::handle_src_result\n             at home/jenkins/agent/workspace/ld_tikv_multi_branch_release-4.0/tikv/components/tidb_query/src/batch/executors/selection_executor.rs:82\n      <tidb_query::batch::executors::selection_executor::BatchSelectionExecutor<Src> as tidb_query::batch::interface::BatchExecutor>::next_batch\n             at home/jenkins/agent/workspace/ld_tikv_multi_branch_release-4.0/tikv/components/tidb_query/src/batch/executors/selection_executor.rs:175\n      <tidb_query::execute_stats::WithSummaryCollector<C,T> as tidb_query::batch::interface::BatchExecutor>::next_batch\n             at home/jenkins/agent/workspace/ld_tikv_multi_branch_release-4.0/tikv/components/tidb_query/src/batch/interface.rs:106\n   7: <alloc::boxed::Box<T> as tidb_query::batch::interface::BatchExecutor>::next_batch\n             at home/jenkins/agent/workspace/ld_tikv_multi_branch_release-4.0/tikv/components/tidb_query/src/batch/interface.rs:75\n      tidb_query::batch::runner::BatchExecutorsRunner<SS>::handle_request::{{closure}}\n             at home/jenkins/agent/workspace/ld_tikv_multi_branch_release-4.0/tikv/components/tidb_query/src/batch/runner.rs:362\n      <std::future::GenFuture<T> as core::future::future::Future>::poll\n             at rustc/0de96d37fbcc54978458c18f5067cd9817669bc8/src/libstd/future.rs:43\n      std::future::poll_with_tls_context\n             at rustc/0de96d37fbcc54978458c18f5067cd9817669bc8/src/libstd/future.rs:99\n      <tikv::coprocessor::dag::BatchDAGHandler as tikv::coprocessor::RequestHandler>::handle_request::__handle_request::{{closure}}\n             at src/coprocessor/dag/mod.rs:171\n      <std::future::GenFuture<T> as core::future::future::Future>::poll\n             at rustc/0de96d37fbcc54978458c18f5067cd9817669bc8/src/libstd/future.rs:43\n   8: <core::pin::Pin<P> as core::future::future::Future>::poll\n             at rustc/0de96d37fbcc54978458c18f5067cd9817669bc8/src/libcore/future/future.rs:119\n      <tikv::coprocessor::interceptors::tracker::Tracker<F> as core::future::future::Future>::poll\n             at home/jenkins/agent/workspace/ld_tikv_multi_branch_release-4.0/tikv/src/coprocessor/interceptors/tracker.rs:45\n      <tikv::coprocessor::interceptors::concurrency_limiter::ConcurrencyLimiter<PF,F> as core::future::future::Future>::poll\n             at home/jenkins/agent/workspace/ld_tikv_multi_branch_release-4.0/tikv/src/coprocessor/interceptors/concurrency_limiter.rs:101\n      std::future::poll_with_tls_context\n             at rustc/0de96d37fbcc54978458c18f5067cd9817669bc8/src/libstd/future.rs:99\n      tikv::coprocessor::endpoint::Endpoint<E>::handle_unary_request_impl::{{closure}}\n             at home/jenkins/agent/workspace/ld_tikv_multi_branch_release-4.0/tikv/src/coprocessor/endpoint.rs:341\n      <std::future::GenFuture<T> as core::future::future::Future>::poll\n             at rustc/0de96d37fbcc54978458c18f5067cd9817669bc8/src/libstd/future.rs:43\n      std::future::poll_with_tls_context\n             at rustc/0de96d37fbcc54978458c18f5067cd9817669bc8/src/libstd/future.rs:99\n      tikv::read_pool::ReadPoolHandle::spawn_handle::{{closure}}\n             at home/jenkins/agent/workspace/ld_tikv_multi_branch_release-4.0/tikv/src/read_pool.rs:139\n      <std::future::GenFuture<T> as core::future::future::Future>::poll\n             at rustc/0de96d37fbcc54978458c18f5067cd9817669bc8/src/libstd/future.rs:43\n      std::future::poll_with_tls_context\n             at rustc/0de96d37fbcc54978458c18f5067cd9817669bc8/src/libstd/future.rs:99\n   9: tikv::read_pool::ReadPoolHandle::spawn::{{closure}}\n             at home/jenkins/agent/workspace/ld_tikv_multi_branch_release-4.0/tikv/src/read_pool.rs:115\n      <std::future::GenFuture<T> as core::future::future::Future>::poll\n             at rustc/0de96d37fbcc54978458c18f5067cd9817669bc8/src/libstd/future.rs:43\n  10: <yatp::task::future::Runner as yatp::pool::runner::Runner>::handle\n             at rust/git/checkouts/yatp-e704b73c3ee279b6/3894a86/src/task/future.rs:254\n  11: <tikv::read_pool::ReadPoolRunner<E,R> as yatp::pool::runner::Runner>::handle\n             at home/jenkins/agent/workspace/ld_tikv_multi_branch_release-4.0/tikv/src/read_pool.rs:168\n      <yatp::queue::multilevel::MultilevelRunner<R> as yatp::pool::runner::Runner>::handle\n             at rust/git/checkouts/yatp-e704b73c3ee279b6/3894a86/src/queue/multilevel.rs:245\n      yatp::pool::worker::WorkerThread<T,R>::run\n             at rust/git/checkouts/yatp-e704b73c3ee279b6/3894a86/src/pool/worker.rs:48\n      yatp::pool::builder::LazyBuilder<T>::build::{{closure}}\n             at rust/git/checkouts/yatp-e704b73c3ee279b6/3894a86/src/pool/builder.rs:91\n      std::sys_common::backtrace::__rust_begin_short_backtrace\n             at rustc/0de96d37fbcc54978458c18f5067cd9817669bc8/src/libstd/sys_common/backtrace.rs:136\n  12: std::thread::Builder::spawn_unchecked::{{closure}}::{{closure}}\n             at rustc/0de96d37fbcc54978458c18f5067cd9817669bc8/src/libstd/thread/mod.rs:469\n      <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once ...

4. What is your TiDB version? (Required)

release-4.0

+------+------------------------------------------+
| type | git_hash                                 |
+------+------------------------------------------+
| tidb | 51d365fc45fdfc039eb204a96268c5bd1c55075f |
| tikv | fff05823ed90d50e129e88373db2ca5209d5c650 |
| pd   | aeaac2489d52ba5c591eeb6418df408193bfd9db |
+------+------------------------------------------+
@zyguan zyguan added the type/bug The issue is confirmed as a bug. label Sep 16, 2020
@ZenoTan
Copy link

ZenoTan commented Sep 16, 2020

/label sig/ddl

@ti-srebot ti-srebot added the sig/sql-infra SIG: SQL Infra label Sep 16, 2020
@ichn-hu
Copy link
Contributor

ichn-hu commented Sep 17, 2020

/assign @ichn-hu

@wjhuang2016 wjhuang2016 added sig/transaction SIG:Transaction and removed sig/sql-infra SIG: SQL Infra labels Sep 18, 2020
@iosmanthus
Copy link
Contributor

Seems the information push down from TiDB is wrong:

[components/tidb_query/src/batch/runner.rs:132] executor_descriptors = [
    tbl_scan { table_id: 55 columns { column_id: 1 tp: 3 collation: 63 column_len: 11 flag: 3 } columns { column_id: -1 tp: 8 column_len: 20 flag: 2 pk_handle: true } columns { column_id: -1 tp: 8 column_len: 20 flag: 2 pk_handle: true } },
    tp: TypeSelection selection { conditions { tp: ScalarFunc children { tp: ScalarFunc children { tp: ColumnRef val: 8000000000000001 field_type { tp: 7 flag: 136 flen: 19 collate: 63 charset: "binary" } } sig: TimeIsNull field_type { tp: 8 flag: 128 flen: 1 collate: 63 charset: "binary" } } sig: UnaryNotInt field_type { tp: 8 flag: 128 flen: 1 collate: 63 charset: "binary" } } },

The table scan never fetches a column which type id is 7 standing for a TimeStamp column.

@ichn-hu
Copy link
Contributor

ichn-hu commented Nov 12, 2020

/unassign @ichn-hu

@youjiali1995 youjiali1995 added sig/execution SIG execution and removed sig/transaction SIG:Transaction labels Nov 12, 2020
@ichn-hu
Copy link
Contributor

ichn-hu commented Nov 13, 2020

@zyguan @iosmanthus I tried to deploy a nightly cluster, but can't reproduce it, @iosmanthus would you please have a try again to see if you can reproduce it now?

mysql> drop table if exists t1, t2;
Query OK, 0 rows affected (0.38 sec)

mysql> -- init
mysql> create table t1 (c_int int, c_str varchar(40), c_datetime datetime, c_timestamp timestamp, primary key (c_int, c_datetime)  , key(c_str), key(c_timestamp)) partition by range (to_days(c_datetime)) ( partition p0 values less than (to_days('2020-02-01')), partition p1 values less than (to_days('2020-04-01')), partition p2 values less than (to_days('2020-06-01')), partition p3 values less than maxvalue) ;
Query OK, 0 rows affected (0.07 sec)

mysql> create table t2 like t1 ;
Query OK, 0 rows affected (0.07 sec)

mysql> insert into t1 values (1, 'elastic borg', '2020-04-11 01:01:02', '2020-01-13 17:17:47'), (2, 'hopeful swanson', '2020-06-01 21:48:06', '2020-01-14 11:55:42'), (3, 'fervent nightingale', '2020-01-25 05:03:12', '2020-01-15 20:24:00');
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> insert into t2 values (1, 'agitated chandrasekhar', '2020-03-26 00:38:16', '2020-06-13 07:44:39'), (2, 'optimistic mclaren', '2020-01-17 11:06:35', '2020-03-27 12:53:29'), (3, 'clever nobel', '2020-03-28 10:06:23', '2020-03-26 18:29:45');
Query OK, 3 rows affected (0.01 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> begin;
Query OK, 0 rows affected (0.00 sec)

mysql> delete from t2 where c_int = 2;
Query OK, 1 row affected (0.01 sec)

mysql> select * from t1 where c_timestamp in (select c_timestamp from t2 where t1.c_int = t2.c_int or t1.c_int = t2.c_int and t1.c_str <= t2.c_str); -- ERROR 9002 (HY000): TiKV server timeout
Empty set (0.00 sec)

mysql> commit;
Query OK, 0 rows affected (0.00 sec)

mysql> select tidb_version();
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tidb_version()                                                                                                                                                                                                                                                                                                            |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Release Version: v4.0.0-beta.2-1552-ga81c89f89
Edition: Community
Git Commit Hash: a81c89f89b6fbc81c23f1bd341b3bc6c0cb4475f
Git Branch: master
UTC Build Time: 2020-11-12 11:11:50
GoVersion: go1.13
Race Enabled: false
TiKV Min Version: v3.0.0-60965b006877ca7234adaced7890d7b029ed1306
Check Table Before Drop: false |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

@ichn-hu
Copy link
Contributor

ichn-hu commented Nov 13, 2020

I can reproduce it with tidb 51d365fc45fdfc039eb204a96268c5bd1c55075f, I'll bisect the history to find out which PR fixed it.

@ichn-hu
Copy link
Contributor

ichn-hu commented Nov 13, 2020

/assign @ichn-hu

@ichn-hu
Copy link
Contributor

ichn-hu commented Nov 23, 2020

This issue is fixed by a0515bc on release-4.0, which is brought about by #20565 , it is now safe to close this issue.

PTAL @lzmhhh123

@ichn-hu
Copy link
Contributor

ichn-hu commented Nov 23, 2020

This fix is found by git bisect, the script is attached bellow

#!/bin/sh
cd /home/ichn/Projects/pingcap/tiup
yes | tiup cluster deploy tidb-timeout nightly ./mini.yaml --user ichn
cd /home/ichn/Projects/pingcap/tidb
make server
sudo cp ./bin/tidb-server /home/tidb/tmp/mini-deploy/tidb-4101/bin
tiup cluster start tidb-timeout
while true; do
    mysql -h 0.0.0.0 -u root -P 4101 -D test -e "select tidb_version()"
    if [[ $? -eq 0 ]]; then
        break
    fi
    sleep 1
done
mysql -h 0.0.0.0 -u root -P 4101 -D test -e "use test; drop table if exists t1, t2; create table t1 (c_int int, c_str varchar(40), c_datetime datetime, c_timestamp timestamp, primary key (c_int, c_datetime)  , key(c_str), key(c_timestamp)) partition by range (to_days(c_datetime)) ( partition p0 values less than (to_days('2020-02-01')), partition p1 values less than (to_days('2020-04-01')), partition p2 values less than (to_days('2020-06-01')), partition p3 values less than maxvalue) ; create table t2 like t1 ; insert into t1 values (1, 'elastic borg', '2020-04-11 01:01:02', '2020-01-13 17:17:47'), (2, 'hopeful swanson', '2020-06-01 21:48:06', '2020-01-14 11:55:42'), (3, 'fervent nightingale', '2020-01-25 05:03:12', '2020-01-15 20:24:00'); insert into t2 values (1, 'agitated chandrasekhar', '2020-03-26 00:38:16', '2020-06-13 07:44:39'), (2, 'optimistic mclaren', '2020-01-17 11:06:35', '2020-03-27 12:53:29'), (3, 'clever nobel', '2020-03-28 10:06:23', '2020-03-26 18:29:45');"
mysql -h 0.0.0.0 -u root -P 4101 -D test -e "use test; begin; delete from t2 where c_int = 2; select * from t1 where c_timestamp in (select c_timestamp from t2 where t1.c_int = t2.c_int or t1.c_int = t2.c_int and t1.c_str <= t2.c_str); commit;"
res=$?
if [[ $res -eq 0 ]]; then
        result=1
    echo "query okay, going backward"
else
        result=0
    echo "query bad, going forward"
fi
yes | tiup cluster destroy tidb-timeout
exit $result
╰─$ cat mini.yaml
# # Global variables are applied to all deployments and used as the default value of
# # the deployments if a specific deployment value is missing.
global:
  user: "tidb"
  ssh_port: 22
  deploy_dir: "/home/tidb/tmp/mini-deploy"
  data_dir: "/home/tidb/tmp/mini-data"

pd_servers:
  - host: 127.0.0.1
    peer_port: 13380
    client_port: 13379

tidb_servers:
  - host: 127.0.0.1
    port: 4101
    status_port: 11081

tikv_servers:
  - host: 127.0.0.1
    port: 21161

# monitoring_servers:
#   - host: 10.0.1.10
#
# grafana_servers:
#   - host: 10.0.1.10
#
# alertmanager_servers:
#   - host: 10.0.1.10
╭─ichn@menheraArch ~/Projects/pingcap/tidb ‹release-4.0›
╰─$ git bisect start                                                                                                                                                                                                                                      1 ↵
╭─ichn@menheraArch ~/Projects/pingcap/tidb ‹release-4.0›
╰─$ git bisect good 51d365fc45fdfc039eb204a96268c5bd1c55075f
╭─ichn@menheraArch ~/Projects/pingcap/tidb ‹release-4.0›
╰─$ git bisect bad release-4.0
Bisecting: 86 revisions left to test after this (roughly 7 steps)
[639d4596bc32571376390095bfd6556d7f61866c] 2pc: fix backoffer and cancel in batchExecutor (#20542) (#20548)
╭─ichn@menheraArch ~/Projects/pingcap/tidb ‹639d4596b›
╰─$ git bisect run ../bisect-20027.sh

@ti-srebot
Copy link
Contributor

Please edit this comment or add a new comment to complete the following information

Not a bug

  1. Remove the 'type/bug' label
  2. Add notes to indicate why it is not a bug

Duplicate bug

  1. Add the 'type/duplicate' label
  2. Add the link to the original bug

Bug

Note: Make Sure that 'component', and 'severity' labels are added
Example for how to fill out the template: #20100

1. Root Cause Analysis (RCA) (optional)

2. Symptom (optional)

3. All Trigger Conditions (optional)

4. Workaround (optional)

5. Affected versions

6. Fixed versions

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
severity/major sig/execution SIG execution type/bug The issue is confirmed as a bug.
Projects
None yet
Development

No branches or pull requests

9 participants