TiDB panicked with index out of range [-1] #30382

Closed
zyguan opened this issue Dec 3, 2021 · 11 comments
Assignees
Labels
affects-5.2 This bug affects 5.2.x versions. affects-5.3 This bug affects 5.3.x versions. affects-5.4 This bug affects the 5.4.x(LTS) versions. component/tablepartition This issue is related to Table Partition of TiDB. severity/major sig/planner SIG: Planner type/bug The issue is confirmed as a bug.

Comments

@zyguan
Contributor

zyguan commented Dec 3, 2021

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

/* t */ set @@session.tidb_enable_list_partition = ON;
/* t */ drop table if exists t1, t2;
/* t */ create table t1  (c_int int, c_str varchar(40), c_decimal decimal(12, 6), primary key (c_int) , key(c_str(2)) , key(c_decimal) ) partition by list (c_int) ( partition p0 values IN (1, 5, 9, 13, 17, 21, 25, 29, 33, 37), partition p1 values IN (2, 6, 10, 14, 18, 22, 26, 30, 34, 38), partition p2 values IN (3, 7, 11, 15, 19, 23, 27, 31, 35, 39), partition p3 values IN (4, 8, 12, 16, 20, 24, 28, 32, 36, 40)) ;
/* t */ create table t2  (c_int int, c_str varchar(40), c_decimal decimal(12, 6), primary key (c_int) , key(c_str) , key(c_decimal) ) partition by hash (c_int) partitions 4 ;
/* t */ insert into t1 values (6, 'musing mayer', 1.280), (7, 'wizardly heisenberg', 6.589), (8, 'optimistic swirles', 9.633), (9, 'hungry haslett', 2.659), (10, 'stupefied wiles', 2.336);
/* t */ insert into t2 select * from t1 ;
/* t */ begin;
/* t */ select * from t1 where c_str <> any (select c_str from t2 where c_decimal < 5) for update;
/* t */ commit;

Related issues: #25812, #28141, #26380.

2. What did you expect to see? (Required)

All statements are executed without error.

3. What did you see instead (Required)

/* t */ select * from t1 where c_str <> any (select c_str from t2 where c_decimal < 5) for update;
-- t >> E1105: runtime error: index out of range [-1]
goroutine 126207 [running]:
github.com/pingcap/tidb/server.(*clientConn).Run.func1(0x42fd910, 0xc0006e6840, 0xc0022b6500)
        /home/jenkins/agent/workspace/build_tidb_multi_branch_master/go/src/github.com/pingcap/tidb/server/conn.go:1017 +0xf5
panic(0x3b7dcc0, 0xc0019ae930)
        /usr/local/go/src/runtime/panic.go:965 +0x1b9
github.com/pingcap/tidb/executor.(*ExecStmt).Exec.func1(0xc00155b1e0, 0xc002464a08, 0xc0024649e8)
        /home/jenkins/agent/workspace/build_tidb_multi_branch_master/go/src/github.com/pingcap/tidb/executor/adapter.go:343 +0x4d4
panic(0x3b7dcc0, 0xc0019ae930)
        /usr/local/go/src/runtime/panic.go:965 +0x1b9
github.com/pingcap/tidb/util/chunk.Row.GetInt64(...)
        /home/jenkins/agent/workspace/build_tidb_multi_branch_master/go/src/github.com/pingcap/tidb/util/chunk/row.go:53
github.com/pingcap/tidb/executor.(*SelectLockExec).Next(0xc002122e70, 0x42fd910, 0xc00250b380, 0xc00218f630, 0x28, 0x4)
        /home/jenkins/agent/workspace/build_tidb_multi_branch_master/go/src/github.com/pingcap/tidb/executor/executor.go:933 +0x85e
github.com/pingcap/tidb/executor.Next(0x42fd910, 0xc00250b380, 0x4303220, 0xc002122e70, 0xc00218f630, 0x0, 0x0)
        /home/jenkins/agent/workspace/build_tidb_multi_branch_master/go/src/github.com/pingcap/tidb/executor/executor.go:286 +0x2de
github.com/pingcap/tidb/executor.(*ExecStmt).runPessimisticSelectForUpdate(0xc00155b1e0, 0x42fd910, 0xc00250b380, 0x4303220, 0xc002122e70, 0x0, 0x0, 0x0, 0x0)
        /home/jenkins/agent/workspace/build_tidb_multi_branch_master/go/src/github.com/pingcap/tidb/executor/adapter.go:548 +0x285
github.com/pingcap/tidb/executor.(*ExecStmt).handlePessimisticSelectForUpdate(0xc00155b1e0, 0x42fd910, 0xc00250b380, 0x4303220, 0xc002122e70, 0x61a40a0, 0x42fd903, 0x0, 0xc002464938)
        /home/jenkins/agent/workspace/build_tidb_multi_branch_master/go/src/github.com/pingcap/tidb/executor/adapter.go:529 +0x5d
github.com/pingcap/tidb/executor.(*ExecStmt).Exec(0xc00155b1e0, 0x42fd910, 0xc00250b380, 0x0, 0x0, 0x0, 0x0)
        /home/jenkins/agent/workspace/build_tidb_multi_branch_master/go/src/github.com/pingcap/tidb/executor/adapter.go:416 +0xb49
github.com/pingcap/tidb/session.runStmt(0x42fd910, 0xc0023ebb60, 0xc0011a9e00, 0x43144e0, 0xc00155b1e0, 0x0, 0x0, 0x0, 0x0)
        /home/jenkins/agent/workspace/build_tidb_multi_branch_master/go/src/github.com/pingcap/tidb/session/session.go:1698 +0x37f
github.com/pingcap/tidb/session.(*session).ExecuteStmt(0xc0011a9e00, 0x42fd910, 0xc0023ebb60, 0x431c338, 0xc00152c5a0, 0x0, 0x0, 0x0, 0x0)
        /home/jenkins/agent/workspace/build_tidb_multi_branch_master/go/src/github.com/pingcap/tidb/session/session.go:1582 +0xab1
github.com/pingcap/tidb/server.(*TiDBContext).ExecuteStmt(0xc0006e72f0, 0x42fd910, 0xc0023ebb60, 0x431c338, 0xc00152c5a0, 0xc001341340, 0x42fd910, 0xc0023ebb60, 0x0)
        /home/jenkins/agent/workspace/build_tidb_multi_branch_master/go/src/github.com/pingcap/tidb/server/driver_tidb.go:219 +0x6b
github.com/pingcap/tidb/server.(*clientConn).handleStmt(0xc0022b6500, 0x42fd868, 0xc0023ebb60, 0x431c338, 0xc00152c5a0, 0x61d7188, 0x0, 0x0, 0x1, 0x0, ...)
        /home/jenkins/agent/workspace/build_tidb_multi_branch_master/go/src/github.com/pingcap/tidb/server/conn.go:1950 +0x1d1
github.com/pingcap/tidb/server.(*clientConn).handleQuery(0xc0022b6500, 0x42fd868, 0xc0011cae80, 0xc0018dca81, 0x62, 0x0, 0x0)
        /home/jenkins/agent/workspace/build_tidb_multi_branch_master/go/src/github.com/pingcap/tidb/server/conn.go:1819 +0x498
github.com/pingcap/tidb/server.(*clientConn).dispatch(0xc0022b6500, 0x42fd868, 0xc0011cae80, 0xc0018dca80, 0x63, 0x62, 0x0, 0x0)
        /home/jenkins/agent/workspace/build_tidb_multi_branch_master/go/src/github.com/pingcap/tidb/server/conn.go:1324 +0xafd
github.com/pingcap/tidb/server.(*clientConn).Run(0xc0022b6500, 0x42fd910, 0xc0006e6840)
        /home/jenkins/agent/workspace/build_tidb_multi_branch_master/go/src/github.com/pingcap/tidb/server/conn.go:1079 +0x2bc
github.com/pingcap/tidb/server.(*Server).onConn(0xc0015ca9c0, 0xc0022b6500)
        /home/jenkins/agent/workspace/build_tidb_multi_branch_master/go/src/github.com/pingcap/tidb/server/server.go:548 +0xa93
created by github.com/pingca...

4. What is your TiDB version? (Required)

master (a046014)

@zyguan zyguan added type/bug The issue is confirmed as a bug. sig/execution SIG execution severity/major labels Dec 3, 2021
@zyguan
Contributor Author

zyguan commented Dec 3, 2021

cc @tiancaiamao @Yisaer

@tiancaiamao
Contributor

I found that the extra PID column is missing from the final schema, and that is the direct cause of the panic.
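To illustrate why a missing column produces exactly this panic message, here is a minimal sketch. It is not TiDB code: `schema`, `offsetOf`, `readColumn`, and the value of `extraPidColID` are hypothetical stand-ins. It assumes the lock executor looks up the PID column's offset in the schema and then reads the row at that offset, so a failed lookup (-1) becomes a slice index.

```go
package main

import "fmt"

// extraPidColID is a hypothetical ID for the hidden partition-ID column.
const extraPidColID int64 = -2

// schema is a simplified stand-in for a plan schema: an ordered list of column IDs.
type schema []int64

// offsetOf returns the position of id in s, or -1 when the column is absent.
func (s schema) offsetOf(id int64) int {
	for i, c := range s {
		if c == id {
			return i
		}
	}
	return -1
}

// readColumn reads row[off] and converts a runtime panic into an error,
// so the failure can be inspected instead of crashing the program.
func readColumn(row []int64, off int) (v int64, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("%v", r)
		}
	}()
	return row[off], nil
}

func main() {
	final := schema{1, 2, 3} // the PID column was pruned away
	row := []int64{10, 20, 30}

	off := final.offsetOf(extraPidColID)
	_, err := readColumn(row, off)
	fmt.Println("offset:", off, "error:", err)
}
```

In this toy model the lookup fails silently and the error only surfaces at the read, which matches the stack trace ending in `chunk.Row.GetInt64`.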

Then I printed out the plan and found that the extra PID column had been eliminated.
After logical optimization, the plan is:

Join{
    PartitionUnionAll{Partition(71)->Partition(72)->Partition(73)->Partition(74)}->
    PartitionUnionAll{Partition(77)->Partition(78)->Partition(79)->Partition(80)}->
        Aggr(max(test.t2.c_str),count(distinct test.t2.c_str),sum(isnull(test.t2.c_str)),count(1))->
            Sel([ne(Column#10, 0)])}->
                Projection->Lock->Projection

After physical optimization, the plan is:

LeftHashJoin{
    PartitionUnionAll{
    TableReader(Table(t1)->Sel([if(isnull(test.t1.c_str), <nil>, 1)]))->
    TableReader(Table(t1)->Sel([if(isnull(test.t1.c_str), <nil>, 1)]))->
    TableReader(Table(t1)->Sel([if(isnull(test.t1.c_str), <nil>, 1)]))->
    TableReader(Table(t1)->Sel([if(isnull(test.t1.c_str), <nil>, 1)]))}->
    PartitionUnionAll{TableReader(Table(t2)->Sel([lt(test.t2.c_decimal, 5)]))->
    TableReader(Table(t2)->Sel([lt(test.t2.c_decimal, 5)]))->
    TableReader(Table(t2)->Sel([lt(test.t2.c_decimal, 5)]))->
    TableReader(Table(t2)->Sel([lt(test.t2.c_decimal, 5)]))}->
        StreamAgg->
            Sel([ne(Column#10, 0)])}->
                Projection->Lock->Projection

After post-optimization, the plan is:

LeftHashJoin{
    PartitionUnionAll{
    TableReader(Table(t1)->Sel([if(isnull(test.t1.c_str), <nil>, 1)]))->
    TableReader(Table(t1)->Sel([if(isnull(test.t1.c_str), <nil>, 1)]))->
    TableReader(Table(t1)->Sel([if(isnull(test.t1.c_str), <nil>, 1)]))->
    TableReader(Table(t1)->Sel([if(isnull(test.t1.c_str), <nil>, 1)]))}->
    PartitionUnionAll{
    TableReader(Table(t2)->Sel([lt(test.t2.c_decimal, 5)]))->
    TableReader(Table(t2)->Sel([lt(test.t2.c_decimal, 5)]))->
    TableReader(Table(t2)->Sel([lt(test.t2.c_decimal, 5)]))->
    TableReader(Table(t2)->Sel([lt(test.t2.c_decimal, 5)]))}->
        Projection->
            StreamAgg->
                Sel([ne(Column#10, 0)])}->Lock

As you can see, after the post-optimize step, the final projection is eliminated, which is not expected.

@XuHuaiyu XuHuaiyu added sig/planner SIG: Planner and removed sig/execution SIG execution labels Dec 7, 2021
@tiancaiamao
Contributor

It's caused by this code:

func (p *LogicalProjection) PruneColumns(parentUsedCols []*expression.Column) error {
	child := p.children[0]
	used := expression.GetUsedList(parentUsedCols, p.schema)
	for i := len(used) - 1; i >= 0; i-- {
		if !used[i] && !exprHasSetVarOrSleep(p.Exprs[i]) {
			p.schema.Columns = append(p.schema.Columns[:i], p.schema.Columns[i+1:]...)
			p.Exprs = append(p.Exprs[:i], p.Exprs[i+1:]...)
		}
	}
	selfUsedCols := make([]*expression.Column, 0, len(p.Exprs))
	selfUsedCols = expression.ExtractColumnsFromExpressions(selfUsedCols, p.Exprs, nil)
	return child.PruneColumns(selfUsedCols)
}

The plan is DataSource -> ... -> Projection(2) -> Lock -> Projection(1).
Only the schemas of Lock and DataSource contain the extra PID column.

During column pruning of Projection(2), parentUsedCols contains the extra PID column,
but that column is not in Projection(2)'s schema, so it is dropped, which is why the extra PID column goes missing.
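The effect described above can be sketched with a simplified model. This is not the planner's real data structure: `projection`, `pruneColumns`, `contains`, and `extraPidColID` are hypothetical names, and each expression is assumed to read exactly one input column. The point is that the columns requested from the child are derived only from the projection's surviving expressions, so a column the parent asks for that is absent from the projection's schema is never forwarded.

```go
package main

import "fmt"

// extraPidColID is a hypothetical ID for the hidden partition-ID column.
const extraPidColID int64 = -2

// projection is a toy model: out[i] is the ID of output column i,
// and in[i] is the single input column that expression i reads.
type projection struct {
	out []int64
	in  []int64
}

// contains reports whether id appears in ids.
func contains(ids []int64, id int64) bool {
	for _, v := range ids {
		if v == id {
			return true
		}
	}
	return false
}

// pruneColumns drops outputs the parent does not use and returns the
// columns requested from the child. Like the function quoted above, it
// builds that request from its own expressions only, so any parent
// column missing from p.out is silently lost.
func (p *projection) pruneColumns(parentUsed []int64) []int64 {
	for i := len(p.out) - 1; i >= 0; i-- {
		if !contains(parentUsed, p.out[i]) {
			p.out = append(p.out[:i], p.out[i+1:]...)
			p.in = append(p.in[:i], p.in[i+1:]...)
		}
	}
	return append([]int64(nil), p.in...)
}

func main() {
	p := &projection{out: []int64{1, 2, 3}, in: []int64{1, 2, 3}}
	// The parent (Lock) needs columns 1 and 2 plus the extra PID column.
	childUsed := p.pruneColumns([]int64{1, 2, extraPidColID})
	fmt.Println("requested from child:", childUsed)
	fmt.Println("PID column forwarded:", contains(childUsed, extraPidColID))
}
```

Under these assumptions the child never learns that the PID column is needed, so a data source below it is free to prune that column.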

@tiancaiamao
Contributor

I'll find another way to fix it...
#21148 fixed the original issue but introduced too many corner cases, and this is one of them.

@tiancaiamao tiancaiamao self-assigned this Dec 8, 2021
@tiancaiamao
Contributor

How can we make the Agg executor keep its child's schema columns even when they are not used in the aggregate functions?

Join {DataSource1, DataSource2 -> Agg} -> Lock 

In Agg's schema, the partition columns from DataSource2 are dropped...

@winoros
Member

winoros commented Dec 8, 2021

After some offline discussion, here are some updates.
Previously, @tiancaiamao thought that for a SQL statement like the one above, we should lock both the table returned to the SELECT clause and the table in the subquery.
But according to the MySQL documentation (https://dev.mysql.com/doc/refman/8.0/en/innodb-locking-reads.html), we don't need to lock the subquery tables.

This simplifies the scenario, which helps us a lot.

@yudongusa

@tiancaiamao @winoros any update on this issue?

@tiancaiamao
Contributor

mysql> show create table t;
+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                 |
+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| t     | CREATE TABLE `t` (
  `a` int DEFAULT NULL,
  `b` int DEFAULT NULL,
  `c` int DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> show create table t1;
+-------+----------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                           |
+-------+----------------------------------------------------------------------------------------------------------------------------------------+
| t1    | CREATE TABLE `t1` (
  `id` int DEFAULT NULL,
  `c` int DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+-------+----------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> select * from t;
+------+------+------+
| a    | b    | c    |
+------+------+------+
|    1 |    2 |    3 |
|    3 |    2 |    1 |
|    2 |    2 |    2 |
|    2 |    3 |    4 |
+------+------+------+
4 rows in set (0.00 sec)

mysql> select * from t1;
+------+------+
| id   | c    |
+------+------+
|    1 |    8 |
+------+------+
1 row in set (0.00 sec)

A join locks both tables:

session1:
mysql> begin;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from t join t1 on t1.id = t.c for update;
+------+------+------+------+------+
| a    | b    | c    | id   | c    |
+------+------+------+------+------+
|    3 |    2 |    1 |    1 |    8 |
+------+------+------+------+------+
1 row in set (0.00 sec)

session2:
mysql> update t1 set c = c + 1 where id = 1;
;; block

A subquery, by contrast, locks only the outer table:

session1:
mysql> begin;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from t where t.a in (select id from t1) for update;
+------+------+------+
| a    | b    | c    |
+------+------+------+
|    1 |    2 |    3 |
+------+------+------+
1 row in set (0.00 sec)

session2:
mysql> update t1 set c = c + 1 where id = 1;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

This means that when the apply is rewritten to a join, the SelectLock should be a child of the Join rather than its parent.
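The difference between the two placements can be sketched with a toy plan tree. This is only an illustration under stated assumptions: `node`, `lockedTables`, `lockAboveJoin`, and `lockBelowJoin` are hypothetical names, and the model assumes a Lock operator locks rows from every data source beneath it.

```go
package main

import "fmt"

// node is a toy plan operator: "DataSource", "Join", or "Lock".
type node struct {
	op       string
	table    string // set only for DataSource nodes
	children []*node
}

// lockedTables returns the tables of all data sources that sit
// underneath a Lock operator somewhere on the path from the root.
func lockedTables(n *node, underLock bool) []string {
	if n == nil {
		return nil
	}
	if n.op == "DataSource" {
		if underLock {
			return []string{n.table}
		}
		return nil
	}
	var out []string
	for _, c := range n.children {
		out = append(out, lockedTables(c, underLock || n.op == "Lock")...)
	}
	return out
}

// lockAboveJoin builds Lock -> Join{t, t1}: the lock covers both sides.
func lockAboveJoin() *node {
	t := &node{op: "DataSource", table: "t"}
	t1 := &node{op: "DataSource", table: "t1"}
	join := &node{op: "Join", children: []*node{t, t1}}
	return &node{op: "Lock", children: []*node{join}}
}

// lockBelowJoin builds Join{Lock -> t, t1}: only the outer table is covered.
func lockBelowJoin() *node {
	t := &node{op: "DataSource", table: "t"}
	t1 := &node{op: "DataSource", table: "t1"}
	lock := &node{op: "Lock", children: []*node{t}}
	return &node{op: "Join", children: []*node{lock, t1}}
}

func main() {
	fmt.Println("lock above join:", lockedTables(lockAboveJoin(), false))
	fmt.Println("lock below join:", lockedTables(lockBelowJoin(), false))
}
```

In this model the second shape matches the MySQL subquery behavior shown above, where the update on t1 in session2 is not blocked.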

@yudongusa

So it seems that the subquery-to-join rewrite potentially changes the locking behavior. If so, one option is to add an exemption to the subquery-to-join rewrite that blocks it when FOR UPDATE is involved; another is to mark the table from the subquery as excluded from the update lock during the rewrite. What do you think? @tiancaiamao @winoros

@jebter jebter added affects-5.3 This bug affects 5.3.x versions. affects-5.4 This bug affects the 5.4.x(LTS) versions. labels Jan 11, 2022
mjonss added a commit to mjonss/tidb that referenced this issue Feb 2, 2022
@mjonss
Contributor

mjonss commented Feb 14, 2022

/component tablepartition

@ti-chi-bot ti-chi-bot added the component/tablepartition This issue is related to Table Partition of TiDB. label Feb 14, 2022
@mjonss
Contributor

mjonss commented Mar 3, 2022

Fixed by #31634.

@vivid392845427 vivid392845427 added the affects-5.2 This bug affects 5.2.x versions. label Apr 21, 2022

9 participants