TiDB gets OOM killed on pruning hash partitions #26227

Closed
zyguan opened this issue Jul 14, 2021 · 10 comments
Labels: severity/critical, sig/planner (SIG: Planner), type/bug (The issue is confirmed as a bug.)

Comments

@zyguan
Contributor

zyguan commented Jul 14, 2021

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

drop table if exists tbl_28;
create table `tbl_28` (`col_209` bigint(20) not null default '-4413003002508764546',`col_210` double not null default '570.6896586720441',`col_211` decimal(61,30) not null default '9939',primary key (`col_209`) /*t![clustered_index] clustered */,unique key `idx_68` (`col_211`,`col_210`,`col_209`),key `idx_69` (`col_210`)) engine=innodb default charset=utf8 collate=utf8_general_ci partition by hash( `col_209` ) partitions 6;
select /*+ stream_agg() */ bit_or( col_209 ) aggcol from (select   * from tbl_28 where not( tbl_28.col_211 <= 26.76 ) and not( tbl_28.col_209 in ( 6043174761261718958 ) ) and tbl_28.col_209 between -7622884926923238988 and 4467923861679270794 order by col_209  ) ordered_tbl group by col_210,col_211 order by aggcol for update;

2. What did you expect to see? (Required)

The last query succeeds.

3. What did you see instead (Required)

The connection is refused because tidb-server gets OOM-killed.

4. What is your TiDB version? (Required)

release-5.1 (8d62202)

@zyguan zyguan added type/bug The issue is confirmed as a bug. sig/planner SIG: Planner labels Jul 14, 2021
@ChenPeng2013
Contributor

It seems like #25598

@XuHuaiyu
Contributor

@zyguan A log or the source data is needed.

@zyguan
Contributor Author

zyguan commented Jul 15, 2021

@zyguan A log or the source data is needed.

@XuHuaiyu The issue can be reproduced on release-5.0. Memory is allocated here; it seems posLow and posHigh were evaluated incorrectly. You can also find the tmp-storage here.

[Screenshot: 2021-07-14_104125]

@tiancaiamao
Contributor

It seems like #25598

Not the same cause @ChenPeng2013

between -7622884926923238988 and 4467923861679270794

The current hash pruning is far too inefficient for this case...
It calculates the range and then takes every point in the range to check which partition it falls into.

@tiancaiamao
Contributor

A simple fix here is to change used from a []int to a map, which would reduce the allocation (but not avoid the repeated calculation).

@qw4990 qw4990 self-assigned this Jul 15, 2021
@qw4990
Contributor

qw4990 commented Jul 22, 2021

It can be solved by #25599. I forgot to pick it into v5.1. I'll pick it soon.

@tiancaiamao
Contributor

It can be solved by #25599. I forgot to pick it into v5.1. I'll pick it soon.

Not the same problem... #25599 is an overflow, but this one is a problem with the algorithm.

@qw4990
Contributor

qw4990 commented Jul 23, 2021

It can be solved by #25599. I forgot to pick it into v5.1. I'll pick it soon.

Not the same problem... #25599 is an overflow, but this one is a problem with the algorithm.

Emm, I cannot reproduce this bug after picking #26471.
I inspected the stack for this issue yesterday and found that it was stuck in the same loop as #25599:
[image]

Could you check it again at your convenience~ @tiancaiamao

@tiancaiamao
Contributor

I can't reproduce this bug either, so it may be caused by the overflow...

We can leave improving used = append(used, int(idx)) as an enhancement for when the [posLow, posHigh] range is too large.

Since it can't be reproduced, I think we can close this issue. @zyguan @qw4990

@qw4990 qw4990 closed this as completed Jul 29, 2021
