TiKV OOM under TPCC workload when two tikv down for 10 minutes #12159

Closed
cosven opened this issue Mar 15, 2022 · 1 comment · Fixed by #12190
Labels: affects-5.4, affects-6.0, found/automation, severity/critical, type/bug

Comments

cosven (Member) commented Mar 15, 2022

Bug Report

TiKV OOM.

What version of TiKV are you using?

$ /tikv-server -V 
TiKV 
Release Version:  6.0.0-alpha 
Edition:          Community 
Git Commit Hash:  8954a76e2b87575d80336f502a4d078e5da1508f 
Git Commit Branch: heads/refs/tags/v6.0.0-nightly 
UTC Build Time:   2022-03-09 18:10:26 
Rust Version:     rustc 1.60.0-nightly (1e12aef3f 2022-02-13) 
Enable Features:  jemalloc mem-profiling portable sse test-engines-rocksdb cloud-aws cloud-gcp cloud-azure 
Profile:          dist_release

What operating system and CPU are you using?

Steps to reproduce

  1. Create a cluster with 5 replicas:
    [replication]
    max-replicas = 5

  2. Run the TPC-C workload for 10 minutes (go-tpc).
  3. Take 2 TiKV nodes down for about 10 minutes.

After the 2 TiKV nodes went down, memory usage on the other 3 TiKV nodes kept increasing.

What did you expect?

No OOM.

What happened?

TiKV OOM.

For more details, see https://pingcap.feishu.cn/docs/doccnTugaMBp2jB62ZDwWX0Vv2c# .

cosven (Member, Author) commented Mar 15, 2022

/type bug
/severity critical
/assign @5kbpers
/found automation

@ti-chi-bot added the type/bug, severity/critical, and found/automation labels Mar 15, 2022
@5kbpers added the affects-5.4 label Apr 15, 2022
ti-chi-bot added a commit that referenced this issue Apr 28, 2022
…12393)

ref #11809, ref #12050, close #12159, ref #12190

* Do not hold mutex during calling `get_store_async`
* Add more metrics

Signed-off-by: 5kbpers <[email protected]>
Signed-off-by: qupeng <[email protected]>

Co-authored-by: 5kbpers <[email protected]>
Co-authored-by: qupeng <[email protected]>
Co-authored-by: Ti Chi Robot <[email protected]>
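
For readers unfamiliar with the pattern named in the first bullet of the commit message ("Do not hold mutex during calling `get_store_async`"), here is a minimal, generic sketch. It is not TiKV's actual code: `resolve_holding_lock`, `resolve_releasing_lock`, the `cache` map, and the `lookup` closure are all hypothetical names chosen for illustration, and a plain synchronous closure stands in for the slow store lookup. The point is only that a lock held across a slow call makes every other user of that lock wait, whereas releasing it first keeps the critical sections short.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Anti-pattern (hypothetical example): the guard stays alive across the
/// slow `lookup` call, so every other user of `cache` waits for it.
fn resolve_holding_lock<F>(cache: &Mutex<HashMap<u64, String>>, store_id: u64, lookup: F) -> String
where
    F: FnOnce(u64) -> String,
{
    let mut guard = cache.lock().unwrap();
    let cached = guard.get(&store_id).cloned();
    if let Some(addr) = cached {
        return addr;
    }
    let addr = lookup(store_id); // the mutex is still locked here
    guard.insert(store_id, addr.clone());
    addr
}

/// Preferred shape (hypothetical example): take the lock only for the short
/// cache read and the short cache write, never during the slow call.
fn resolve_releasing_lock<F>(cache: &Mutex<HashMap<u64, String>>, store_id: u64, lookup: F) -> String
where
    F: FnOnce(u64) -> String,
{
    // The temporary guard is dropped at the end of this statement.
    let cached = cache.lock().unwrap().get(&store_id).cloned();
    if let Some(addr) = cached {
        return addr;
    }
    let addr = lookup(store_id); // no lock is held during the slow call
    // Re-lock briefly to publish the result.
    cache.lock().unwrap().insert(store_id, addr.clone());
    addr
}
```

In the second version, two callers may both miss the cache and both run the lookup for the same store. That trade-off is usually acceptable when the lookup is idempotent. Whether the real fix makes the same trade-off is not stated in this thread.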