Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TiDB hangs in tidb-ci test #4912

Closed
JaySon-Huang opened this issue May 17, 2022 · 4 comments
Closed

TiDB hangs in tidb-ci test #4912

JaySon-Huang opened this issue May 17, 2022 · 4 comments

Comments

@JaySon-Huang
Copy link
Contributor

JaySon-Huang commented May 17, 2022

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

TiKV hangs for over 40 minutes for running tidb-ci test
https://ci.pingcap.net/blue/organizations/jenkins/tiflash-ghpr-integration-tests/detail/tiflash-ghpr-integration-tests/6387/pipeline

You can download the logs files here archive.zip
OR
https://ci.pingcap.net/blue/organizations/jenkins/tiflash-ghpr-integration-tests/detail/tiflash-ghpr-integration-tests/6387/artifacts

2. What did you expect to see? (Required)

All tests passed in a reasonable time (usually less than 10 minutes)

3. What did you see instead (Required)

Hangs for 40 minutes.

4. What is your TiFlash version? (Required)

TiFlash @ cb236c1
proxy @ e98f95970781e3b70793f8c90e7c7022e8354ef8
TiKV @ b6220123ef333a9e7d7033a060d0c6adea6ad1b2
PD @ 562586ceebae1dc41cf7c04e9770558d9ea5a067
TiDB @ 683ba092b565aa5947da998507550f8da5c406bd

@JaySon-Huang JaySon-Huang added the type/bug The issue is confirmed as a bug. label May 17, 2022
@JaySon-Huang JaySon-Huang changed the title TiKV hangs in tidb-ci test TiDB hangs in tidb-ci test May 17, 2022
@JaySon-Huang
Copy link
Contributor Author

These logs are as expected after tikv/tikv#12050

Suspicious log from TiKV

TiKV keep outputting logs likes UNIMPLEMENTED to store_id == 58 ``` # TiFlash store id in tiflash's log [2022/05/17 14:23:18.527 +08:00] [INFO] [KVStore.cpp:739] ["KVStore:Set store info id: 58 address: \"tiflash0:3930\" labels { key: \"engine\" value: \"tiflash\" } version: \"v6.1.0-alpha-176-gcb236c1\" peer_address: \"tiflash0:20170\" status_address: \"tiflash0:20181\" git_hash: \"cb236c13c5d9a1ab79eced48600725c454ad44fb\" start_timestamp: 1652768598 deploy_path: \"/tiflash\""] [thread_id=7] ```
# TiKV logs
[2022/05/17 06:23:35.606 +00:00] [INFO] [snap.rs:919] ["scan snapshot of one cf"] [size=0] [key_count=0] [cf=lock] [snapshot=/data/snap/gen_94_6_7_(default|lock|write).sst] [region_id=94]
[2022/05/17 06:23:35.606 +00:00] [INFO] [snap.rs:919] ["scan snapshot of one cf"] [size=0] [key_count=0] [cf=write] [snapshot=/data/snap/gen_94_6_7_(default|lock|write).sst] [region_id=94]
[2022/05/17 06:23:35.606 +00:00] [INFO] [snap.rs:1049] ["scan snapshot"] [takes=1.250598ms] [size=0] [key_count=0] [snapshot=/data/snap/gen_94_6_7_(default|lock|write).sst] [region_id=94]
[2022/05/17 06:23:35.607 +00:00] [INFO] [snap.rs:674] ["set_snapshot_meta total cf files count: 3"]
[2022/05/17 06:23:35.617 +00:00] [INFO] [snap.rs:459] ["sent snapshot"] [duration=9.155877ms] [size=0] [snap_key=94_6_7] [region_id=94]
[2022/05/17 06:23:35.617 +00:00] [INFO] [peer.rs:1677] ["report snapshot status"] [status=Finish] [to="id: 96 store_id: 58 role: Learner"] [peer_id=95] [region_id=94]
[2022/05/17 06:23:35.679 +00:00] [INFO] [scheduler.rs:548] ["get snapshot failed"] [err="Error(Request(message: \"EpochNotMatch current epoch of region 2 is conf_ver: 1 version: 45, but you sent conf_ver: 1 version: 42\" epoch_not_match { current_regions { id: 2 start_key: 7480000000000000FF5500000000000000F8 region_epoch { conf_ver: 1 version: 45 } peers { id: 3 store_id: 1 } } current_regions { id: 94 start_key: 7480000000000000FF545F720000000000FA end_key: 7480000000000000FF5500000000000000F8 region_epoch { conf_ver: 2 version: 45 } peers { id: 95 store_id: 1 } peers { id: 96 store_id: 58 role: Learner } } current_regions { id: 92 start_key: 7480000000000000FF5400000000000000F8 end_key: 7480000000000000FF545F720000000000FA region_epoch { conf_ver: 1 version: 45 } peers { id: 93 store_id: 1 } } current_regions { id: 90 start_key: 7480000000000000FF4C00000000000000F8 end_key: 7480000000000000FF5400000000000000F8 region_epoch { conf_ver: 1 version: 43 } peers { id: 91 store_id: 1 } } }))"] [cid=492]
[2022/05/17 06:23:35.822 +00:00] [INFO] [advance.rs:294] ["check leader failed"] [to_store=58] [error="\"[rpc failed] RpcFailure: 12-UNIMPLEMENTED\""]
[2022/05/17 06:23:36.813 +00:00] [INFO] [advance.rs:294] ["check leader failed"] [to_store=58] [error="\"[rpc failed] RpcFailure: 12-UNIMPLEMENTED\""]
[2022/05/17 06:23:37.811 +00:00] [INFO] [advance.rs:294] ["check leader failed"] [to_store=58] [error="\"[rpc failed] RpcFailure: 12-UNIMPLEMENTED\""]
[2022/05/17 06:23:38.812 +00:00] [INFO] [advance.rs:294] ["check leader failed"] [to_store=58] [error="\"[rpc failed] RpcFailure: 12-UNIMPLEMENTED\""]
[2022/05/17 06:23:39.812 +00:00] [INFO] [advance.rs:294] ["check leader failed"] [to_store=58] [error="\"[rpc failed] RpcFailure: 12-UNIMPLEMENTED\""]
[2022/05/17 06:23:40.817 +00:00] [INFO] [advance.rs:294] ["check leader failed"] [to_store=58] [error="\"[rpc failed] RpcFailure: 12-UNIMPLEMENTED\""]
[2022/05/17 06:23:41.814 +00:00] [INFO] [advance.rs:294] ["check leader failed"] [to_store=58] [error="\"[rpc failed] RpcFailure: 12-UNIMPLEMENTED\""]
[2022/05/17 06:23:42.815 +00:00] [INFO] [advance.rs:294] ["check leader failed"] [to_store=58] [error="\"[rpc failed] RpcFailure: 12-UNIMPLEMENTED\""]
[2022/05/17 06:23:43.815 +00:00] [INFO] [advance.rs:294] ["check leader failed"] [to_store=58] [error="\"[rpc failed] RpcFailure: 12-UNIMPLEMENTED\""]
[2022/05/17 06:23:44.817 +00:00] [INFO] [advance.rs:294] ["check leader failed"] [to_store=58] [error="\"[rpc failed] RpcFailure: 12-UNIMPLEMENTED\""]

https://github.com/tikv/tikv/blob/b6220123ef333a9e7d7033a060d0c6adea6ad1b2/components/resolved_ts/src/advance.rs#L277-L306

@JaySon-Huang
Copy link
Contributor Author

After table test.tx created, there is no request received by TiFlash, but TiDB logs that select count(distinct b) from test.tx is an expensive_query

[2022/05/17 14:23:36.088 +08:00] [INFO] [ApplySnapshot.cpp:464] ["KVStore:Try to apply snapshot [region 94, applied: term 6 index 7]"] [thread_id=15]
[2022/05/17 14:23:36.088 +08:00] [INFO] [DeltaMergeStore.cpp:767] ["DeltaMergeStore:db_1.t_84 table: db_1.t_84, rows: 0, bytes: 0, bytes on disk: 0, region range: [-9223372036854775808,9223372036854775807), clear_data: true"] [thread_id=15]
[2022/05/17 14:23:36.088 +08:00] [TRACE] [Segment.cpp:344] ["Segment:Segment [1] write region snapshot: [-9223372036854775808,9223372036854775807)"] [thread_id=15]
[2022/05/17 14:23:36.088 +08:00] [INFO] [DeltaMergeStore.cpp:903] ["DeltaMergeStore:db_1.t_84 table: db_1.t_84, clear_data: true, ingest <empty> into segment [1]"] [thread_id=15]
[2022/05/17 06:23:39.850 +00:00] [INFO] [ddl_worker.go:475] ["[ddl] finish DDL job"] [worker="worker 3, tp general"] [job="ID:87, Type:update tiflash replica status, State:synced, SchemaState:public, SchemaID:1, TableID:84, RowCount:0, ArgLen:0, start time: 2022-05-17 06:23:39.758 +0000 UTC, Err:<nil>, ErrCount:0, SnapshotVersion:0"]
[2022/05/17 06:23:39.857 +00:00] [INFO] [ddl.go:698] ["[ddl] DDL job is finished"] [jobID=87]
[2022/05/17 06:23:39.857 +00:00] [INFO] [callback.go:107] ["performing DDL change, must reload"]
[2022/05/17 06:24:19.305 +00:00] [INFO] [data_window.go:249] ["Error exists when getting the SQL Metric."]
[2022/05/17 06:24:40.362 +00:00] [WARN] [expensivequery.go:179] [expensive_query] [cost_time=60.05312269s] [conn_id=491] [user=root] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="select count(distinct b) from test.tx"]
[2022/05/17 06:25:19.310 +00:00] [INFO] [data_window.go:249] ["Error exists when getting the SQL Metric."]

@JaySon-Huang
Copy link
Contributor Author

/cc @zanmato1984 @windtalker

@XuHuaiyu
Copy link
Contributor

fixed in pingcap/tidb#34726

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants