Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallel prehandle snapshot to speed up catch up with TiKV large region #8081

Closed
CalvinNeo opened this issue Sep 12, 2023 · 1 comment · Fixed by #8108, #8183, #8184, #8185 or #9131
Closed

Parallel prehandle snapshot to speed up catch up with TiKV large region #8081

CalvinNeo opened this issue Sep 12, 2023 · 1 comment · Fixed by #8108, #8183, #8184, #8185 or #9131
Labels
component/storage type/enhancement The issue or PR belongs to an enhancement.

Comments

@CalvinNeo
Copy link
Member

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

2. What did you expect to see? (Required)

3. What did you see instead (Required)

4. What is your TiFlash version? (Required)

@CalvinNeo
Copy link
Member Author

CalvinNeo commented Sep 12, 2023

TiFlash can't catch up, it is pre-handling snapshot

[2023/09/10 14:08:08.876 +08:00] [INFO] [FlashService.cpp:454] ["Handling mpp dispatch request, task meta: start_ts: 444157410608676864\ntask_id: 14\naddress: \"tc-tiflash-1.tc-tiflash-peer.endless-htap-stale-read-tps-2340453-1-599.svc:3930\"\ngather_id: 1\nquery_ts: 1694326088861873714\nlocal_query_id: 217\nserver_id: 1046\nmpp_version: 2\nkeyspace_id: 4294967295\ncoordinator_address: \"tc-tidb-0.tc-tidb-peer.endless-htap-stale-read-tps-2340453-1-599.svc:10080\"\n"] [source=FlashService] [thread_id=1004]
[2023/09/10 14:08:08.876 +08:00] [INFO] [FlashService.cpp:454] ["Handling mpp dispatch request, task meta: start_ts: 444157410608676864\ntask_id: 7\naddress: \"tc-tiflash-1.tc-tiflash-peer.endless-htap-stale-read-tps-2340453-1-599.svc:3930\"\ngather_id: 1\nquery_ts: 1694326088861873714\nlocal_query_id: 217\nserver_id: 1046\nmpp_version: 2\nkeyspace_id: 4294967295\ncoordinator_address: \"tc-tidb-0.tc-tidb-peer.endless-htap-stale-read-tps-2340453-1-599.svc:10080\"\n"] [source=FlashService] [thread_id=125]
[2023/09/10 14:08:08.876 +08:00] [DEBUG] [MPPTask.cpp:393] ["Handling 1 regions from 1 physical tables in MPP task"] [source="MPP<gather_id:<gather_id:1, query_ts:1694326088861873714, local_query_id:217, server_id:1046, start_ts:444157410608676864, resource_gorup: >,task_id:14>"] [thread_id=1004]
[2023/09/10 14:08:08.883 +08:00] [DEBUG] [Region.cpp:703] ["[region_id=535 applied_term=7 applied_index=732115] need to wait learner index 865644 timeout 300000"] [thread_id=921]
[2023/09/10 14:13:08.883 +08:00] [WARN] [Region.cpp:720] ["[region_id=535] wait learner index 865644 timeout"] [thread_id=921]
[2023/09/10 14:09:20.545 +08:00] [INFO] [snapshot.rs:115] ["pre apply snapshot"] [snapshot="Some(Snapshot { key: SnapKey { region_id: 535, term: 7, idx: 862994 }, display_path: \"/data0/proxy/snap/rev_535_7_862994_(default|lock|write).sst\" })"] [pending=0] [has_snap=true] [snap_key="SnapKey { region_id: 535, term: 7, idx: 862994 }"] [region_id=535] [peer_id=741]
[2023/09/10 14:09:20.867 +08:00] [INFO] [snapshot.rs:257] ["post apply snapshot"] [pending=1] [region="id: 535 start_key: 748000000000000AFF7E5F728000000001FFC2E98E0000000000FA end_key: 748000000000000AFF7E5F728000000003FF858A070000000000FA region_epoch { conf_ver: 22 version: 77 } peers { id: 537 store_id: 8 } peers { id: 538 store_id: 29 } peers { id: 539 store_id: 5 role: Learner } peers { id: 553 store_id: 7 } peers { id: 722 store_id: 4 role: Learner } peers { id: 741 store_id: 6 role: Learner }"] [region_id=535] [snap_key="SnapKey { region_id: 535, term: 7, idx: 862994 }"] [peer_id=741]
[2023/09/10 14:14:32.789 +08:00] [INFO] [peer_storage.rs:605] ["begin to apply snapshot"] [peer_id=741] [region_id=535]
[2023/09/10 14:14:32.789 +08:00] [INFO] [peer_storage.rs:688] ["apply snapshot with state ok"] [for_witness=false] [state="applied_index: 994199 truncated_state { index: 994199 term: 7 }"] [region="id: 535 start_key: 748000000000000AFF7E5F728000000001FFC2E98E0000000000FA end_key: 748000000000000AFF7E5F728000000003FF858A070000000000FA region_epoch { conf_ver: 22 version: 77 } peers { id: 537 store_id: 8 } peers { id: 538 store_id: 29 } peers { id: 539 store_id: 5 role: Learner } peers { id: 553 store_id: 7 } peers { id: 722 store_id: 4 role: Learner } peers { id: 741 store_id: 6 role: Learner }"] [peer_id=741] [region_id=535]
[2023/09/10 14:14:32.789 +08:00] [INFO] [snapshot.rs:115] ["pre apply snapshot"] [snapshot="Some(Snapshot { key: SnapKey { region_id: 535, term: 7, idx: 994199 }, display_path: \"/data0/proxy/snap/rev_535_7_994199_(default|lock|write).sst\" })"] [pending=0] [has_snap=true] [snap_key="SnapKey { region_id: 535, term: 7, idx: 994199 }"] [region_id=535] [peer_id=741]
[2023/09/10 14:14:33.193 +08:00] [INFO] [snapshot.rs:257] ["post apply snapshot"] [pending=1] [region="id: 535 start_key: 748000000000000AFF7E5F728000000001FFC2E98E0000000000FA end_key: 748000000000000AFF7E5F728000000003FF858A070000000000FA region_epoch { conf_ver: 22 version: 77 } peers { id: 537 store_id: 8 } peers { id: 538 store_id: 29 } peers { id: 539 store_id: 5 role: Learner } peers { id: 553 store_id: 7 } peers { id: 722 store_id: 4 role: Learner } peers { id: 741 store_id: 6 role: Learner }"] [region_id=535] [snap_key="SnapKey { region_id: 535, term: 7, idx: 994199 }"] [peer_id=741]

There is only one snapshot in prehandle pool. It takes 4min to prehandle

TiFlash is handling the only snapshot, the next one arrives before the first one is finished.

[2023/09/10 14:09:20.545 +08:00] [INFO] [snapshot.rs:115] ["pre apply snapshot"] [snapshot="Some(Snapshot { key: SnapKey { region_id: 535, term: 7, idx: 862994 }, display_path: \"/data0/proxy/snap/rev_535_7_862994_(default|lock|write).sst\" })"] [pending=0] [has_snap=true] [snap_key="SnapKey { region_id: 535, term: 7, idx: 862994 }"] [region_id=535] [peer_id=741]
[2023/09/10 14:09:20.555 +08:00] [INFO] [apply.rs:3927] ["re-register to apply delegates"] [term=7] [peer_id=741] [region_id=535]
[2023/09/10 14:09:20.867 +08:00] [INFO] [region.rs:455] ["begin apply snap data"] [peer_id=741] [region_id=535]
[2023/09/10 14:09:20.867 +08:00] [INFO] [snapshot.rs:257] ["post apply snapshot"] [pending=1] [region="id: 535 start_key: 748000000000000AFF7E5F728000000001FFC2E98E0000000000FA end_key: 748000000000000AFF7E5F728000000003FF858A070000000000FA region_epoch { conf_ver: 22 version: 77 } peers { id: 537 store_id: 8 } peers { id: 538 store_id: 29 } peers { id: 539 store_id: 5 role: Learner } peers { id: 553 store_id: 7 } peers { id: 722 store_id: 4 role: Learner } peers { id: 741 store_id: 6 role: Learner }"] [region_id=535] [snap_key="SnapKey { region_id: 535, term: 7, idx: 862994 }"] [peer_id=741]
[2023/09/10 14:13:13.421 +08:00] [INFO] [snapshot.rs:331] ["get prehandled snapshot success"] [pending=1] [region_id=535] [snap_key="SnapKey { region_id: 535, term: 7, idx: 862994 }"] [peer_id=741]
[2023/09/10 14:13:13.534 +08:00] [INFO] [snapshot.rs:366] ["apply snapshot finished"] [elapsed=232666] [pending=0] [region="id: 535 start_key: 748000000000000AFF7E5F728000000001FFC2E98E0000000000FA end_key: 748000000000000AFF7E5F728000000003FF858A070000000000FA region_epoch { conf_ver: 22 version: 77 } peers { id: 537 store_id: 8 } peers { id: 538 store_id: 29 } peers { id: 539 store_id: 5 role: Learner } peers { id: 553 store_id: 7 } peers { id: 722 store_id: 4 role: Learner } peers { id: 741 store_id: 6 role: Learner }"] [snap_key="SnapKey { region_id: 535, term: 7, idx: 862994 }"] [peer_id=741] [mode=normal]
[2023/09/10 14:13:13.534 +08:00] [INFO] [region.rs:503] ["apply new data"] [time_takes=232.666819125s] [region_id=535]
[2023/09/10 14:13:14.134 +08:00] [INFO] [tablet_snap.rs:499] ["begin to receive tablet snapshot files"] [temp_exists=false] [region_id=535] [file=/data0/proxy/snap_v2/rev_535_741_7_994199.tmp]
[2023/09/10 14:13:14.134 +08:00] [INFO] [tablet_snap.rs:513] ["not using cache"] [err="Other(\"cache is disabled\")"] [region_id=535]
[2023/09/10 14:14:32.789 +08:00] [INFO] [peer_storage.rs:605] ["begin to apply snapshot"] [peer_id=741] [region_id=535]
[2023/09/10 14:14:32.789 +08:00] [INFO] [peer_storage.rs:1046] ["finish clear peer meta"] [takes=4.602µs] [raft_key=1] [apply_key=1] [meta_key=1] [region_id=535]
[2023/09/10 14:14:32.789 +08:00] [INFO] [peer_storage.rs:688] ["apply snapshot with state ok"] [for_witness=false] [state="applied_index: 994199 truncated_state { index: 994199 term: 7 }"] [region="id: 535 start_key: 748000000000000AFF7E5F728000000001FFC2E98E0000000000FA end_key: 748000000000000AFF7E5F728000000003FF858A070000000000FA region_epoch { conf_ver: 22 version: 77 } peers { id: 537 store_id: 8 } peers { id: 538 store_id: 29 } peers { id: 539 store_id: 5 role: Learner } peers { id: 553 store_id: 7 } peers { id: 722 store_id: 4 role: Learner } peers { id: 741 store_id: 6 role: Learner }"] [peer_id=741] [region_id=535]
[2023/09/10 14:14:32.789 +08:00] [INFO] [peer.rs:4894] ["snapshot is persisted"] [region="id: 535 start_key: 748000000000000AFF7E5F728000000001FFC2E98E0000000000FA end_key: 748000000000000AFF7E5F728000000003FF858A070000000000FA region_epoch { conf_ver: 22 version: 77 } peers { id: 537 store_id: 8 } peers { id: 538 store_id: 29 } peers { id: 539 store_id: 5 role: Learner } peers { id: 553 store_id: 7 } peers { id: 722 store_id: 4 role: Learner } peers { id: 741 store_id: 6 role: Learner }"] [peer_id=741] [region_id=535]

ti-chi-bot bot pushed a commit that referenced this issue Sep 19, 2023
ti-chi-bot bot pushed a commit that referenced this issue Oct 13, 2023
ti-chi-bot bot pushed a commit that referenced this issue Oct 19, 2023
ti-chi-bot bot pushed a commit that referenced this issue Jun 7, 2024
ti-chi-bot bot pushed a commit that referenced this issue Jun 21, 2024
ti-chi-bot bot pushed a commit that referenced this issue Jun 25, 2024
close #8081

Signed-off-by: CalvinNeo <[email protected]>

Co-authored-by: JaySon <[email protected]>
@JaySon-Huang JaySon-Huang added type/enhancement The issue or PR belongs to an enhancement. and removed feature/developing severity/moderate labels Jul 5, 2024
@JaySon-Huang JaySon-Huang changed the title [Dynamic Region] TiFlash can't catch up with TiKV when there is only one big region pending prehandle Parallel prehandle snapshot to speed up catch up with TiKV large region Jul 5, 2024
@JaySon-Huang JaySon-Huang removed the type/bug The issue is confirmed as a bug. label Jul 5, 2024
ti-chi-bot bot pushed a commit that referenced this issue Jul 5, 2024
ti-chi-bot bot pushed a commit that referenced this issue Aug 26, 2024
ref #8081

Simplify region_id, split_id logging when pre-handling snapshots
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment