
FetchDisaggPages may wait forever when a network partition happens #8806

Closed
JinheLin opened this issue Feb 29, 2024 · 0 comments · Fixed by #8807
Labels
affects-7.5 (This bug affects the 7.5.x (LTS) versions.) · component/storage · severity/moderate · type/bug (The issue is confirmed as a bug.)

Comments

JinheLin (Contributor) commented Feb 29, 2024

Injected network partitions into the write nodes for 10 minutes, from 2024-02-28 20:44:53 to 2024-02-28 20:54:53.

The FetchDisaggPages RPC request waits until the network recovers.
(In the log below, although the page hit rate is 100%, we still need to fetch the data in the memtable from the WNs.)

[2024/02/28 20:44:54.006 +08:00] [DEBUG] [SegmentReadTask.cpp:355] ["Ready to fetch pages, seg_task=s6_t2737_345_2_15500217546706758 page_hit_rate=100.00% pages_not_in_cache=[]"] [source="MPP<gather_id:1, query_ts:1709124283066671989, local_query_id:2280, server_id:1542, start_ts:448036676047732823,task_id:3> store_id=6 keyspace=4294967295 table_id=2737 segment_id=345 epoch=2 delta_epoch=15500217546706758"] [thread_id=6]
[2024/02/28 20:56:21.157 +08:00] [ERROR] [SegmentReadTask.cpp:386] ["s6_t2737_345_2_15500217546706758: Code: 11004, e.displayText() = DB::Exception: Check snap != nullptr failed: Can not find disaggregated task, task_id=DisTaskId<MPP<gather_id:1, query_ts:1709124283066671989, local_query_id:2280, server_id:1542, start_ts:448036676047732823,task_id:3>,executor=TableFullScan_41> (from s6_t2737_345_2_15500217546706758), e.what() = DB::Exception... [source="MPP<gather_id:1, query_ts:1709124283066671989, local_query_id:2280, server_id:1542, start_ts:448036676047732823,task_id:3> store_id=6 keyspace=4294967295 table_id=2737 segment_id=345 epoch=2 delta_epoch=15500217546706758"] [thread_id=6]

From the gRPC documentation:

By default, gRPC does not set a deadline which means it is possible for a client to end up waiting for a response effectively forever. To avoid this you should always explicitly set a realistic deadline in your clients.
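Following that recommendation, the fix direction is to attach an explicit deadline to the client-side call. Below is a minimal, generic sketch of that pattern using the gRPC C++ API; the `Stub`, `Request`, `Response`, and `callWithDeadline` names are hypothetical placeholders for illustration, not the actual TiFlash code paths touched by #8807.

```cpp
#include <chrono>
#include <grpcpp/grpcpp.h>

// Generic sketch: `Stub`, `Request`, and `Response` stand in for the
// proto-generated client stub and message types of whatever RPC is being
// issued (for example, a disaggregated-page fetch); placeholders only.
template <typename Stub, typename Request, typename Response>
grpc::Status callWithDeadline(
    Stub & stub,
    grpc::Status (Stub::*rpc)(grpc::ClientContext *, const Request &, Response *),
    const Request & req,
    Response * resp,
    std::chrono::seconds timeout)
{
    grpc::ClientContext context;
    // Without an explicit deadline, gRPC may block indefinitely if the peer
    // becomes unreachable (e.g. during a network partition of a write node).
    context.set_deadline(std::chrono::system_clock::now() + timeout);

    grpc::Status status = (stub.*rpc)(&context, req, resp);
    if (status.error_code() == grpc::StatusCode::DEADLINE_EXCEEDED)
    {
        // The caller can now retry or fail the read instead of hanging forever.
    }
    return status;
}
```

With a deadline set, a partitioned write node makes the RPC fail with DEADLINE_EXCEEDED after the timeout, so the read side can retry or abort instead of waiting for the network to recover.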
