[DocDB] Retryable request accepted by leader but rejected by followers #23523
Comments
Some findings: Leader ts1 appears to have crashed at around 07:02:13. After ts1 restarted, it was elected leader again.
Judging from the LMP error message, follower ts2 appears to have replicated up to OP 182394.
OP 182395 has the duplicate request ID.
Some possibilities:
Summary: A retryable request was accepted by the leader but rejected by followers, which is unexpected. This diff adds more debug info when this happens.
Jira: DB-12439
Test Plan: Jenkins
Reviewers: sergei, rthallam, timur
Reviewed By: timur
Subscribers: ybase, qhu
Differential Revision: https://phorge.dev.yugabyte.com/D37335
Summary:
0fa2b24 Fix table layout to utilize maximum available space (#23564)
336d00d [PLAT-14981]Increase default slow query length
e5127f8 [DOC-445] TA-22935: Potential Issues with Server-side Sequence Caching in Multi-Database Clusters (#23520)
b3389ff [#23493] xCluster: code for ensuring there's an update for every sequence in WAL
cb26a09 [#23548] Tools: Clean-up sys-catalog-tool code
28025f6 [docs] Visualize migration assessment updates (#23358)
9c0de5d [#23257] YSQL: Change conflict error string for RC transactions
507432b [#23523] docdb: retryable requests instrumentation
e4645e5 [DOC-431] Added a note for GKE cluster docs (#23349)
d1576c4 [PLAT-12905] Add HA Metrics Page
02da1f0 [PLAT-14869][PLAT-14986][PLAT-14998][PLAT-15003] - ui improvements and fixes
c149f26 [#23556] hnsw_tool command-line tool for testing HNSW index implementations
7725f15 Add operator mode & task info to diagnostics
e193fc6 Revert "[#23064] YSQL: pg_partman: disable p_retention_schema parameter"
8178372 [#23513] YSQL: Simplify several functions in ybc_pggate
22657da [#23394] CDCSDK: Prevent tserver crash on concurrent Getchanges call on same producer tablet
90554b0 [#23179] CDCSDK: Refactor TestPgReplicationSlot for dynamic data types
Test Plan: Jenkins: rebase: pg15-cherrypicks
Reviewers: jason, tfoucher
Differential Revision: https://phorge.dev.yugabyte.com/D37453
Jira Link: DB-12439
Description
We found this during a stress test:
Symptom
The leader accepted the retryable request, but when it replicated the request to followers, they rejected it with a Duplicate request error. It can have two different symptoms:
Duplicate request 2881287
get cleaned.
Mitigation
This issue should go away once the duplicate requests are cleaned up, which happens when they reach the retryable request timeout (600 seconds for YSQL and 60 seconds for YCQL).
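To illustrate why waiting out the timeout helps, here is a minimal sketch of timeout-based duplicate detection. It is a toy model and not YugabyteDB's actual implementation: the class `RetryableRequestRegistry`, its methods, the `FakeClock` helper, and the client ID are all hypothetical names made up for this example. Only the 60-second value and the request ID come from this issue.

```python
import time
from typing import Callable, Dict, Tuple


class RetryableRequestRegistry:
    """Toy duplicate detector (hypothetical; not the real DocDB code)."""

    def __init__(self, timeout_secs: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_secs = timeout_secs
        self.clock = clock
        # (client_id, request_id) -> time at which the request was first seen
        self._seen: Dict[Tuple[str, int], float] = {}

    def cleanup(self) -> None:
        """Forget every request that has reached the timeout."""
        now = self.clock()
        expired = [key for key, seen_at in self._seen.items()
                   if now - seen_at >= self.timeout_secs]
        for key in expired:
            del self._seen[key]

    def register(self, client_id: str, request_id: int) -> bool:
        """Return True if accepted, False if rejected as a duplicate."""
        self.cleanup()
        key = (client_id, request_id)
        if key in self._seen:
            return False
        self._seen[key] = self.clock()
        return True


class FakeClock:
    """Manually advanced clock so the example does not need to sleep."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, secs: float) -> None:
        self.now += secs


clock = FakeClock()
# 60 seconds is the YCQL timeout quoted in this issue; "client-a" is made up.
follower = RetryableRequestRegistry(timeout_secs=60.0, clock=clock)

first = follower.register("client-a", 2881287)   # first sighting
dup = follower.register("client-a", 2881287)     # still tracked: duplicate
clock.advance(60.0)                              # timeout reached
after = follower.register("client-a", 2881287)   # entry was cleaned up

print(first, dup, after)  # True False True
```

In this model a request is rejected only while its entry is still tracked, so a follower stops reporting the duplicate once the entry expires. The sketch says nothing about why the leader and followers disagreed in the first place, which is still under investigation.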
RCA
In progress
Issue Type
kind/bug
Warning: Please confirm that this issue does not contain any sensitive information