Broken Connection Error in Odyssey when reset phase query takes too long to execute #22849

yugabyte-ci · 2024-06-13T09:26:59Z

…nnection Manager Summary: **Problem Statement:** While a logical client detaches from its corresponding server connection, Connection Manager fires certain RESET queries before allowing the server connection to return back to the pool of usable connections for other logical clients to use. The upstream code has made it such that if these RESET queries take longer than 1 second to execute and return cumulatively, then odyssey breaks the connection and forwards its own error to the client with the message `Resource temporarily unavailable`. It is to be noted that there were no errors while executing the client's workload, and that this is an error formed by the Connection Manager layer itself. **Solution/Changes Made:** Since there were technically no errors due to the client's workload, it makes no sense to forward any error to the logical client - we can instead silently close the server connection when it is found that we hit the reset timeout during the reset phase. Additionally, increase the reset timeout in this diff as the client's workload is already pipelined and is not dependent on the server's reset phase completing. It will help save some cycles internally if we can re-use the same server connection. We modify a function, `od_backend_query` to return a different status when the reset timeout is hit, which can be used for any "reset" query that is fired at the time of detaching the logical and physical connection. Some examples of this can be seen with the help of a small excerpt from the conn mgr template file, picked up from pool config settings: ``` pool_discard no pool_rollback yes ``` Each of these values correspond to what should be done to the server connection with respect to "resetting" the context on it as per the user's wish - `discard` fires a "discard all" query - `rollback` fires a "rollback" if it is found that the server is in an active transaction before firing reset statements - By default, we fire a query that contains "reset all" "reset role" and "reset session authorization" All of the above scenarios use the same function being modified to return a different status depending on the reset type Jira: DB-11746 Test Plan: Jenkins: enable connection manager, all tests Reviewers: skumar, asrinivasan Reviewed By: asrinivasan Subscribers: yql Differential Revision: https://phorge.dev.yugabyte.com/D35564

Summary: f8e73e9 [#18192] YSQL: Introduce interface to mock tserver response in MiniCluster tests 4ae68f4 Build break fix for centos7 Excluded: 2ec9224 [#23033] Allow running YSQL upgrade unit tests with snapshot other than 2.0.9.0 37912f1 [#22058] docdb: Disable connections on cloned db until cloning is complete 059b855 [#22908] xCluster: Use XClusterRemoteClient across XCluster 5dc5ee7 [#22849] YSQL: Correctly handle reset phase timeout errors in YSQL Connection Manager af49a1e [#22876][#22835][#22773] CDCSDK: Add new auto flag to identify non-eligible tables in CDC stream f3c4e14 [PLAT-14524] Up-version pekko to fix TLSActor infinite loop 9388aea [#23052] yugabyted: Restarting a node fails when data_dir is missing in user specified configuration. 5cf9736 [PLAT-12685]: Generate a YBA metric for xcluster config table status. 73fc90a [PLAT-14497]: Fix incremental backup time when none full backup exists e9b5ba5 [PLAT-14533]: Modify the gflags metadata support db version check 8dca952 [PLAT-14432][Platform] Show certificate Database Node Certificate/key and Client Certificate/key for CA certs in certificate details modal 6551e45 Add utkarsh.munjal to contributors.md bafa1cb [#21751] YSQL, ASH: Sampling of wait events Test Plan: Jenkins: rebase: pg15-cherrypicks Reviewers: jason, tfoucher Subscribers: yql Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D36325

yugabyte-ci added area/ecosystem Label for all ecosystem related projects jira-originated kind/new-feature This is a request for a completely new feature priority/highest Highest priority issue labels Jun 13, 2024

yugabyte-ci assigned rahulb-yb Jun 13, 2024

yugabyte-ci closed this as completed Aug 14, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Broken Connection Error in Odyssey when reset phase query takes too long to execute #22849

Broken Connection Error in Odyssey when reset phase query takes too long to execute #22849

yugabyte-ci commented Jun 13, 2024 •

edited by jira bot

Loading

Broken Connection Error in Odyssey when reset phase query takes too long to execute #22849

Broken Connection Error in Odyssey when reset phase query takes too long to execute #22849

Comments

yugabyte-ci commented Jun 13, 2024 • edited by jira bot Loading

yugabyte-ci commented Jun 13, 2024 •

edited by jira bot

Loading