[YSQL] Intensive workload with parallel UPDATE/DELETE all rows leads to restarting Postgres process #11349
Comments
It might be that the client doesn't retry certain types of errors (at all, or not enough times). Will talk to Timur next week.
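The comment above is speculating about client-side retry behavior. As an illustration only, here is a minimal JDBC-level retry wrapper, assuming that PostgreSQL SQLState 40001 (serialization failure) is the retryable class of errors; whether the sample app retries this class at all, and how many times, is exactly what is in question here:

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public class RetryingExec {
    // Retry a statement a bounded number of times when the error looks transient.
    // "40001" (serialization_failure) is used as an illustrative retryable SQLState;
    // the real retry policy of the sample app may differ.
    static int executeWithRetry(Connection conn, String sql, int maxAttempts)
            throws SQLException {
        SQLException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try (Statement stmt = conn.createStatement()) {
                return stmt.executeUpdate(sql);
            } catch (SQLException e) {
                last = e;
                if (!"40001".equals(e.getSQLState())) {
                    throw e;  // Not in the retryable class, surface immediately.
                }
            }
        }
        throw last;  // Retries exhausted.
    }
}
```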
Full logs from a recent quick run.
…nd ConsensusRound callbacks invocation

Summary:

**Background**

Since D5660/023c20ad6caf13e5228081ddec07d9f3cf904348 we track the Write RPC request ID to prevent duplicate writes caused by retries of Write RPCs. The request ID is assigned by `YBClient` based on `TabletRequests::request_id_seq`, and the TServer tracks running request IDs in the `RetryableRequests` structure. When we split a tablet, each of its child tablets gets a copy of its Raft log and also of the `RetryableRequests` structure. On the `YBClient` side, child tablets get the `TabletRequests::request_id_seq` value from the parent tablet, to stay consistent with the TServer side.

But if the process hosting the `YBClient` has been restarted, partitioned away, or simply has not issued requests to specific tablets during a sequence of splits, it might have no information about a particular tablet: it doesn't know `TabletRequests::request_id_seq` for the parent tablet, so it can't start from it for the child tablets. This case was addressed by D9264/f9cae11ced426476d4f70560b2afc29f921a1737: if `YBClient` (`MetaCache`) doesn't know about the parent tablet, it uses a special value as `request_id_seq` for the child tablet. Then, on getting a "request id is less than min" error, it sets `request_id_seq` to the min running request ID from the server plus 2^24 (there won't be 2^24 client requests in progress from the same client to the same tablet, so this is safe).

**Problem**

On the TServer side, `MaintenanceManager` periodically triggers cleanup of the `RetryableRequests` structure. Due to overload, or because a server hosts too many tablets, it can happen that the `RetryableRequests` structure has already been cleaned up at the tablet leader but not yet at a follower. This can lead to the following failure scenario:
1) Tablet `T` is split into `T1` and `T2`.
2) `YBClient` A starts doing some writes to tablet `T1`. Since A doesn't know about tablet `T`, it sets `request_id_seq` to the `min_running_request_id` for A from the `T` leader (0) plus `1 << 24`. After 100 writes it becomes `100 + (1 << 24)`.
3) Tablet `T1` is split into `T1.1` and `T1.2`.
4) Tablet `T1.1` is split into `T1.1.1` and `T1.1.2`.
5) `RetryableRequests` cleanup is done at the leader of `T1.1`, but not at the followers.
6) `YBClient` A starts doing a write to tablet `T1.1.2`. Since A doesn't know about tablet `T1.1`, it sets `request_id_seq` to the `min_running_request_id` for A from the `T1.1` leader (0 due to cleanup) plus `1 << 24`, that is, `1 << 24`.
7) The `T1.1` leader accepts the write with `request_id = 1 << 24` and tries to replicate it at the followers.
8) At a follower, `min_running_request_id` is `100 + (1 << 24)` (inherited from the parent tablets), so the follower rejects a write request that has already been accepted by the leader. The error message is: "Request id 16777316 from client ... is less than min running 16777216". (The arithmetic is worked through in the sketch after this commit message.)

**Solution**

Update `ReplicaState::AddPendingOperation` so that, when registering a retryable request ID fails at a follower, it runs cleanup at the follower and then tries to register again.

Also, some other fixes/improvements have been made:
- Updated `RetryableRequests::Register` to return an error instead of invoking the callback on error by itself. Callers are responsible for invoking the callback if needed.
- Updated `RaftConsensus::StartConsensusOnlyRoundUnlocked` to invoke the callback in case of error.
- Updated `YBClient` logging to use a prefix with both the client ID and the `MetaCache` address.
- Added a sanity check to avoid duplicate `ConsensusRoundCallback` calls and to verify that the callback always gets called before destruction.
- Updated `RaftConsensus::AppendNewRoundsToQueueUnlocked` to reduce code duplication and made it responsible for invoking the callback for processed rounds.
- Refactored `OperationDriver::Init` to exit early in case of error.
- Enhanced `CatalogManager::RegisterNewTabletForSplit` logging.
- Fixed `OperationTracker::Add` null pointer handling.
- Added a test.

Test Plan: TabletSplitITest.SplitClientRequestsClean
Reviewers: sergei
Reviewed By: sergei
Subscribers: bogdan
Differential Revision: https://phabricator.dev.yugabyte.com/D15484
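The request-id arithmetic from the failure scenario above can be checked directly. This is a minimal sketch in plain Java; the numbers come from the commit message, while the variable names are illustrative and are not YugabyteDB identifiers:

```java
public class RequestIdMismatch {
    public static void main(String[] args) {
        final long SPLIT_OFFSET = 1L << 24;  // 16777216, the offset the client adds for an unknown parent tablet.

        // Client A never saw tablet T, so it starts from the leader's
        // min_running_request_id (0) plus the offset, then issues 100 writes.
        // Followers of T1.1 inherit this value from the parent tablets.
        long followerMinRunning = 0 + SPLIT_OFFSET + 100;   // 16777316

        // After the T1.1 leader cleans up RetryableRequests, its min_running_request_id
        // is 0 again, so the client restarts the sequence at just the offset.
        long newRequestId = 0 + SPLIT_OFFSET;                // 16777216, accepted by the leader.

        System.out.println("follower min running: " + followerMinRunning);  // 16777316
        System.out.println("new request id:       " + newRequestId);        // 16777216
        // The follower rejects the write because newRequestId < followerMinRunning,
        // even though the leader has already accepted it.
        System.out.println("rejected by follower: " + (newRequestId < followerMinRunning));  // true
    }
}
```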
…g at follower and ConsensusRound callbacks invocation. Original commit: d26017d / D15484. Test Plan: TabletSplitITest.SplitClientRequestsClean. Reviewers: jmeehan. Reviewed By: jmeehan. Subscribers: jmeehan, bogdan. Differential Revision: https://phabricator.dev.yugabyte.com/D17854
…ing at follower and ConsensusRound callbacks invocation. Original commit: d26017d / D15484. Test Plan: TabletSplitITest.SplitClientRequestsClean. Reviewers: amitanand. Reviewed By: amitanand. Subscribers: bogdan. Differential Revision: https://phabricator.dev.yugabyte.com/D18036
… at follower and ConsensusRound callbacks invocation. Original commit: d26017d / D15484. Test Plan: TabletSplitITest.SplitClientRequestsClean. Reviewers: amitanand. Reviewed By: amitanand. Subscribers: ybase, bogdan, jmeehan. Differential Revision: https://phabricator.dev.yugabyte.com/D18127
…ing at follower and ConsensusRound callbacks invocation. Original commit: d26017d / D15484. Test Plan: TabletSplitITest.SplitClientRequestsClean. Reviewers: mbautin, amitanand. Reviewed By: amitanand. Subscribers: bogdan. Differential Revision: https://phabricator.dev.yugabyte.com/D18116
…ing at follower and ConsensusRound callbacks invocation. Original commit: d26017d / D15484. Test Plan: TabletSplitITest.SplitClientRequestsClean. Reviewers: amitanand. Reviewed By: amitanand. Subscribers: bogdan, ybase. Differential Revision: https://phabricator.dev.yugabyte.com/D18150
Without dynamic tablet splitting enabled it can also happen that …
Jira Link: DB-6047
Description
While running the sample app SqlDataLoad with additional operations (DELETE all present rows and UPDATE all present rows; see the code in the attachments), a few FATALs started to appear in the client code. The tserver and master logs do not contain any failures, so it looks like it is the Postgres process that failed.
The workload without the additional operations does not produce these exceptions.
Client log:
Here is one of the Postgres log files: postgres-log.txt
Reproducer:
Example run code:
SqlDataLoadWithDDL.java.zip
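The actual reproducer is the attached SqlDataLoadWithDDL.java. As an illustration only, a stripped-down JDBC sketch of the kind of workload described in this issue (concurrent inserts plus periodic UPDATE-all-rows and DELETE-all-rows statements against YSQL) might look like the following. The table name, column names, connection URL, and thread counts are illustrative assumptions rather than values from the sample app, and the PostgreSQL JDBC driver is assumed to be on the classpath:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class ParallelUpdateDeleteWorkload {
    // Illustrative YSQL endpoint; 5433 is the default YSQL port.
    static final String URL = "jdbc:postgresql://127.0.0.1:5433/yugabyte";
    static final String USER = "yugabyte";
    static final String PASSWORD = "yugabyte";

    public static void main(String[] args) throws Exception {
        try (Connection ddl = DriverManager.getConnection(URL, USER, PASSWORD);
             Statement stmt = ddl.createStatement()) {
            stmt.execute("CREATE TABLE IF NOT EXISTS sample_load (k BIGINT PRIMARY KEY, v TEXT)");
        }

        AtomicLong nextKey = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(4);

        // Writer threads: keep inserting new rows.
        for (int i = 0; i < 3; i++) {
            pool.submit(() -> {
                try (Connection conn = DriverManager.getConnection(URL, USER, PASSWORD);
                     PreparedStatement insert =
                         conn.prepareStatement("INSERT INTO sample_load (k, v) VALUES (?, ?)")) {
                    while (!Thread.currentThread().isInterrupted()) {
                        long k = nextKey.getAndIncrement();
                        insert.setLong(1, k);
                        insert.setString(2, "value-" + k);
                        insert.executeUpdate();
                    }
                } catch (Exception e) {
                    e.printStackTrace();  // In the reported runs, failures surfaced as FATALs in the client log.
                }
            });
        }

        // One thread periodically rewrites and then deletes every row currently in the table.
        pool.submit(() -> {
            try (Connection conn = DriverManager.getConnection(URL, USER, PASSWORD);
                 Statement stmt = conn.createStatement()) {
                while (!Thread.currentThread().isInterrupted()) {
                    stmt.executeUpdate("UPDATE sample_load SET v = v || '-touched'");
                    stmt.executeUpdate("DELETE FROM sample_load");
                    Thread.sleep(1000);
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        });

        pool.awaitTermination(10, TimeUnit.MINUTES);  // Run for a fixed time, then stop.
        pool.shutdownNow();
    }
}
```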