Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[YSQL] Investigate why the first request picks a local limit only when the read time is not set. #22158

Closed
1 task done
pao214 opened this issue Apr 25, 2024 · 1 comment
Closed
1 task done
Assignees
Labels
area/ysql Yugabyte SQL (YSQL) kind/bug This issue is a bug priority/medium Medium priority issue

Comments

@pao214
Copy link
Contributor

pao214 commented Apr 25, 2024

Jira Link: DB-11085

Description

Context

Read time is either picked on docdb

Status ReadQuery::DoPickReadTime(server::Clock* clock) {
...
  const auto read_time_was_empty = !read_time_;
  if (read_time_was_empty) {
    safe_ht_to_read_ = VERIFY_RESULT(abstract_tablet_->SafeTime(require_lease_));
    // If the read time is not specified, then it is a single-shard read.
    // So we should restart it in server in case of failure.
    read_time_.read = safe_ht_to_read_;
    if (transactional()) {
      read_time_.global_limit = clock->MaxGlobalNow();
      read_time_.local_limit = std::min(safe_ht_to_read_, read_time_.global_limit);

      VLOG(1) << "Read time: " << read_time_.ToString();
    } else {
      read_time_.local_limit = read_time_.read;
      read_time_.global_limit = read_time_.read;
    }

or it is provided to docdb

Status ReadQuery::DoPickReadTime(server::Clock* clock) {
...
  const auto read_time_was_empty = !read_time_;
  if (read_time_was_empty) {
...
  } else {
    ...
    safe_ht_to_read_ =
        (current_safe_time > read_time_.read
             ? current_safe_time
             : VERIFY_RESULT(abstract_tablet_->SafeTime(
                   require_lease_, read_time_.read, context_.GetClientDeadline())));
  }

In the latter case, the local limit is set only for the second RPC to the tablet as part of the txn.

Status ReadQuery::Complete() {
...
    const auto result = VERIFY_RESULT(DoRead());
...
    read_time_ = result;
    // If read was successful, then restart time is invalid. Finishing.
    // (If a read restart was requested, then read_time would be set to the time at which we have
    // to restart.)
    if (!read_time_) {
      // allow_retry means that the read time was not set in the request and therefore we can
      // retry read restarts on the tablet server.
      if (!allow_retry_) {
        auto local_limit = std::min(safe_ht_to_read_, used_read_time_.global_limit);
        resp_->set_local_limit_ht(local_limit.ToUint64());
      }
      break;
    }

This is stated in c784595.

It starts off as global_limit for the first request to a tablet as part of the transaction, but then. for second and later requests to that tablet, is set to the safe time on that tablet returned to the YQL engine by the response to the first request.

but no reason was given for why local limit is not used for the first request. Moreover, this claim is not even accurate when the read time is picked on the tablet.

Motivation

local limit is most useful when the original time of the scanned intent is higher than this limit. Then, we can skip checking the uncertainty window because we have just established a causal relation: the read operation happens before the provisional write op. Similar logic applies to fast path writes that bypass the provisional write mechanism.

This local limit logic should be independent of whether the read time is picked on the tablet or provided to the tablet.

NOTE: The local limit is still picked as safe time of the tablet and not provided to docdb. So, the causal relation still holds.

Scenario 1

  • Read RPC arrived at tablet 1 from conn1 with a read time and no local_limit (first RPC).
  • A provisional intent from conn2 is replicated after the read RPC arrived.
  • This provisional intent is applied to regular docdb with intent_ht as the hybrid time of the provisional write.
  • The read RPC observed the regular record but no local limit was set. This leads to a read restart error.

The read clearly happened before the provisional write is raft-replicated. The operations are concurrent, implying no uncertainty.

Scenario 2

  • Read RPC arrived at tablet 1 from conn1 with a read time and no local_limit (first RPC).
  • Fast path insert happened from conn2 at tablet 1 after the read RPC arrived at tablet 1.
  • The read RPC observed the fast path insert and no local limit was set. This leads to a read restart error.

The read clearly happened before the write operation and there is no "real" uncertainty.

Objective

  1. Add a test case to guard against the above scenarios raising read restarts.
  2. Set local time irrespective of whether the read time is picked on the tablet or not.

Issue Type

kind/bug

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
@pao214 pao214 added area/ysql Yugabyte SQL (YSQL) status/awaiting-triage Issue awaiting triage labels Apr 25, 2024
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Apr 25, 2024
@pao214 pao214 moved this to In Review in Wait-Queue Based Locking Jun 10, 2024
pao214 added a commit that referenced this issue Jul 17, 2024
…s already set.

Summary:
### Motivation

Setting the local limit to current safe time will help prune writes that arrived after the read RPC arrived at a tablet.

### Impact

For a thorough overview of when the read time is provided to DocDB see pg_read_time-test.cc.

For a quick summary, the read time may be set before the read RPC arrives at a storage layer node when

1. This a fanout read such as a COUNT(*) query.
2. This is a multi-tablet read and the read time was already picked on some remote storage layer node.
3. This is a REPEATABLE READ transaction, and the read time has already been picked in a previous statement but we are scanning a different tablet.
4. Potentially other scenarios ...
Jira: DB-11085

Test Plan:
Jenkins

```
./yb_build.sh --cxx-test pg_local_limit_optimization-test --gtest_filter PgLocalLimitOptimizationTest.SinglePageScan
./yb_build.sh --cxx-test pg_local_limit_optimization-test --gtest_filter PgLocalLimitOptimizationTest.ReadTimePickedOnLocalProxy
./yb_build.sh --cxx-test pg_local_limit_optimization-test --gtest_filter PgLocalLimitOptimizationTest.ReadTimePickedOnRemoteNode
./yb_build.sh --cxx-test pg_local_limit_optimization-test --gtest_filter PgLocalLimitOptimizationTest.ReadTimePickedBeforeTableScan
```

Backport-through: 2.20

Reviewers: pjain, sergei

Reviewed By: pjain, sergei

Subscribers: yql, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D34561
jasonyb pushed a commit that referenced this issue Jul 17, 2024
Summary:
 0c8e378 [#23216]yugabyted: Adding required models for source db details and target recommendation details sub pages.
 133ff1c [#23210] yugabyted: yugabyted collect_logs fails to gather logs when yugabyted is not running.
 08ca9dd [#23196] DocDB: Handle preview flags in ValidateFlagValue
 13c0ced [#23215] docdb: disable packed for colocated tables by default
 6146084 [doc][ybm] Reorg to Aeon authentication pages (#23226)
 Excluded: 56d2d2d [#16408] YSQL: add gflag TEST_generate_ybrowid_sequentially
 c8e7530 [PLAT-14672] getSessionInfo should return a valid api token
 3abd045 [#22594] YSQL: Fix flaky TestTransactionStatusTable.testCreation unit test
 d67ba12 [#22158] YSQL: Set local limit as safe time even when the read time is already set.
 1315a10 [docs] PG compatible logical replication architecture (#23220)
 88e92a0 [#23216] yugabyted: Adding a new field ObjectName to model SqlObjectMetadata.

Test Plan: Jenkins: rebase: pg15-cherrypicks

Reviewers: jason, tfoucher

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D36661
@pao214 pao214 moved this from In Review to Backporting in Wait-Queue Based Locking Sep 4, 2024
@pao214
Copy link
Contributor Author

pao214 commented Sep 25, 2024

We do not plan to backport this change to avoid any unintentional changes in previous branches. Feel free to re-open if there is a valid use-case to backport.

@pao214 pao214 closed this as completed Sep 25, 2024
@github-project-automation github-project-automation bot moved this from Backporting to Done in Wait-Queue Based Locking Sep 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ysql Yugabyte SQL (YSQL) kind/bug This issue is a bug priority/medium Medium priority issue
Projects
Status: Done
Development

No branches or pull requests

3 participants