Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

logictest: protected timestamp record does not exist #92493

Closed
cockroach-teamcity opened this issue Nov 25, 2022 · 2 comments · Fixed by #92586
Closed

logictest: protected timestamp record does not exist #92493

cockroach-teamcity opened this issue Nov 25, 2022 · 2 comments · Fixed by #92586
Assignees
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)
Milestone

Comments

@cockroach-teamcity
Copy link
Member

cockroach-teamcity commented Nov 25, 2022

pkg/sql/logictest/tests/fakedist/fakedist_test.TestLogic_alter_primary_key failed with artifacts on master @ b5be006bedd7d3cedc3fb3d2248df168e3d64be2:

=== RUN   TestLogic_alter_primary_key
    test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/d549810e45034f43a9494d02b085e623/logTestLogic_alter_primary_key1996763357
    test_log_scope.go:79: use -show-logs to present logs inline
=== CONT  TestLogic_alter_primary_key
    logic.go:4129: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/d549810e45034f43a9494d02b085e623/logTestLogic_alter_primary_key1996763357
--- FAIL: TestLogic_alter_primary_key (7.53s)
=== RUN   TestLogic_alter_primary_key/index_rewrites
    logic.go:3182: 
        
        /home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/2386/execroot/com_github_cockroachdb_cockroach/bazel-out/k8-fastbuild/bin/pkg/sql/logictest/tests/fakedist/fakedist_test_/fakedist_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/alter_primary_key:258: 
        expected success, but found
        (XXUUU) job 816917427264749569: protected timestamp record does not exist
        protectedts.go:31: in init()
    logic.go:2144: 
         pq: job 816917427264749569: protected timestamp record does not exist
[10:56:54] --- done: /home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/2386/execroot/com_github_cockroachdb_cockroach/bazel-out/k8-fastbuild/bin/pkg/sql/logictest/tests/fakedist/fakedist_test_/fakedist_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/alter_primary_key with config fakedist: 60 tests, 2 failures
[10:56:55] --- total progress: 60 statements
--- total: 60 tests, 2 failures
    --- FAIL: TestLogic_alter_primary_key/index_rewrites (0.92s)

Parameters: TAGS=bazel,gss

Help

See also: How To Investigate a Go Test Failure (internal)

/cc @cockroachdb/sql-queries

This test on roachdash | Improve this report!

Jira issue: CRDB-21810

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. labels Nov 25, 2022
@cockroach-teamcity cockroach-teamcity added this to the 23.1 milestone Nov 25, 2022
@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label Nov 25, 2022
@stevendanna
Copy link
Collaborator

I've hit this in a bors run as well. It reproduces very quickly for me under stress.

@fqazi fqazi self-assigned this Nov 28, 2022
fqazi added a commit to fqazi/cockroach that referenced this issue Nov 28, 2022
Fixes: cockroachdb#92493

Previously, we added the protected timestamps manager into the jobs
frame work, which made it easier to automatically add and remove
protected timestamps for jobs. Unfortunately, the Unprotect API
when invoked from a clean up function never properly refreshed
the job. Which could lead to a race condition trying to remove
protected timestamps for schema changes. To address this, the
Unprotect function will take advantage of the job update function
to confirm that a refreshed copy does have protected timestamps
set.

Release note: None
@blathers-crl blathers-crl bot added the T-sql-schema-deprecated Use T-sql-foundations instead label Nov 28, 2022
@yuzefovich yuzefovich removed the T-sql-queries SQL Queries Team label Nov 28, 2022
craig bot pushed a commit that referenced this issue Nov 28, 2022
92466: settings,spanconfig: introduce a protobuf setting type r=irfansharif a=irfansharif

For this setting type:
- the `protoutil.Message` is held in memory,
- the byte representation is stored in `system.settings`, and
- the json representation is used when accepting input and rendering state (through `SHOW CLUSTER SETTING <setting-name>`, the raw form is visible when looking directly at `system.settings`)

We also use this setting type to support power a `spanconfig.store.fallback_config_override`, which overrides the fallback config used for ranges with no explicit span configs set. Previously we used a hardcoded value -- this makes it a bit more configurable. This is a partial and backportable workaround (read: hack) for #91238 and \#91239. In an internal escalation we observed "orphaned" ranges from dropped tables that were not being being referenced by span configs (by virtue of them originating from now-dropped tables/configs). Typically ranges of this sort are short-lived, they're emptied out through GC and merged into LHS neighbors. But if the neighboring ranges are large enough, or load just high enough, the merge does not occur. For such orphaned ranges we were using a hardcoded "fallback config", with a replication factor of three. This made for confusing semantics where if `RANGE DEFAULT` was configured to have a replication factor of five, our replication reports indicated there were under-replicated ranges. This is partly because replication reports today are powered by zone configs (thus looking at `RANGE DEFAULT`) -- this will change shortly as part of \#89987 where we'll instead consider span config data. In any case, we were warning users of under-replicated ranges but within KV we were not taking any action to upreplicate them -- KV was respecting the hard-coded fallback config. The issues above describe that we should apply each tenant's `RANGE DEFAULT` config to all such orphaned ranges, which is probably the right fix. This was alluded to in an earlier TODO but is still left for future work.

```go
  // TODO(irfansharif): We're using a static[1] fallback span config
  // here, we could instead have this directly track the host tenant's
  // RANGE DEFAULT config, or go a step further and use the tenant's own
  // RANGE DEFAULT instead if the key is within the tenant's keyspace.
  // We'd have to thread that through the KVAccessor interface by
  // reserving special keys for these default configs.
  //
  // [1]: Modulo the private spanconfig.store.fallback_config_override, which
  //      applies globally.
```

So this PR instead takes a shortcut -- it makes the static config configurable through a cluster setting. We can now do the following which alters what fallback config is applied to orphaned ranges, and in our example above, force such ranges to also have a replication factor of five.

```sql
  SET CLUSTER SETTING spanconfig.store.fallback_config_override = '
    {
      "gcPolicy": {"ttlSeconds": 3600},
      "numReplicas": 5,
      "rangeMaxBytes": "536870912",
      "rangeMinBytes": "134217728"
    }';
```

Release note: None


92585: roachtest: reduce frequency of index-overload test r=irfansharif a=irfansharif

Release note: None

92586: jobs: refresh job details before removing protected timestamps r=fqazi a=fqazi

Fixes: #92493

Previously, we added the protected timestamps manager into the jobs frame work, which made it easier to automatically add and remove protected timestamps for jobs. Unfortunately, the Unprotect API when invoked from a clean up function never properly refreshed the job. Which could lead to a race condition trying to remove protected timestamps for schema changes. To address this, the Unprotect function will take advantage of the job update function to confirm that a refreshed copy does have protected timestamps set.

Release note: None

92593: logictest: avoid a lint failure r=andreimatei a=knz

Fixes #92592

The go compiler treats calls to `bazel.BuiltWithBazel()` as a
compile-time constant. Therefore,

```go
if !bazel.BuiltWithBazel() {
   skip.XXX()
}
```

is considered to always terminate execution (because `skip` does its
job by raising a panic), and any code coming after that is treated as
dead/unreachable.

92599: multitenant: merge serverpb.RegionsServer into serverpb.TenantStatusServer r=knz a=ecwall

Fixes #92598

Merge these 2 interfaces in anticipation for future work where more tenant functionality will be added to TenantStatusServer instead of creating an interface per function.

Release note: None

Co-authored-by: irfan sharif <[email protected]>
Co-authored-by: Faizan Qazi <[email protected]>
Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: Evan Wall <[email protected]>
@craig craig bot closed this as completed in 02867ed Nov 28, 2022
@exalate-issue-sync exalate-issue-sync bot added T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) and removed T-sql-schema-deprecated Use T-sql-foundations instead labels May 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants