Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: TestRaceWithBackfill failed #104655

Closed
cockroach-teamcity opened this issue Jun 9, 2023 · 5 comments · Fixed by #105069
Closed

sql: TestRaceWithBackfill failed #104655

cockroach-teamcity opened this issue Jun 9, 2023 · 5 comments · Fixed by #105069
Assignees
Labels
A-kv Anything in KV that doesn't belong in a more specific category. A-kv-transactions Relating to MVCC and the transactional model. A-testing Testing tools and infrastructure branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. T-sql-queries SQL Queries Team
Milestone

Comments

@cockroach-teamcity
Copy link
Member

cockroach-teamcity commented Jun 9, 2023

sql.TestRaceWithBackfill failed with artifacts on release-22.2 @ f38ef200f5f1d9f87fba0d5f2956b7fdd7c539b1:

=== RUN   TestRaceWithBackfill
    test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/b287de7194e17a0386ee1775f0d198ba/logTestRaceWithBackfill1910564040
    test_log_scope.go:79: use -show-logs to present logs inline
    schema_changer_test.go:337: schema change ALTER TABLE t.test ADD COLUMN x DECIMAL DEFAULT (DECIMAL '1.4') CHECK (x >= 0) took 7.539652455s
    schema_changer_test.go:337: schema change ALTER TABLE t.test DROP pi took 3.96684779s
    schema_changer_test.go:337: schema change CREATE UNIQUE INDEX foo ON t.test (v) took 12.97907424s
    schema_changer_test.go:337: schema change CREATE INDEX bar ON t.test(k) STORING (v) took 9.572502337s
    schema_changer_test.go:417: pq: scrub-index: batch timestamp 1686301951.849521007,0 must be after replica GC threshold 1686301963.443382897,0
    panic.go:522: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/b287de7194e17a0386ee1775f0d198ba/logTestRaceWithBackfill1910564040
--- FAIL: TestRaceWithBackfill (122.29s)

Parameters: TAGS=bazel,gss,deadlock

Help

See also: How To Investigate a Go Test Failure (internal)

/cc @cockroachdb/sql-foundations

This test on roachdash | Improve this report!

Jira issue: CRDB-28655

@cockroach-teamcity cockroach-teamcity added branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) labels Jun 9, 2023
@cockroach-teamcity cockroach-teamcity added this to the 22.2 milestone Jun 9, 2023
@rafiss rafiss removed the T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) label Jun 13, 2023
@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label Jun 13, 2023
@rharding6373
Copy link
Collaborator

The index scrub query is issued as an internal transaction and looks something like:

//	SELECT pri.k  pri.l, pri.a, pri.b,
//	       sec.k, sec.l, sec.a, sec.b
//	FROM
//	  (SELECT k, l, a, b FROM [tbl_id AS table_pri]@{FORCE_INDEX=[pri_idx_id]}) AS pri
//	FULL OUTER JOIN
//	  (SELECT k, l, a, b FROM [tbl_id AS table_sec]@{FORCE_INDEX=[idx_id]} AS sec
//	ON
//	  pri.k = sec.k AND
//	  pri.l = sec.l AND
//	  pri.a IS NOT DISTINCT FROM sec.a AND
//	  pri.b IS NOT DISTINCT FROM sec.b
//	WHERE
//	  pri.k IS NULL OR sec.k IS NUL

where k,l are columns in the primary key and a, b are columns in an index. This does a full scan of both the primary index and the secondary index.

From comments in kv, it looks like this error comes up if a request's read timestamp is below the range's GC threshold. I think KV will need to look into why it was unable to produce a good read timestamp and advise if there are ways that we can make this query more resilient.

@blathers-crl blathers-crl bot added the T-kv KV Team label Jun 13, 2023
@rharding6373 rharding6373 removed their assignment Jun 13, 2023
@pav-kv
Copy link
Collaborator

pav-kv commented Jun 16, 2023

@rharding6373 Do you happen to have the logs/artifacts? They are no longer present in TeamCity.

@pav-kv pav-kv added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-kv Anything in KV that doesn't belong in a more specific category. A-kv-transactions Relating to MVCC and the transactional model. labels Jun 16, 2023
@pav-kv
Copy link
Collaborator

pav-kv commented Jun 16, 2023

Batch timestamp is Friday, 9 June 2023 09:12:31.849 GMT, which is ~12s before the replica GC threshold Friday, 9 June 2023 09:12:43.443 GMT.

I wonder why the GC threshold was not like 24h ago, but almost "right now" while the test was running. Probably because of this:

// Add a zone config for the table so that garbage collection happens rapidly.
if _, err := sqltestutils.AddImmediateGCZoneConfig(sqlDB, tableDesc.GetID()); err != nil {

The AddImmediateGCZoneConfig func comment says it must be used with disabled GC TTL enforcement, which we don't seem to do in this test. It is used in other tests, like this:

// Disable strict GC TTL enforcement because we're going to shove a zero-value
// TTL into the system with AddImmediateGCZoneConfig.
defer sqltestutils.DisableGCTTLStrictEnforcement(t, sqlDB)()

Fix should probably be as simple as adding this call.

@pav-kv
Copy link
Collaborator

pav-kv commented Jun 16, 2023

Returning this back to SQL as this seems like a bug in the test.

@pav-kv pav-kv added the A-testing Testing tools and infrastructure label Jun 16, 2023
@pav-kv pav-kv removed the T-kv KV Team label Jun 16, 2023
@rharding6373
Copy link
Collaborator

Thanks @pavelkalinnikov for the pointers! I'll put a fix out now.

craig bot pushed a commit that referenced this issue Jun 20, 2023
104763: sql: rationalize the handling of comments and improve performance r=rafiss a=rafiss

rebase of #95431
fixes #102851

Prior to this patch, we had a weird translation going in
`kv_catalog_comments`, which was causing the tree of objects to be
traversed 6 times (one per comment type), and then the rows to be
mapped one by one again in `pg_description` / `pg_shdescription`.

This made no sense - it should be possible to push a predicate on
`pg_description` down to a virtual index on `kv_catalog_comments`, and
this requires the column types to align and no computation to happen
in-between.

This commit simplifies all this by having `kv_catalog_comments`
responsible for preparing the data in the format expected by the `pg_`
tables.

The number of roundtrips needed for an ORM query that uses
pg_description dropped to 4, from 134.

```
goos: linux
goarch: amd64
cpu: Intel(R) Xeon(R) CPU @ 2.80GHz
                                             │ /tmp/tmp.0yUMg6OauX/bench.6abe1851d83 │ /tmp/tmp.0yUMg6OauX/bench.89a494eb3010b945ae5a2be7a057e50802b231c0 │
                                             │                sec/op                 │                   sec/op                    vs base                │
ORMQueries/introspection_description_join-24                           10687.1m ± 1%                                  144.1m ± 8%  -98.65% (p=0.000 n=10)

                                             │ /tmp/tmp.0yUMg6OauX/bench.6abe1851d83 │ /tmp/tmp.0yUMg6OauX/bench.89a494eb3010b945ae5a2be7a057e50802b231c0 │
                                             │              roundtrips               │                 roundtrips                  vs base                │
ORMQueries/introspection_description_join-24                            134.000 ± 0%                                   4.000 ± 0%  -97.01% (p=0.000 n=10)

                                             │ /tmp/tmp.0yUMg6OauX/bench.6abe1851d83 │ /tmp/tmp.0yUMg6OauX/bench.89a494eb3010b945ae5a2be7a057e50802b231c0 │
                                             │                 B/op                  │                    B/op                     vs base                │
ORMQueries/introspection_description_join-24                          3846.69Mi ± 0%                                 38.98Mi ± 6%  -98.99% (p=0.000 n=10)

                                             │ /tmp/tmp.0yUMg6OauX/bench.6abe1851d83 │ /tmp/tmp.0yUMg6OauX/bench.89a494eb3010b945ae5a2be7a057e50802b231c0 │
                                             │               allocs/op               │                 allocs/op                   vs base                │
ORMQueries/introspection_description_join-24                           45566.9k ± 0%                                  433.3k ± 6%  -99.05% (p=0.000 n=10)
```

Release note: None

105069: sql: disable strict gc ttl enforcement in schema changer test r=rharding6373 a=rharding6373

Before this change, the TestRaceWithBackfill test would add a zone config to force garbage collection to happen frequently, however this could cause a race between batch timestamps and garbage collection thresholds. This change adds a call to `DisableGCTTLStrictEnforcement`, which is supposed to be called before updating the zone config with `AddImmediateGCZoneConfig` for testing purposes.

Epic: None
Fixes: #104655

Release note: None

105156: roachtest: fix tpchvec in some cases r=yuzefovich a=yuzefovich

We recently introduced a minor bug in the tpchvec in c5c7ead where could use uninitialized helper to parse the test output, which can lead to a nil pointer when the slowness threshold is exceeded, and this is now fixed.

Epic: None

Release note: None

Co-authored-by: Rafi Shamim <[email protected]>
Co-authored-by: rharding6373 <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
@craig craig bot closed this as completed in c602190 Jun 20, 2023
@mgartner mgartner moved this to Done in SQL Queries Jul 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv Anything in KV that doesn't belong in a more specific category. A-kv-transactions Relating to MVCC and the transactional model. A-testing Testing tools and infrastructure branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. T-sql-queries SQL Queries Team
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

4 participants