pkg/sql/gcjob_test/gcjob_test_test: TestGCJobRetry failed #110447

Closed
cockroach-teamcity opened this issue Sep 12, 2023 · 8 comments
Labels
branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)
Comments

@cockroach-teamcity
Member

cockroach-teamcity commented Sep 12, 2023

pkg/sql/gcjob_test/gcjob_test_test.TestGCJobRetry failed with artifacts on release-23.1 @ ef138f991b2e1f3707e0ad8f7294cbcaf686b3f7:

=== RUN   TestGCJobRetry
    test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/4d64db87b2aa0e9fccbce189bc4b8745/logTestGCJobRetry263342383
    test_log_scope.go:79: use -show-logs to present logs inline
    sql_runner.go:104: 
        	Error Trace:	github.com/cockroachdb/cockroach/pkg/testutils/sqlutils/sql_runner.go:117
        	            				github.com/cockroachdb/cockroach/pkg/testutils/sqlutils/sql_runner.go:312
        	            				pkg/sql/gcjob_test/gcjob_test_test_test/pkg/sql/gcjob_test/gc_job_test.go:321
        	Error:      	Received unexpected error:
        	            	query 'SELECT running_status FROM crdb_internal.jobs WHERE job_id = 899351928106287105': expected:
        	            	(1) attached stack trace
        	            	  -- stack trace:
        	            	  | github.com/cockroachdb/cockroach/pkg/testutils/sqlutils.(*SQLRunner).CheckQueryResultsRetry.func1
        	            	  | 	github.com/cockroachdb/cockroach/pkg/testutils/sqlutils/sql_runner.go:315
        	            	  | github.com/cockroachdb/cockroach/pkg/testutils.SucceedsWithinError.func1
        	            	  | 	github.com/cockroachdb/cockroach/pkg/testutils/soon.go:71
        	            	  | github.com/cockroachdb/cockroach/pkg/util/retry.ForDuration
        	            	  | 	github.com/cockroachdb/cockroach/pkg/util/retry/retry.go:213
        	            	  | github.com/cockroachdb/cockroach/pkg/testutils.SucceedsWithinError
        	            	  | 	github.com/cockroachdb/cockroach/pkg/testutils/soon.go:77
        	            	  | github.com/cockroachdb/cockroach/pkg/testutils/sqlutils.(*SQLRunner).succeedsWithin
        	            	  | 	github.com/cockroachdb/cockroach/pkg/testutils/sqlutils/sql_runner.go:117
        	            	  | github.com/cockroachdb/cockroach/pkg/testutils/sqlutils.(*SQLRunner).CheckQueryResultsRetry
        	            	  | 	github.com/cockroachdb/cockroach/pkg/testutils/sqlutils/sql_runner.go:312
        	            	  | pkg/sql/gcjob_test/gcjob_test_test_test.TestGCJobRetry
        	            	  | 	pkg/sql/gcjob_test/gcjob_test_test_test/pkg/sql/gcjob_test/gc_job_test.go:321
        	            	  | testing.tRunner
        	            	  | 	GOROOT/src/testing/testing.go:1446
        	            	  | runtime.goexit
        	            	  | 	GOROOT/src/runtime/asm_amd64.s:1594
        	            	Wraps: (2) query 'SELECT running_status FROM crdb_internal.jobs WHERE job_id = 899351928106287105': expected:
        	            	  | waiting for MVCC GC
        	            	  | got:
        	            	  | NULL
        	            	Error types: (1) *withstack.withStack (2) *errutil.leafError
    panic.go:522: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/4d64db87b2aa0e9fccbce189bc4b8745/logTestGCJobRetry263342383
--- FAIL: TestGCJobRetry (54.70s)

Parameters: TAGS=bazel,gss,deadlock

Help

See also: How To Investigate a Go Test Failure (internal)

/cc @cockroachdb/sql-foundations

This test on roachdash | Improve this report!

Jira issue: CRDB-31432

@cockroach-teamcity cockroach-teamcity added branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) labels Sep 12, 2023
@cockroach-teamcity cockroach-teamcity added this to the 23.1 milestone Sep 12, 2023
@rafiss
Collaborator

rafiss commented Sep 12, 2023

Let's fix this by modifying the SucceedsSoonDuration for CheckQueryResultsRetry.

@rafiss rafiss removed the release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. label Sep 13, 2023
@cockroach-teamcity
Member Author

pkg/sql/gcjob_test/gcjob_test_test.TestGCJobRetry failed with artifacts on release-23.1 @ b08ec859debd7530b48f71405a700961596b3563:

=== RUN   TestGCJobRetry
    test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/4d64db87b2aa0e9fccbce189bc4b8745/logTestGCJobRetry3074162055
    test_log_scope.go:79: use -show-logs to present logs inline
    sql_runner.go:105: 
        	Error Trace:	github.com/cockroachdb/cockroach/pkg/testutils/sqlutils/sql_runner.go:118
        	            				github.com/cockroachdb/cockroach/pkg/testutils/sqlutils/sql_runner.go:331
        	            				pkg/sql/gcjob_test/gcjob_test_test_test/pkg/sql/gcjob_test/gc_job_test.go:321
        	Error:      	Received unexpected error:
        	            	query 'SELECT running_status FROM crdb_internal.jobs WHERE job_id = 903314597558714369': expected:
        	            	(1) attached stack trace
        	            	  -- stack trace:
        	            	  | github.com/cockroachdb/cockroach/pkg/testutils/sqlutils.(*SQLRunner).CheckQueryResultsRetry.func1
        	            	  | 	github.com/cockroachdb/cockroach/pkg/testutils/sqlutils/sql_runner.go:334
        	            	  | github.com/cockroachdb/cockroach/pkg/testutils.SucceedsWithinError.func1
        	            	  | 	github.com/cockroachdb/cockroach/pkg/testutils/soon.go:71
        	            	  | github.com/cockroachdb/cockroach/pkg/util/retry.ForDuration
        	            	  | 	github.com/cockroachdb/cockroach/pkg/util/retry/retry.go:213
        	            	  | github.com/cockroachdb/cockroach/pkg/testutils.SucceedsWithinError
        	            	  | 	github.com/cockroachdb/cockroach/pkg/testutils/soon.go:77
        	            	  | github.com/cockroachdb/cockroach/pkg/testutils/sqlutils.(*SQLRunner).succeedsWithin
        	            	  | 	github.com/cockroachdb/cockroach/pkg/testutils/sqlutils/sql_runner.go:118
        	            	  | github.com/cockroachdb/cockroach/pkg/testutils/sqlutils.(*SQLRunner).CheckQueryResultsRetry
        	            	  | 	github.com/cockroachdb/cockroach/pkg/testutils/sqlutils/sql_runner.go:331
        	            	  | pkg/sql/gcjob_test/gcjob_test_test_test.TestGCJobRetry
        	            	  | 	pkg/sql/gcjob_test/gcjob_test_test_test/pkg/sql/gcjob_test/gc_job_test.go:321
        	            	  | testing.tRunner
        	            	  | 	GOROOT/src/testing/testing.go:1446
        	            	  | runtime.goexit
        	            	  | 	GOROOT/src/runtime/asm_amd64.s:1594
        	            	Wraps: (2) query 'SELECT running_status FROM crdb_internal.jobs WHERE job_id = 903314597558714369': expected:
        	            	  | waiting for MVCC GC
        	            	  | got:
        	            	  | NULL
        	            	Error types: (1) *withstack.withStack (2) *errutil.leafError
    panic.go:522: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/4d64db87b2aa0e9fccbce189bc4b8745/logTestGCJobRetry3074162055
--- FAIL: TestGCJobRetry (56.66s)

Parameters: TAGS=bazel,gss,deadlock

Help

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

@andyyang890
Collaborator

andyyang890 commented Sep 27, 2023

In the test logs for the most recent failure, I see this warning:

W230926 14:55:40.011849 19214910 sql/crdb_internal.go:1214 ⋮ [T1,n1,client=127.0.0.1:60110,hostssl,user=root] 224  error closing an iterator: TransactionRetryWithProtoRefreshError: ReadWithinUncertaintyIntervalError: read at time 1695740139.884269752,0 encountered previous write with future timestamp 1695740139.884269752,1 (local=1695740139.852676344,1) within uncertainty interval `t <= (local=1695740139.884269752,0, global=1695740140.384269752,0)`; observed timestamps: [{1 1695740139.884269752,0}]: "sql txn" meta={id=09646fed key=/Min pri=0.00723680 epo=0 ts=1695740139.884269752,0 min=1695740139.884269752,0 seq=0} lock=false stat=PENDING rts=1695740139.884269752,0 wto=false gul=1695740140.384269752,0
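The wrapped ReadWithinUncertaintyIntervalError is an uncertainty restart. As a simplified model (an assumption: it ignores the local limit derived from observed timestamps, which the real check also consults), a read at timestamp `r` must treat a value at timestamp `v` as uncertain when `r < v <= globalLimit`, because with clock skew the write may have happened before the read in real time. Plugging in the timestamps from the warning shows the write (`…,1`) lands one logical tick above the read (`…,0`), inside the window, and that the global limit sits 500ms above the read timestamp. The `hlcTimestamp` type and `isUncertain` function are hypothetical and are not CockroachDB's `hlc` or `uncertainty` packages.

```go
package main

import "fmt"

// hlcTimestamp is a simplified hybrid-logical-clock timestamp: wall time in
// nanoseconds plus a logical counter, ordered lexicographically.
type hlcTimestamp struct {
	WallTime int64
	Logical  int32
}

// less reports whether t orders strictly before u.
func (t hlcTimestamp) less(u hlcTimestamp) bool {
	return t.WallTime < u.WallTime ||
		(t.WallTime == u.WallTime && t.Logical < u.Logical)
}

// isUncertain is a simplified uncertainty check: a value written after the
// read timestamp but at or below the global uncertainty limit might have
// happened before the read in real time, so the read cannot ignore it and
// must be retried at a higher timestamp. Local limits are not modeled.
func isUncertain(readTS, valueTS, globalLimit hlcTimestamp) bool {
	return readTS.less(valueTS) && !globalLimit.less(valueTS)
}

func main() {
	// Timestamps copied from the warning above.
	read := hlcTimestamp{1695740139884269752, 0}
	write := hlcTimestamp{1695740139884269752, 1}
	global := hlcTimestamp{1695740140384269752, 0}
	fmt.Println(isUncertain(read, write, global))
	fmt.Println(global.WallTime - read.WallTime) // 500000000 ns = 500ms
}
```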

This error comes from makeJobsTableRows, which is used to populate the crdb_internal.jobs virtual table:

if err := it.Close(); err != nil {
	// TODO(yuzefovich): this error should be propagated further up
	// and not simply being logged. Fix it (#61123).
	//
	// Doing that as a return parameter would require changes to
	// `planNode.Close` signature which is a bit annoying. One other
	// possible solution is to panic here and catch the error
	// somewhere.
	log.Warningf(ctx, "error closing an iterator: %v", err)
}

I'm guessing this explains why we don't see a row for the job, which leads to the test failing.

It also seems somewhat problematic that when we fail to populate a virtual table, we just log a warning and nothing else happens (i.e., the population is not retried and the error is not returned to the caller). (Edit: Ah, I see #61123 is already filed for this.)

@yuzefovich
Member

Seems like #107285 fixed a similar issue - are we missing a backport?

@andyyang890
Copy link
Collaborator

Thanks for the pointer! I'm not able to reproduce this under stress, but I'll try backporting that PR and seeing if that stabilizes things.

@rimadeodhar
Copy link
Collaborator

@andyyang890 it seems like we can close this out now that the backport is merged. Or are you still waiting to confirm that the backport has fixed the test?

@andyyang890
Copy link
Collaborator

Yeah, ideally I'd like to confirm that the backport has fixed the test, but considering it's such a rare flake, I'm not sure how long we'd have to wait to be confident that it's fixed. I'd be okay with closing it if you think that's fine.

@rimadeodhar
Collaborator

Yeah, let's close it in that case. If it flakes again, the bot will open a new issue.

craig bot pushed a commit that referenced this issue Oct 5, 2023
110930: gcjob_test: add more logging to TestGCJobRetry r=rafiss a=andyyang890

This patch adds more logging to `TestGCJobRetry` to help debug
occasional flakes.

Informs: #110447

Release note: None

111510: api: increase timeout to request execution details r=maryliag a=adityamaru

In large clusters requesting execution details can definitely take longer than 5 seconds. This is because it involves collecting cluster wide traces, goroutines and contacting the coordinator node of the job to dump its execution details. Since this is a debug only tool we bump the timeout to 300s to give it all the time it needs.

Epic: none
Release note: None

111701: sql: add tests for privileges for statements in udfs r=rharding6373 a=rharding6373

This PR adds test coverage for privileges in UDFs, e.g., SELECT and INSERT privileges.

Epic: CRDB-25388
Informs: #87289

Release note: None

111704: Roachtest azure nightly r=healthy-pod,herkolategan a=smg260

These are a series of commits to enable roachtests to run on Azure in TeamCity. 

1. Add the relevant teamcity invoke script
2. Update authentication to look in CLI or environment for dev and TC respectively
3. Look for a default subscription in the environment, with fallback to existing "pick first" implementation
4. Add a security rule to allow roachtest host machine to connect to a vm via kafka admin
5. Update azure default location to one with more quota and `apt-get update` before installing go for a cdc test (failed on azure)

A follow up PR will enable an initial set of roachtests to run.

Epic: CC-25185
Release note: none

111776: go.mod: bump Pebble to b013ca78e9dc r=RaduBerinde a=RaduBerinde

b013ca78 db: keep track of virtual sstable size sum
62251e69 db: make checkpoint test even more deterministic
c7c47d6b db: turn testingAlwaysWaitForCleanup into an option
a05b0192 db: keep track of virtual sstable count in metrics
3c778710 db: add test for virtual sstable checkpointing
cb4dab66 db: add metrics for num backing sstables and size
8317cf38 db: incrementally keep tracking of backing table size
0f80e184 Update index.html
aa077af6 go.mod: specify Go 1.20
ccb9a7dc manifest: add invariant check for duplicate file backings
699fc0e8 db: only create one CreatedBackingTables entry per sstable
b2da10c6 db: remove trailing whitespace from compacting log line
1d696c79 db: cleanup btree obsoletion logic

Fixes: #111674
Release note: none

Co-authored-by: Andy Yang <[email protected]>
Co-authored-by: adityamaru <[email protected]>
Co-authored-by: rharding6373 <[email protected]>
Co-authored-by: Miral Gadani <[email protected]>
Co-authored-by: Radu Berinde <[email protected]>