Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: lease acquisition can repeatedly contend with DDL statement when closed_timestamp target_duration and side_transport_interval are very low #89900

Open
stevendanna opened this issue Oct 13, 2022 · 7 comments
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@stevendanna
Copy link
Collaborator

stevendanna commented Oct 13, 2022

Describe the problem

A transaction with a DDL statement can contend with a lease-acquisition transaction when kv.closed_timestamp.target_duration and kv.closed_timestamp.side_transport_interval are set very low.

The following test, when run under stress, will typically hit a timeout if you set the per-test timeout to 3-5 minutes:

func TestMultipleGrantsLeaseStall(t *testing.T) {
	params := base.TestClusterArgs{}
	params.ServerArgs.Knobs = base.TestingKnobs{
		JobsTestingKnobs: jobs.NewTestingKnobsWithShortIntervals(),
	}
	settings := cluster.MakeTestingClusterSettings()
	closedts.TargetDuration.Override(context.Background(), &settings.SV, 10*time.Millisecond)
	closedts.SideTransportCloseInterval.Override(context.Background(), &settings.SV, 10*time.Millisecond)
	params.ServerArgs.Settings = settings
	tc := testcluster.StartTestCluster(t, 1, params)
	conn := tc.ServerConn(0)
	_, err := conn.Exec(`
CREATE USER user1;
CREATE USER testuser;
`)
	require.NoError(t, err)
	_, err = conn.Exec(`
CREATE DATABASE testdb;
USE testdb;
CREATE TYPE testdb.greeting_usage AS ENUM ('howdy');
CREATE TABLE testdb.testtable_simple (a int);
CREATE TABLE testdb.testtable_greeting_usage (a greeting_usage);
CREATE SCHEMA sc;
CREATE TABLE testdb.sc.othertable (a INT);
`)
	require.NoError(t, err)

	_, err = conn.Exec(`
GRANT ALL ON DATABASE testdb TO user1 WITH GRANT OPTION;
GRANT ALL ON SCHEMA public TO user1 WITH GRANT OPTION;
GRANT USAGE ON SCHEMA sc TO user1;
GRANT SELECT ON testdb.sc.othertable TO user1;
`)
	require.NoError(t, err)
}

When running:

GRANT ALL ON DATABASE testdb TO user1 WITH GRANT OPTION;
GRANT ALL ON SCHEMA public TO user1 WITH GRANT OPTION;
GRANT USAGE ON SCHEMA sc TO user1;
GRANT SELECT ON testdb.sc.othertable TO user1;

The first statement writes to the descriptor for testdb. The last statement results in a lease acquisition attempt on testdb. That least acquisition attempt runs in another txn that pushes the txn containing the DDL statements. But, it seems that it never pushes it far enough.

Jira issue: CRDB-20491

@stevendanna stevendanna added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-schema-deprecated Use T-sql-foundations instead labels Oct 13, 2022
stevendanna added a commit to stevendanna/cockroach that referenced this issue Oct 13, 2022
This test was skipped because it timed out.

I believe cockroachdb#89900 is the likely cause of the timeout. Since this test
doesn't depend on the shorter closed timestamp setting, we can reset
them to make the timeout much less likely.

Fixes cockroachdb#87129

Release note: None
craig bot pushed a commit that referenced this issue Oct 17, 2022
89901: backupccl: unskip restore-grants r=adityamaru a=stevendanna

This test was skipped because it timed out.

I believe #89900 is the likely cause of the timeout. Since this test doesn't depend on the shorter closed timestamp setting, we can reset them to make the timeout much less likely.

Fixes #87129

Release note: None

Co-authored-by: Steven Danna <[email protected]>
@ajwerner
Copy link
Contributor

Hold on, the bug here is that the last statement acquires a lease on testdb. That shouldn't happen if we've already modified testdb in the current transaction. That's a severe bug.

@postamar postamar self-assigned this Oct 18, 2022
@stevendanna
Copy link
Collaborator Author

stevendanna commented Oct 21, 2022

@ajwerner Should this be a GA blocker then? I haven't tested it on other branches.

@postamar
Copy link
Contributor

I'm unable to repro this. Putting this back in triage because I don't have time to look into this further right now.

@postamar postamar removed their assignment Oct 21, 2022
@stevendanna
Copy link
Collaborator Author

stevendanna commented Oct 21, 2022

@postamar I just tried this again, and for reasons that aren't clear to me, this is nearly impossible to reproduce for me on my macOS. But on a gceworker I can reproduce it with:

./dev test ./pkg/sql --filter TestMultipleGrantsLeaseStall --show-logs --verbose --stress --test-args='-test.timeout 3m' --stress-args '-p 3' 2>&1 | tee stresslog5.out

Note that the -test.timeout 3m is needed because I think often the condition clears eventually.

Anyway, I trust y'alls judgement on whether this is worth looking into, just mentioned this since I found it odd that I wasn't able to reproduce it for a bit either.

@ajwerner
Copy link
Contributor

ajwerner commented Nov 2, 2022

I wonder if this is caused by #91116

@stevendanna
Copy link
Collaborator Author

@ajwerner Well, it appears to be fixed on master, but git-bisect seems to be pointing at 1c11f2c as the place it was fixed which doesn't yet make a whole lot of sense to me.

@ajwerner
Copy link
Contributor

That definitely surprises me. I'd expect it to be #91116.

@exalate-issue-sync exalate-issue-sync bot added T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) and removed T-sql-schema-deprecated Use T-sql-foundations instead labels May 10, 2023
stevendanna added a commit to stevendanna/cockroach that referenced this issue Jul 4, 2023
This test passed under stress for 5 hours, which seems sufficient to
unskip it.

The underlying issue described in cockroachdb#89900 appears to be much harder to
reproduce and may now be resolved.

Informs cockroachdb#89900
Fixes cockroachdb#90444

Epic: none

Release note: None
craig bot pushed a commit that referenced this issue Jul 6, 2023
106115: backupccl: enable restore-grants test on tenant r=rhu713 a=stevendanna

This test passed under stress for 5 hours, which seems sufficient to unskip it.

The underlying issue described in #89900 appears to be much harder to reproduce and may now be resolved.

Informs #89900
Fixes #90444

Epic: none

Release note: None

Co-authored-by: Steven Danna <[email protected]>
blathers-crl bot pushed a commit that referenced this issue Jul 6, 2023
This test passed under stress for 5 hours, which seems sufficient to
unskip it.

The underlying issue described in #89900 appears to be much harder to
reproduce and may now be resolved.

Informs #89900
Fixes #90444

Epic: none

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)
Projects
None yet
Development

No branches or pull requests

3 participants