Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

streamingccl: deflake TestRandomClientGeneration #100813

Merged
merged 1 commit into from
Apr 10, 2023

Conversation

msbutler
Copy link
Collaborator

@msbutler msbutler commented Apr 6, 2023

This patch fixes 4 bugs in TestRandomClientGeneration that were responsible for the persistent flakiness and lack of coverage in this test:

  • the randomeStreamClient no longer instantiates keys with a table prefix that collides with the job info table prefix. This collision was the original cause of the flakes reported in ccl/streamingccl/streamingest: TestRandomClientGeneration failed #99343.
  • getPartitionSpanToTableId() now generates a correct map from source partition key space to table Id. Previously, the key spans in the map didn't contain keys that mapped to anything logical in the cockroach key space.
  • assertKVs() now checks for keys in the destination tenant keyspace.
  • assertKVs() now actually asserts that kvs were found. Before, the assertion could pass if no keys were actually checked, which has been happening for months and allowed the bugs above to infest this test.

Fixes #99343

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@msbutler msbutler self-assigned this Apr 6, 2023
@msbutler msbutler force-pushed the butler-c2c-TestClient branch from d9a09f6 to 4f0508a Compare April 6, 2023 16:06
@msbutler msbutler marked this pull request as ready for review April 10, 2023 13:19
@msbutler msbutler requested a review from a team as a code owner April 10, 2023 13:19
@msbutler msbutler requested review from benbardin and adityamaru and removed request for a team April 10, 2023 13:19
@msbutler msbutler added T-disaster-recovery backport-23.1.x Flags PRs that need to be backported to 23.1 labels Apr 10, 2023
}
require.Equal(t, true, foundKVs, "expected to find and assert equal kvs")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could use require.True

targetSpan := roachpb.Span{Key: startKey, EndKey: startKey.PrefixEnd()}
if assertEqualKVs(t, srv, streamValidator, targetSpan,
maxResolvedTimestampPerPartition[pSpan]) {
foundKVs = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't we want to assert equal kvs for each iteration of this loop? IIUC the way this is structured even if one partition has equal KVs but the others don't the test will pass, cause foundKVs will be true.

Copy link
Collaborator Author

@msbutler msbutler Apr 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah good q, under stress-race, it's possible for one partition to produce no checkpoint events. The stream ingestion processor only shuts down after 1000 checkpoint events were recorded, which can all occur from a single partition. I'll add a comment about this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, but how will we differentiate actual key mismatches vs partitions not producing any checkpoint events? Will something fatal in assertEqualKVs?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the function returns true if it validated at least one kv. So if there are no checkpoints, it will not validate any kvs and return false. If it surfaces any kvs, they will be validated with assertions. Added a bit more the func docstring to reflect this.

@adityamaru adityamaru self-requested a review April 10, 2023 13:28
@msbutler msbutler force-pushed the butler-c2c-TestClient branch from 4f0508a to 516c9c2 Compare April 10, 2023 13:47
Copy link
Contributor

@adityamaru adityamaru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one comment about how an unequal KV will bubble up to us but otherwise LGTM!

targetSpan := roachpb.Span{Key: startKey, EndKey: startKey.PrefixEnd()}
if assertEqualKVs(t, srv, streamValidator, targetSpan,
maxResolvedTimestampPerPartition[pSpan]) {
foundKVs = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, but how will we differentiate actual key mismatches vs partitions not producing any checkpoint events? Will something fatal in assertEqualKVs?

This patch fixes 4 bugs in TestRandomClientGeneration that were
responsible for the persistent flakiness and lack of coverage in this test:
- the randomeStreamClient no longer instantiates keys with a table prefix that
	collides with the job info table prefix. This collision was the original
  cause of the flakes reported in cockroachdb#99343.
- getPartitionSpanToTableId() now generates a correct map from source partition
	key space to table Id. Previously, the key spans in the map didn't contain
keys that mapped to anything logical in the cockroach key space.
- assertKVs() now checks for keys in the destination tenant keyspace.
- assertKVs() now actually asserts that kvs were found. Before, the assertion
  could pass if no keys were actually checked, which has been happening for
  months and allowed the bugs above to infest this test.

Fixes cockroachdb#99343

Release note: None
@msbutler msbutler force-pushed the butler-c2c-TestClient branch from 516c9c2 to 3539505 Compare April 10, 2023 15:01
@msbutler
Copy link
Collaborator Author

TFTR!

bors r=adityamaru

@craig
Copy link
Contributor

craig bot commented Apr 10, 2023

Build succeeded:

@craig craig bot merged commit 9ce0ac2 into cockroachdb:master Apr 10, 2023
@blathers-crl
Copy link

blathers-crl bot commented Apr 10, 2023

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 3539505 to blathers/backport-release-23.1-100813: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 23.1.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-23.1.x Flags PRs that need to be backported to 23.1 T-disaster-recovery
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ccl/streamingccl/streamingest: TestRandomClientGeneration failed
3 participants