Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: attempt to shard huge test targets more #98834

Merged
merged 1 commit into from
Mar 23, 2023

Conversation

healthy-pod
Copy link
Contributor

@healthy-pod healthy-pod commented Mar 17, 2023

This code change attempts to shard some test
targets that could benefit from more sharding. It
also splits unit tests in TeamCity into two runs:
ccl unit tests, and non-ccl unit tests. The current
build config will be used to run non-ccl unit tests.
The new build config will be used for ccl tests (those
under pkg/ccl). This cuts unit tests wall time by
half while keeping machine time almost the same.

Release note: None
Epic: none

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@healthy-pod healthy-pod force-pushed the shard-tests branch 7 times, most recently from 7e0de19 to aa5994c Compare March 18, 2023 20:37
@blathers-crl
Copy link

blathers-crl bot commented Mar 18, 2023

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@healthy-pod healthy-pod force-pushed the shard-tests branch 11 times, most recently from 0da47f0 to b311d92 Compare March 20, 2023 19:26
@@ -3376,45 +3376,59 @@ test_suite(
)

test_suite(
name = "small_tests",
name = "ccl_tests",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not maintain the small, medium, large, enormous categories for CCL tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. Because we don't use them, we can easily add them if we need to though.

@@ -12,8 +12,11 @@ go_test(
"//pkg/sql/logictest:testdata", # keep
"//pkg/sql/opt/exec/execbuilder:testdata", # keep
],
shard_count = 16,
tags = ["cpu:2"],
shard_count = 48,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few questions:

  • How does shard count interact with cpu:2 (or cpu:1 in other cases)?
  • Why change some to 48 but not others?

Copy link
Contributor Author

@healthy-pod healthy-pod Mar 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why change some to 48 but not others?

We upload bazel trace profile to artifacts on each unit tests run. I looked at some runs, found targets that had large shards or bottle-necking the process, and sharded them more. Mainly, logictests, backupccl, schemachangerccl, kvserver. If a logictests package has less than 48 tests, it gets n number of shards where n is the number of tests.

How does shard count interact with cpu:2 (or cpu:1 in other cases)?

Good question. Looking.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does shard count interact with cpu:2 (or cpu:1 in other cases)?

Each shard gets n cores if cpu:n is used. I tried to play with it but didn't notice any changes so preferred to keep it unchanged.

@healthy-pod healthy-pod force-pushed the shard-tests branch 5 times, most recently from c5b816a to 8d35f3d Compare March 21, 2023 21:24
Copy link
Collaborator

@rickystewart rickystewart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me.

tags = [
"-broken_in_bazel",
"-flaky",
"-integration",
"%[1]s",
"-ccl_test"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Keep the list alphabetized? (Put -ccl_test after -broken_in_bazel)

@benbardin
Copy link
Collaborator

LGTM for backupccl

This code change attempts to shard some test
targets that could benefit from more sharding. It
also splits unit tests in TeamCity into two runs:
ccl unit tests, and non-ccl unit tests. The current
build config will be used to run non-ccl unit tests.
The new build config will be used for ccl tests (those
under `pkg/ccl`). This cuts unit tests wall time by
half while keeping machine time almost the same.

Release note: None
Epic: none
@healthy-pod
Copy link
Contributor Author

TFTRs!

bors r=rickystewart

@craig
Copy link
Contributor

craig bot commented Mar 23, 2023

Build succeeded:

@craig craig bot merged commit 384c55d into cockroachdb:master Mar 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants