storageccl: fix and reenable BenchmarkTimeBoundIterate #96771

RaduBerinde · 2023-02-08T04:33:18Z

This benchmark creates a mvcc_data directory in the current working dir (i.e. pkg/ccl/storageccl/engineccl) and reuses it if it exists. Normally this would not be permitted in CI, but mvcc_data is in .gitignore so it doesn't trip the "assert workspace clean" CI step.

The leftover mvcc_data can be from an older version, which can cause mysterious CI failures.

This fix generates the data in a temporary directory and cleans it up afterwards. The generation only takes a few seconds (significantly less than what it takes to run all benchmarks).

Release note: None
Epic: none

cockroach-teamcity · 2023-02-08T04:33:27Z

This change is


jbowens

I think we really need some method of caching artifacts between benchmarks. The generated data isn't deterministic, so it can increase the variance of the benchmark to the point that it's difficult to get reliable measurements. This particular benchmark's data is small, but we have others in pkg/storage that may take minutes to construct the initial state.

I'm a little surprised that this is happening now, because under bazel anything written within the sandbox is lost when the sandbox is torn down. That is why we've been running our pkg/storage benchmarks using make bench despite it being deprecated (#83599).

This change LGTM. I think the same problem applies to pkg/storage's MVCC benchmarks though, and I don't think we want to apply the same treatment there.


RaduBerinde · 2023-02-08T15:18:25Z

> under bazel anything written within the sandbox is lost when the sandbox is torn down.

I am not sure but the failures might have been from the non-bazel CI.

I could add an env variable that you can use to point it to a less temporary location, would that help?

RaduBerinde · 2023-02-08T15:19:26Z

An alternative would be to change the behavior only under CI. Is there a way we can tell from the code? CC @rickystewart

jbowens · 2023-02-08T15:30:21Z

> An alternative would be to change the behavior only under CI.

Oh, good idea. I think we could change the behavior if !bazel.BuiltWithBazel()

RaduBerinde · 2023-02-08T16:36:29Z

> I think we could change the behavior if !bazel.BuiltWithBazel()

Isn't BuiltWithBazel true if you use bazel/dev to run the benchmark (which will be the only way going forward)?

RaduBerinde · 2023-02-08T16:40:25Z

> The generated data isn't deterministic

By the way, the data is generated using a hardcoded random seed. Are there other sources of non-determinism that cause the resulting store to be materially different (e.g. timing of operations or IOs)?

jbowens · 2023-02-08T17:15:28Z

> Isn't BuiltWithBazel true if you use bazel/dev to run the benchmark (which will be the only way going forward)?

Yeah, but we need some kind of tooling support to make it viable for these benchmarks. Bazel tests today can't write anything outside of the sandbox, so there's no way to generate a fixture and have it accessible by the next run. In #83599 we were hoping it would be possible to add some tooling to pull the fixtures out of the bazel sandbox somehow.

> By the way, the data is generated using a hardcoded random seed. Are there other sources of non-determinism that cause the resulting store to be materially different (e.g. timing of operations or IOs)?

Yeah, we allow up to 3 concurrent compactions, 1 flush and 1 table stats collector goroutine. The timing of these 5 routines affects the resulting LSM and what physical keys exist.
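The distinction between the two sources can be sketched in isolation. The example below is only an analogy and not storage engine code: `seededKeys` and `completionOrder` are hypothetical helpers. A fixed seed reproduces the same logical keys on every run, while the order in which concurrent goroutines complete is decided by the scheduler and is not guaranteed to repeat.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

// seededKeys draws n keys from a generator with a fixed seed. The same
// seed yields the same keys on every call.
func seededKeys(seed int64, n int) []int {
	rng := rand.New(rand.NewSource(seed))
	keys := make([]int, n)
	for i := range keys {
		keys[i] = rng.Intn(1000)
	}
	return keys
}

// completionOrder starts the given number of goroutines and records the
// order in which they finish. The scheduler decides that order, so two
// calls may return different permutations.
func completionOrder(workers int) []int {
	var (
		mu    sync.Mutex
		order []int
		wg    sync.WaitGroup
	)
	for id := 0; id < workers; id++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			mu.Lock()
			order = append(order, id)
			mu.Unlock()
		}(id)
	}
	wg.Wait()
	return order
}

func main() {
	// Identical slices: the input data is reproducible.
	fmt.Println(seededKeys(42, 5), seededKeys(42, 5))
	// Possibly different permutations: background timing is not.
	fmt.Println(completionOrder(5), completionOrder(5))
}
```

The worker count of 5 mirrors the 3 compactions, 1 flush, and 1 table stats collector mentioned above; the analogy is that the same input can still leave different on-disk state depending on when each background routine runs.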

RaduBerinde · 2023-02-08T17:29:51Z

> Yeah, but we need some kind of tooling support to make it viable for these benchmarks. Bazel tests today can't write anything outside of the sandbox, so there's no way to generate a fixture and have it accessible by the next run. In #83599 we were hoping it would be possible to add some tooling to pull the fixtures out of the bazel sandbox somehow.

Hm, I see. One way would be to check the fixtures into a special repository and pull them in as a dependency to the benchmark.

RaduBerinde · 2023-02-08T22:14:02Z

Actually, would it be so bad if we checked in mvcc_data? It's only a few megabytes. Not sure what the policy is for such fixtures in the git repo.

jbowens · 2023-02-08T22:45:49Z

> Actually, would it be so bad if we checked in mvcc_data? It's only a few megabytes. Not sure what the policy is for such fixtures in the git repo.

Seems okay to me, but I bet @rickystewart or others on @cockroach-dev-inf would have an opinion.

RaduBerinde · 2023-02-13T18:06:40Z

I'm finding more benchmarks that apply the same technique, and which occasionally run into fixtures from previous versions on CI: BenchmarkRefreshRange

RaduBerinde · 2023-02-13T19:17:59Z

Closing this for now; filed issue #97061 to track.

RaduBerinde requested a review from a team as a code owner February 8, 2023 04:33

RaduBerinde requested a review from bananabrick February 8, 2023 04:33

RaduBerinde force-pushed the fix-iterate-bench branch from 36c6798 to e68c5d6 Compare February 8, 2023 04:42

jbowens approved these changes Feb 8, 2023


RaduBerinde closed this Feb 13, 2023
