workload/tpcc: add multiregion TPC-C setup #65600
Conversation
Force-pushed from 4837b43 to c972190
Sorry for the delay. Could have sworn I wrapped this up on Friday 😦
Reviewed 3 of 3 files at r1, 3 of 3 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @otan)
pkg/cmd/roachtest/tpcc.go, line 352 at r2 (raw file):
})
for _, survivalGoal := range []string{"az", "region"} {
Nit: az->zone?
pkg/cmd/roachtest/tpcc.go, line 365 at r2 (raw file):
Owner: OwnerMultiRegion,
MinVersion: "v21.1.0",
Cluster: makeClusterSpec(10, geo(), zones(strings.Join(zs, ","))),
Why 10 nodes? Is the extra one for the client? Could you add a comment explaining the rationale here?
pkg/cmd/roachtest/tpcc.go, line 377 at r2 (raw file):
// on each node of a cluster, instead of one node using a workload on all clusters.
runTPCC(ctx, t, c, tpccOptions{
Warehouses: 9,
Nit: Make this a function of the number of nodes?
pkg/workload/tpcc/ddls.go, line 233 at r1 (raw file):
}
func maybeAddLocalityRegionalByRow(
Might be worth adding a comment here to describe what we're doing. It took me a few minutes to realize that %% meant that we were modding the warehouse number by the number of regions. We also might want to note here that if the number of warehouses is ever increased after the partitioning is applied, this will break (or at least, that's my understanding).
pkg/workload/tpcc/partition.go, line 122 at r1 (raw file):
const (
survivalGoalAZ survivalGoal = iota
Nit: survivalGoalZone
? To match the terminology we use in SHOW DATABASES?
pkg/workload/tpcc/partition.go, line 130 at r1 (raw file):
switch s {
case survivalGoalAZ:
return "az"
Nit: zone?
pkg/workload/tpcc/tpcc.go, line 204 at r1 (raw file):
`is that the URLs are distributed evenly over the partitions`)
g.flags.Var(&g.zoneCfg.strategy, `partition-strategy`, `Partition tables according to which strategy [replication, leases]`)
g.flags.StringSliceVar(&g.zoneCfg.zones, "zones", []string{}, "Zones for partitioning, the number of zones should match the number of partitions and the zones used to start cockroach.")
Should we be removing zones? Is there any reason why someone would want to use zones now that we have regions?
pkg/workload/tpcc/tpcc.go, line 277 at r1 (raw file):
if len(w.multiRegionCfg.regions) > 0 && (len(w.multiRegionCfg.regions) != w.partitions) {
return errors.Errorf(`--regions should have the same length as --partitions.`)
Can we infer partitions in the case where regions is specified?
pkg/workload/tpcc/tpcc.go, line 367 at r1 (raw file):
var stmt string
if i == 0 {
stmt = fmt.Sprintf(`alter database %s set primary region %q`, dbName, region)
Seems like we're implying here the first specified is the primary region. We should either document that, or, what might make the logic here simpler is if you also add a --primary-region flag.
pkg/workload/tpcc/tpcc.go, line 387 at r1 (raw file):
panic("unexpected")
}
stmts = append(stmts, fmt.Sprintf(`alter database %s survive %s`, dbName, survivalGoal))
Nit: we should only need to do this for region, right? Or is it possible that the database already exists at the time of this call?
Also, do we need to check that there are 3 regions before setting region survivability?
pkg/workload/tpcc/tpcc.go, line 437 at r1 (raw file):
}
}
// Set GLOBAL only after data is loaded to speed up initialization time.
Can you remind me again why this is needed? Is it because we're inserting here, instead of IMPORTing? Does it make sense to call out that rationale here? Or better yet, change this to use IMPORT?
pkg/workload/tpcc/tpcc.go, line 445 at r1 (raw file):
}
if len(w.multiRegionCfg.regions) != 0 {
This is to preserve the old behaviour too, if regions aren't specified? If so, might be worth including a comment here, for future readers.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm and @nvanbenschoten)
pkg/cmd/roachtest/tpcc.go, line 365 at r2 (raw file):
Previously, ajstorm (Adam Storm) wrote…
Why 10 nodes? Is the extra one for the client? Could you add a comment explaining the rationale here?
Yeah, it's standard through all the roachtest setups here.
pkg/cmd/roachtest/tpcc.go, line 377 at r2 (raw file):
Previously, ajstorm (Adam Storm) wrote…
Nit: Make this a function of the number of nodes?
I think we can make this arbitrarily large or small and not based on the number of nodes, so I'm hesitant to make it a function for now.
pkg/workload/tpcc/ddls.go, line 233 at r1 (raw file):
modding the warehouse number by the number of regions
modding the ID (primary key) by the number of partitions.
We also might want to note here that if the number of warehouses is ever increased after the partitioning is applied, this will break (or at least, that's my understanding).
where did you get that impression?
pkg/workload/tpcc/tpcc.go, line 204 at r1 (raw file):
Previously, ajstorm (Adam Storm) wrote…
Should we be removing zones? Is there any reason why someone would want to use zones now that we have regions?
Seems like some legacy tests still use it; I'm not up for gutting everything at the moment.
pkg/workload/tpcc/tpcc.go, line 277 at r1 (raw file):
Previously, ajstorm (Adam Storm) wrote…
Can we infer partitions in the case where regions is specified?
I'm following the same code as for --zones, so I'm going to let this be for now.
pkg/workload/tpcc/tpcc.go, line 367 at r1 (raw file):
Previously, ajstorm (Adam Storm) wrote…
Seems like we're implying here the first specified is the primary region. We should either document that, or, what might make the logic here simpler is if you also add a --primary-region flag.
I'll do documenting for now.
pkg/workload/tpcc/tpcc.go, line 387 at r1 (raw file):
Previously, ajstorm (Adam Storm) wrote…
Nit: we should only need to do this for region, right? Or is it possible that the database already exists at the time of this call?
Also, do we need to check that there are 3 regions before setting region survivability?
I think it's safe for this to be idempotent and run regardless.
Added validation for 3 regions, but the database checks for us anyway.
pkg/workload/tpcc/tpcc.go, line 437 at r1 (raw file):
Previously, ajstorm (Adam Storm) wrote…
Can you remind me again why this is needed? Is it because we're inserting here, instead of IMPORTing? Does it make sense to call out that rationale here? Or better yet, change this to use IMPORT?
I thought IMPORT didn't work.
Regardless, INSERT is slow so yes we defer it. Commented extra.
pkg/workload/tpcc/tpcc.go, line 445 at r1 (raw file):
Previously, ajstorm (Adam Storm) wrote…
This is to preserve the old behaviour too, if regions aren't specified? If so, might be worth including a comment here, for future readers.
Done.
Reviewed 4 of 4 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @otan)
pkg/cmd/roachtest/tpcc.go, line 377 at r2 (raw file):
Previously, otan (Oliver Tan) wrote…
I think we can make this arbitrarily large or small and not based on the number of nodes, so I'm hesitant to make it a function for now.
Does it warrant a comment in some way? I'd think that we'd need to make this a multiple of the number of nodes or else the warehouse count will be unbalanced.
pkg/workload/tpcc/ddls.go, line 233 at r1 (raw file):
where did you get that impression?
Sorry, I was unclear. If we add more regions to the database (which presumably would imply more warehouses as well, but isn't strictly required), won't the static hashing of partition number to region break? That's what I was suggesting we comment.
Or maybe I'm still confused.
pkg/workload/tpcc/tpcc.go, line 367 at r1 (raw file):
Previously, otan (Oliver Tan) wrote…
I'll do documenting for now.
Looks good.
pkg/workload/tpcc/tpcc.go, line 437 at r1 (raw file):
Previously, otan (Oliver Tan) wrote…
I thought IMPORT didn't work.
Regardless, INSERT is slow so yes we defer it. Commented extra.
IMPORT works for GLOBAL and REGIONAL BY TABLE tables, just not for RBR.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm and @nvanbenschoten)
pkg/cmd/roachtest/tpcc.go, line 377 at r2 (raw file):
Previously, ajstorm (Adam Storm) wrote…
Does it warrant a comment in some way? I'd think that we'd need to make this a multiple of the number of nodes or else the warehouse count will be unbalanced.
Hmm, you have convinced me.
pkg/workload/tpcc/ddls.go, line 233 at r1 (raw file):
Previously, ajstorm (Adam Storm) wrote…
where did you get that impression?
Sorry, I was unclear. If we add more regions to the database (which presumably would imply more warehouses as well, but isn't strictly required), won't the static hashing of partition number to region break? That's what I was suggesting we comment.
Or maybe I'm still confused.
Ah yes, you are correct. I'll add that comment.
pkg/workload/tpcc/tpcc.go, line 437 at r1 (raw file):
Previously, ajstorm (Adam Storm) wrote…
IMPORT works for GLOBAL and REGIONAL BY TABLE tables, just not for RBR.
Ah, well, the roachtest currently works with init, so we're going with this. For IMPORT, this should be idempotent anyway.
Introduce a tpcc workload for multi-region, which can be triggered by specifying `--regions comma-separated-list-of-regions`. I've preserved the existing TPC-C options in case we want to test legacy configurations. The schema creation was changed to use the options pattern, which makes it a lot cleaner to specify the schema.

Release note: None
Added a TPC-C run that turns random nodes in a cluster on and off for 5-minute periods whilst running a multi-region workload.

Release note: None
Thanks for making the changes.
Reviewed 3 of 3 files at r4.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)
bors r=ajstorm
Build succeeded.