Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

workload/tpcc: add multiregion TPC-C setup #65600

Merged
merged 2 commits into from
Jun 7, 2021

Conversation

otan
Copy link
Contributor

@otan otan commented May 23, 2021

Introduce a tpcc workload for multi-region, which can be triggered by
specifying --regions comma-separated-list-of-regions. I've preserved
the existing TPC-C options in case we want to test legacy
configurations.

The schema creation was changed to something using the options pattern
which makes it a lot cleaner to specify the schema.

Release note: None

@otan otan requested review from nvanbenschoten and a team May 23, 2021 23:34
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@otan otan force-pushed the tpcc_multiregion branch 5 times, most recently from 4837b43 to c972190 Compare May 26, 2021 22:41
Copy link
Collaborator

@ajstorm ajstorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay. Could have sworn I wrapped this up on Friday 😦

Reviewed 3 of 3 files at r1, 3 of 3 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @otan)


pkg/cmd/roachtest/tpcc.go, line 352 at r2 (raw file):

	})

	for _, survivalGoal := range []string{"az", "region"} {

Nit: az->zone?


pkg/cmd/roachtest/tpcc.go, line 365 at r2 (raw file):

			Owner:      OwnerMultiRegion,
			MinVersion: "v21.1.0",
			Cluster:    makeClusterSpec(10, geo(), zones(strings.Join(zs, ","))),

Why 10 nodes? Is the extra one for the client? Could you add a comment explaining the rationale here?


pkg/cmd/roachtest/tpcc.go, line 377 at r2 (raw file):

				// on each node of a cluster, instead of one node using a workload on all clusters.
				runTPCC(ctx, t, c, tpccOptions{
					Warehouses:     9,

Nit: Make this a function of the number of nodes?


pkg/workload/tpcc/ddls.go, line 233 at r1 (raw file):

}

func maybeAddLocalityRegionalByRow(

Might be worth adding a comment here to describe what we're doing. It took me a few minutes to realize that %% meant that we were modding the warehouse number by the number of regions. We also might want to note here that if the number of warehouses is ever increased after the partitioning is applied, this will break (or at least, that's my understanding).


pkg/workload/tpcc/partition.go, line 122 at r1 (raw file):

const (
	survivalGoalAZ survivalGoal = iota

Nit: survivalGoalZone? To match the terminology we use in SHOW DATABASES?


pkg/workload/tpcc/partition.go, line 130 at r1 (raw file):

	switch s {
	case survivalGoalAZ:
		return "az"

Nit: zone?


pkg/workload/tpcc/tpcc.go, line 204 at r1 (raw file):

			`is that the URLs are distributed evenly over the partitions`)
		g.flags.Var(&g.zoneCfg.strategy, `partition-strategy`, `Partition tables according to which strategy [replication, leases]`)
		g.flags.StringSliceVar(&g.zoneCfg.zones, "zones", []string{}, "Zones for partitioning, the number of zones should match the number of partitions and the zones used to start cockroach.")

Should we be removing zones? Is there any reason why someone would want to use zones now that we have regions?


pkg/workload/tpcc/tpcc.go, line 277 at r1 (raw file):

				if len(w.multiRegionCfg.regions) > 0 && (len(w.multiRegionCfg.regions) != w.partitions) {
					return errors.Errorf(`--regions should have the same length as --partitions.`)

Can we infer partitions in the case where regions is specified?


pkg/workload/tpcc/tpcc.go, line 367 at r1 (raw file):

				var stmt string
				if i == 0 {
					stmt = fmt.Sprintf(`alter database %s set primary region %q`, dbName, region)

Seems like we're implying here the first specified is the primary region. We should either document that, or, what might make the logic here simpler is if you also add a --primary-region flag.


pkg/workload/tpcc/tpcc.go, line 387 at r1 (raw file):

				panic("unexpected")
			}
			stmts = append(stmts, fmt.Sprintf(`alter database %s survive %s`, dbName, survivalGoal))

Nit: we should only need to do this for region, right? Or is it possible that the database already exists at the time of this call?

Also, do we need to check that there are 3 regions before setting region survivability?


pkg/workload/tpcc/tpcc.go, line 437 at r1 (raw file):

					}
				}
				// Set GLOBAL only after data is loaded to speed up initialization time.

Can you remind me again why this is needed? Is it because we're inserting here, instead of IMPORTing? Does it make sense to call out that rationale here? Or better yet, change this to use IMPORT?


pkg/workload/tpcc/tpcc.go, line 445 at r1 (raw file):

			}

			if len(w.multiRegionCfg.regions) != 0 {

This is to preserve the old behaviour too, if regions aren't specified? If so, might be worth including a comment here, for future readers.

Copy link
Contributor Author

@otan otan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm and @nvanbenschoten)


pkg/cmd/roachtest/tpcc.go, line 365 at r2 (raw file):

Previously, ajstorm (Adam Storm) wrote…

Why 10 nodes? Is the extra one for the client? Could you add a comment explaining the rationale here?

Yeah, it's standard through all the roachtest setups here.


pkg/cmd/roachtest/tpcc.go, line 377 at r2 (raw file):

Previously, ajstorm (Adam Storm) wrote…

Nit: Make this a function of the number of nodes?

i think we can make this arbitrarily large/small and not based on the number of nodes, so i'm hesitant to make it a function for now.


pkg/workload/tpcc/ddls.go, line 233 at r1 (raw file):

modding the warehouse number by the number of regions

modding the ID (primary key) by the number of partitions.

We also might want to note here that if the number of warehouses is ever increased after the partitioning is applied, this will break (or at least, that's my understanding).

where did you get that impression?


pkg/workload/tpcc/tpcc.go, line 204 at r1 (raw file):

Previously, ajstorm (Adam Storm) wrote…

Should we be removing zones? Is there any reason why someone would want to use zones now that we have regions?

Seems like some legacy tests still use it, I'm not up for gutting everything atm.


pkg/workload/tpcc/tpcc.go, line 277 at r1 (raw file):

Previously, ajstorm (Adam Storm) wrote…

Can we infer partitions in the case where regions is specified?

I'm following the same code as for --zones, so i'm going to let this be for now.


pkg/workload/tpcc/tpcc.go, line 367 at r1 (raw file):

Previously, ajstorm (Adam Storm) wrote…

Seems like we're implying here the first specified is the primary region. We should either document that, or, what might make the logic here simpler is if you also add a --primary-region flag.

I'll do documenting for now.


pkg/workload/tpcc/tpcc.go, line 387 at r1 (raw file):

Previously, ajstorm (Adam Storm) wrote…

Nit: we should only need to do this for region, right? Or is it possible that the database already exists at the time of this call?

Also, do we need to check that there are 3 regions before setting region survivability?

I think it's safe for this to do "idempotency" and do it regardless.

Added validation for 3 regions, but the database checks for us anyway.


pkg/workload/tpcc/tpcc.go, line 437 at r1 (raw file):

Previously, ajstorm (Adam Storm) wrote…

Can you remind me again why this is needed? Is it because we're inserting here, instead of IMPORTing? Does it make sense to call out that rationale here? Or better yet, change this to use IMPORT?

I thought IMPORT didn't work.

Regardless, INSERT is slow so yes we defer it. Commented extra.


pkg/workload/tpcc/tpcc.go, line 445 at r1 (raw file):

Previously, ajstorm (Adam Storm) wrote…

This is to preserve the old behaviour too, if regions aren't specified? If so, might be worth including a comment here, for future readers.

Done.

@otan otan force-pushed the tpcc_multiregion branch from c972190 to 467c077 Compare June 1, 2021 00:11
@otan otan mentioned this pull request Jun 1, 2021
3 tasks
Copy link
Collaborator

@ajstorm ajstorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @otan)


pkg/cmd/roachtest/tpcc.go, line 377 at r2 (raw file):

Previously, otan (Oliver Tan) wrote…

i think we can make this arbitrarily large/small and not based on the number of nodes, so i'm hesitant to make it a function for now.

Does it warrant a comment in some way? I'd think that we'd need to make this a multiple of the number of nodes or else the warehouse count will be unbalanced.


pkg/workload/tpcc/ddls.go, line 233 at r1 (raw file):

where did you get that impression?

Sorry, I was unclear. If we add more regions to the database (which presumably would imply more warehouses as well, but isn't strictly required), won't the static hashing of partition number to region break? That's what I was suggesting we comment.

Or maybe I'm still confused.


pkg/workload/tpcc/tpcc.go, line 367 at r1 (raw file):

Previously, otan (Oliver Tan) wrote…

I'll do documenting for now.

Looks good.


pkg/workload/tpcc/tpcc.go, line 437 at r1 (raw file):

Previously, otan (Oliver Tan) wrote…

I thought IMPORT didn't work.

Regardless, INSERT is slow so yes we defer it. Commented extra.

IMPORT works for GLOBAL and REGIONAL BY TABLE tables, just not for RBR.

Copy link
Contributor Author

@otan otan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm and @nvanbenschoten)


pkg/cmd/roachtest/tpcc.go, line 377 at r2 (raw file):

Previously, ajstorm (Adam Storm) wrote…

Does it warrant a comment in some way? I'd think that we'd need to make this a multiple of the number of nodes or else the warehouse count will be unbalanced.

hmm, you have convinced me.


pkg/workload/tpcc/ddls.go, line 233 at r1 (raw file):

Previously, ajstorm (Adam Storm) wrote…

where did you get that impression?

Sorry, I was unclear. If we add more regions to the database (which presumably would imply more warehouses as well, but isn't strictly required), won't the static hashing of partition number to region break? That's what I was suggesting we comment.

Or maybe I'm still confused.

ah yes, you are correct. i'll add that comment.


pkg/workload/tpcc/tpcc.go, line 437 at r1 (raw file):

Previously, ajstorm (Adam Storm) wrote…

IMPORT works for GLOBAL and REGIONAL BY TABLE tables, just not for RBR.

ah, well the roachtest currently works with init we're going with this
for IMPORT, this should be idempotent anyway.

otan added 2 commits June 2, 2021 06:51
Introduce a tpcc workload for multi-region, which can be triggered by
specifying `--regions comma-separated-list-of-regions`. I've preserved
the existing TPC-C options in case we want to test legacy
configurations.

The schema creation was changed to something using the options pattern
which makes it a lot cleaner to specify the schema.

Release note: None
Added a TPCC run that turns random nodes in a cluster on and off for 5
minute periods whilst running a multi-region workload.

Release note: None
@otan otan force-pushed the tpcc_multiregion branch from 467c077 to 068a805 Compare June 1, 2021 20:51
Copy link
Collaborator

@ajstorm ajstorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making the changes. :lgtm:

Reviewed 3 of 3 files at r4.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)

Copy link
Contributor Author

@otan otan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bors r=ajstorm

@craig
Copy link
Contributor

craig bot commented Jun 7, 2021

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants