Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

workload: jitter the teardown of connections to prevent thundering herd #100533

Merged
merged 1 commit into from
Apr 10, 2023

Conversation

sean-
Copy link
Collaborator

@sean- sean- commented Apr 3, 2023

This change upgrades workload's use of pgx from v4 to v5 in order to allow jittering the teardown of connections. This change sets a max connection age of 5min and jitters the teardown by 30s. Upgrading to pgx v5 also adds non-blocking pgxpool connection acquisition.

workload: add flags to manage the age and lifecycle of connection pool

Add flags to all workload types to specify:

  • the max connection age: --max-conn-lifetime duration
  • the max connection age jitter: --max-conn-lifetime-jitter duration
  • the max connection idle time: --max-conn-idle-time duration
  • the connection health check interval: --conn-healthcheck-period duration
  • the min number of connections in the pool: --min-conns int

workload: add support for remaining pgx query modes

Add support for pgx.QueryExecModeCacheDescribe and pgx.QueryExecModeDescribeExec. Previously, only three of the five query modes were available.

workload: fix race condition when recording histogram data

Release note (cli change): workload jitters teardown of connections to prevent thundering herd impacting P99 latency results.

Release note (cli change): workload utility now has flags to tune the connection pool used for testing. See --conn-healthcheck-period, --min-conns, and the --max-conn-* flags for details.

Release note (cli change): workload now supports every PostgreSQL query mode available via the underlying pgx driver.

@sean- sean- self-assigned this Apr 3, 2023
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator Author

@sean- sean- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reworking #98689 plus fixes for broken nightlies in this PR.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @sean-)


pkg/workload/tpcc/tpcc.go line 174 at r1 (raw file):

			`idle-conns`:               {RuntimeOnly: true},
			`local-warehouses`:         {RuntimeOnly: true},
			`max-conn-idle-time`:       {RuntimeOnly: true},

Registering meta flags was the required fix for #100018, FYI.

@sean- sean- force-pushed the cmd-workload-jitter-conns branch 3 times, most recently from e01106d to fb7a0cc Compare April 7, 2023 20:52
@sean- sean- marked this pull request as ready for review April 7, 2023 21:01
@sean- sean- requested review from a team as code owners April 7, 2023 21:01
@sean- sean- requested review from srosenberg, renatolabs and nvanbenschoten and removed request for a team April 7, 2023 21:01
@srosenberg
Copy link
Member

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @renatolabs, @sean-, and @srosenberg)


pkg/workload/pgx_helpers.go line 332 at r2 (raw file):

	warmupConns := make(chan *pgxpool.Conn, numWarmupConns)
	connsPerURL := distribute(numWarmupConns, len(m.Pools))

With the latest fix, this logic feels like it's taking an extra step by:

  1. computing numWarmupConns from p.MaxConns
  2. distributing numWarmupConns across pools
  3. allocating up to min of (p.MaxConns, distribute(numWarmupConns))

Can we invert the logic and skip a step while also making this logic more understandable. I'm thinking something like:

var warmupConnsPerPool []int
if numConns == 0 {
	warmupConnsPerPool = make([]int, len(m.Pools))
	for i, p := range m.Pools {
		warmupConnsPerPool[i] = int(p.Config().MaxConns)
	}
} else {
        // TODO: what if numConns is > p.Config().MaxConns?
	warmupConnsPerPool = distribute(numConns, len(m.Pools)
}

var numWarmupConns int
for _, n := range warmupConnsPerPool {
	numWarmupConns += n
}
warmupConns := make(chan *pgxpool.Conn, numWarmupConns)

for i, p := range m.Pools {
	p := p
	for k := 0; k < warmupConnsPerPool[i]; k++ {
		g.Go(func() error {
			conn, err := p.Acquire(warmupCtx)
			if err != nil {
				return err
			}
			warmupConns <- conn
			return nil
		})
	}
}

This change upgrades workload's use of pgx from v4 to v5 in order to allow
jittering the teardown of connections.  This change sets a max connection age
of 5min and jitters the teardown by 30s.  Upgrading to pgx v5 also adds
non-blocking pgxpool connection acquisition.

workload: add flags to manage the age and lifecycle of connection pool

Add flags to all workload types to specify:

* the max connection age: `--max-conn-lifetime duration`
* the max connection age jitter: `--max-conn-lifetime-jitter duration`
* the max connection idle time: `--max-conn-idle-time duration`
* the connection health check interval: `--conn-healthcheck-period duration`
* the min number of connections in the pool: `--min-conns int`

workload: add support for remaining pgx query modes

Add support for pgx.QueryExecModeCacheDescribe and
pgx.QueryExecModeDescribeExec.  Previously, only three of the five query modes
were available.

workload: fix race condition when recording histogram data

Release note (cli change): workload jitters teardown of connections to prevent
thundering herd impacting P99 latency results.

Release note (cli change): workload utility now has flags to tune the
connection pool used for testing.  See `--conn-healthcheck-period`,
`--min-conns`, and the `--max-conn-*` flags for details.

Release note (cli change): workload now supports every [PostgreSQL query
mode](https://github.com/jackc/pgx/blob/fa5fbed497bc75acee05c1667a8760ce0d634cba/conn.go#L167-L182)
available via the underlying pgx driver.
@sean- sean- force-pushed the cmd-workload-jitter-conns branch from fb7a0cc to bdf3f62 Compare April 10, 2023 20:28
Copy link
Collaborator Author

@sean- sean- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten, @renatolabs, and @srosenberg)


pkg/workload/pgx_helpers.go line 332 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

With the latest fix, this logic feels like it's taking an extra step by:

  1. computing numWarmupConns from p.MaxConns
  2. distributing numWarmupConns across pools
  3. allocating up to min of (p.MaxConns, distribute(numWarmupConns))

Can we invert the logic and skip a step while also making this logic more understandable. I'm thinking something like:

var warmupConnsPerPool []int
if numConns == 0 {
	warmupConnsPerPool = make([]int, len(m.Pools))
	for i, p := range m.Pools {
		warmupConnsPerPool[i] = int(p.Config().MaxConns)
	}
} else {
        // TODO: what if numConns is > p.Config().MaxConns?
	warmupConnsPerPool = distribute(numConns, len(m.Pools)
}

var numWarmupConns int
for _, n := range warmupConnsPerPool {
	numWarmupConns += n
}
warmupConns := make(chan *pgxpool.Conn, numWarmupConns)

for i, p := range m.Pools {
	p := p
	for k := 0; k < warmupConnsPerPool[i]; k++ {
		g.Go(func() error {
			conn, err := p.Acquire(warmupCtx)
			if err != nil {
				return err
			}
			warmupConns <- conn
			return nil
		})
	}
}

Good suggestion. Updated.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 1 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @renatolabs and @srosenberg)

@sean-
Copy link
Collaborator Author

sean- commented Apr 10, 2023

bors r+

@craig craig bot merged commit 668ec8d into master Apr 10, 2023
@craig craig bot deleted the cmd-workload-jitter-conns branch April 10, 2023 21:28
@craig
Copy link
Contributor

craig bot commented Apr 10, 2023

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants