workload: jitter the teardown of connections to prevent thundering herd #100533
Reworking #98689 plus fixes for broken nightlies in this PR.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @sean-)
pkg/workload/tpcc/tpcc.go
line 174 at r1 (raw file):
	`idle-conns`:         {RuntimeOnly: true},
	`local-warehouses`:   {RuntimeOnly: true},
	`max-conn-idle-time`: {RuntimeOnly: true},
Registering meta flags was the required fix for #100018, FYI.
e01106d to fb7a0cc
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @renatolabs, @sean-, and @srosenberg)
pkg/workload/pgx_helpers.go
line 332 at r2 (raw file):
	warmupConns := make(chan *pgxpool.Conn, numWarmupConns)
	connsPerURL := distribute(numWarmupConns, len(m.Pools))
With the latest fix, this logic feels like it's taking an extra step by:
- computing numWarmupConns from p.MaxConns
- distributing numWarmupConns across pools
- allocating up to min of (p.MaxConns, distribute(numWarmupConns))
Can we invert the logic and skip a step, while also making this logic more understandable? I'm thinking something like:
	// Decide how many connections to warm up in each pool.
	var warmupConnsPerPool []int
	if numConns == 0 {
		// No explicit count: warm every pool up to its own MaxConns.
		warmupConnsPerPool = make([]int, len(m.Pools))
		for i, p := range m.Pools {
			warmupConnsPerPool[i] = int(p.Config().MaxConns)
		}
	} else {
		// TODO: what if numConns is > p.Config().MaxConns?
		warmupConnsPerPool = distribute(numConns, len(m.Pools))
	}

	// Size the channel to hold every warmed-up connection.
	var numWarmupConns int
	for _, n := range warmupConnsPerPool {
		numWarmupConns += n
	}
	warmupConns := make(chan *pgxpool.Conn, numWarmupConns)

	for i, p := range m.Pools {
		p := p // capture the per-iteration value for the closure
		for k := 0; k < warmupConnsPerPool[i]; k++ {
			g.Go(func() error {
				conn, err := p.Acquire(warmupCtx)
				if err != nil {
					return err
				}
				warmupConns <- conn
				return nil
			})
		}
	}
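For reference, the per-pool counting in the suggestion above can be tried in isolation. The following is only a sketch under assumptions: `distribute` here is a hypothetical stand-in for the helper of the same name (its real implementation is not shown in this thread), `warmupCounts` is an illustrative name, and capping each share at the pool's MaxConns is just one possible answer to the TODO.

```go
package main

import "fmt"

// distribute splits total into num shares whose sizes differ by at most one.
// Hypothetical stand-in for the workload helper of the same name.
func distribute(total, num int) []int {
	if num <= 0 {
		return nil
	}
	res := make([]int, num)
	base, rem := total/num, total%num
	for i := range res {
		res[i] = base
		if i < rem {
			res[i]++ // the first rem shares absorb the remainder
		}
	}
	return res
}

// warmupCounts returns how many connections to warm up per pool.
// maxConns[i] plays the role of p.Config().MaxConns for pool i.
// Assumption: a share larger than a pool's MaxConns is capped at MaxConns.
func warmupCounts(numConns int, maxConns []int) []int {
	if numConns == 0 {
		// No explicit count: fill every pool to its maximum.
		return append([]int(nil), maxConns...)
	}
	counts := distribute(numConns, len(maxConns))
	for i := range counts {
		if counts[i] > maxConns[i] {
			counts[i] = maxConns[i]
		}
	}
	return counts
}

func main() {
	pools := []int{4, 4, 2}
	fmt.Println(warmupCounts(0, pools))  // prints [4 4 2]
	fmt.Println(warmupCounts(10, pools)) // prints [4 3 2]
}
```

With capping, the total warmed up can be smaller than numConns (9 instead of 10 in the second call), so the channel should be sized from the summed per-pool counts, as the suggestion already does.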
This change upgrades workload's use of pgx from v4 to v5 in order to allow jittering the teardown of connections. This change sets a max connection age of 5min and jitters the teardown by 30s. Upgrading to pgx v5 also adds non-blocking pgxpool connection acquisition.

workload: add flags to manage the age and lifecycle of connection pool

Add flags to all workload types to specify:

* the max connection age: `--max-conn-lifetime duration`
* the max connection age jitter: `--max-conn-lifetime-jitter duration`
* the max connection idle time: `--max-conn-idle-time duration`
* the connection health check interval: `--conn-healthcheck-period duration`
* the min number of connections in the pool: `--min-conns int`

workload: add support for remaining pgx query modes

Add support for pgx.QueryExecModeCacheDescribe and pgx.QueryExecModeDescribeExec. Previously, only three of the five query modes were available.

workload: fix race condition when recording histogram data

Release note (cli change): workload jitters teardown of connections to prevent thundering herd impacting P99 latency results.

Release note (cli change): workload utility now has flags to tune the connection pool used for testing. See `--conn-healthcheck-period`, `--min-conns`, and the `--max-conn-*` flags for details.

Release note (cli change): workload now supports every [PostgreSQL query mode](https://github.com/jackc/pgx/blob/fa5fbed497bc75acee05c1667a8760ce0d634cba/conn.go#L167-L182) available via the underlying pgx driver.
fb7a0cc to bdf3f62
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten, @renatolabs, and @srosenberg)
pkg/workload/pgx_helpers.go
line 332 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Good suggestion. Updated.
Reviewed 1 of 1 files at r3, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @renatolabs and @srosenberg)
bors r+
Build succeeded.
This change upgrades workload's use of pgx from v4 to v5 in order to allow jittering the teardown of connections. This change sets a max connection age of 5min and jitters the teardown by 30s. Upgrading to pgx v5 also adds non-blocking pgxpool connection acquisition.
workload: add flags to manage the age and lifecycle of connection pool
Add flags to all workload types to specify:
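* the max connection age: --max-conn-lifetime duration
* the max connection age jitter: --max-conn-lifetime-jitter duration
* the max connection idle time: --max-conn-idle-time duration
* the connection health check interval: --conn-healthcheck-period duration
* the min number of connections in the pool: --min-conns int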
workload: add support for remaining pgx query modes
Add support for pgx.QueryExecModeCacheDescribe and pgx.QueryExecModeDescribeExec. Previously, only three of the five query modes were available.
workload: fix race condition when recording histogram data
Release note (cli change): workload jitters teardown of connections to prevent thundering herd impacting P99 latency results.
Release note (cli change): workload utility now has flags to tune the connection pool used for testing. See --conn-healthcheck-period, --min-conns, and the --max-conn-* flags for details.
Release note (cli change): workload now supports every PostgreSQL query mode available via the underlying pgx driver.