Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release-22.2: add elastic CPU control to CDC event processing #92764

Conversation

jayshrivastava
Copy link
Contributor

@jayshrivastava jayshrivastava commented Nov 30, 2022

Backport 2/3 commits from #91554.
Also add a new commit which disables the feature by default.
Also verified that CI passes with the last commit omitted (ie. with CPU control enabled)

/cc https://github.com/orgs/cockroachdb/teams/release

admission: make Pacer type available in SQL server config

Currently, the Pacer type is only used within KV, but will be used by SQL
in future changes. For example, code for encoding/decoding CDC events resides
in distSQL and is CPU intensive, so there is a plan to integrate admission
control to it in (#90089).
This change makes the Pacer type available to the SQL layer via the
execinfra.ServerConfig.

Because the Pacer was previously only used by KV, it lived in the kvadmission
package. Since this change makes it available outside of KV, it is moved to
the admission package.

Furthermore, this change adds a new method,
ElasticCPUGrantCoordinator.NewPacer, to instantiate new Pacer structs.
Since the ElasticCPUGrantCoordinator implements several features not relevant
to SQL, this change passes the coordinator to the SQL server config as
the interface PacerMaker, which makes only the NewPacer method accessible.

Currently tenant servers do not create grant coordinators for admission
control. This change retains that behavior, except it passes a nil
ElasticCPUGrandCoordinator which creates nil/noop Pacers. Adding these
coordinators to tenant servers is a change outside the scope of this commit and
is left as a TODO.

Release note: None

cdc: add elastic CPU control to CDC event processing

Previously, the CPU-bound work of CDC event processing (encoding /
decoding rows) had the potential to consume a lot of CPU and
disrupt foreground SQL traffic. This changes adds elastic CPU control
to event processing so that it does not use excessive CPU and
starve foreground traffic.

This change also adds two new, non-public cluster settings, which control
enabling/disabling CPU control for CDC event processing and controlling
the requested grant size measured in CPU time.

Fixes: #90089

Release note: None

cdc: disable event consumer elastic cpu control by default

This commit is being backported to 22.2, so this setting is
being disabled by default to error on the side of caution.

Release note: None

Release justification: This change reduces SQL latency spikes / increases caused by changefeeds. This change is disabled by default behind a cluster setting.

Currently, the Pacer type is only used within KV, but will be used by SQL
in future changes. For example, code for encoding/decoding CDC events resides
in distSQL and is CPU intensive, so there is a plan to integrate admission
control to it in (cockroachdb#90089).
This change makes the Pacer type available to the SQL layer via the
`execinfra.ServerConfig`.

Because the Pacer was previously only used by KV, it lived in the `kvadmission`
package. Since this change makes it available outside of KV, it is moved to
the `admission` package.

Furthermore, this change adds a new method,
`ElasticCPUGrantCoordinator.NewPacer`, to instantiate new Pacer structs.
Since the `ElasticCPUGrantCoordinator` implements several features not relevant
to SQL, this change passes the coordinator to the SQL server config as
the interface `PacerMaker`, which makes only the `NewPacer` method accessible.

Currently tenant servers do not create grant coordinators for admission
control. This change retains that behavior, except it passes a `nil
ElasticCPUGrandCoordinator` which creates `nil`/noop Pacers. Adding these
coordinators to tenant servers is a change outside the scope of this commit and
is left as a `TODO`.

Release note: None
Previously, the CPU-bound work of CDC event processing (encoding /
decoding rows) had the potential to consume a lot of CPU and
disrupt foreground SQL traffic. This changes adds elastic CPU control
to event processing so that it does not use excessive CPU and
starve foreground traffic.

This change also adds a new, non-public cluster setting, which controls
enabling/disabling CPU control for CDC event processing and controlling
the requested grant size measured in CPU time.

Fixes: cockroachdb#90089

Release note: None
This commit is being backported to 22.2, so this setting is
being disabled by default to error on the side of caution.

Release note: None
@blathers-crl
Copy link

blathers-crl bot commented Nov 30, 2022

Thanks for opening a backport.

Please check the backport criteria before merging:

  • Patches should only be created for serious issues or test-only changes.
  • Patches should not break backwards-compatibility.
  • Patches should change as little code as possible.
  • Patches should not change on-disk formats or node communication protocols.
  • Patches should not add new functionality.
  • Patches must not add, edit, or otherwise modify cluster versions; or add version gates.
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
  • There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
  • The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
  • New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters.
  • The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.

Add a brief release justification to the body of your PR to justify this backport.

Some other things to consider:

  • What did we do to ensure that a user that doesn’t know & care about this backport, has no idea that it happened?
  • Will this work in a cluster of mixed patch versions? Did we test that?
  • If a user upgrades a patch version, uses this feature, and then downgrades, what happens?

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@jayshrivastava jayshrivastava changed the title release-22.2: add elastic cpu control release-22.2: add elastic CPU control to CDC event processing Nov 30, 2022
@jayshrivastava jayshrivastava marked this pull request as ready for review December 1, 2022 14:54
@jayshrivastava jayshrivastava requested review from a team as code owners December 1, 2022 14:54
@jayshrivastava jayshrivastava requested a review from a team December 1, 2022 14:54
@jayshrivastava jayshrivastava requested review from a team as code owners December 1, 2022 14:54
Copy link
Contributor

@miretskiy miretskiy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@jayshrivastava jayshrivastava merged commit 163e9ec into cockroachdb:release-22.2 Dec 1, 2022
@jayshrivastava jayshrivastava deleted the backport-elastic-nprocs branch December 1, 2022 15:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants