Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pgwire: add server.cockroach_cloud.max_client_connections_per_gateway cluster setting #100200

Merged

Conversation

alyshanjahani-crl
Copy link
Collaborator

This setting specifies a maximum number of external connections that a server can have open at any given time. External is defined as connections that are not root connections. In the future, we can have the definition of external be a configurable list of users.
<0 - Connections are unlimited (existing behavior)
=0 - External connections are disabled

0 - External connections are limited
If a new connection would exceed this limit, an error message is returned as postgres: "cluster connections are limited" with the 53300 error code that corresponds to "too many connections".

This functionality is required for serverless.
Part of: https://cockroachlabs.atlassian.net/browse/CC-9288

Release note: None

@alyshanjahani-crl alyshanjahani-crl requested a review from a team March 30, 2023 19:40
@alyshanjahani-crl alyshanjahani-crl requested a review from a team as a code owner March 30, 2023 19:40
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @alyshanjahani-crl, @jeffswenson, and @rafiss)


-- commits line 18 at r1:
Looks like this deserves a release note.


pkg/sql/conn_executor.go line 788 at r1 (raw file):

// IncrementConnectionCount increases connectionCount by 1.
// If isRoot is true then it also increases rootConnectionCount by 1.

I recommend you make two functions (root vs non-root) side by side. Bool arguments make reading the code harder.


pkg/sql/pgwire/conn.go line 721 at r1 (raw file):

		}

		if retErr = c.checkMaxConnections(ctx, sqlServer); retErr != nil {

Q for @rafiss unrelated to this PR: should we check the max nr of connections before starting authentication? This way we can also contrain the amount of server-side authn work via the conn limit.


pkg/sql/pgwire/server.go line 104 at r1 (raw file):

// Note(alyshan): This setting is not public.
var maxNumExternalConnections = settings.RegisterIntSetting(
	settings.TenantReadOnly,

I don't understand why it's tenantreadonly. Why not tenantwritable like the other one?

(The cc control plane can still override as usual)

Copy link
Collaborator Author

@alyshanjahani-crl alyshanjahani-crl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jeffswenson, @knz, and @rafiss)


-- commits line 18 at r1:
I felt that this change was not intended for external users, but just Cockroach Cloud.

Do NOT write release note texts for internal changes, such as:
Functionality that is not accessible to, or intended for, external users (e.g., multi-tenant capabilities)


pkg/sql/conn_executor.go line 788 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

I recommend you make two functions (root vs non-root) side by side. Bool arguments make reading the code harder.

In this case of an incoming root connection, are you imaging the caller make two calls IncrementConnectionCount and IncrementRootConnectionCount ?

Or one call to IncrementRootConnectionCount that increments both connectionCount and rootConnectionCount?


pkg/sql/pgwire/server.go line 104 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

I don't understand why it's tenantreadonly. Why not tenantwritable like the other one?

(The cc control plane can still override as usual)

I do not want serverless users to be able to discover and alter this setting on their own.

CC will set this setting for individual tenants through the system tenant.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @alyshanjahani-crl, @jeffswenson, and @rafiss)


pkg/sql/conn_executor.go line 788 at r1 (raw file):

Previously, alyshanjahani-crl wrote…

In this case of an incoming root connection, are you imaging the caller make two calls IncrementConnectionCount and IncrementRootConnectionCount ?

Or one call to IncrementRootConnectionCount that increments both connectionCount and rootConnectionCount?

the latter.


pkg/sql/pgwire/server.go line 104 at r1 (raw file):

I do not want serverless users to be able to discover and alter this setting on their own.

They would not be able to alter it if CC orchestration sets it via an override. Overrides cannot be altered by users.

IMHO there is no good reason to make it TenantReadOnly since it's the same type as the other one just above.

@alyshanjahani-crl alyshanjahani-crl force-pushed the max-external-conns-cluster-setting branch from a05ec83 to bce5c02 Compare April 3, 2023 16:14
Copy link
Collaborator Author

@alyshanjahani-crl alyshanjahani-crl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jeffswenson, @knz, and @rafiss)


pkg/sql/conn_executor.go line 788 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

the latter.

Done.


pkg/sql/pgwire/server.go line 104 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

I do not want serverless users to be able to discover and alter this setting on their own.

They would not be able to alter it if CC orchestration sets it via an override. Overrides cannot be altered by users.

IMHO there is no good reason to make it TenantReadOnly since it's the same type as the other one just above.

Oh gotchya, that makes sense to me now.

Changed to TenantWritable

@alyshanjahani-crl alyshanjahani-crl force-pushed the max-external-conns-cluster-setting branch from 7e71266 to 50d374e Compare April 4, 2023 18:18
Copy link
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just had minor comments

i was wondering - is there a risk for confusion if this new setting and the old max_per_gateway setting are used at the same time? does one take precedence?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @alyshanjahani-crl, @jeffswenson, and @knz)


pkg/sql/conn_executor.go line 804 at r2 (raw file):

// DecrementConnectionCount decreases connectionCount by 1.
// If isRoot is true then it also decreases rootConnectionCount by 1.

nit: this comment seems outdated


pkg/sql/pgwire/conn.go line 721 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

Q for @rafiss unrelated to this PR: should we check the max nr of connections before starting authentication? This way we can also contrain the amount of server-side authn work via the conn limit.

i believe we'd discussed this before, and decided no. this would allow someone who has no access to the cluster to spam connection attempts, and thereby lock out legitimate users from ever being able to begin authentication attempts.


pkg/sql/pgwire/server.go line 100 at r2 (raw file):

// Note(alyshan): One might suggest this cluster setting be named server.max_non_root_connections_per_gateway
// as that reflects its current behaviour of excluding root connections from being limited. However, I chose
// to use the term "external" so that this setting can be extended to exclude connections from an arbitrary

the name "external connection" is unfortunate here. that already has a definition in our product https://www.cockroachlabs.com/docs/stable/create-external-connection.html

can we just name this maxNumNonRootConnections? and name the setting server.max_non_root_connections_per_gateway?

can you say more about how you imagine extending the setting?

@alyshanjahani-crl alyshanjahani-crl force-pushed the max-external-conns-cluster-setting branch 2 times, most recently from 5b021bf to 3fb40b6 Compare April 4, 2023 20:02
Copy link
Collaborator Author

@alyshanjahani-crl alyshanjahani-crl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does one take precedence?

The one im adding maxNonRoot is checked first since it is more restrictive. But they won't interfere with each other.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jeffswenson, @knz, and @rafiss)


pkg/sql/pgwire/server.go line 100 at r2 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

the name "external connection" is unfortunate here. that already has a definition in our product https://www.cockroachlabs.com/docs/stable/create-external-connection.html

can we just name this maxNumNonRootConnections? and name the setting server.max_non_root_connections_per_gateway?

can you say more about how you imagine extending the setting?

Currently when CockroachCloud performs operations on serverless clusters (like the console page listing sql users, creating databases etc, doing a CRDB version upgrade etc..) they connect as root.

The intention behind this cluster setting is to let an administrator access the cluster but limit other users. Currently the root user is this administrator. But that might not always be the case, we might have more fine grained access control and so i was imagining extending this setting one day with another setting where one can provide which users should be exempted / treated like root in this case.

I wanted to avoid a renaming of cluster setting in the future, and so i thought of maxExternalConnections.

I suppose i would be fine with server.max_non_root_connections_per_gateway, and if we ever needed to extend the functionality of the setting as i described above, we would make it explicit/clear in the description of this setting that it can permit more than just root connections via using an additional setting.

Copy link
Collaborator

@jeffswenson jeffswenson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a high level architecture standpoint, I agree that we should implement this in the sql server instead of the sql proxy. The one downside is adding a setting to the sql server means we need to support it O(forever) because removing the setting will be a breaking change. That means we need to take some time and make sure the setting makes sense outside of the context of CC. We also need to think about how throttling will interact with a user that has configured the setting themselves.

As for the design, I'm hesitant to add a new setting that has a behavior very similar to an existing setting, because there are already a lot of settings and I don't want to make things more confusing.

Here are a few ideas:

  1. We add a setting which defines which users are exempt from checkMaxConnections. This feature would be more useful for non CC users and we can override the two settings together
  2. We stick with the somewhat hacky sql proxy change, because we can remove the feature if we no longer want to support it the future. E.g. once we have a GUI sql shell, maybe we limit all connections at the sql proxy and only allow connections through the GUI shell.

Copy link
Collaborator Author

@alyshanjahani-crl alyshanjahani-crl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spoke w. Jeff offline, the setting being private/reserved actually addresses his concerns. With the caveat of including in the description of the setting a disclaimer/warning that this is intended to be used by Cockroach Cloud, as well as a comment in the usage of the setting in checkMaxConnections

And on a similar note, a request to include cloud somewhere in the name of the setting.

Before I go ahead and change the name of the setting again, can we agree upon a name here first?
Is there any rules around what can be the prefix of the cluster setting name @rafiss ? for ex. instead of server.max_conn... could we do cloud.max_conn... ?
If we can that my preferred cluster setting name is cloud.max_connections_per_gateway with a description like so

This setting is intended to be used by Cockroach Cloud for limiting connections to serverless clusters.
The maximum number of connections per gateway allowed at a given time
(note: this will only limit future connection attempts and will not affect already established connections).
Negative values result in unlimited number of connections. Cockroach cloud internal users (including root user) are not affected by this limit."

Alternatively: server.max_non_cloud_connections_per_gateway

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz and @rafiss)

@rafiss
Copy link
Collaborator

rafiss commented Apr 5, 2023

what about server.max_cockroach_cloud_connections_per_gateway. (i want to use cockroach_cloud instead of cloud, since cloud is used in other parts of CRDB code to refer to the idea of connecting to resources like S3 or GCE)

the general convention is that the first part of the cluster setting name relates to the code/component that is being configured.

@alyshanjahani-crl alyshanjahani-crl force-pushed the max-external-conns-cluster-setting branch from 3fb40b6 to 0e6ff41 Compare April 6, 2023 20:38
Copy link
Collaborator Author

@alyshanjahani-crl alyshanjahani-crl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack, i've used server.max_non_cockroach_cloud_connections_per_gateway . It is a mouthful, but to re-iterate, this is a setting that is private, only intended for cockroach cloud, and will possibly be removed in the future when we have a better way to limit connections to serverless clusters.

TFTR

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz and @rafiss)

Copy link
Collaborator

@jeffswenson jeffswenson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @alyshanjahani-crl and @knz)


pkg/sql/pgwire/server.go line 104 at r6 (raw file):

var maxNumNonRootConnections = settings.RegisterIntSetting(
	settings.TenantWritable,
	"server.max_non_cockroach_cloud_connections_per_gateway",

nit: it should be server.max_cockroach_cloud_connections_per_gateway

sorry for the review churn! feel free to merge after fixing this.

Copy link
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @alyshanjahani-crl and @knz)


pkg/sql/pgwire/server.go line 104 at r6 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

nit: it should be server.max_cockroach_cloud_connections_per_gateway

sorry for the review churn! feel free to merge after fixing this.

well, my idea was "cockroach cloud connection" refers to someone connecting to cockroach cloud

but i see in your description that you are thinking of it differently maybe?

@jeffswenson
Copy link
Collaborator

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @alyshanjahani-crl and @knz)

pkg/sql/pgwire/server.go line 104 at r6 (raw file):

Previously, rafiss (Rafi Shamim) wrote…
well, my idea was "cockroach cloud connection" refers to someone connecting to cockroach cloud

but i see in your description that you are thinking of it differently maybe?

The intent of the setting is to differentiate between connections from the CC control plane vs connections from the CC user. What about the name server.cockroach_cloud_client_connection_limit?

@jeffswenson
Copy link
Collaborator

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @alyshanjahani-crl and @knz)

pkg/sql/pgwire/server.go line 104 at r6 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
well, my idea was "cockroach cloud connection" refers to someone connecting to cockroach cloud
but i see in your description that you are thinking of it differently maybe?

The intent of the setting is to differentiate between connections from the CC control plane vs connections from the CC user. What about the name server.cockroach_cloud_client_connection_limit?

Or maybe
server.cockroach_cloud.max_client_connections.

@rafiss
Copy link
Collaborator

rafiss commented Apr 7, 2023

server.cockroach_cloud.max_client_connections_per_gateway sounds like a good one to me. (the per_gateway part is important, i think)

@alyshanjahani-crl alyshanjahani-crl force-pushed the max-external-conns-cluster-setting branch from 0e6ff41 to 660a05a Compare April 7, 2023 23:58
… cluster setting

This setting specifies a maximum number of non root connections that a server
can have open at any given time.
<0 - Connections are unlimited (existing behavior)
=0 - Non root connections are disabled
>0 - Non root connections are limited
If a new connection would exceed this limit, an error message is
returned as postgres: "cluster connections are limited" with the 53300
error code that corresponds to "too many connections".

This functionality is required for serverless.
Part of: https://cockroachlabs.atlassian.net/browse/CC-9288

Release note: None
@alyshanjahani-crl alyshanjahani-crl force-pushed the max-external-conns-cluster-setting branch from 660a05a to d3a9bf0 Compare April 10, 2023 17:08
Copy link
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm! thanks for the attention to detail

@@ -82,7 +82,11 @@ var logSessionAuth = settings.RegisterBoolSetting(
"if set, log SQL session login/disconnection events (note: may hinder performance on loaded nodes)",
false).WithPublic()

var maxNumConnections = settings.RegisterIntSetting(
// TODO(alyshan): This setting is enforcing max number of connections with superusers not being affected by
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call on the TODO. we should do this one:

update the description to say "the maximum number of connections per gateway ... Superusers are not affected by this limit"

(our team can do this in a later PR if that's easier)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In spirit of breather week:

#101177

)

// maxNumNonRootConnectionsReason is used to supplement the error message for connections that denied due to
// server.max_non_root_connections_per_gateway.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit: this comment refers to the old name

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for catching, fixed

…_reason cluster setting

This setting can be used to customize the error message returned when
connections are denied due to a limit specified via
server.cockroach_cloud.max_client_connections_per_gateway.

This functionality is required for serverless. Being able to indicate to the
user why a limit is placed on their number of connections will support a better
UX. Specifically, indicating to the user which resource limits they have hit.
Part of: https://cockroachlabs.atlassian.net/browse/CC-9288

Release note: None
@alyshanjahani-crl alyshanjahani-crl force-pushed the max-external-conns-cluster-setting branch from d3a9bf0 to 02f420a Compare April 10, 2023 22:04
@rafiss rafiss changed the title pgwire: add server.max_external_connections_per_gateway cluster setting pgwire: add server.cockroach_cloud.max_client_connections_per_gateway cluster setting Apr 11, 2023
@alyshanjahani-crl
Copy link
Collaborator Author

bors r+

@craig craig bot merged commit 5fd45da into cockroachdb:master Apr 11, 2023
@craig
Copy link
Contributor

craig bot commented Apr 11, 2023

Build succeeded:

@alyshanjahani-crl alyshanjahani-crl added backport-22.2.x backport-23.1.x Flags PRs that need to be backported to 23.1 labels Apr 11, 2023
@alyshanjahani-crl
Copy link
Collaborator Author

blathers backport 22.2 23.1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-23.1.x Flags PRs that need to be backported to 23.1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants