
pgwire: add server.max_connections_per_gateway public cluster setting #76401

Merged (1 commit) on Feb 19, 2022
Conversation

@ecwall (Contributor) commented Feb 10, 2022

This setting specifies a maximum number of connections that a server can have
open at any given time.
<0 - Connections are unlimited (existing behavior)
=0 - Connections are disabled
>0 - Connections are limited
If a new non-superuser connection would exceed this limit, the same error
message is returned as postgres: "sorry, too many connections" with the 53300
error code that corresponds to "too many connections".

Release note (ops change): An off-by-default server.max_connections_per_gateway cluster
setting has been added to limit the maximum number of connections to a server.
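
For reference, a minimal Go sketch of the admission rule described above; the function name and shape are illustrative, not the PR's exact code (pgerror.WithCandidateCode and pgcode.TooManyConnections are the crdb helpers the review itself references):

package pgwire

import (
	"github.com/cockroachdb/cockroach/pkg/sql/pgwire/pgcode"
	"github.com/cockroachdb/cockroach/pkg/sql/pgwire/pgerror"
	"github.com/cockroachdb/errors"
)

// admitConnection (hypothetical) applies the setting's semantics:
// maxConns < 0 is unlimited, maxConns == 0 refuses all non-superuser
// connections, and maxConns > 0 caps the concurrent count.
func admitConnection(current, maxConns int64) error {
	if maxConns < 0 {
		return nil // unlimited (existing behavior)
	}
	if current >= maxConns { // also refuses everything when maxConns == 0
		err := errors.New("sorry, too many connections")
		// SQLSTATE 53300, matching the postgres error.
		return pgerror.WithCandidateCode(err, pgcode.TooManyConnections)
	}
	return nil
}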

@ecwall ecwall requested a review from a team February 10, 2022 21:35
@ecwall ecwall requested review from a team as code owners February 10, 2022 21:35

@rafiss (Collaborator) left a comment

didn't review the test code yet -- will get to that later

overall looking good!



pkg/sql/admin.go, line 24 at r1 (raw file):

// HasAdminRole is a helper function to determine if a user is an admin using
// the given internal executor.
func HasAdminRole(ctx context.Context, ie *InternalExecutor, user security.SQLUsername) (bool, error) {

nit: see my comment later -- this part of the refactor won't be needed, so we can put this back where it was


pkg/sql/authorization.go, line 592 at r1 (raw file):

const AuthAuditingClusterSettingName = "server.auth_log.sql_sessions.enabled"

const MaxNumConnectionsClusterSettingName = "server.max_connections"

nit: i think the name should make it more clear that the setting is per gateway node. maybe server.max_connections_per_gateway


pkg/sql/pgwire/server.go, line 94 at r1 (raw file):

	settings.SystemOnly,
	sql.MaxNumConnectionsClusterSettingName,
	"the maximum number of SQL connections to the server allowed at a given time, none if 0, unlimited if < 0",

i think it would make more sense as a NonNegativeInt setting (search the code for settings.NonNegativeInt)

then 0 can mean "disabled"

(i just don't think there's any reason to configure it to allow 0 connections)


pkg/sql/pgwire/server.go, line 98 at r1 (raw file):

).WithPublic()

var allowedConnectionCount int64 = 0

this should be a field of the sql.Server struct


pkg/sql/pgwire/server.go, line 722 at r1 (raw file):

	hbaConf, identMap := s.GetAuthenticationConfiguration()

	postAuthHook := func() error {

it doesn't feel that natural to me to have this be done with a hook. also, it doesn't feel like this PR should be thought of as part of the "auth" phase -- in my mind it just depends on the auth phase.

it seems easier to understand if we just do it in the part of the code after we call handleAuthentication:

if connCloseAuthHandler, retErr = c.handleAuthentication(

and the counter could be decremented in the defer block right before that

defer func() {


pkg/sql/pgwire/server.go, line 723 at r1 (raw file):

	postAuthHook := func() error {
		if hasAdminRole, err := sql.HasAdminRole(ctx, s.execCfg.InternalExecutor, sArgs.User); err != nil {

since that earlier PR was first created, we now have a better way without this HasAdminRole call. that call might end up being expensive, since it delegates out to the internal executor to run a new query.

instead, now we can look at sArgs.IsSuperUser (superuser is the same thing as having admin role)
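
In other words, the check can short-circuit on the session's superuser bit instead of issuing a query. A trimmed-down sketch with stub types (the real sql.SessionArgs and sql.Server are more involved):

package pgwire

import "github.com/cockroachdb/errors"

// Trimmed-down stand-ins, just to show the control flow.
type sessionArgs struct {
	IsSuperUser bool // populated during authentication
}

type sqlServer struct{ /* connection counter lives here */ }

func (s *sqlServer) incrementConnectionCountIfLessThan(max int64) bool {
	return true // placeholder; see the counter sketches further down
}

// checkMaxConnections (hypothetical) short-circuits on the session's
// superuser bit, avoiding the internal-executor query HasAdminRole needs.
func checkMaxConnections(sArgs sessionArgs, s *sqlServer, max int64) error {
	if sArgs.IsSuperUser {
		return nil // superusers are exempt from the limit
	}
	if max >= 0 && !s.incrementConnectionCountIfLessThan(max) {
		return errors.New("sorry, too many connections")
	}
	return nil
}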


pkg/sql/pgwire/server.go, line 778 at r1 (raw file):

// increments the int64 at addr by inc if <= max
func incIfLte(addr *int64, inc int64, max int64) bool {

i think this would work better as a function on the sql.Server like

func (s *Server) incrementConnectionCount( ... ) bool {

pkg/sql/pgwire/server.go, line 780 at r1 (raw file):

func incIfLte(addr *int64, inc int64, max int64) bool {
	// try until it succeeds uncontested or fails
	for {

i suppose the disadvantage of this approach is that if there's a huge spike in the rate of connections, this loop would spin a lot. i think it's more conventional that we use a syncutil.Mutex to protect concurrent fields, so that if there's contention, the thread sleeps instead of spinning. (then we don't need to use atomic either)
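
For comparison, the compare-and-swap approach being discussed looks roughly like this (a sketch of the pattern, not the PR's exact code); every failed CAS costs another spin of the loop, whereas a mutex parks the goroutine:

package pgwire

import "sync/atomic"

// incIfLessThan bumps *addr by 1 only while *addr < max, retrying on
// contention. Under a connection spike, many goroutines can race on
// the same address and burn CPU in this loop.
func incIfLessThan(addr *int64, max int64) bool {
	for {
		prev := atomic.LoadInt64(addr)
		if prev >= max {
			return false // at the limit; refuse
		}
		if atomic.CompareAndSwapInt64(addr, prev, prev+1) {
			return true // won the race
		}
		// Another goroutine changed the value first; try again.
	}
}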


pkg/sql/pgwire/server.go, line 784 at r1 (raw file):

		prev := atomic.LoadInt64(addr)
		// increment
		next := prev + inc

is inc necessary as a parameter? it's always going to be 1 right?


pkg/cli/flags.go, line 481 at r1 (raw file):

		// More server flags.

		intFlag(f, &baseCfg.MaxSQLConns, cliflags.ListenMaxSQLConns)

I think the CLI changes might be here by accident - they should be removed


pkg/cli/cliflags/flags.go, line 552 at r1 (raw file):

	}

	ListenMaxSQLConns = FlagInfo{

I think the CLI changes might be here by accident - they should be removed

@ecwall (Contributor, author) left a comment


pkg/sql/pgwire/server.go, line 780 at r1 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

i suppose the disadvantage of this approach is that if there's a huge spike in the rate of connections, this loop would spin a lot. i think it's more conventional that we use a syncutil.Mutex to protect concurrent fields, so that if there's contention, the thread sleeps instead of spinning. (then we don't need to use atomic either)

Switched it out for a mutex.


pkg/cli/flags.go, line 481 at r1 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

I think the CLI changes might be here by accident - they should be removed

Done.


pkg/cli/cliflags/flags.go, line 552 at r1 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

I think the CLI changes might be here by accident - they should be removed

Done.

@ecwall (Contributor, author) left a comment


pkg/sql/pgwire/server.go, line 94 at r1 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

i think it would make more sense as a NonNegativeInt setting (search the code for settings.NonNegativeInt)

then 0 can mean "disabled"

(i just don't think there's any reason to configure it to allow 0 connections)

To allow only superuser access? Or is that not a use case?

@ecwall (Contributor, author) left a comment


pkg/sql/admin.go, line 24 at r1 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

nit: see my comment later -- this part of the refactor won't be needed, so we can put this back where it was

Done.


pkg/sql/authorization.go, line 592 at r1 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

nit: i think the name should make it more clear that the setting is per gateway node. maybe server.max_connections_per_gateway

Done.


pkg/sql/pgwire/server.go, line 723 at r1 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

since that earlier PR was first created, we now have a better way without this HasAdminRole call. that call might end up being expensive, since it delegates out to the internal executor to run a new query.

instead, now we can look at sArgs.IsSuperUser (superuser is the same thing as having admin role)

Done.


pkg/sql/pgwire/server.go, line 784 at r1 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

is inc necessary as a parameter? it's always going to be 1 right?

No longer applicable with the mutex.

@ecwall (Contributor, author) left a comment


pkg/sql/pgwire/server.go, line 98 at r1 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

this should be a field of the sql.Server struct

Done.


pkg/sql/pgwire/server.go, line 778 at r1 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

i think this would work better as a function on the sql.Server like

func (s *Server) incrementConnectionCount( ... ) bool {

Done.

@ecwall (Contributor, author) left a comment


pkg/sql/pgwire/server.go, line 722 at r1 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

it doesn't feel that natural to me to have this be done with a hook. also, it doesn't feel like this PR should be thought of as part of the "auth" phase -- in my mind it just depends on the auth phase.

it seems easier to understand if we just do it in the part of the code after we call handleAuthentication:

if connCloseAuthHandler, retErr = c.handleAuthentication(

and the counter could be decremented in the defer block right before that

defer func() {

I split up handleAuthentication and authOKMessage so that I could call checkMaxConnections between them.

@ecwall ecwall requested a review from rafiss February 11, 2022 21:00
@rafiss (Collaborator) left a comment
nice, i think this is getting close



pkg/sql/authorization.go, line 592 at r5 (raw file):

const AuthAuditingClusterSettingName = "server.auth_log.sql_sessions.enabled"

// MaxNumConnectionsClusterSettingName is the maximum number of SQL connections to the server allowed at a given time.

nit: can you move this into the pgwire package (could be right next to where the setting itself is defined)


pkg/sql/conn_executor.go, line 299 at r5 (raw file):

	TelemetryLoggingMetrics *TelemetryLoggingMetrics

	allowedConnectionCount struct {

nit: the pattern we use for protecting a field with a mutex is:

mu struct {
  syncutil.Mutex
  allowedConnectionCount int64
}

also, i'd probably rename this to currentConnectionCount. to me, allowedConnectionCount sounds like it would be the name of the setting, not the current value.


pkg/sql/conn_executor.go, line 670 at r5 (raw file):

// IncAllowedConnectionCount increments allowedConnectionCount.
func (s *Server) IncAllowedConnectionCount() {

nit: i think it's better to avoid abbreviations if we can. also, similar to my earlier comment, i don't think "allowed" is the right word here. maybe name it IncrementConnectionCount


pkg/sql/conn_executor.go, line 674 at r5 (raw file):

	allowedConnectionCount.mu.Lock()
	allowedConnectionCount.value++
	allowedConnectionCount.mu.Unlock()

with the refactor mentioned above, this can be rewritten to:

s.mu.Lock()
defer s.mu.Unlock()
s.mu.connectionCount++

(using the defer helps protect against a deadlock in case there's an error or panic. not likely to happen in this case, but it's a useful convention to apply when taking a mutex)

same for below functions
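
Putting those suggestions together, a sketch of the counter using crdb's syncutil (field and method names follow the review's suggestions; the real sql.Server has many more fields):

package sql

import "github.com/cockroachdb/cockroach/pkg/util/syncutil"

// Server holds the gateway-local connection count behind a mutex.
type Server struct {
	mu struct {
		syncutil.Mutex
		connectionCount int64
	}
}

// IncrementConnectionCount increases connectionCount by 1.
func (s *Server) IncrementConnectionCount() {
	s.mu.Lock()
	defer s.mu.Unlock() // defer guards against panics while locked
	s.mu.connectionCount++
}

// DecrementConnectionCount decreases connectionCount by 1.
func (s *Server) DecrementConnectionCount() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.mu.connectionCount--
}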


pkg/sql/conn_executor.go, line 678 at r5 (raw file):

// DecAllowedConnectionCount decrements allowedConnectionCount.
func (s *Server) DecAllowedConnectionCount() {

nit: maybe name it DecrementConnectionCount


pkg/sql/conn_executor.go, line 686 at r5 (raw file):

// IncAllowedConnectionCountIfLt increments allowedConnectionCount and returns true if allowedConnectionCount < max.
func (s *Server) IncAllowedConnectionCountIfLt(max int64) bool {

nit: maybe name it IncrementConnectionCountIfLessThan
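
And the conditional variant, continuing the Server sketch above (whether the negative-means-unlimited case is handled here or by the caller is an assumption):

// IncrementConnectionCountIfLessThan increments the count and reports
// true only if the current count is below max; max < 0 means unlimited.
func (s *Server) IncrementConnectionCountIfLessThan(max int64) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if max >= 0 && s.mu.connectionCount >= max {
		return false // would exceed the limit
	}
	s.mu.connectionCount++
	return true
}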


pkg/sql/pgwire/auth.go, line 83 at r5 (raw file):

}

func (c *conn) sendError(ctx context.Context, execCfg *sql.ExecutorConfig, err error) {

nit: could make more sense to define this in pkg/sql/pgwire/conn.go


pkg/sql/pgwire/auth.go, line 85 at r5 (raw file):

func (c *conn) sendError(ctx context.Context, execCfg *sql.ExecutorConfig, err error) {
	if err := writeErr(ctx, &execCfg.Settings.SV, err, &c.msgBuilder, c.conn); err != nil {
		log.Error(ctx, err.Error())

i don't think we want to log this -- it would get a bit verbose. we need to send the error back to the client, but we shouldn't log an error in the server for all incorrect client behavior


pkg/sql/pgwire/auth.go, line 111 at r5 (raw file):

		err = pgerror.WithCandidateCode(err, pgcode.InvalidAuthorizationSpecification)
		c.sendError(ctx, execCfg, err)
		return nil, err

it seems a bit more verbose this way, since now the error always needs to be defined on its own line. why does sendError no longer return the error?


pkg/sql/pgwire/auth.go, line 230 at r5 (raw file):

}

func (c *conn) checkMaxConnections(ctx context.Context, sqlServer *sql.Server) error {

nit: it could make more sense to define this in pkg/sql/pgwire/conn.go

also, could you add a brief comment describing what this function does?


pkg/sql/pgwire/auth.go, line 244 at r5 (raw file):

	}
	if !sqlServer.IncAllowedConnectionCountIfLt(maxNumConnectionsValue) {
		err := errors.WithHint(

nit: use errors.WithHintf - that way you can avoid having to use fmt.Sprintf below
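
That is, errors.WithHintf formats the hint directly; a small sketch with illustrative values:

package pgwire

import "github.com/cockroachdb/errors"

// tooManyConnectionsError (hypothetical) attaches the formatted hint
// without a separate fmt.Sprintf call.
func tooManyConnectionsError(maxConns int64, settingName string) error {
	return errors.WithHintf(
		errors.New("sorry, too many connections"),
		"the maximum number of allowed connections is %d and can be modified using the %s config key",
		maxConns, settingName,
	)
}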


pkg/sql/pgwire/auth.go, line 251 at r5 (raw file):

				"the maximum number of allowed connections is %d and can be modified using the %s config key",
				maxNumConnectionsValue,
				MaxNumConnections.Key(),

nit: use MaxNumConnectionsClusterSettingName instead of Key()


pkg/sql/pgwire/conn.go, line 669 at r5 (raw file):

			return
		}
		defer sqlServer.DecAllowedConnectionCount()

nit: this could go into the defer block above (so all the defer logic is in one place)


pkg/sql/pgwire/server.go, line 93 at r5 (raw file):

// MaxNumConnections is visible for testing
var MaxNumConnections = settings.RegisterIntSetting(
	settings.SystemOnly,

nit: this should be TenantWritable -- it's fine for each customer to configure this as they want
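
Registered that way, the setting would look roughly like the sketch below, mirroring the snippet above with the class swapped. The -1 default matches the test's expectation further down; the description wording here is illustrative:

package pgwire

import "github.com/cockroachdb/cockroach/pkg/settings"

// MaxNumConnections is visible for testing.
var MaxNumConnections = settings.RegisterIntSetting(
	settings.TenantWritable, // each tenant may tune its own gateways
	"server.max_connections_per_gateway",
	"the maximum number of non-superuser SQL connections per gateway "+
		"allowed at a given time (negative values mean unlimited)",
	-1, // default: unlimited, preserving existing behavior
).WithPublic()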


pkg/sql/pgwire/pgwire_test.go, line 246 at r5 (raw file):

}

func TestPGWireRejectsNewConnIfTooManyConns(t *testing.T) {

nit: this can be moved to conn_test.go


pkg/sql/pgwire/pgwire_test.go, line 253 at r5 (raw file):

	require.Equal(t, int64(-1), pgwire.MaxNumConnections.Get(&settings.SV))

	var (

nit: don't think we need var here. we can just initialize everything with :=


pkg/sql/pgwire/pgwire_test.go, line 258 at r5 (raw file):

			Settings: settings,
		}
		pgServer, err = serverutils.NewServer(params)

nit: we often use serverutils.StartServer() just to save a line


pkg/sql/pgwire/pgwire_test.go, line 261 at r5 (raw file):

	)
	if err != nil {
		t.Fatal(err)

nit: change to require.NoError


pkg/sql/pgwire/pgwire_test.go, line 267 at r5 (raw file):

	defer pgServer.Stopper().Stop(ctx)

	pgwire.MaxNumConnections.Override(ctx, &settings.SV, 1)

nit: for an integration test like this we usually use db.Exec("SET CLUSTER SETTING ...") to make sure it all works from the end-user perspective
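
That is, exercising the setting through a regular SQL connection rather than overriding it in-process. A sketch using database/sql; the sqlDB handle is assumed to come from the test harness, and the setting name is the PR's final one:

package pgwire_test

import (
	"database/sql"
	"fmt"
	"testing"

	"github.com/stretchr/testify/require"
)

// setMaxConnections (hypothetical helper) changes the limit the way an
// end user would, via SQL, instead of calling Override on the setting.
func setMaxConnections(t *testing.T, sqlDB *sql.DB, limit int) {
	t.Helper()
	_, err := sqlDB.Exec(fmt.Sprintf(
		"SET CLUSTER SETTING server.max_connections_per_gateway = %d", limit))
	require.NoError(t, err)
}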


pkg/sql/pgwire/pgwire_test.go, line 283 at r5 (raw file):

			pgServer.ServingSQLAddr(),
			t.Name(),
			url.User(user),

nit: just use url.UserPassword for all of them. no need to try to use client certs


pkg/sql/pgwire/pgwire_test.go, line 290 at r5 (raw file):

			pgURL.User = url.UserPassword(staff, staff)
		}
		defer cleanup()

should cleanup be called here? why not return it as part of the callback return value


pkg/sql/pgwire/pgwire_test.go, line 299 at r5 (raw file):

		if err != nil {
			return nil, func() {
				if closed {

nit: no need to track closed -- it's safe to call db.Close() always


pkg/sql/pgwire/pgwire_test.go, line 311 at r5 (raw file):

			}
			_ = c.Close()
			_ = db.Close()

nit: db.Close will automatically call c.Close, so we can consolidate this and the err != nil case above


pkg/sql/pgwire/pgwire_test.go, line 327 at r5 (raw file):

		t.Helper()
		require.Error(t, err)
		pqErr, ok := err.(*pq.Error)

nit: use this instead:

pqErr := (*pq.Error)(nil)
require.ErrorAs(t, err, &pqErr)
require.Equal(t, pgcode.TooManyConnections.String(), string(pqErr.Code), ...)

pkg/sql/pgwire/pgwire_test.go, line 340 at r5 (raw file):

	// Create a non-root admin user and a normal non-admin user.
	_, err = rootConn.ExecContext(ctx, fmt.Sprintf("CREATE USER %[1]s WITH PASSWORD %[1]s; GRANT admin TO %[1]s; CREATE USER %[2]s", staff, nonAdmin))

let's just use passwords for both

@ecwall (Contributor, author) left a comment


pkg/sql/conn_executor.go, line 299 at r5 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

nit: the pattern we use for protecting a field with a mutex is:

mu struct {
  syncutil.Mutex
  allowedConnectionCount int64
}

also, i'd probably rename this to currentConnectionCount. to me, allowedConnectionCount sounds like it would be the name of the setting, not the current value.

I'll update this for consistency, but it seems like a bad naming convention since it means you can only have 1 mutex per outer struct.

@ecwall (Contributor, author) left a comment


pkg/sql/conn_executor.go, line 670 at r5 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

nit: i think it's better to avoid abbreviations if we can. also, similar to my earlier comment, i don't think "allowed" is the right word here. maybe name it IncrementConnectionCount

Done.


pkg/sql/conn_executor.go, line 674 at r5 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

with the refactor mentioned above, this can be rewritten to:

s.mu.Lock()
defer s.mu.Unlock()
s.mu.connectionCount++

(using the defer helps protect against a deadlock in case there's an error or panic. not likely to happen in this case, but it's a useful convention to apply when taking a mutex)

same for below functions

Done.


pkg/sql/conn_executor.go, line 678 at r5 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

nit: maybe name it DecrementConnectionCount

Done.


pkg/sql/conn_executor.go, line 686 at r5 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

nit: maybe name it IncrementConnectionCountIfLessThan

Done.


pkg/sql/pgwire/auth.go, line 85 at r5 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

i don't think we want to log this -- it would get a bit verbose. we need to send the error back to the client, but we shouldn't log an error in the server for all incorrect client behavior

err here is an error caused by writeErr failing to write the original error. It looks like it was being ignored before, but that seems like it would cause silent failures if the client can't be written to?

Should it be a panic in this case instead of just a log?

The error being returned here is the error being ignored.

// finishMsg attempts to write the data it has accumulated to the provided io.Writer.
// If the writeBuffer previously encountered an error since the last call to initMsg,
// or if it encounters an error while writing to w, it will return an error.
func (b *writeBuffer) finishMsg(w io.Writer) error {
	defer b.reset()
	if b.err != nil {
		return b.err
	}
	bytes := b.wrapped.Bytes()
	binary.BigEndian.PutUint32(bytes[1:5], uint32(b.wrapped.Len()-1))
	n, err := w.Write(bytes)
	b.bytecount.Inc(int64(n))
	return err
}

pkg/sql/pgwire/auth.go, line 111 at r5 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

it seems a bit more verbose this way, since now the error always needs to defined on its own line. why does sendError no longer return the error?

I was stepping through the debugger to see what was being sent and returned and this made it easier since the original code was wrapping the err, sending it, and returning it all in 1 statement.

I can change it back though, but it seems like a lot was happening on this line relative to other lines.


pkg/sql/pgwire/auth.go, line 230 at r5 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

nit: it could make more sense to define this in pkg/sql/pgwire/conn.go

also, could you add a brief comment describing what this function does?

Should handleAuthentication and authOKMessage go there also?


pkg/sql/pgwire/auth.go, line 244 at r5 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

nit: use errors.WithHintf - that way you can avoid having to use fmt.Sprintf below

Done.


pkg/sql/pgwire/auth.go, line 251 at r5 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

nit: use MaxNumConnectionsClusterSettingName instead of Key()

I think that couples this code to the way that the key is determined. If the key changes to something else then this will need to be manually updated also, which is less obvious imo.


pkg/sql/pgwire/conn.go, line 669 at r5 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

nit: this could go into the defer block above (so all the defer logic is in one place)

If it is in the top defer block then we need to share state somehow like setting a boolean here to true so that the top defer block knows to decrement.

Is that preferred over using defer right after the thing that needs to be cleaned up?


pkg/sql/pgwire/server.go, line 93 at r5 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

nit: this should be TenantWritable -- it's fine for each customer to configure this as they want

Updated.

How do we prevent a tenant from using all of a node's resources?


pkg/sql/authorization.go, line 592 at r5 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

nit: can you move this into the pgwire package (could be right next to where the setting itself is defined)

Done.

@ecwall (Contributor, author) left a comment


pkg/sql/pgwire/server.go, line 94 at r1 (raw file):

Previously, ecwall (Evan Wall) wrote…

To allow only superuser access? Or is that not a use case?

Spoke about this offline: <0 will mean unlimited, 0 will mean connections are disabled


pkg/sql/pgwire/pgwire_test.go, line 246 at r5 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

nit: this can be moved to conn_test.go

Done.


pkg/sql/pgwire/pgwire_test.go, line 253 at r5 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

nit: don't think we need var here. we can just initialize everything with :=

Done.


pkg/sql/pgwire/pgwire_test.go, line 258 at r5 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

nit: we often use serverutils.StartServer() just to save a line

Done.


pkg/sql/pgwire/pgwire_test.go, line 261 at r5 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

nit: change to require.NoError

It looks like serverutils.StartServer(...) doesn't return an error so this is no longer needed.

@ecwall (Contributor, author) left a comment


pkg/sql/pgwire/pgwire_test.go, line 283 at r5 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

nit: just use url.UserPassword for all of them. no need to try to use client certs

Done.


pkg/sql/pgwire/pgwire_test.go, line 290 at r5 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

should cleanup be called here? why not return it as part of the callback return value

cleanup can be called as soon as the url is determined (even while the connection is in use). The other "cleanup" code has to run after the connection is no longer needed.


pkg/sql/pgwire/pgwire_test.go, line 299 at r5 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

nit: no need to track closed -- it's safe to call db.Close() always

Removed.


pkg/sql/pgwire/pgwire_test.go, line 311 at r5 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

nit: db.Close will automatically call c.Close, so we can consolidate this and the err != nil case above

It looks like that isn't true. The connection is staying open and causing the test to fail.


pkg/sql/pgwire/pgwire_test.go, line 340 at r5 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

let's just use passwords for both

Done.

@ecwall (Contributor, author) left a comment


pkg/sql/pgwire/pgwire_test.go, line 327 at r5 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

nit: use this instead:

pqErr := (*pq.Error)(nil)
require.ErrorAs(t, err, &pqErr)
require.Equal(t, pgcode.TooManyConnections.String(), string(pqErr.Code), ...)

Done.

@ecwall (Contributor, author) left a comment


pkg/sql/pgwire/auth.go, line 83 at r5 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

nit: could make more sense to define this in pkg/sql/pgwire/conn.go

Done.

@ecwall ecwall requested a review from rafiss February 17, 2022 16:18
@rafiss (Collaborator) left a comment
just minor nits now



pkg/sql/conn_executor.go, line 299 at r5 (raw file):

Previously, ecwall (Evan Wall) wrote…

I'll update this for consistency, but it seems like a bad naming convention since it means you can only have 1 mutex per outer struct.

the idea is that all the protected fields go inside of the mutex.


pkg/sql/conn_executor.go, line 672 at r10 (raw file):

// IncrementConnectionCount increases connectionCount by 1.
func (s *Server) IncrementConnectionCount() {
	mu := &s.mu

nit: i think it's a bit more verbose to add a declaration for mu, but i'll leave it to your preference

compared to

s.mu.Lock()
defer s.mu.Unlock()
s.mu.connectionCount++

pkg/sql/pgwire/auth.go, line 85 at r5 (raw file):

Previously, ecwall (Evan Wall) wrote…

err here is an error caused by writeErr failing to write the original error. It looks like before it was being ignored, but that appears like it would cause silent failures if the client couldn't be written to?

Should it be a panic in this case instead of just a log?

The error being returned here is the error being ignored.

// finishMsg attempts to write the data it has accumulated to the provided io.Writer.
// If the writeBuffer previously encountered an error since the last call to initMsg,
// or if it encounters an error while writing to w, it will return an error.
func (b *writeBuffer) finishMsg(w io.Writer) error {
	defer b.reset()
	if b.err != nil {
		return b.err
	}
	bytes := b.wrapped.Bytes()
	binary.BigEndian.PutUint32(bytes[1:5], uint32(b.wrapped.Len()-1))
	n, err := w.Write(bytes)
	b.bytecount.Inc(int64(n))
	return err
}

no, a panic should only be used for errors which are (or should be) impossible to ever occur. even in those cases, an AssertionError is preferred

i think we should continue ignoring the error though. see

// We could, but do not, report server-side network errors while
// trying to send the client error. This is because clients that
// receive error payload are highly correlated with clients
// disconnecting abruptly.

if we do want to change that, it should be done outside of this PR

but even if we do start logging it, it should be a warn at most. logging an error in the server should be for cases where the server cannot fulfill the user's request

btw, this is a good page: https://cockroachlabs.atlassian.net/wiki/spaces/CRDB/pages/1676804870/Error+concepts+and+handling+in+CockroachDB


pkg/sql/pgwire/auth.go, line 111 at r5 (raw file):

Previously, ecwall (Evan Wall) wrote…

I was stepping through the debugger to see what was being sent and returned and this made it easier since the original code was wrapping the err, sending it, and returning it all in 1 statement.

I can change it back though, but it seems like a lot was happening on this line relative to other lines.

i think it's a bit more readable the old way (and matches other parts of our codebase where we use that convention)


pkg/sql/pgwire/auth.go, line 230 at r5 (raw file):

Previously, ecwall (Evan Wall) wrote…

Should handleAuthentication and authOKMessage go there also?

no, since those are auth related, they make sense to be in auth.go


pkg/sql/pgwire/auth.go, line 251 at r5 (raw file):

Previously, ecwall (Evan Wall) wrote…

I think that couples this code to the way that the the key is determined. If the key changes to something else then this will need to be manually updated also which is less obvious imo.

sgtm


pkg/sql/pgwire/conn.go, line 669 at r5 (raw file):

Previously, ecwall (Evan Wall) wrote…

If it is in the top defer block then we need to share state somehow like setting a boolean here to true so that the top defer block knows to decrement.

Is that preferred over using defer right after the thing that needs to be cleaned up?

good point; this only should happen after successful connect, so looks good as is


pkg/sql/pgwire/server.go, line 93 at r5 (raw file):

Previously, ecwall (Evan Wall) wrote…

Updated.

How do we prevent a tenant from using all of a node's resources?

we're safe here since SQL nodes are not shared between different tenants. so if a serverless client connects with millions of connections, they would only be using their own resources


pkg/sql/pgwire/server.go, line 100 at r10 (raw file):

	"the maximum number of non-superuser SQL connections per gateway allowed at a given time "+
		"(superusers can make unlimited connections), less than 0 is unlimited "+
		"(note: this will only limit future connection attempts and will not affect already established connections)",

nit: i think it's fine to just use multiple sentences here. e.g. take a look at the description of server.clock.persist_upper_bound_interval


pkg/sql/pgwire/conn_test.go, line 1814 at r10 (raw file):

		connCfg, err := pgx.ParseConfig(pgURL.String())
		require.NoError(t, err)
		conn, err := pgx.ConnectConfig(ctx, connCfg)

nit: pgx also lets you do pgx.Connect(ctx, url) to save a call
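
With pgx v4 (version assumed from the era), parse-and-connect collapses into one call. A sketch; the pgURL value is assumed to come from the test harness:

package pgwire_test

import (
	"context"
	"testing"

	"github.com/jackc/pgx/v4"
	"github.com/stretchr/testify/require"
)

// connect (hypothetical helper) dials without a separate ParseConfig step.
func connect(ctx context.Context, t *testing.T, pgURL string) *pgx.Conn {
	t.Helper()
	conn, err := pgx.Connect(ctx, pgURL) // parses and dials in one step
	require.NoError(t, err)
	return conn
}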


pkg/sql/pgwire/conn_test.go, line 1848 at r10 (raw file):

		conn, cleanup := openConnWithUserSuccess(admin)
		defer cleanup()
		rows, err := conn.Query(ctx, "SHOW CLUSTER SETTING server.max_connections_per_gateway")

nit: use QueryRow since this is guaranteed to return one row. (and it takes care of closing the row automatically). a common pattern is

err := conn.QueryRow(ctx, query).Scan(&maxConnections)
require.NoError(t, err)

pkg/sql/pgwire/conn_test.go, line 1881 at r10 (raw file):

	// 0 max_connections
	func() {

is the idea that each func() is a separate test case? it's definitely a good pattern -- to make it even better for the go test runner output, you can use

t.Run("test name", func(t *testing.T) {
  // ...
})

@ecwall (Contributor, author) left a comment


pkg/sql/conn_executor.go, line 672 at r10 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

nit: i think it's a bit more verbose to add a declaration for mu, but i'll leave it to your preference

compared to

s.mu.Lock()
defer s.mu.Unlock()
s.mu.connectionCount++

I thought repeating s.mu was less DRY, but it looks like it's being done that way everywhere so I'll change it for consistency.


pkg/sql/pgwire/auth.go, line 85 at r5 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

no, a panic should only be used for errors which are (or should be) impossible to ever occur. even in those cases, an AssertionError is preferred

i think we should continue ignoring the error though. see

// We could, but do not, report server-side network errors while
// trying to send the client error. This is because clients that
// receive error payload are highly correlated with clients
// disconnecting abruptly.

if we do want to change that, it should be done outside of this PR

but even if we do start logging it, it should be a warn at most. logging an error in the server should be for cases where the server cannot fulfill the user's request

btw, this is a good page: https://cockroachlabs.atlassian.net/wiki/spaces/CRDB/pages/1676804870/Error+concepts+and+handling+in+CockroachDB

Changed it back. I'll include that comment here also.


pkg/sql/pgwire/auth.go, line 111 at r5 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

i think it's a bit more readable the old way (and matches other parts of our codebase where we use that convention)

Done.


pkg/sql/pgwire/auth.go, line 230 at r5 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

no, since those are auth related, they make sense to be in auth.go

Done.


pkg/sql/pgwire/server.go, line 100 at r10 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

nit: i think it's fine to just use multiple sentences here. e.g. take a look at the desccription of server.clock.persist_upper_bound_interval

Done.


pkg/sql/pgwire/conn_test.go, line 1814 at r10 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

nit: pgx also lets you do pgx.Connect(ctx, url) to save a call

Done.


pkg/sql/pgwire/conn_test.go, line 1848 at r10 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

nit: use QueryRow since this is guaranteed to return one row. (and it takes care of closing the row automatically). a common pattern is

err := conn.QueryRow(ctx, query).Scan(&maxConnections)
require.NoError(t, err)

Done.


pkg/sql/pgwire/conn_test.go, line 1881 at r10 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

is the idea that each func() is a separate test case? it's definitely a good pattern -- to make it even better for the go test runner output, you can use

t.Run("test name", func(t *testing.T) {
  // ...
})

Yeah the tests originally had a bunch of defer calls that I removed and these functions made it easier to read. Naming the cases is even better though - good to know about that.


pkg/sql/pgwire/pgwire_test.go, line 267 at r5 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

nit: for an integration test like this we usually use db.Exec("SET CLUSTER SETTING ...") to make sure it all works from the end-user perspective

Done.

@ecwall ecwall requested a review from rafiss February 18, 2022 14:59
@rafiss (Collaborator) left a comment
this looks great! nice work


@ecwall (Contributor, author) commented Feb 18, 2022

bors r=rafiss

@craig (bot) commented Feb 18, 2022

Build failed (retrying...):

@craig (bot) commented Feb 19, 2022

Build succeeded:

@rafiss rafiss changed the title pgwire: add server.max_connections public cluster setting pgwire: add server.max_connections_per_gateway public cluster setting Mar 2, 2022