-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pgwire: add server.max_connections_per_gateway public cluster setting #76401
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
didn't review the test code yet -- will get to that later
overall looking good!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ecwall and @rafiss)
pkg/sql/admin.go, line 24 at r1 (raw file):
// HasAdminRole is a helper function to determine if a user is an admin using // the given internal executor. func HasAdminRole(ctx context.Context, ie *InternalExecutor, user security.SQLUsername) (bool, error) {
nit: see my comment later -- this part of the refactor won't be needed, so we can put this back where it was
pkg/sql/authorization.go, line 592 at r1 (raw file):
const AuthAuditingClusterSettingName = "server.auth_log.sql_sessions.enabled" const MaxNumConnectionsClusterSettingName = "server.max_connections"
nit: i think the name should make it more clear that the setting is per gateway node. maybe server.max_connections_per_gateway
pkg/sql/pgwire/server.go, line 94 at r1 (raw file):
settings.SystemOnly, sql.MaxNumConnectionsClusterSettingName, "the maximum number of SQL connections to the server allowed at a given time, none if 0, unlimited if < 0",
i think it would make more sense as a NonNegativeInt setting (search the code for settings.NonNegativeInt
)
then 0 can mean "disabled"
(i just don't think there's any reason to configure it to allow 0 connections)
pkg/sql/pgwire/server.go, line 98 at r1 (raw file):
).WithPublic() var allowedConnectionCount int64 = 0
this should be a field of the sql.Server
struct
pkg/sql/pgwire/server.go, line 722 at r1 (raw file):
hbaConf, identMap := s.GetAuthenticationConfiguration() postAuthHook := func() error {
it doesn't feel that natural to me to have this be done with a hook. also, it doesn't feel like this PR should be thought of as part of the "auth" phase -- in my mind it just depends on the auth phase.
it seems easier to understand if we just do it in the part of the code after we call handleAuthentication:
cockroach/pkg/sql/pgwire/conn.go
Line 659 in de283bf
if connCloseAuthHandler, retErr = c.handleAuthentication( |
and the counter could be decremented in the defer block right before that
cockroach/pkg/sql/pgwire/conn.go
Line 613 in de283bf
defer func() { |
pkg/sql/pgwire/server.go, line 723 at r1 (raw file):
postAuthHook := func() error { if hasAdminRole, err := sql.HasAdminRole(ctx, s.execCfg.InternalExecutor, sArgs.User); err != nil {
since that earlier PR was first created, we now have a better way without this HasAdminRole
call. that call might end up being expensive, since it delegates out to the internal executor to run a new query.
instead, now we can look at sArgs.IsSuperUser
(superuser is the same thing as having admin role)
pkg/sql/pgwire/server.go, line 778 at r1 (raw file):
// increments the int64 at addr by inc if <= max func incIfLte(addr *int64, inc int64, max int64) bool {
i think this would work better as a function on the sql.Server
like
func (s *Server) incrementConnectionCount( ... ) bool {
pkg/sql/pgwire/server.go, line 780 at r1 (raw file):
func incIfLte(addr *int64, inc int64, max int64) bool { // try until it succeeds uncontested or fails for {
i suppose the disadvantage of this approach is that if there's a huge spike in the rate of connections, this loop would spin a lot. i think it's more conventional that we use a syncutil.Mutex to protect concurrent fields, so that if there's contention, the thread sleeps instead of spinning. (then we don't need to use atomic either)
pkg/sql/pgwire/server.go, line 784 at r1 (raw file):
prev := atomic.LoadInt64(addr) // increment next := prev + inc
is inc
necessary as a parameter? it's always going to be 1
right?
pkg/cli/flags.go, line 481 at r1 (raw file):
// More server flags. intFlag(f, &baseCfg.MaxSQLConns, cliflags.ListenMaxSQLConns)
I think the CLI changes might be here by accident - they should be removed
pkg/cli/cliflags/flags.go, line 552 at r1 (raw file):
} ListenMaxSQLConns = FlagInfo{
I think the CLI changes might be here by accident - they should be removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ecwall and @rafiss)
pkg/sql/pgwire/server.go, line 780 at r1 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
i suppose the disadvantage of this approach is that if there's a huge spike in the rate of connections, this loop would spin a lot. i think it's more conventional that we use a syncutil.Mutex to protect concurrent fields, so that if there's contention, the thread sleeps instead of spinning. (then we don't need to use atomic either)
Switched it out for a mutex.
pkg/cli/flags.go, line 481 at r1 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
I think the CLI changes might be here by accident - they should be removed
Done.
pkg/cli/cliflags/flags.go, line 552 at r1 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
I think the CLI changes might be here by accident - they should be removed
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ecwall and @rafiss)
pkg/sql/pgwire/server.go, line 94 at r1 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
i think it would make more sense as a NonNegativeInt setting (search the code for
settings.NonNegativeInt
)then 0 can mean "disabled"
(i just don't think there's any reason to configure it to allow 0 connections)
To allow only super user access? Or is that not a use case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ecwall and @rafiss)
pkg/sql/admin.go, line 24 at r1 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
nit: see my comment later -- this part of the refactor won't be needed, so we can put this back where it was
Done.
pkg/sql/authorization.go, line 592 at r1 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
nit: i think the name should make it more clear that the setting is per gateway node. maybe
server.max_connections_per_gateway
Done.
pkg/sql/pgwire/server.go, line 723 at r1 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
since that earlier PR was first created, we now have a better way without this
HasAdminRole
call. that call might end up being expensive, since it delegates out to the internal executor to run a new query.instead, now we can look at
sArgs.IsSuperUser
(superuser is the same thing as having admin role)
Done.
pkg/sql/pgwire/server.go, line 784 at r1 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
is
inc
necessary as a parameter? it's always going to be1
right?
No longer applicable with the mutex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ecwall and @rafiss)
pkg/sql/pgwire/server.go, line 98 at r1 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
this should be a field of the
sql.Server
struct
Done.
pkg/sql/pgwire/server.go, line 778 at r1 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
i think this would work better as a function on the
sql.Server
likefunc (s *Server) incrementConnectionCount( ... ) bool {
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rafiss)
pkg/sql/pgwire/server.go, line 722 at r1 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
it doesn't feel that natural to me to have this be done with a hook. also, it doesn't feel like this PR should be thought of as part of the "auth" phase -- in my mind it just depends on the auth phase.
it seems easier to understand if we just do it in the part of the code after we call handleAuthentication:
cockroach/pkg/sql/pgwire/conn.go
Line 659 in de283bf
if connCloseAuthHandler, retErr = c.handleAuthentication( and the counter could be decremented in the defer block right before that
cockroach/pkg/sql/pgwire/conn.go
Line 613 in de283bf
defer func() {
I split up handleAuthentication
and authOKMessage
so that I could call checkMaxConnections
between them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice, i think this is getting close
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ecwall and @rafiss)
pkg/sql/authorization.go, line 592 at r5 (raw file):
const AuthAuditingClusterSettingName = "server.auth_log.sql_sessions.enabled" // MaxNumConnectionsClusterSettingName is the maximum number of SQL connections to the server allowed at a given time.
nit: can you move this into the pgwire package (could be right next to where the setting itself is defined)
pkg/sql/conn_executor.go, line 299 at r5 (raw file):
TelemetryLoggingMetrics *TelemetryLoggingMetrics allowedConnectionCount struct {
nit: the pattern we use for protecting a field with a mutex is:
mu struct {
syncutil.Mutex
allowedConnectionCount int64
}
also, i'd probably rename this to currentConnectionCount
. to me, allowedConnectionCount
sounds like it would be the name of the setting, not the current value.
pkg/sql/conn_executor.go, line 670 at r5 (raw file):
// IncAllowedConnectionCount increments allowedConnectionCount. func (s *Server) IncAllowedConnectionCount() {
nit: i think it's better to avoid abbreviations if we can. also, similar to my earlier comment, i don't think "allowed" is the right word here. maybe name it IncrementConnecctionCount
pkg/sql/conn_executor.go, line 674 at r5 (raw file):
allowedConnectionCount.mu.Lock() allowedConnectionCount.value++ allowedConnectionCount.mu.Unlock()
with the refactor mentioned above, this can be rewritten to:
s.mu.Lock()
defer s.mu.Unlock()
s.mu.connectionCount++
(using the defer helps protect against a deadlock in case there's an error or panic. not likely to happen in this case, but it's a useful convention to apply when taking a mutex)
same for below functions
pkg/sql/conn_executor.go, line 678 at r5 (raw file):
// DecAllowedConnectionCount decrements allowedConnectionCount. func (s *Server) DecAllowedConnectionCount() {
nit: maybe name it DecrementConnecctionCount
pkg/sql/conn_executor.go, line 686 at r5 (raw file):
// IncAllowedConnectionCountIfLt increments allowedConnectionCount and returns true if allowedConnectionCount < max. func (s *Server) IncAllowedConnectionCountIfLt(max int64) bool {
nit: maybe name it IncrementConnectionCountIfLessThan
pkg/sql/pgwire/auth.go, line 83 at r5 (raw file):
} func (c *conn) sendError(ctx context.Context, execCfg *sql.ExecutorConfig, err error) {
nit: could make more sense to define this in pkg/sql/pgwire/conn.go
pkg/sql/pgwire/auth.go, line 85 at r5 (raw file):
func (c *conn) sendError(ctx context.Context, execCfg *sql.ExecutorConfig, err error) { if err := writeErr(ctx, &execCfg.Settings.SV, err, &c.msgBuilder, c.conn); err != nil { log.Error(ctx, err.Error())
i don't think we want to log this -- it would get a bit verbose. we need to send the error back to the client, but we shouldn't log an error in the server for all incorrect client behavior
pkg/sql/pgwire/auth.go, line 111 at r5 (raw file):
err = pgerror.WithCandidateCode(err, pgcode.InvalidAuthorizationSpecification) c.sendError(ctx, execCfg, err) return nil, err
it seems a bit more verbose this way, since now the error always needs to defined on its own line. why does sendError
no longer return the error?
pkg/sql/pgwire/auth.go, line 230 at r5 (raw file):
} func (c *conn) checkMaxConnections(ctx context.Context, sqlServer *sql.Server) error {
nit: it could make more sense to define this in pkg/sql/pgwire/conn.go
also, could you add a brief comment describing what this function does?
pkg/sql/pgwire/auth.go, line 244 at r5 (raw file):
} if !sqlServer.IncAllowedConnectionCountIfLt(maxNumConnectionsValue) { err := errors.WithHint(
nit: use errors.WithHintf
- that way you can avoid having to use fmt.Sprintf
below
pkg/sql/pgwire/auth.go, line 251 at r5 (raw file):
"the maximum number of allowed connections is %d and can be modified using the %s config key", maxNumConnectionsValue, MaxNumConnections.Key(),
nit: use MaxNumConnectionsClusterSettingName instead of Key()
pkg/sql/pgwire/conn.go, line 669 at r5 (raw file):
return } defer sqlServer.DecAllowedConnectionCount()
nit: this could go into the defer block above (so all the defer logic is in one place)
pkg/sql/pgwire/server.go, line 93 at r5 (raw file):
// MaxNumConnections is visible for testing var MaxNumConnections = settings.RegisterIntSetting( settings.SystemOnly,
nit: this should be TenantWritable
-- it's fine for each customer to configure this as they want
pkg/sql/pgwire/pgwire_test.go, line 246 at r5 (raw file):
} func TestPGWireRejectsNewConnIfTooManyConns(t *testing.T) {
nit: this can be moved to conn_test.go
pkg/sql/pgwire/pgwire_test.go, line 253 at r5 (raw file):
require.Equal(t, int64(-1), pgwire.MaxNumConnections.Get(&settings.SV)) var (
nit: don't think we need var
here. we can just initialize everything with :=
pkg/sql/pgwire/pgwire_test.go, line 258 at r5 (raw file):
Settings: settings, } pgServer, err = serverutils.NewServer(params)
nit: we often use serverutils.StartServe()
just to safe a line
pkg/sql/pgwire/pgwire_test.go, line 261 at r5 (raw file):
) if err != nil { t.Fatal(err)
nit: change to require.NoError
pkg/sql/pgwire/pgwire_test.go, line 267 at r5 (raw file):
defer pgServer.Stopper().Stop(ctx) pgwire.MaxNumConnections.Override(ctx, &settings.SV, 1)
nit: for an integration test like this we usually use db.Exec("SET CLUSTER SETTING ...")
to make sure it all works as from the end-user perspective
pkg/sql/pgwire/pgwire_test.go, line 283 at r5 (raw file):
pgServer.ServingSQLAddr(), t.Name(), url.User(user),
nit: just use url.UserPassword
for all of them. no need to try to use client certs
pkg/sql/pgwire/pgwire_test.go, line 290 at r5 (raw file):
pgURL.User = url.UserPassword(staff, staff) } defer cleanup()
should cleanup be called here? why not return it as part of the callback return value
pkg/sql/pgwire/pgwire_test.go, line 299 at r5 (raw file):
if err != nil { return nil, func() { if closed {
nit: no need to track closed -- it's safe to call db.Close() always
pkg/sql/pgwire/pgwire_test.go, line 311 at r5 (raw file):
} _ = c.Close() _ = db.Close()
nit: db.Close will automatically call c.Close, so we can consolidate this and the err != nil
case above
pkg/sql/pgwire/pgwire_test.go, line 327 at r5 (raw file):
t.Helper() require.Error(t, err) pqErr, ok := err.(*pq.Error)
nit: use this instead:
pgErr := (*pq.Error)(nil)
require.ErrorAs(t, err, &pqErr)
require.Equal(t, pgcode.TooManyConnecctions.String(), string(pqErr.Code), ...)
pkg/sql/pgwire/pgwire_test.go, line 340 at r5 (raw file):
// Create a non-root admin user and a normal non-admin user. _, err = rootConn.ExecContext(ctx, fmt.Sprintf("CREATE USER %[1]s WITH PASSWORD %[1]s; GRANT admin TO %[1]s; CREATE USER %[2]s", staff, nonAdmin))
let's just use passwords for both
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ecwall and @rafiss)
pkg/sql/conn_executor.go, line 299 at r5 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
nit: the pattern we use for protecting a field with a mutex is:
mu struct { syncutil.Mutex allowedConnectionCount int64 }
also, i'd probably rename this to
currentConnectionCount
. to me,allowedConnectionCount
sounds like it would be the name of the setting, not the current value.
I'll update this for consistency, but it seems like a bad naming convention since it means you can only have 1 mutex per outer struct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ecwall and @rafiss)
pkg/sql/conn_executor.go, line 670 at r5 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
nit: i think it's better to avoid abbreviations if we can. also, similar to my earlier comment, i don't think "allowed" is the right word here. maybe name it
IncrementConnecctionCount
Done.
pkg/sql/conn_executor.go, line 674 at r5 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
with the refactor mentioned above, this can be rewritten to:
s.mu.Lock() defer s.mu.Unlock() s.mu.connectionCount++
(using the defer helps protect against a deadlock in case there's an error or panic. not likely to happen in this case, but it's a useful convention to apply when taking a mutex)
same for below functions
Done.
pkg/sql/conn_executor.go, line 678 at r5 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
nit: maybe name it
DecrementConnecctionCount
Done.
pkg/sql/conn_executor.go, line 686 at r5 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
nit: maybe name it
IncrementConnectionCountIfLessThan
Done.
pkg/sql/pgwire/auth.go, line 85 at r5 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
i don't think we want to log this -- it would get a bit verbose. we need to send the error back to the client, but we shouldn't log an error in the server for all incorrect client behavior
err
here is an error caused by writeErr
failing to write the original error. It looks like before it was being ignored, but that appears like it would cause silent failures if the client couldn't be written to?
Should it be a panic in this case instead of just a log?
The error being returned here is the error being ignored.
// finishMsg attempts to write the data it has accumulated to the provided io.Writer.
// If the writeBuffer previously encountered an error since the last call to initMsg,
// or if it encounters an error while writing to w, it will return an error.
func (b *writeBuffer) finishMsg(w io.Writer) error {
defer b.reset()
if b.err != nil {
return b.err
}
bytes := b.wrapped.Bytes()
binary.BigEndian.PutUint32(bytes[1:5], uint32(b.wrapped.Len()-1))
n, err := w.Write(bytes)
b.bytecount.Inc(int64(n))
return err
}
pkg/sql/pgwire/auth.go, line 111 at r5 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
it seems a bit more verbose this way, since now the error always needs to defined on its own line. why does
sendError
no longer return the error?
I was stepping through the debugger to see what was being sent and returned and this made it easier since the original code was wrapping the err, sending it, and returning it all in 1 statement.
I can change it back though, but it seems like a lot was happening on this line relative to other lines.
pkg/sql/pgwire/auth.go, line 230 at r5 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
nit: it could make more sense to define this in
pkg/sql/pgwire/conn.go
also, could you add a brief comment describing what this function does?
Should handleAuthentication and authOKMessage go there also?
pkg/sql/pgwire/auth.go, line 244 at r5 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
nit: use
errors.WithHintf
- that way you can avoid having to usefmt.Sprintf
below
Done.
pkg/sql/pgwire/auth.go, line 251 at r5 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
nit: use MaxNumConnectionsClusterSettingName instead of
Key()
I think that couples this code to the way that the the key is determined. If the key changes to something else then this will need to be manually updated also which is less obvious imo.
pkg/sql/pgwire/conn.go, line 669 at r5 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
nit: this could go into the defer block above (so all the defer logic is in one place)
If it is in the top defer block then we need to share state somehow like setting a boolean here to true so that the top defer block knows to decrement.
Is that preferred over using defer right after the thing that needs to be cleaned up?
pkg/sql/pgwire/server.go, line 93 at r5 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
nit: this should be
TenantWritable
-- it's fine for each customer to configure this as they want
Updated.
How do we prevent a tenant from using all of a node's resources?
pkg/sql/authorization.go, line 592 at r5 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
nit: can you move this into the pgwire package (could be right next to where the setting itself is defined)
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ecwall and @rafiss)
pkg/sql/pgwire/server.go, line 94 at r1 (raw file):
Previously, ecwall (Evan Wall) wrote…
To allow only super user access? Or is that not a use case?
Spoke about this offline: <0 will mean unlimited, 0 will mean connections are disabled
pkg/sql/pgwire/pgwire_test.go, line 246 at r5 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
nit: this can be moved to conn_test.go
Done.
pkg/sql/pgwire/pgwire_test.go, line 253 at r5 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
nit: don't think we need
var
here. we can just initialize everything with:=
Done.
pkg/sql/pgwire/pgwire_test.go, line 258 at r5 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
nit: we often use
serverutils.StartServe()
just to safe a line
Done.
pkg/sql/pgwire/pgwire_test.go, line 261 at r5 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
nit: change to
require.NoError
It looks like serverutils.StartServer(...)
doesn't return an error so this is no longer needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ecwall and @rafiss)
pkg/sql/pgwire/pgwire_test.go, line 283 at r5 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
nit: just use
url.UserPassword
for all of them. no need to try to use client certs
Done.
pkg/sql/pgwire/pgwire_test.go, line 290 at r5 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
should cleanup be called here? why not return it as part of the callback return value
cleanup
can be called as soon as the url is determined (even while the connection is in use). The other "cleanup" code has to run after the connection is no longer needed.
pkg/sql/pgwire/pgwire_test.go, line 299 at r5 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
nit: no need to track closed -- it's safe to call db.Close() always
Removed.
pkg/sql/pgwire/pgwire_test.go, line 311 at r5 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
nit: db.Close will automatically call c.Close, so we can consolidate this and the
err != nil
case above
It looks like that isn't true. The connection is staying open and causing the test to fail.
pkg/sql/pgwire/pgwire_test.go, line 340 at r5 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
let's just use passwords for both
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ecwall and @rafiss)
pkg/sql/pgwire/pgwire_test.go, line 327 at r5 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
nit: use this instead:
pgErr := (*pq.Error)(nil) require.ErrorAs(t, err, &pqErr) require.Equal(t, pgcode.TooManyConnecctions.String(), string(pqErr.Code), ...)
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ecwall and @rafiss)
pkg/sql/pgwire/auth.go, line 83 at r5 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
nit: could make more sense to define this in pkg/sql/pgwire/conn.go
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just minor nits now
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ecwall and @rafiss)
pkg/sql/conn_executor.go, line 299 at r5 (raw file):
Previously, ecwall (Evan Wall) wrote…
I'll update this for consistency, but it seems like a bad naming convention since it means you can only have 1 mutex per outer struct.
the idea is that all the protected fields go inside of the mutex.
pkg/sql/conn_executor.go, line 672 at r10 (raw file):
// IncrementConnectionCount increases connectionCount by 1. func (s *Server) IncrementConnectionCount() { mu := &s.mu
nit: i think it's a bit more verbose to add a declaration for mu
, but i'll leave it to your preference
compared to
s.mu.Lock()
defer s.mu.Unlock()
s.mu.connectionCount++
pkg/sql/pgwire/auth.go, line 85 at r5 (raw file):
Previously, ecwall (Evan Wall) wrote…
err
here is an error caused bywriteErr
failing to write the original error. It looks like before it was being ignored, but that appears like it would cause silent failures if the client couldn't be written to?Should it be a panic in this case instead of just a log?
The error being returned here is the error being ignored.
// finishMsg attempts to write the data it has accumulated to the provided io.Writer. // If the writeBuffer previously encountered an error since the last call to initMsg, // or if it encounters an error while writing to w, it will return an error. func (b *writeBuffer) finishMsg(w io.Writer) error { defer b.reset() if b.err != nil { return b.err } bytes := b.wrapped.Bytes() binary.BigEndian.PutUint32(bytes[1:5], uint32(b.wrapped.Len()-1)) n, err := w.Write(bytes) b.bytecount.Inc(int64(n)) return err }
no, a panic should only be used for errors which are (or should be) impossible to ever occur. even in those cases, an AssertionError is preferred
i think we should continue ignoring the error though. see
cockroach/pkg/sql/pgwire/server.go
Lines 1242 to 1245 in a6ef4b0
// We could, but do not, report server-side network errors while | |
// trying to send the client error. This is because clients that | |
// receive error payload are highly correlated with clients | |
// disconnecting abruptly. |
if we do want to change that, it should be done outside of this PR
but even if we do start logging it, it should be a warn
at most. logging an error
in the server should be for cases where the server cannot fulfill the user's request
btw, this is a good page: https://cockroachlabs.atlassian.net/wiki/spaces/CRDB/pages/1676804870/Error+concepts+and+handling+in+CockroachDB
pkg/sql/pgwire/auth.go, line 111 at r5 (raw file):
Previously, ecwall (Evan Wall) wrote…
I was stepping through the debugger to see what was being sent and returned and this made it easier since the original code was wrapping the err, sending it, and returning it all in 1 statement.
I can change it back though, but it seems like a lot was happening on this line relative to other lines.
i think it's a bit more readable the old way (and matches other parts of our codebase where we use that convention)
pkg/sql/pgwire/auth.go, line 230 at r5 (raw file):
Previously, ecwall (Evan Wall) wrote…
Should handleAuthentication and authOKMessage go there also?
no, since those are auth related, they make sense to be in auth.go
pkg/sql/pgwire/auth.go, line 251 at r5 (raw file):
Previously, ecwall (Evan Wall) wrote…
I think that couples this code to the way that the the key is determined. If the key changes to something else then this will need to be manually updated also which is less obvious imo.
sgtm
pkg/sql/pgwire/conn.go, line 669 at r5 (raw file):
Previously, ecwall (Evan Wall) wrote…
If it is in the top defer block then we need to share state somehow like setting a boolean here to true so that the top defer block knows to decrement.
Is that preferred over using defer right after the thing that needs to be cleaned up?
good point; this only should happen after successful connect, so looks good as is
pkg/sql/pgwire/server.go, line 93 at r5 (raw file):
Previously, ecwall (Evan Wall) wrote…
Updated.
How do we prevent a tenant from using all of a node's resources?
we're safe here since SQL nodes are not shares between different tenants. so if a serverless client connects with millions of connections, they would only be using their own resources
pkg/sql/pgwire/server.go, line 100 at r10 (raw file):
"the maximum number of non-superuser SQL connections per gateway allowed at a given time "+ "(superusers can make unlimited connections), less than 0 is unlimited "+ "(note: this will only limit future connection attempts and will not affect already established connections)",
nit: i think it's fine to just use multiple sentences here. e.g. take a look at the desccription of server.clock.persist_upper_bound_interval
pkg/sql/pgwire/conn_test.go, line 1814 at r10 (raw file):
connCfg, err := pgx.ParseConfig(pgURL.String()) require.NoError(t, err) conn, err := pgx.ConnectConfig(ctx, connCfg)
nit: pgx also lets you do pgx.Connect(url)
to save a call
pkg/sql/pgwire/conn_test.go, line 1848 at r10 (raw file):
conn, cleanup := openConnWithUserSuccess(admin) defer cleanup() rows, err := conn.Query(ctx, "SHOW CLUSTER SETTING server.max_connections_per_gateway")
nit: use QueryRow
since this is guaranteed to return one row. (and it takes care of closing the row automatically). a common pattern is
err := conn.QueryRow(ctx, query).Scan(&maxCnonections)`
require.NoError(err)
pkg/sql/pgwire/conn_test.go, line 1881 at r10 (raw file):
// 0 max_connections func() {
is the idea that each func() is a separate test case? it's definitely a good pattern -- to make it even better for the go test runner output, you can use
t.Run("test name", func(t *testing.T) {
// ...
})
This setting specifies a maximum number of connections that a server can have open at any given time. <0 - Connections are unlimited (existing behavior) =0 - Connections are disabled >0 - Connections are limited If a new non-superuser connection would exceed this limit, the same error message is returned as postgres: "sorry, too many connections" with the 53300 error code that corresponds to "too many connections". Release note (ops change): An off-by-default server.max_connections cluster setting has been added to limit the maximum number of connections to a server.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rafiss)
pkg/sql/conn_executor.go, line 672 at r10 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
nit: i think it's a bit more verbose to add a declaration for
mu
, but i'll leave it to your preferencecompared to
s.mu.Lock() defer s.mu.Unlock() s.mu.connectionCount++
I thought repeating s.mu
was less dry, but it looks like it's being done that way everywhere so I'll change it for consistency.
pkg/sql/pgwire/auth.go, line 85 at r5 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
no, a panic should only be used for errors which are (or should be) impossible to ever occur. even in those cases, an AssertionError is preferred
i think we should continue ignoring the error though. see
cockroach/pkg/sql/pgwire/server.go
Lines 1242 to 1245 in a6ef4b0
// We could, but do not, report server-side network errors while // trying to send the client error. This is because clients that // receive error payload are highly correlated with clients // disconnecting abruptly. if we do want to change that, it should be done outside of this PR
but even if we do start logging it, it should be a
warn
at most. logging anerror
in the server should be for cases where the server cannot fulfill the user's requestbtw, this is a good page: https://cockroachlabs.atlassian.net/wiki/spaces/CRDB/pages/1676804870/Error+concepts+and+handling+in+CockroachDB
Changed it back. I'll include that comment here also.
pkg/sql/pgwire/auth.go, line 111 at r5 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
i think it's a bit more readable the old way (and matches other parts of our codebase where we use that convention)
Done.
pkg/sql/pgwire/auth.go, line 230 at r5 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
no, since those are auth related, they make sense to be in auth.go
Done.
pkg/sql/pgwire/server.go, line 100 at r10 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
nit: i think it's fine to just use multiple sentences here. e.g. take a look at the desccription of
server.clock.persist_upper_bound_interval
Done.
pkg/sql/pgwire/conn_test.go, line 1814 at r10 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
nit: pgx also lets you do
pgx.Connect(url)
to save a call
Done.
pkg/sql/pgwire/conn_test.go, line 1848 at r10 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
nit: use
QueryRow
since this is guaranteed to return one row. (and it takes care of closing the row automatically). a common pattern iserr := conn.QueryRow(ctx, query).Scan(&maxCnonections)` require.NoError(err)
Done.
pkg/sql/pgwire/conn_test.go, line 1881 at r10 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
is the idea that each func() is a separate test case? it's definitely a good pattern -- to make it even better for the go test runner output, you can use
t.Run("test name", func(t *testing.T) { // ... })
Yeah the tests originally had a bunch of defer calls that I removed and these functions made it easier to read. Naming the cases is even better though - good to know about that.
pkg/sql/pgwire/pgwire_test.go, line 267 at r5 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
nit: for an integration test like this we usually use
db.Exec("SET CLUSTER SETTING ...")
to make sure it all works as from the end-user perspective
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks great! nice work
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rafiss)
bors r=rafiss |
Build failed (retrying...): |
Build succeeded: |
This setting specifies a maximum number of connections that a server can have open at any given time.
<0 - Connections are unlimited (existing behavior)
=0 - Connections are disabled
Release note (ops change): An off-by-default server.max_connections_per_gateway cluster
setting has been added to limit the maximum number of connections to a server.