pgwire: accept non-TLS client conns safely in secure mode #53991
@bdarnell @petermattis I'd like to petition for this PR to be included in the 20.2 release, even though it is coming late in the cycle. This had been tracked before, but we have only learned in the past few days how important this feature is to our customers and why the security impact is minimal (as described in the commit message). Moreover, the code change is minimal (1 conditional) and the feature is opt-in.
cc @thtruo for tracking
The code changes LGTM. Thanks for the detailed write up in your commit message, it was instructive!
Reviewed 1 of 1 files at r1, 12 of 12 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aaron-crl and @knz)
pkg/sql/pgwire/testdata/auth/secure_non_tls, line 18 at r2 (raw file):
---- # Since root and testuser do not have a password, they stil
typo: still.
pkg/sql/pgwire/testdata/auth/secure_non_tls, line 49 at r2 (raw file):
# Active authentication configuration on this node: # Original configuration: # host all root all cert-password # CockroachDB mandatory rule
I think the column alignment here is a bit off.
To introduce this feature into an existing cluster, proceed as follows:
Using this feature on a new cluster is awkward (assuming you want to use the hostnossl directive to restrict non-TLS access to a certain network). You have to start the cluster without the command-line flag, connect with TLS to set the new hba.conf, then restart the cluster with the flag.
Instead, if the default hba.conf were more restrictive, say by defaulting to localhost-only, the process would be to start the cluster with the flag, then connect from localhost without TLS to set a less-restrictive hba.conf. (But if you're unable to easily connect from localhost, which may be an issue in some orchestrated environments, you may be back to the more complicated procedure.)
On the other hand, are these hba.conf settings even necessary? A user who is comfortable with this flag is presumably practicing some sort of perimeter-based security, in which case I would expect that only the trusted part of the network is able to route packets to the database (and even if untrusted machines were able to talk to the database, they'd still need a valid password to get in). Unless we have specific user requests for this capability, I might leave the hba.conf out.
Reviewed 11 of 12 files at r2.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @aaron-crl and @knz)
pkg/cli/flags.go, line 350 at r2 (raw file):
varFlag(f, addrSetter{&serverHTTPAddr, &serverHTTPPort}, cliflags.ListenHTTPAddr) stringFlag(f, &serverSocketDir, cliflags.SocketDir) boolFlag(f, &startCtx.unencryptedLocalhostHTTP, cliflags.UnencryptedLocalhostHTTP)
It's unfortunate that --unencrypted-localhost-http combines non-TLS access with restricting the port to the loopback interface. It would be nice to split that up some day (but not today; I think there are a number of cleanups we could make to the network configuration flags).
pkg/cli/cliflags/flags.go, line 540 at r2 (raw file):
} AllowSecureSQLWithoutTLS = FlagInfo{
It always smells bad to me when the FlagInfo variable name and the flag itself don't match. Consider removing the word Secure here.
pkg/cli/cliflags/flags.go, line 541 at r2 (raw file):
AllowSecureSQLWithoutTLS = FlagInfo{ Name: "accept-sql-without-tls",
In the counterpart PR #53994, the flag name makes it clear that passwords will be sent unencrypted. Should we do more to make that clear with the server-side flag?
pkg/sql/pgwire/auth_test.go, line 56 at r2 (raw file):
// the test file is applicable to both.) // // allow_non_tls
nit: Consider s/non/without/ for consistency with the other variables.
pkg/sql/pgwire/hba_conf.go, line 153 at r2 (raw file):
) } case hba.ConnHostSSL, hba.ConnHostNoSSL:
Maybe I'm just unfamiliar with this code, but I don't see where the hostssl and hostnossl directives are actually used.
pkg/sql/pgwire/testdata/auth/secure_non_tls, line 35 at r2 (raw file):
ok # Now testuser can log in.
Add a test to verify that testuser can't log in with an invalid password.
Thank you, Ben. I was also waiting for you to chime in before moving forward with this.
Using this feature on a new cluster is awkward (assuming you want to use the hostnossl directive to restrict non-TLS access to a certain network). You have to start the cluster without the command-line flag, connect with TLS to set the new hba.conf, then restart the cluster with the flag.
Good point. You're right, but I believe, based on the next paragraph, that you were accidentally right. I'll clarify in the release note. This would be improved if we ever solve #26722.
Instead, if the default hba.conf were more restrictive, say by defaulting to localhost-only, the process would be to start the cluster with the flag, then connect from localhost without TLS to set a less-restrictive hba.conf.
No, that's not true: even if the SQL client can establish the connection, root still cannot log in. Without a valid TLS client cert, root would need a password, and there is no password by default. We'd need a trust rule in the HBA config to log in without a password, but I'm feeling icky about that.
Which means that above you probably didn't mean "you need to start the cluster without the flag to connect". In fact, you can connect with the flag just fine using the unix socket and the default HBA config. However, you cannot log in as long as the password is not set.
This is why #26722 is so important: not just to customize the HBA conf, but also to set a root password.
A user who is comfortable with this flag is presumably practicing some sort of perimeter-based security, in which case I would expect that only the trusted part of the network is able to route packets to the database (and even if untrusted machines were able to talk to the database, they'd still need a valid password to get in).
This was motivated by the general principle of defense in depth.
Unless we have specific user requests for this capability, I might leave the hba.conf out.
I can leave it out of the release note, but I think the code is more natural to read this way (and closer to pg).
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @aaron-crl, @bdarnell, and @knz)
pkg/cli/flags.go, line 350 at r2 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
It's unfortunate that --unencrypted-localhost-http combines non-TLS access with restricting the port to the loopback interface. It would be nice to split that up some day (but not today; I think there are a number of cleanups we could make to the network configuration flags).
Agreed.
pkg/cli/cliflags/flags.go, line 541 at r2 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
In the counterpart PR #53994, the flag name makes it clear that passwords will be sent unencrypted. Should we do more to make that clear with the server-side flag?
What do you have in mind?
pkg/sql/pgwire/hba_conf.go, line 153 at r2 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
Maybe I'm just unfamiliar with this code, but I don't see where the hostssl and hostnossl directives are actually used.
The code to check them was already implemented in 20.1 and used internally in unit tests, but the condition was never encountered in a running server because of the conditional in server.go.
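For illustration, here is a hypothetical sketch of what such an admission conditional could look like once the flag exists. `serverConfig`, `admitConn` and `AcceptSQLWithoutTLS` are made-up names; the actual check lives in server.go in a form not shown in this thread. The sketch only decides whether a connection may proceed to the HBA/authentication phase; it does not authenticate anything.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sketch of the connection-admission check. The names
// (serverConfig, admitConn, AcceptSQLWithoutTLS) are illustrative only.

type serverConfig struct {
	Insecure            bool // node started with --insecure
	AcceptSQLWithoutTLS bool // node started with --accept-sql-without-tls
}

var errTLSRequired = errors.New("node requires TLS for SQL connections")

// admitConn decides, before authentication, whether a SQL client that did
// or did not negotiate TLS may proceed to the HBA/authentication phase.
func admitConn(cfg serverConfig, clientUsesTLS bool) error {
	// Insecure nodes never require TLS; TLS clients are always admitted.
	if cfg.Insecure || clientUsesTLS {
		return nil
	}
	// Secure node, non-TLS client: previously always rejected; now allowed
	// only when the operator opted in. Authentication still follows.
	if !cfg.AcceptSQLWithoutTLS {
		return errTLSRequired
	}
	return nil
}

func main() {
	secure := serverConfig{}
	optIn := serverConfig{AcceptSQLWithoutTLS: true}
	fmt.Println(admitConn(secure, false)) // node requires TLS for SQL connections
	fmt.Println(admitConn(optIn, false))  // <nil>
}
```

Once past this check, a non-TLS connection is what makes a hostnossl rule (rather than hostssl) the one that can match.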
Which means that above you probably didn't mean "you need to start the cluster without the flag to connect".
I didn't mean you have to start the cluster without the flag in order to connect; you have to start the cluster without the flag to avoid temporarily exposing non-TLS SQL sessions on unintended networks. It's true that it's accidentally safe because there are no users with passwords by default, but that's something we need to fix anyway (it's a prerequisite for #51991 cert-free setup). And when we set up a default user with a password, it becomes necessary to use this two-restart dance for new clusters if we accept the requirement that users need to be able to restrict non-TLS connections by IP range.
So that's why I'd prefer to reject this requirement unless we have a more explicit indication that it's important (especially if this is to be a last-minute addition to a release, which I'm not thrilled with. New flags are surface area we have to support for a long time, and if we're doing a wholesale change to the security setup model with #51991 in the next release, I'm not sure it's a good idea to add a couple more knobs to the old scheme without time to really validate the interface in this release.)
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @aaron-crl, @bdarnell, and @knz)
pkg/cli/cliflags/flags.go, line 541 at r2 (raw file):
Previously, knz (kena) wrote…
What do you have in mind?
I'm not sure; everything I can think of seems very cumbersome. But again coming back to the scram discussion from #53994, it seems likely that in the future we'd want to allow non-TLS connections only if we're able to use scram for authentication. Maybe just make the flag --allow-unencrypted-passwords for both client and server?
pkg/sql/pgwire/hba_conf.go, line 153 at r2 (raw file):
Previously, knz (kena) wrote…
The code to check them was already implemented in 20.1 and used internally in unit tests, but the condition was never encountered in a running server because of the conditional in server.go.
A cluster version check for this seems excessive to me, but OK.
... to refer to issue cockroachdb#53404.
Release justification: non-production code changes
Release note: None
This change makes it possible for a DBA / system administrator to reconfigure individual nodes *in a secure cluster* to accept SQL client sessions over TCP without mandating a TLS handshake. Authentication remains mandatory as per the HBA rules.
Motivation: we have at least two high-profile customers who keep their nodes and client apps in a private secure network (with network-level encryption / privacy) and who experience client-side TLS as unnecessary and expensive friction.
Why this does not impair security:
- authentication remains mandatory (as per the HBA rules).
- the feature is opt-in: the operator must set a command-line flag (`--accept-sql-without-tls`), which is not enabled by default.
- there is an interlock: the user must both set up the flag and set log-in passwords for their SQL users (by default, users get created without a password and thus cannot log in without client certs).
- for now, node-node connections still require TLS.
For context, the default HBA configuration is the following:
```
host all root all cert-password # fixed rule
host all all all cert-password # built-in CockroachDB default
local all all password # built-in CockroachDB default
```
The directive `host` covers both TLS and non-TLS incoming TCP connections (`local` is for the unix socket). The method `cert-password` means "client cert or password": without a cert, the password is mandatory.
As previously, the user can further secure the configuration by restricting connections to just a subnetwork, for example:
```
host all all 10.0.0.0/8 cert-password # accept conns on the 10/8 network.
host all all all reject # reject conns from other networks
local all all password
```
Note that this change is limited to the server side: CockroachDB's own `cockroach` CLI commands do not yet know how to connect to a CockroachDB server without TLS; such connections are only supported from `psql` or SQL client drivers in apps.
(PostgreSQL's HBA rule types `hostssl` and `hostnossl` are now also recognized. They operate like in PostgreSQL. However, we don't have a compelling use case for them yet, so we don't emphasize them.)
Release justification: fixes for high-priority or high-severity bugs in existing functionality
Release note (security update): A new experimental flag `--accept-sql-without-tls` has been introduced for `cockroach start` and `start-single-node`: when specified, a secure node will also accept secure SQL connections without TLS.
When this flag is enabled:
- Node-to-node connections still use TLS: the server must still be started with `--certs-dir` and valid TLS cert configuration for nodes.
- Client authentication (spoof protection) and authorization (access control and privilege escalation prevention) are performed by CockroachDB as usual, subject to the HBA configuration (for authn) and SQL privileges (for authz).
- Transport-level security (integrity and confidentiality) for client connections must then be provided by the operator outside of CockroachDB, for example by using a private network or VPN dedicated to CockroachDB and its client app(s).
- The flag only applies to the SQL interface. TLS is still required for the HTTP endpoint (unless `--unencrypted-localhost-http` is passed) and for the RPC endpoint.
To introduce this feature into an existing cluster, proceed as follows:
1. ensure the cluster upgrade is finalized.
2. set up the HBA configuration to reject `host` connections for any network other than the one that has been secured.
3. add the command-line flag and restart the nodes.
Note that even when the flag is supplied, clients can still negotiate TLS and present a valid TLS certificate to identify themselves (at least under the default HBA configuration).
Finally, this flag is experimental and its ergonomics will likely change in a later version.
you have to start the cluster without the flag to avoid temporarily exposing non-TLS SQL sessions on unintended networks
Not exactly. The user can still limit this somewhat by setting --listen-addr to localhost or something like that.
Also we should really talk about #26722 which would solve both this and facilitate #51991 like you point out.
So that's why I'd prefer to reject this requirement ... not sure it's a good idea to add a couple more knobs
For context, we're focusing on the hostnossl and hostssl rule match, right? These are pg-compatible directives and the logic for them was already implemented in v20.1. What is new here is that the code path is activated. The user doesn't even have to know about them, since host matches both TLS and non-TLS connections and they could use that already.
So I'll take your hint that we don't want to talk about them, and I've removed them from the release note.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aaron-crl, @bdarnell, and @irfansharif)
pkg/cli/cliflags/flags.go, line 540 at r2 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
It always smells bad to me when the FlagInfo variable name and the flag itself don't match. Consider removing the word Secure here.
Done.
pkg/cli/cliflags/flags.go, line 541 at r2 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
I'm not sure; everything I can think of seems very cumbersome. But again coming back to the scram discussion from #53994, it seems likely that in the future we'd want to allow non-TLS connections only if we're able to use scram for authentication. Maybe just make the flag --allow-unencrypted-passwords for both client and server?
These are two fully unrelated concerns.
SCRAM would work equally well with or without TLS. SCRAM doesn't need a flag.
The flag here is introduced to ensure that a cluster does not get mistakenly started without TLS, unless the user opts in explicitly.
pkg/sql/pgwire/auth_test.go, line 56 at r2 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
nit: Consider s/non/without/ for consistency with the other variables.
Done.
pkg/sql/pgwire/hba_conf.go, line 153 at r2 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
A cluster version check for this seems excessive to me, but OK.
We discovered last time it is necessary: the other not-yet-upgraded nodes in the cluster would choke if they see a new keyword they don't understand in the HBA config.
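A minimal sketch of such a version gate, assuming the HBA config is validated as text before it is accepted. `validateHBA` and the `clusterVersionFinalized` boolean are hypothetical stand-ins for the real cluster-version machinery; the idea is only that new keywords are refused until every node is guaranteed to understand them.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical sketch of a cluster-version gate for HBA keywords.
// clusterVersionFinalized stands in for a real cluster-version check.

// newKeywords are connection types that not-yet-upgraded nodes do not understand.
var newKeywords = map[string]bool{"hostssl": true, "hostnossl": true}

// validateHBA rejects a config that uses new keywords before the upgrade
// is finalized, so that older nodes never see them.
func validateHBA(conf string, clusterVersionFinalized bool) error {
	for i, line := range strings.Split(conf, "\n") {
		// Drop trailing comments, then split into whitespace-separated fields.
		if j := strings.Index(line, "#"); j >= 0 {
			line = line[:j]
		}
		fields := strings.Fields(line)
		if len(fields) == 0 {
			continue
		}
		if newKeywords[fields[0]] && !clusterVersionFinalized {
			return fmt.Errorf("line %d: %q requires the cluster upgrade to be finalized", i+1, fields[0])
		}
	}
	return nil
}

func main() {
	conf := "hostnossl all all 10.0.0.0/8 password\nhost all all all reject\n"
	fmt.Println(validateHBA(conf, false)) // line 1: "hostnossl" requires the cluster upgrade to be finalized
	fmt.Println(validateHBA(conf, true))  // <nil>
}
```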
pkg/sql/pgwire/testdata/auth/secure_non_tls, line 18 at r2 (raw file):
Previously, irfansharif (irfan sharif) wrote…
typo: still.
Done.
pkg/sql/pgwire/testdata/auth/secure_non_tls, line 35 at r2 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
Add a test to verify that testuser can't log in with an invalid password.
Done.
pkg/sql/pgwire/testdata/auth/secure_non_tls, line 49 at r2 (raw file):
Previously, irfansharif (irfan sharif) wrote…
I think the column alignment here is a bit off.
This is auto-generated. If there's an improvement to be made, it doesn't belong in this PR.
bors r=aaron-crl,irfansharif,bdarnell
Build succeeded:
53842: server: always create a liveness record before starting up r=irfansharif a=irfansharif
Previously it used to be the case that it was possible for a node to be up and running, and for there to be no corresponding liveness record for it. This was a very transient situation as liveness records are created for a given node as soon as it sends out its first heartbeat. Still, given that this could take a few seconds, it lent to a lot of complexity in our handling of node liveness where we had to always anticipate the possibility of there being no corresponding liveness record for a given node (and thus creating it if necessary).
Having a liveness record for each node always present is a crucial building block for long running migrations (#48843). There the intention is to have the orchestrator process look towards the list of liveness records for an authoritative view of cluster membership. Previously when it was possible for an active member of the cluster to not have a corresponding liveness record (no matter how unlikely or short-lived in practice), we could not generate such a view.
---
This is an alternative implementation for #53805. Here we choose to manually write the liveness record for the bootstrapping node when writing initial cluster data. For all other nodes, we do it on the server-side of the join RPC. We're also careful to do it in the legacy codepath when joining a cluster through gossip.
Release note: None
53994: cli: allow SQL commands to use password authn in more cases r=bdarnell,irfansharif,aaron-crl a=knz
First two commits from #53991.
Previously, SQL password authn was only allowed over TLS connections. With this change, password authn is allowed regardless of whether the connection uses TLS.
This is implemented by also only asking for a password interactively the first time that the server complains that pw auth has failed. This way, no password is ever requested interactively if the server "trusts" the connection (via HBA rules or `--insecure`).
Release justification: low risk, high benefit changes to existing functionality
54035: sql: emit more tracing events from the stats cache r=RaduBerinde a=RaduBerinde
The stats cache has various "slow" paths (where we need to query the system table). These are currently only logged if verbosity is high.
This change switches to `VEvent` in most cases, so that these are visible during tracing (including in statement diagnostics bundles). This will allow us to diagnose slow planning times, e.g. due to the stats cache getting full.
Release justification: low-risk change to existing functionality, high potential benefit for debugging issues.
Release note: None
Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
Fixes #44842.
Informs cockroachdb/helm-charts#228.
Informs #53404.
This change makes it possible for a DBA / system administrator to
reconfigure individual nodes in a secure cluster to accept SQL
client sessions over TCP without mandating a TLS
handshake. Authentication remains mandatory as per the HBA rules.
Motivation: we have at least two high-profile customers who keep their
nodes and client apps in a private secure network (with network-level
encryption / privacy) and who experience client-side TLS as
unnecessary and expensive friction.
Additionally, this feature is a prerequisite for upgrading an insecure
cluster to secure mode without downtime.
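Why this does not impair security:
- authentication remains mandatory (as per the HBA rules).
- the feature is opt-in: the operator must set a command-line flag (`--accept-sql-without-tls`), which is not enabled by default.
- there is an interlock: the user must both set up the flag and set log-in passwords for their SQL users (by default, users get created without a password and thus cannot log in without client certs).
- for now, node-node connections still require TLS.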
For context, the default HBA configuration is the following:
The directive `host` covers both TLS and non-TLS incoming TCP connections (`local` is for the unix socket). The method `cert-password` means "client cert or password": without a cert, the password is mandatory.
As previously, the user can further secure the configuration by
restricting non-TLS connections to just a subnetwork, for example:
Note that this change is limited to the server side: CockroachDB's own `cockroach` CLI commands do not yet know how to connect to a CockroachDB server without TLS; such connections are only supported from `psql` or SQL client drivers in apps. See #53994 for a follow-up.
Release justification: fixes for high-priority or high-severity bugs in existing functionality