cli: the `--insecure` flag will be removed and replaced by easier-to-use always-secure options #53404

knz · 2020-08-25T08:15:58Z

Epic: CRDB-12037

The --insecure flag disables a number of security controls which are completely expected even for developers or folk trying out CockroachDB.

Instead, we aim to ensure that all clusters are secure, but offer new secure options where the user can choose the combination of security features that best matches their needs and their environment.

(Additionally, the word "insecure" gives the wrong impression that CockroachDB is not a secure database. The "insecure mode" only really exists for internal testing by the CockroachDB team and should not have been exposed to users from the start.)

What you need to know in a post-insecure world
- New clusters
- Existing clusters
Rationale

What you need to know in a post-insecure world

CockroachDB aims to remain easy to use when all clusters are secured!

New clusters

Simplified decision making:

for developers, single machine, no security requirements:
- cockroach demo, automatically secure
- single-node secure cluster, with auto-managed TLS
multiple machines or prod system:
- for node-node traffic:
  - no special requirement on inter-node traffic: auto-managed node-node TLS
  - special requirements:
    - org-mandated PKI: use org-mandated PKI to manage node-node TLS
    - user responsible for network security: no-TLS RPC
- for node-client traffic:
  - no special requirements: auto-managed SQL/HTTP TLS with either password or TLS authn
    This is the most common pg-compatible scenario!
  - special requirements:
    - org-mandated PKI: use org-mandated cert manager to manage SQL/HTTP TLS
    - user responsible for network security: no-TLS SQL/HTTP

Cluster is secure in all cases.

Threat model

Node-node connections

Threat \ Protection	TLS transport+authn	no-TLS + Shared secret	no-TLS + no auth (`--insecure`)	Mitigation possible via network restrictions?
Mistaken node join from wrong cluster	protected	protected	vulnerable	no
Complete cluster takeover via malicious node	protected	vulnerable via bruteforce	vulnerable	partial
Attacker MITM snoop of authn credentials	protected	vulnerable	vulnerable	yes
Attacker accesses confidential data via network snoop	protected	vulnerable	vulnerable	yes
Attacker modifies cluster data in-flight	protected	vulnerable	vulnerable	yes
Compromised authn credentials	Fix via cert rotation/revocation (online)	Update secret + full restart	N/A	partial

Note: the vulnerabilities in the 2nd and 3rd column are amplified by sharing the same TCP listener (address/port) between SQL and RPC listeners.

Partial mitigation possible via separate --sql-addr and fencing off the node-to-node traffic into a private network.

Node-client connections

Threat \ Protection	TLS transport+authn	no-TLS + SCRAM authn	no-TLS + no auth (`--insecure`)	Mitigation possible via network restrictions?
Mistaken connect from/to wrong app or wrong cluster	protected	protected	vulnerable	no
Complete cluster takeover via escalation of privilege	protected	protected	vulnerable	partial
Rogue client app unrestricted access to data of 1 SQL user	protected	protected	vulnerable	partial
Attacker MITM snoop of authn credentials	protected	protected	vulnerable	yes
Attacker accesses app data data via network snoop	protected	vulnerable	vulnerable	yes
Attacker modifies app data in-flight	protected	vulnerable	vulnerable	yes
Compromised authn credentials	Fix via cert rotation/revocation (online)	Update password (online)	N/A	no

Note that the first column in the table above assumes that clients also validate server
TLS certs! Otherwise the following table applies:

Threat \ Protection	Two-way cert validation	Only srv auths client
Mistaken connect from/to wrong app or wrong cluster	protected	protected
Complete cluster takeover via escalation of privilege	protected	protected
Rogue client app unrestricted access to data of 1 SQL user	protected	protected
Attacker MITM snoop of authn credentials	protected	vulnerable
Attacker accesses app data data via network snoop	protected	vulnerable
Attacker modifies app data in-flight	protected	vulnerable
Compromised authn credentials	Fix via cert rotation/revocation (online)	still vulnerable (!)

Existing clusters previously run with "insecure mode"

determine the secure configuration that suits your needs (see above)
prepare configuration:
- create authn credentials for SQL users (can be simple password)
- prepare server flags and TLS certs that suit the needs identified above
- add flag --security-upgrade to node configs.
perform rolling restart of cluster. At this point the cluster accepts but does not mandate the secure config.
remove flag --security-upgrade from node configs
perform 2nd rolling restart. This enforces the secure configuration.

Advanced decision flowchart under the fold

Rationale

Background

For context, --insecure does the following:

deactivates TLS handshakes for node-node connections
deactivates TLS handshakes for node-client RPC connections
deactivates TLS handshakes for node-client SQL connections over TCP
deactivates TLS handshakes for node-client HTTP connections
deactivates node-to-node authentication
deactivates HTTP authentication
deactivates SQL authentication
deactivates SQL authorization
deactivate certain SQL features, so as to not create the illusion of security

Motivation

We know from experience that customers who advocate for "insecure mode" really only care about:

the complexity of setting up TLS certificates for nodo-to-node authentication, or
they want TLS-less SQL connections because they are running a server over a privileged network, for example when the nodes and the SQL app are inside the same k8s cluster (see server,ui: infinite HTTP redirect when accessing UI for secure cluster behind nginx-ingress in k8s helm-charts#228, cli,server: enable SQL clients on TCP with non-TLS modes #44842).

These users certainly do not want to disable authentication and authorization, yet we also know that users do not realize that --insecure disables these internal protections inside cockroachdb.

Some users also care about point 4 because setting up TLS certs in a web browser is a pain. However since v20.1 we have a solution for those users: --unencrypted-localhost-http makes the HTTP endpoint localhost-only without TLS.

Strategy

in v20.2 make --insecure deprecated
Solve cli,server: enable SQL clients on TCP with non-TLS modes #44842 so that users get a choice to connect with SQL over non-TLS connections - pgwire: accept non-TLS client conns safely in secure mode #53991
Solve server: handle some HTTP APIs without TLS #48069 to polish the HTTP probe experience
Advance RFC: auto-generation of TLS certificates for new clusters and newly added nodes #51991 to simplify the TLS usage in the common case
Possibly solve server: enable secure clusters with authenticated gRPC, but without TLS #54007 to offer an online upgrade path from insecure clusters to secure
advertise these mechanisms in docs
update roachtests to always use a secure cluster, sometimes without TLS to preserve the ability to capture network activity between clients and nodes
remove support for the flag in servers in v21.1 or v21.2
remove support for the flag in clients in the release after that (to preserve the ability for a new client to connect to a server running the version before)

Epic CRDB-12037

Jira issue: CRDB-3870

The text was updated successfully, but these errors were encountered:

knz · 2020-08-25T08:27:25Z

Related: #49918

knz · 2020-08-25T08:27:33Z

cc @thtruo @aaron-crl for tracking

53405: cli flags clean-up r=tbg a=knz Informs #53404 Informs #49918 Release note (cli change): The command-line flag `--socket` has been removed. It was deprecated since v20.1. Use `--socket-dir` in replacement. Release note (cli change): The command-line `--insecure` has been marked as deprecated. See #53404 for details. The flag will be removed in a later version in a staged fashion: first, additional security mechanisms will be added to enable more flexible deployments which were previously done using `--insecure`; then the flaf will be removed from server commands, then finally in a later version also from client commands. Co-authored-by: Raphael 'kena' Poss <[email protected]>

jasobrown · 2020-08-27T11:56:05Z

Is the intent here to move all connections to the crdb process to be secure? And by secure, i mean secure in the way that cockroachdb expects? We manage TLS as well as cert authN/AuthZ outside of crdb as a separate process on the database instances, and thus run crdb in insecure mode because it's fronted by the other process. The reason for this is our authN/authZ workflow is slightly non-standard and the effort to integrate that into crdb (as far we knew a year ago) seemed daunting.

Perhaps I can take another look at our golang support for our authN/authZ as well as the cockroach code to see if the integration could work via some extension points/plugins.

knz · 2020-08-27T12:12:20Z

Hi Jason, thank you for the feedback.

Once the --insecure flag is gone, you will be able to achieve the same as your current set up with the following configuration

use a non-TLS listener for the SQL clients (cli,server: enable SQL clients on TCP with non-TLS modes #44842)
use a HBA configuration that tells CockroachDB to trust the login principal without additional authentication (since you authenticate externally). You would then narrow down the trust rule to trust only the address of your external authentication service (and not arbitrary connections).

Meanwhile, we'll encourage you to keep node-to-node connections secure using TLS; if you prefer not to set up TLS certs manually, we'll have some new experience (see #51991) which will simplify the setup.

Does that sound good to you?

knz · 2020-08-27T12:16:50Z

@jasobrown a question for you though: does your set up use a single SQL account for all operations, or do you use separate SQL accounts with different privileges? Can you outline a little bit how your authorization looks like?

edit: discussed this with jason offline

... to refer to issue cockroachdb#53404. Release justification: non-production code changes Release note: None

53991: pgwire: accept non-TLS client conns safely in secure mode r=aaron-crl,irfansharif,bdarnell a=knz Fixes #44842. Informs #49532. Informs #53404. This change makes it possible for a DBA / system administrator to reconfigure individual nodes *in a secure cluster* to accept SQL client sessions over TCP without mandating a TLS handshake. Authentication remains mandatory as per the HBA rules. Motivation: we have at least two high-profile customers who keep their nodes and client apps in a private secure network (with network-level encryption / privacy) and who experience client-side TLS as unnecessary and expensive friction. Additionally, **this feature is a prerequisite to upgrade an insecure cluster to secure mode without downtime.** Why this does not impair security: - authentication remains mandatory (as per the HBA rules -- [1] [2]). - the feature is opt-in: the operator must set a command-line flag (`--accept-sql-without-tls`), which is not enabled by default. - there is an interlock: the user must both set up the flag and set log-in passwords for their SQL users (by default, users get created without a password and thus cannot log in without client certs). - for now, node-node connections still require TLS. [1]: https://www.postgresql.org/docs/12/auth-pg-hba-conf.html [2]: https://dr-knz.net/authentication-in-postgresql-and-cockroachdb.html For context, the default HBA configuration is the following: ``` host all root all cert-password # fixed rule host all all all cert-password # built-in CockroachDB default local all all password # built-in CockroachDB default ``` The directive `host` covers both TLS and non-TLS incoming TCP connections (`local` is for the unix socket). The method `cert-password` means "client cert or password": without a cert, the password is mandatory. As previously, the user can further secure the configuration by restricting non-TLS connections to just a subnetwork, for example: ``` host all all 10.0.0.0/8 password # accept conns on the 10/8 network. host all all all reject # refuse conns from other nets. local all all password ``` Note that this change is limited to the server side: CockroachDB's own `cockroach` CLI commands do not yet know how to connect to a CockroachDB server without TLS; such connections are only supported from `psql` or SQL client drivers in apps. See #53994 for a follow-up. Release justification: fixes for high-priority or high-severity bugs in existing functionality 54019: roachtest: de-flake 'inconsistent' r=knz a=tbg This test sets up an intentionally corrupted replica and wants its node to shut down as a result of its detection. When only two of the three nodes were included in the consistency check, either one of them could end up terminating (as no obvious majority of healthy replicas can be determined). Change the test so that we wait for the cluster to come fully together before setting a low consistency check interval. Closes #54005. Release justification: testing Release note: None Co-authored-by: Raphael 'kena' Poss <[email protected]> Co-authored-by: Tobias Grieger <[email protected]>

... to refer to issue cockroachdb#53404. Release justification: non-production code changes Release note: None

bdarnell · 2020-09-10T20:06:45Z

The word "insecure" gives the wrong impression that CockroachDB is not a secure database.

Citation needed. Are people really getting confused by this? I think the real issue with --insecure mode is that because of the lack of grpc authentication, it's more insecure than people expect.

Cluster is secure in all cases.

Security is not binary. The question is always "secure against what threats?" Some of these cases are secure against threats that others are not, so it's misleading to say that they're all equally secure (of course they may be used in systems that are secure against those threats if security is provided at other layers)

And that's really what we're talking about here - traditionally we've just had all-or-nothing, with the no-protection-at-all --insecure mode or complex manually-managed certificates. Here we're adding multiple in-between modes. And then, because some of these intermediate modes will be strictly more secure than --insecure but no more difficult to use, we'll eventually eliminate --insecure mode. I think leading with the removal of --insecure is causing some surprise among users (although more than I expected - I thought we had discouraged this mode enough that no one would be using it in production).

That flowchart scares me - there's a lot of decision points there. A lot of them are new. The goal of #51991 and eliminating insecure mode is to reduce the number of options that most users experience, by steering nearly all users to the "internally-managed TLS" path. I think we need to reconsider the amount of flexibility we're providing here, and what decisions the user really needs to make.

For example, I don't think we ever want to give the "non-TLS SQL" option this much emphasis. CockroachCloud will always need to use TLS, so we must have a good user experience for TLS (perhaps by working with driver implementations on things like #32932).

Existing clusters previously run with "insecure mode"

Are there enough of these that we need to provide a no-downtime upgrade path? This seems like it could add quite a bit of complexity (can you upgrade from any security mode to any other, or only certain ones?)

We know from experience that customers who advocate for "insecure mode" really only care about point 3: they want TLS-less SQL connections

This is definitely not true. There have been some recent examples of this but I think the most common reason people resist setting their clusters up securely is about all the TLS setup, especially for node-to-node.

Possibly solve #54007 and/or advance #51991 to enable secure RPC without manual TLS configs

This seems like a requirement, not just a "possibly".

knz · 2020-09-11T11:21:00Z

The word "insecure" gives the wrong impression that CockroachDB is not a secure database.

Citation needed.

This came up internally multiple times.

Are people really getting confused by this? I think the real issue with --insecure mode is that because of the lack of grpc authentication, it's more insecure than people expect.

This is pointed out prominently at the top of the issue description. But I'll take your point, as well as this one:

because some of these intermediate modes will be strictly more secure than --insecure but no more difficult to use, we'll eventually eliminate --insecure mode. I think leading with the removal of --insecure is causing some surprise among users.

I have adjusted the title of the issue to emphasize the new things / improvements and pulled that notion as first sentences/paragraphs in the issue description.

Cluster is secure in all cases.

Security is not binary. The question is always "secure against what threats?"

You and I know that security is not binary but the point here is that all the proposed options keep all internal security controls active, which --insecure=true does not.

Anyway you are right that we need to talk about threats. Adding a table in the issue description instead of the flowchart. Bram helped me understand that a list of threats is more easy to use than a flowchart anyway.

Some of these cases are secure against threats that others are not, so it's misleading to say that they're all equally secure

"say they're all equally secure" - that's a strawman. Nobody wrote this anywhere.

That flowchart scares me - there's a lot of decision points there. [...] I think we need to reconsider the amount of flexibility we're providing here, and what decisions the user really needs to make.

Point granted. I've reduced the scope accordingly.

For example, I don't think we ever want to give the "non-TLS SQL" option this much emphasis. CockroachCloud will always need to use TLS, so we must have a good user experience for TLS (perhaps by working with driver implementations on things like #32932).

It seems clear that we need to emphasize the CC use case first and foremost, and thus preserve TLS options as the main recommendation (and main recommended scenario). However we do have serious $$$ at risk if we don't offer other options.

Existing clusters previously run with "insecure mode"

Are there enough of these that we need to provide a no-downtime upgrade path? This seems like it could add quite a bit of complexity (can you upgrade from any security mode to any other, or only certain ones?)

There are at least a few customers asking (those big accounts who didn't like our TLS and went to prod with --insecure, despite our recommendations to the countrary). But we can narrow down the scope of the upgrade path to support just those customers, yes.

We know from experience that customers who advocate for "insecure mode" really only care about point 3: they want TLS-less SQL connections

This is definitely not true. There have been some recent examples of this but I think the most common reason people resist setting their clusters up securely is about all the TLS setup, especially for node-to-node.

Point taken. Adjusted the text accordingly.

Possibly solve #54007 and/or advance #51991 to enable secure RPC without manual TLS configs

This seems like a requirement, not just a "possibly".

The word "possibly" only pertained to #54007 (which you and I would agree is probably less important). Issue #51991 is definitely critical.

knz · 2020-09-11T11:50:39Z

I also just realized that the TLS story could also be simplified by issuing a common TLS client cert for all SQL users (and give them separate passwords to distinguish them)

aaron-crl · 2020-10-09T18:24:05Z

Apologies for the late response but there are three things that strike me here:

--insecure should probably do a better job of explicitly stating what is insecure mode. Perhaps we should include your list below in the output from the flag (formatted to CLI of course):

--insecure does the following:

deactivates TLS handshakes for node-node connections
deactivates TLS handshakes for node-client RPC connections
deactivates TLS handshakes for node-client SQL connections over TCP
deactivates TLS handshakes for node-client HTTP connections
deactivates node-to-node authentication
deactivates HTTP authentication
deactivates SQL authentication
deactivates SQL authorization
deactivate certain SQL features, so as to not create the illusion of security

There's a lot here to unpack and given the potential impact to developers and customers perhaps this should be moved to an RFC given the current prevalence of this flag and depth of discussion.
I don't feel we should mark a feature as deprecated until we have a supported alternative (or at least support for the majority of the usage). I don't feel we have that yet given that our own documentation still includes this flag for tutorials and guides.

knz · 2020-10-13T09:27:23Z

Regarding point 1: there is now a clearer warning than before. The warning reads as follows:

*
* WARNING: ALL SECURITY CONTROLS HAVE BEEN DISABLED!
*
* This mode is intended for non-production testing only.
*
* In this mode:
* - Your cluster is open to any client that can access any of your IP addresses.
* - Intruders with access to your machine or network can observe client-server traffic.
* - Intruders can log in without password and read or write any data in the cluster.
* - Intruders can consume all your server's resources and cause unavailability.
*
*
* INFO: To start a secure server without mandating TLS for clients,
* consider --accept-sql-without-tls instead. For other options, see:
*
* - https://go.crdb.dev/issue-v/53404/v20.2
* - https://www.cockroachlabs.com/docs/v20.2/secure-a-cluster.html
*

Regarding point 2: yes this work will be in a RFC of course.

Regarding point 3: agreed

58662: cli/demo: avoid login URL if insecure r=otan a=knz Fixes #58661 NB: insecure is deprecated anyway #53404 Co-authored-by: Raphael 'kena' Poss <[email protected]>

knz added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-security labels Aug 25, 2020

knz added the X-anchored-telemetry The issue number is anchored by telemetry references. label Aug 25, 2020

This was referenced Aug 25, 2020

cli flags clean-up #53405

Merged

K8s Operator: Support the ability to change an insecure to a secure cluster cockroachdb/cockroach-operator#139

Closed

johnrk-zz mentioned this issue Aug 27, 2020

Make Secure Mode the Default cockroachdb/cockroach-operator#153

Closed

knz added a commit to knz/cockroach that referenced this issue Sep 5, 2020

base: add an explanatory comment about the insecure flag

e3e7f16

... to refer to issue cockroachdb#53404. Release justification: non-production code changes Release note: None

This was referenced Sep 7, 2020

pgwire: accept non-TLS client conns safely in secure mode #53991

Merged

server: enable secure clusters with authenticated gRPC, but without TLS #54007

Open

knz added a commit to knz/cockroach that referenced this issue Sep 10, 2020

base: add an explanatory comment about the insecure flag

c80e92c

... to refer to issue cockroachdb#53404. Release justification: non-production code changes Release note: None

knz added a commit to knz/cockroach that referenced this issue Sep 10, 2020

base: add an explanatory comment about the insecure flag

b7f544f

... to refer to issue cockroachdb#53404. Release justification: non-production code changes Release note: None

knz changed the title ~~cli: the --insecure flag is undesirable and will be removed from all CLI commands~~ cli: the --insecure flag will be removed and replaced by easier-to-use always-secure options Sep 11, 2020

kmcrawford mentioned this issue Nov 13, 2020

Allow new --security-upgrade flag and new security options in 20.2.0 cockroachdb/helm-charts#40

Open

knz mentioned this issue Jan 8, 2021

cli/demo: avoid login URL if insecure #58662

Merged

craig bot pushed a commit that referenced this issue Jan 11, 2021

Merge #58662

23c9843

58662: cli/demo: avoid login URL if insecure r=otan a=knz Fixes #58661 NB: insecure is deprecated anyway #53404 Co-authored-by: Raphael 'kena' Poss <[email protected]>

rafiss mentioned this issue Mar 23, 2021

roachprod: Clusters should default to secure #38539

Closed

jlinder added the T-server-and-security DB Server & Security label Jun 16, 2021

knz added the A-server-architecture Relates to the internal APIs and src org for server code label Jul 29, 2021

This was referenced Dec 30, 2021

*: remove calls to grpc.WithInsecure #74256

Open

server: turn --insecure flag into more fine-grained controls #74329

Open

dghubble mentioned this issue Nov 27, 2022

Local development and setting stable credentials via cockroach demo #92541

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cli: the `--insecure` flag will be removed and replaced by easier-to-use always-secure options #53404

cli: the `--insecure` flag will be removed and replaced by easier-to-use always-secure options #53404

knz commented Aug 25, 2020 •

edited by cockroach-jira-scripts

Loading

knz commented Aug 25, 2020

knz commented Aug 25, 2020 •

edited

Loading

jasobrown commented Aug 27, 2020

knz commented Aug 27, 2020 •

edited

Loading

knz commented Aug 27, 2020 •

edited

Loading

bdarnell commented Sep 10, 2020

knz commented Sep 11, 2020 •

edited

Loading

knz commented Sep 11, 2020

aaron-crl commented Oct 9, 2020

knz commented Oct 13, 2020

cli: the --insecure flag will be removed and replaced by easier-to-use always-secure options #53404

cli: the --insecure flag will be removed and replaced by easier-to-use always-secure options #53404

Comments

knz commented Aug 25, 2020 • edited by cockroach-jira-scripts Loading

What you need to know in a post-insecure world

New clusters

Threat model

Existing clusters previously run with "insecure mode"

Rationale

Background

Motivation

Strategy

knz commented Aug 25, 2020

knz commented Aug 25, 2020 • edited Loading

jasobrown commented Aug 27, 2020

knz commented Aug 27, 2020 • edited Loading

knz commented Aug 27, 2020 • edited Loading

bdarnell commented Sep 10, 2020

knz commented Sep 11, 2020 • edited Loading

knz commented Sep 11, 2020

aaron-crl commented Oct 9, 2020

knz commented Oct 13, 2020

cli: the `--insecure` flag will be removed and replaced by easier-to-use always-secure options #53404

cli: the `--insecure` flag will be removed and replaced by easier-to-use always-secure options #53404

knz commented Aug 25, 2020 •

edited by cockroach-jira-scripts

Loading

knz commented Aug 25, 2020 •

edited

Loading

knz commented Aug 27, 2020 •

edited

Loading

knz commented Aug 27, 2020 •

edited

Loading

knz commented Sep 11, 2020 •

edited

Loading