Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cli: the --insecure flag will be removed and replaced by easier-to-use always-secure options #53404

Open
3 of 9 tasks
knz opened this issue Aug 25, 2020 · 10 comments
Open
3 of 9 tasks
Labels
A-security A-server-architecture Relates to the internal APIs and src org for server code C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-server-and-security DB Server & Security X-anchored-telemetry The issue number is anchored by telemetry references.

Comments

@knz
Copy link
Contributor

knz commented Aug 25, 2020

Epic: CRDB-12037

The --insecure flag disables a number of security controls which are completely expected even for developers or folk trying out CockroachDB.

Instead, we aim to ensure that all clusters are secure, but offer new secure options where the user can choose the combination of security features that best matches their needs and their environment.

(Additionally, the word "insecure" gives the wrong impression that CockroachDB is not a secure database. The "insecure mode" only really exists for internal testing by the CockroachDB team and should not have been exposed to users from the start.)

What you need to know in a post-insecure world

CockroachDB aims to remain easy to use when all clusters are secured!

New clusters

Simplified decision making:

  • for developers, single machine, no security requirements:
    • cockroach demo, automatically secure
    • single-node secure cluster, with auto-managed TLS
  • multiple machines or prod system:
    • for node-node traffic:
      • no special requirement on inter-node traffic: auto-managed node-node TLS
      • special requirements:
        • org-mandated PKI: use org-mandated PKI to manage node-node TLS
        • user responsible for network security: no-TLS RPC
    • for node-client traffic:
      • no special requirements: auto-managed SQL/HTTP TLS with either password or TLS authn
        This is the most common pg-compatible scenario!
      • special requirements:
        • org-mandated PKI: use org-mandated cert manager to manage SQL/HTTP TLS
        • user responsible for network security: no-TLS SQL/HTTP

Cluster is secure in all cases.

Threat model

Node-node connections

Threat \ Protection TLS transport+authn no-TLS + Shared secret no-TLS + no auth (--insecure) Mitigation possible via network restrictions?
Mistaken node join from wrong cluster protected protected vulnerable no
Complete cluster takeover via malicious node protected vulnerable via bruteforce vulnerable partial
Attacker MITM snoop of authn credentials protected vulnerable vulnerable yes
Attacker accesses confidential data via network snoop protected vulnerable vulnerable yes
Attacker modifies cluster data in-flight protected vulnerable vulnerable yes
Compromised authn credentials Fix via cert rotation/revocation (online) Update secret + full restart N/A partial

Note: the vulnerabilities in the 2nd and 3rd column are amplified by sharing the same TCP listener (address/port) between SQL and RPC listeners.

Partial mitigation possible via separate --sql-addr and fencing off the node-to-node traffic into a private network.

Node-client connections

Threat \ Protection TLS transport+authn no-TLS + SCRAM authn no-TLS + no auth (--insecure) Mitigation possible via network restrictions?
Mistaken connect from/to wrong app or wrong cluster protected protected vulnerable no
Complete cluster takeover via escalation of privilege protected protected vulnerable partial
Rogue client app unrestricted access to data of 1 SQL user protected protected vulnerable partial
Attacker MITM snoop of authn credentials protected protected vulnerable yes
Attacker accesses app data data via network snoop protected vulnerable vulnerable yes
Attacker modifies app data in-flight protected vulnerable vulnerable yes
Compromised authn credentials Fix via cert rotation/revocation (online) Update password (online) N/A no

Note that the first column in the table above assumes that clients also validate server
TLS certs! Otherwise the following table applies:

Threat \ Protection Two-way cert validation Only srv auths client
Mistaken connect from/to wrong app or wrong cluster protected protected
Complete cluster takeover via escalation of privilege protected protected
Rogue client app unrestricted access to data of 1 SQL user protected protected
Attacker MITM snoop of authn credentials protected vulnerable
Attacker accesses app data data via network snoop protected vulnerable
Attacker modifies app data in-flight protected vulnerable
Compromised authn credentials Fix via cert rotation/revocation (online) still vulnerable (!)

Existing clusters previously run with "insecure mode"

  1. determine the secure configuration that suits your needs (see above)
  2. prepare configuration:
    • create authn credentials for SQL users (can be simple password)
    • prepare server flags and TLS certs that suit the needs identified above
    • add flag --security-upgrade to node configs.
  3. perform rolling restart of cluster. At this point the cluster accepts but does not mandate the secure config.
  4. remove flag --security-upgrade from node configs
  5. perform 2nd rolling restart. This enforces the secure configuration.

Advanced decision flowchart under the fold

Variants of secure clusters

Rationale

Background

For context, --insecure does the following:

  1. deactivates TLS handshakes for node-node connections
  2. deactivates TLS handshakes for node-client RPC connections
  3. deactivates TLS handshakes for node-client SQL connections over TCP
  4. deactivates TLS handshakes for node-client HTTP connections
  5. deactivates node-to-node authentication
  6. deactivates HTTP authentication
  7. deactivates SQL authentication
  8. deactivates SQL authorization
  9. deactivate certain SQL features, so as to not create the illusion of security

Motivation

We know from experience that customers who advocate for "insecure mode" really only care about:

These users certainly do not want to disable authentication and authorization, yet we also know that users do not realize that --insecure disables these internal protections inside cockroachdb.

Some users also care about point 4 because setting up TLS certs in a web browser is a pain. However since v20.1 we have a solution for those users: --unencrypted-localhost-http makes the HTTP endpoint localhost-only without TLS.

Strategy

Epic CRDB-12037

Jira issue: CRDB-3870

@knz knz added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-security labels Aug 25, 2020
@knz knz added the X-anchored-telemetry The issue number is anchored by telemetry references. label Aug 25, 2020
@knz
Copy link
Contributor Author

knz commented Aug 25, 2020

Related: #49918

@knz
Copy link
Contributor Author

knz commented Aug 25, 2020

cc @thtruo @aaron-crl for tracking

craig bot pushed a commit that referenced this issue Aug 25, 2020
53405: cli flags clean-up r=tbg a=knz

Informs #53404
Informs #49918

Release note (cli change): The command-line flag `--socket` has been
removed. It was deprecated since v20.1. Use `--socket-dir` in
replacement.

Release note (cli change): The command-line `--insecure` has been
marked as deprecated. See #53404 for details.
The flag will be removed in a later version in a staged fashion:
first, additional security mechanisms will be added to enable more
flexible deployments which were previously done using `--insecure`;
then the flaf will be removed from server commands, then finally in a
later version also from client commands.

Co-authored-by: Raphael 'kena' Poss <[email protected]>
@jasobrown
Copy link
Contributor

Is the intent here to move all connections to the crdb process to be secure? And by secure, i mean secure in the way that cockroachdb expects? We manage TLS as well as cert authN/AuthZ outside of crdb as a separate process on the database instances, and thus run crdb in insecure mode because it's fronted by the other process. The reason for this is our authN/authZ workflow is slightly non-standard and the effort to integrate that into crdb (as far we knew a year ago) seemed daunting.

Perhaps I can take another look at our golang support for our authN/authZ as well as the cockroach code to see if the integration could work via some extension points/plugins.

@knz
Copy link
Contributor Author

knz commented Aug 27, 2020

Hi Jason, thank you for the feedback.

Once the --insecure flag is gone, you will be able to achieve the same as your current set up with the following configuration

Meanwhile, we'll encourage you to keep node-to-node connections secure using TLS; if you prefer not to set up TLS certs manually, we'll have some new experience (see #51991) which will simplify the setup.

Does that sound good to you?

@knz
Copy link
Contributor Author

knz commented Aug 27, 2020

@jasobrown a question for you though: does your set up use a single SQL account for all operations, or do you use separate SQL accounts with different privileges? Can you outline a little bit how your authorization looks like?

edit: discussed this with jason offline

knz added a commit to knz/cockroach that referenced this issue Sep 5, 2020
... to refer to issue cockroachdb#53404.

Release justification: non-production code changes

Release note: None
knz added a commit to knz/cockroach that referenced this issue Sep 10, 2020
... to refer to issue cockroachdb#53404.

Release justification: non-production code changes

Release note: None
craig bot pushed a commit that referenced this issue Sep 10, 2020
53991: pgwire: accept non-TLS client conns safely in secure mode r=aaron-crl,irfansharif,bdarnell a=knz

Fixes #44842.
Informs #49532. 
Informs #53404.

This change makes it possible for a DBA / system administrator to
reconfigure individual nodes *in a secure cluster* to accept SQL
client sessions over TCP without mandating a TLS
handshake. Authentication remains mandatory as per the HBA rules.

Motivation: we have at least two high-profile customers who keep their
nodes and client apps in a private secure network (with network-level
encryption / privacy) and who experience client-side TLS as
unnecessary and expensive friction.

Additionally, **this feature is a prerequisite to upgrade an insecure
cluster to secure mode without downtime.**

Why this does not impair security:

- authentication remains mandatory (as per the HBA rules -- [1] [2]).
- the feature is opt-in: the operator must set a command-line flag
  (`--accept-sql-without-tls`), which is not enabled by default.
- there is an interlock: the user must both set up the flag
  and set log-in passwords for their SQL users (by default,
  users get created without a password and thus cannot log
  in without client certs).
- for now, node-node connections still require TLS.

[1]: https://www.postgresql.org/docs/12/auth-pg-hba-conf.html
[2]: https://dr-knz.net/authentication-in-postgresql-and-cockroachdb.html

For context, the default HBA configuration is the following:

```
host  all root all cert-password # fixed rule
host  all all  all cert-password # built-in CockroachDB default
local all all      password      # built-in CockroachDB default
```

The directive `host` covers both TLS and non-TLS incoming TCP
connections (`local` is for the unix socket). The method
`cert-password` means "client cert or password": without a cert, the
password is mandatory.

As previously, the user can further secure the configuration by
restricting non-TLS connections to just a subnetwork, for example:

```
host  all all 10.0.0.0/8 password # accept conns on the 10/8 network.
host  all all all        reject   # refuse conns from other nets.
local all all            password
```

Note that this change is limited to the server side: CockroachDB's own
`cockroach` CLI commands do not yet know how to connect to a
CockroachDB server without TLS; such connections are only supported
from `psql` or SQL client drivers in apps. See #53994 for a follow-up.

Release justification: fixes for high-priority or high-severity bugs in existing functionality


54019: roachtest: de-flake 'inconsistent' r=knz a=tbg

This test sets up an intentionally corrupted replica and wants its node
to shut down as a result of its detection. When only two of the three
nodes were included in the consistency check, either one of them could
end up terminating (as no obvious majority of healthy replicas can be
determined). Change the test so that we wait for the cluster to come
fully together before setting a low consistency check interval.

Closes #54005.

Release justification: testing
Release note: None


Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: Tobias Grieger <[email protected]>
knz added a commit to knz/cockroach that referenced this issue Sep 10, 2020
... to refer to issue cockroachdb#53404.

Release justification: non-production code changes

Release note: None
@bdarnell
Copy link
Contributor

The word "insecure" gives the wrong impression that CockroachDB is not a secure database.

Citation needed. Are people really getting confused by this? I think the real issue with --insecure mode is that because of the lack of grpc authentication, it's more insecure than people expect.

Cluster is secure in all cases.

Security is not binary. The question is always "secure against what threats?" Some of these cases are secure against threats that others are not, so it's misleading to say that they're all equally secure (of course they may be used in systems that are secure against those threats if security is provided at other layers)

And that's really what we're talking about here - traditionally we've just had all-or-nothing, with the no-protection-at-all --insecure mode or complex manually-managed certificates. Here we're adding multiple in-between modes. And then, because some of these intermediate modes will be strictly more secure than --insecure but no more difficult to use, we'll eventually eliminate --insecure mode. I think leading with the removal of --insecure is causing some surprise among users (although more than I expected - I thought we had discouraged this mode enough that no one would be using it in production).

That flowchart scares me - there's a lot of decision points there. A lot of them are new. The goal of #51991 and eliminating insecure mode is to reduce the number of options that most users experience, by steering nearly all users to the "internally-managed TLS" path. I think we need to reconsider the amount of flexibility we're providing here, and what decisions the user really needs to make.

For example, I don't think we ever want to give the "non-TLS SQL" option this much emphasis. CockroachCloud will always need to use TLS, so we must have a good user experience for TLS (perhaps by working with driver implementations on things like #32932).

Existing clusters previously run with "insecure mode"

Are there enough of these that we need to provide a no-downtime upgrade path? This seems like it could add quite a bit of complexity (can you upgrade from any security mode to any other, or only certain ones?)

We know from experience that customers who advocate for "insecure mode" really only care about point 3: they want TLS-less SQL connections

This is definitely not true. There have been some recent examples of this but I think the most common reason people resist setting their clusters up securely is about all the TLS setup, especially for node-to-node.

Possibly solve #54007 and/or advance #51991 to enable secure RPC without manual TLS configs

This seems like a requirement, not just a "possibly".

@knz knz changed the title cli: the --insecure flag is undesirable and will be removed from all CLI commands cli: the --insecure flag will be removed and replaced by easier-to-use always-secure options Sep 11, 2020
@knz
Copy link
Contributor Author

knz commented Sep 11, 2020

The word "insecure" gives the wrong impression that CockroachDB is not a secure database.

Citation needed.

This came up internally multiple times.

Are people really getting confused by this? I think the real issue with --insecure mode is that because of the lack of grpc authentication, it's more insecure than people expect.

This is pointed out prominently at the top of the issue description. But I'll take your point, as well as this one:

because some of these intermediate modes will be strictly more secure than --insecure but no more difficult to use, we'll eventually eliminate --insecure mode. I think leading with the removal of --insecure is causing some surprise among users.

I have adjusted the title of the issue to emphasize the new things / improvements and pulled that notion as first sentences/paragraphs in the issue description.

Cluster is secure in all cases.

Security is not binary. The question is always "secure against what threats?"

You and I know that security is not binary but the point here is that all the proposed options keep all internal security controls active, which --insecure=true does not.

Anyway you are right that we need to talk about threats. Adding a table in the issue description instead of the flowchart. Bram helped me understand that a list of threats is more easy to use than a flowchart anyway.

Some of these cases are secure against threats that others are not, so it's misleading to say that they're all equally secure

"say they're all equally secure" - that's a strawman. Nobody wrote this anywhere.

That flowchart scares me - there's a lot of decision points there. [...] I think we need to reconsider the amount of flexibility we're providing here, and what decisions the user really needs to make.

Point granted. I've reduced the scope accordingly.

For example, I don't think we ever want to give the "non-TLS SQL" option this much emphasis. CockroachCloud will always need to use TLS, so we must have a good user experience for TLS (perhaps by working with driver implementations on things like #32932).

It seems clear that we need to emphasize the CC use case first and foremost, and thus preserve TLS options as the main recommendation (and main recommended scenario). However we do have serious $$$ at risk if we don't offer other options.

Existing clusters previously run with "insecure mode"

Are there enough of these that we need to provide a no-downtime upgrade path? This seems like it could add quite a bit of complexity (can you upgrade from any security mode to any other, or only certain ones?)

There are at least a few customers asking (those big accounts who didn't like our TLS and went to prod with --insecure, despite our recommendations to the countrary). But we can narrow down the scope of the upgrade path to support just those customers, yes.

We know from experience that customers who advocate for "insecure mode" really only care about point 3: they want TLS-less SQL connections

This is definitely not true. There have been some recent examples of this but I think the most common reason people resist setting their clusters up securely is about all the TLS setup, especially for node-to-node.

Point taken. Adjusted the text accordingly.

Possibly solve #54007 and/or advance #51991 to enable secure RPC without manual TLS configs

This seems like a requirement, not just a "possibly".

The word "possibly" only pertained to #54007 (which you and I would agree is probably less important). Issue #51991 is definitely critical.

@knz
Copy link
Contributor Author

knz commented Sep 11, 2020

I also just realized that the TLS story could also be simplified by issuing a common TLS client cert for all SQL users (and give them separate passwords to distinguish them)

@aaron-crl
Copy link

Apologies for the late response but there are three things that strike me here:

  1. --insecure should probably do a better job of explicitly stating what is insecure mode. Perhaps we should include your list below in the output from the flag (formatted to CLI of course):
--insecure does the following:

deactivates TLS handshakes for node-node connections
deactivates TLS handshakes for node-client RPC connections
deactivates TLS handshakes for node-client SQL connections over TCP
deactivates TLS handshakes for node-client HTTP connections
deactivates node-to-node authentication
deactivates HTTP authentication
deactivates SQL authentication
deactivates SQL authorization
deactivate certain SQL features, so as to not create the illusion of security
  1. There's a lot here to unpack and given the potential impact to developers and customers perhaps this should be moved to an RFC given the current prevalence of this flag and depth of discussion.

  2. I don't feel we should mark a feature as deprecated until we have a supported alternative (or at least support for the majority of the usage). I don't feel we have that yet given that our own documentation still includes this flag for tutorials and guides.

@knz
Copy link
Contributor Author

knz commented Oct 13, 2020

Regarding point 1: there is now a clearer warning than before. The warning reads as follows:

*
* WARNING: ALL SECURITY CONTROLS HAVE BEEN DISABLED!
*
* This mode is intended for non-production testing only.
*
* In this mode:
* - Your cluster is open to any client that can access any of your IP addresses.
* - Intruders with access to your machine or network can observe client-server traffic.
* - Intruders can log in without password and read or write any data in the cluster.
* - Intruders can consume all your server's resources and cause unavailability.
*
*
* INFO: To start a secure server without mandating TLS for clients,
* consider --accept-sql-without-tls instead. For other options, see:
*
* - https://go.crdb.dev/issue-v/53404/v20.2
* - https://www.cockroachlabs.com/docs/v20.2/secure-a-cluster.html
*

Regarding point 2: yes this work will be in a RFC of course.

Regarding point 3: agreed

craig bot pushed a commit that referenced this issue Jan 11, 2021
58662: cli/demo: avoid login URL if insecure r=otan a=knz

Fixes #58661  

NB: insecure is deprecated anyway  #53404 

Co-authored-by: Raphael 'kena' Poss <[email protected]>
@jlinder jlinder added the T-server-and-security DB Server & Security label Jun 16, 2021
@knz knz added the A-server-architecture Relates to the internal APIs and src org for server code label Jul 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-security A-server-architecture Relates to the internal APIs and src org for server code C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-server-and-security DB Server & Security X-anchored-telemetry The issue number is anchored by telemetry references.
Projects
None yet
Development

No branches or pull requests

5 participants