Set elastic password and generate enrollment token for initial node #75310

Closed
7 tasks
jkakavas opened this issue Jul 13, 2021 · 13 comments · Fixed by #75816

Comments

@jkakavas
Member

jkakavas commented Jul 13, 2021

For the security-on-by-default project, we need functionality that allows us to set the password of the elastic superuser (if it has not already been set) and to generate an enrollment token for Kibana. This functionality needs to be encapsulated in a class that can be invoked as a CLI tool, similar to org.elasticsearch.xpack.security.cli.AutoConfigInitialNode, and will be spawned by the bin/elasticsearch script.

Required functionality:

  • Read the keystore.seed from the elasticsearch keystore so that it can be used as credentials for the API requests
  • Verify that the ES node it makes requests against is up and running (with retries). We can perform naive checks against the health of the cluster for now, until we have reworked more fine-grained checks for cluster health and security index availability.
  • (Possibly) Generate a random password for the elastic user
  • Set the password of the elastic user via a change password API request
  • Generate an enrollment token with kibana scope
  • Calculate the fingerprint of the CA certificate that ES is configured to use for the HTTP layer
  • Output the 3 pieces of information above so that they can be consumed by the executor of the Command
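Two of the steps above (generating a random password and fingerprinting the CA certificate) can be sketched in a few lines. This is only an illustration: the function names are hypothetical, the password alphabet and length are arbitrary, and it assumes the fingerprint is the hex-encoded SHA-256 digest of the DER-encoded certificate, which the issue does not specify.

```python
import hashlib
import secrets
import ssl
import string


def generate_password(length: int = 20) -> str:
    """Return a random alphanumeric password (length and alphabet are assumptions)."""
    alphabet = string.ascii_letters + string.digits
    # secrets.choice draws from a cryptographically secure random source.
    return "".join(secrets.choice(alphabet) for _ in range(length))


def ca_fingerprint(pem_certificate: str) -> str:
    """Return the hex SHA-256 digest of a PEM certificate's DER encoding.

    Assumption: the fingerprint format is SHA-256 over DER bytes.
    """
    # PEM_cert_to_DER_cert strips the PEM armor and base64-decodes the body.
    der_bytes = ssl.PEM_cert_to_DER_cert(pem_certificate)
    return hashlib.sha256(der_bytes).hexdigest()
```

Both values, together with the enrollment token, would then be handed back to whatever invoked the command.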

The expected behavior for this process is:

  1. The process will be called once, only the first time the node starts
  2. It should read the elasticsearch keystore. If there is a bootstrap.password value in the keystore, we assume that the administrator wants to own their configuration related to the password of the elastic user. We will make no attempt to auto-config the elastic password nor attempt to "promote" this password value automatically to the security index.
  3. If there is no bootstrap.password value, it will use the keystore.seed value as authenticating credentials and an auto-generated password as the value to set the password of the elastic user to, using the Change Password API
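The decision described in steps 2 and 3 can be sketched as a small pure function. The keystore is modelled as a plain mapping, and `InitialSetupPlan` and `plan_initial_setup` are hypothetical names used only for illustration; this is not the actual tool's code.

```python
import secrets
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class InitialSetupPlan:
    set_elastic_password: bool       # whether the tool should call the Change Password API
    auth_password: Optional[str]     # credential used to authenticate as elastic
    new_password: Optional[str]      # auto-generated password to set


def plan_initial_setup(keystore: Mapping[str, str]) -> InitialSetupPlan:
    """Decide what the first-start process should do, given the keystore contents."""
    if "bootstrap.password" in keystore:
        # The administrator owns the elastic password: do not auto-configure
        # it and do not promote the bootstrap value to the security index.
        return InitialSetupPlan(False, None, None)
    # Otherwise authenticate with keystore.seed and set a generated password.
    return InitialSetupPlan(True, keystore["keystore.seed"], secrets.token_urlsafe(16))
```

Keeping the decision separate from the HTTP calls makes the "administrator owns the password" branch easy to test in isolation.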

For reference:

  • autoconfiguration.password_hash : This is a value that contains the salted hash ( using the default hashing algorithm ) for the elastic user. We will be generating and setting this, during package (DEB,RPM) installation, so that we can show it to the user.
  • bootstrap.password : This is a value that contains the plaintext password of the elastic user. It can be set manually by the user before they start elasticsearch for the first time, and/or it is set automatically by us in certain cases (in Docker, when the environment variable ELASTIC_PASSWORD is set). The value of this setting can be used as the elastic user password for the local node, until the password of the elastic user is set in the reserved realm and the relevant document is created in the security index.
  • keystore.seed : This is auto-generated on startup of the node. The value of this setting can be used as the elastic user password for the local node, if bootstrap.password is not already set, until the password of the elastic user is set in the reserved realm and the relevant document is created in the security index.
@jkakavas
Member Author

jkakavas commented Jul 19, 2021

See the first post with the updated decisions.

@tvernum
Contributor

tvernum commented Jul 20, 2021

I have concerns regarding behaviours 2 & 3.

  1. If there is only a bootstrap.password value in the keystore, it will use this value [...] as the value to set the password of the elastic user to, using the Change Password API
  2. If there is both a bootstrap.password.hash and a bootstrap.password value in the keystore, that indicates that the [...] bootstrap.password [...] is the intended value for the password of the elastic user. We would thus use this [...] as the value we will be setting the password of the elastic user to.

In both of these cases the indexed password for the cluster's superuser would also be available (typically in plaintext) in the elasticsearch keystore. I don't think this is something we should ever do.

I am happy with behaviour 1 (the password hash in the keystore is not any greater risk than a password hash in the file realm). And I am happy (from a security point of view) with behaviour 4, though I'd like to understand what happens to that auto-generated password (do we print it to console?)

My pretty strong view (but we can chat if there's disagreement) is that if there is no bootstrap.password.hash in the keystore then we should either

  1. not set the indexed password; or
  2. set the password to an auto-generated value; or
  3. know with absolute certainty that we can remove the plaintext value from the keystore once it has been indexed (which I think is hard, and least preferred).

@tvernum
Contributor

tvernum commented Jul 20, 2021

On a purely technical note: I think it needs to be bootstrap.password_hash (not .hash). The settings infrastructure won't cope with a setting that has both a direct value and child values.
In yaml you can't have:

bootstrap:
    password: changeme
       hash:  $2a$04$JHmd2L4mUJkY4flmWArP2OUz9hneHeWbnTonBi3BLKibb1dl3UgDW

and since the elasticsearch.keystore is an extension of the yaml based settings into an encrypted store, it won't like it either.
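To illustrate why a key cannot hold both a direct value and child values, here is a minimal sketch that expands dotted setting names into a nested structure. It is a toy model with a hypothetical `nest_settings` helper, not Elasticsearch's settings infrastructure.

```python
from typing import Any, Dict, Mapping


def nest_settings(flat: Mapping[str, str]) -> Dict[str, Any]:
    """Expand dotted keys into nested dicts, rejecting value/child conflicts."""
    tree: Dict[str, Any] = {}
    for key, value in flat.items():
        *parents, leaf = key.split(".")
        node = tree
        for part in parents:
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                # A parent segment already holds a direct value.
                raise ValueError(f"{key!r}: {part!r} already has a direct value")
            node = child
        if isinstance(node.get(leaf), dict):
            # The leaf already has child settings beneath it.
            raise ValueError(f"{key!r}: {leaf!r} already has child values")
        node[leaf] = value
    return tree
```

With this model, `bootstrap.password` alongside `bootstrap.password.hash` raises a `ValueError`, while `bootstrap.password` alongside `bootstrap.password_hash` nests cleanly as two sibling leaves.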

@tvernum
Contributor

tvernum commented Jul 20, 2021

Final note (for now),

bootstrap.password.hash : [...] The node has no use or understanding of this setting, setting it in itself does nothing.

I find it strange that there are cases where we write a password (hash) to the keystore, and will tell admins that it is the password for the elastic user, but actually it doesn't work until the cluster forms and we can write to the security index.

I don't think it's a blocker, but we need to at least be aware that it's a potential trap for admins, and document clearly that this password will not work until there is a healthy cluster.

The other option is to have the node respect this hashed password. I think that means it has to be accepted in addition to the plaintext password (bootstrap.password or keystore.seed or if we want, the text of hash itself) so that this new CLI tool has access to a password that it can use to authenticate.

@jkakavas
Member Author

jkakavas commented Jul 20, 2021

Thanks for raising these Tim! As we discussed, I'm in favor of :

The other option is to have the node respect this hashed password. I think that means it has to be accepted in addition to the plaintext password (bootstrap.password or keystore.seed or if we want, the text of hash itself) so that this new CLI tool has access to a password that it can use to authenticate.

I think this added functionality makes sense and it is a much less surprising behavior. It will also allow us to make our existing password bootstrapping process in Docker much more secure, even for 7.x

Regarding your other points:

In both of these cases the indexed password for the cluster's superuser would also be available (typically in plaintext) in the elasticsearch keystore. I don't think this is something we should ever do.

Keeping in mind that we will try to clear the password from the keystore whenever we can, after we set it cluster-wide, and mostly for the sake of discussion: We would not be adding a cleartext bootstrap.password value in any of the use cases of the security on by default project. This was meant as a way to allow us to continue to support existing user behavior of setting bootstrap.password themselves before the node starts, and extending it to be more (presumably) useful.

We would now take this node-only valid password for elastic user and we would make it cluster-wide valid. If a malicious user would get access to the keystore to read the password, it means they have read access to the local filesystem so presumably network connectivity to the local node. If they wanted to run a malicious command against the cluster, they can already do that by authenticating to the local node.

I would understand the difference in the attack surface perception if we could assume that the users that already set bootstrap.password for any reason, then move on to set the password of the elastic user cluster-wide via our APIs to a different value, so that the value that remains in the keystore in plaintext is not valid, but I don't think we can make this assumption.

I'm also happy for us to reconsider our support for bootstrap.password generally, but I'm interested to discuss more on why you think that this change makes it more problematic than it already is.

On a purely technical note: I think it needs to be bootstrap.password_hash (not .hash).

++

I am happy with behaviour 1 (the password hash in the keystore is not any greater risk than a password hash in the file realm). And I am happy (from a security point of view) with behaviour 4, though I'd like to understand what happens to that auto-generated password (do we print it to console?)

yes, this will be printed in the console when there is a console attached. The current WIP for how to print this in the console is in #74516.

@BigPandaToo
Contributor

BigPandaToo commented Jul 21, 2021

Couple of things I would like to clarify...

  1. The behaviour when a concurrent (user's) process beats the setup process to resetting the password. In this case we (obviously) cannot display the password and we cannot generate the enrollment token, because we don't know the password. So, if we get a 409 or 401 when trying to reset the password, or a 401 when trying to generate the enrollment token for Kibana, I think we have two options:
    • bail out and display a meaningful message describing a course of action (something like: use the enrollment token command line tool to generate the enrollment token for Kibana etc.) -- I think this is preferable, considering that it should be a relatively rare case
    • try to fix it ourselves by reusing the enrollment token tool's code (use a file realm user) -- I think it may become pretty messy if we go this way
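The "bail out" option could be sketched as a simple mapping from HTTP status to outcome. The enum, function name, and message wording are all hypothetical; only the 401 and 409 status codes come from the comment above.

```python
from enum import Enum
from typing import Tuple


class SetupOutcome(Enum):
    SUCCESS = "success"
    BAIL_OUT = "bail_out"      # someone else changed the password first
    UNEXPECTED = "unexpected"  # any other failure


def classify_response(status: int) -> Tuple[SetupOutcome, str]:
    """Map the HTTP status of a setup request to an outcome and a user message."""
    if 200 <= status < 300:
        return SetupOutcome.SUCCESS, ""
    if status in (401, 409):
        # We no longer hold valid credentials, so we can neither show the
        # password nor generate the Kibana enrollment token ourselves.
        return (
            SetupOutcome.BAIL_OUT,
            "The elastic password was changed by another process. "
            "Use the enrollment token command line tool to generate a token for Kibana.",
        )
    return SetupOutcome.UNEXPECTED, f"Unexpected HTTP status {status}"
```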

Keeping in mind that we will try to clear the password from the keystore whenever we can

@jkakavas Which process has to take care of it? I thought that if we manage to reset the password and generate the enrollment token, we are good to go and clear the bootstrap password. Thoughts?

@albertzaharovits
Contributor

Thank you @tvernum for the input!

Running the risk of restating a point already made clear, we don't plan on storing plaintext password values in the node's keystore, beyond what docker already does today. Actually, the point of introducing the new bootstrap.password_hash setting is to avoid using the bootstrap.password for package installations (which configure security at install time).

If we generate and show (for package installs) or accept (for docker runs - currently) a password for elastic before the node starts, we make a "promise" to set the elastic user's password. Ideally (the easiest to communicate) is that the password is valid as soon as the node handles requests (this is to your point Tim from #75310 (comment)).

BUT, I think this requires that we move the action of copying the password hash from the keystore to the .security index inside the node. That is because a request from outside the node must be authenticated, and there's only the elastic user to authenticate as, but the request at hand is about setting the promised password for the elastic user, so which password does this request use? Expanding on it a bit further, in the general case (not Security ON by default), we'll have to worry about different password promises on different nodes. A node might authenticate with the local "promised" hash for some time, but at some point the .security index becomes available, and the effective password changes to a different value from the local promise.

I'll have to think it over some more. I'm no longer sure what the step-by-step flow should be.

@tvernum
Contributor

tvernum commented Jul 26, 2021

I'm interested to discuss more on why you think that this change makes it more problematic than it already is.

Our documentation tells people to run setup-passwords after their cluster starts. If they do that, then the bootstrap password is no longer useful for authenticating to the cluster. Our design & risk assessment was predicated on most people doing what we advised them to do. And, I think our obligations to our users are greater if they follow our documentation - if you do what we recommend then you will have a risk profile that we think is generally acceptable. Doing your own thing, outside of the documentation might increase that risk, and we try to avoid those situations, but they're inevitable and very hard to combat.

I don't think we've done a good enough job on that with our docker instructions, but the answer is to do better, not to make everything worse.

The behaviour in case of the concurrent (user's) process beat the setup process trying to reset the password.

If an admin actively tries to change the elastic password while we're running the initialisation process, it's perfectly fine for that to fail.
I think our only obligation there is to not break things or give incorrect information to the end user. As long as we can detect the situation and fail with a clear (enough) error message, I think we've done our job.

Keeping in mind that we will try to clear the password from the keystore whenever we can

However, in a typical package install, we know we can't, because the keystore is read-only to the user running ES.

BUT, I think this requires that we move the action to put the password hash from the keystore to the .security index inside the node.

That's my sense as well, though I'm not as close to the detail as the rest of the team, so I might be missing things.

My feeling is that this is probably easier if we think about packages and archives separately.

For archives:

  • Install is just an unzip / tar -x, and no configuration happens.
  • Nothing happens until the first node starts.
  • The bootstrap password will be keystore.seed and is generated in the keystore automatically on first use.
  • The "configure TLS" step happens on first use, before the node starts
  • The "setup initial security" process runs after the node starts, and can run outside of the node:
    1. Generate a new random password.
    2. Use the keystore.seed to set the elastic user's password to the generated password
    3. Generate an enrollment token
    4. Inform the user (on the console) of the password and enrollment token.
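Step 2 of the archive flow (using keystore.seed to set the elastic user's password) might look like the following sketch, which only builds the HTTP request and does not send it. It assumes the Change Password API endpoint `POST /_security/user/elastic/_password` with HTTP basic authentication; the function name and base URL are illustrative.

```python
import base64
import json
import urllib.request


def build_change_password_request(
    base_url: str, keystore_seed: str, new_password: str
) -> urllib.request.Request:
    """Build (but do not send) a Change Password request for the elastic user."""
    # Until the security index holds a password, keystore.seed acts as the
    # elastic user's password on the local node.
    credentials = base64.b64encode(f"elastic:{keystore_seed}".encode("utf-8")).decode("ascii")
    body = json.dumps({"password": new_password}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_security/user/elastic/_password",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/json",
        },
    )
```

A real tool would also need to trust the node's auto-generated HTTP CA when sending this request, which is omitted here.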

For packages:

  • Install does real configuration:
    1. Runs the "configure TLS" step
    2. Generate a new random password
    3. Write the hash to the keystore as bootstrap.password_hash
    4. Inform the user (on the console) of the password
  • When the node starts for the first time, it detects that there is no security index and there is a bootstrap.password_hash so it creates the security index and copies the hash to the reserved-user-elastic document.
  • There is no enrollment token.
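The package flow above can be sketched with the keystore and security index modelled as plain dictionaries. Several things here are stand-ins and should be read as assumptions: PBKDF2 replaces whatever the default hashing algorithm is (the standard library has no bcrypt), and the stored hash format and the document field name `password` are invented for illustration.

```python
import hashlib
import secrets
from typing import Dict, Optional

HASH_KEY = "bootstrap.password_hash"


def hash_password(password: str) -> str:
    """Salted hash; PBKDF2-SHA256 is a stand-in for the real default algorithm."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 10_000)
    return f"pbkdf2${salt.hex()}${digest.hex()}"


def install_time_setup(keystore: Dict[str, str]) -> str:
    """Package install: generate a password and store only its hash."""
    password = secrets.token_urlsafe(16)
    keystore[HASH_KEY] = hash_password(password)
    return password  # shown once to the admin on the console


def first_start(keystore: Dict[str, str], security_index: Optional[dict]) -> Optional[dict]:
    """Node start: create the security index from the hash if none exists yet."""
    if security_index is not None:
        return security_index  # index already exists, leave it alone
    if HASH_KEY not in keystore:
        return None            # nothing was promised at install time
    return {"reserved-user-elastic": {"password": keystore[HASH_KEY]}}
```

Because only the hash is written, the plaintext password never has to be removed from the keystore afterwards.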

I think we've been trying to make those processes the same, but they're not, and I suspect it's easier if we accept that and embrace the differences.

@jkakavas
Member Author

I have updated #75310 (comment) to reflect our current design.

My feeling is that this is probably easier if we think about packages and archives separately.

I agree we can ( and maybe should ) think about these differently, but I think we can merge the approaches into one :

  • The "setup initial security" process runs after the node starts the first time, and can run outside of the node:
    • If there is a value in the bootstrap.password_hash key (packages), use that to make the change password call
    • If there is none (archive), generate one and make the change password call with it

Since we won't be doing this, this is pretty much a theoretical discussion, but an interesting one so I'll add a bit:

Our documentation tells people to run setup-passwords after their cluster starts. If they do that, then the bootstrap password is no longer useful for authenticating to the cluster.

I think (and I acknowledge that this is a hunch, as I don't have hard evidence) that the use cases of the users who set bootstrap.password and the use cases of the users who use setup-passwords don't intersect that much. If you are about to run setup-passwords, there is no need to set bootstrap.password, and if you set bootstrap.password it's probably because you want to set the password of the elastic user to some predefined value that you can use and you don't want to go out of your way to script around setup-passwords.
In this sense, we wouldn't be changing anything here, apart from promoting the password to the security index which in my view does nothing to the threat surface of our password bootstrapping process.

@albertzaharovits
Contributor

I agree we can ( and maybe should ) think about these differently, but I think we can merge the approaches into one :
The "setup initial security" process runs after the node starts the first time, and can run outside of the node:
If there is a value in the bootstrap.password_hash key (packages), use that to make the change password call
If there is none (archive), generate one and make the change password call with it

In this particular case I think we're going to be more confident with different solutions for archives and packages.

For archives, an external "helper" process, as laid out latest by Tim in #75310 (comment), and as has been proposed and discussed before, is the best thing we can achieve.
The generated password and the enrollment token are printed together, because they are generated together. We don't have to worry about error handling to the extent we would if the code ran in the same process during node bootstrap, and node bootstrap does not have to be concerned with an auto-configuration flow. We can also reuse the code of our command-line tools. Overall it is easy to understand and implement this way: a lot of things happen at node startup inside the node process but, by contrast, no other "helper" processes are spun up when ./bin/elasticsearch is invoked. So we have a minimum of things to reason about (such as error codes when the node is not fully started), and we don't contend with anything else running in parallel.

For packages, we "promise" a password and we don't output enroll token. As discussed, to me, the promised password is a standalone sub-feature. In general, we can not have at the same time:

  • a password valid before the node joins/forms a cluster
  • the same password be valid after the cluster formed
  • accept multi node (with possible different configurations for the promised password) cluster formation

Therefore, I like how you renamed the setting to autoconfiguration.password_hash; it is something different from "bootstrap" (so it's clearer they are different) and close to "install-time-generated elastic password hash" .
We need to highlight that the "promised" password sub-feature only works for initial single node clusters, like the ones that "Security ON by default" creates.


So I'm +1 on the latest flow as described by Tim, where the package security config runs off-process and the archive config process runs inside the node, taking care that they don't both kick off at the same time.

@jkakavas
Member Author

It still makes more sense to me to converge the two approaches but it is a preference revolving mostly around fewer code paths, less code and the fact that I think we can handle both archives and packages in a similar way so why not do it ? Given my lack of strong arguments and that I don't dislike the proposed alternative, I don't think it's worth re-hashing the discussion for another round. Unless @BigPandaToo has strong preference/arguments to block this, I'm +1 too. Let us move in this direction. I'll summarize everything in the first post of this issue so that the decisions made are clear

@BigPandaToo
Contributor

I am also in favor of a single process covering both cases for exactly the same reasons @jkakavas listed above, but I don't have strong contra arguments against the separated approaches, so +1 for the latter.

@tvernum
Contributor

tvernum commented Jul 27, 2021

You should take my suggestions lightly - I'm not involved in writing the code and it would be a bad outcome if you simply push ahead with my proposals despite technical obstacles.

If I understand correctly, the alternate proposal would be for archives to be (changes highlighted):

  • Install is just an unzip / tar -x, and no configuration happens.
  • Nothing happens until the first node starts.
  • (new) In the startup script, before the node starts, randomly generate a password and store its hash in the keystore
  • The "configure TLS" step happens on first use, also before the node starts
  • (copied from package process) When the node starts for the first time, it detects that there is no security index and there is a bootstrap.password_hash so it creates the security index and copies the hash to the reserved-user-elastic document.
  • The "setup initial security" process runs after the node starts, and can run outside of the node:
    1. (new) Poll the server using the generated password - waiting for the node to copy the hash into the user doc.
    2. Generate an enrollment token
    3. Inform the user (on the console) of the password and enrollment token.
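The polling step in this alternate proposal could be sketched as a generic bounded retry loop. The `check` callable is a placeholder for "authenticate with the generated password and see whether it succeeds"; the attempt count and delay are arbitrary, and `poll_until` is a hypothetical name.

```python
import time
from typing import Callable


def poll_until(
    check: Callable[[], bool],
    attempts: int = 10,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call check() up to `attempts` times, pausing between failed tries."""
    for attempt in range(attempts):
        if check():
            return True
        # Do not sleep after the final failed attempt.
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False
```

Passing `sleep` as a parameter keeps the loop testable without real delays; a caller that gets `False` back would report a clear error rather than print a password that may not yet work.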

The tradeoff is that this requires a bit more coordination of parts (some steps before the node startup, some steps inside the node itself, some steps after the node starts) but has more similarity to the package startup process.

I don't think I'm close enough to the code to know which way that tradeoff should fall. I'm happy with whichever choice you make.
