Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate role templates before saving role mapping #52636

Merged
merged 21 commits into from
Mar 24, 2020

Conversation

ywangd
Copy link
Member

@ywangd ywangd commented Feb 21, 2020

Role names are now compiled from role templates before role mapping is saved.
This serves as validation for role templates to prevent malformed and invalid scripts
to be persisted, which could later break authentication.

Resolves: #48773

@ywangd ywangd added >enhancement :Security/Authentication Logging in, Usernames/passwords, Realms (Native/LDAP/AD/SAML/PKI/etc) v8.0.0 v7.7.0 labels Feb 21, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-security (:Security/Authentication)

@ywangd
Copy link
Member Author

ywangd commented Feb 21, 2020

The current change can prevent templates being created when "inline" or "stored" scripts feature is disabled. However, an issue can still arise when:

  1. A template is created when the features are enabled
  2. The features are then disabled (this requires changes to the yml file and restart nodes AFIK)
  3. Authentication breaks if it relies on the template created in step 1.

I am not sure what would be the best way to handle the above scenario. Some sorta of startup check for state consistency? Do we already have somewhere in the code that does similar things?

@ywangd ywangd added the WIP label Feb 21, 2020
@ywangd
Copy link
Member Author

ywangd commented Feb 21, 2020

One place to implement the startup check is when NativeRoleMappingStore is instantiated. We can ask it to load all mappings, then perform validation on each of the template role names. Not sure if it is worth the effort since it adds overhead to startup time. Also we have no remediation other than "fail to start" when inconsistency is detected which could feel too harsh for end-users.

@ywangd ywangd removed the WIP label Feb 21, 2020
@ywangd
Copy link
Member Author

ywangd commented Feb 22, 2020

Disabling script features has a wider impact than just role templates. It prevents things like search template, script_fields, script_score from working as well. Currently there is no startup consistency check for any of these places and they all exhibit the same behaviour (i.e. existing queries will stop working if scripting is disabled). It is rather tricky to add consistency check as well because it needs to be checked along with many other things. Use role templates as an example, when scripting is disabled:

  1. If security is also disabled, it could be considered as consistent since no role mapping will be applied.
  2. If certain rule will not ever be true, e.g. the realm is removed or disabled, it could also be considered as consistent since a particular role mapping will not be applied.
  3. Users are disabled in the realm for which the role mapping is defined. Hence from user's perspective, it is consistent. But elasticsearch will not be aware of it since it is out-of-band knowledge.
  4. There could be many other more out-of-band situation where it is difficult to make the right decision for users.

Overall, I think it is a separate issue to be tackled somewhere else. It needs more discussion about what should be the "right" behaviour and identify all places where the "right" behaviour is applicable.

@jkakavas
Copy link
Member

jkakavas commented Feb 25, 2020

My opinion is that we should differentiate between "the template is syntactically wrong" and "features required to evaluate the template are currently disabled" . We should never allow storing a template when the former is true ( as this template will always be unusable ) but optimally we could validate and store the template even if the features are currently disabled ( Maybe a warning header would be helpful?) Yes, the latter would still cause authentication failures but this can be "fixed".

In absence of a way to evaluate the template while the features are disabled for the current node, I think it is fine that we do not allow the role mapping to be created.

I don't think we should try and detect what happens if a user disables the scripting features and act accordingly. We can't know for sure what the intention of the user would be and we try to not be too clever on behalf of the user. ( No reason not to have this discussion though in a separate thread ) The error message in the authentication failure is a good hint that disabling the scripting feature is what caused the authentication failure. We should probably add an IMPORTANT section in the docs in https://www.elastic.co/guide/en/elasticsearch/reference/current/security-api-put-role-mapping.html#_role_templates explaining that inline and stored are necessary for role templates to work.

Copy link
Member

@jkakavas jkakavas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add a test in TemplateRoleNameTests ?

@ywangd
Copy link
Member Author

ywangd commented Feb 26, 2020

Thanks @jkakavas

Can you please add a test in TemplateRoleNameTests ?

Test added.

My opinion is that we should differentiate between "the template is syntactically wrong" and "features required to evaluate the template are currently disabled"

This is a good point and I agree with it. Also as you already noticed, syntax check depends on the feature to be available and this blurs the distinction betwen the two things. I do have the impression that we general lean towards letting users configures something even if the feature is not available and defer the errors to runtime. With this standard, changes in this PR feel too harsh, i.e. if inline script is not enabled, the role mapping creation fails.

I wonder if we could have further refinement in how checks are defined. So instead of two, we would have three different checks:

  1. Whether template syntax is wrong. This checks the value of the "source" field or content of a stored script.
  2. Whether the feature (inline or stored script) is enabled
  3. Whether structure of the role template is wrong, e.g. it's not an object, or "source" field is not provided.
  4. Whether the role template's "id" field points to an existing script.

Only item 1 requires the scripting feature to be enabled. But maybe it is good enough for us to check only item 2 - 4, which in fact can fix the problems raised in the original issue. Also it might be the right thing to not check item 1, if user disable scripting feature, we should not compile a script at all regardless whether the purpose is for execution or validation. Otherwise it becomes an awkward situation of "yes it is disabled but it still runs on demand sometimes".

@ywangd ywangd requested a review from jkakavas February 26, 2020 05:45
@ywangd
Copy link
Member Author

ywangd commented Feb 28, 2020

Thanks for spending time during team meeting to discuss this PR. Here is a summary based on my understanding:

  1. It is OK to fail role mapping creation if role template is in use when scripting feature is disabled.
    • This makes the logic easier to understand because it is strict instead of trying to be lenient. We can afford the strictness here because the configuration is not saved in cluster state (as opposed to "ingest pipeline definition"). I'd quote from Jason, "leniency is abhorrent and should be eschewed", but add "unless its already in cluster state" :)
    • We will document this behaviour explicitly in docs.
  2. We don't want to validation to fail for valid templates.
    • This is a concern because the current implementation basically execute the script with empty context for validation. We have to be sure that empty context will not erroneously lead to execution failure. We'd rather have false postives (less validation) instead of false negatives (too aggressive and errorneous validation) if we have to choose between the two.

Item 2 is the technical oriented one. I added a bunch of tests to make sure validation does the right thing. Specifically, a test is added to prove that empty context will not lead to validation failures. Based on my understanding of mustache, it is safe to execute a template with empty context. The default behaviour of most implementations is to just return empty string. In the case of sections, the whole section is just ignored. This could mean less validation, but the important thing is that it will not fail.

The validation still correctly fails when there is a genuine syntax error.

Please let me know whether you think the added tests are sufficient to give us confidence that "template execution with empty context" is an acceptable implementation of validation. If not, I can fall back to take pieces out of the execution path to perform a less comprehesive but more lenient validation. Thanks.

Copy link
Member

@jkakavas jkakavas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a suggestion for the docs but other than that LGTM

@ywangd ywangd requested a review from lcawl March 2, 2020 00:31
@ywangd
Copy link
Member Author

ywangd commented Mar 2, 2020

@lcawl Could you please review the doc changes? Thank you.

}
------------------------------------------------------------
// TEST[continued]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we think carefully about whether we really want to document & support this?
What's the use case? Why do we want to elevate this to "officially supported"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm good question. One issue of using stored script is consistency, i.e. the script can change without the role-mapping being aware of it. This could potentially creates confusion and support time frustration. Is this the reason why we may not want to "officially" support it? (I assume adding it to official docs means offical support).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the use case? Why do we want to elevate this to "officially supported"?

Can I flip the question ? Why do we "unofficially' support this already ? Why not disallow it instead ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I undersetand there are cases where we intentionally left them as a grey area, i.e. "unofficially supported". But I feel this is not the case here because:

  1. Stored script support is not an unique feature for this API. It is available and documented elsewhere. So in some sense, it is already officially supported. There could be issues with stored scripts in general, which should be a separate discussion.
  2. If we decide to disable the support here, it will be a breaking change even it is "unofficially" supported. So keeping it "unofficial" does not really buy us anything from this perspective.
  3. General gain of clarity when something is clearly documented

So overall, I'd prefer to have it documented. Thanks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced.
What reason do we have for people to do this? Why should we spend time and effort maintaining the docs for this, when it has no clear value to anyone?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there some context that I am not aware of, e.g. any past discussions about not having this documented in the first place? I admit that I don't have a strong reason other than "completeness" to add this in. But are there strong reasons to not add it? I can think of a few candidates:

  1. We don't want people to use stored script for authentication related stuff since a missing script is more devastating
  2. We don't encourage stored script usage in general
  3. We may want disable stored script for role template in future and by not documenting it we could avoid a breaking change
  4. No user requires it
  5. We don't spend the extra time and effort because of 4

I'd be happy to go with item 1, 2, 3, but not particularly convinced by 4 and 5. I cannot say user would see value in this, but cannot say they would not either, unless there are past evidents of this being no value. Also the time and effort for this additional docs are not really significant.

I am not really attached to this snippet of docs. I am happy to either remove or keep it. Just like to clear my thought process. I'd appreicate if you could elaborate it a bit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Less is more.
There's a temptation toward more docs (and lots of places where our docs are too brief) but it isn't always better.

One of our recurring experiences is that people try and configure ES security based on blogs because they seem simpler. The official docs have "more steps" so they follow a blog and then get stuck because those steps were actually necessarily and the blog skipped over them (or made assumptions that allowed them to ignore the steps) to keep it short.

Documenting something, tells people "this might be helpful to you" and also "you should read this if you want to understand how to use this feature". But I don't see any argument for why either of those are true here. So what's the reason to do it?

Or, put another way, if a customer raises a support ticket saying "I'm trying to use a role mapping with a stored script as the template" my immediate reaction would be "Why on earth would you do that? That's a terrible idea!"
And their (entirely reasonable) answer will be because you documented it.

I cannot say user would see value in this

Then don't do it.
It's easy - if we cannot work out a reason why users would want to do it, then don't recommend it to them. Don't waste time writing and maintaining the docs. Don't waste their time by asking them to read those docs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if a customer raises a support ticket saying "I'm trying to use a role mapping with a stored script as the template" my immediate reaction would be "Why on earth would you do that? That's a terrible idea!"

I can resonate with this argument. I do agree it's not a good idea to make a role template depend on external stuff. Since we cannot or don't want to recommend this usage ourselves, it makes sense to leave it out. Will update accordingly. Thanks

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what it’s worth, the role mapping UI in Kibana attempts to support the entire API surface. In other words, any valid role mapping created via the API should render and be editable in the UI. We of course default to inline scripts for these templates in the interface, but since stored scripts are technically available, we have to know how to render those as well, so there are UI controls to edit them.

I personally agree that these stored templates don’t make a lot of sense. Even if they aren’t documented, they’re more discoverable than they used to be because we now have a dedicated UI. For some folks, the UI will become the documentation, so like it or not, I fear we end up (currently) with the appearance that this is a first-class feature of role mappings.

We can consider changes to the UI which only surface these controls when editing an exiting mapping which already contains a stored script, but not without additional complexity

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the insights @legrego
You are right that the UI could be considered as documentation or even source of truth for some users. The UI also has a link to the doco ... it is possible that users would notice the difference and ask "why is there no document for this feature".

For now, we are trying to not actively promote it. I wonder whether we could afford to disable it in 8.0. But I suspect that the solution could end up being adding more machinery to make this feature more robust, e.g. reference counting a stored script.


==== Role Templates

NOTE: Role templates require the relevant scripting feature to be enabled,
Copy link
Contributor

@lcawl lcawl Mar 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this NOTE should move below the introductory paragraph, since at this point we haven't defined role templates yet. I actually also think this whole "Role templates" subsection should be moved into the "Description" section. If you're willing to let me add that change to this PR, let me know and I'll push a commit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Lisa. Please make changes as you see fit. Thanks a lot.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

@ywangd
Copy link
Member Author

ywangd commented Mar 3, 2020

@lcawl The change of moving role template section around broke a reference in kibana doc. I noticed that you added a new anchor to "role template" section and this should be the new reference used by other docs. How should we proceed from here? It's sorta of a deadlock situation since changes to both repo need to happen at the same time. What is our common practice to solve this problem? Thanks.

@ywangd ywangd requested a review from tvernum March 23, 2020 23:02
Copy link
Contributor

@tvernum tvernum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ywangd ywangd merged commit d046489 into elastic:master Mar 24, 2020
ywangd added a commit to ywangd/elasticsearch that referenced this pull request Mar 24, 2020
Role names are now compiled from role templates before role mapping is saved.
This serves as validation for role templates to prevent malformed and invalid scripts
to be persisted, which could later break authentication.

Resolves: elastic#48773
ywangd added a commit that referenced this pull request Mar 24, 2020
Role names are now compiled from role templates before role mapping is saved.
This serves as validation for role templates to prevent malformed and invalid scripts
to be persisted, which could later break authentication.

Resolves: #48773
@ywangd
Copy link
Member Author

ywangd commented Mar 24, 2020

Backported:

  • v7.7.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :Security/Authentication Logging in, Usernames/passwords, Realms (Native/LDAP/AD/SAML/PKI/etc) v7.7.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Role mappings do not warn about invalid role_template entries
7 participants