Validate role templates before saving role mapping #52636
Pinging @elastic/es-security (:Security/Authentication) |
The current change can prevent templates being created when "inline" or "stored" scripts feature is disabled. However, an issue can still arise when:
I am not sure what would be the best way to handle the above scenario. Some sort of startup check for state consistency? Do we already have anything in the code that does something similar?
One place to implement the startup check is when NativeRoleMappingStore is instantiated. We can ask it to load all mappings, then perform validation on each of the template role names. Not sure if it is worth the effort since it adds overhead to startup time. Also we have no remediation other than "fail to start" when inconsistency is detected which could feel too harsh for end-users. |
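To make the trade-off above concrete, here is a minimal sketch of such a startup check. It is written in Python rather than against the real NativeRoleMappingStore; `RoleMapping`, `check_mappings`, and the toy `balanced_braces` validator are all hypothetical names. It collects failures into a report rather than aborting, which is one way to soften the "fail to start" concern.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class RoleMapping:
    """Hypothetical stand-in for a stored role mapping."""
    name: str
    role_templates: List[str] = field(default_factory=list)


def check_mappings(
    mappings: Iterable[RoleMapping],
    validate: Callable[[str], None],
) -> Dict[str, List[str]]:
    """Validate every template of every mapping and collect the failures."""
    failures: Dict[str, List[str]] = {}
    for mapping in mappings:
        for template in mapping.role_templates:
            try:
                validate(template)
            except ValueError as e:
                failures.setdefault(mapping.name, []).append(f"{template!r}: {e}")
    return failures


def balanced_braces(template: str) -> None:
    """Toy validator (assumption): only checks that '{{' and '}}' counts match."""
    if template.count("{{") != template.count("}}"):
        raise ValueError("unbalanced mustache delimiters")


mappings = [
    RoleMapping("good", ["{{username}}_role"]),
    RoleMapping("bad", ["{{username_role"]),
]
report = check_mappings(mappings, balanced_braces)
```

The caller can then decide whether a non-empty `report` should be logged as a warning or treated as fatal; the sketch deliberately leaves that policy open.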
Disabling script features has a wider impact than just role templates. It also prevents things like search templates, script_fields, and script_score from working. Currently there is no startup consistency check for any of these, and they all exhibit the same behaviour (i.e. existing queries stop working if scripting is disabled). Adding a consistency check is also rather tricky, because it needs to be checked along with many other things. Using role templates as an example, when scripting is disabled:
Overall, I think it is a separate issue to be tackled somewhere else. It needs more discussion about what the "right" behaviour should be, and we need to identify all the places where that behaviour applies.
My opinion is that we should differentiate between "the template is syntactically wrong" and "features required to evaluate the template are currently disabled". We should never allow storing a template when the former is true (as this template will always be unusable), but ideally we could validate and store the template even if the features are currently disabled (maybe a warning header would be helpful?). Yes, the latter would still cause authentication failures, but this can be "fixed". In the absence of a way to evaluate the template while the features are disabled for the current node, I think it is fine that we do not allow the role mapping to be created. I don't think we should try to detect what happens if a user disables the scripting features and act accordingly. We can't know for sure what the user's intention would be, and we try not to be too clever on behalf of the user. (No reason not to have this discussion in a separate thread, though.) The error message in the authentication failure is a good hint that disabling the scripting feature is what caused the authentication failure. We should probably add an
Can you please add a test in TemplateRoleNameTests?
Thanks @jkakavas
Test added.
This is a good point and I agree with it. Also, as you already noticed, the syntax check depends on the feature being available, and this blurs the distinction between the two things. I do have the impression that we generally lean towards letting users configure something even if the feature is not available and deferring the errors to runtime. By this standard, the changes in this PR feel too harsh, i.e. if inline script is not enabled, the role mapping creation fails. I wonder if we could further refine how the checks are defined. So instead of two, we would have three different checks:
Only item 1 requires the scripting feature to be enabled. But maybe it is good enough for us to check only items 2 - 4, which in fact can fix the problems raised in the original issue. Also, it might be the right thing not to check item 1: if a user disables the scripting feature, we should not compile a script at all, regardless of whether the purpose is execution or validation. Otherwise it becomes an awkward situation of "yes it is disabled but it still runs on demand sometimes".
Thanks for spending time during team meeting to discuss this PR. Here is a summary based on my understanding:
Item 2 is the technically oriented one. I added a bunch of tests to make sure validation does the right thing. Specifically, a test is added to prove that an empty context will not lead to validation failures. Based on my understanding of mustache, it is safe to execute a template with an empty context. The default behaviour of most implementations is to just return an empty string. In the case of sections, the whole section is just ignored. This could mean less validation, but the important thing is that it will not fail. The validation still correctly fails when there is a genuine syntax error. Please let me know whether you think the added tests are sufficient to give us confidence that "template execution with empty context" is an acceptable implementation of validation. If not, I can fall back to taking pieces out of the execution path to perform a less comprehensive but more lenient validation. Thanks.
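The "execute with an empty context" idea can be illustrated with a toy renderer. This is only a sketch of the semantics described above, written in Python for a tiny mustache subset (`{{var}}` and `{{#section}}...{{/section}}`). It is not the mustache engine Elasticsearch uses, and `render` is a hypothetical helper: missing variables render as an empty string, sections with a missing key are skipped, and genuine syntax errors still raise.

```python
import re

# Matches {{name}}, {{#name}} and {{/name}} (a small subset of mustache).
TAG = re.compile(r"\{\{([#/]?)\s*([\w.]+)\s*\}\}")


def render(template: str, context: dict) -> str:
    """Render the subset; raise ValueError on a genuine syntax error."""
    out = []    # rendered fragments
    stack = []  # open sections as (name, is_active) pairs
    pos = 0

    def active() -> bool:
        return all(is_active for _, is_active in stack)

    def literal(text: str) -> None:
        # Any delimiter left over in literal text is a malformed tag.
        if "{{" in text or "}}" in text:
            raise ValueError("malformed tag")
        if active():
            out.append(text)

    for match in TAG.finditer(template):
        literal(template[pos:match.start()])
        kind, name = match.group(1), match.group(2)
        if kind == "#":
            # A missing or falsy key makes the whole section inactive.
            stack.append((name, bool(context.get(name))))
        elif kind == "/":
            if not stack or stack[-1][0] != name:
                raise ValueError(f"unexpected close of section {name!r}")
            stack.pop()
        elif active():
            # A missing variable renders as an empty string.
            out.append(str(context.get(name, "")))
        pos = match.end()

    literal(template[pos:])
    if stack:
        raise ValueError(f"unclosed section {stack[-1][0]!r}")
    return "".join(out)
```

With an empty context, `render("{{username}}_user", {})` and `render("role{{#admin}}_admin{{/admin}}", {})` both succeed, while `render("{{username", {})` raises. The sketch does not cover list iteration or lambdas such as `tojson`.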
Added a suggestion for the docs but other than that LGTM
Co-Authored-By: Ioannis Kakavas <[email protected]>
@lcawl Could you please review the doc changes? Thank you. |
}
------------------------------------------------------------
// TEST[continued]
Can we think carefully about whether we really want to document & support this?
What's the use case? Why do we want to elevate this to "officially supported"?
Hmm, good question. One issue with using a stored script is consistency, i.e. the script can change without the role mapping being aware of it. This could potentially create confusion and support-time frustration. Is this the reason why we may not want to "officially" support it? (I assume adding it to the official docs means official support.)
> What's the use case? Why do we want to elevate this to "officially supported"?
Can I flip the question? Why do we "unofficially" support this already? Why not disallow it instead?
I understand there are cases where we intentionally leave things as a grey area, i.e. "unofficially supported". But I feel this is not the case here because:
- Stored script support is not a unique feature of this API. It is available and documented elsewhere. So in some sense, it is already officially supported. There could be issues with stored scripts in general, which should be a separate discussion.
- If we decide to disable the support here, it will be a breaking change even if it is "unofficially" supported. So keeping it "unofficial" does not really buy us anything from this perspective.
- General gain of clarity when something is clearly documented
So overall, I'd prefer to have it documented. Thanks
I'm not convinced.
What reason do we have for people to do this? Why should we spend time and effort maintaining the docs for this, when it has no clear value to anyone?
Is there some context that I am not aware of, e.g. any past discussions about not having this documented in the first place? I admit that I don't have a strong reason other than "completeness" to add this in. But are there strong reasons to not add it? I can think of a few candidates:
1. We don't want people to use stored scripts for authentication-related stuff, since a missing script is more devastating
2. We don't encourage stored script usage in general
3. We may want to disable stored scripts for role templates in future, and by not documenting it we could avoid a breaking change
4. No user requires it
5. We don't spend the extra time and effort because of 4
I'd be happy to go with items 1, 2, 3, but am not particularly convinced by 4 and 5. I cannot say user would see value in this, but cannot say they would not either, unless there is past evidence of this having no value. Also, the time and effort for these additional docs are not really significant.
I am not really attached to this snippet of docs. I am happy to either remove or keep it. I would just like to clarify my thought process. I'd appreciate it if you could elaborate a bit.
Less is more.
There's a temptation toward more docs (and lots of places where our docs are too brief) but it isn't always better.
One of our recurring experiences is that people try to configure ES security based on blogs because they seem simpler. The official docs have "more steps", so they follow a blog and then get stuck because those steps were actually necessary and the blog skipped over them (or made assumptions that allowed them to ignore the steps) to keep it short.
Documenting something tells people "this might be helpful to you" and also "you should read this if you want to understand how to use this feature". But I don't see any argument for why either of those is true here. So what's the reason to do it?
Or, put another way, if a customer raises a support ticket saying "I'm trying to use a role mapping with a stored script as the template" my immediate reaction would be "Why on earth would you do that? That's a terrible idea!"
And their (entirely reasonable) answer will be because you documented it.
> I cannot say user would see value in this
Then don't do it.
It's easy - if we cannot work out a reason why users would want to do it, then don't recommend it to them. Don't waste time writing and maintaining the docs. Don't waste their time by asking them to read those docs.
> if a customer raises a support ticket saying "I'm trying to use a role mapping with a stored script as the template" my immediate reaction would be "Why on earth would you do that? That's a terrible idea!"
This argument resonates with me. I do agree it's not a good idea to make a role template depend on external stuff. Since we cannot or don't want to recommend this usage ourselves, it makes sense to leave it out. Will update accordingly. Thanks
For what it’s worth, the role mapping UI in Kibana attempts to support the entire API surface. In other words, any valid role mapping created via the API should render and be editable in the UI. We of course default to inline scripts for these templates in the interface, but since stored scripts are technically available, we have to know how to render those as well, so there are UI controls to edit them.
I personally agree that these stored templates don’t make a lot of sense. Even if they aren’t documented, they’re more discoverable than they used to be because we now have a dedicated UI. For some folks, the UI will become the documentation, so like it or not, I fear we end up (currently) with the appearance that this is a first-class feature of role mappings.
We can consider changes to the UI which only surface these controls when editing an existing mapping which already contains a stored script, but not without additional complexity.
Thanks for the insights @legrego
You are right that the UI could be considered documentation, or even the source of truth, for some users. The UI also has a link to the docs ... it is possible that users would notice the difference and ask "why is there no documentation for this feature".
For now, we are trying to not actively promote it. I wonder whether we could afford to disable it in 8.0. But I suspect that the solution could end up being adding more machinery to make this feature more robust, e.g. reference counting a stored script.
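As a rough illustration of what "reference counting a stored script" could mean, here is a minimal Python sketch. `StoredScriptRegistry` and its methods are hypothetical; nothing like this is confirmed to exist in Elasticsearch. The idea is that a role mapping acquires a reference when it is saved, and a script cannot be deleted while any reference remains.

```python
from collections import Counter


class StoredScriptRegistry:
    """Hypothetical registry that refuses to delete a script still in use."""

    def __init__(self) -> None:
        self._scripts = {}      # script id -> source
        self._refs = Counter()  # script id -> number of referencing mappings

    def put(self, script_id: str, source: str) -> None:
        self._scripts[script_id] = source

    def acquire(self, script_id: str) -> None:
        """Record that a role mapping now depends on this script."""
        if script_id not in self._scripts:
            raise KeyError(script_id)
        self._refs[script_id] += 1

    def release(self, script_id: str) -> None:
        """Drop one reference, e.g. when the role mapping is deleted."""
        if self._refs[script_id] <= 0:
            raise ValueError(f"script {script_id!r} is not referenced")
        self._refs[script_id] -= 1

    def delete(self, script_id: str) -> None:
        if self._refs[script_id] > 0:
            raise RuntimeError(f"script {script_id!r} is still referenced")
        self._scripts.pop(script_id, None)

    def exists(self, script_id: str) -> bool:
        return script_id in self._scripts
```

This only addresses deletion. A script whose source changes underneath a role mapping, the consistency concern raised earlier in this thread, would need something more, such as versioning.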
==== Role Templates

NOTE: Role templates require the relevant scripting feature to be enabled,
I think this NOTE should move below the introductory paragraph, since at this point we haven't defined role templates yet. I actually also think this whole "Role templates" subsection should be moved into the "Description" section. If you're willing to let me add that change to this PR, let me know and I'll push a commit.
Thanks Lisa. Please make changes as you see fit. Thanks a lot.
Done, thanks!
Co-Authored-By: Lisa Cawley <[email protected]>
@lcawl The change of moving the role template section around broke a reference in the Kibana docs. I noticed that you added a new anchor to the "role template" section, and this should be the new reference used by other docs. How should we proceed from here? It's sort of a deadlock situation, since changes to both repos need to happen at the same time. What is our common practice to solve this problem? Thanks.
LGTM
Role names are now compiled from role templates before the role mapping is saved. This serves as validation for role templates, preventing malformed and invalid scripts from being persisted, which could later break authentication. Resolves: elastic#48773
Backported:
|
Role names are now compiled from role templates before the role mapping is saved.
This serves as validation for role templates, preventing malformed and invalid scripts
from being persisted, which could later break authentication.
Resolves: #48773
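
The behaviour described in the commit message can be sketched as follows. This is a simplified Python illustration, not the actual Elasticsearch change: `compile_role_names` is a toy stand-in for template compilation and `put_role_mapping` is a hypothetical save path. The point is the ordering: every template is compiled first, and nothing is persisted if any of them fails.

```python
def compile_role_names(template: str) -> list:
    """Toy compiler (assumption): reject empty or unbalanced templates."""
    if not template.strip():
        raise ValueError("empty role template")
    if template.count("{{") != template.count("}}"):
        raise ValueError("unbalanced mustache delimiters")
    return [template]


def put_role_mapping(store: dict, name: str, role_templates: list) -> None:
    """Compile every template first; persist only when all of them compile."""
    for template in role_templates:
        compile_role_names(template)  # raises before anything is written
    store[name] = list(role_templates)


store = {}
put_role_mapping(store, "ok", ["{{username}}_user"])
try:
    put_role_mapping(store, "broken", ["{{username"])
except ValueError as e:
    rejected = str(e)
```

After this runs, `store` contains only the `ok` mapping, so a malformed template never reaches the point where it could break authentication later.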