[2.x Backport] Optimized Privilege Evaluation #4898
base: 2.x
Signed-off-by: Nils Bandener <[email protected]>
…aluation Signed-off-by: Nils Bandener <[email protected]>
…d_privileges.include_indices See discussion in opensearch-project#4380 (comment) Signed-off-by: Nils Bandener <[email protected]>
 * Defines the first OpenSearch version which does not need the legacy headers
 * TODO this needs to be adapted
 */
static final Version LEGACY_HEADERS_UNNECESSARY_AS_OF = Version.V_2_19_0;
Important: Before this PR is released, it must be ensured that the attribute LEGACY_HEADERS_UNNECESSARY_AS_OF refers to the OpenSearch version this functionality is released in. Otherwise, DLS/FLS won't work properly in mixed clusters with older versions.
This then also needs to be forward-ported to main.
Thank you for the significant contribution - the improvements look great 🙌 Would you be open to incorporating a feature flag here as well? In a previous major improvement, we found that feature flagging would have allowed us to roll it out more gradually.
From a technical point of view, of course it will be possible. I just have a few detail questions:
Still, I guess we have to ask ourselves whether it is feasible and economical. Unfortunately, adding a feature flag will require quite a bit of effort and will significantly increase maintenance effort due to a lot of code duplication. Off the top of my head, we need to do this:
Would releasing a beta version maybe be an alternative solution? Especially if the feature flag is disabled by default, there won't be much of a difference from a beta version. In both cases, users would need to actively decide that they want to use the feature and that they are willing to accept any potential risks.
Thanks @nibix .
Ideally, a runtime flag would be preferable, but a restart-required flag could also work as a simple mitigation if issues arise.
Given this is a performance improvement rather than a new feature where we're gathering user feedback, I'd recommend shipping it enabled by default. That way, users can easily disable it if any issues come up, without needing to opt-in.
Since there are no compatibility concerns, we could maintain the feature flag for 2-4 minor versions and then remove it. I'm open to other perspectives on the right timeline as well.
Thanks. These are very valid points and tradeoff considerations. I understand the additional effort required to implement and maintain a feature flag, along with the maintenance burden due to increased code duplication. However, given the core nature of the privilege evaluation logic being changed, having that extra safeguard in the short-to-medium term could be quite valuable. Perhaps a middle ground would be to include the feature flag for 2-3 release cycles, with the default set to "on" starting in 2.19. That way, we get the benefits of the flag without an overly long-lived maintenance burden. What are your thoughts on that approach?
One complication of a feature flag would be the handling of DLS/FLS backwards compatibility in mixed clusters. This is a bit tricky. There are several conceivable ways to tackle this, each with different issues. So, I'd like to hear your feedback @cwperks @DarshitChanpura @krishna-ggk

We need to consider backwards compatibility on mixed clusters because nodes on OpenSearch versions without this change rely on the presence of thread context headers to determine whether DLS/FLS restrictions need to be applied or not: security/src/main/java/org/opensearch/security/configuration/DlsFlsValveImpl.java Lines 444 to 478 in 5834190
The new implementation does not need these headers any more. But it needs to know when it talks to nodes with the old implementation. In that case, the headers need to be sent. If they were not sent, the search results from those nodes would contain data the respective user is not authorized to see. At the moment, the new implementation uses the OpenSearch version supplied in the connection to determine whether a node needs the headers or not: security/src/main/java/org/opensearch/security/privileges/dlsfls/DlsFlsLegacyHeaders.java Lines 107 to 135 in c22002a
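To make the version-based decision concrete, here is a minimal, self-contained sketch. It is not the code from DlsFlsLegacyHeaders.java: `NodeVersion`, `needsLegacyHeaders`, `headersFor` and the header name are hypothetical stand-ins, and only the constant name `LEGACY_HEADERS_UNNECESSARY_AS_OF` is taken from the review comment above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class LegacyHeaderSketch {

    /** Simplified stand-in for a node version; NOT the OpenSearch Version class. */
    static final class NodeVersion implements Comparable<NodeVersion> {
        final int major;
        final int minor;
        final int patch;

        NodeVersion(int major, int minor, int patch) {
            this.major = major;
            this.minor = minor;
            this.patch = patch;
        }

        @Override
        public int compareTo(NodeVersion other) {
            if (major != other.major) return Integer.compare(major, other.major);
            if (minor != other.minor) return Integer.compare(minor, other.minor);
            return Integer.compare(patch, other.patch);
        }

        boolean onOrAfter(NodeVersion other) {
            return compareTo(other) >= 0;
        }
    }

    // Same name as the constant discussed in the review; the value must match
    // the version the feature is actually released in.
    static final NodeVersion LEGACY_HEADERS_UNNECESSARY_AS_OF = new NodeVersion(2, 19, 0);

    /** A remote node older than the cutoff still relies on the legacy headers. */
    static boolean needsLegacyHeaders(NodeVersion remoteVersion) {
        return !remoteVersion.onOrAfter(LEGACY_HEADERS_UNNECESSARY_AS_OF);
    }

    /** Returns the headers to attach to a request sent to the given remote node. */
    static Map<String, String> headersFor(NodeVersion remoteVersion, Map<String, String> legacyHeaders) {
        return needsLegacyHeaders(remoteVersion) ? new HashMap<>(legacyHeaders) : Collections.emptyMap();
    }

    public static void main(String[] args) {
        Map<String, String> legacy = new HashMap<>();
        legacy.put("example-legacy-dls-header", "<serialized restrictions>"); // hypothetical header name
        System.out.println(headersFor(new NodeVersion(2, 18, 0), legacy).size()); // prints 1
        System.out.println(headersFor(new NodeVersion(2, 19, 0), legacy).size()); // prints 0
    }
}
```

The sketch also shows why the constant matters: if it pointed to a version earlier than the real release, nodes that still run the old implementation would be treated as new and would receive no headers.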
If we introduce a feature flag, we can no longer use that version-based condition.

Option 1: Feature flag in opensearch.yml

This is a feature flag which can only be changed with a rolling restart. Thus, there might be mixed cluster conditions with some nodes having the feature flag on and some nodes having the feature flag off. If we use such a feature flag, we'd need to somehow know the value of the feature flag on the remote side of the connection. At the moment, I am not aware of such a mechanism. Is there one which I am not aware of? @cwperks @DarshitChanpura @krishna-ggk I could imagine a kind of "capabilities" mechanism stored in the cluster state which tells me the capabilities of each node and also whether all the nodes in the cluster have that capability. But that would be a completely new feature needing implementation.

Option 2: Feature flag in config.yml

This is a feature flag which can be changed during runtime. Changes to config.yml are broadcast to the nodes - thus, the implementation can be switched semi-instantly. The issue here, however, is still the "semi". The config update process is still an async one. There will be a short time window where we also have a mixed cluster with the issues described for option 1.
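The "capabilities" idea from Option 1 does not exist yet, so the following is purely an illustration of what such a registry might look like. The class, the method names and the capability string are all hypothetical, and treating an unknown node as lacking the capability is an assumption made here to stay on the safe side.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical per-node capability registry, as it might be kept in the cluster state. */
final class NodeCapabilitiesSketch {

    // Hypothetical capability name for "this node runs the new DLS/FLS implementation".
    static final String OPTIMIZED_DLS_FLS = "optimized_dls_fls";

    private final Map<String, Set<String>> capabilitiesByNode = new HashMap<>();

    /** Records the capabilities a node announced, for example when joining the cluster. */
    void publish(String nodeId, Set<String> capabilities) {
        capabilitiesByNode.put(nodeId, Set.copyOf(capabilities));
    }

    /** Unknown nodes are treated as having no capabilities (conservative assumption). */
    boolean nodeHas(String nodeId, String capability) {
        return capabilitiesByNode.getOrDefault(nodeId, Set.of()).contains(capability);
    }

    /** True only if at least one node is known and every known node has the capability. */
    boolean allNodesHave(String capability) {
        return !capabilitiesByNode.isEmpty()
            && capabilitiesByNode.values().stream().allMatch(caps -> caps.contains(capability));
    }

    /** Legacy headers are needed whenever the remote node is not known to run the new code. */
    boolean needsLegacyHeaders(String remoteNodeId) {
        return !nodeHas(remoteNodeId, OPTIMIZED_DLS_FLS);
    }

    public static void main(String[] args) {
        NodeCapabilitiesSketch registry = new NodeCapabilitiesSketch();
        registry.publish("node-a", Set.of(OPTIMIZED_DLS_FLS));
        registry.publish("node-b", Set.of()); // flag off, or old version
        System.out.println(registry.needsLegacyHeaders("node-b")); // prints true
        System.out.println(registry.allNodesHave(OPTIMIZED_DLS_FLS)); // prints false
    }
}
```

With such a registry, the decision would depend on the announced state of the remote node rather than on its version, which is what a restart-based flag would require. The short inconsistency window described for Option 2 would remain unless the registry itself is updated synchronously.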
Agreed @nibix, backward compatibility is a valid point, and in fact the issue can remain if there is a cross-cluster setup with mixed versions. As one thought, could we continue to populate the headers regardless for subsequent versions until we remove the old code, at which point we can add back the LEGACY_HEADERS_UNNECESSARY_AS_OF for that version? I understand this may remove some performance benefit, but I am wondering if it is significant enough.
Yes, that should be possible. You are right that this means that users with DLS, FLS or field masking active won't have significant performance improvements in that phase.
Well - if the goal is managing risks caused by code changes, then I am not sure whether the changes from #4706 would come with a lower risk than this PR. IMHO, FLS and field masking have less than optimal test coverage. This makes any change in this regard risky. Maybe another strategy for managing the risk introduced by this PR is simply to invest in new tests to increase test coverage, especially for FLS and field masking?

Concretely about the strategy from #4706: I am worried whether the index information obtained by the request index resolution can be sufficiently trusted to be usable for building the DLS/FLS/FM maps. There are already examples where this information cannot be used:
Potentially, there are more similar cases we do not know about, either in OpenSearch core or in plugins. The new DLS/FLS/FM implementation does not suffer from this problem, as it moves the access controls down to the actual places where the access happens - at that point, it is always clear which index is accessed. I still think the two most promising ways to tackle unknown risks are:
Both, however, require a bit of work.
Increased test coverage with more scenarios like Scroll, PIT and CCS with FLS/DLS/FM would be a good thing to have. The PR I raised did add a scenario for scroll, but not for PIT or CCS.
Thanks folks. I'm still responding only to some of the aspects mentioned, as I'm currently traveling. (I'd love to collaborate on this more deeply with you all, but I'm stuck this week due to travel.)
Agreed on the overall direction of addressing the risk. However, I still think a feature flag to turn the change off is a very helpful safeguard, since the area we are improving is a core authz capability that has security implications for users in case of any regression. I'm wondering if we can perhaps take the approach of validating correctness over a couple of minor versions first, in addition to increased coverage, and then target the performance gain for a subsequent version where we clean up the old code? I'm open to other approaches too, but am falling back to this as a potential approach to reduce risk.
From my point of view, that should be possible. If users want to have immediate performance gains, one could even add a second flag which turns off the headers. However, users would need to take care to change that flag only when the cluster is in the right state.
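A rough sketch of how the two flags discussed here might interact is shown below. Both flag names are hypothetical, and the rule that headers are only dropped when the new implementation is active and the second flag is explicitly set is one possible reading of the proposal, not an agreed design.

```java
/** Hypothetical settings holder for the transitional phase discussed above. */
final class TransitionFlagsSketch {

    // Hypothetical flag: use the optimized privilege evaluation (proposed default: on).
    final boolean optimizedEvaluationEnabled;

    // Hypothetical second flag: opt out of sending the legacy DLS/FLS headers.
    final boolean legacyHeadersDisabled;

    TransitionFlagsSketch(boolean optimizedEvaluationEnabled, boolean legacyHeadersDisabled) {
        this.optimizedEvaluationEnabled = optimizedEvaluationEnabled;
        this.legacyHeadersDisabled = legacyHeadersDisabled;
    }

    /**
     * Headers are sent by default. They are dropped only when the new implementation
     * is active AND the operator explicitly opted out, which should only happen
     * once no node in the cluster depends on the headers any more.
     */
    boolean sendLegacyHeaders() {
        return !(optimizedEvaluationEnabled && legacyHeadersDisabled);
    }

    public static void main(String[] args) {
        System.out.println(new TransitionFlagsSketch(true, false).sendLegacyHeaders()); // prints true
        System.out.println(new TransitionFlagsSketch(true, true).sendLegacyHeaders());  // prints false
        System.out.println(new TransitionFlagsSketch(false, true).sendLegacyHeaders()); // prints true
    }
}
```

Keeping "send headers" as the default means a misconfigured cluster loses performance rather than exposing data, which matches the safeguard motivation in this thread.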
Description
This implements the optimized privilege evaluation as described in #3870 and backports the changes from #4380 to the 2.x branch.
Important: Before this can be released, the OpenSearch version in the LEGACY_HEADERS_UNNECESSARY_AS_OF property must be checked to be in sync with the actual release version. See review comment at #4898 (review)
This introduces de-normalized data structures that are optimized for the checks that need to be done during privilege evaluation. Additionally, certain objects (like DLS queries) are prepared ahead of time, as early as possible, in order to minimize the overhead during actual privilege evaluation.
This is a big change set - in order to facilitate the review, I have split it into three major commits:
The code is extensively commented - I hope that will help during review.
Performance tests indicate that the OpenSearch security layer adds a noticeable overhead to the indexing throughput of an OpenSearch cluster. The overhead may vary depending on the number of indices, the use of aliases, the number of roles and the size of the user object. The goal of these changes is to improve privilege evaluation performance and to make it less dependent on the number of indices, etc.
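As an illustration of what "de-normalized data structures optimized for the checks" can mean, the sketch below inverts a role-centric configuration into an action-centric lookup table once, so that a check becomes two hash lookups instead of a scan over all roles and index patterns. This is not the PR's actual data structure: the class and method names and the sample roles are hypothetical, and index patterns, aliases and wildcards are deliberately left out.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class DenormalizedPrivilegesSketch {

    // action -> concrete index -> roles that grant the action on that index
    private final Map<String, Map<String, Set<String>>> actionToIndexToRoles = new HashMap<>();

    /** Builds the lookup table once from a role -> index -> actions configuration. */
    DenormalizedPrivilegesSketch(Map<String, Map<String, Set<String>>> roleToIndexToActions) {
        roleToIndexToActions.forEach((role, indexToActions) ->
            indexToActions.forEach((index, actions) -> {
                for (String action : actions) {
                    actionToIndexToRoles
                        .computeIfAbsent(action, k -> new HashMap<>())
                        .computeIfAbsent(index, k -> new HashSet<>())
                        .add(role);
                }
            }));
    }

    /** Two map lookups plus a set intersection test, independent of the total number of indices. */
    boolean isAllowed(Set<String> userRoles, String action, String index) {
        Map<String, Set<String>> indexToRoles = actionToIndexToRoles.get(action);
        if (indexToRoles == null) {
            return false;
        }
        Set<String> grantingRoles = indexToRoles.get(index);
        return grantingRoles != null && !Collections.disjoint(grantingRoles, userRoles);
    }

    /** Hypothetical sample configuration used for the demo below. */
    static DenormalizedPrivilegesSketch sample() {
        Map<String, Map<String, Set<String>>> roles = new HashMap<>();
        roles.put("logs_reader", Map.of("logs-2024", Set.of("indices:data/read/search")));
        roles.put("logs_writer", Map.of("logs-2024",
            Set.of("indices:data/read/search", "indices:data/write/index")));
        return new DenormalizedPrivilegesSketch(roles);
    }

    public static void main(String[] args) {
        DenormalizedPrivilegesSketch privileges = sample();
        System.out.println(privileges.isAllowed(Set.of("logs_reader"), "indices:data/write/index", "logs-2024")); // prints false
        System.out.println(privileges.isAllowed(Set.of("logs_writer"), "indices:data/write/index", "logs-2024")); // prints true
    }
}
```

The cost of the inversion is paid when the configuration changes, not on every request, which is the same trade-off the description makes for objects prepared ahead of time.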
No significant behavioral changes in the "happy case", when privileges are present.
The undocumented config option config.dynamic.multi_rolespan_enabled is no longer evaluated. The code now behaves as if it is always set to true - that is the former default. See #4495 for details.
Some slight changes are present in error cases:
Issues Resolved
This is a backport from #4380
Testing
SecurityBackwardsCompatibilityIT (extended in "Fixed bulk index requests in BWC tests and hardened assertions", #4817)
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.