Add cache for application privileges #55836
Conversation
- Rest tests
- Documents
- Other tweaks and optimizations
Pinging @elastic/es-security (:Security/Authorization)
// Avoid caching potential stale results.
// TODO: It is still possible that cache gets invalidated immediately after the if check
if (invalidationCounter == numInvalidation.get()) {
This pattern is used in CompositeRolesStore. However, it is still possible that the cache gets invalidated immediately after the if check. The window for this to happen is much smaller, but it still exists in theory.
The other pattern is to use ListenableFuture. This would solve the stale entry problem here because the future itself is removed from the cache, so adding items to the removed future has no impact on the cache. However, this pattern could potentially have a deadlock issue: if the thread computing the future crashes, will all the other threads waiting for it get stuck? If so, I'd prefer to use the first pattern, since a stale entry (with very low chance) is less harmful than a deadlock.
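The first pattern can be sketched in isolation like this (a minimal stand-in, assuming a plain map in place of Elasticsearch's Cache class; only the invalidationCounter/numInvalidation names come from the snippet above):

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the "invalidation counter" pattern: record the counter before the
// (asynchronous) load, and only cache the result if no invalidation happened
// in between. A stale entry is still possible in the tiny window between the
// counter check and the put.
class CounterCheckedCache {
    private final Map<String, Set<String>> cache = new ConcurrentHashMap<>();
    private final AtomicLong numInvalidation = new AtomicLong();

    Set<String> load(String app, Set<String> freshlyFetched) {
        final long invalidationCounter = numInvalidation.get();
        // ... asynchronous fetch from the security index happens here ...
        if (invalidationCounter == numInvalidation.get()) {
            // Only cache if nothing was invalidated since the fetch began.
            cache.put(app, freshlyFetched);
        }
        return freshlyFetched;
    }

    void invalidateAll() {
        numInvalidation.incrementAndGet();
        cache.clear();
    }

    Set<String> cached(String app) {
        return cache.get(app);
    }
}
```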
I think a ReadWriteLock would solve the problem. Treat the invalidator as the writer and the cache population as the reader. The invalidator would need exclusive access, but we could support multiple populators.
I'll need to think about it again when it's not midnight, but I think it's reasonable (if used alongside the invalidationCounter so the lock window is small).
...ecurity/src/main/java/org/elasticsearch/xpack/security/authz/store/NativePrivilegeStore.java
applicationNamesCache.invalidateAll();
final Set<String> uniqueNames = Set.copyOf(updatedApplicationNames);
For invalidation, the application names cache is always completely invalidated, since it requires some effort to identify the applicable entries. We could do this, but the gain may not be much.
Remind me, what's Kibana's typical usage pattern for querying? Does it use wildcards for the application name?
If so, I think invalidating for a single application (which might not even exist) would effectively invalidate the whole name cache, because a query for kibana* would end up not using any cache at all.
Kibana always sends a single concrete application name, kibana-.kibana, so it should be fine for Kibana's typical usage.
But let me know if you think it is still necessary. The logic would look something like the following:
foreach cache key (type is Set<String>)
    foreach key member
        foreach application
            if (key member == application) or (key member is a wildcard and matches application)
                invalidate the cache key
The Java code would be:
StreamSupport.stream(applicationNamesCache.keys().spliterator(), false)
.filter(keys -> keys.contains("*")
|| Sets.intersection(keys, uniqueNames).isEmpty() == false
|| keys.stream().filter(k -> k.endsWith("*")).anyMatch(
k -> uniqueNames.stream().anyMatch(n -> n.regionMatches(false, 0, k, 0, k.length()-1))))
.forEach(applicationNamesCache::invalidate);
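The snippet above relies on Elasticsearch's Cache and Sets helpers; a self-contained sketch of just the matching predicate (class and method names are made up for illustration) would be:

```java
import java.util.Set;

// Illustration of the wildcard matching from the snippet above: a cache key
// (a set of requested names) must be invalidated if it contains the match-all
// wildcard, an exact updated name, or a prefix wildcard covering an updated name.
class WildcardInvalidation {
    static boolean shouldInvalidate(Set<String> cacheKey, Set<String> updatedNames) {
        return cacheKey.contains("*")
            || cacheKey.stream().anyMatch(updatedNames::contains)
            || cacheKey.stream()
                .filter(k -> k.endsWith("*"))
                .anyMatch(k -> updatedNames.stream().anyMatch(
                    // prefix match: compare the updated name against the
                    // wildcard key with its trailing '*' dropped
                    n -> n.regionMatches(false, 0, k, 0, k.length() - 1)));
    }
}
```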
Let's leave it - realistically we're talking about Kibana only, so clearing all applications isn't actually going to hurt anyone, and the only time we will clear the cache is on a privilege update, which happens when you install a new Kibana version.
listener.onResponse(Collections.emptySet());
} else {
    final Tuple<Set<String>, Map<String, Set<ApplicationPrivilegeDescriptor>>> cacheStatus;
Warning bells go off for me when I see complex Tuples like this (though I'm guilty of using them as well).
I'd prefer we avoided it entirely, but if we really need it, can we assign the members to appropriately named local vars as soon as possible after the method returns?
This complexity is due to an attempt to optimize the number of documents fetched from the index. It can be simplified if we always fetch everything when hitting the index is unavoidable (as you suggested below).
Once we remove this optimization, the Tuple<Set<String>, Map<String, Set<ApplicationPrivilegeDescriptor>>> data structure is no longer necessary, so the complexity will definitely be reduced.
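If the two members still needed to travel together, one alternative to the Tuple would be a small named type. A hypothetical sketch (names are illustrative, not from the PR, and the descriptor type is simplified to Set<String>):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// A named record makes both members self-describing at the call site,
// instead of the anonymous v1()/v2() of a Tuple.
record CacheStatus(Set<String> uncachedApplicationNames,
                   Map<String, Set<String>> cachedDescriptors) {}

class CacheStatusDemo {
    // Split the requested names into cache hits and names that still need
    // a round-trip to the security index.
    static CacheStatus checkCache(Set<String> requested, Map<String, Set<String>> cache) {
        final Map<String, Set<String>> hits = new HashMap<>();
        final Set<String> misses = new HashSet<>();
        for (String name : requested) {
            if (cache.containsKey(name)) {
                hits.put(name, cache.get(name));
            } else {
                misses.add(name);
            }
        }
        return new CacheStatus(misses, hits);
    }
}
```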
x-pack/plugin/src/test/resources/rest-api-spec/api/security.clear_cached_privileges.json
} else {
    privilegesStore.invalidate(Arrays.asList(request.getApplicationNames()));
}
rolesStore.invalidateAll();
I'm not sure about this. It seems like this API ends up doing something other than what it was supposed to, just because we assume that's what the caller wants.
I understand why - if the privileges have changed then the roles cache is probably wrong - but it seems like it's chaining side-effects together.
From a pure API point of view, you are right that these two should not be tied together. There are valid use cases where a user only wants to clear the privileges cache. I did this because the two are always tied together in NativePrivilegesStore, which was clearing the role cache even before my change.
I tried to avoid nested callbacks (clear role cache, then clear privileges cache). It seems OK and more efficient judging just from NativePrivilegesStore, but it does feel wrong from the API side.
I could either go with nested callbacks, or create a transport-layer-only action that clears both caches, so it is not exposed at the REST layer but keeps the efficiency at the transport layer. But this does lead to some code redundancy. Another option is to add a query parameter to the clear privileges cache API: when set to true, it clears both caches. What do you think?
...a/org/elasticsearch/xpack/security/action/privilege/TransportClearPrivilegesCacheAction.java
import java.io.IOException;
import java.util.List;

public class ClearPrivilegesCacheResponse extends BaseNodesResponse<ClearPrivilegesCacheResponse.Node>
Separately from this PR, it feels like we could consolidate these duplicate classes into a common base class.
A common base class for all ClearXxxCacheResponse?
Yes. Not a priority, but there's a bunch of copy paste here that we could ditch.
...in/core/src/main/java/org/elasticsearch/xpack/core/security/support/CacheIteratorHelper.java
Resolves #54317
…ecurity/authz/store/NativePrivilegeStore.java Co-Authored-By: Tim Vernum <[email protected]>
…ticsearch into es-54317-app-privilege-cache
…ecurity/action/privilege/TransportClearPrivilegesCacheAction.java Co-Authored-By: Tim Vernum <[email protected]>
…ticsearch into es-54317-app-privilege-cache
@elasticmachine run elasticsearch-ci/1
After discussion with @tvernum, we agreed that a ReadWriteLock is necessary for maximum correctness. We cannot completely avoid caching a stale result, but we can ensure that if a stale result is cached, it gets invalidated as soon as possible, and this is where the ReadWriteLock comes in. It works as follows:
Acquiring the read lock ensures any invalidation request is held off until the current result is cached. If the current result is stale, e.g. because the index was updated while the result was being cached, the locking mechanism guarantees that the invalidation happens after the current caching action finishes. In other words, the time window for a stale result to stay in the cache is as small as possible. The code is updated accordingly.
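A minimal sketch of this locking scheme (simplified types and a hypothetical class name; the real code lives in NativePrivilegeStore):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Cache population takes the read lock, so many threads may populate
// concurrently; invalidation takes the write lock, so it waits for any
// in-flight put to finish and then removes whatever that put cached.
class LockedCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final AtomicLong numInvalidation = new AtomicLong();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    long getInvalidationCount() {
        return numInvalidation.get();
    }

    // invalidationCounter is the value read before the search query started.
    void cacheFetchedResult(long invalidationCounter, String app, String result) {
        lock.readLock().lock();
        try {
            // Only cache if no invalidation happened since the fetch began.
            if (invalidationCounter == numInvalidation.get()) {
                cache.put(app, result);
            }
        } finally {
            lock.readLock().unlock();
        }
    }

    void invalidateAll() {
        lock.writeLock().lock();
        try {
            numInvalidation.incrementAndGet();
        } finally {
            lock.writeLock().unlock();
        }
        // Clearing outside the lock keeps the critical section small: a put that
        // read the old counter cannot complete after the increment, so nothing
        // stale survives this clear.
        cache.clear();
    }

    String get(String app) {
        return cache.get(app);
    }
}
```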
Just pushed two more updates:
LGTM, but I think we can be a bit smarter with the locking.
// Always completely invalidate application names cache due to wildcard
applicationNamesCache.invalidateAll();
uniqueNames.forEach(descriptorsCache::invalidate);
}
I think we can release the lock immediately after incrementing numInvalidation. It would mean that, in theory, we could invalidate things that we don't need to, but it would also mean holding the lock for less time.
Did you consider the trade-off between lock duration and perfect cache accuracy?
If we keep the lock around the whole invalidation process, then I think the calculation of uniqueNames should happen before the lock is acquired.
Thanks @tvernum. Your comment is very insightful. You are right that we can minimize the locking time here. The possibility of invalidating more than necessary should be very low: the getPrivileges thread needs to read the incremented value of numInvalidation and then perform a search query, and all of this has to complete before the cache is invalidated. The chance of that is extremely low. I have updated the code to minimize the locking for both invalidate and invalidateAll.
A more likely scenario is "unnecessarily skipping putting a result in the cache". We only cache when numInvalidation has not changed, but this value changes for both partial and full cache invalidation. In the case of partial invalidation, the things that get invalidated may not be relevant to the things we want to cache, but the code skips caching them regardless, for simplicity. This, however, has nothing to do with the locking, i.e. the same situation existed before we added the locking. Overall, combined with how Kibana behaves, I think this is an acceptable trade-off because:
- The chance is still low
- We always fully invalidate applicationNamesCache even for partial invalidation, and we decided to keep it this way for simplicity.
Also moved the descriptorsCache != null check before the read lock, so we do not just lock and then find out there is no cache to use. It is an edge-case optimization, but it is easy to add.
Add caching support for application privileges to reduce the number of round-trips to the security index when building application privilege descriptors. Privilege retrieval in NativePrivilegeStore is changed to always fetch all privilege documents for a given application. The caching is applied to all places, including the "get privilege" and "has privileges" APIs and CompositeRolesStore (for authentication).
Relates: elastic/elasticsearch#55836 Derive ClearCachedRealmsResponse from NodesResponseBase to expose NodeStatistics.
Relates: elastic/elasticsearch#55836 Derive ClearCachedRealmsResponse from NodesResponseBase to expose NodeStatistics. Co-authored-by: Russ Cam <[email protected]>
Add caching support for application privileges to reduce the number of round-trips to the security index when building application privilege descriptors.
A few key points of the changes:
- NativePrivilegeStore is changed to always fetch all privilege documents for a given application.
- The caching is applied to all places including the "get privilege" and "has privileges" APIs and CompositeRolesStore (for authentication).
- Initially I considered excluding CompositeRolesStore from the caching. But this means no code can be deleted from NativePrivilegeStore; we basically have to add "retrieving by application name" on top of the existing query logic. For simplicity, I later decided to include it so that the query part can be largely simplified.

Resolves: #54317