
Add cache for application privileges #55836

Merged (57 commits, Jun 29, 2020)

Conversation


@ywangd ywangd commented Apr 28, 2020

Add caching support for application privileges to reduce the number of round-trips to the security index when building application privilege descriptors.

A few key points of the changes:

  • Per the discussion in #54317 (Add cache for application privileges), the main cache is keyed by concrete application name and its values are sets of application privilege descriptors.
  • A secondary cache is also added to map a set of application name expressions (i.e. possibly containing wildcards) to the set of matching concrete application names. A minimal sketch of this two-layer lookup follows this list.
  • Due to the cache design, privilege retrieval in NativePrivilegeStore is changed to always fetch all privilege documents for a given application.
  • The change applies everywhere, including the "get privileges" and "has privileges" APIs and CompositeRolesStore (for authentication).
    • Initially I wanted to exclude CompositeRolesStore from the caching. But that would mean no code could be deleted from NativePrivilegeStore; we would basically have to add "retrieving by application name" on top of the existing query logic. For simplicity, I later decided to include it so that the query code could be largely simplified.
  • Added an API to clear the privilege cache, which is also used internally when adding/deleting privileges. These operations already clear the role cache; the privilege cache invalidation is added on top of that.
  • Added a security index state listener to clear the cache on state changes.
  • Docs added
  • YAML tests added
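
To make the two-layer design concrete, here is a minimal, purely illustrative sketch in plain Java. It uses ConcurrentHashMap as a stand-in for the actual Elasticsearch cache class, and all names are hypothetical rather than taken from NativePrivilegeStore:

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative stand-in for the two caches described above; not the actual
// NativePrivilegeStore implementation.
class TwoLevelPrivilegeCache {
    // expression set (may contain wildcards) -> matching concrete application names
    private final Map<Set<String>, Set<String>> applicationNamesCache = new ConcurrentHashMap<>();
    // concrete application name -> its privilege descriptors (modelled as plain Strings here)
    private final Map<String, Set<String>> descriptorsCache = new ConcurrentHashMap<>();

    // First look up which concrete applications a request's expressions resolve to...
    Set<String> cachedConcreteNames(Set<String> applicationExpressions) {
        return applicationNamesCache.get(applicationExpressions);
    }

    // ...then fetch the descriptors per concrete application from the main cache.
    Set<String> cachedDescriptors(String concreteApplicationName) {
        return descriptorsCache.get(concreteApplicationName);
    }

    // On a miss, the store fetches all privilege documents for the applications
    // from the security index and then populates both layers.
    void cacheFetchResult(Set<String> expressions, Map<String, Set<String>> fetchedByApplication) {
        descriptorsCache.putAll(fetchedByApplication);
        applicationNamesCache.put(Set.copyOf(expressions), Set.copyOf(fetchedByApplication.keySet()));
    }
}
```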

Resolves: #54317

@ywangd ywangd added >enhancement :Security/Authorization Roles, Privileges, DLS/FLS, RBAC/ABAC v8.0.0 v7.8.0 labels Apr 28, 2020
@ywangd ywangd requested a review from tvernum April 28, 2020 03:16
@elasticmachine (Collaborator)

Pinging @elastic/es-security (:Security/Authorization)

Comment on lines 170 to 172
// Avoid caching potential stale results.
// TODO: It is still possible that cache gets invalidated immediately after the if check
if (invalidationCounter == numInvalidation.get()) {
ywangd (Member Author):

This pattern is used in CompositeRolesStore. However, it is still possible that the cache gets invalidated immediately after the if check. The window for this to happen is much smaller, but it still exists in theory.

The other pattern is to use ListenableFuture. This would solve the stale entry problem here because the future itself is removed from the cache, so adding items to the removed future has no impact on the cache. However, this pattern could potentially have a deadlock issue: if the thread computing the future crashes, will all the other threads waiting for it get stuck? If so, I'd prefer the first pattern, since a stale entry (with a very low chance of occurring) is less harmful than a deadlock.
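
For reference, a minimal sketch of the invalidation-counter pattern being discussed (hypothetical names; the real CompositeRolesStore/NativePrivilegeStore code differs in detail):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

// Sketch only: load-then-cache guarded by an invalidation counter. The race
// mentioned above remains: an invalidation can still land right after the check.
class CounterGuardedCache<K, V> {
    private final ConcurrentHashMap<K, V> cache = new ConcurrentHashMap<>();
    private final AtomicLong numInvalidation = new AtomicLong();

    V getOrLoad(K key, Function<K, V> loader) {
        final V cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        // Remember the counter before the (potentially slow) load.
        final long invalidationCounter = numInvalidation.get();
        final V loaded = loader.apply(key);
        // Avoid caching potentially stale results: only cache if no invalidation
        // happened while we were loading.
        if (invalidationCounter == numInvalidation.get()) {
            cache.put(key, loaded);
        }
        return loaded;
    }

    void invalidateAll() {
        numInvalidation.incrementAndGet();
        cache.clear();
    }
}
```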

Contributor:

I think a ReadWriteLock would solve the problem. Treat the invalidator as the writer and the cache population as the reader. The invalidator would need exclusive access, but we could support multiple populators.

I'll need to think about it again when it's not midnight, but I think it's reasonable (if used alongside the invalidationCounter so the lock window is small).

Comment on lines 241 to 242
applicationNamesCache.invalidateAll();
final Set<String> uniqueNames = Set.copyOf(updatedApplicationNames);
ywangd (Member Author):

For invalidation, the application names cache is always completely invalidated, since it would take some effort to identify only the applicable entries. We could do that, but the gain may not be much.

Contributor:

Remind me, what's Kibana's typical usage pattern for querying? Does it use wildcards for the application name?

If so, I think fully invalidating the name cache whenever a single application is invalidated (even one that might not exist) would effectively invalidate the whole cache, because a query for kibana* would end up not using any cache at all.

ywangd (Member Author):

Kibana always sends a single concrete application name, kibana-.kibana, so it should be fine for Kibana's typical usage.

But let me know if you think it is still necessary. The logic would look something like the following:

foreach cache key (type is Set<String>)
   foreach key member
     foreach application
        if (key member == application) or (key member is a wildcard and matches application)
          invalidate the cache key

Java code would be

StreamSupport.stream(applicationNamesCache.keys().spliterator(), false)
    .filter(keys -> keys.contains("*")
        || Sets.intersection(keys, uniqueNames).isEmpty() == false
        || keys.stream().filter(k -> k.endsWith("*")).anyMatch(
            k -> uniqueNames.stream().anyMatch(n -> n.regionMatches(false, 0, k, 0, k.length()-1))))
    .forEach(applicationNamesCache::invalidate);

Contributor:

Let's leave it - realistically we're talking about Kibana only, so clearing all applications isn't actually going to hurt anyone, and the only time we will clear the cache is on privilege update which happens when you install a new Kibana version.

listener.onResponse(Collections.emptySet());

} else {
final Tuple<Set<String>, Map<String, Set<ApplicationPrivilegeDescriptor>>> cacheStatus;
Contributor:

Warning bells go off for me when I see complex Tuples like this (though I'm guilty of using them as well).

I'd prefer we avoided it entirely, but if we really need it, can we assign the members to appropriately named local vars as soon as possible after the method returns?

ywangd (Member Author):

This complexity is due to an attempt to optimize the number of documents fetched from the index. It can be simplified if we always fetch everything when hitting the index is unavoidable (as you suggested below).

ywangd (Member Author):

Once we remove this optimization, the Tuple<Set<String>, Map<String, Set<ApplicationPrivilegeDescriptor>>> data structure is no longer necessary, so the complexity will definitely be reduced.

} else {
privilegesStore.invalidate(Arrays.asList(request.getApplicationNames()));
}
rolesStore.invalidateAll();
Contributor:

I'm not sure about this. It seems like this API ends up doing something other than what it was supposed to, just because we assume that the caller wants it.
I understand why - if the privileges have changed then the roles cache is probably wrong - but it seems like we're chaining side-effects together.

ywangd (Member Author):

From just the API point of view, you are right that these two should not be tied together. There are valid use cases where the user only wants to actively clear the privileges cache. I did this because the two are always tied together in NativePrivilegeStore, since it was already clearing the role cache before my change.

I tried to avoid nested callbacks (clear the role cache, then clear the privileges cache). It seems OK and more efficient when looking at NativePrivilegeStore alone, but it does feel wrong from a pure API standpoint.

I could either just go with the nested callback, or create a transport-layer-only action to clear both caches, so it is not exposed at the REST layer and still keeps the efficiency at the transport layer; but this does lead to some code redundancy. Another option is to have a query parameter for the clear privilege cache API which, when set to true, clears both caches. What do you think?

import java.io.IOException;
import java.util.List;

public class ClearPrivilegesCacheResponse extends BaseNodesResponse<ClearPrivilegesCacheResponse.Node>
Contributor:

Separately from this PR, it feels like we could consolidate these duplicate classes into a common base class.

ywangd (Member Author):

A common base class for all ClearXxxCacheResponse?

Contributor:

Yes. Not a priority, but there's a bunch of copy paste here that we could ditch.


ywangd commented Apr 30, 2020

Resolves #54317

ywangd and others added 5 commits April 30, 2020 12:39
…ecurity/authz/store/NativePrivilegeStore.java

Co-Authored-By: Tim Vernum <[email protected]>
…ecurity/action/privilege/TransportClearPrivilegesCacheAction.java

Co-Authored-By: Tim Vernum <[email protected]>
@ywangd ywangd requested a review from tvernum June 2, 2020 14:02

ywangd commented Jun 2, 2020

@elasticmachine run elasticsearch-ci/1


ywangd commented Jun 3, 2020

After discussion with @tvernum, it is agreed that a ReadWriteLock is necessary to achieve maximum correctness. We cannot completely avoid caching a stale result, but we can ensure that if a stale result is cached, it is invalidated as soon as possible, and this is where the ReadWriteLock comes in. It works as follows:

  • Before caching a result, acquire a read lock
  • Before invalidating the cache, acquire a write lock

Acquiring the read lock ensures that any invalidation request is held off until the current result has been cached. If the current result is stale, e.g. because the index was updated while the result was being cached, the locking mechanism guarantees that the invalidation will happen after the current caching action finishes. In other words, the time window for a stale result to stay in the cache is as small as possible.

The code is updated accordingly.
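
For reference, a minimal sketch of the locking scheme described above (hypothetical names and plain Java collections instead of the Elasticsearch cache class; the actual NativePrivilegeStore code is more involved):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch only: populators publish under the read lock, invalidators clear under
// the write lock, so an invalidation always waits for in-flight cache puts.
class LockGuardedCache<K, V> {
    private final ConcurrentHashMap<K, V> cache = new ConcurrentHashMap<>();
    private final AtomicLong numInvalidation = new AtomicLong();
    private final ReadWriteLock invalidationLock = new ReentrantReadWriteLock();

    long invalidationCounter() {
        return numInvalidation.get();
    }

    // Called by a "populator" after it has loaded fresh data from the index.
    void maybeCache(K key, V value, long counterBeforeLoad) {
        invalidationLock.readLock().lock(); // hold off invalidators while publishing
        try {
            if (counterBeforeLoad == numInvalidation.get()) {
                cache.put(key, value);
            }
        } finally {
            invalidationLock.readLock().unlock();
        }
    }

    // Called by the "invalidator" on index state changes or explicit clearing.
    void invalidateAll() {
        invalidationLock.writeLock().lock(); // waits for in-flight puts to finish
        try {
            numInvalidation.incrementAndGet();
            cache.clear();
        } finally {
            invalidationLock.writeLock().unlock();
        }
    }
}
```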


ywangd commented Jun 15, 2020

Just pushed two more updates:

  • Added a TTL (default 24h) to the caches. This setting acts as a safety net for rare cases where things go out of sync; it ensures that out-of-sync info will eventually be cleared out.
  • Used a single cache size setting for both caches, because the two-layer caching is an implementation detail and it does not feel right to force users to be aware of it. A rough sketch of what such settings might look like follows this list.
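
The sketch below uses the standard Elasticsearch Setting builders; the setting keys and the size default are assumptions for illustration only, not copied from the PR (the 24h TTL default matches the comment above):

```java
import org.elasticsearch.common.settings.Setting;
import org.elasticsearch.common.unit.TimeValue;

// Sketch only: keys and the size default are hypothetical.
public final class PrivilegeCacheSettings {
    public static final Setting<TimeValue> CACHE_TTL_SETTING = Setting.timeSetting(
        "xpack.security.authz.store.privileges.cache.ttl",      // hypothetical key
        TimeValue.timeValueHours(24),
        Setting.Property.NodeScope);

    // A single size setting shared by both caches, since the two-layer design
    // is an implementation detail.
    public static final Setting<Integer> CACHE_SIZE_SETTING = Setting.intSetting(
        "xpack.security.authz.store.privileges.cache.max_size", // hypothetical key
        10_000,                                                  // assumed default
        Setting.Property.NodeScope);

    private PrivilegeCacheSettings() {}
}
```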


@tvernum tvernum left a comment


LGTM, but I think we can be a bit smarter with the locking.

// Always completely invalidate application names cache due to wildcard
applicationNamesCache.invalidateAll();
uniqueNames.forEach(descriptorsCache::invalidate);
}
Contributor:

I think we can release the lock immediately after incrementing numInvalidation. It will mean that, in theory, we could invalidate things that we don't need to, but would mean holding a lock for less time.

Did you consider the trade-off of how long to lock for vs perfect cache accuracy?

If we keep the lock around the whole invalidation process, then I think the calculation of uniqueNames should happen before the lock is acquired.

ywangd (Member Author):

Thanks @tvernum. Your comment is very insightful. You are right that we could minimize the locking time in this case. The possibility of invalidating more than necessary should be very low: a getPrivileges thread would need to read the already-incremented value of numInvalidation, perform its search query, and cache the result, all before the cache is actually invalidated. The chance of that is extremely low. I have updated the code to minimize the locking for both invalidate and invalidateAll.

A more likely scenario is "unnecessarily skipping putting a result in the cache". We only cache when numInvalidation has not changed, but this value changes for both partial and full cache invalidation. In the case of a partial invalidation, the entries being invalidated may not be relevant to the entries we want to cache, yet the code skips caching them anyway, for simplicity. This, however, has nothing to do with the locking, i.e. the same situation existed before we added the locking. Overall, combined with how Kibana behaves, I think this is an acceptable trade-off because:

  • The chance is still low
  • We always fully invalidate applicationNamesCache even for a partial invalidation, and we decided to keep it this way for simplicity.

Also moved the descriptorsCache != null check before the read lock, so we do not acquire the lock only to find out there is no cache to use. It is an edge-case optimization, but it is easy to add.
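
As a sketch of the narrowed lock window discussed here (hypothetical names again, building on the earlier LockGuardedCache sketch):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch only: the write lock is held just long enough to bump the counter;
// the actual clearing happens outside the lock.
class NarrowLockCache<K, V> {
    private final ConcurrentHashMap<K, V> cache = new ConcurrentHashMap<>();
    private final AtomicLong numInvalidation = new AtomicLong();
    private final ReentrantReadWriteLock invalidationLock = new ReentrantReadWriteLock();

    void invalidateAll() {
        invalidationLock.writeLock().lock();
        try {
            // Any populator that read the counter before this point will now
            // refuse to cache its (possibly stale) result.
            numInvalidation.incrementAndGet();
        } finally {
            invalidationLock.writeLock().unlock();
        }
        // Clearing outside the lock keeps the exclusive section tiny; at worst a
        // concurrent populator's freshly loaded entry is dropped unnecessarily.
        cache.clear();
    }
}
```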

@ywangd ywangd merged commit 38185e5 into elastic:master Jun 29, 2020
ywangd added a commit to ywangd/elasticsearch that referenced this pull request Jul 1, 2020
Add caching support for application privileges to reduce the number of round-trips to the security index when building application privilege descriptors.

Privilege retrieval in NativePrivilegeStore is changed to always fetch all privilege documents for a given application. The caching is applied everywhere, including the "get privileges" and "has privileges" APIs and CompositeRolesStore (for authentication).
ywangd added a commit that referenced this pull request Jul 2, 2020
Add caching support for application privileges to reduce the number of round-trips to the security index when building application privilege descriptors.

Privilege retrieval in NativePrivilegeStore is changed to always fetch all privilege documents for a given application. The caching is applied everywhere, including the "get privileges" and "has privileges" APIs and CompositeRolesStore (for authentication).
russcam added a commit to elastic/elasticsearch-net that referenced this pull request Aug 4, 2020
Relates: elastic/elasticsearch#55836

Derive ClearCachedRealmsResponse from NodesResponseBase to
expose NodeStatistics.
russcam added a commit to elastic/elasticsearch-net that referenced this pull request Aug 5, 2020
Relates: elastic/elasticsearch#55836

Derive ClearCachedRealmsResponse from NodesResponseBase to
expose NodeStatistics.
github-actions bot pushed a commit to elastic/elasticsearch-net that referenced this pull request Aug 5, 2020
Relates: elastic/elasticsearch#55836

Derive ClearCachedRealmsResponse from NodesResponseBase to
expose NodeStatistics.
github-actions bot pushed a commit to elastic/elasticsearch-net that referenced this pull request Aug 5, 2020
Relates: elastic/elasticsearch#55836

Derive ClearCachedRealmsResponse from NodesResponseBase to
expose NodeStatistics.
russcam added a commit to elastic/elasticsearch-net that referenced this pull request Aug 5, 2020
Relates: elastic/elasticsearch#55836

Derive ClearCachedRealmsResponse from NodesResponseBase to
expose NodeStatistics.

Co-authored-by: Russ Cam <[email protected]>
russcam added a commit to elastic/elasticsearch-net that referenced this pull request Aug 5, 2020
Relates: elastic/elasticsearch#55836

Derive ClearCachedRealmsResponse from NodesResponseBase to
expose NodeStatistics.

Co-authored-by: Russ Cam <[email protected]>
Labels: >enhancement, :Security/Authorization (Roles, Privileges, DLS/FLS, RBAC/ABAC), Team:Security (meta label for security team), v7.9.0, v8.0.0-alpha1
Projects: None yet
Development: Successfully merging this pull request may close these issues: Add cache for application privileges.
6 participants