Register privileges in Kibana Platform Security plugin and remove legacy getUser API. #65472
Conversation
Pinging @elastic/kibana-security (Team:Security)
force-pushed from d4c6f7b to e44e540
force-pushed from e44e540 to 44b39b3
force-pushed from 44b39b3 to baa580c
fetchedSpaces => setSpaces({ enabled: true, list: fetchedSpaces }),
(err: IHttpFetchError) => {
  // Spaces plugin can be disabled and hence this endpoint can be unavailable.
  if (err.response?.status === 404) {
note: I moved this change to a separate commit as I want to hear your thoughts. It's "the least evil" solution I could come up with. Alternatively we can introduce an endpoint that returns isSpacesEnabled based only on the "Spaces Service" that Spaces registers with Security during setup.
I think this is tolerable. I think it'd make sense (in a followup) for us to export a useSpaces hook from the Spaces plugin itself, so that we could at least have the Spaces plugin own this determination. The disabled check aside, I could see other plugins benefiting from this hook too.
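A minimal sketch of what such a hook could look like (the hook, its shape, and the endpoint usage here are illustrative assumptions, not the Spaces plugin's actual exported API):

```ts
// Hypothetical `useSpaces` hook sketch; names and types are illustrative only.
import { useEffect, useState } from 'react';
import { HttpSetup, IHttpFetchError } from 'src/core/public';

interface Space {
  id: string;
  name: string;
}

interface SpacesState {
  enabled: boolean;
  list: Space[];
}

export function useSpaces(http: HttpSetup): SpacesState {
  const [spaces, setSpaces] = useState<SpacesState>({ enabled: true, list: [] });

  useEffect(() => {
    http.get('/api/spaces/space').then(
      (fetchedSpaces: Space[]) => setSpaces({ enabled: true, list: fetchedSpaces }),
      (err: IHttpFetchError) => {
        // The Spaces plugin can be disabled, in which case the endpoint 404s.
        if (err.response?.status === 404) {
          setSpaces({ enabled: false, list: [] });
        }
      }
    );
  }, [http]);

  return spaces;
}
```

Consumers would then read `enabled` instead of re-implementing the 404 check themselves.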
Yeah, I agree
validateFeaturePrivileges(allFeatures);
validateReservedPrivileges(allFeatures);

await registerPrivilegesWithCluster(
I think we have a gap in error handling here. The legacy implementation had retry logic (with backoff) in the event of a transient failure, which I think we want here as well. If await registerPrivilegesWithCluster() throws an error, but neither this.status.core$ nor this.license.features$ emits a new value, then Kibana will be stuck in an unusable state from an authorization perspective. We will either have to wait for one of those two observables to change (if that ever happens), or Kibana will have to be restarted in order to reattempt privilege registration.
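Roughly the kind of thing I mean — a sketch only, not the legacy implementation; the wrapper name and delay numbers are made up:

```ts
// Sketch of retry-with-backoff around privilege registration (illustrative).
async function registerPrivilegesWithRetry(
  registerPrivilegesWithCluster: () => Promise<void>,
  maxDelayMs = 30_000
) {
  let attempt = 0;
  // Keep retrying until registration succeeds; each failure doubles the delay
  // (capped at maxDelayMs) so a struggling cluster isn't hammered with requests.
  for (;;) {
    try {
      await registerPrivilegesWithCluster();
      return;
    } catch (err) {
      attempt++;
      const delayMs = Math.min(100 * 2 ** attempt, maxDelayMs);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```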
That's a good point, thanks! I naively expected the status service to catch intermittent 503s and emit new status updates, but it seems that's not the case since you're bringing this up 🙂 I'm not sure how I feel about having this logic (not simple, prone to errors) in every plugin though. Let me think a bit...
> That's a good point, thanks! I naively expected the status service to catch intermittent 503s and emit new status updates
I think it will definitely catch some of these, but I could also imagine a race condition where core thinks everything is OK, but then we get a network blip where we lose our connection for a very short period of time. core might not catch this blip if it wasn't doing anything with ES at the time.
It's also possible that the ES connection would be technically ok, but if the cluster is starved for resources, then it's possible that our specific call to register privileges would fail (disk space watermark, readonly indices, etc.)
> I'm not sure how I feel about having this logic (not simple, prone to errors) in every plugin though. Let me think a bit...
Yeah, that's definitely a tradeoff. That's probably one of the reasons we originally had this as a reusable utility, so that each plugin wouldn't have to implement its own.
> It's also possible that the ES connection would be technically ok, but if the cluster is starved for resources, then it's possible that our specific call to register privileges would fail (disk space watermark, readonly indices, etc.)
The thing that concerns me most is that Core determines when we try to perform the action the first time, but if that action fails, then responsibility shifts to the consumer, who now decides when and whether it's appropriate to retry. It seems we don't have a way to force Core to check whether ES is available at the time we retry. If ES is struggling for whatever reason, we (every plugin that needs to retry) don't want to make things worse by bombarding it with requests. Ideally Core should be the single source of truth that decides whether it's appropriate to retry or not.
That's a fair point. @elastic/kibana-platform do you have any thoughts around this?
To summarize, we are waiting for Core to tell us when ES is ready via the exposed CoreStatus service. Once all prerequisites are met, we attempt to make a couple of calls to ES. If these checks fail though, we would like to be able to retry them at some point in the future, but it's possible that everything appears "fine" from the perspective of CoreStatus, so we will never get an updated value from the Observable.
Having each plugin implement its own retry logic is feeling a bit complex and error prone. Aleh also makes a good point that plugins shouldn't be exacerbating the problem by retrying whenever they feel it is appropriate, but instead should wait on a signal from Core. The problem is that we don't have a way to alert Core that something went wrong.
@legrego I don't remember us discussing this case. Added a comment to the status service meta-issue #41983 (comment)
SASS change is fine. Thanks for adding the Sass linting!
let retryTimeout: NodeJS.Timeout;

// Register cluster privileges once Elasticsearch is available and Security plugin is enabled.
this.statusSubscription = combineLatest([
note: presumably we'll get rid of this code in 7.10, so I gave up on learning how to properly leverage retryWhen here and went with a plain setTimeout approach 🙈
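For reference, a rough sketch of what a retryWhen-based variant might look like (placeholder observables, delay, and wiring; this is an assumption-laden illustration, not the code in this PR):

```ts
// Illustrative retryWhen sketch; the declared names stand in for the real
// `this.status.core$` / `this.license.features$` observables and the registration call.
import { Observable, combineLatest, defer, timer } from 'rxjs';
import { retryWhen, switchMap, delayWhen } from 'rxjs/operators';

declare const coreStatus$: Observable<unknown>;
declare const licenseFeatures$: Observable<unknown>;
declare function registerPrivilegesWithCluster(): Promise<void>;

const RETRY_DELAY_MS = 5_000;

const registration$ = combineLatest([coreStatus$, licenseFeatures$]).pipe(
  // switchMap drops any in-flight attempt when status or license changes.
  switchMap(() =>
    // defer() re-invokes the registration call on every resubscription (i.e. every retry).
    defer(() => registerPrivilegesWithCluster()).pipe(
      // On failure, wait RETRY_DELAY_MS and resubscribe to the deferred source.
      retryWhen((errors) => errors.pipe(delayWhen(() => timer(RETRY_DELAY_MS))))
    )
  )
);

registration$.subscribe();
```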
@legrego should be ready for another pass, thanks!
Pinging @elastic/apm-ui (Team:apm)
Pinging @elastic/uptime (Team:uptime)
Pinging @elastic/ingest-management (Team:Ingest Management)
Alerting changes LGTM
apm changes lgtm
note: I'm running a flaky test suite for the api_integration ciGroup to make sure this is stable. I worry that we might not be immediately notified of the license change, or that the privilege registration will take longer than the subsequent API request to get the next set of privileges.
https://kibana-ci.elastic.co/job/kibana+flaky-test-suite-runner/512/
Once the flaky runner completes, this LGTM! Thanks for the edits!
Ingest manager README changes LGTM
💚 Build Succeeded
Thanks! All 42 runs of the flaky test suite passed, but I'll keep an eye on this. In case they start failing I'll just convert the final …
…acy `getUser` API. (elastic#65472)
# Conflicts:
#	x-pack/README.md
#	x-pack/plugins/uptime/README.md
#	x-pack/scripts/functional_tests.js
7.x/7.9.0: 2890d5b
Summary
In this PR we're:
… (xpack.spaces.enabled)
Note to reviewers: it's a bit easier to review commit by commit (git move + changes).
If you received a review ping: the main X-Pack API integration test config was migrated to TS and hence required a fix in all references.
Blocked by: #65461