-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML] job saved objects initialization #82639
[ML] job saved objects initialization #82639
Conversation
// const { body } = await client.asInternalUser.count({ | ||
// index: '.ml-config', | ||
// }); | ||
// return body.count > 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the original PR:
@droberts195 @legrego
This function needs to check that there is at least one ML job in existence.
I believe the fastest way to do this would be to perform a count on .ml-config. However, the kibana system user does not have read access to that index.
Instead i'm having to call both AD and DFA get endpoints to perform the same check (below).
Do you think it would be safe to give the kibana system user the ability to make this check?
@legrego wrote:
I'm not terribly familiar with the contents of .ml-config
. It appears to be a listing of jobs? Is there anything else in that index?
A more direct question might be: Is there any information in that index that's not accessible via the ML APIs granted to users with the manage_ml
cluster privilege? I'd be ok granting read
access to the .ml-config
index if this is true, since kibana_system
already has that cluster privilege.
On the other hand, if there is information in this index that the kibana_system
user currently doesn't have access to via other APIs, then I'd need a better understanding of that data before answering one way or another.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@droberts195 what are your thoughts on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this PR is holding up a lot of other work, i'm going to leave question open for a future PR.
I'll leave in this commented code in case the decision is to use count.
Pinging @elastic/ml-ui (:ml) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@elasticmachine merge upstream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM ⚡
} | ||
return results; | ||
} | ||
|
||
async function _loadAllJobSavedObjects() { | ||
const { body } = await client.asInternalUser.search<SearchResponse<SavedObjectJob>>({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I overlooked this in the first PR. We need to be using the saved objects client to retrieve all jobs, instead of the raw ES client to query the index.
The way this is currently written, it won't support users who have changed their kibana index, and it may return results from indices that match this pattern that aren't actually the "kibana index".
Using the Saved Objects Client will also ensure that the Kibana index has been fully initialized, and is ready for plugins to access.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used a search here because I need a list of all saved objects across all spaces, even the ones the user doesn't have access to. As far as i'm aware the saved object client can't do that?
checkStatus
needs to know if a job has a saved object, otherwise we'll end up with duplicates when repairing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@legrego i've updated these searches so it now uses the saved objects client
|
||
async function _jobSavedObjectsExist(size: number = 1) { | ||
const { body } = await client.asInternalUser.search({ | ||
index: '.kibana*', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as earlier: we should be using the Saved Objects client for this operation
// check whether the job saved objects exist | ||
// and create them if needed. | ||
const { initializeJobs } = jobSavedObjectsInitializationFactory(coreStart); | ||
initializeJobs().finally(() => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To improve visibility into the status of this operation, we could hook into Core's status service (exposed on CoreSetup
). This would allow the platform, ML consumers, and end users to understand why jobs may be missing in the event of a failure here.
Don't consider this a blocker for merging, but I'd recommend at least exploring this in a followup. If nothing else, it will help us triage failures with the inevitable SDH comes in
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"inevitable" excuse me?!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ha, that wasn't at all a knock on your work here, sorry if it came across that way. Coordinating something like this with a distributed system is bound to have some growing pains. We continue to have them on our team with respect to the authorization system we've built, despite our best efforts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i know, sorry! i just thought that was funny.
await initSavedObjects(); | ||
mlLog.info('Job saved objects initialized for * space'); | ||
} catch (error) { | ||
mlLog.error('Error Initializing jobs'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be good to log the error that we caught to help triage failures in production. Otherwise we may never know why initialization failed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added in d2800ab
} | ||
|
||
async function _needsInitializing() { | ||
if (await _jobSavedObjectsExist()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will we be adding a "repair" option to the management UI in a followup? I worry that a partially initialized set of jobs would confuse end users, and restarting the Kibana server wouldn't help them since we won't attempt initialization if a single SO exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, repairing the saved objects from the management UI will be coming very soon, for 7.11
); | ||
const { initSavedObjects } = repairFactory(client, jobSavedObjectService); | ||
await initSavedObjects(); | ||
mlLog.info('Job saved objects initialized for * space'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How easy is it to get the count of initialized jobs? That might be nice to record in the log message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added in a959ee7
@elasticmachine merge upstream |
if (await _jobsExist()) { | ||
// some ml jobs exist, we need to create those saved objects | ||
return true; | ||
} | ||
|
||
// no ml jobs actually exist, | ||
// that's why there were no saved objects | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
if (await _jobsExist()) { | |
// some ml jobs exist, we need to create those saved objects | |
return true; | |
} | |
// no ml jobs actually exist, | |
// that's why there were no saved objects | |
return false; | |
return await _jobsExist(); |
@@ -388,5 +191,5 @@ export function checksFactory( | |||
}); | |||
} | |||
|
|||
return { checkStatus, repairJobs, initSavedObjects }; | |||
return { checkStatus }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we name it getCurrentStatus
instead?
) { | ||
const { checkStatus } = checksFactory(client, jobSavedObjectService); | ||
|
||
async function repairJobs(simulate: boolean = false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about having thesimulate
check on top and return the default object instead of checking in teach for loop?
const tasks: Array<() => Promise<void>> = []; | ||
|
||
const status = await checkStatus(); | ||
for (const job of status.jobs['anomaly-detector']) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: as there is no break
statements or async
execution, what about using .filter
and .forEach
@elasticmachine merge upstream |
@elasticmachine merge upstream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! One last comment. When spaces is disabled, the management screen returns a 400
/bad request when querying for jobs. For the time being, you'll unfortunately need to strip out namespaces: ['*']
from the find
request when spaces is disabled:
note your internal checks for initialization/repair can optionally keep namespaces: ['*']
in place if that makes your life easier, but this user-facing call will have to be smart enough to exclude it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with the understanding that we'll be adding comprehensive test coverage in a followup prior to release. Nice work!
💚 Build SucceededMetrics [docs]Module Count
Async chunks
Distributable file count
History
To update your PR or re-run it, just comment with: |
* [ML] job saved objects initialization * fixing job count logic * adding missing files * attempting to fix build crash * fixing kibana.json * changes based on review * removing accidentally added export * adding intialization promise * use finally so errors dont stop initialization * function rename * removing duplicate header * adding job initialization count to log message * adding error to log message * moving initialization file * moving intialization file back again to fix git stash issues * removing .kibana index search * creating internal saved object client * code clean up * removing commented code * adding check for spaces enabled * adding ids to saved objects Co-authored-by: Kibana Machine <[email protected]>
* [ML] job saved objects initialization * fixing job count logic * adding missing files * attempting to fix build crash * fixing kibana.json * changes based on review * removing accidentally added export * adding intialization promise * use finally so errors dont stop initialization * function rename * removing duplicate header * adding job initialization count to log message * adding error to log message * moving initialization file * moving intialization file back again to fix git stash issues * removing .kibana index search * creating internal saved object client * code clean up * removing commented code * adding check for spaces enabled * adding ids to saved objects Co-authored-by: Kibana Machine <[email protected]> Co-authored-by: Kibana Machine <[email protected]>
Adds a check during the plugin's start to see if any
ml-job
saved objects exist. If they don't, and ML jobs do exist, it will run an initialization process to create them.Also moves the
mlLog
service to a better location.Remake of #82450