Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Jindex: Prefer index config documents to cluster state config #35940

Merged
merged 4 commits into from
Nov 28, 2018

Conversation

davidkyle
Copy link
Member

During the migration of ml configs from the clusterstate to index documents there is a window where the config is extant in both clusterstate and index. In this case prefer the index so all updates must be made to the index doc. This also helps in the case where there is a failure removing config from the clusterstate after copying to the index.

@davidkyle davidkyle added >feature :ml Machine learning labels Nov 27, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core

if (isDuplicate == false) {
datafeedConfigs.add(clusterStateConfigs.get(clusterStateDatafeedId));
}
}
Copy link
Member

@benwtrent benwtrent Nov 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing is sort of nagging me with this code:

It seems to me that if we are checking Collection Membership all the time, that it should be a Set, though these collections may be so small, that this sort of optimization is not necessary and may result in worse performance. So, this is a suggestion you are free to ignore if these are adequately small collections.

Set<String> indexConfigIds = datafeedConfigs.stream().map(DatafeedConfig::getId).collect(Collectors.toSet());
...
if(indexConfigIds.contains(clusterStateDatafeedId) == false)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agonised over this for ages with the same nagging feeling. The code looks too complicated but I was erring on the side that the collection would be small making any optimisation premature. I'll make your change as it's more readable

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:D, glad to see we agonized over the same thing

// Prefer the index configs and filter duplicates.
Map<String, Job> clusterStateJobs = expandJobsFromClusterState(expression, clusterService.state());
for (String clusterStateJobId : clusterStateJobs.keySet()) {
boolean isDuplicate = jobs.stream().anyMatch(job -> job.getId().equals(clusterStateJobId));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since jobAndGroupIds have groupIds and jobIds in a Set already, why can't we use it to check if we have that job in the index ?

ClusterStateJobUpdate.deleteJob(request, clusterService, listener);
}
} else {
listener.onResponse(deleteResponse.getResult() == DocWriteResponse.Result.DELETED);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting, I understand the change as now we are deleting a document and not just something from the cluster state. However, looking at how our other deletions work that are deleting indexes, they just always return true.

Are we agreeing now that we should modify the AcknowledgedResponse based on the success of the deletion?

Right now, the deleteDatafeed method asserts that it is deleted, I guess that throws an exception if the doc is not deleted, where as this simply changes the true to a false in the AcknowledgedResponse. I think these methods should be consistent.

assert deleteResponse.getResult() == DocWriteResponse.Result.DELETED;
actionListener.onResponse(deleteResponse);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JobConfigProvider#deleteJob() takes an errorIfMissing parameter but DatafeedConfigProvider#deleteDatafeedConfig() does not. There are inconsistencies here and the value returned by the listener isn't actually checked. I'll try to clean it up in this PR but if it becomes a big change it will be better to raise a separate PR - one I can easily port to the master feature branch as this PR will not be merged into master (by v7 all jobs will run from index configs).

Assertions are enabled during testing java -ea ... but most people will not enable them in production, here they serve as a sanity check because the document should either be not found or deleted anything else is unexpected.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, roger.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the code there are refactorings to be made, some functions take a errorIfMissing parameter but are only called with it set to false, also there are some functions that are not called anymore (JobConfigProvider.getJobs). I'll make those changes in a separate PR

@davidkyle
Copy link
Member Author

The rolling upgrade tests were failing when deleting a job configuration as this change now tries the index first. .ml-config is created from a template when something is written to it, if it does not exist a delete operation errors with IndexNotFoundException the same thing happens with get requests. I've fixed this with the same approach I used on get which is to catch IndexNotFoundException and convert it to ResourceNotFoundException. The same change was required for updating a job or datafeed as those also do a get before putting the update.

@davidkyle davidkyle mentioned this pull request Nov 27, 2018
43 tasks
// this may occur during migration of configs.
// Filter the duplicates so we don't update twice for duplicated jobs
for (String clusterStateJobId : clusterStateJobs.keySet()) {
boolean isDuplicate = allJobs.stream().anyMatch(job -> job.getId().equals(clusterStateJobId));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar concern as before with the usage of a set vs iterating over a list everytime, but no biggie.

@davidkyle davidkyle merged commit a3ce149 into elastic:feature-jindex-6x Nov 28, 2018
@davidkyle davidkyle deleted the prefer-index branch November 28, 2018 09:19
davidkyle added a commit that referenced this pull request Nov 29, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>feature :ml Machine learning
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants