[ML] Jindex: Prefer index config documents to cluster state config #35940

davidkyle · 2018-11-27T10:15:53Z

During the migration of ML configs from the cluster state to index documents there is a window where a config exists in both the cluster state and the index. In this case prefer the index, so that all updates are made to the index document. This also helps when removing the config from the cluster state fails after it has been copied to the index.

elasticmachine · 2018-11-27T10:15:55Z

Pinging @elastic/ml-core

benwtrent · 2018-11-27T14:00:04Z

x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/datafeed/DatafeedConfigReader.java

+                        if (isDuplicate == false) {
+                            datafeedConfigs.add(clusterStateConfigs.get(clusterStateDatafeedId));
+                        }
+                    }


One thing is sort of nagging me with this code:

It seems to me that if we are checking collection membership all the time, the collection should be a Set. That said, these collections may be so small that this sort of optimization is unnecessary and may even perform worse. So this is a suggestion you are free to ignore if the collections are adequately small.

Set<String> indexConfigIds = datafeedConfigs.stream().map(DatafeedConfig::getId).collect(Collectors.toSet()); ... if(indexConfigIds.contains(clusterStateDatafeedId) == false)

I agonised over this for ages with the same nagging feeling. The code looks too complicated, but I was erring on the side of assuming the collection would be small, which would make any optimisation premature. I'll make your change as it's more readable.

:D, glad to see we agonized over the same thing
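For reference, the Set-based duplicate check discussed in this thread can be sketched in isolation. This is a minimal illustration under stated assumptions, not the code merged in the PR: `Config` is a hypothetical stand-in for `DatafeedConfig`, and `ConfigMerge` and `mergePreferringIndex` are names invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class ConfigMerge {
    // Hypothetical stand-in for DatafeedConfig: only the ID and origin matter here.
    static final class Config {
        final String id;
        final String source;

        Config(String id, String source) {
            this.id = id;
            this.source = source;
        }

        String getId() {
            return id;
        }
    }

    // Returns the index configs followed by any cluster state configs whose ID
    // is not already present in the index. Index configs win on conflict.
    static List<Config> mergePreferringIndex(List<Config> indexConfigs,
                                             Map<String, Config> clusterStateConfigs) {
        // Build the ID set once so each membership check is a hash lookup
        // rather than a scan of the list.
        Set<String> indexConfigIds = indexConfigs.stream()
                .map(Config::getId)
                .collect(Collectors.toSet());

        List<Config> merged = new ArrayList<>(indexConfigs);
        for (Map.Entry<String, Config> entry : clusterStateConfigs.entrySet()) {
            if (indexConfigIds.contains(entry.getKey()) == false) {
                merged.add(entry.getValue());
            }
        }
        return merged;
    }
}
```

For collections as small as the ones discussed here the difference is mostly about readability: the intent ("is this ID already in the index?") is stated once instead of being re-expressed as a stream scan per ID.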

benwtrent · 2018-11-27T14:02:15Z

x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/job/JobManager.java

+                    // Prefer the index configs and filter duplicates.
+                    Map<String, Job> clusterStateJobs = expandJobsFromClusterState(expression, clusterService.state());
+                    for (String clusterStateJobId : clusterStateJobs.keySet()) {
+                        boolean isDuplicate = jobs.stream().anyMatch(job -> job.getId().equals(clusterStateJobId));


Since jobAndGroupIds already holds the group IDs and job IDs in a Set, why can't we use it to check whether that job is in the index?

benwtrent · 2018-11-27T14:21:23Z

x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/job/JobManager.java

+                            ClusterStateJobUpdate.deleteJob(request, clusterService, listener);
+                        }
+                    } else {
+                        listener.onResponse(deleteResponse.getResult() == DocWriteResponse.Result.DELETED);


This is interesting. I understand the change, since we are now deleting a document and not just removing something from the cluster state. However, our other deletions that delete indices just always return true.

Are we agreeing now that we should modify the AcknowledgedResponse based on the success of the deletion?

Right now the deleteDatafeed method asserts that the document is deleted, which I guess throws an exception if it is not, whereas this simply changes the true to a false in the AcknowledgedResponse. I think these methods should be consistent.

elasticsearch/x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/datafeed/persistence/DatafeedConfigProvider.java

Lines 232 to 233 in a7c78ee

assert deleteResponse.getResult() == DocWriteResponse.Result.DELETED;

actionListener.onResponse(deleteResponse);

JobConfigProvider#deleteJob() takes an errorIfMissing parameter but DatafeedConfigProvider#deleteDatafeedConfig() does not. There are inconsistencies here and the value returned by the listener isn't actually checked. I'll try to clean it up in this PR but if it becomes a big change it will be better to raise a separate PR - one I can easily port to the master feature branch as this PR will not be merged into master (by v7 all jobs will run from index configs).

Assertions are enabled during testing (java -ea ...) but most people will not enable them in production. Here they serve as a sanity check: the document should be either not found or deleted, and anything else is unexpected.

Looking at the code, there are refactorings to be made: some functions take an errorIfMissing parameter but are only called with it set to false, and some functions are no longer called at all (JobConfigProvider.getJobs). I'll make those changes in a separate PR.
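One way to picture the consistency being asked for in this thread is a single helper that both delete paths could share. The sketch below is hypothetical and is not taken from the PR: `Result` and `MissingConfigException` are simplified stand-ins for `DocWriteResponse.Result` and `ResourceNotFoundException`, and `DeleteResults.acknowledged` is an invented name.

```java
class DeleteResults {
    // Hypothetical stand-in for DocWriteResponse.Result, reduced to the two
    // outcomes the thread treats as expected for a delete.
    enum Result { DELETED, NOT_FOUND }

    // Hypothetical stand-in for ResourceNotFoundException.
    static class MissingConfigException extends RuntimeException {
        MissingConfigException(String message) {
            super(message);
        }
    }

    // Maps a delete result to the acknowledged flag. When errorIfMissing is
    // set, a missing document is reported as an error rather than as
    // acknowledged == false.
    static boolean acknowledged(String configId, Result result, boolean errorIfMissing) {
        if (result == Result.NOT_FOUND && errorIfMissing) {
            throw new MissingConfigException("No config with id [" + configId + "]");
        }
        return result == Result.DELETED;
    }
}
```

With a helper like this, the job and datafeed delete paths would differ only in the value they pass for `errorIfMissing`, instead of one asserting and the other returning false.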

davidkyle · 2018-11-27T16:53:57Z

The rolling upgrade tests were failing when deleting a job configuration, as this change now tries the index first. .ml-config is created from a template when something is first written to it. If the index does not exist, a delete operation fails with IndexNotFoundException, and the same thing happens with get requests. I've fixed this with the same approach I used for get, which is to catch IndexNotFoundException and convert it to ResourceNotFoundException. The same change was required for updating a job or datafeed, as those also do a get before putting the update.
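The exception translation described above can be sketched as follows. This is a simplified, synchronous illustration under stated assumptions: the two exception classes are local stand-ins that only share their names with the Elasticsearch classes, `translatingIndexNotFound` is an invented helper, and the real code paths may be structured quite differently (for example around listeners).

```java
import java.util.function.Supplier;

class NotFoundTranslation {
    // Local stand-in: thrown when the config index has not been created yet.
    static class IndexNotFoundException extends RuntimeException {
        IndexNotFoundException(String index) {
            super("no such index [" + index + "]");
        }
    }

    // Local stand-in: the error callers expect for a missing job or datafeed.
    static class ResourceNotFoundException extends RuntimeException {
        ResourceNotFoundException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Runs the operation and reports a missing index as a missing config,
    // keeping the original exception as the cause.
    static <T> T translatingIndexNotFound(String configId, Supplier<T> operation) {
        try {
            return operation.get();
        } catch (IndexNotFoundException e) {
            throw new ResourceNotFoundException("No config with id [" + configId + "]", e);
        }
    }
}
```

From the caller's point of view, a config that is missing because the index was never created is then indistinguishable from one that is missing from an existing index, which is the behaviour the get, delete and update paths need during a rolling upgrade.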

benwtrent · 2018-11-27T17:12:33Z

x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/job/JobManager.java

+                        // this may occur during migration of configs.
+                        // Filter the duplicates so we don't update twice for duplicated jobs
+                        for (String clusterStateJobId : clusterStateJobs.keySet()) {
+                            boolean isDuplicate = allJobs.stream().anyMatch(job -> job.getId().equals(clusterStateJobId));


Similar concern as before about using a Set vs. iterating over a list every time, but no biggie.

Flipping the change in #35940

Prefer index config documents to cluster state config

417624f

davidkyle added >feature :ml Machine learning labels Nov 27, 2018

Fix delete job

a7c78ee

benwtrent reviewed Nov 27, 2018


davidkyle added 2 commits November 27, 2018 16:34

Simplify detecting duplicate configs

833e1e6

Handle index not found errors

925fae4

davidkyle mentioned this pull request Nov 27, 2018

[ML] Metadata Migration Meta issue #32905

Closed

43 tasks

benwtrent approved these changes Nov 27, 2018


davidkyle merged commit a3ce149 into elastic:feature-jindex-6x Nov 28, 2018

davidkyle deleted the prefer-index branch November 28, 2018 09:19

davidkyle mentioned this pull request Nov 28, 2018

[ML] Prefer cluster state config to index documents #36014

Merged

davidkyle added a commit that referenced this pull request Nov 29, 2018

[ML] Prefer cluster state config to index documents (#36014)

4e98158

Flipping the change in #35940
