-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML] Jindex: Prefer index config documents to cluster state config #35940
Conversation
Pinging @elastic/ml-core |
if (isDuplicate == false) { | ||
datafeedConfigs.add(clusterStateConfigs.get(clusterStateDatafeedId)); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing is sort of nagging me with this code:
It seems to me that if we are checking Collection Membership all the time, that it should be a Set
, though these collections may be so small, that this sort of optimization is not necessary and may result in worse performance. So, this is a suggestion you are free to ignore if these are adequately small collections.
Set<String> indexConfigIds = datafeedConfigs.stream().map(DatafeedConfig::getId).collect(Collectors.toSet());
...
if(indexConfigIds.contains(clusterStateDatafeedId) == false)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agonised over this for ages with the same nagging feeling. The code looks too complicated but I was erring on the side that the collection would be small making any optimisation premature. I'll make your change as it's more readable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:D, glad to see we agonized over the same thing
// Prefer the index configs and filter duplicates. | ||
Map<String, Job> clusterStateJobs = expandJobsFromClusterState(expression, clusterService.state()); | ||
for (String clusterStateJobId : clusterStateJobs.keySet()) { | ||
boolean isDuplicate = jobs.stream().anyMatch(job -> job.getId().equals(clusterStateJobId)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since jobAndGroupIds
have groupIds and jobIds in a Set
already, why can't we use it to check if we have that job in the index
?
ClusterStateJobUpdate.deleteJob(request, clusterService, listener); | ||
} | ||
} else { | ||
listener.onResponse(deleteResponse.getResult() == DocWriteResponse.Result.DELETED); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is interesting, I understand the change as now we are deleting a document and not just something from the cluster state. However, looking at how our other deletions work that are deleting indexes, they just always return true.
Are we agreeing now that we should modify the AcknowledgedResponse
based on the success of the deletion?
Right now, the deleteDatafeed
method asserts that it is deleted, I guess that throws an exception if the doc is not deleted, where as this simply changes the true
to a false
in the AcknowledgedResponse
. I think these methods should be consistent.
Lines 232 to 233 in a7c78ee
assert deleteResponse.getResult() == DocWriteResponse.Result.DELETED; | |
actionListener.onResponse(deleteResponse); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JobConfigProvider#deleteJob()
takes an errorIfMissing
parameter but DatafeedConfigProvider#deleteDatafeedConfig()
does not. There are inconsistencies here and the value returned by the listener isn't actually checked. I'll try to clean it up in this PR but if it becomes a big change it will be better to raise a separate PR - one I can easily port to the master feature branch as this PR will not be merged into master (by v7 all jobs will run from index configs).
Assertions are enabled during testing java -ea ...
but most people will not enable them in production, here they serve as a sanity check because the document should either be not found or deleted anything else is unexpected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, roger.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the code there are refactorings to be made, some functions take a errorIfMissing
parameter but are only called with it set to false, also there are some functions that are not called anymore (JobConfigProvider.getJobs
). I'll make those changes in a separate PR
The rolling upgrade tests were failing when deleting a job configuration as this change now tries the index first. |
// this may occur during migration of configs. | ||
// Filter the duplicates so we don't update twice for duplicated jobs | ||
for (String clusterStateJobId : clusterStateJobs.keySet()) { | ||
boolean isDuplicate = allJobs.stream().anyMatch(job -> job.getId().equals(clusterStateJobId)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similar concern as before with the usage of a set vs iterating over a list everytime, but no biggie.
Flipping the change in #35940
During the migration of ml configs from the clusterstate to index documents there is a window where the config is extant in both clusterstate and index. In this case prefer the index so all updates must be made to the index doc. This also helps in the case where there is a failure removing config from the clusterstate after copying to the index.