[ML] Detect timeout when waiting for download task #103197
Pinging @elastic/ml-core (Team:ML) |
# Conflicts: # x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/utils/TaskRetriever.java
run elasticsearch-ci/part-4 |
LGTM
...n/java/org/elasticsearch/xpack/ml/packageloader/action/TransportLoadTrainedModelPackage.java
x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/utils/TaskRetriever.java
@elasticmachine update branch |
💔 Backport failed
You can use sqren/backport to manually backport by running |
A list tasks timeout indicates the task exists and is in progress. Interpreting the timeout as the task not existing meant the download check would incorrectly assume the download had completed.
# Conflicts:
#   x-pack/plugin/ml-package-loader/src/main/java/org/elasticsearch/xpack/ml/packageloader/action/TransportLoadTrainedModelPackage.java
#   x-pack/plugin/src/yamlRestTest/resources/rest-api-spec/test/ml/3rd_party_deployment.yml
When starting a model deployment there is a check that a download task is in progress for that model id. This check is called with wait_for_completion: true and a timeout of 30 seconds so that the start might wait for the download to finish. The problem is that if the download task is present but does not complete in 30 seconds, it would appear that there was no task.

ListTasksResponse is a BaseTasksResponse; for these types of responses the node and task exceptions should be checked. When the request times out waiting for a task to complete, there are no tasks in the response but there is a node exception with the timeout. The change here is to check for that timeout and return a descriptive error message.

The failing test from #103153 is now called with a 1 second timeout, as there is no point waiting 30 seconds for the test to fail. The intermittent failures occurred when the model download did not complete in the default timeout period of 30s, so the full model definition was not present.
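
The check described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this PR: `TasksResponse`, `WaitTimeoutException`, `findDownloadTask` and the error messages are hypothetical stand-ins, assumed here so the example runs without Elasticsearch on the classpath. It shows only the idea that an empty task list must not be read as "no task" until the node and task failures have been inspected.

```java
import java.util.List;
import java.util.Optional;

public class DownloadTaskCheck {

    /** Hypothetical stand-in for the timeout a node reports when the wait expires. */
    public static class WaitTimeoutException extends RuntimeException {
        public WaitTimeoutException(String message) {
            super(message);
        }
    }

    /** Hypothetical, simplified stand-in for a list tasks response. */
    public static final class TasksResponse {
        final List<String> taskIds;
        final List<Exception> nodeFailures;
        final List<Exception> taskFailures;

        public TasksResponse(List<String> taskIds, List<Exception> nodeFailures, List<Exception> taskFailures) {
            this.taskIds = taskIds;
            this.nodeFailures = nodeFailures;
            this.taskFailures = taskFailures;
        }
    }

    /** Walks the cause chain looking for the (hypothetical) timeout exception. */
    static boolean isTimeout(Throwable t) {
        while (t != null) {
            if (t instanceof WaitTimeoutException) {
                return true;
            }
            t = t.getCause();
        }
        return false;
    }

    /**
     * Returns the first matching task id, or empty if there is genuinely no task.
     * Failures are inspected first, so a timed-out wait is never mistaken for
     * "the download has finished".
     */
    public static Optional<String> findDownloadTask(TasksResponse response, String modelId) {
        for (Exception failure : response.nodeFailures) {
            if (isTimeout(failure)) {
                // The task exists but did not complete within the wait period.
                throw new IllegalStateException(
                    "Timed out waiting for the download task of model [" + modelId + "] to complete", failure);
            }
        }
        if (!response.nodeFailures.isEmpty()) {
            throw new IllegalStateException(
                "Failed to list download tasks for model [" + modelId + "]", response.nodeFailures.get(0));
        }
        if (!response.taskFailures.isEmpty()) {
            throw new IllegalStateException(
                "Failed to list download tasks for model [" + modelId + "]", response.taskFailures.get(0));
        }
        return response.taskIds.stream().findFirst();
    }

    public static void main(String[] args) {
        // No tasks and no failures: the download really is not running.
        TasksResponse finished = new TasksResponse(List.of(), List.of(), List.of());
        System.out.println("task present: " + findDownloadTask(finished, "my-model").isPresent());

        // No tasks but a timeout failure: the download is still in progress.
        TasksResponse timedOut = new TasksResponse(
            List.of(), List.of(new WaitTimeoutException("wait expired")), List.of());
        try {
            findDownloadTask(timedOut, "my-model");
        } catch (IllegalStateException e) {
            System.out.println("error: " + e.getMessage());
        }
    }
}
```

Throwing on the timeout, rather than returning an empty result, keeps the "task still running" and "no task" outcomes distinct for the caller, which is the distinction the original check lost.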
Closes #103153