Fix bugs in CheckpointReadWorker #2

Closed
kaituo wants to merge 2 commits

kaituo
Owner

@kaituo kaituo commented Jun 21, 2021

Description

This PR fixes bugs in CheckpointReadWorker.

1) Remove the unnecessary training in the processCheckpointIteration method, since the preceding call to processEntityCheckpoint has already done the same thing.
2) In MultiGetResponse, a document that is not found is reported as a boolean value, not as an exception. Move the not-found handling out of the exception block and guard it with GetResponse.isExists() instead.
3) Handle the overloaded-cluster case in each GetResponse.
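
To illustrate items 2 and 3, here is a minimal, self-contained sketch of the per-item handling. It is not the code in this PR: Item, Outcome, RetryableFailure, OverloadedFailure, and classify are hypothetical stand-ins so the example runs without OpenSearch on the classpath. The `exists` field plays the role of GetResponse.isExists(), and the two exception types stand in for whatever ExceptionUtil.isRetryAble and ExceptionUtil.isOverloaded detect.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class MultiGetHandlingSketch {

    /** Hypothetical marker for a failure that is worth retrying. */
    static class RetryableFailure extends RuntimeException {}

    /** Hypothetical marker for a rejected request or an unavailable shard. */
    static class OverloadedFailure extends RuntimeException {}

    /** Simplified stand-in for one item of a multi-get response. */
    static class Item {
        final String modelId;
        final boolean exists;     // plays the role of GetResponse.isExists()
        final Exception failure;  // null when the item did not fail

        private Item(String modelId, boolean exists, Exception failure) {
            this.modelId = modelId;
            this.exists = exists;
            this.failure = failure;
        }

        static Item ok(String modelId, boolean exists) {
            return new Item(modelId, exists, null);
        }

        static Item failed(String modelId, Exception failure) {
            return new Item(modelId, false, failure);
        }

        boolean isFailed() {
            return failure != null;
        }
    }

    /** Collects what the caller should do with each model id. */
    static class Outcome {
        final Set<String> found = new LinkedHashSet<>();
        final Set<String> notFound = new LinkedHashSet<>();
        final Set<String> retryable = new LinkedHashSet<>();
        boolean coolDownStarted;
    }

    static Outcome classify(List<Item> items) {
        Outcome out = new Outcome();
        for (Item item : items) {
            if (item.isFailed()) {
                // Failures are inspected per item, so one overloaded shard
                // is noticed even if the other items succeed.
                if (item.failure instanceof RetryableFailure) {
                    out.retryable.add(item.modelId);
                } else if (item.failure instanceof OverloadedFailure) {
                    out.coolDownStarted = true;
                }
                // Any other failure is unexpected; real code would log it.
                continue;
            }
            // A missing document is not a failure: it is a successful
            // response whose "exists" flag is false.
            if (item.exists) {
                out.found.add(item.modelId);
            } else {
                out.notFound.add(item.modelId);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Outcome out = classify(Arrays.asList(
            Item.ok("model-1", true),
            Item.ok("model-2", false),
            Item.failed("model-3", new RetryableFailure()),
            Item.failed("model-4", new OverloadedFailure())));
        System.out.println("found=" + out.found
            + " notFound=" + out.notFound
            + " retryable=" + out.retryable
            + " coolDown=" + out.coolDownStarted);
    }
}
```

The point of the sketch is the branch order: the failure check comes first, and the not-found case is reached only on the non-failure path, guarded by the exists flag.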

Testing done:

  1. Added unit tests for these bugs.

Note: this PR contains some changes related to #3. Because I could not produce a clean diff between them, I sent them as separate PRs.

Issues Resolved

opensearch-project#85

Check List

  • [ X ] Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@kaituo kaituo requested review from ohltyler and jmazanec15 June 21, 2021 20:09
Collaborator

@jmazanec15 jmazanec15 left a comment

LGTM

} else if (ExceptionUtil.isRetryAble(failure)) {
    if (retryableRequests == null) {
        retryableRequests = new HashSet<>();
    }
    retryableRequests.add(modelId);
} else if (ExceptionUtil.isOverloaded(failure)) {
    LOG.error("too many get AD model checkpoint requests or shard not available");
    setCoolDownStart();
} else {
    LOG.info("Unexpected failure", failure);
Collaborator

For another PR: Should this be logged as an error?

Owner Author

yeah, fixed

@ohltyler ohltyler mentioned this pull request Jun 23, 2021
@kaituo kaituo closed this Jun 23, 2021
kaituo added a commit to opensearch-project/anomaly-detection that referenced this pull request Jul 12, 2021
This PR is a conglomerate of the following PRs.

#60
#64
#65
#67
#68
#69
#70
#71
#74
#75
#76
#77
#78
#79
#82
#83
#84
#92
#94
#93
#95
kaituo#1
kaituo#2
kaituo#3
kaituo#4
kaituo#5
kaituo#6
kaituo#7
kaituo#8
kaituo#9
kaituo#10

This spreadsheet contains the mappings from files to PR number (bug fix in my AD fork and tests are not included):
https://gist.github.com/kaituo/9e1592c4ac4f2f449356cb93d0591167
ohltyler pushed a commit to opensearch-project/anomaly-detection that referenced this pull request Sep 1, 2021