Fix memory CB bugs and upgrade UTs to be compatible with core changes #2469

Merged
merged 2 commits into from
May 23, 2024

Collaborator

@Zhangxunmt Zhangxunmt commented May 22, 2024

Description

This PR fixes the following problems:

  1. Circuit breaker (CB) checks were always skipped in a single-node cluster when handling a Predict request. Single-node prediction now runs the same CB checks before predicting.
  2. The log message now shows the correct content when the CB is open. See "[BUG] Wrong error message when memory circuit breaker is open" #2465.
  3. The memory CB is disabled when the threshold is 100. See "[BUG] Multiple calls of model deploy API causes exception from Memory Circuit Breaker" #2308.
  4. Added ClusterApplierService to the UTs for compatibility with the latest OpenSearch core changes.
  5. CB checks are skipped for remote models in Register/Deploy/Predict.

Verified in both single-node and multi-node clusters. UT/ITs added to cover all cases.
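
The behaviour described in items 1, 3 and 5 above can be pictured with a small sketch. This is not the ml-commons implementation: the class, method, and parameter names (MemoryBreakerSketch, checkBeforeRun, usedPercent) are hypothetical, and the "greater than" comparison is an assumption. The only details taken from this PR are that a threshold of 100 disables the memory CB, that remote models skip the check, and that the check is meant to run before predicting regardless of cluster size.

```java
import java.util.function.IntSupplier;

/** Illustrative sketch only; every name here is hypothetical, not an ml-commons API. */
class MemoryBreakerSketch {
    /** A threshold of 100 is treated as "breaker disabled", per the PR description. */
    static final int DISABLED_THRESHOLD = 100;

    private final int thresholdPercent;
    private final IntSupplier usedPercent; // stand-in for a real memory usage probe

    MemoryBreakerSketch(int thresholdPercent, IntSupplier usedPercent) {
        this.thresholdPercent = thresholdPercent;
        this.usedPercent = usedPercent;
    }

    /** "Open" means new work is rejected. Never open when the threshold is 100. */
    boolean isOpen() {
        if (thresholdPercent >= DISABLED_THRESHOLD) {
            return false;
        }
        // Assumption: the breaker opens when usage is strictly above the threshold.
        return usedPercent.getAsInt() > thresholdPercent;
    }

    /**
     * Check to run before register/deploy/predict. Remote models skip it.
     * The caller is expected to invoke this on the same path whether the
     * cluster has one node or many.
     */
    void checkBeforeRun(boolean remoteModel) {
        if (remoteModel) {
            return; // remote models do not use local memory, so no check
        }
        if (isOpen()) {
            throw new IllegalStateException(
                "Memory circuit breaker is open, usage exceeds " + thresholdPercent + "%");
        }
    }

    public static void main(String[] args) {
        MemoryBreakerSketch strict = new MemoryBreakerSketch(85, () -> 90);
        MemoryBreakerSketch disabled = new MemoryBreakerSketch(100, () -> 99);
        System.out.println(strict.isOpen());   // usage 90 is above threshold 85
        System.out.println(disabled.isOpen()); // threshold 100 disables the breaker
        strict.checkBeforeRun(true);           // remote model: check is skipped
    }
}
```

A local (non-remote) call such as strict.checkBeforeRun(false) would throw in this sketch, while the same call on the disabled instance would pass.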

Issues Resolved

#2308
#2465

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@Zhangxunmt Zhangxunmt had a problem deploying to ml-commons-cicd-env May 22, 2024 22:43 — with GitHub Actions Failure
@Zhangxunmt Zhangxunmt had a problem deploying to ml-commons-cicd-env May 22, 2024 22:51 — with GitHub Actions Failure
@Zhangxunmt Zhangxunmt had a problem deploying to ml-commons-cicd-env May 22, 2024 22:59 — with GitHub Actions Failure
@Zhangxunmt Zhangxunmt had a problem deploying to ml-commons-cicd-env May 23, 2024 20:21 — with GitHub Actions Failure
@Zhangxunmt Zhangxunmt temporarily deployed to ml-commons-cicd-env May 23, 2024 20:21 — with GitHub Actions Inactive
@Zhangxunmt Zhangxunmt temporarily deployed to ml-commons-cicd-env May 23, 2024 21:15 — with GitHub Actions Inactive
@Zhangxunmt Zhangxunmt merged commit f88b6d6 into opensearch-project:main May 23, 2024
12 of 13 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request May 23, 2024
…2469)

* fix memory CB bugs

Signed-off-by: Xun Zhang <[email protected]>

* change CB limit exception code to 429 and skip CB check for remote models

Signed-off-by: Xun Zhang <[email protected]>

---------

Signed-off-by: Xun Zhang <[email protected]>
(cherry picked from commit f88b6d6)
  Long value = (Long) mlStats.getStat(MLNodeLevelStat.ML_CIRCUIT_BREAKER_TRIGGER_COUNT).getValue();
- assertEquals(1L, value.longValue());
+ assertEquals(0L, value.longValue());

The test is named testRun_CircuitBreakerOpen(), but it now asserts that no circuit breaker is open. Maybe we can at least rename the test?

Zhangxunmt added a commit that referenced this pull request May 23, 2024
…2469) (#2472)

* fix memory CB bugs

Signed-off-by: Xun Zhang <[email protected]>

* change CB limit exception code to 429 and skip CB check for remote models

Signed-off-by: Xun Zhang <[email protected]>

---------

Signed-off-by: Xun Zhang <[email protected]>
(cherry picked from commit f88b6d6)

Co-authored-by: Xun Zhang <[email protected]>
@opensearch-trigger-bot
Contributor

The backport to 2.11 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.11 2.11
# Navigate to the new working tree
cd .worktrees/backport-2.11
# Create a new branch
git switch --create backport/backport-2469-to-2.11
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 f88b6d60730afb71f3dce6d3fb65d5f5b085e7bb
# Push it to GitHub
git push --set-upstream origin backport/backport-2469-to-2.11
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.11

Then, create a pull request where the base branch is 2.11 and the compare/head branch is backport/backport-2469-to-2.11.

@opensearch-trigger-bot
Contributor

The backport to 2.9 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.9 2.9
# Navigate to the new working tree
cd .worktrees/backport-2.9
# Create a new branch
git switch --create backport/backport-2469-to-2.9
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 f88b6d60730afb71f3dce6d3fb65d5f5b085e7bb
# Push it to GitHub
git push --set-upstream origin backport/backport-2469-to-2.9
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.9

Then, create a pull request where the base branch is 2.9 and the compare/head branch is backport/backport-2469-to-2.9.

@opensearch-trigger-bot
Contributor

The backport to 2.13 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.13 2.13
# Navigate to the new working tree
cd .worktrees/backport-2.13
# Create a new branch
git switch --create backport/backport-2469-to-2.13
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 f88b6d60730afb71f3dce6d3fb65d5f5b085e7bb
# Push it to GitHub
git push --set-upstream origin backport/backport-2469-to-2.13
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.13

Then, create a pull request where the base branch is 2.13 and the compare/head branch is backport/backport-2469-to-2.13.

4 participants