Cancel in-flight search tasks due to search backpressure with 429 sta… #6634

PritLadani
Contributor

@PritLadani PritLadani commented Mar 11, 2023

Description

  • Added support for cancelling tasks with arbitrary status codes
  • Modified search backpressure service to cancel tasks with 429 status code

Issues Resolved

[List any issues this PR will resolve]

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

Comment on lines 262 to 264
    if (cause.getCause() instanceof TaskCancelledException
        && ((TaskCancelledException) cause.getCause()).status() == RestStatus.TOO_MANY_REQUESTS) {
        return ((TaskCancelledException) cause.getCause()).status();
Collaborator

Can we avoid referencing TaskCancelledException in OpenSearchException? This looks like an anti-pattern.

Collaborator

One alternative here is to make it generic and return the underlying cause's status in all cases, up to two levels deep. We need to think about whether that is the right behavior, and also make sure it doesn't cause any regression in existing code (ITs would make sure of that). Something along these lines:

    public RestStatus status() {
        Throwable cause = unwrapCause();
        if (cause == this) {
            return RestStatus.INTERNAL_SERVER_ERROR;
        } else {
            if (cause.getCause() != cause) {
                return ExceptionsHelper.status(cause.getCause());
            }
            return ExceptionsHelper.status(cause);
        }
    }

Contributor Author

trying this

Collaborator

The other, better option is to override this in TaskCancelledException.

Member

@PritLadani Can you explore the second suggestion to do the override in TaskCancelledException? The code snippet above doesn't look quite right because it calls an "unwrapCause()" method but then proceeds to do another level of cause unwrapping. I'm really concerned about unintended side effects of that approach, and I also don't have a ton of confidence that ITs would catch every possible regression.
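
For illustration, the override suggested in this thread might look roughly like the sketch below. The types are simplified stand-ins rather than the real OpenSearch classes, and the two-argument constructor and the default status are assumptions made for the example, not something the thread specifies.

```java
// Simplified stand-ins for the OpenSearch types named in this thread (not the real classes).
enum RestStatus {
    TOO_MANY_REQUESTS(429),
    INTERNAL_SERVER_ERROR(500);

    private final int code;

    RestStatus(int code) {
        this.code = code;
    }

    int getStatus() {
        return code;
    }
}

class OpenSearchException extends RuntimeException {
    OpenSearchException(String message) {
        super(message);
    }

    // In this sketch the base class always reports 500 and knows nothing about its subclasses.
    RestStatus status() {
        return RestStatus.INTERNAL_SERVER_ERROR;
    }
}

class TaskCancelledException extends OpenSearchException {
    private final RestStatus status;

    // Default chosen for the sketch only; the real default is not stated in the thread.
    TaskCancelledException(String message) {
        this(message, RestStatus.INTERNAL_SERVER_ERROR);
    }

    // Hypothetical constructor that lets the caller pick the status code.
    TaskCancelledException(String message, RestStatus status) {
        super(message);
        this.status = status;
    }

    // The override keeps the status decision inside the exception that owns it.
    @Override
    RestStatus status() {
        return status;
    }
}
```

With this shape, the base class needs no instanceof check and no extra level of cause unwrapping; a caller that holds an OpenSearchException simply asks it for status().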

}

    private static boolean isRejection(String reason) {
        return (reason.contains("usage exceeded") || REASON_PARENT_CANCELLED_HIGH_RESOURCE_CONSUMPTION.equals(reason));
Collaborator

Please make "usage exceeded" a constant and use that instead.

Contributor Author

ack
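
A minimal sketch of the requested change, assuming a holder class and constant name that are invented here (`CancellationReasons`, `REASON_USAGE_EXCEEDED`). The value of `REASON_PARENT_CANCELLED_HIGH_RESOURCE_CONSUMPTION` is not shown in the thread, so a placeholder string is used. The null guard is an addition the excerpt above does not have.

```java
// Hypothetical holder class; the PR may keep these constants elsewhere.
final class CancellationReasons {
    // Hypothetical constant name for the literal flagged in the review.
    static final String REASON_USAGE_EXCEEDED = "usage exceeded";

    // Placeholder value: the real string is not shown in this thread.
    static final String REASON_PARENT_CANCELLED_HIGH_RESOURCE_CONSUMPTION =
        "placeholder: parent task cancelled";

    private CancellationReasons() {}

    // Same check as the reviewed excerpt, with the literal replaced by the constant.
    static boolean isRejection(String reason) {
        if (reason == null) {
            return false; // added guard, not part of the original excerpt
        }
        return reason.contains(REASON_USAGE_EXCEEDED)
            || REASON_PARENT_CANCELLED_HIGH_RESOURCE_CONSUMPTION.equals(reason);
    }
}
```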

    onPhaseFailure(
        currentPhase,
        "SearchTask was cancelled",
        new TaskCancelledException(new OpenSearchRejectedExecutionException("cancelled task with reason: " + reason))
Collaborator

This layered exception wrapping does not look right. I think we should introduce a dedicated exception (e.g. TaskBackpressureException or similar) to differentiate between cancellation modes.
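
One way the dedicated-exception idea could be sketched is shown below. `TaskBackpressureException` is the name floated in the comment above; its superclass, its fixed 429 status, and the `Cancellations.forReason` helper are assumptions made for this example, and the other types are simplified stand-ins rather than the real OpenSearch classes.

```java
// Simplified stand-ins for the OpenSearch types named in this thread (not the real classes).
enum RestStatus {
    TOO_MANY_REQUESTS,
    INTERNAL_SERVER_ERROR
}

class OpenSearchException extends RuntimeException {
    OpenSearchException(String message) {
        super(message);
    }

    // Default for the sketch: 500 unless a subclass overrides it.
    RestStatus status() {
        return RestStatus.INTERNAL_SERVER_ERROR;
    }
}

class TaskCancelledException extends OpenSearchException {
    TaskCancelledException(String message) {
        super(message);
    }
}

// Hypothetical exception for cancellations triggered by search backpressure.
class TaskBackpressureException extends OpenSearchException {
    TaskBackpressureException(String reason) {
        super("cancelled task with reason: " + reason);
    }

    // The cancellation mode is carried by the type, so the status is fixed here.
    @Override
    RestStatus status() {
        return RestStatus.TOO_MANY_REQUESTS;
    }
}

// Hypothetical helper that picks the exception type instead of nesting one inside another.
final class Cancellations {
    private Cancellations() {}

    static OpenSearchException forReason(String reason, boolean rejection) {
        return rejection
            ? new TaskBackpressureException(reason)
            : new TaskCancelledException(reason);
    }
}
```

Because the type itself distinguishes the two modes, callers can branch on it or on status() without inspecting a wrapped cause.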

@opensearch-trigger-bot
Contributor

This PR is stalled because it has been open for 30 days with no activity. Remove stalled label or comment or this will be closed in 7 days.

@opensearch-trigger-bot opensearch-trigger-bot bot added the stalled Issues that have stalled label Jul 3, 2023
@PritLadani
Contributor Author

PritLadani commented Jul 5, 2023

Will get back to this. Open to others' contributions, though.

@opensearch-trigger-bot opensearch-trigger-bot bot removed the stalled Issues that have stalled label Jul 6, 2023
@opensearch-trigger-bot
Contributor

This PR is stalled because it has been open for 30 days with no activity. Remove stalled label or comment or this will be closed in 7 days.

@opensearch-trigger-bot opensearch-trigger-bot bot added the stalled Issues that have stalled label Aug 6, 2023
@opensearch-trigger-bot
Contributor

This PR was closed because it has been stalled for 7 days with no activity.

@kotwanikunal
Member

Apologies. This PR was auto closed without reaching a resolution from the maintainers.
Re-opening to move it forward.
Thanks for your contributions to OpenSearch!

@github-actions
Contributor

Compatibility status:

Checks if related components are compatible with change 47dff7e

Incompatible components

Skipped components

Compatible components

@ashking94
Member

@PritLadani Is this being worked on? Please tag the maintainers for closure on this if required.

@PritLadani
Contributor Author

@kkhatua can someone take a look at this, if not already addressed?


    public static void throwTaskCancelledException(String reason) {
        if (isRejection(reason)) {
            throw new TaskCancelledException(new OpenSearchRejectedExecutionException("cancelled task with reason: " + reason));
Contributor

Can't we just throw OpenSearchRejectedExecutionException if the task is rejected?

@opensearch-trigger-bot opensearch-trigger-bot bot removed the stalled Issues that have stalled label Jan 15, 2024
@stephen-crawford
Contributor

Hi @PritLadani, it seems like progress on this PR has stopped for a couple of months. I recommend we close this for the time being so we can keep track of everything going on. @peternied could you help by closing this PR for now?

Prit please feel free to re-open should you take up further work here.

@peternied
Member

@scrawfor99 Thanks for pulling me in, but I don't have much context on this change.

@gbbafna @andrross would you mind taking a look at this PR and making a call on what should happen next?

@opensearch-trigger-bot
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot opensearch-trigger-bot bot added the stalled Issues that have stalled label May 5, 2024
@dblock
Member

dblock commented Jul 15, 2024

Closing. Please reopen if you can/want to finish it.

[Catch All Triage - 1, 2, 3, 4]

@dblock dblock closed this Jul 15, 2024
Labels
stalled Issues that have stalled
10 participants