Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow to cancel bannend instances #12772

Closed
Zelldon opened this issue May 16, 2023 · 6 comments · Fixed by #23849
Closed

Allow to cancel bannend instances #12772

Zelldon opened this issue May 16, 2023 · 6 comments · Fixed by #23849
Labels
area/ux Marks an issue as related to improving the user experience component/engine component/zeebe Related to the Zeebe component/team impact/medium Marks issues relieving an occasional pain or fulfilling a reasonable desire kind/feature Categorizes an issue or PR as a feature, i.e. new behavior version:8.3.17 version:8.4.13 version:8.5.9 version:8.6.5 version:8.7.0-alpha1 Label that represents issues released on verions 8.7.0-alpha1

Comments

@Zelldon
Copy link
Member

Zelldon commented May 16, 2023

Is your feature request related to a problem? Please describe.

We have seen quite often now that banning of process instances can happen, even if we think it should be a rare case. If it happens right now it will trash our state, we are currently not allowing to clean up the state.

Banning an instance can happen in multiple cases, one prominent and recent example was that we had a lot of element instances and cancelation failed. This caused the process instance to be banned, meaning all the 100K of element instances are still part of the runtime state.

Describe the solution you'd like

We fixed the cancelation, via allowing to cancel in batches #11355, we should consider allowing the cancelation of a banned instance in order to clean up the state. Even if it fails no harm is done, the command will be rejected.

Describe alternatives you've considered

There is no alternative, which is the problem.

Additional context

We considering to replace banned instances #5121, but this is a long term goal, we should fix the current situation.

@Zelldon Zelldon added kind/feature Categorizes an issue or PR as a feature, i.e. new behavior area/ux Marks an issue as related to improving the user experience component/engine labels May 16, 2023
@Zelldon
Copy link
Member Author

Zelldon commented May 16, 2023

BTW this would allows us also to avoid the step of mimik canceling in operate, which is sometimes necessary to make the banning more visible in operate.

@Zelldon Zelldon changed the title Allow to cancel blacklisted instances Allow to cancel bannend instances Jun 14, 2023
@koevskinikola
Copy link
Member

ZPA triage:

  • The feature is useful and brings a lot of value in it since:
    • it makes the support process more efficient (less steps to cancel a banned process instance)
    • it provides a workaround for handling banned process instances until we replace the banning mechanism with incidents
  • A proposed solution is to just allow regular process instance cancelation for banned process instances, with some error handling if the cancelation is not successful. The challenge would be to cancel nested elements of a banned process instance.
  • @aleksander-dytko please have a look at the priority of this issue. As we noted above, the feature brings a lot of value for our improvement process.

@Zelldon
Copy link
Member Author

Zelldon commented Jul 12, 2023

I'm wondering why PM needs to decide or prioritize that, if engineering sees a need for certain topics (which clearly fall in our space) let's just do it and prioritize it.

Maybe I'm missing something (sorry if so) but I think we should still be able to decide on our own what is necessary to build a great, stable, and reliable product. We can do the discussion also offline.

@aleksander-dytko
Copy link
Contributor

aleksander-dytko commented Jul 24, 2023

We've prioritized the Replace Zeebe Instance Banning concept with regular incident handling for 8.4 to look at this holistically.

@camunda/zeebe-process-automation:

  • How would you estimate the effort to complete this?
  • Do we know how much time is spent (roughly) each time to cancel banned instances?
  • Would this be useful for the complete replacement, or this would be thrown away?

CC: @abbasadel

@korthout
Copy link
Member

korthout commented Jul 25, 2023

How would you estimate the effort to complete this?

Hard to say without a breakdown. I don't think anyone has dedicated time to this yet. It is not trivial, but not very large.

Do we know how much time is spent (roughly) each time to cancel banned instances?

We have no way to cancel banned instances at this time. That is what this feature request wants to change. This means that banned instances always take up space in Zeebe and are visible in Operate, which leads to confused users.

The banned instance can be removed from Operate with a special script, but you'll need to ask them how much time that takes.

Would this be useful for the complete replacement, or this would be thrown away?

If we can completely eradicate the banned instance concept then this feature would be thrown away. If we can't fully replace instance banning with incidents, then this feature will continue to be useful.

@abbasadel abbasadel added the impact/medium Marks issues relieving an occasional pain or fulfilling a reasonable desire label Jul 31, 2023
@abbasadel
Copy link
Contributor

Hi @Zelldon ,

We had another look, and we now have an agreement with the PM that such technical features would be assessed and prioritized by the engineering team as long as the effort does not exceed two weeks of work (X-Large)

We will keep this in the backlog for now and revisit it once we have more capacity.

@romansmirnov romansmirnov added the component/zeebe Related to the Zeebe component/team label Mar 5, 2024
github-merge-queue bot pushed a commit that referenced this issue Oct 28, 2024
## Description

<!-- Describe the goal and purpose of this PR. -->

After this PR, users will be able to cancel banned instances. Therefore,
they will be able to get rid of unnecessarily written timer records.

In addition to automated test cases, following cases are tested
manually:

- Throw exception in the ProcessProcessor onTerminate method for the
first call. Observe rejection
for the first call. Then, send cancellation command again. Expect that
the banned process instance
is cancelled on the second command.
- Throw exception in the ProcessInstanceStateTransitionGuard
checkStateTransition method `case:
TERMINATE_ELEMENT` for the first call. Observe rejection for the first
call. Then, send
cancellation command again. Expect that the banned process instance is
cancelled on the second
command.
- Create a looping process with 120000 timer instances created. Cancel
the process and observe that
the cancellation works with huge amount of timers. Will therefore work
for banned instances with
huge number of timer instances created. (the test setup for the support
case)

Cases above are tested to verify if something goes wrong with the
cancellation, they can be retried (in case they are not a persistent bug
that needs to be resolved with a bugfix).

### Motivation for choosing cancellation as a solution

We chose cancellation as a solution for cleaning up banned instances
because of following reasons:

- Easier to implement
- State cleanup will be handled gracefully
- No need to maintain a separate code piece to cleanup the state
- Users will now have control over banned instances. For example, they
can get rid of banned instances stuck (shown as active forever) in
Operate UI.

## Checklist

<!--- Please delete options that are not relevant. Boxes should be
checked by reviewer. -->
- [ ] for CI changes:
- [ ] structural/foundational changes signed off by [CI
DRI](https://github.com/cmur2)
- [ ]
[ci.yml](https://github.com/camunda/camunda/blob/main/.github/workflows/ci.yml)
modifications comply with ["Unified CI"
requirements](https://github.com/camunda/camunda/wiki/CI-&-Automation#workflow-inclusion-criteria)

## Related issues

closes #14213
closes #12772
github-merge-queue bot pushed a commit that referenced this issue Oct 29, 2024
# Description
Backport of #23849 to `stable/8.4`.

relates to #14213 #12772
original author: @berkaycanbc
github-merge-queue bot pushed a commit that referenced this issue Oct 29, 2024
# Description
Backport of #23849 to `stable/8.6`.

relates to #14213 #12772
original author: @berkaycanbc

Conflict: Just didn't need the last commit of the original PR:
02bc854
github-merge-queue bot pushed a commit that referenced this issue Oct 29, 2024
# Description
Backport of #23849 to `stable/8.3`.

relates to #14213 #12772
original author: @berkaycanbc
github-merge-queue bot pushed a commit that referenced this issue Oct 29, 2024
# Description
Backport of #23849 to `stable/8.5`.

relates to #14213 #12772
original author: @berkaycanbc
@camundait camundait added the version:8.7.0-alpha1 Label that represents issues released on verions 8.7.0-alpha1 label Nov 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ux Marks an issue as related to improving the user experience component/engine component/zeebe Related to the Zeebe component/team impact/medium Marks issues relieving an occasional pain or fulfilling a reasonable desire kind/feature Categorizes an issue or PR as a feature, i.e. new behavior version:8.3.17 version:8.4.13 version:8.5.9 version:8.6.5 version:8.7.0-alpha1 Label that represents issues released on verions 8.7.0-alpha1
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants