Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

On DB2, Update removal time of batch job stacktraces results in transaction timeouts #4067

Closed
3 tasks done
yanavasileva opened this issue Feb 1, 2024 · 1 comment
Closed
3 tasks done
Assignees
Labels

Comments

@yanavasileva
Copy link
Member

yanavasileva commented Feb 1, 2024

Environment (Required on creation)

  • DB2 Version 11.5.7
  • JDBC Driver 4.31.10_1
  • Camunda 7.18

Description (Required on creation; please attach any relevant screenshots, stacktraces, log files, etc. to the ticket)

Batch operation takes too long to complete
Cardinalities:

Query Table Number
select count * from ACT_GE_BYTEARRAY 48 935 974
select count * from ACT_GE_BYTEARRAY WHERE NAME_='job.exceptionByteArray' 3 904 303
select count * from ACT_HI_JOB_LOG 298 948 766
select substr(tabname,1,40), card, stats_time from syscat.tables where tabname in ('ACT_GE_BYTEARRAY', 'ACT_HI_JOB_LOG') ACT_GE_BYTEARRAY 49 000 331
select substr(tabname,1,40), card, stats_time from syscat.tables where tabname in ('ACT_GE_BYTEARRAY', 'ACT_HI_JOB_LOG') ACT_HI_JOB_LOG 300 785 913

Steps to reproduce (Required on creation)

  1. Has cardinalities as described above
  2. Start a batch operation that can have failing jobs that are resolved with retry
  3. Wait for the batch to complete

Observed Behavior (Required on creation)

Transactions timeouts in server log.
Batch is taking longer time to complete even though all jobs are completed. After checking the database logs the execution of update ACT_GE_BYTEARRAY set REMOVAL_TIME_ = ? where ID_ IN ( SELECT JOB_EXCEPTION_STACK_ID_ FROM ACT_HI_JOB_LOG WHERE JOB_DEF_CONFIGURATION_ = ? ) is taking long time to complete

Expected behavior (Required on creation)

No transaction timeouts, the query doesn't slow down batch completion.

Root Cause (Required on prioritization)

With given cardinalities, a full table scan is performed on ACT_GE_BYTEARRAY (n=48 935 974) and no index is used. Query plan:
https://jira.camunda.com/browse/SUPPORT-15726?focusedCommentId=328543&focusedId=328543&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-328543

Solution Ideas

For DB2 database: Adjust the query to include NAME_ = "job.exceptionByteArray" in the WHERE clause to filter out queried byte arrays and potentially use ACT_IDX_BYTEARRAY_NAME to speed up query execution. Query plan:
https://jira.camunda.com/browse/SUPPORT-15726?focusedCommentId=328543&focusedId=328543&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-328543

Total cost original query: 3.11157e+07
Total cost adjusted query: 2.25563e+06
Total cost reduce: 92.75 % faster

Hints

Links

Breakdown

Pull Requests

Preview Give feedback
  1. ci:db2
    joaquinfelici

Dev2QA handover

  • Does this ticket need a QA test and the testing goals are not clear from the description? Add a Dev2QA handover comment
@garima-camunda
Copy link
Contributor

@joaquinfelici The status of ticket is still shown as "Open" even though it has been released in a patch as per this message - https://camunda.slack.com/archives/CHAC0L80M/p1714120865236089 . Could you please check?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants