Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix mysql metric for innodb row lock time #7289

Merged
merged 1 commit into from
Aug 7, 2020

Conversation

sayap
Copy link
Contributor

@sayap sayap commented Aug 5, 2020

What does this PR do?

Previously, the code retrieved the internal innodb metric returned by
SHOW STATUS first, and then tried to add the row lock time from any
in-flight query on top of it, based on SHOW ENGINE INNODB STATUS.

The intention was probably to make the metric more accurate, because
the internal innodb metric for row lock time will only be incremented
once a query has completed (either successfully or after exceeding
innodb_lock_wait_timeout).

However, the addition actually didn't work, because we are just merging
2 python dictionaries. So, we would end up with either the accumulating
counter value from SHOW STATUS when there is no in-flight query
waiting for row lock, or the manually calculated value from parsing
SHOW ENGINE INNODB STATUS otherwise. Flipping between these 2 values,
the rate calculated by datadog agent would be useless and confusing.

This is fixed by returning the internal innodb metric as-is. Trying to
add the row lock time from in-flight queries would introduce complexity
without adding much benefit, as the default innodb_lock_wait_timeout
is only 50 seconds anyway.

Motivation

The wrong metric made the graph looks confusing, and also prevented us from setting the alert threshold.

Additional Notes

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • PR title must be written as a CHANGELOG entry (see why)
  • Files changes must correspond to the primary purpose of the PR as described in the title (small unrelated changes should have their own PR)
  • PR must have changelog/ and integration/ labels attached

Previously, the code retrieved the internal innodb metric returned by
`SHOW STATUS` first, and then tried to add the row lock time from any
in-flight query on top of it, based on `SHOW ENGINE INNODB STATUS`.

The intention was probably to make the metric more accurate, because
the internal innodb metric for row lock time will only be incremented
once a query has completed (either successfully or after exceeding
`innodb_lock_wait_timeout`).

However, the addition actually didn't work, because we are just merging
2 python dictionaries. So, we would end up with either the accumulating
counter value from `SHOW STATUS` when there is no in-flight query
waiting for row lock, or the manually calculated value from parsing
`SHOW ENGINE INNODB STATUS` otherwise. Flipping between these 2 values,
the rate calculated by datadog agent would be useless and confusing.

This is fixed by returning the internal innodb metric as-is. Trying to
add the row lock time from in-flight queries would introduce complexity
without adding much benefit, as the default `innodb_lock_wait_timeout`
is only 50 seconds anyway.
@hithwen hithwen merged commit 3a0f4a4 into DataDog:master Aug 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants