This repository has been archived by the owner on Nov 14, 2024. It is now read-only.

[TEX] Part 7: wiring data collection #6340

Closed
wants to merge 70 commits into from

ergo14
Contributor

@ergo14 ergo14 commented Oct 31, 2022

General

Wire reportExpectationsCollectedData in the right spots and add metrics.

Priority:
P2
Concerns / possible downsides (what feedback would you like?):
Did I wrap all the spots where transactions are used to run tasks?
Open to suggestions for the metric namespace name (expectations is a little too broad). Also, note that we expect to have another metrics namespace for the violations in future TEX PRs.
Is documentation needed?:
No

Compatibility

Does this PR create any API breaks (e.g. at the Java or HTTP layers) - if so, do we have compatibility?:
No
Does this PR change the persisted format of any data - if so, do we have forward and backward compatibility?:
No
The code in this PR may be part of a blue-green deploy. Can upgrades from previous versions safely coexist? (Consider restarts of blue or green nodes.):
Yes
Does this PR rely on statements being true about other products at a deployment - if so, do we have correct product dependencies on these products (or other ways of verifying that these statements are true)?:
No
Does this PR need a schema migration?
No

Testing and Correctness

What, if any, assumptions are made about the current state of the world? If they change over time, how will we find out?:
No-op wiring, existing tests should suffice
What was existing testing like? What have you done to improve it?:
N/A
If this PR contains complex concurrent or asynchronous code, is it correct? The onus is on the PR writer to demonstrate this.:
N/A
If this PR involves acquiring locks or other shared resources, how do we ensure that these are always released?:
N/A

Execution

How would I tell this PR works in production? (Metrics, logs, etc.):
N/A
Has the safety of all log arguments been decided correctly?:
N/A
Will this change significantly affect our spending on metrics or logs?:
N/A
How would I tell that this PR does not work in production? (monitors, etc.):
N/A
If this PR does not work as expected, how do I fix that state? Would rollback be straightforward?:
recall and rollback
If the above plan is more complex than “recall and rollback”, please tag the support PoC here (if it is the end of the week, tag both the current and next PoC):
N/A

Scale

Would this PR be expected to pose a risk at scale? Think of the shopping product at our largest stack.:
N/A
Would this PR be expected to perform a large number of database calls, and/or expensive database calls (e.g., row range scans, concurrent CAS)?:
N/A
Would this PR ever, with time and scale, become the wrong thing to do - and if so, how would we know that we need to do something differently?:
N/A

Development Process

Where should we start reviewing?:
ExpectationsAwareTransaction
If this PR is in excess of 500 lines excluding versions lock-files, why does it not make sense to split it?:

changelog-app bot commented Oct 31, 2022

Generate changelog in changelog/@unreleased

Type

  • Feature
  • Improvement
  • Fix
  • Break
  • Deprecation
  • Manual task
  • Migration

Description

[TEX] data collection WIP

Check the box to generate changelog(s)

  • Generate changelog entry

@ergo14 ergo14 changed the base branch from develop to tex-pr-1d2 November 1, 2022 11:13
@ergo14 ergo14 changed the base branch from tex-pr-1d2 to develop November 11, 2022 20:01
changelog-app bot commented Nov 11, 2022

Generate changelog in changelog/@unreleased

Type

  • Feature
  • Improvement
  • Fix
  • Break
  • Deprecation
  • Manual task
  • Migration

Description

Transactional expectations: enables metric collection for key-value service read calls on transactions post-mortem.
Metrics tracked per transaction: transaction age, number of bytes read, number of Atlas KVS method calls, and number of bytes read by the worst Atlas KVS call (the one with the most bytes read).

Check the box to generate changelog(s)

  • Generate changelog entry
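
The four per-transaction metrics listed in the changelog description above can be pictured with a minimal sketch. The class and method names here (`ReadStatsSketch`, `recordCall`, and so on) are hypothetical and are not taken from the AtlasDB code in this PR; the sketch only assumes that each key-value service read reports how many bytes it returned.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: names are illustrative, not the classes added in this PR.
final class ReadStatsSketch {
    private final long startNanos;
    private final AtomicLong bytesRead = new AtomicLong();
    private final AtomicLong kvsCalls = new AtomicLong();
    private final AtomicLong maxBytesInOneCall = new AtomicLong();

    ReadStatsSketch(long startNanos) {
        this.startNanos = startNanos;
    }

    // Record one key-value service read call and the number of bytes it returned.
    void recordCall(long bytes) {
        kvsCalls.incrementAndGet();
        bytesRead.addAndGet(bytes);
        // Keep the largest single-call byte count seen so far (the "worst" call).
        maxBytesInOneCall.accumulateAndGet(bytes, Math::max);
    }

    // Transaction age relative to a caller-supplied clock reading.
    long ageNanos(long nowNanos) {
        return nowNanos - startNanos;
    }

    long bytesRead() {
        return bytesRead.get();
    }

    long kvsCalls() {
        return kvsCalls.get();
    }

    long maxBytesInOneCall() {
        return maxBytesInOneCall.get();
    }
}
```

Atomic counters are used on the assumption that reads within one transaction may be recorded from several threads; a single-threaded tracker could use plain fields.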

@ergo14 ergo14 changed the title [TEX] data collection WIP [TEX] Part 6: no-op wiring Nov 14, 2022
@ergo14 ergo14 marked this pull request as ready for review November 14, 2022 13:18
@ergo14 ergo14 changed the title [TEX] Part 6: no-op wiring [TEX] Part 7: no-op wiring Nov 15, 2022
@ergo14 ergo14 changed the base branch from develop to tex-pr-6 November 15, 2022 11:35
@@ -22,7 +22,6 @@
* Implementors of this interface provide methods useful for tracking transactional expectations and whether
* they were breached as well as relevant metrics and alerts. Transactional expectations represent transaction-level
* limits and rules for proper usage of AtlasDB transactions (e.g. reading too much data overall).
* Todo(aalouane): move this out of API once part 4 is merged
*/
Contributor Author

Tried to move this to client for 8 minutes but broke some imports that I couldn't fix quickly; we can look at this together.

Contributor

paired offline, resolved!

Contributor

@mdaudali mdaudali left a comment

paired offline, lgtm! Cutting an RC to test on internal proxy to check metrics are as you expect

(Let's add tests to verify that the report doesn't run if the transaction is neither definitively committed nor aborted, i.e. !(definitively committed || aborted).)


@Override
public void reportExpectationsCollectedData() {
if (!isDefinitivelyCommitted() && !isAborted()) {
Contributor

@mdaudali mdaudali Nov 24, 2022

While correct, I think this would be easier to read after applying De Morgan's laws: !(isDefinitivelyCommitted() || isAborted())
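
As a quick check of the rewrite suggested above, the sketch below compares both forms of the guard over every combination of inputs. The class `DeMorganSketch` and its plain boolean parameters are stand-ins for the real `isDefinitivelyCommitted()` and `isAborted()` calls, which are not reproduced here.

```java
// Hypothetical sketch: boolean parameters stand in for the transaction's real state checks.
final class DeMorganSketch {
    // The form currently in the diff: !committed && !aborted.
    static boolean original(boolean committed, boolean aborted) {
        return !committed && !aborted;
    }

    // The form proposed in review: !(committed || aborted).
    static boolean suggested(boolean committed, boolean aborted) {
        return !(committed || aborted);
    }

    // Exhaustively compare the two forms over all four input combinations.
    static boolean equivalentForAllInputs() {
        boolean[] values = {false, true};
        for (boolean committed : values) {
            for (boolean aborted : values) {
                if (original(committed, aborted) != suggested(committed, aborted)) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

Since the two forms agree on all inputs, the choice between them is purely about readability.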

Contributor

@jeremyk-91 jeremyk-91 left a comment

Broadly looks good.

Do we need tests for the updates?

Comment on lines +2598 to +2600
if (!List.of(State.COMMITTED, State.ABORTED, State.FAILED).contains(state.get())) {
log.error(
"reportExpectationsCollectedData is called on an in-progress transaction",
Contributor

You can probably reuse some of the logic from ensureStillRunning() - it isn't the same, but there should be similar bits you can use :)
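
The logic of ensureStillRunning() is not shown in this conversation, so the following does not reproduce it. It is only a sketch of one way to keep the terminal-state check in the hunk above in a single named place, using an `EnumSet` instead of building a `List` on every call. The `State` enum is a stand-in: `COMMITTED`, `ABORTED` and `FAILED` appear in the diff, while `UNCOMMITTED` and `COMMITTING` are assumed for illustration. Returning a boolean instead of logging is also an assumption made so the behaviour is easy to test.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: not the transaction class touched by this PR.
final class TerminalStateSketch {
    // COMMITTED, ABORTED and FAILED come from the diff; the other two states are assumed.
    enum State { UNCOMMITTED, COMMITTING, COMMITTED, ABORTED, FAILED }

    // States in which the transaction is finished and reporting is allowed.
    private static final Set<State> TERMINAL_STATES =
            EnumSet.of(State.COMMITTED, State.ABORTED, State.FAILED);

    private final AtomicReference<State> state = new AtomicReference<>(State.UNCOMMITTED);
    private int reportsEmitted = 0;

    void setState(State newState) {
        state.set(newState);
    }

    // Returns true if a report was emitted, false if the transaction is still in progress.
    boolean reportExpectationsCollectedData() {
        if (!TERMINAL_STATES.contains(state.get())) {
            return false; // the real code logs an error here instead
        }
        reportsEmitted++;
        return true;
    }

    int reportsEmitted() {
        return reportsEmitted;
    }
}
```

A test along the lines requested earlier in the review could then assert that no report is emitted while the state is in progress, and that one is emitted once the state becomes terminal.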

4 participants