concurrency: produce tracing metadata #55583

Closed
tbg opened this issue Oct 15, 2020 · 5 comments
Labels: A-kv-observability, A-tracing (Relating to tracing in CockroachDB), C-enhancement (Solution expected to add code/behavior + preserve backward-compat; pg compat issues are exception)

Comments

@tbg
Member

tbg commented Oct 15, 2020

Is your feature request related to a problem? Please describe.
In the context of always-on tracing, we want to emit structured metadata describing contention events. Among other consumers, #55243 will use these events for inclusion in the statements page, and they will also figure prominently in SQL session tracing.

Describe the solution you'd like
Introduce a protobuf that is emitted whenever a transaction conflicts with another transaction during replica evaluation. The guiding principle here should be that, given the information in the protobuf, the SQL layer can construct an explanation of the conflict that is instructive to the average app developer. This definitely needs to include the following (a rough sketch follows below):

  1. the key at which the conflict occurred
  2. when the conflict was first detected
  3. the conflicting transaction (after resolution)

NB: since the metadata is emitted right when the conflict ends, the duration of the conflict is implicitly known.
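
As a non-authoritative sketch of what such a payload could carry (field names and types here are illustrative only, not a proposed definition):

// Illustrative only: one possible shape for the contention payload,
// mirroring the three pieces of information listed above.
type ContentionEvent struct {
	// Key is the key at which the conflict occurred.
	Key roachpb.Key
	// FirstDetected is when the conflict was first detected; since the
	// event is emitted right when the conflict ends, the duration is
	// implied by the emission time.
	FirstDetected time.Time
	// ConflictingTxn is the conflicting transaction, after resolution.
	ConflictingTxn roachpb.Transaction
}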

The action will take place in the lock table waiter, which manages a single conflict here:

select {
case <-newStateC:
	timerC = nil
	state := guard.CurState()
	switch state.kind {
	case waitFor, waitForDistinguished:
		if req.WaitPolicy == lock.WaitPolicy_Error {
			// If the waiter has an Error wait policy, resolve the conflict
			// immediately without waiting. If the conflict is a lock then
			// push the lock holder's transaction using a PUSH_TOUCH to
			// determine whether the lock is abandoned or whether its holder
			// is still active. If the conflict is a reservation holder,
			// raise an error immediately; we know the reservation holder is
			// active.
			if state.held {
				err = w.pushLockTxn(ctx, req, state)
			} else {
				err = newWriteIntentErr(state)
			}
			if err != nil {
				return err
			}
			continue
		}
		// waitFor indicates that the request is waiting on another
		// transaction. This transaction may be the lock holder of a
		// conflicting lock or the head of a lock-wait queue that the
		// request is a part of.
		//
		// waitForDistinguished is like waitFor, except it instructs the
		// waiter to quickly push the conflicting transaction after a short
		// liveness push delay instead of waiting out the full deadlock
		// detection push delay. The lockTable guarantees that there is
		// always at least one request in the waitForDistinguished state for
		// each lock that has any waiters.
		//
		// The purpose of the waitForDistinguished state is to avoid waiting
		// out the longer deadlock detection delay before recognizing and
		// recovering from the failure of a transaction coordinator for
		// *each* of that transaction's previously written intents.
		livenessPush := state.kind == waitForDistinguished
		deadlockPush := true
		// If the conflict is a reservation holder and not a held lock then
		// there's no need to perform a liveness push - the request must be
		// alive or its context would have been canceled and it would have
		// exited its lock wait-queues.
		if !state.held {
			livenessPush = false
		}
		// For non-transactional requests, there's no need to perform
		// deadlock detection because a non-transactional request can
		// not be part of a dependency cycle. Non-transactional requests
		// cannot hold locks or reservations.
		if req.Txn == nil {
			deadlockPush = false
		}
		// If the request doesn't want to perform a push for either
		// reason, continue waiting.
		if !livenessPush && !deadlockPush {
			continue
		}
		// If we know that a lock holder is already finalized (COMMITTED
		// or ABORTED), there's no reason to push it again. Instead, we
		// can skip directly to intent resolution.
		//
		// As an optimization, we defer the intent resolution until we're
		// done waiting on all conflicting locks in this function. This
		// allows us to accumulate a group of intents to resolve and send
		// them together as a batch.
		//
		// Remember that if the lock is held, there will be at least one
		// waiter with livenessPush = true (the distinguished waiter),
		// so at least one request will enter this branch and perform
		// the cleanup on behalf of all other waiters.
		if livenessPush {
			if pusheeTxn, ok := w.finalizedTxnCache.get(state.txn.ID); ok {
				resolve := roachpb.MakeLockUpdate(pusheeTxn, roachpb.Span{Key: state.key})
				deferredResolution = append(deferredResolution, resolve)
				// Inform the LockManager that the lock has been updated with a
				// finalized status so that it gets removed from the lockTable
				// and we are allowed to proceed.
				//
				// For unreplicated locks, this is all that is needed - the
				// lockTable is the source of truth so, once removed, the
				// unreplicated lock is gone. It is perfectly valid for us to
				// instruct the lock to be released because we know that the
				// lock's owner is finalized.
				//
				// For replicated locks, this is a bit of a lie. The lock hasn't
				// actually been updated yet, but we will be conducting intent
				// resolution in the future (before we observe the corresponding
				// MVCC state). This is safe because we already handle cases
				// where locks exist only in the MVCC keyspace and not in the
				// lockTable.
				//
				// In the future, we'd like to make this more explicit.
				// Specifically, we'd like to augment the lockTable with an
				// understanding of finalized but not yet resolved locks. These
				// locks will allow conflicting transactions to proceed with
				// evaluation without the need to first remove all traces of
				// them via a round of replication. This is discussed in more
				// detail in #41720. Specifically, see mention of "contention
				// footprint" and COMMITTED_BUT_NOT_REMOVABLE.
				w.lm.OnLockUpdated(ctx, &deferredResolution[len(deferredResolution)-1])
				continue
			}
		}
		// The request should push to detect abandoned locks due to
		// failed transaction coordinators, detect deadlocks between
		// transactions, or both, but only after delay. This delay
		// avoids unnecessary push traffic when the conflicting
		// transaction is continuing to make forward progress.
		delay := time.Duration(math.MaxInt64)
		if livenessPush {
			delay = minDuration(delay, LockTableLivenessPushDelay.Get(&w.st.SV))
		}
		if deadlockPush {
			delay = minDuration(delay, LockTableDeadlockDetectionPushDelay.Get(&w.st.SV))
		}
		// However, if the pushee has the minimum priority or if the
		// pusher has the maximum priority, push immediately.
		// TODO(nvanbenschoten): flesh these interactions out more and
		// add some testing.
		if hasMinPriority(state.txn) || hasMaxPriority(req.Txn) {
			delay = 0
		}
		if delay > 0 {
			if timer == nil {
				timer = timeutil.NewTimer()
				defer timer.Stop()
			}
			timer.Reset(delay)
			timerC = timer.C
		} else {
			// If we don't want to delay the push, don't use a real timer.
			// Doing so is both a waste of resources and, more importantly,
			// makes TestConcurrencyManagerBasic flaky because there's no
			// guarantee that the timer will fire before the goroutine enters
			// a "select" waiting state on the next iteration of this loop.
			timerC = closedTimerC
		}
		timerWaitingState = state
	case waitElsewhere:
		// The lockTable has hit a memory limit and is no longer maintaining
		// proper lock wait-queues.
		if !state.held {
			// If the lock is not held, exit immediately. Requests will
			// be ordered when acquiring latches.
			return nil
		}
		// The waiting request is still not safe to proceed with
		// evaluation because there is still a transaction holding the
		// lock. It should push the transaction it is blocked on
		// immediately to wait in that transaction's txnWaitQueue. Once
		// this completes, the request should stop waiting on this
		// lockTableGuard, as it will no longer observe lock-table state
		// transitions.
		return w.pushLockTxn(ctx, req, state)
	case waitSelf:
		// Another request from the same transaction is the reservation
		// holder of this lock wait-queue. This can only happen when the
		// request's transaction is sending multiple requests concurrently.
		// Proceed with waiting without pushing anyone.
	case doneWaiting:
		// The request has waited for all conflicting locks to be released
		// and is at the front of any lock wait-queues. It can now stop
		// waiting, re-acquire latches, and check the lockTable again for
		// any new conflicts. If it finds none, it can proceed with
		// evaluation.
		return nil
	default:
		panic("unexpected waiting state")
	}
case <-timerC:
	// If the request was in the waitFor or waitForDistinguished states
	// and did not observe any update to its state for the entire delay,
	// it should push. It may be the case that the transaction is part
	// of a dependency cycle or that the lock holder's coordinator node
	// has crashed.
	timerC = nil
	if timer != nil {
		timer.Read = true
	}
	// If the request is conflicting with a held lock then it pushes its
	// holder synchronously - there is no way it will be able to proceed
	// until the lock's transaction undergoes a state transition (either
	// completing or being pushed) and then updates the lock's state
	// through intent resolution. The request has a dependency on the
	// entire conflicting transaction.
	//
	// However, if the request is conflicting with another request (a
	// reservation holder) then it pushes the reservation holder
	// asynchronously while continuing to listen to state transitions in
	// the lockTable. This allows the request to cancel its push if the
	// conflicting reservation exits the lock wait-queue without leaving
	// behind a lock. In this case, the request has a dependency on the
	// conflicting request but not necessarily the entire conflicting
	// transaction.
	if timerWaitingState.held {
		err = w.pushLockTxn(ctx, req, timerWaitingState)
	} else {
		// It would be more natural to launch an async task for the push
		// and continue listening on this goroutine for lockTable state
		// transitions, but doing so is harder to test against. Instead,
		// we launch an async task to listen to lockTable state and
		// synchronously push. If the watcher goroutine detects a
		// lockTable change, it cancels the context on the push.
		pushCtx, pushCancel := context.WithCancel(ctx)
		go w.watchForNotifications(pushCtx, pushCancel, newStateC)
		err = w.pushRequestTxn(pushCtx, req, timerWaitingState)
		if errors.Is(pushCtx.Err(), context.Canceled) {
			// Ignore the context canceled error. If this was for the
			// parent context then we'll notice on the next select.
			err = nil
		}
		pushCancel()
	}
	if err != nil {
		return err
	}
case <-ctxDoneC:
	return roachpb.NewError(ctx.Err())
case <-shouldQuiesceC:
	return roachpb.NewError(&roachpb.NodeUnavailableError{})
}

We'll want to create the protobuf at the top of the method, register it with the registry of inflight operations (also part of the always-on tracing work), and emit it to the trace span (if any) when the conflict has been handled.
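
A minimal sketch of that emission step, under the assumption that ContentionEvent is the payload sketched above and that the tracing side exposes something along the lines of tracing.SpanFromContext and Span.RecordStructured (the registry hookup is elided):

// Sketch only, not the actual integration: a helper that WaitOn could call
// once a conflict has been handled. WaitOn would record firstDetected at
// the top of the method (or when the conflict is first observed) and pass
// it in here, making the duration implicit.
func emitContentionEvent(
	ctx context.Context, key roachpb.Key, conflictingTxn roachpb.Transaction, firstDetected time.Time,
) {
	sp := tracing.SpanFromContext(ctx)
	if sp == nil {
		// Not tracing; nothing to emit.
		return
	}
	// Assumed API: attach the structured payload to the current Span.
	sp.RecordStructured(&ContentionEvent{
		Key:            key,
		FirstDetected:  firstDetected,
		ConflictingTxn: conflictingTxn,
	})
}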

Note that the concurrency manager also has a second path that is taken for conflict resolution, via write intent errors. This also has a subcase that needs to emit an event, though it is not thought to be commonly hit (~only when the lock table grows too large).

if wait {
	for i := range t.Intents {
		intent := &t.Intents[i]
		if err := m.ltw.WaitOnLock(ctx, g.Req, intent); err != nil {
			m.FinishReq(g)
			return nil, err
		}
	}
}

The existing tests in the concurrency package lend themselves well to verifying that the correct events are emitted. The datadriven tests should be updated to automatically print any events that are emitted, which means the existing tests will already give us good coverage.

Describe alternatives you've considered

Additional context

@tbg tbg added A-kv-observability C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) labels Oct 15, 2020
@tbg
Member Author

tbg commented Oct 28, 2020

Additional context will follow, but as a starting point, consumers of this can roughly expect to get handed a slice of (protobuf)

type ContentionEvent struct {
  Key roachpb.Key
  Txn roachpb.Transaction // pushee at the end of conflict resolution
  Duration time.Duration // time spent contending on the key against that txn in that one instance
  Outcome enum // pushed?
}

@jordanlewis
Member

@tbg, is the Txn in this contention event the transaction that was contended with, or the transaction that is contending? In other words, if I'm a transaction and I get this event, is the Txn my same transaction or another transaction?

@tbg
Member Author

tbg commented Nov 3, 2020

The other transaction. My expectation is that the "own transaction" is separately available, but if this turns out not to be the case for some unexpected reason (or the exact snapshot of the transaction at the moment of the conflict matters) it's not a problem to include it in the event as well.

craig bot pushed a commit that referenced this issue Dec 1, 2020
56906: sql: produce mock ContentionEvents and display contention time in EXPLAIN ANALYZE r=RaduBerinde,tbg a=asubiotto

Please take a look at individual commits for details

Release note: None

Closes #56612 

@tbg could you take a look at the first commit which defines a `ContentionEvent` protobuf? It's close to what's described in #55583 (minus some fields that can be added later).

Co-authored-by: Alfonso Subiotto Marques <[email protected]>
tbg added a commit to tbg/cockroach that referenced this issue Jan 5, 2021
This is a (very dirty) first stab at cockroachdb#55583 that I am posting to answer
the following questions:

1. is this basic approach sane or should contention metadata be
generated elsewhere (i.e. not in `WaitOn`)? `WaitOn` does not
necessarily see all updates, though I would take the point of
view that a contending transaction that a request never ends
up waiting on (presumably because it is busy handling other
conflicts) is not one we need to observe.
2. how to best structure the code so that contention events
are tracked. Completing the current approach would likely
result in spaghetti code that is easy to get wrong. We
seem to want to keep a bit of state around; more than we
do with the current structure which intentionally minimizes
state held across iterations of the wait loop. In particular,
it's unclear to me whether `state.txn` and/or `state.key` can
change within a single invocation of `WaitOn`. I think this
all means that I need to understand this code (in particular
that on the other side of newStateC) better.

Release note: None
@tbg
Member Author

tbg commented Jan 6, 2021

#58444 will address the bulk of the work here, sans the outcome. As far as I can tell (just talked this through with Nathan) the outcome is either

  • some unknown hard error occurred (not much to do with those)
  • WriteIntentError if WaitPolicy_ERROR was used
  • the contending transaction was taken care of, i.e. it no longer is in the way via an intent or in-mem lock (though its writes may lead to our txn requiring a restart later)

Leaving this issue open to add the outcome enum later.
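
As a rough, non-authoritative sketch of what such an Outcome enum could enumerate, following the cases above (the names are made up for illustration):

// Sketch only: illustrative outcome values mirroring the cases above.
type ContentionOutcome int32

const (
	// An unknown hard error occurred while resolving the conflict.
	OutcomeError ContentionOutcome = iota
	// A WriteIntentError was returned because WaitPolicy_ERROR was used.
	OutcomeWriteIntentError
	// The contending transaction was taken care of and is no longer in the
	// way (though its writes may still force a restart of our txn later).
	OutcomeResolved
)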

tbg added a commit to tbg/cockroach that referenced this issue Jan 6, 2021
This change attaches a protobuf payload to the current Span whenever a
request conflicts with another transaction. The payload contains the
contending txn (i.e. the pushee) at the time at which it was first
encountered, the key on which the conflict occurred (note that this is
not necessarily the key at which the pushee is anchored) and the time
spent waiting on the conflict (excluding intent resolution).

This enables cockroachdb#57114.

Touches cockroachdb#55583. I am not closing that issue yet because we also want
to potentially track the outcome of the conflict.

Release note: None
@tbg tbg added the A-tracing Relating to tracing in CockroachDB. label Jan 20, 2021
craig bot pushed a commit that referenced this issue Jan 21, 2021
58444: concurrency: emit structured contention information to trace r=irfansharif,nvanbenschoten a=tbg

This change attaches a protobuf payload to the current Span whenever a
request conflicts with another transaction. The payload contains the
contending txn (i.e. the pushee) at the time at which it was first
encountered, the key on which the conflict occurred (note that this is
not necessarily the key at which the pushee is anchored) and the time
spent waiting on the conflict (excluding intent resolution).

This enables #57114.

Touches #55583. I am not closing that issue yet because we also want
to potentially track the outcome of the conflict.

Release note: None

Co-authored-by: Tobias Grieger <[email protected]>
@tbg
Member Author

tbg commented Feb 24, 2021

We produce the metadata now, and while I said earlier that I wanted to leave this issue open to track improvements, I no longer consider that valuable. We can file new issues as we ship the current contention metadata and determine steps to improve it.

@tbg tbg closed this as completed Feb 24, 2021