storage: test GC queue in presence of many intents #15997

Closed
tbg opened this issue May 17, 2017 · 10 comments

Labels
A-storage Relating to our storage engine (Pebble) on-disk storage. C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. X-stale

Comments

@tbg
Member

tbg commented May 17, 2017

We have done little testing of this scenario.

@tbg tbg added this to the 1.1 milestone May 17, 2017
@cuongdo
Contributor

cuongdo commented Aug 22, 2017

@tschottdorf should this move to 1.2? or later?

@tbg
Member Author

tbg commented Aug 22, 2017

1.2 will do for now.

@petermattis petermattis modified the milestones: 1.2, 1.1 Sep 5, 2017
@tbg tbg self-assigned this Sep 18, 2017
@tbg
Member Author

tbg commented Sep 18, 2017

I wrote a little tool that fires up a local single node and writes expired txn entries into its liveness range. I haven't tested how the GC queue reacts yet, but the cockroach debug estimate-gc output looks promising (this is after running for ~5s):

$ ./cockroach debug estimate-gc ./cockroach-data-gcpressurizer/1/ 3
RangeID: 3 [/System/NodeLiveness, /System/NodeLivenessMax):
storage.GCInfo{
    Now:                        hlc.Timestamp{WallTime:1505719085523655593, Logical:0},
    Policy:                     config.GCPolicy{TTLSeconds:86400},
    NumKeysAffected:            0,
    IntentsConsidered:          0,
    IntentTxns:                 0,
    TransactionSpanTotal:       13000,
    TransactionSpanGCAborted:   0,
    TransactionSpanGCCommitted: 0,
    TransactionSpanGCPending:   13000,
    TxnSpanGCThreshold:         hlc.Timestamp{WallTime:1505715485523655593, Logical:0},
    AbortSpanTotal:             1,
    AbortSpanConsidered:        0,
    AbortSpanGCNum:             1,
    PushTxn:                    13000,
    ResolveTotal:               0,
    ResolveSuccess:             0,
    Threshold:                  hlc.Timestamp{WallTime:1505632685523655593, Logical:0},
}

This means that the GC queue will try to push 13k transactions the next time it runs. Each of the transactions holds 100 intents, so after that it will try to resolve 13k*100 = 1.3 million intents.

If a cluster can remain available while this is going on in its liveness range, I think we're fairly safe. I'm sure it won't fare too well when I test this tomorrow.
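
A minimal standalone Go sketch of what these numbers imply, using only the values printed above (the 100-intents-per-transaction figure comes from the test setup described in this comment, not from the output itself):

package main

import (
    "fmt"
    "time"
)

func main() {
    const (
        now                = 1505719085523655593 // GCInfo.Now.WallTime (nanos)
        txnSpanGCThreshold = 1505715485523655593 // GCInfo.TxnSpanGCThreshold.WallTime
        threshold          = 1505632685523655593 // GCInfo.Threshold.WallTime
        ttl                = 86400 * time.Second // GCPolicy.TTLSeconds

        pendingTxnRecords = 13000 // TransactionSpanGCPending / PushTxn
        intentsPerTxn     = 100   // how many intents each test transaction wrote
    )

    // The value GC threshold trails Now by exactly the TTL (24h here).
    fmt.Println("Now - Threshold =", time.Duration(now-threshold), "; TTL =", ttl)

    // The transaction span GC threshold trails Now by 1h, the cutoff below which
    // abandoned transaction records become eligible for GC.
    fmt.Println("Now - TxnSpanGCThreshold =", time.Duration(now-txnSpanGCThreshold))

    // Work implied by the next GC run: one PushTxn per pending transaction record,
    // then one intent resolution per intent those transactions left behind.
    fmt.Println("pushes:", pendingTxnRecords, "resolutions:", pendingTxnRecords*intentsPerTxn)
}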

@tbg
Member Author

tbg commented Sep 18, 2017

cc @nvanbenschoten wrt #18199.

@tbg
Member Author

tbg commented Sep 18, 2017

Yep, as expected this is pretty effective at overwhelming the GC queue at multiple levels.

[image]

@nvanbenschoten
Member

Awesome @tschottdorf! Being able to easily create this pathological condition in a cluster is a big first step towards addressing it. I'd be curious to hear your ideas for improving our ability to perform large-scale garbage collection once we're in this state.

tbg added a commit to tbg/cockroach that referenced this issue Sep 20, 2017
Manual testing in cockroachdb#15997 surfaced that one limiting
factor in resolving many intents is contention on the transaction's abort cache entry. In one
extreme test, I wrote 10E6 abortable intents into a single range, in which case the GC queue sends
very large batches of intent resolution requests for the same transaction to the intent resolver.

These requests all overlapped on the transaction's abort cache key, causing very slow progress, and
ultimately preventing the GC queue from making a dent in the minute allotted to it. Generally this
appears to be a somewhat atypical case, but since @nvanbenschoten observed something similar in
cockroachdb#18199 it seemed well worth addressing, by means of:

1. allowing intent resolution to skip the abort span, and
2. correctly declaring the keys for `ResolveIntent{,Range}` so that the abort cache key is only
   declared if it is actually going to be accessed.

With these changes, the gc queue was able to clear out a million intents comfortably on my older
13" MacBook (single node).

Also use this option in the intent resolver, where possible -- most transactions don't receive abort
cache entries, and intents are often "found" by multiple conflicting writers. We want to avoid
adding artificial contention there, though in many situations the same intent is resolved and so a
conflict still exists.

Migration: a new field number was added to the proto and the old one preserved. We continue to
populate it. Downstream of Raft, we use the new field but if it's unset, synthesize from the
deprecated field. I believe this is sufficient and we can just remove all traces of the old field in
v1.3. (v1.1 uses the old, v1.2 uses the new with compatibility for the old, v1.3 only the new field).
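
To illustrate point 2 of that commit message, here is a minimal self-contained sketch of conditional key declaration. The types and the condition are hypothetical stand-ins, not the actual cockroach batcheval code; the point is only that the abort span key stays out of the declaration unless the resolution will actually touch it.

package main

import "fmt"

// span is a stand-in for a latch/command-queue span over a key.
type span struct{ key string }

// resolveIntentRequest is a hypothetical, simplified request; the real
// ResolveIntent{,Range} requests carry more state (txn status, poison flag).
type resolveIntentRequest struct {
    intentSpan       span
    txnID            string
    updatesAbortSpan bool // will this resolution write/clear the abort span entry?
}

func abortSpanKey(txnID string) span {
    return span{key: "/Local/RangeID/<rangeID>/AbortSpan/" + txnID}
}

// declareKeys returns the spans the command must declare. Declaring the abort
// span key unconditionally made every resolution for the same transaction
// contend on that one key; declaring it only when needed removes that
// serialization point.
func declareKeys(req resolveIntentRequest) []span {
    declared := []span{req.intentSpan}
    if req.updatesAbortSpan {
        declared = append(declared, abortSpanKey(req.txnID))
    }
    return declared
}

func main() {
    req := resolveIntentRequest{intentSpan: span{key: "/Table/51/1/42"}, txnID: "deadbeef"}
    fmt.Println(declareKeys(req)) // no abort span key declared
    req.updatesAbortSpan = true
    fmt.Println(declareKeys(req)) // abort span key declared
}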
tbg added a commit to tbg/cockroach that referenced this issue Sep 20, 2017
Fallout from cockroachdb#18199 and corresponding
testing in cockroachdb#15997.

When the context is expired, there is no point in shooting off another gazillion requests.
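
A minimal sketch of that early-out, with a hypothetical sendBatch helper standing in for the real intent resolver code: check the context between batches and stop as soon as it has expired.

package main

import (
    "context"
    "fmt"
    "time"
)

type batch struct{ id int }

// sendBatch stands in for sending one batch of intent-resolution requests.
func sendBatch(ctx context.Context, b batch) error {
    select {
    case <-time.After(10 * time.Millisecond):
        return nil
    case <-ctx.Done():
        return ctx.Err()
    }
}

// resolveAll stops issuing batches once the context has expired, instead of
// shooting off the remaining requests only for them to fail or pile up.
func resolveAll(ctx context.Context, batches []batch) error {
    for _, b := range batches {
        if err := ctx.Err(); err != nil {
            return err // deadline/cancellation: bail out before sending more
        }
        if err := sendBatch(ctx, b); err != nil {
            return err
        }
    }
    return nil
}

func main() {
    ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
    defer cancel()
    err := resolveAll(ctx, make([]batch, 1000))
    fmt.Println("stopped with:", err) // context deadline exceeded, well before 1000 batches
}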
tbg added a commit to tbg/cockroach that referenced this issue Sep 20, 2017
Fallout from cockroachdb#18199 and corresponding testing in cockroachdb#15997. I think it'll be nontrivial to max out
these budgets in practice, but I can definitely do it in intentionally evil tests, and it's good to
know that there is some rudimentary form of memory accounting in this queue.
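
A rough sketch of that kind of rudimentary accounting, with a made-up budget constant and flush callback rather than the GC queue's actual knobs: accumulate keys for a batch only up to a byte budget, then flush and start over.

package main

import "fmt"

const gcBatchByteBudget = 64 << 10 // assumed 64 KiB budget, purely illustrative

// gcKeysInBatches groups keys into batches whose total key bytes stay under the
// budget, so a range with millions of GC'able keys can't blow up memory (or
// produce one enormous batch) in the queue.
func gcKeysInBatches(keys [][]byte, flush func(batch [][]byte)) {
    var batch [][]byte
    var batchBytes int
    for _, k := range keys {
        if batchBytes+len(k) > gcBatchByteBudget && len(batch) > 0 {
            flush(batch)
            batch, batchBytes = nil, 0
        }
        batch = append(batch, k)
        batchBytes += len(k)
    }
    if len(batch) > 0 {
        flush(batch)
    }
}

func main() {
    keys := make([][]byte, 100000)
    for i := range keys {
        keys[i] = []byte(fmt.Sprintf("/Table/51/1/%08d/0", i))
    }
    batches := 0
    gcKeysInBatches(keys, func(b [][]byte) { batches++ })
    fmt.Println("flushed", batches, "batches")
}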
tbg added a commit to tbg/cockroach that referenced this issue Sep 21, 2017
tbg added a commit to tbg/cockroach that referenced this issue Sep 21, 2017
tbg added a commit to tbg/cockroach that referenced this issue Sep 30, 2017
@tbg
Member Author

tbg commented Oct 25, 2017

@cuongdo I wanted to put this on your radar since it has repeatedly become a problem for users (#19216 being among the latest instances). We'll definitely want to make progress here soon.

@tbg
Member Author

tbg commented Mar 3, 2018

@danhhz you mentioned the other day that you were interested in getting "store directory fixtures" in addition to the backup-based fixtures. This would be a candidate for that. Is there an issue I should link here?

Moving to 2.1 since that's where it will realistically come together. For now, I'm reasonably sure that the badness I was able to create with the experiments above is no more.

@tbg tbg modified the milestones: 2.0, 2.1 Mar 3, 2018
@danhhz
Contributor

danhhz commented Mar 5, 2018

We're still figuring out what the tooling around that will look like, but the 2TB backup test already uses them:

c.status(`downloading store dumps`)
var wg sync.WaitGroup
for node := 1; node <= nodes; node++ {
    node := node
    wg.Add(1)
    go func() {
        defer wg.Done()
        c.Run(ctx, node, `mkdir -p /mnt/data1/cockroach`)
        path := fmt.Sprintf(`gs://cockroach-fixtures/workload/bank/`+
            `version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1/`+
            `stores=%d/%d.tgz`, nodes, node-1)
        c.Run(ctx, node, `gsutil cat `+path+` | tar xvzf - -C /mnt/data1/cockroach/`)
    }()
}
wg.Wait()

@tbg tbg modified the milestones: 2.1, 2.0 Mar 7, 2018
@tbg
Member Author

tbg commented Mar 7, 2018

Great! Optimistically moving this back to 2.0, then.

@petermattis petermattis modified the milestones: 2.0, 2.1 Apr 5, 2018
@tbg tbg added the A-storage Relating to our storage engine (Pebble) on-disk storage. label May 15, 2018
@tbg tbg added the C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. label Jul 22, 2018
@tbg tbg modified the milestones: 2.1, 2.2 Sep 20, 2018
@petermattis petermattis removed this from the 2.2 milestone Oct 5, 2018