storage: test GC queue in presence of many intents #15997

Closed
tbg opened this issue May 17, 2017 · 10 comments

Labels
A-storage Relating to our storage engine (Pebble) on-disk storage. C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. X-stale

Comments

@tbg
Member

tbg commented May 17, 2017

We have done little testing of this scenario.

@tbg tbg added this to the 1.1 milestone May 17, 2017
@cuongdo
Contributor

cuongdo commented Aug 22, 2017

@tschottdorf should this move to 1.2? or later?

@tbg
Member Author

tbg commented Aug 22, 2017

1.2 will do for now.

@petermattis petermattis modified the milestones: 1.2, 1.1 Sep 5, 2017
@tbg tbg self-assigned this Sep 18, 2017
@tbg
Member Author

tbg commented Sep 18, 2017

I wrote a little tool that fires up a local single node and writes expired txn entries into its liveness range. I haven't tested how the GC queue reacts yet, but the cockroach debug estimate-gc output looks promising (this is after running for ~5s):

$ ./cockroach debug estimate-gc ./cockroach-data-gcpressurizer/1/ 3
RangeID: 3 [/System/NodeLiveness, /System/NodeLivenessMax):
storage.GCInfo{
    Now:                        hlc.Timestamp{WallTime:1505719085523655593, Logical:0},
    Policy:                     config.GCPolicy{TTLSeconds:86400},
    NumKeysAffected:            0,
    IntentsConsidered:          0,
    IntentTxns:                 0,
    TransactionSpanTotal:       13000,
    TransactionSpanGCAborted:   0,
    TransactionSpanGCCommitted: 0,
    TransactionSpanGCPending:   13000,
    TxnSpanGCThreshold:         hlc.Timestamp{WallTime:1505715485523655593, Logical:0},
    AbortSpanTotal:             1,
    AbortSpanConsidered:        0,
    AbortSpanGCNum:             1,
    PushTxn:                    13000,
    ResolveTotal:               0,
    ResolveSuccess:             0,
    Threshold:                  hlc.Timestamp{WallTime:1505632685523655593, Logical:0},
}

This means that the GC queue will try to push 13k transactions the next time it runs. Each of the transactions holds 100 intents, so after that it will try to resolve 13k*100 = 1.3 million intents.

If a cluster can remain available while this is going on in its liveness range, I think we're fairly safe. I'm sure it won't fare too well when I test this tomorrow.
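
A minimal standalone Go sketch of what these numbers imply, using only the values printed above (the 100-intents-per-transaction figure comes from the test setup described in this comment, not from the output itself):

package main

import (
    "fmt"
    "time"
)

func main() {
    const (
        now                = 1505719085523655593 // GCInfo.Now.WallTime (nanos)
        txnSpanGCThreshold = 1505715485523655593 // GCInfo.TxnSpanGCThreshold.WallTime
        threshold          = 1505632685523655593 // GCInfo.Threshold.WallTime
        ttl                = 86400 * time.Second // GCPolicy.TTLSeconds

        pendingTxnRecords = 13000 // TransactionSpanGCPending / PushTxn
        intentsPerTxn     = 100   // how many intents each test transaction wrote
    )

    // The value GC threshold trails Now by exactly the TTL (24h here).
    fmt.Println("Now - Threshold =", time.Duration(now-threshold), "; TTL =", ttl)

    // The transaction span GC threshold trails Now by 1h, the cutoff below which
    // abandoned transaction records become eligible for GC.
    fmt.Println("Now - TxnSpanGCThreshold =", time.Duration(now-txnSpanGCThreshold))

    // Work implied by the next GC run: one PushTxn per pending transaction record,
    // then one intent resolution per intent those transactions left behind.
    fmt.Println("pushes:", pendingTxnRecords, "resolutions:", pendingTxnRecords*intentsPerTxn)
}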

@tbg
Member Author

tbg commented Sep 18, 2017

cc @nvanbenschoten wrt #18199.

@tbg
Member Author

tbg commented Sep 18, 2017

Yep, as expected this is pretty effective at overwhelming the GC queue at multiple levels.

[image]

@nvanbenschoten
Member

Awesome @tschottdorf! Being able to easily create this pathological condition in a cluster is a big first step towards addressing it. I'd be curious to hear your ideas for improving our ability to perform large-scale garbage collection once we're in this state.

tbg added a commit to tbg/cockroach that referenced this issue Sep 20, 2017
Manual testing in cockroachdb#15997 surfaced that one limiting
factor in resolving many intents is contention on the transaction's abort cache entry. In one
extreme test, I wrote 10E6 abortable intents into a single range, in which case the GC queue sends
very large batches of intent resolution requests for the same transaction to the intent resolver.

These requests all overlapped on the transaction's abort cache key, causing very slow progress, and
ultimately preventing the GC queue from making a dent in the minute allotted to it. Generally this
appears to be a somewhat atypical case, but since @nvanbenschoten observed something similar in
cockroachdb#18199 it seemed well worth addressing, by means of:

1. allowing intent resolution to skip the abort span, and
2. correctly declaring the keys for `ResolveIntent{,Range}` so that the abort cache key is only
   declared if it is actually going to be accessed.

With these changes, the gc queue was able to clear out a million intents comfortably on my older
13" MacBook (single node).

Also use this option in the intent resolver, where possible -- most transactions don't receive abort
cache entries, and intents are often "found" by multiple conflicting writers. We want to avoid
adding artificial contention there, though in many situations the same intent is resolved and so a
conflict still exists.

Migration: a new field number was added to the proto and the old one preserved. We continue to
populate it. Downstream of Raft, we use the new field but if it's unset, synthesize from the
deprecated field. I believe this is sufficient and we can just remove all traces of the old field in
v1.3. (v1.1 uses the old, v1.2 uses the new with compatibility for the old, v1.3 only the new field).
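
To illustrate point 2 of that commit message, here is a minimal self-contained sketch of conditional key declaration. The types and the condition are hypothetical stand-ins, not the actual cockroach batcheval code; the point is only that the abort span key stays out of the declaration unless the resolution will actually touch it.

package main

import "fmt"

// span is a stand-in for a latch/command-queue span over a key.
type span struct{ key string }

// resolveIntentRequest is a hypothetical, simplified request; the real
// ResolveIntent{,Range} requests carry more state (txn status, poison flag).
type resolveIntentRequest struct {
    intentSpan       span
    txnID            string
    updatesAbortSpan bool // will this resolution write/clear the abort span entry?
}

func abortSpanKey(txnID string) span {
    return span{key: "/Local/RangeID/<rangeID>/AbortSpan/" + txnID}
}

// declareKeys returns the spans the command must declare. Declaring the abort
// span key unconditionally made every resolution for the same transaction
// contend on that one key; declaring it only when needed removes that
// serialization point.
func declareKeys(req resolveIntentRequest) []span {
    declared := []span{req.intentSpan}
    if req.updatesAbortSpan {
        declared = append(declared, abortSpanKey(req.txnID))
    }
    return declared
}

func main() {
    req := resolveIntentRequest{intentSpan: span{key: "/Table/51/1/42"}, txnID: "deadbeef"}
    fmt.Println(declareKeys(req)) // no abort span key declared
    req.updatesAbortSpan = true
    fmt.Println(declareKeys(req)) // abort span key declared
}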
tbg added a commit to tbg/cockroach that referenced this issue Sep 20, 2017
Fallout from cockroachdb#18199 and corresponding
testing in cockroachdb#15997.

When the context is expired, there is no point in shooting off another gazillion requests.
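
A minimal sketch of that early-out, with a hypothetical sendBatch helper standing in for the real intent resolver code: check the context between batches and stop as soon as it has expired.

package main

import (
    "context"
    "fmt"
    "time"
)

type batch struct{ id int }

// sendBatch stands in for sending one batch of intent-resolution requests.
func sendBatch(ctx context.Context, b batch) error {
    select {
    case <-time.After(10 * time.Millisecond):
        return nil
    case <-ctx.Done():
        return ctx.Err()
    }
}

// resolveAll stops issuing batches once the context has expired, instead of
// shooting off the remaining requests only for them to fail or pile up.
func resolveAll(ctx context.Context, batches []batch) error {
    for _, b := range batches {
        if err := ctx.Err(); err != nil {
            return err // deadline/cancellation: bail out before sending more
        }
        if err := sendBatch(ctx, b); err != nil {
            return err
        }
    }
    return nil
}

func main() {
    ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
    defer cancel()
    err := resolveAll(ctx, make([]batch, 1000))
    fmt.Println("stopped with:", err) // context deadline exceeded, well before 1000 batches
}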
tbg added a commit to tbg/cockroach that referenced this issue Sep 20, 2017
Fallout from cockroachdb#18199 and corresponding testing in cockroachdb#15997. I think it'll be nontrivial to max out
these budgets in practice, but I can definitely do it in intentionally evil tests, and it's good to
know that there is some rudimentary form of memory accounting in this queue.
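
A rough sketch of that kind of rudimentary accounting, with a made-up budget constant and flush callback rather than the GC queue's actual knobs: accumulate keys for a batch only up to a byte budget, then flush and start over.

package main

import "fmt"

const gcBatchByteBudget = 64 << 10 // assumed 64 KiB budget, purely illustrative

// gcKeysInBatches groups keys into batches whose total key bytes stay under the
// budget, so a range with millions of GC'able keys can't blow up memory (or
// produce one enormous batch) in the queue.
func gcKeysInBatches(keys [][]byte, flush func(batch [][]byte)) {
    var batch [][]byte
    var batchBytes int
    for _, k := range keys {
        if batchBytes+len(k) > gcBatchByteBudget && len(batch) > 0 {
            flush(batch)
            batch, batchBytes = nil, 0
        }
        batch = append(batch, k)
        batchBytes += len(k)
    }
    if len(batch) > 0 {
        flush(batch)
    }
}

func main() {
    keys := make([][]byte, 100000)
    for i := range keys {
        keys[i] = []byte(fmt.Sprintf("/Table/51/1/%08d/0", i))
    }
    batches := 0
    gcKeysInBatches(keys, func(b [][]byte) { batches++ })
    fmt.Println("flushed", batches, "batches")
}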
tbg added a commit to tbg/cockroach that referenced this issue Sep 21, 2017
tbg added a commit to tbg/cockroach that referenced this issue Sep 21, 2017
tbg added a commit to tbg/cockroach that referenced this issue Sep 30, 2017
@tbg
Member Author

tbg commented Oct 25, 2017

@cuongdo I wanted to put this on your radar since it has repeatedly become a problem for users (#19216 being among the latest instances). We'll definitely want to make progress here soon.

@tbg
Member Author

tbg commented Mar 3, 2018

@danhhz you mentioned the other day that you were interested in getting "store directory fixtures" in addition to the backup-based fixtures. This would be a candidate for that. Is there an issue I should link here?

Moving to 2.1 since that's where it will realistically come together. For now, I'm reasonably sure that the badness I was able to create with the experiments above is no more.

@tbg tbg modified the milestones: 2.0, 2.1 Mar 3, 2018
@danhhz
Contributor

danhhz commented Mar 5, 2018

We're still figuring out what the tooling around that will look like, but the 2TB backup test already uses them:

c.status(`downloading store dumps`)
var wg sync.WaitGroup
for node := 1; node <= nodes; node++ {
    node := node
    wg.Add(1)
    go func() {
        defer wg.Done()
        c.Run(ctx, node, `mkdir -p /mnt/data1/cockroach`)
        path := fmt.Sprintf(`gs://cockroach-fixtures/workload/bank/`+
            `version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1/`+
            `stores=%d/%d.tgz`, nodes, node-1)
        c.Run(ctx, node, `gsutil cat `+path+` | tar xvzf - -C /mnt/data1/cockroach/`)
    }()
}
wg.Wait()

@tbg tbg modified the milestones: 2.1, 2.0 Mar 7, 2018
@tbg
Member Author

tbg commented Mar 7, 2018

Great! Optimistically moving this back to 2.0, then.

@petermattis petermattis modified the milestones: 2.0, 2.1 Apr 5, 2018
@tbg tbg added the A-storage Relating to our storage engine (Pebble) on-disk storage. label May 15, 2018
@tbg tbg added the C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. label Jul 22, 2018
@tbg tbg modified the milestones: 2.1, 2.2 Sep 20, 2018
@petermattis petermattis removed this from the 2.2 milestone Oct 5, 2018