reading specific rows never returns #18199

jcsdt · 2017-09-04T16:06:32Z

Bug report

We are running a 3 nodes cluster under v1.0.5

Node 1:

I170904 17:40:50.553387 303137322 util/log/clog.go:910  [config] file created at: 2017/09/04 17:40:50
I170904 17:40:50.553387 303137322 util/log/clog.go:910  [config] running on machine: online-cockroach-a
I170904 17:40:50.553387 303137322 util/log/clog.go:910  [config] binary: CockroachDB CCL v1.0.5 (linux amd64, built 2017/08/24 17:43:46, go1.8.3)
I170904 17:40:50.553387 303137322 util/log/clog.go:910  [config] arguments: [/home/lefty/cockroach start --store /data/1/cockroach --store /data/2/cockroach --store /data/3/cockroach --host=0.0.0.0 --advertise-host=10.91.208.131 --certs-dir /home/lefty/cockroach_certs --http-host 0.0.0.0 --http-port 8080 --join 10.91.151.188 --join 10.91.155.209]

Node 2:

I170904 17:44:10.433957 250568015 util/log/clog.go:910  [config] file created at: 2017/09/04 17:44:10
I170904 17:44:10.433957 250568015 util/log/clog.go:910  [config] running on machine: online-cockroach-b
I170904 17:44:10.433957 250568015 util/log/clog.go:910  [config] binary: CockroachDB CCL v1.0.5 (linux amd64, built 2017/08/24 17:43:46, go1.8.3)
I170904 17:44:10.433957 250568015 util/log/clog.go:910  [config] arguments: [/home/lefty/cockroach start --store /data/1/cockroach --store /data/2/cockroach --store /data/3/cockroach --host=0.0.0.0 --advertise-host=10.91.151.188 --certs-dir /home/lefty/cockroach_certs --http-host 0.0.0.0 --http-port 8080 --join 10.91.208.131 --join 10.91.155.209]

Node 3:

I170904 17:44:34.696304 36972590 util/log/clog.go:910  [config] file created at: 2017/09/04 17:44:34
I170904 17:44:34.696304 36972590 util/log/clog.go:910  [config] running on machine: online-cockroach-c
I170904 17:44:34.696304 36972590 util/log/clog.go:910  [config] binary: CockroachDB CCL v1.0.5 (linux amd64, built 2017/08/24 17:43:46, go1.8.3)
I170904 17:44:34.696304 36972590 util/log/clog.go:910  [config] arguments: [/home/lefty/cockroach start --store /data/1/cockroach --store /data/2/cockroach --store /data/3/cockroach --host=0.0.0.0 --advertise-host=10.91.155.209 --certs-dir /home/lefty/certs --http-host 0.0.0.0 --http-port 8080 --join 10.91.208.131 --join 10.91.151.188]

Some of our queries stopped returning results, even simple queries such as
SELECT id from users where id = '2970788235'; ( id is the primary key of the table )

We tried those queries from both our java code and cockroach cli.
While this happens for specific rows, the server continues to work fine when reading others.

Not sure it is related but our cockroach logs output a lot of

W170904 17:30:35.746218 241 storage/gc_queue.go:417  [gc,n1,s2,r10122/3:/Table/52/1/"53{19312…-20860…}] unable to resolve intents of committed txn on gc: context deadline exceeded
W170904 17:30:35.746247 241 storage/gc_queue.go:417  [gc,n1,s2,r10122/3:/Table/52/1/"53{19312…-20860…}] unable to resolve intents of committed txn on gc: context deadline exceeded
W170904 17:30:35.746295 241 storage/gc_queue.go:417  [gc,n1,s2,r10122/3:/Table/52/1/"53{19312…-20860…}] unable to resolve intents of committed txn on gc: context deadline exceeded
W170904 17:30:35.746343 241 storage/gc_queue.go:417  [gc,n1,s2,r10122/3:/Table/52/1/"53{19312…-20860…}] unable to resolve intents of committed txn on gc: context deadline exceeded
W170904 17:30:35.746392 241 storage/gc_queue.go:417  [gc,n1,s2,r10122/3:/Table/52/1/"53{19312…-20860…}] unable to resolve intents of committed txn on gc: context deadline exceeded
W170904 17:30:35.746445 241 storage/gc_queue.go:417  [gc,n1,s2,r10122/3:/Table/52/1/"53{19312…-20860…}] unable to resolve intents of committed txn on gc: context deadline exceeded
W170904 17:30:35.746506 241 storage/gc_queue.go:417  [gc,n1,s2,r10122/3:/Table/52/1/"53{19312…-20860…}] unable to resolve intents of committed txn on gc: context deadline exceeded
W170904 17:30:35.746536 241 storage/gc_queue.go:417  [gc,n1,s2,r10122/3:/Table/52/1/"53{19312…-20860…}] unable to resolve intents of committed txn on gc: context deadline exceeded
W170904 17:30:35.746565 241 storage/gc_queue.go:417  [gc,n1,s2,r10122/3:/Table/52/1/"53{19312…-20860…}] unable to resolve intents of committed txn on gc: context deadline exceeded
W170904 17:30:35.746605 241 storage/gc_queue.go:417  [gc,n1,s2,r10122/3:/Table/52/1/"53{19312…-20860…}] unable to resolve intents of committed txn on gc: context deadline exceeded
W170904 17:30:35.746642 241 storage/gc_queue.go:417  [gc,n1,s2,r10122/3:/Table/52/1/"53{19312…-20860…}] unable to resolve intents of committed txn on gc: context deadline exceeded
W170904 17:30:35.746693 241 storage/gc_queue.go:417  [gc,n1,s2,r10122/3:/Table/52/1/"53{19312…-20860…}] unable to resolve intents of committed txn on gc: context deadline exceeded
W170904 17:30:35.746739 241 storage/gc_queue.go:417  [gc,n1,s2,r10122/3:/Table/52/1/"53{19312…-20860…}] unable to resolve intents of committed txn on gc: context deadline exceeded
W170904 17:30:35.746777 241 storage/gc_queue.go:417  [gc,n1,s2,r10122/3:/Table/52/1/"53{19312…-20860…}] unable to resolve intents of committed txn on gc: context deadline exceeded
W170904 17:30:35.746897 302617246 storage/gc_queue.go:879  [gc,n1,s2,r10122/3:/Table/52/1/"53{19312…-20860…}] push of txn id=ac2d8029 key=/Table/52/1/"5320052451#1500557009361"/0 rw=false pri=0.01513283 iso=SERIALIZABLE stat=PE
NDING epo=0 ts=1500557007.703880494,0 orig=0.000000000,0 max=0.000000000,0 wto=false rop=false seq=3 failed: context deadline exceeded
W170904 17:30:35.746907 302617051 storage/gc_queue.go:879  [gc,n1,s2,r10122/3:/Table/52/1/"53{19312…-20860…}] push of txn id=e925a948 key=/Table/52/1/"5319772163#1500556927647"/0 rw=false pri=0.00135358 iso=SERIALIZABLE stat=PE
NDING epo=0 ts=1500556918.902844582,0 orig=0.000000000,0 max=0.000000000,0 wto=false rop=false seq=4 failed: context deadline exceeded
W170904 17:30:35.746912 302617057 storage/gc_queue.go:879  [gc,n1,s2,r10122/3:/Table/52/1/"53{19312…-20860…}] push of txn id=e93d4ffc key=/Table/52/1/"5320670056#1500972905215"/0 rw=false pri=0.03541256 iso=SERIALIZABLE stat=PE
NDING epo=0 ts=1500972901.427410919,0 orig=0.000000000,0 max=0.000000000,0 wto=false rop=false seq=3 failed: context deadline exceeded
W170904 17:30:35.746933 302617247 storage/gc_queue.go:879  [gc,n1,s2,r10122/3:/Table/52/1/"53{19312…-20860…}] push of txn id=541e71ce key=/Table/52/1/"5320052451#1500557180174"/0 rw=false pri=0.02997946 iso=SERIALIZABLE stat=PE
NDING epo=0 ts=1500557104.540700487,0 orig=0.000000000,0 max=0.000000000,0 wto=false rop=false seq=5 failed: context deadline exceeded
W170904 17:30:35.746947 302617189 storage/gc_queue.go:879  [gc,n1,s2,r10122/3:/Table/52/1/"53{19312…-20860…}] push of txn id=aae047d9 key=/Table/52/1/"5320063883#1500516197839"/0 rw=false pri=0.01194207 iso=SERIALIZABLE stat=PE
NDING epo=0 ts=1500516197.750966587,0 orig=0.000000000,0 max=0.000000000,0 wto=false rop=false seq=3 failed: context deadline exceeded

The text was updated successfully, but these errors were encountered:

nvanbenschoten · 2017-09-05T19:29:16Z

Hi @jcsdt, thanks for the report! Would you mind grepping your logs on node 1 for the string ,r10122/ up to the point where you saw the first unable to resolve intents of committed txn on gc: context deadline exceeded error and posting the results?

Another option would be to look at debug tracing. If possible, could you run SET CLUSTER SETTING trace.debug.enable = true? This will allow us to peer into all requests on the admin UI's /debug/requests page. From here we can look at all active requests by clicking on the [<number> active] button. This should be pretty loud based on those logs and should be able to point out where those GC requests are getting stuck.

In terms of what's going wrong here, the gcQueue has the default timeout of 1m, so this doesn't seem like a timeout issue (like #18155). Without a bit more information it might be tough to track down.

christian-lefty · 2017-09-05T20:37:35Z

root@localhost:26257/> SET CLUSTER SETTING trace.debug.enable = true;
pq: unknown cluster setting 'trace.debug.enable'

Is this expected ?

christian-lefty · 2017-09-05T20:47:04Z

It's not listed on that page anyways
https://www.cockroachlabs.com/docs/stable/cluster-settings.html

nvanbenschoten · 2017-09-05T20:59:30Z

Ah yeah, you're running v1.0.5. SET CLUSTER SETTING trace.debug.enable = true will be in v1.1 but for now, we'll have to start Cockroach with the environment variable COCKROACH_ENABLE_TRACING=true.

Based on your logs it looks like you've already tried restarting the cluster. Is that correct?

christian-lefty · 2017-09-05T21:03:12Z

Yes, we've restarted, we've also upgraded from 10.4 to 1.0.5

Anyways I did the environment variable thing. Shall I just send you access to the UI ?

nvanbenschoten · 2017-09-05T21:04:21Z

That would be great. You can email me if you'd prefer.

nvanbenschoten · 2017-09-05T21:44:50Z

I'm seeing the same retry proposal 5e933a46d8e7d317: reasonTicks issue we saw in #17524. Looking back at #17741, it looks like that was a symptom over there as well. Could they all be the same root issue?

petermattis · 2017-09-05T22:20:09Z

The bug that caused #17741 is definitely in 1.0.x which is why we're putting the fix into 1.0.6 (not yet released).

nvanbenschoten · 2017-09-06T15:01:44Z

Before the rolling restart to v1.1-alpha.20170817 is saw a number of ranges which had lease epochs beneath their leaseholder's node liveness epoch. This was a clear symptom of the bug that caused #17741. However, the range that I was hoping to see exhibit this behavior (r10122, the one in the retry loop) strangely wasn't.

@christian-lefty it looks like you restarted your cluster this morning. Has the issue gone away? If this is what we're suspecting then it shouldn't have because that bug wasn't fixed yet in the version of Cockroach you're running.

christian-lefty · 2017-09-06T15:45:44Z

Hey so we rolled v1.1-alpha.20170817 this didnt fix our issue

nvanbenschoten · 2017-09-06T16:17:45Z

The fix for the issue I've referenced will be in tomorrow's alpha release. This should hopefully fix the issue we're seeing on your cluster. If you're able to compile from source then you can check out the SHA 3cab35b and build that. If not, I can send you the alpha binary that we're going to publish tomorrow.

christian-lefty · 2017-09-06T16:20:59Z

If it's easy for you to send it that would be better cause I've never built CDB from source so I'd have to setup everything.

nvanbenschoten · 2017-09-06T16:26:16Z

Sure, I'll email you the binary.

nvanbenschoten · 2017-09-07T20:11:13Z

@christian-lefty are you still seeing similar logs messages to the ones posted above?

nvanbenschoten · 2017-09-08T07:06:22Z

Yeah, it looks like you're still having issues with your cluster. Specifically, there seem to be some stray intents that our GC process is having issues cleaning up. I'm consistently seeing:

[gc,n3,s7,r10568/4:/Table/52/1/"56{16648â€¦-20705â€¦}] unable to resolve intents of committed txn on gc: context deadline exceeded

in the low verbosity logs I have access to. It seems like these stray intents are blocking any requests that happen to come upon them, which would explain the logs like:

"[gc,n2,s6,r9216/3:/Table/52/2/NULL/"54{35â€¦-54â€¦}] push of txn id=3d0a91cd key=/Table/52/1/"5436607633#1496793886960"/0 rw=false pri=0.01699221 iso=SERIALIZABLE stat=PENDING epo=0 ts=1496793894.500072856,1 orig=0.000000000,0 max=0.000000000,0 wto=false rop=false seq=11 failed: context deadline exceeded"

and corresponding goroutines stuck for hours in maybePushTransactions.

On top of this, there are a few goroutines that have been stuck in beginCmds and a few that have been stuck in redirectOnOrAcquireLease for about the same amount of time. At the moment, I'm not sure what to make of these other than that they're probably the pushTxnRequests we're looking for and that they're getting clogged up somewhere.

I don't think we're currently tracking any issues where intents for a committed transaction have gotten stuck. @bdarnell or @tschottdorf, do you know of any that I'm forgetting/not finding?

It's tough to tell from the outside what exactly is causing this issue. Right now I think a load balancer is getting in the way of request tracing (that may be why the /debug/requests page is only showing a single active trace), which removes an opportunity to get more insight into what's going wrong. I'm also really missing the new logspy mode right now!

christian-lefty · 2017-09-08T15:38:17Z

Hi Nathan, so what are our options now? Should we rollback to a binary before 1.0? is that even gonna work?

On Fri, 8 Sep 2017 at 09:06, Nathan VanBenschoten ***@***.***> wrote: Yeah, it looks like you're still having issues with your cluster. Specifically, there seem to be some stray intents that our GC process is having issues cleaning up. I'm consistently seeing: [gc,n3,s7,r10568/4:/Table/52/1/"56{16648â€¦-20705â€¦}] unable to resolve intents of committed txn on gc: context deadline exceeded in the low verbosity logs I have access to. It seems like these stray intents are blocking any requests that happen to come upon them, which would explain the logs like: "[gc,n2,s6,r9216/3:/Table/52/2/NULL/"54{35â€¦-54â€¦}] push of txn id=3d0a91cd key=/Table/52/1/"5436607633#1496793886960"/0 rw=false pri=0.01699221 iso=SERIALIZABLE stat=PENDING epo=0 ts=1496793894.500072856,1 orig=0.000000000,0 max=0.000000000,0 wto=false rop=false seq=11 failed: context deadline exceeded" and corresponding goroutines stuck for hours in maybePushTransactions. On top of this, there are a few goroutines that have been stuck in beginCmds and a few that have been stuck in redirectOnOrAcquireLease for about the same amount of time. At the moment, I'm not sure what to make of these other than that they're probably the pushTxnRequests we're looking for and that they're getting clogged up somewhere. I don't think we're currently tracking any issues where intents for a committed transaction have gotten stuck. @bdarnell <https://github.com/bdarnell> or @tschottdorf <https://github.com/tschottdorf>, do you know of any that I'm forgetting/not finding? It's tough to tell from the outside what exactly is causing this issue. Right now I think a load balancer is getting in the way of request tracing (that may be why the /debug/requests page is only showing a single active trace), which removes an opportunity to get more insight into what's going wrong. I'm also really missing the new logspy <#18081> mode right now! — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#18199 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ANDDMJ2WqOIfCudQ2myFTZmdeg-KWH_tks5sgOeBgaJpZM4PMFaW> .

-- Christian Rivasseau Co-founder and CTO @ Lefty +33 6 67 35 26 74

bdarnell · 2017-09-08T16:42:19Z

@christian-lefty Downgrading from 1.1 to 1.0.x is safe[1]. Going back further than that is not.

Do you have any clients that might be keeping a long-running transaction alive?

Reducing kv.gc.batch_size might help. Try SET CLUSTER SETTING kv.gc.batch_size=1000; (it defaults to 100k). Maybe even try setting it to 1 to see if that gets things unwedged (I think it would be problematic to run with a GC batch size of 1 in the long term, but it might help here)

@nvanbenschoten Could it be the lastTerm issue? This looks like a lease/liveness issue, not anything specific to intents or transactions.

My recommendation is to move forward with a fresh build from the release-1.1 branch to get the lastTerm fix, and the logspy endpoint if that doesn't fix it.

[1] Details on downgrade safety: the final step in the upgrade process will be to run SET CLUSTER SETTING VERSION='1.1'. Once this has been done, downgrading is no longer allowed (but if you just run the new binary without this step, you can go back). This is done automatically when version 1.1 initializes a new cluster, so you can't downgrade a cluster to a version that is older than the one used to create it.

bdarnell · 2017-09-08T16:47:58Z

Here is a pre-built linux binary you can use.

nvanbenschoten · 2017-09-08T20:46:57Z

@bdarnell I don't think this is the lastTerm issue. @jcsdt initially ran into this on v1.0.5, which was before the lastTerm caching was introduced. It has also remained across a few restarts, so whatever is stuck must be persistently stored.

Another interesting datapoint is that I'm seeing GCs fail because of lease acquisition issues. I also see a lease acquisition issues on the node liveness range in the traces (requested lease overlaps previous lease). The range debug page for node liveness looks fine now though, other than 15 dropped commands on a follower.

The kv.gc.batch_size is a good idea. At the very least, that should help isolate exactly what's stuck by cleaning up anything around it. @christian-lefty if you do run the binary provided above, let me know. The logspy endpoint introduced there should provide a lot better visibility in the problem.

jcsdt · 2017-09-11T16:19:58Z

@nvanbenschoten we deployed the binary above so you can access logspy

We also set kv.gc.batch_size=1; but we're still seeing

unable to resolve intents of committed txn on gc: context deadline exceeded

nvanbenschoten · 2017-09-12T19:05:26Z

@jcsdt thanks for updating the binary. The logspy endpoint has been helpful in letting us identify part of the error. It looks like we're having serious issues performing RangeLookups for the range that contains the stuck intents (think DNS). Would you mind running the command SET CLUSTER SETTING trace.debug.enable = 'true' one more time, so that we can watch a trace of this lookup-cache-evict loop?

christian-lefty · 2017-09-12T21:28:38Z

Hey @nvanbenschoten just enabled traces if you wanna take a look

nvanbenschoten · 2017-09-13T05:59:14Z

Thanks @christian-lefty! I'm seeing node1 being brought up and down. Is this intentional on your end?

Also, I know you're still seeing the unable to resolve intents log messages, but I'm curious if the initial symptom of SELECT id from users where id = '2970788235'; not finishing is still visible. I ask because throughout my debugging I haven't actually been able to track down anything that's completely stuck. What I have seen is very backed up replication on certain ranges resulting in cascading performance degradations elsewhere.

The cause of this slow replication is unclear to me but could be due in part to large hotspots of activity in the client workload on the user_archives table. This hotspot might be around the key range /Table/52/1/"#1505243733147"-/Table/52/1/"1000875591#1487783699744", although it seems to always be moving slowly. Here, I'm seeing about 4000 (large) Raft commands/sec on a single range. Does a workload like this sound characteristic of your application? An example of what this might look like is an application that inserts into the user_archives table about 4000 times per second with primary keys (in this case, id) that are sequentially ordered.

petermattis · 2017-09-18T15:37:38Z

Fortunately, with a little refactoring in etcd/raft I think we can avoid the need for a precise count. What we require here is to ensure that we never have more than one config change in flight at a time. If we simply assume pessimistically that the tail of the log has a config change (so that the new leader cannot propose a config change until it has applied all entries up to the point of its election) and we can skip the scan.

When would you clear raft.pendingConf? Currently that field gets cleared when the conf change is applied. Keeping track of the the current last index and watching for when that index is committed seems doable, but a bit tricky. Did you have a simpler idea in mind?

bdarnell · 2017-09-18T15:47:21Z

This might also be expected because I suspect the uncommitted entries in n2 and n3 forked a long time ago when the other issue began. It might fix itself once the index of the msgApps sent to n2 gets down to n2's lastAppliedIndex of 1689719.

Yeah, I think the slow recovery here is "expected" for this pathological case. It's probing one entry at a time for the log index where they diverged because it is very strange for two leaders to be able to ping-pong like this and each accumulating their own conflicting fork of the log.

Immediately jumping back to the lastAppliedIndex (so that n2 truncates its entire uncommitted log tail and n3 ships it a new copy of the log) would be optimal in this case, but would result in unnecessary log copies in cases where the divergence is small. I'm not sure which case is better to optimize for (maybe just take larger steps? This seems like a difficult heuristic to tune since it only matters in rare cases)

When would you clear raft.pendingConf? Currently that field gets cleared when the conf change is applied. Keeping track of the the current last index and watching for when that index is committed seems doable, but a bit tricky. Did you have a simpler idea in mind?

My idea is to track the last index (instead of a bool pendingConf, it would be configChangeBlockedUntilIndex). I don't think it will be tricky because it doesn't need to persist across leadership changes.

christian-lefty · 2017-09-18T16:15:02Z

@nvanbenschoten actually our queries are still hanging.
Should I still wait some more? Let me know when the cluster looks stable.

nvanbenschoten · 2017-09-19T06:52:11Z

@christian-lefty yeah I can see that range 3591 is still struggling. node1 is currently waiting on a snapshot for that range, but I suspect that there is a long queue to perform snapshots at the moment since we only allow a single snapshot at a time. I don't have any solid proof for that yet though, so I'll do some digging.

Could you try sending that query to one of the other two node's SQL gateways? Also, yes let's keep this cluster up for a bit longer.

nvanbenschoten · 2017-09-20T04:15:31Z

The probe to catch n1, r11844 up is still ongoing, but it is slowly making progress. Based on its current rate, it should be done within the next few hours. I'm not sure why this slowed down so much as it was probing at about 20 indices/sec before and it's all the way down to about 10 indices/min. The only clue I have here is that I have seen the splitQueue's use of MVCCFindSplitKey showing up in profiles, which @tschottdorf and I witnessed have a very troubling effect earlier today. I'm sure he'll pursue this further in #15997.

It looks like the snapshots for n1, r3591 are still failing. I was finally able to see the following error message:

E170920 05:48:34.003360 127 storage/queue.go:656  [raftsnapshot,n3,s7,r3591/1:/Table/{SystemCon…-11}] snapshot failed: rate: Wait(n=1) would exceed context deadline

That log is produced by the rate package here. It indicates that a context timeout occurred when processing the snapshot. Just like with the gcQueue, the raftSnapshotQueue has a processing timeout of only 1 minute, which seems exceptionally low to me. When we need to send a snapshot to revive a struggling replica, I'd expect us to be liberal with our timeouts since this is the last strategy we have before abandoning slow replicas. I'm going to bump this timeout up to 10 minutes and see if that will help to revive this range. Let's hold off the restart necessary for this until the probing on r11844 completes though.

As with the first issue, I suspect the slow snapshots are due to engine-level slowdowns, possibly related to the tight looping MVCCFindSplitKey calls.

@nvanbenschoten

Manual testing in cockroachdb#15997 surfaced that one limiting factor in resolving many intents is contention on the transaction's abort cache entry. In one extreme test, I wrote 10E6 abortable intents into a single range, in which case the GC queue sends very large batches of intent resolution requests for the same transaction to the intent resolver. These requests all overlapped on the transaction's abort cache key, causing very slow progress, and ultimately preventing the GC queue from making a dent in the minute allotted to it. Generally this appears to be a somewhat atypical case, but since @nvanbenschoten observed something similar in cockroachdb#18199 it seemed well worth addressing, by means of 1. allow intent resolutions to not touch the abort span 2. correctly declare the keys for `ResolveIntent{,Range}` to only declare the abort cache key if it is actually going to be accessed. With these changes, the gc queue was able to clear out a million intents comfortably on my older 13" MacBook (single node). Also use this option in the intent resolver, where possible -- most transactions don't receive abort cache entries, and intents are often "found" by multiple conflicting writers. We want to avoid adding artificial contention there, though in many situations the same intent is resolved and so a conflict still exists. Migration: a new field number was added to the proto and the old one preserved. We continue to populate it. Downstream of Raft, we use the new field but if it's unset, synthesize from the deprecated field. I believe this is sufficient and we can just remove all traces of the old field in v1.3. (v1.1 uses the old, v1.2 uses the new with compatibility for the old, v1.3 only the new field).

Fallout from cockroachdb#18199 and corresponding testing in cockroachdb#15997. When the context is expired, there is no point in shooting off another gazillion requests.

Fallout from cockroachdb#18199 and corresponding testing in cockroachdb#15997. I think it'll be nontrivial to max out these budgets in practice, but I can definitely do it in intentionally evil tests, and it's good to know that there is some rudimentary form of memory accounting in this queue.

Fallout from cockroachdb#18199 and corresponding testing in cockroachdb#15997. When the context is expired, there is no point in shooting off another gazillion requests.

@nvanbenschoten

Manual testing in cockroachdb#15997 surfaced that one limiting factor in resolving many intents is contention on the transaction's abort cache entry. In one extreme test, I wrote 10E6 abortable intents into a single range, in which case the GC queue sends very large batches of intent resolution requests for the same transaction to the intent resolver. These requests all overlapped on the transaction's abort cache key, causing very slow progress, and ultimately preventing the GC queue from making a dent in the minute allotted to it. Generally this appears to be a somewhat atypical case, but since @nvanbenschoten observed something similar in cockroachdb#18199 it seemed well worth addressing, by means of 1. allow intent resolutions to not touch the abort span 2. correctly declare the keys for `ResolveIntent{,Range}` to only declare the abort cache key if it is actually going to be accessed. With these changes, the gc queue was able to clear out a million intents comfortably on my older 13" MacBook (single node). Also use this option in the intent resolver, where possible -- most transactions don't receive abort cache entries, and intents are often "found" by multiple conflicting writers. We want to avoid adding artificial contention there, though in many situations the same intent is resolved and so a conflict still exists. Migration: a new field number was added to the proto and the old one preserved. We continue to populate it. Downstream of Raft, we use the new field but if it's unset, synthesize from the deprecated field. I believe this is sufficient and we can just remove all traces of the old field in v1.3. (v1.1 uses the old, v1.2 uses the new with compatibility for the old, v1.3 only the new field).

nvanbenschoten · 2017-11-16T22:34:24Z

Closing this, since we've already identified the cause of the problems faced here and have opened focused issues to address each. Thanks for all the patience and help during this process @christian-lefty!

dianasaur323 added the community-questions label Sep 5, 2017

dianasaur323 added this to the 1.1 milestone Sep 5, 2017

petermattis assigned cuongdo Sep 5, 2017

cuongdo assigned tbg and nvanbenschoten and unassigned cuongdo and tbg Sep 5, 2017

petermattis mentioned this issue Sep 19, 2017

storage: avoid reading uncommitted tail of Raft log when becoming leader #18601

Closed

tbg mentioned this issue Sep 20, 2017

storage: add "noop" intent resolution poisoning option #18635

Closed

tbg mentioned this issue Sep 20, 2017

storage: short-circuit GC queue when context expires #18642

Merged

This was referenced Sep 20, 2017

storage: do not gather up too many intents/keys #18643

Closed

storage: investigate busy loop in split queue #18646

Closed

tbg mentioned this issue Sep 21, 2017

cherrypick-1.1: storage: short-circuit GC queue when context expires #18672

Merged

NathanRThomas mentioned this issue Oct 25, 2017

High CPU load on an idle cluster with context deadline exceeded error. #19216

Closed

bdarnell mentioned this issue Nov 15, 2017

core: command reproposed many times after cluster restart #20039

Closed

tbg mentioned this issue Nov 15, 2017

storage: optimize the GC queue #20052

Closed

4 tasks

nvanbenschoten closed this as completed Nov 16, 2017

jordanlewis added C-question A question rather than an issue. No code/spec/doc change needed. O-community Originated from the community and removed O-deprecated-community-questions labels Apr 24, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

reading specific rows never returns #18199

reading specific rows never returns #18199

jcsdt commented Sep 4, 2017

nvanbenschoten commented Sep 5, 2017

christian-lefty commented Sep 5, 2017

christian-lefty commented Sep 5, 2017

nvanbenschoten commented Sep 5, 2017

christian-lefty commented Sep 5, 2017

nvanbenschoten commented Sep 5, 2017

nvanbenschoten commented Sep 5, 2017

petermattis commented Sep 5, 2017

nvanbenschoten commented Sep 6, 2017

christian-lefty commented Sep 6, 2017

nvanbenschoten commented Sep 6, 2017

christian-lefty commented Sep 6, 2017

nvanbenschoten commented Sep 6, 2017

nvanbenschoten commented Sep 7, 2017

nvanbenschoten commented Sep 8, 2017

christian-lefty commented Sep 8, 2017 via email

bdarnell commented Sep 8, 2017

bdarnell commented Sep 8, 2017

nvanbenschoten commented Sep 8, 2017

jcsdt commented Sep 11, 2017

nvanbenschoten commented Sep 12, 2017

christian-lefty commented Sep 12, 2017

nvanbenschoten commented Sep 13, 2017 •

edited

Loading

petermattis commented Sep 18, 2017

bdarnell commented Sep 18, 2017

christian-lefty commented Sep 18, 2017

nvanbenschoten commented Sep 19, 2017

nvanbenschoten commented Sep 20, 2017

nvanbenschoten commented Nov 16, 2017

reading specific rows never returns #18199

reading specific rows never returns #18199

Comments

jcsdt commented Sep 4, 2017

nvanbenschoten commented Sep 5, 2017

christian-lefty commented Sep 5, 2017

christian-lefty commented Sep 5, 2017

nvanbenschoten commented Sep 5, 2017

christian-lefty commented Sep 5, 2017

nvanbenschoten commented Sep 5, 2017

nvanbenschoten commented Sep 5, 2017

petermattis commented Sep 5, 2017

nvanbenschoten commented Sep 6, 2017

christian-lefty commented Sep 6, 2017

nvanbenschoten commented Sep 6, 2017

christian-lefty commented Sep 6, 2017

nvanbenschoten commented Sep 6, 2017

nvanbenschoten commented Sep 7, 2017

nvanbenschoten commented Sep 8, 2017

christian-lefty commented Sep 8, 2017 via email

bdarnell commented Sep 8, 2017

bdarnell commented Sep 8, 2017

nvanbenschoten commented Sep 8, 2017

jcsdt commented Sep 11, 2017

nvanbenschoten commented Sep 12, 2017

christian-lefty commented Sep 12, 2017

nvanbenschoten commented Sep 13, 2017 • edited Loading

petermattis commented Sep 18, 2017

bdarnell commented Sep 18, 2017

christian-lefty commented Sep 18, 2017

nvanbenschoten commented Sep 19, 2017

nvanbenschoten commented Sep 20, 2017

nvanbenschoten commented Nov 16, 2017

nvanbenschoten commented Sep 13, 2017 •

edited

Loading