
storage: ExportRequest poorly leveraged to clean up abandoned intents #59704

Closed
nvanbenschoten opened this issue Feb 2, 2021 · 18 comments · Fixed by #64131
Assignees: aayushshah15
Labels: A-disaster-recovery, A-kv-transactions, A-storage, C-enhancement, docs-done, docs-known-limitation, T-kv

Comments

@nvanbenschoten (Member) commented Feb 2, 2021

Currently, an ExportRequest will take an extremely long time to resolve a collection of abandoned intents. This can make it look like the ExportRequest is stuck somewhere in KV. The reason for this (or at least part of it) is that ExportMVCCToSst currently returns only a single intent when it returns a WriteIntentError. This differs from a request like ScanRequest, which collects a set of intents in a single WriteIntentError, allowing higher levels to process all of them at once.

We should improve ExportRequest to collect multiple intents in a single WriteIntentError.
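To make the shape of that change concrete, here is a self-contained toy sketch (all types and names below are made up for illustration; this is not the CockroachDB code): the scan records each intent it passes, up to a cap, and surfaces them all in one error instead of failing on the first.

```go
package main

import "fmt"

// Toy stand-ins for illustration only; not the real CockroachDB types.
type Intent struct{ Key string }

type WriteIntentError struct{ Intents []Intent }

func (e *WriteIntentError) Error() string {
	return fmt.Sprintf("conflicting intents on %d keys", len(e.Intents))
}

type kv struct {
	key      string
	isIntent bool
}

// exportSpan models the proposed behavior: collect every intent seen
// while scanning (up to maxIntents) and return them in a single
// WriteIntentError, instead of erroring out on the first one.
func exportSpan(data []kv, maxIntents int) ([]string, error) {
	var exported []string
	var intents []Intent
	for _, e := range data {
		if e.isIntent {
			intents = append(intents, Intent{Key: e.key})
			if len(intents) >= maxIntents {
				break // presumably some cap bounds the error size
			}
			continue // keep scanning instead of returning immediately
		}
		exported = append(exported, e.key)
	}
	if len(intents) > 0 {
		return nil, &WriteIntentError{Intents: intents}
	}
	return exported, nil
}

func main() {
	data := []kv{{"a", false}, {"b", true}, {"c", false}, {"d", true}}
	if _, err := exportSpan(data, 1000); err != nil {
		fmt.Println(err) // conflicting intents on 2 keys
	}
}
```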

We should also write a test that exercises the path where an ExportRequest needs to clean up millions of abandoned intents.

gz#7529

Epic: CRDB-2554

@nvanbenschoten added the C-enhancement, A-disaster-recovery, A-storage, and A-kv-transactions labels Feb 2, 2021
@tbg (Member) commented Feb 2, 2021

> We should also write a test that exercises the path where an ExportRequest needs to clean up millions of abandoned intents.

here's a random 2018 flashback for you #18661

@tbg (Member) commented Feb 2, 2021

(Nothing there is useful, just posting for entertainment)

@aayushshah15 (Contributor) commented Feb 2, 2021

The support issue where this popped up saw intents being resolved at a rate of less than 10/sec, is that right?

I'm wondering if you could give me any intuition for why resolving these intents one-by-one is so egregiously slow, since the cluster in question had all its nodes in the same region (or even the same AZ, IIRC).

Pardon my over-simplification, but aren't we just talking about issuing a bunch of QueryIntent / ResolveIntent requests?

@nvanbenschoten (Member, Author)

My current working theory is that there is some quadratic behavior going on here. On each retry, the ExportRequest begins scanning the range from the front and has to scan (while building SSTs) up to the next abandoned intent. So we end up scanning the range num_abandoned_intents times. Collecting all intents in one go would avoid this.
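To see why that's quadratic, here's a tiny back-of-the-envelope model (hypothetical numbers, not measurements): with N evenly spread abandoned intents, restarting from the front after each one costs roughly N²/2 "intent gaps" of scanning, versus N for a single pass that collects them all.

```go
package main

import "fmt"

func main() {
	const n = 10_000 // hypothetical count of abandoned intents, evenly spread

	// One intent per WriteIntentError: retry i re-scans from the front
	// past i intent-gaps, so the total is 1 + 2 + ... + n = n(n+1)/2.
	oneAtATime := n * (n + 1) / 2

	// All intents collected into a single error: one pass over the range.
	batched := n

	fmt.Printf("one-at-a-time: %d gap-scans; batched: %d (%dx more work)\n",
		oneAtATime, batched, oneAtATime/batched)
}
```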

@tbg (Member) commented Feb 8, 2021

Another option (on top of making intent handling more robust, which needs to happen anyway) could be a new request type (much like Scan) that does not return any keys. Export could issue it ahead of its export requests to sanitize the keyspace. It is an extra step (one that could be parallelized across much of the keyspace and wouldn't be very expensive with the separate lock table), but it could further reduce the likelihood of the Export running into an intent in the first place.
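For illustration, a minimal sketch of how such a pre-pass might look, with entirely made-up function names (no such request type exists at the time of this discussion):

```go
package main

import "fmt"

// Entirely hypothetical sketch of the suggestion above. scanIntentsOnly
// stands in for a new scan-like request that returns no rows, only the
// intents it encounters, so they can be resolved before the export runs.
func sanitizeSpan(
	scanIntentsOnly func() []string, // hypothetical "keyless" scan over the span
	resolveBatch func([]string), // batched intent resolution
) {
	for {
		intents := scanIntentsOnly()
		if len(intents) == 0 {
			return // span is clean; the export is unlikely to hit intents now
		}
		resolveBatch(intents) // cheap to find with the separate lock table
	}
}

func main() {
	pending := []string{"a", "b", "c"} // abandoned intents in the span
	sanitizeSpan(
		func() []string { out := pending; pending = nil; return out },
		func(keys []string) { fmt.Println("resolving", keys) },
	)
}
```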

@sumeerbhola (Collaborator)

> My current working theory is that there is some quadratic behavior going on here.
> ...

> Export could issue it ahead of its export requests to sanitize the keyspace. It is an extra step (one that could be parallelized across much of the keyspace and wouldn't be very expensive with the separate lock table) ...

btw, this quadratic behavior was also discussed in #41720 (comment) (second bullet)

@nvanbenschoten (Member, Author)

To be concrete here, the work item will be updating MVCCIncrementalIterator to collect multiple intents in a single WriteIntentError, instead of just a single intent. This is similar to how the pebbleMVCCScanner works. This is on the border of Bulk-I/O and Storage, so I'd appreciate guidance on which direction to route this.

@lunevalex (Collaborator)

cc: @mwang1026 for consideration for 21.2

@tbg (Member) commented Feb 10, 2021 via email

@mwang1026
I'd selfishly like to fast-track this as well, since as far as I can tell it doesn't have clear mitigation steps and also isn't easy for a customer to self-diagnose. Who could we assign this to, and what would the process be for fast-tracking? Do we consider this a bug?

@aayushshah15 (Contributor)

Spoke to @nvanbenschoten offline, I'll take this on my plate.

@aayushshah15 self-assigned this Feb 10, 2021
@dt (Member) commented Feb 10, 2021

I think this is a dupe of #31762, but since this one now has more discussion on it, maybe we close that one instead.

@nvanbenschoten (Member, Author)

Something I found while exploring #60585 was that on release-20.2, we do not consult the finalizedTxnCache when req.WaitPolicy == lock.WaitPolicy_Error in lockTableWaiterImpl.WaitOn (here). I think we'll want to for two reasons:

  1. it will avoid a transaction record push per abandoned intent.
  2. it will allow us to defer and batch intent resolution like we do here.

So while addressing this issue, we'll want to make this change as well on release-20.2. We won't need it on master, as the finalizedTxnCache was pulled into the lockTable by @sumeerbhola in #57947.
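A toy sketch of that check, with made-up types (the real logic lives in lockTableWaiterImpl.WaitOn, and on master inside the lockTable itself):

```go
package main

import "fmt"

type txnID string

// waiter is a made-up stand-in for the lock-table waiter operating
// under lock.WaitPolicy_Error.
type waiter struct {
	finalizedTxns map[txnID]bool // stand-in for the finalizedTxnCache
	deferred      []string       // intent keys to resolve in one batch later
}

// onConflict models the proposed release-20.2 behavior: if the lock
// holder is already known to be finalized, defer its intent for batched
// resolution (reasons 1 and 2 above) instead of pushing the txn record
// or returning an error immediately.
func (w *waiter) onConflict(holder txnID, key string) error {
	if w.finalizedTxns[holder] {
		w.deferred = append(w.deferred, key)
		return nil
	}
	return fmt.Errorf("conflicting lock on %q held by active txn %s", key, holder)
}

func main() {
	w := &waiter{finalizedTxns: map[txnID]bool{"t1": true}}
	_ = w.onConflict("t1", "a") // deferred, no txn-record push
	_ = w.onConflict("t1", "b") // deferred, no txn-record push
	fmt.Println("batched resolution of", w.deferred)
}
```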

Ideally, the testing we'll add when addressing this issue will be comprehensive enough to have caught this difference between master and release-20.2. For this to be true, we'll likely want at least two new tests:

  1. an integration test that shows that BACKUP will efficiently clean up large swaths of abandoned intents
  2. a new test scenario in pkg/kv/kvserver/concurrency/testdata/concurrency_manager/wait_policy_error, similar to the "request resolves the abandoned lock and proceeds" scenario, that shows that abandoned lock cleanup is deferred and batched.

@joshimhoff (Collaborator) commented Apr 19, 2021

Yay to this issue. Can we backport the fix? What CRDB versions will the fix eventually land in? We SREs are pretty motivated to roll this out fleet-wide as early as possible.

Does this help with BACKUP in addition to bulk IO ops like IMPORT / EXPORT?

@nvanbenschoten (Member, Author)

> Can we backport the fix?

Yes, we intend to backport this fix to v21.1, v20.2, and v20.1.

> Does this help with BACKUP in addition to bulk IO ops like IMPORT / EXPORT?

Yep! This will help with any operation that uses ExportRequest, which includes BACKUP.

@joshimhoff (Collaborator)

Amazing.

@aliher1911 (Contributor)

It turned out that 20.1 is not as straightforward. We use RocksDB as the default storage engine there, and the fix lives in the glue code between the storage engine and the layer above, so it only applies to Pebble. So unless there's a real need for it, we'd rather encourage people to upgrade. As for backporting the change to Pebble storage in 20.1, it is possible and the PR exists, but I'm not sure whether we should proceed.

@nvanbenschoten (Member, Author)

> As for backporting the change to Pebble storage in 20.1, it is possible and the PR exists, but I'm not sure whether we should proceed.

I don't think we should bother with this. From the sound of it, we don't have any more scheduled v20.1 patch releases. We also don't expect many v20.1 clusters to be using Pebble.
