
bulk: pre-split more before ingesting unsorted data #74826

Closed
dt opened this issue Jan 13, 2022 · 1 comment · Fixed by #74816
Labels: C-enhancement (Solution expected to add code/behavior + preserve backward-compat; pg compat issues are exception), T-disaster-recovery

Comments


dt commented Jan 13, 2022

It has been observed that during a backfill or import, the split-as-we-go behavior used by the SST batcher, which only explicitly splits and scatters after sending some amount of data to a single range, does not issue any splits in the steady state when ingesting out-of-order data: because the data is out of order, once it is chunked by destination range, no single chunk is large enough to be considered to have filled that range and thus trigger a split. The range may still fill on its own, as many processors repeatedly send it small chunks that add up, but no single processor knows when that happens, so we simply leave it up to the range to split itself when it is full, as it would in any other heavy-write-rate, non-bulk ingestion case.
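
Here is a minimal sketch of that split-as-we-go behavior, not the actual sst_batcher code; `splitAndScatter` and `rangeFillThreshold` are hypothetical names used only for illustration:

```go
// Sketch of split-as-we-go: a processor only splits after it has personally
// sent enough data to a single range.
package main

import "fmt"

// Hypothetical threshold: how many bytes one processor must send to a single
// range before it considers that range "filled" and splits off the remainder.
const rangeFillThreshold = 256 << 20 // 256 MiB, illustrative only

// splitAndScatter stands in for issuing an admin split at the end of the data
// just sent and scattering the now-empty remainder of the range.
func splitAndScatter(rangeID int) {
	fmt.Printf("split and scatter remainder of range %d\n", rangeID)
}

// ingest flushes per-range chunks (represented here just by their byte sizes)
// and only splits after this processor has sent enough data to one range.
// With sorted input a single processor fills each range in turn, so the
// threshold trips; with unsorted input each chunk is tiny and no single
// processor ever crosses the threshold, so no splits are sent even though the
// range may be filling up from many processors at once.
func ingest(chunks map[int]int) {
	sentToRange := make(map[int]int)
	for rangeID, chunkBytes := range chunks {
		sentToRange[rangeID] += chunkBytes
		if sentToRange[rangeID] > rangeFillThreshold {
			splitAndScatter(rangeID)
			sentToRange[rangeID] = 0
		}
	}
}

func main() {
	// Sorted input: one big chunk destined for one range trips the threshold.
	ingest(map[int]int{1: 300 << 20})
	// Unsorted input: many small chunks spread across ranges never do.
	ingest(map[int]int{1: 1 << 20, 2: 1 << 20, 3: 1 << 20})
}
```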

However, we've seen this lead to hotspots during bulk ingestion. Until the range becomes full and decides to split, it can be a bottleneck as many processors send it data at once, and the existing mechanisms for KV-driven load-based splitting are not tuned for ingestion workloads. When the range does decide to split and rebalance, that operation is more expensive than splitting up front: we now need to read, move, and re-ingest half of a full range that we just spent so much work filling, all while it continues to receive ingestion load.

Ideally, we'd prefer to split the span before we start ingesting, both to spread ingestion load better over our available capacity and to avoid shuffling data we have already ingested, which is what happens if we wait to split and scatter until a range fills. By scattering first, we ingest directly to the right place from the start. The challenge is that in many cases we don't know what data we'll be ingesting until we start reading input and producing it: the input could be a key-ordered CSV, or it could be a uniform random distribution. In some cases, such as an index backfill, we may be able to use external information, like SQL statistics or a sampling scan of the table, to derive a good partitioning, but in others, like IMPORT or view materialization, we have no way to know what the data will be until we start producing it.

Still, we can do better than not pre-splitting at all. If the first buffer's worth of data is read out of sorted order, we can treat it as a representative sample of the data to come, generate split points from it, and split and scatter the target spans at those points before proceeding to flush (see the sketch below). If the buffer turns out not to be a good sample after all, we are no worse off than before, when we did not pre-split at all. If the input data is sorted, this step is unnecessary: we simply split as we go, after each range we fill, and scatter the empty remainder before moving on to fill it, as we already do today.
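
A minimal sketch of deriving pre-split points from the first buffer, treating it as a sample of the data to come; the helpers `adminSplit` and `adminScatter` are hypothetical placeholders, not the real KV API:

```go
// Sketch: use the first buffered chunk of unsorted keys as a sample, derive
// evenly spaced split points from it, and split and scatter before flushing.
package main

import (
	"fmt"
	"sort"
)

// splitPointsFromSample sorts the sampled keys and picks evenly spaced keys,
// one split point per desired target range boundary.
func splitPointsFromSample(sample []string, targetRanges int) []string {
	keys := append([]string(nil), sample...)
	sort.Strings(keys)
	var splits []string
	for i := 1; i < targetRanges; i++ {
		splits = append(splits, keys[i*len(keys)/targetRanges])
	}
	return splits
}

// adminSplit and adminScatter stand in for issuing the corresponding admin
// requests before any data is flushed.
func adminSplit(key string)   { fmt.Println("split at", key) }
func adminScatter(key string) { fmt.Println("scatter range starting at", key) }

func main() {
	// First buffer's worth of keys, read out of sorted order.
	firstBuffer := []string{"k42", "k07", "k99", "k13", "k58", "k76", "k21", "k64"}
	for _, sp := range splitPointsFromSample(firstBuffer, 4) {
		adminSplit(sp)
		adminScatter(sp)
	}
	// Only after pre-splitting and scattering do we flush the buffered data,
	// so subsequent ingestion lands on ranges already spread across nodes.
}
```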

Epic: CRDB-2340

dt added the C-enhancement label on Jan 13, 2022

blathers-crl bot commented Jan 13, 2022

cc @cockroachdb/bulk-io
