39271: importccl: Remove pre-buffering stage from direct ingest IMPORT r=dt a=adityamaru27

This change removes the pre-buffering step in the direct ingest IMPORT code path. Previously, we would create separate buckets for each table's primary data and flush a bucket to the BulkAdder whenever it was full. Running a TPC-C 1K import on 3 nodes OOM'ed as a result of this buffer.

Two big wins we got from this pre-buffering stage were:

1. We avoided worst-case overlapping behavior in the AddSSTable calls, because keys with the same TableIDIndexID prefix were flushed together.
2. Secondary index KVs, which were few and filled their bucket infrequently, were flushed only a few times, resulting in fewer L0 (and total) files.

To resolve this OOM, we decided to take advantage of the split keys we insert across AllIndexSpans of each table during IMPORT. Since the BulkAdder is split aware and does not allow SSTables to span splits, we already achieve the non-overlapping property we strive for (win 1 above). The downside is that we lose the second win, since the KVs fed to the BulkAdder are now ungrouped. This results in a larger number of smaller SSTs being flushed, causing a spike in the number of L0 and total files, but lower overall memory usage.

This change also ENABLES the `import/experimental-direct-ingestion` roachtest.

TODO: We are currently experimenting with two adders, one for primary indexes and one for secondary indexes, which helps us achieve the second win as well. A follow-up PR will come once the roachtest stabilizes.

Release note: None

Co-authored-by: Aditya Maru <[email protected]>