importccl: Direct-ingest uses two bulk adders instead of one. #39424

adityamaru27 · 2019-08-07T21:10:23Z

This is another change to stabilize direct ingest import before
it is made the default.
As a consequence of #39271, the number of files (L0 and total),
along with the cumulative compaction size increased drastically.
A consequence of no longer creating buckets of TableIDIndexID
before flushing is that the single bulk adder would receive a
mix of primary and secondary index entries. Since SSTs cannot
span across the splits we inserted between index spans, it would
create numerous, small secondary index SSTs along with the
bigger primary index SSTs, and flush on reaching its limit
(which would be often).

By introducing two adders, one for ingesting primary index data,
and the other for ingesting secondary index data we regain the
ability to make fewer, bigger secondary index SSTs and flush less
often. The peak mem is lower than what prebuffering used to
hit, while the number of files (L0 and total), and the cumulative
compaction size return to prebuffering levels.

Some stats below for a tpcc 1k, on a 1 node cluster.

With prebuffering:
Total Files : 7670
L0 Files : 1848
Cumulative Compaction (GB): 24.54GiB

Without prebuffering, one adder:
Total Files : 22420
L0 Files : 16900
Cumulative Compaction (GB): 52.43 GiB

Without prebuffering, two adders:
Total Files : 6805
L0 Files : 1078
Cumulative Compaction (GB): 18.89GiB

Release note: None

cockroach-teamcity · 2019-08-07T21:10:30Z

This change is

dt

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru27 and @dt)

pkg/ccl/importccl/read_import_proc.go, line 390 at r1 (raw file):

				return err
			}
			pkIndexAdder.SetName("pkIndexAdder")

Not wild about the number of Set functions and mutable fields this thing is acquiring. I realize adding every single flag to the constructor is annoying given how that interface has to be maintained in sql, distsql and server, but I wonder if it is time to make a storagebase.BulkConfig{} struct we could pass to the constructor, and then just update that for every new flag.

pkg/ccl/importccl/read_import_proc.go, line 407 at r1 (raw file):

			pkAdded := pkIndexAdder.GetSummary()
			indexAdded := indexAdder.GetSummary()
			cumulativeAdded := &roachpb.BulkOpSummary{

there's a BulkOpSummary.Add(BulkOpSummary) method.

dt

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru27 and @dt)

pkg/ccl/importccl/read_import_proc.go, line 390 at r1 (raw file):

Previously, dt (David Taylor) wrote…

Not wild about the number of Set functions and mutable fields this thing is acquiring. I realize adding every single flag to the constructor is annoying given how that interface has to be maintained in sql, distsql and server, but I wonder if it is time to make a storagebase.BulkConfig{} struct we could pass to the constructor, and then just update that for every new flag.

I forgot to mention that this does not need to block this PR -- just might be a nice followup.

This is another change to stabilize direct ingest import before it is made the default. As a consequence of cockroachdb#39271, the number of files (L0 and total), along with the cumulative compaction size increased drastically. A consequence of no longer creating buckets of TableIDIndexID before flushing is that the single bulk adder would receive a mix of primary and secondary index entries. Since SSTs cannot span across the splits we inserted between index spans, it would create numerous, small secondary index SSTs along with the bigger primary index SSTs, and flush on reaching its limit (which would be often). By introducing two adders, one for ingesting primary index data, and the other for ingesting secondary index data we regain the ability to make fewer, bigger secondary index SSTs and flush less often. The peak mem is lower than what prebuffering used to hit, while the number of files (L0 and total), and the cumulative compaction size return to prebuffering levels. Some stats below for a tpcc 1k, on a 1 node cluster. With prebuffering: Total Files : 7670 L0 Files : 1848 Cumulative Compaction (GB): 24.54GiB Without prebuffering, one adder: Total Files : 22420 L0 Files : 16900 Cumulative Compaction (GB): 52.43 GiB Without prebuffering, two adders: Total Files : 6805 L0 Files : 1078 Cumulative Compaction (GB): 18.89GiB Release note: None

adityamaru27

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru27 and @dt)

pkg/ccl/importccl/read_import_proc.go, line 407 at r1 (raw file):

Previously, dt (David Taylor) wrote…

there's a BulkOpSummary.Add(BulkOpSummary) method.

Done.

adityamaru27

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dt)

pkg/ccl/importccl/read_import_proc.go, line 390 at r1 (raw file):

Previously, dt (David Taylor) wrote…

I forgot to mention that this does not need to block this PR -- just might be a nice followup.

ack

dt

Reviewable status: complete! 0 of 0 LGTMs obtained

adityamaru27 · 2019-08-08T15:33:58Z

TFTR!
bors r+

39418: stats: truncate large datums when sampling for histogram r=rytaft a=rytaft This commit adds logic to truncate long bit arrays, byte strings, strings, and collated strings during sampling for histogram creation. We do this to avoid using excessive memory or disk space during sampling and storage of the final histogram. Release note: None 39424: importccl: Direct-ingest uses two bulk adders instead of one. r=adityamaru27 a=adityamaru27 This is another change to stabilize direct ingest import before it is made the default. As a consequence of #39271, the number of files (L0 and total), along with the cumulative compaction size increased drastically. A consequence of no longer creating buckets of TableIDIndexID before flushing is that the single bulk adder would receive a mix of primary and secondary index entries. Since SSTs cannot span across the splits we inserted between index spans, it would create numerous, small secondary index SSTs along with the bigger primary index SSTs, and flush on reaching its limit (which would be often). By introducing two adders, one for ingesting primary index data, and the other for ingesting secondary index data we regain the ability to make fewer, bigger secondary index SSTs and flush less often. The peak mem is lower than what prebuffering used to hit, while the number of files (L0 and total), and the cumulative compaction size return to prebuffering levels. Some stats below for a tpcc 1k, on a 1 node cluster. With prebuffering: Total Files : 7670 L0 Files : 1848 Cumulative Compaction (GB): 24.54GiB Without prebuffering, one adder: Total Files : 22420 L0 Files : 16900 Cumulative Compaction (GB): 52.43 GiB Without prebuffering, two adders: Total Files : 6805 L0 Files : 1078 Cumulative Compaction (GB): 18.89GiB Release note: None Co-authored-by: Rebecca Taft <[email protected]> Co-authored-by: Aditya Maru <[email protected]>

craig · 2019-08-08T16:23:24Z

Build succeeded

GitHub CI (Cockroach)

adityamaru27 requested a review from dt August 7, 2019 21:10

dt approved these changes Aug 8, 2019

View reviewed changes

adityamaru27 force-pushed the import-experiments branch from 594cc63 to e0d214f Compare August 8, 2019 14:43

adityamaru27 commented Aug 8, 2019

View reviewed changes

dt approved these changes Aug 8, 2019

View reviewed changes

craig bot merged commit e0d214f into cockroachdb:master Aug 8, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

importccl: Direct-ingest uses two bulk adders instead of one. #39424

importccl: Direct-ingest uses two bulk adders instead of one. #39424

adityamaru27 commented Aug 7, 2019

cockroach-teamcity commented Aug 7, 2019

dt left a comment

dt left a comment

adityamaru27 left a comment

adityamaru27 left a comment

dt left a comment

adityamaru27 commented Aug 8, 2019

craig bot commented Aug 8, 2019

importccl: Direct-ingest uses two bulk adders instead of one. #39424

importccl: Direct-ingest uses two bulk adders instead of one. #39424

Conversation

adityamaru27 commented Aug 7, 2019

cockroach-teamcity commented Aug 7, 2019

dt left a comment

Choose a reason for hiding this comment

dt left a comment

Choose a reason for hiding this comment

adityamaru27 left a comment

Choose a reason for hiding this comment

adityamaru27 left a comment

Choose a reason for hiding this comment

dt left a comment

Choose a reason for hiding this comment

adityamaru27 commented Aug 8, 2019

craig bot commented Aug 8, 2019

Build succeeded