importccl: [WIP] support nextval and currval as default expression #52910
Conversation
Force-pushed from b52df26 to 1e9aa59.
Tagging @cockroachdb/bulk-prs for an early review (the skeleton is ready, but the details might not be fully fleshed out).
The whole task is broken down into two parts (focusing on nextval):
- Ask for chunks and store each chunk as part of the evalCtx annotations. When asking for more chunks we'll need to update the current value of the sequence, which is covered by kv.IncrementRetryable. (A rough sketch of this chunked allocation follows this list.)
- These chunks have to be communicated back and forth during job progress saving. Therefore, I've stored this information (wrapped under DefaultValueMetaData, in case there is other metadata we need to store as we support more default expressions for import later on) as part of ImportProgress, ReadImportDataSpec, and RemoteProducerMetaData. Here's how I think it should work (bottom line: update these things at the same time as we update the resume position, or last row):
-> At the start of import, ImportProgress relays the chunk information (DefaultValueMetaData), along with the resume position, to ReadImportDataSpec.
-> The DefaultValueMetaData from ReadImportDataSpec is then relayed to importCtx, which in turn relays it to DatumRowConverter.
-> As we encounter more instances of nextval, this metadata is updated (with more chunks being written).
-> This metadata is periodically stored to KVBatches (together with the resume position).
-> As we update progress (which involves updating the resume position), we need to send these chunks from KVBatches to RemoteProducerMetaData.
-> Finally, RemoteProducerMetaData relays this information back to ImportProgress.
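For concreteness, here is a minimal, self-contained sketch of the chunked-allocation idea from the first bullet. Every name here (sequenceChunk, reserveChunk, nextVal) and the chunk size are made up for illustration; the actual PR reserves chunks through kv.IncrementRetryable and stores them in DefaultValueMetaData rather than a local variable.

package main

import "fmt"

// sequenceChunk is a hypothetical stand-in for the per-file chunk info: a
// contiguous block of sequence values reserved up front so that each
// nextval() call doesn't need its own KV write.
type sequenceChunk struct {
	next, end int64 // half-open interval [next, end)
}

// reserveChunk simulates bumping the sequence by chunkSize in one shot (the
// role kv.IncrementRetryable plays in the PR) and returns the reserved range.
func reserveChunk(seqVal *int64, chunkSize int64) sequenceChunk {
	start := *seqVal + 1
	*seqVal += chunkSize
	return sequenceChunk{next: start, end: start + chunkSize}
}

// nextVal hands out values from the current chunk, reserving a new chunk when
// the current one is exhausted.
func nextVal(seqVal *int64, chunk *sequenceChunk, chunkSize int64) int64 {
	if chunk.next >= chunk.end {
		*chunk = reserveChunk(seqVal, chunkSize)
	}
	v := chunk.next
	chunk.next++
	return v
}

func main() {
	var seqVal int64 // last value written to the (simulated) sequence
	var chunk sequenceChunk
	for i := 0; i < 5; i++ {
		fmt.Println(nextVal(&seqVal, &chunk, 3)) // prints 1..5, reserving in blocks of 3
	}
}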
More details in comments below.
Reviewable status: complete! 0 of 0 LGTMs obtained
pkg/ccl/importccl/import_processor.go, line 154 at r3 (raw file):
return newCSVInputReader(
	kvCh, spec.Format.Csv, spec.WalltimeNanos, int(spec.ReaderParallelism),
	singleTable, singleTableTargetCols, evalCtx, spec.DefaultExprMetaData), nil
Note: DefaultExprMetaData is used as an argument for creating input readers so that this information can be stored in importCtx. I haven't done that for the dump file formats (where the creation of the row converter happens in the newReader function), though.
pkg/ccl/importccl/import_processor.go, line 262 at r3 (raw file):
for i, emitted := range writtenRow {
	atomic.StoreInt64(&pkFlushedRow[i], emitted)
	pkDefaultExprMetaData[i] = tempDefaultExprMetaData[i]
Here's something that troubles me: atomic can be used to store the flushed rows, but defaultExprMetaData is a (rather complicated) struct and cannot be stored atomically the same way. How should we approach this?
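One possible answer, sketched below, is to keep the int64 resume position on sync/atomic and guard the metadata with a mutex next to it. The types here (exprMetaData, flushState) are simplified stand-ins for illustration, not the PR's actual ones.

package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// exprMetaData is a simplified stand-in for defaultExprMetaData.
type exprMetaData struct {
	chunks []int64
}

// flushState pairs the atomically stored flushed-row counter with
// mutex-guarded metadata, since a plain struct can't go through sync/atomic
// the way an int64 can.
type flushState struct {
	flushedRow int64 // updated with atomic.StoreInt64, as in the quoted code
	mu         sync.Mutex
	meta       exprMetaData
}

func (s *flushState) update(row int64, meta exprMetaData) {
	atomic.StoreInt64(&s.flushedRow, row)
	s.mu.Lock()
	defer s.mu.Unlock()
	s.meta = meta
}

func (s *flushState) snapshot() (int64, exprMetaData) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return atomic.LoadInt64(&s.flushedRow), s.meta
}

func main() {
	var s flushState
	s.update(42, exprMetaData{chunks: []int64{100, 200}})
	row, meta := s.snapshot()
	fmt.Println(row, meta.chunks)
}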
pkg/ccl/importccl/read_import_base.go, line 72 at r3 (raw file):
evalCtx := flowCtx.NewEvalCtx()
evalCtx.DB = flowCtx.Cfg.DB
Getting this DB populated here since we need it to create the ID-to-SequenceTable correspondence in rowConverter.
pkg/jobs/jobspb/jobs.proto, line 228 at r3 (raw file):
sequenceChunkMap sequenceMap = 1;
}
Here's another thing worth calling attention to: goproto doesn't seem to have a facility for map<int32, repeated chunkInfo> (or anything like that), so whenever I have to refer to a chunk in a defaultExprMetaData I have to write defMetaData.sequenceMap.Chunks[file_id].Chunks[chunk_index], which seems unnecessarily long.
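For illustration, this is roughly the Go shape that the proto wrapper message forces (a map value can only be a message, so the repeated field has to live one level down), plus a small accessor that hides the double lookup. The type and field names below are hypothetical, not the actual generated jobspb code.

package main

import "fmt"

// chunkInfo and chunkList mirror the nesting the proto compiler produces:
// the map value wraps the repeated field, hence the double .Chunks hop.
type chunkInfo struct {
	start, end int64
}

type chunkList struct {
	Chunks []chunkInfo
}

type defaultExprMetaData struct {
	// Conceptually map<int32, repeated chunkInfo>, expressed as map<int32, chunkList>.
	Chunks map[int32]*chunkList
}

// chunkAt hides the defMetaData.Chunks[fileID].Chunks[idx] double lookup.
func (m *defaultExprMetaData) chunkAt(fileID int32, idx int) chunkInfo {
	return m.Chunks[fileID].Chunks[idx]
}

func main() {
	m := defaultExprMetaData{Chunks: map[int32]*chunkList{
		7: {Chunks: []chunkInfo{{start: 1, end: 10}}},
	}}
	fmt.Println(m.chunkAt(7, 0))
}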
pkg/sql/distsql_plan_csv.go, line 291 at r3 (raw file):
}
for i, v := range meta.BulkProcessorProgress.DefaultExprMetaData {
	defaultExprMetaData[i] = v
Again, this is a place where I wish an atomic operation were available.
pkg/sql/rowexec/bulk_row_writer.go, line 104 at r3 (raw file):
var g ctxgroup.Group
// TODO (anzoteh96): try populating defaultValueMetaData.
Note: here I'm not too sure what the defaultValueMetaData would be.
Force-pushed from 5d38ab4 to a69329b.
Force-pushed from 8fae480 to faef143.
Here's an update on the state of this PR:
- The tests pass locally, but somehow on CI it fails on TestImportDefaultWithResume/nextval with 10 extra rows, so I think that's worth looking into.
- Something worth fixing is how we effectively communicate the defaultExprMetaData (which contains the sequence chunks) between the job progress struct and the actual import work itself (say, rowConverter). The same goes for the importJob itself. The current way is to populate them as arguments when calling rowConverter, but this isn't a great idea for the following reasons (a sketch of an alternative follows below):
-- (a) some other places call these functions, either newDatumRowConverter or rowConverter.Row();
-- (b) for the dump file reader/converter, I can't seem to find a good way to populate these arguments (though, arguably, for dump files these nextval values have already been produced within the dump files themselves).
More comments below.
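One alternative, hinted at by the evalCtx-annotations idea in the first comment, is to stash the metadata once in a slot that the converter already carries, instead of threading it through newDatumRowConverter and Row(). The sketch below is a self-contained mock of that pattern; the types and the slot index are made up, and this is not the PR's actual wiring.

package main

import "fmt"

// annotations mimics a slot-based side channel (in the spirit of
// tree.Annotations): a producer sets a value once, and anything holding the
// same annotations can read it without extra parameters.
type annotations []interface{}

func (a annotations) set(idx int, v interface{}) { a[idx] = v }
func (a annotations) get(idx int) interface{}    { return a[idx] }

type defaultExprMetaData struct{ chunks []int64 }

const defaultExprMetaDataIdx = 0 // made-up slot index

// rowConverter only carries the annotations; it doesn't need the metadata as
// an explicit argument at construction or on every Row() call.
type rowConverter struct{ ann annotations }

func (c *rowConverter) row() {
	if meta, ok := c.ann.get(defaultExprMetaDataIdx).(*defaultExprMetaData); ok {
		fmt.Println("chunks available to nextval:", meta.chunks)
	}
}

func main() {
	ann := make(annotations, 1)
	ann.set(defaultExprMetaDataIdx, &defaultExprMetaData{chunks: []int64{100, 200}})
	conv := rowConverter{ann: ann}
	conv.row()
}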
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dt and @pbardea)
pkg/ccl/importccl/import_processor.go, line 171 at r4 (raw file):
return newAvroInputReader(
	kvCh, singleTable, spec.Format.Avro, spec.WalltimeNanos,
	int(spec.ReaderParallelism), evalCtx, spec.DefaultExprMetaData, job)
Here the DefaultExprMetaData and job are populated for non-dump files but not dump files.
pkg/ccl/importccl/read_import_base.go, line 418 at r4 (raw file):
kvCh chan row.KVBatch // Channel for sending KV batches.
job *jobs.Job // Job (of this import) to be populated into row converter.
defaultValueMetaData map[int32]*jobspb.DefaultExprMetaData // Metadata of default values to be communicated.
Here, we put both job and defaultValueMetaData as fields of parallelImportContext, which the non-dump file formats use.
pkg/ccl/importccl/read_import_workload.go, line 222 at r4 (raw file):
// This worker needs its own EvalContext and DatumAlloc.
func (w *WorkloadKVConverter) Worker(ctx context.Context, evalCtx *tree.EvalContext) error {
	conv, err := row.NewDatumRowConverter(ctx, w.tableDesc, nil /* targetColNames */, evalCtx, nil, w.kvCh)
This is another place where we have to populate nil for defaultExprMetaData, unless there's an efficient way to retrieve that information.
pkg/sql/row/expr_walker.go, line 238 at r4 (raw file):
return nil, errors.Newf(
	"job needs to be injected into evalCtx for allocation of chunks")
}
Just some thought: are we supposed to be able to process a non-targeted column with the default expression nextval without the presence of a job (say, during an import)?
pkg/sql/row/row_converter.go, line 436 at r4 (raw file):
// if necessary.
func (c *DatumRowConverter) Row(
	ctx context.Context, sourceID int32, rowIndex int64, job *jobs.Job,
Note: job information is populated for every row.
These changes went in with #56473. Thanks again!
This PR follows up from #50295 in supporting default expressions, but for the functions nextval() and currval(). (This is WIP and I'll think of the description later on.)