
importccl: prototype support for IMPORT INTO #37451

Merged (1 commit) May 21, 2019

Conversation

@dt (Member) commented May 10, 2019

This adds a prototype of incremental IMPORT, allowing importing CSV data
into an existing table, as opposed to only into a new table as with the
current IMPORT.

Unlike traditional IMPORT, which takes a specification of the table to
create, this takes a reference to an existing table into which it will
import data. Initially only CSV data, importing into a single table, is
supported (SQL dumpfiles are typically dumps of an entire table, so it
seems less likely that we need to support them here for now).

Since the actual bulk ingestion is done via non-transactional AddSSTable
commands, the table must be taken offline during ingestion. The IMPORT
job begins by schema-changing the table to an offline 'IMPORTING' state
that should prevent leasing it, and moves it back to public when it
finishes (on both success and failure, unlike a newly created table,
which is usually rolled back via a drop on failure).

This is a prototype, and as such it has many unfinished pieces (some of
which are captured as TODOs). Given the number of unresolved UX questions
around this feature, we wanted to start playing with something concrete
to help guide the feature specification and further development.

A few of the more significant areas still to be resolved: how the hook
uses transactions for table creation, resolution, and alteration versus
job creation; resolving, checking, and plumbing specified subsets of the
columns to IMPORT into; handling rollbacks of partial ingestion; updating
job status messages; and a careful audit of everywhere that table
descriptors are acquired to ensure the IMPORTING state is handled
correctly. That said, these are mostly separable issues that make better
standalone follow-up changes (and can be done in parallel).

Release note: none.

@dt dt requested review from maddyblue, thoszhang, vivekmenezes and a team May 10, 2019 15:13

@maddyblue (Contributor) left a comment


What's here looks good, but I'm a bit confused by how partial it is. I don't recall the importStmt.Into field existing. Is that new in this work or did that get added a while ago and I forget? Should there be tests here? If you are going to do this in small pieces where it won't actually work until the end then LGTM and merge away.

importing.Version++
if err := p.ExecCfg().DB.Txn(ctx, func(ctx context.Context, txn *client.Txn) error {
return errors.Wrap(
txn.CPut(ctx, sqlbase.MakeDescMetadataKey(found.TableDescriptor.ID),
@maddyblue (Contributor):

how does this interact with other schema changes running on the table?

@dt (Member Author):

the cput of the whole desc from before to after should avoid any concurrent writes, but after that we're not yet doing anything to actively lock-out other schema changes. We'll need to.
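The compare-and-put pattern dt describes can be illustrated with a toy in-memory store; this is not CockroachDB's client.Txn API, just the semantics: a descriptor write succeeds only if the stored value still matches what was read, so a concurrent schema change makes the write fail rather than be silently clobbered.

```go
// Sketch of conditional-put (CPut) semantics over a descriptor key.
package main

import "fmt"

type store struct{ m map[string]string }

// cput writes newVal at key only if the current value equals expVal.
func (s *store) cput(key, expVal, newVal string) error {
	if cur := s.m[key]; cur != expVal {
		return fmt.Errorf("value changed: have %q, expected %q", cur, expVal)
	}
	s.m[key] = newVal
	return nil
}

func main() {
	s := &store{m: map[string]string{"desc/52": "v1:public"}}

	// The IMPORT job reads the descriptor; then a concurrent schema
	// change commits first.
	read := s.m["desc/52"]
	_ = s.cput("desc/52", read, "v2:adding-column")

	// The job's attempt to move the table to IMPORTING now fails
	// instead of overwriting the concurrent change.
	err := s.cput("desc/52", read, "v2:importing")
	fmt.Println(err != nil) // prints: true
}
```

As dt notes, this protects against a racing write at the moment of the transition, but it does not by itself lock out schema changes that start after the table is offline; that still needs separate work.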

if err != nil {
// Take the table offline for import.
importing := found.TableDescriptor
importing.State = sqlbase.TableDescriptor_IMPORTING
@maddyblue (Contributor):

What does this do to the table? Users cannot read or write it during this time?

@dt (Member Author):

yeah, similar to ADD state: can't lease it in SQL code so no reads or writes.
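Why the offline state blocks reads and writes can be sketched as a gate at lease acquisition: SQL must lease a descriptor before using it, and the lease path can simply refuse any descriptor that is not public. Hypothetical names below; CockroachDB's real lease manager is far more involved.

```go
// Sketch: lease acquisition refusing non-public table descriptors, so
// queries against an IMPORTING (or ADD-state) table fail rather than
// observing partially ingested data.
package main

import "fmt"

type state string

const (
	public    state = "PUBLIC"
	importing state = "IMPORTING"
)

type desc struct {
	name  string
	state state
}

// acquireLease grants a lease only on public descriptors.
func acquireLease(d desc) error {
	if d.state != public {
		return fmt.Errorf("table %q is offline: %s", d.name, d.state)
	}
	return nil
}

func main() {
	fmt.Println(acquireLease(desc{"orders", importing}) != nil) // prints: true
	fmt.Println(acquireLease(desc{"orders", public}))           // prints: <nil>
}
```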

@dt dt force-pushed the import-into branch 2 times, most recently from 6cb1ca6 to 8e0eef7 Compare May 15, 2019 13:12
@dt (Member Author) commented May 15, 2019

@mjibson certainly partial: for now, trying to get a functioning skeleton in and then we'll fill in the gaps in standalone changes, but we want to get a prototype into the hands of product or design partners in time to collect feedback early in the development.

@dt (Member Author) commented May 21, 2019

bors r+

craig bot pushed a commit that referenced this pull request May 21, 2019
37451:  importccl: prototype support for IMPORT INTO r=dt a=dt


Co-authored-by: David Taylor <[email protected]>
craig bot commented May 21, 2019

Build succeeded

4 participants