importccl: add support for importing computed columns #51321
Conversation
Force-pushed 425d075 to 7b10293
Would this add support for the …
I'm not really sure how. But that might also require some fixing in getting the targeted columns for the …
Force-pushed 7b10293 to 460b00a
@bobvawter I just had a chance to revisit this PR again; it seems like …
The main use case that I'm seeing is for (enterprise) customers who want to test-drive CRDB with their existing schemas. I think that it's reasonable to convert …
Force-pushed 66ee2df to 71f87f9
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @Anzoteh96 and @pbardea)
pkg/ccl/importccl/import_stmt_test.go, line 3314 at r1 (raw file):
}, { "name": "b", "type": "int", }, }, } schemaStr, err := json.Marshal(schema) require.NoError(t, err) codec, err := goavro.NewCodec(string(schemaStr)) require.NoError(t, err) // Create an AVRO writer from the schema. ocf, err := goavro.NewOCFWriter(goavro.OCFConfig{ W: &data, Codec: codec, }) require.NoError(t, err) row1 := map[string]interface{}{ "a": 1, "b": 2, } row2 := map[string]interface{}{ "a": 3, "b": 4, } // Add the data rows to the writer. require.NoError(t, ocf.Append([]interface{}{row1, row2})) // Retrieve the AVRO encoded data. avroData = data.String() }
Have you seen the avroGen interface in read_import_avro_test?
pkg/sql/sqlbase/default_exprs.go, line 114 at r1 (raw file):
```go
// ProcessDefaultComputedColumns adds columns with DEFAULT or COMPUTED
// to cols if not present, and returns those expressions.
```
I think it's not a great idea to add syntactic-sugar utility functions like this (and the one above).
The code is much cleaner if you simply call the underlying MakeDefaultExprs with an appropriate column filter yourself, at the call site. Alternatively, you can just define these functions in the file where you use them.
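For what it's worth, a minimal sketch of the call-site shape being suggested here; the filter condition and the MakeDefaultExprs argument list are assumptions based on this thread, not code from the PR:

```go
// Sketch only: pick out the relevant columns at the call site, then call
// the underlying helper directly instead of going through a wrapper.
var relevantCols []sqlbase.ColumnDescriptor
for _, col := range tableDesc.Columns {
	// Assumed filter: columns carrying a DEFAULT expression or a
	// computed expression.
	if col.DefaultExpr != nil || col.IsComputed() {
		relevantCols = append(relevantCols, col)
	}
}
defaultExprs, err := sqlbase.MakeDefaultExprs(relevantCols, &txCtx, evalCtx, &semaCtx)
if err != nil {
	return err
}
```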
Force-pushed 71f87f9 to 600ff74
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @miretskiy and @pbardea)
pkg/ccl/importccl/import_stmt_test.go, line 3314 at r1 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
}, { "name": "b", "type": "int", }, }, } schemaStr, err := json.Marshal(schema) require.NoError(t, err) codec, err := goavro.NewCodec(string(schemaStr)) require.NoError(t, err) // Create an AVRO writer from the schema. ocf, err := goavro.NewOCFWriter(goavro.OCFConfig{ W: &data, Codec: codec, }) require.NoError(t, err) row1 := map[string]interface{}{ "a": 1, "b": 2, } row2 := map[string]interface{}{ "a": 3, "b": 4, } // Add the data rows to the writer. require.NoError(t, ocf.Append([]interface{}{row1, row2})) // Retrieve the AVRO encoded data. avroData = data.String() }
Have you see avroGen interface in read_import_avro_test?
I just looked into the avroGen interface: I could incorporate the testHelper interface to get the schema set up, but there is still more work to be done here to get the actual avroData that includes the row data.
pkg/sql/row/row_converter.go, line 417 at r2 (raw file):
```go
	cols []sqlbase.ColumnDescriptor, tableDesc *sqlbase.ImmutableTableDescriptor,
) []sqlbase.ColumnDescriptor {
	colIDSet := make(map[sqlbase.ColumnID]struct{}, len(cols))
```
Note: this is where getRelevantColumns is called, which is basically sqlbase.processColumnSet except for the special case of DEFAULT + COMPUTED columns.
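Based on that description, the overall shape of getRelevantColumns is presumably something like the sketch below; only the signature and the colIDSet line quoted above are from the PR, the rest is reconstructed:

```go
func getRelevantColumns(
	cols []sqlbase.ColumnDescriptor, tableDesc *sqlbase.ImmutableTableDescriptor,
) []sqlbase.ColumnDescriptor {
	colIDSet := make(map[sqlbase.ColumnID]struct{}, len(cols))
	for _, col := range cols {
		colIDSet[col.ID] = struct{}{}
	}
	// Mirror sqlbase.processColumnSet, but special-case DEFAULT and
	// computed columns: add them even when the target list omits them.
	for _, col := range tableDesc.Columns {
		if _, ok := colIDSet[col.ID]; ok {
			continue
		}
		if col.DefaultExpr != nil || col.IsComputed() {
			colIDSet[col.ID] = struct{}{}
			cols = append(cols, col)
		}
	}
	return cols
}
```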
pkg/sql/sqlbase/default_exprs.go, line 114 at r1 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
I think it's not a great idea to add syntactic-sugar utility functions like this (and the one above). The code is much cleaner if you simply call the underlying MakeDefaultExprs with an appropriate column filter yourself, at the call site. Alternatively, you can just define these functions in the file where you use them.
Unfortunately processColumnSet is not exported by sqlbase. My workaround here would be to rewrite a processColumnSet-ish function in row_converter.go (known as getRelevantColumns) and then call MakeDefaultExprs from there.
Reviewed 1 of 6 files at r1, 3 of 5 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @Anzoteh96 and @pbardea)
pkg/ccl/importccl/import_stmt_test.go, line 3314 at r1 (raw file):
Previously, Anzoteh96 (Anzo Teh) wrote…
I just looked into the avroGen interface: I could incorporate the testHelper interface to get the schema set up, but there is still more work to be done here to get the actual avroData that includes the row data.
Just bringing this to your attention -- if you can use it, great; if not, oh well.
pkg/sql/sqlbase/default_exprs.go, line 114 at r1 (raw file):
Previously, Anzoteh96 (Anzo Teh) wrote…
Unfortunately processColumnSet is not exported by sqlbase. My workaround here would be to rewrite a processColumnSet-ish function in row_converter.go (known as getRelevantColumns) and then call MakeDefaultExprs from there.
Is it possible to make it exported? Or perhaps just add an exported version of that method with a TODO to refactor callers of the unexported one?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @Anzoteh96 and @pbardea)
pkg/ccl/importccl/import_stmt_test.go, line 3357 at r2 (raw file):
```sql
CREATE TABLE users (a INT, b INT, c INT AS (a + b) STORED);
INSERT INTO users (a, b) VALUES (1, 2), (3, 4);
```
Do we ever actually expect PGDUMP data to look like this? I would imagine any data exported from PGDUMP would have the value already populated in the computed column.
pkg/sql/row/row_converter.go, line 658 at r2 (raw file):
```go
	&txCtx, c.EvalCtx, &semaCtx, true)
```
Can you add an /* addingCols */ comment on this param?
pkg/sql/row/row_converter.go, line 662 at r2 (raw file):
```go
	return errors.Wrapf(err, "error evaluating computed expression for IMPORT INTO")
}
computedCols := make([]sqlbase.ColumnDescriptor, 0)
```
Is there a reason we need to change this? It seems like we'll be allocating an extra slice in cases where we don't have computed exprs.
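For reference, the lazy-allocation shape this comment is asking for (it matches the snippet quoted later at line 407 of a subsequent revision):

```go
// Leave the slice nil unless there are computed expressions to evaluate,
// avoiding an allocation on the common no-computed-columns path.
var computedCols []sqlbase.ColumnDescriptor
if len(computeExprs) > 0 {
	computedCols = colsForCompute
}
```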
Force-pushed 600ff74 to 6b6eee7
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @miretskiy and @pbardea)
pkg/ccl/importccl/import_stmt_test.go, line 3314 at r1 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
Just bringing this to your attention -- if you can use it, great; if not, oh well.
I'm using a different method now: a helper function createAvroData creates the data for me.
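The helper itself isn't shown in this thread; here is a plausible sketch assembled from the goavro calls quoted earlier (the name, signature, and import paths of createAvroData are assumptions, not the PR's actual code):

```go
import (
	"bytes"
	"encoding/json"
	"testing"

	"github.com/linkedin/goavro/v2" // vendored path is an assumption
	"github.com/stretchr/testify/require"
)

// createAvroData encodes the given rows as an Avro OCF payload for a
// record schema with the given fields. Shape is illustrative.
func createAvroData(
	t *testing.T, name string, fields []map[string]interface{}, rows []map[string]interface{},
) string {
	var data bytes.Buffer
	schema := map[string]interface{}{
		"type":   "record",
		"name":   name,
		"fields": fields,
	}
	schemaStr, err := json.Marshal(schema)
	require.NoError(t, err)
	codec, err := goavro.NewCodec(string(schemaStr))
	require.NoError(t, err)
	// Create an AVRO writer from the schema.
	ocf, err := goavro.NewOCFWriter(goavro.OCFConfig{
		W:     &data,
		Codec: codec,
	})
	require.NoError(t, err)
	// Add the data rows to the writer, then return the encoded bytes.
	for _, row := range rows {
		require.NoError(t, ocf.Append([]interface{}{row}))
	}
	return data.String()
}
```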
pkg/ccl/importccl/import_stmt_test.go, line 3357 at r2 (raw file):
Previously, pbardea (Paul Bardea) wrote…
Do we ever actually expect PGDUMP data to look like this? I would imagine any data exported from PGDUMP would have the value already populated in the computed column.
Hmm I'm not sure...let me look into it.
pkg/sql/row/row_converter.go, line 658 at r2 (raw file):
Previously, pbardea (Paul Bardea) wrote…
Can you add an /* addingCols */ comment on this param?
Done (interesting how Reviewable swallows this change, but it's reflected here: https://github.com/cockroachdb/cockroach/pull/51321/files).
pkg/sql/row/row_converter.go, line 662 at r2 (raw file):
Previously, pbardea (Paul Bardea) wrote…
Is there a reason we need to change this? It seems like we'll be allocating an extra slice in cases where we don't have computed exprs.
Done.
pkg/sql/sqlbase/default_exprs.go, line 114 at r1 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
Is it possible to make it exported? Or perhaps just add an exported version of that method with a TODO to refactor callers of the unexported one?
I'm making it exported for now.
Force-pushed efbe217 to 65ffd46
Reviewed 1 of 4 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @Anzoteh96, @miretskiy, and @pbardea)
pkg/ccl/importccl/import_stmt_test.go, line 3413 at r3 (raw file):
format: "PGDUMP", expectedResults: [][]string{{"1", "2", "3"}, {"3", "4", "7"}}, },
Tests with both default and computed columns would also be interesting. Reordering the target cols is also a good test case.
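For instance, a test along these lines would exercise both behaviors at once (hypothetical data, not the PR's actual test case; the file name is made up):

```sql
-- DEFAULT and computed columns together, with the target list reordered
-- relative to the table definition.
CREATE TABLE users (a INT, b INT DEFAULT 42, c INT AS (a + b) STORED);
IMPORT INTO users (b, a) CSV DATA ('nodelocal://1/users.csv');
```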
pkg/ccl/importccl/read_import_csv.go, line 140 at r3 (raw file):
```go
if col.IsComputed() {
	return nil, errors.Newf("%q is computed, which is not supported by IMPORT INTO CSV", col.Name)
```
Should this be a cannot write to computed column error? Are there plans for addressing the TODO here? I think that it's fairly important for that to be addressed before merging.
pkg/sql/row/row_converter.go, line 125 at r3 (raw file):
```go
	// columns, all the columns which could possibly be referenced *are*
	// available.
	if !computedCols[i].IsComputed() {
```
Why do we need this? Shouldn't we not be marking non-computed columns as computed?
pkg/sql/row/row_converter.go, line 389 at r3 (raw file):
```go
colsForCompute := make([]sqlbase.ColumnDescriptor, len(c.tableDesc.Columns))
for _, col := range c.tableDesc.Columns {
	colsForCompute[c.computedIVarContainer.Mapping[col.ID]] = col
```
How does the ordering of colsForCompute differ from c.tableDesc.Columns? This probably warrants a comment since it isn't immediately clear.
I'm also not sure that colsForCompute is the right name here, since this looks like it's just a different ordering of all of the table columns?
pkg/sql/row/row_converter.go, line 407 at r3 (raw file):
```go
var computedCols []sqlbase.ColumnDescriptor
if len(computeExprs) > 0 {
	computedCols = colsForCompute
```
Is this what we want? I would expect computedCols to just be the columns that are computed.
Force-pushed 65ffd46 to b84ad49
Force-pushed fd3eab1 to 73f3ffe
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @miretskiy and @pbardea)
pkg/ccl/importccl/import_stmt_test.go, line 3413 at r3 (raw file):
Previously, pbardea (Paul Bardea) wrote…
Tests with both default and computed columns would also be interesting. Reordering the target cols is also a good test case.
Done.
pkg/ccl/importccl/read_import_csv.go, line 140 at r3 (raw file):
Previously, pbardea (Paul Bardea) wrote…
Should this be a cannot write to computed column error? Are there plans for addressing the TODO here? I think that it's fairly important for that to be addressed before merging.
It should really be an error that checks that IMPORT CSV with targeted columns unspecified (which happens, in particular, in the case of IMPORT TABLE) should not support computed columns. The TODO is there simply because we want to check this at the beginning of an IMPORT instead of for every row (though I'm not too sure how that would work).
pkg/sql/row/row_converter.go, line 125 at r3 (raw file):
Previously, pbardea (Paul Bardea) wrote…
Why do we need this? Shouldn't we not be marking non-computed columns as computed?
So how it works is (see my comment below): if a row has no computed column, then this computedCols (or rather, as I renamed it, computedColsLookup) will be nil. Otherwise it will contain all the columns.
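In other words, the consuming loop in GenerateInsertRow presumably skips the non-computed entries explicitly, along these lines (a sketch; the surrounding variable names are assumed, not quoted from the PR):

```go
// computedColsLookup is nil when the table has no computed columns;
// otherwise it holds every column, aligned with the datum ordering, so the
// non-computed entries are skipped here.
for i := range computedColsLookup {
	col := computedColsLookup[i]
	if !col.IsComputed() {
		continue
	}
	d, err := computeExprs[i].Eval(evalCtx)
	if err != nil {
		return nil, errors.Wrapf(err, "computed column %q", col.Name)
	}
	rowVals[i] = d
}
```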
pkg/sql/row/row_converter.go, line 389 at r3 (raw file):
Previously, pbardea (Paul Bardea) wrote…
How does the ordering of colsForCompute differ from c.tableDesc.Columns? This probably warrants a comment since it isn't immediately clear. I'm also not sure that colsForCompute is the right name here, since this looks like it's just a different ordering of all of the table columns?
MakeComputedExprs will return the computed expressions in the order of colsOrdered (formerly colsForCompute), assuming that at least one of the columns is computed. Therefore we want colsOrdered to follow the order that corresponds to the Datums.
As for the second part, would colsOrdered work?
pkg/sql/row/row_converter.go, line 407 at r3 (raw file):
Previously, pbardea (Paul Bardea) wrote…
Is this what we want? I would expect computedCols to just be the columns that are computed.
I'm thinking of passing colsOrdered into GenerateInsertRow, which would ease the process of finding the corresponding Datum to be filled with the computed expression.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @miretskiy and @pbardea)
pkg/ccl/importccl/read_import_csv.go, line 140 at r3 (raw file):
Previously, Anzoteh96 (Anzo Teh) wrote…
It should really be an error that checks that IMPORT CSV with targeted columns unspecified (which happens, in particular, in the case of IMPORT TABLE) should not support computed columns. The TODO is there simply because we want to check this at the beginning of an IMPORT instead of for every row (though I'm not too sure how that would work).
So I think this check could be moved as early as the import planning stage: if the targeted columns are not specified, then computed expressions shouldn't be supported (with the exception of, perhaps, AVRO, as the targeted columns are specified in its schemas).
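Sketched out, such a planning-stage check might look like this; all identifiers here are illustrative, and the actual import planner plumbing in the PR may differ:

```go
// Fail fast at planning time: without an explicit target-column list we
// cannot exclude computed columns from the input, except for AVRO, whose
// schema names its own columns.
if len(targetCols) == 0 && format != roachpb.IOFileFormat_Avro {
	for _, col := range tableDesc.Columns {
		if col.IsComputed() {
			return errors.Newf(
				"%q is a computed column, which is not supported without targeted columns",
				col.Name)
		}
	}
}
```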
pkg/sql/row/row_converter.go, line 389 at r3 (raw file):
Previously, Anzoteh96 (Anzo Teh) wrote…
MakeComputedExprs will return the computed expressions in the order of colsOrdered (formerly colsForCompute), assuming that at least one of the columns is computed. Therefore we want colsOrdered to follow the order that corresponds to the Datums.
As for the second part, would colsOrdered work?
Edit: I've moved that to the row converter stage so that these computed expressions only have to be constructed once per import, not for every row.
pkg/sql/row/row_converter.go, line 383 at r5 (raw file):
return nil, errors.Wrapf(err, "error evaluating computed expression for IMPORT INTO") }
The computed expressions are now generated here. Notice that we still want the property that the computed expressions are in an order aligning with the datums.
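Putting the quoted fragments into context, the construction presumably looks roughly like the following; the argument list is inferred from the `&txCtx, c.EvalCtx, &semaCtx, true)` fragment quoted earlier, and the tree.NewUnqualifiedTableName usage is an assumption:

```go
// Build the computed expressions once per import, ordered by colsOrdered
// so that expression i lines up with datum i at evaluation time.
computeExprs, err := sqlbase.MakeComputedExprs(
	colsOrdered, c.tableDesc, tree.NewUnqualifiedTableName(tree.Name(c.tableDesc.Name)),
	&txCtx, c.EvalCtx, &semaCtx, true /* addingCols */)
if err != nil {
	return nil, errors.Wrapf(err, "error evaluating computed expression for IMPORT INTO")
}
```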
Force-pushed 5563612 to 3f62b7f
LGTM
Force-pushed 3f62b7f to 10b56fe
Previously, computed columns were not supported for IMPORT. This PR addresses that by adding support for computed columns in IMPORT, which encompasses the following scope:
1. For IMPORT INTO, both CSV and AVRO are supported.
2. For IMPORT TABLE, only AVRO is supported, given that typical CSV data does not have its column mapping specified (that is, does the first column correspond to "a" or "b" in the table?).
3. For IMPORT from a dump file, only PGDUMP is supported at this point, as MYSQLDUMP does not seem to support SQL commands containing AS STORED.
The main gist of the code is to process the computed columns at the row converter stage, and then evaluate those computed expressions for each row once the other non-computed columns are evaluated.
Release note (general change): computed columns are now supported in the IMPORT INTO operation.
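As a concrete illustration of the IMPORT INTO path (table and file names here are made up, not taken from the PR's tests):

```sql
-- The computed column c is left out of the target list; the import fills
-- it in by evaluating a + b for each row.
CREATE TABLE users (a INT, b INT, c INT AS (a + b) STORED);
IMPORT INTO users (a, b) CSV DATA ('nodelocal://1/users.csv');
```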
Force-pushed 10b56fe to 2156b0d
TFTR! bors r=pbardea
Build succeeded
Previously, computed columns were not supported for IMPORT. This PR addresses that by adding support for computed columns in IMPORT, which encompasses the following scope:
1. For IMPORT INTO, both CSV and AVRO are supported.
2. For IMPORT TABLE, only AVRO is supported, given that typical CSV data does not have its column mapping specified (that is, does the first column correspond to "a" or "b" in the table?).
3. For IMPORT from a dump file, only PGDUMP is supported at this point, as MYSQLDUMP does not seem to support SQL commands containing AS STORED.

The main gist of the code is to process the computed columns at the row converter stage, and then evaluate those computed expressions for each row once the other non-computed columns are evaluated.

Release note (general change): computed columns are now supported in IMPORT for a subset of the file formats.