New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

importccl: fix target column ordering bug for PGDUMP import #51065

Merged

craig merged 1 commit into cockroachdb:master from Anzoteh96:import-pgdump

Jul 9, 2020

Anzoteh96 commented Jul 7, 2020 •

edited

Loading

The current implementation assumes that the target columns of a PGDUMP query is the same as how they are created in the case where target columns are declared in PGDUMP file. This PR addresses it by detecting the target columns in the PGDUMP statement itself if this is the case. In addition, given that the target columns may not be well-determined at the formation of a new DatumRowConverter, so the check of unsupported default column expression is also moved to the DatumRowConverter.Row() function.

Fixed #51159

Release note: None

Member

cockroach-teamcity commented Jul 7, 2020

This change is

Anzoteh96 mentioned this pull request

importccl: support unique_rowid() as default expression for IMPORT INTO #50922

Merged

maddyblue reviewed

View reviewed changes

pkg/ccl/importccl/read_import_pgdump.go Outdated

+              				conv.IsTargetCol = make(map[int]struct{})
+              				colLookup := make(map[string]int, 0)
+              				for j, col := range conv.VisibleCols {
+              					colLookup[col.Name] = j

Contributor

maddyblue Jul 7, 2020

This should be created (at most) once per insert, not once per insert tuple. Once per file would be better, but that may be tricky.

Author

Anzoteh96 Jul 7, 2020

I moved this to an outer loop so that it will be created per insert. Don't think it would work for once per file because in the case of IMPORT PGDUMP blah... the targeted columns could be different for each insert statement within the file.

pkg/ccl/importccl/read_import_pgdump.go Outdated

+              					} else {
+              						colName = i.Columns[j].String()
+              					}
+              					ind, ok := colLookup[colName]

Contributor

maddyblue Jul 7, 2020

It is unfortunate that now the no column case for inserts is forced to go through a string map lookup. The if block above's purpose is to set colName so that the lookup can be used, but we really only need ind to be set correctly. So this should also be fine?:

if len(i.Columns) == 0 {
  ind = j
} else {
  colName := i.Columns[j].String();
  ind, ok = colLookup[colName];
  if !ok { ... }
}

Author

Anzoteh96 Jul 7, 2020

Yes you're right, I did the change as you suggested.

pkg/ccl/importccl/read_import_pgdump.go Outdated

@@ @@ -578,11 +584,27 @@ func (m *pgDumpReader) readFile( @@
               				if count <= resumePos {
               					continue
               				}
-              				if expected, got := len(conv.VisibleCols), len(tuple); expected != got {
+              				if expected, got := expectedColLen, len(tuple); expected != got {

Contributor

maddyblue Jul 7, 2020

no need to create expected here

Author

Anzoteh96 Jul 7, 2020

Done (thanks for catching that!)

maddyblue reviewed

View reviewed changes

pkg/ccl/importccl/read_import_pgdump.go Outdated

@@ @@ -579,19 +583,19 @@ func (m *pgDumpReader) readFile( @@
               			}
               			inserts++
               			startingCount := count
+              			colLookup := make(map[string]int, 0)

Contributor

maddyblue Jul 7, 2020

0 -> len(conv.VisibleCols)

Author

Anzoteh96 Jul 7, 2020

Done.

Anzoteh96 force-pushed the import-pgdump branch 2 times, most recently from 4483400 to ed3370f Compare

July 7, 2020 18:18

Anzoteh96 marked this pull request as ready for review

July 7, 2020 18:19

Anzoteh96 requested review from a team and pbardea and removed request for a team

July 7, 2020 18:19

Anzoteh96 changed the title ~~importccl: [WIP] fix bug for PGDUMP import~~ importccl: fix bug for PGDUMP import

Anzoteh96 changed the title ~~importccl: fix bug for PGDUMP import~~ importccl: fix target column ordering bug for PGDUMP import

maddyblue reviewed

View reviewed changes

pkg/ccl/importccl/read_import_pgdump.go Outdated

               				}
-              				for i, expr := range tuple {
-              					typed, err := expr.TypeCheck(ctx, &semaCtx, conv.VisibleColTypes[i])
+              				conv.IsTargetCol = make(map[int]struct{})

Contributor

maddyblue Jul 7, 2020

We need a way to stop doing multiple map creations per INSERT row. This has now regressed and will recreate this IsTargetCol map per INSERT row, whether or not columns are specified, which is worse than what we had before.

Author

Anzoteh96 Jul 8, 2020

I've moved the conv.IsTargetCol creation (more like an update) to an outer loop so that it will be done for every INSERT statement instead of each row of the INSERT statement. Unfortunately, this is the best possible optimization in terms of the frequency of the map creation as the target columns (and possibly the order) of each insert statement might be different.

I've also made targetColMapInd slice for each INSERT statement so that each tuple of each row can know which data each tuple corresponds to.

pkg/ccl/importccl/read_import_pgdump.go Outdated

+              					if len(i.Columns) == 0 {
+              						ind = j
+              					} else {
+              						colName := i.Columns[j].String()

Contributor

maddyblue Jul 7, 2020

Ok so what we actually need is just the type of the column, and to produce an error if a name is used that doesn't exist. So maybe something like:

if len(i.Columns) == 0 {
  typ = conv.VisibleColTypes[j]
} else {
  colName := i.Columns[j].String()
  typ, ok = conv.SomeNameLookup[colName]
  if !ok { ... }
}

And then set SomeNameLookup (renamed clearly) one time during init. It is possible some variable like this already exists, I haven't looked at what conv is. Does that work? If so it would remove creating 2 maps per insert row, which is really bad.

Anzoteh96 force-pushed the import-pgdump branch 3 times, most recently from 8fb6b1d to 65e8ded Compare

July 8, 2020 14:24

maddyblue reviewed

View reviewed changes

Contributor

maddyblue left a comment

I still think it's possible to only create a map one time. What we need is a way to map the column index of an insert VALUES row to 1) a type and 2) the correct index in conv.Datums. We already know how to do that when insert.Columns isn't specified by using conv.VisibleColTypes. We need to be able to do that for when insert.Columns is specified. Ok, so when creating the pgDumpReader, for each table in pgDumpReader.tables, create a map that maps column names to a struct with that column's type and its index needed for conv.Datums. This way if insert.Columns is populated, there's an existing lookup map that we can use. Since we're only creating the map one time, we don't care about the tiny perf hit for creating it when it's not used, and it saves us from recreating a map for every row, which is slow since it has to be made on the heap and get GC'd. (If we do have to do this, the map should be part of pgDumpReader and it gets cleared instead of recreated each time, but I don't think we need to do this.)


          importccl: fix target column ordering bug for PGDUMP import

1fc7588

The current implementation assumes that the target columns of a PGDUMP query
is the same as how they are created in the case where target columns are
declared in PGDUMP file. This PR addresses it by detecting the target columns
in the PGDUMP statement itself if this is the case. In addition, given
that the target columns may not be well-determined at the formation of a
new `DatumRowConverter`, so the check of unsupported default column
expression is also moved to the DatumRowConverter.Row() function.

Release note: None

Anzoteh96 force-pushed the import-pgdump branch from 65e8ded to 1fc7588 Compare

July 9, 2020 15:31

Author

Anzoteh96 commented Jul 9, 2020

I still think it's possible to only create a map one time. What we need is a way to map the column index of an insert VALUES row to 1) a type and 2) the correct index in conv.Datums. We already know how to do that when insert.Columns isn't specified by using conv.VisibleColTypes. We need to be able to do that for when insert.Columns is specified. Ok, so when creating the pgDumpReader, for each table in pgDumpReader.tables, create a map that maps column names to a struct with that column's type and its index needed for conv.Datums. This way if insert.Columns is populated, there's an existing lookup map that we can use. Since we're only creating the map one time, we don't care about the tiny perf hit for creating it when it's not used, and it saves us from recreating a map for every row, which is slow since it has to be made on the heap and get GC'd. (If we do have to do this, the map should be part of pgDumpReader and it gets cleared instead of recreated each time, but I don't think we need to do this.)

I've updated the change so that the map is created at the newPgDumpReader stage and each insert statement simply reads the map to create the targetColMapInd slice for the use of each tuple statement. Now the conv.IsTargetCol still needs to be updated for each INSERT statement for the following reason: as mentioned, the targeted columns might differ for each statement, and this IsTargetCol is needed at DatumRowConverter.Row function (at least for now) because that's where the default expressions are being evaluated for non-targeted columns (and possibly an error being returned in case default expression is not supported).

Also, added tests to show that it will work in the case of target columns reordering and having supported default columns, both for IMPORT INTO ... PGDUMP FILE and IMPORT PGDUMP.

maddyblue approved these changes

View reviewed changes

Contributor

maddyblue commented Jul 9, 2020

Unfortunate we still have to create a map, but this looks like it's in a good place. Thanks for the work.

Author

Anzoteh96 commented Jul 9, 2020

TFTR!
bors r=mjibson

Contributor

craig bot commented Jul 9, 2020

Build succeeded

GitHub CI (Cockroach)

craig bot merged commit ba2deac into cockroachdb:master

Anzoteh96 deleted the import-pgdump branch

July 10, 2020 17:45

Anzoteh96 mentioned this pull request

importccl: add support for importing computed columns #51321

Merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet