-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
importccl: fix target column ordering bug for PGDUMP import #51065
Conversation
conv.IsTargetCol = make(map[int]struct{}) | ||
colLookup := make(map[string]int, 0) | ||
for j, col := range conv.VisibleCols { | ||
colLookup[col.Name] = j |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be created (at most) once per insert, not once per insert tuple. Once per file would be better, but that may be tricky.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved this to an outer loop so that it will be created per insert. Don't think it would work for once per file because in the case of IMPORT PGDUMP blah...
the targeted columns could be different for each insert statement within the file.
} else { | ||
colName = i.Columns[j].String() | ||
} | ||
ind, ok := colLookup[colName] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is unfortunate that now the no column case for inserts is forced to go through a string map lookup. The if block above's purpose is to set colName so that the lookup can be used, but we really only need ind to be set correctly. So this should also be fine?:
if len(i.Columns) == 0 {
ind = j
} else {
colName := i.Columns[j].String();
ind, ok = colLookup[colName];
if !ok { ... }
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes you're right, I did the change as you suggested.
@@ -578,11 +584,27 @@ func (m *pgDumpReader) readFile( | |||
if count <= resumePos { | |||
continue | |||
} | |||
if expected, got := len(conv.VisibleCols), len(tuple); expected != got { | |||
if expected, got := expectedColLen, len(tuple); expected != got { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to create expected
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done (thanks for catching that!)
@@ -579,19 +583,19 @@ func (m *pgDumpReader) readFile( | |||
} | |||
inserts++ | |||
startingCount := count | |||
colLookup := make(map[string]int, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0
-> len(conv.VisibleCols)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
4483400
to
ed3370f
Compare
} | ||
for i, expr := range tuple { | ||
typed, err := expr.TypeCheck(ctx, &semaCtx, conv.VisibleColTypes[i]) | ||
conv.IsTargetCol = make(map[int]struct{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need a way to stop doing multiple map creations per INSERT row. This has now regressed and will recreate this IsTargetCol map per INSERT row, whether or not columns are specified, which is worse than what we had before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've moved the conv.IsTargetCol
creation (more like an update) to an outer loop so that it will be done for every INSERT
statement instead of each row of the INSERT
statement. Unfortunately, this is the best possible optimization in terms of the frequency of the map creation as the target columns (and possibly the order) of each insert statement might be different.
I've also made targetColMapInd
slice for each INSERT
statement so that each tuple of each row can know which data each tuple corresponds to.
if len(i.Columns) == 0 { | ||
ind = j | ||
} else { | ||
colName := i.Columns[j].String() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok so what we actually need is just the type of the column, and to produce an error if a name is used that doesn't exist. So maybe something like:
if len(i.Columns) == 0 {
typ = conv.VisibleColTypes[j]
} else {
colName := i.Columns[j].String()
typ, ok = conv.SomeNameLookup[colName]
if !ok { ... }
}
And then set SomeNameLookup
(renamed clearly) one time during init. It is possible some variable like this already exists, I haven't looked at what conv
is. Does that work? If so it would remove creating 2 maps per insert row, which is really bad.
8fb6b1d
to
65e8ded
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think it's possible to only create a map one time. What we need is a way to map the column index of an insert VALUES row to 1) a type and 2) the correct index in conv.Datums. We already know how to do that when insert.Columns isn't specified by using conv.VisibleColTypes. We need to be able to do that for when insert.Columns is specified. Ok, so when creating the pgDumpReader, for each table in pgDumpReader.tables, create a map that maps column names to a struct with that column's type and its index needed for conv.Datums. This way if insert.Columns is populated, there's an existing lookup map that we can use. Since we're only creating the map one time, we don't care about the tiny perf hit for creating it when it's not used, and it saves us from recreating a map for every row, which is slow since it has to be made on the heap and get GC'd. (If we do have to do this, the map should be part of pgDumpReader and it gets cleared instead of recreated each time, but I don't think we need to do this.)
The current implementation assumes that the target columns of a PGDUMP query is the same as how they are created in the case where target columns are declared in PGDUMP file. This PR addresses it by detecting the target columns in the PGDUMP statement itself if this is the case. In addition, given that the target columns may not be well-determined at the formation of a new `DatumRowConverter`, so the check of unsupported default column expression is also moved to the DatumRowConverter.Row() function. Release note: None
I've updated the change so that the map is created at the Also, added tests to show that it will work in the case of target columns reordering and having supported default columns, both for |
Unfortunate we still have to create a map, but this looks like it's in a good place. Thanks for the work. |
TFTR! |
Build succeeded |
The current implementation assumes that the target columns of a PGDUMP query is the same as how they are created in the case where target columns are declared in PGDUMP file. This PR addresses it by detecting the target columns in the PGDUMP statement itself if this is the case. In addition, given that the target columns may not be well-determined at the formation of a new
DatumRowConverter
, so the check of unsupported default column expression is also moved to theDatumRowConverter.Row()
function.Fixed #51159
Release note: None