Support SQL MERGE in the Trino engine and five connectors #7933
plugin/trino-hive/src/main/java/io/trino/plugin/hive/HiveWriterFactory.java
core/trino-spi/src/main/java/io/trino/spi/connector/PagePair.java
plugin/trino-hive/src/main/java/io/trino/plugin/hive/MergeFileWriter.java
The Kudu commit looks good
core/trino-spi/src/main/java/io/trino/spi/connector/RowChangeParadigm.java
core/trino-spi/src/main/java/io/trino/spi/connector/MergeDetails.java
core/trino-main/src/main/java/io/trino/operator/DeleteAndInsertMergeProcessor.java
plugin/trino-kudu/src/main/java/io/trino/plugin/kudu/KuduPageSink.java
Thanks for the great comments, @electrum. I did everything you suggested.
A lot of questions and some comments. I've gone through the docs, and partially through the analysis.
core/trino-main/src/main/java/io/trino/sql/analyzer/StatementAnalyzer.java
core/trino-main/src/main/java/io/trino/sql/analyzer/CanonicalizationAware.java
Some more comments regarding the analyzer. Initial comments on the planner part.
core/trino-main/src/main/java/io/trino/sql/analyzer/StatementAnalyzer.java
core/trino-main/src/main/java/io/trino/sql/analyzer/Analysis.java
core/trino-main/src/main/java/io/trino/sql/planner/QueryPlanner.java
Thanks for the great first batch of comments, @kasiafi! I believe I've addressed the comments from yesterday except those listed below. It would be great if you could resolve the comments you think have been handled to your satisfaction. I haven't addressed the more profound comments made 4 hours ago yet, and some of them will require coaching from you or @martint. Here are the comments from yesterday that I haven't addressed:
@djsstarburst, can you please point me to a document outlining how MERGE interacts with connectors? I would like to learn about the following:
core/trino-main/src/main/java/io/trino/sql/analyzer/StatementAnalyzer.java
core/trino-main/src/main/java/io/trino/sql/analyzer/Analysis.java
Here are some comments regarding the previously reviewed part. Additionally, I answered some of your replies directly. I resolved all conversations except those that require a follow-up.
I plan to review the next portions of the code and put my comments in a new batch.
core/trino-spi/src/main/java/io/trino/spi/connector/ConnectorMergeSink.java
core/trino-main/src/main/java/io/trino/operator/DeleteAndInsertMergeProcessor.java
    if (underlyingBlock instanceof RowBlock) {
        List<Block> newRowIdChildrenBuilder = new ArrayList<>();
        rowIdBlock.getChildren().stream()
                .map(block -> block.getPositions(rowIdPositions, 0, totalPositions))
                .forEach(newRowIdChildrenBuilder::add);
        return RowBlock.fromFieldBlocks(
                totalPositions,
                Optional.empty(),
                newRowIdChildrenBuilder.toArray(new Block[] {}));
    }
    else {
        return rowIdBlock.getPositions(rowIdPositions, 0, totalPositions);
Why is RowBlock special-cased here?
What if underlyingBlock is a DictionaryBlock over a RowBlock? Would it require special-casing as well?
I had endless trouble with this, and it's one of the main things I hoped review would shed light on.
I had hoped that I could just call rowIdBlock.getPositions(...) and end up with a consistent view of the resulting block. However, when I tried that, way downstream in the Driver I would see out-of-range array references. My assumption is that I'm doing something wrong, but I wasn't successful debugging the problem.
> I had endless trouble with this, and it's one of the main things I hoped review would shed light on.

Sorry that I cannot help. Please add a TODO comment here warning the reader that we don't know exactly why it is written the way it is.
    Arrays.fill(nulls, true);
    if (underlyingBlock instanceof RowBlock) {
        return RowBlock.fromFieldBlocks(positionCount, Optional.of(nulls), rowIdBlock.getChildren().toArray(new Block[]{}));
    }
    else {
        return ArrayBlock.fromElementBlock(positionCount, Optional.of(nulls), new int[positionCount], underlyingBlock);
    }
Shouldn't this actually depend on rowIdType?
Also, direct use of ArrayBlock is not correct. Typically you would use io.trino.spi.type.Type#createBlockBuilder(io.trino.spi.block.BlockBuilderStatus, int) to construct a block of values for a given type.
Here, however, you actually want to create a single-value NULL block (nativeValueToBlock may be helpful) and wrap it in a RunLengthEncodedBlock instead.
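To make the last suggestion concrete, here is a minimal sketch of the idea of repeating one NULL value through a run-length encoding instead of materializing one null per position. The types below (Column, SingleValueColumn, RunLengthColumn) are hypothetical stand-ins written only for illustration; they are not Trino's Block, RunLengthEncodedBlock, or nativeValueToBlock, whose real signatures should be checked in the SPI.

```java
import java.util.Objects;

// Hypothetical, simplified model of a column of values; not the Trino Block API.
interface Column {
    int positionCount();

    Object get(int position);
}

// Holds exactly one value, which may be null.
final class SingleValueColumn implements Column {
    private final Object value;

    SingleValueColumn(Object value) {
        this.value = value;
    }

    @Override
    public int positionCount() {
        return 1;
    }

    @Override
    public Object get(int position) {
        Objects.checkIndex(position, 1);
        return value;
    }
}

// Presents a single-value column as if it were repeated positionCount times.
// Memory use does not grow with positionCount.
final class RunLengthColumn implements Column {
    private final Column value;
    private final int positionCount;

    RunLengthColumn(Column value, int positionCount) {
        if (value.positionCount() != 1) {
            throw new IllegalArgumentException("value must contain exactly one position");
        }
        if (positionCount < 0) {
            throw new IllegalArgumentException("positionCount is negative");
        }
        this.value = value;
        this.positionCount = positionCount;
    }

    @Override
    public int positionCount() {
        return positionCount;
    }

    @Override
    public Object get(int position) {
        Objects.checkIndex(position, positionCount);
        // Every position maps to the single underlying value.
        return value.get(0);
    }
}

final class RunLengthNullDemo {
    public static void main(String[] args) {
        Column allNulls = new RunLengthColumn(new SingleValueColumn(null), 1000);
        System.out.println(allNulls.positionCount() + " positions, first value: " + allNulls.get(0));
    }
}
```

The point of the design is that the all-null column is independent of the row ID type's physical layout, so no per-type special case (row versus array) is needed in this simplified model.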
core/trino-main/src/main/java/io/trino/operator/DeleteAndInsertMergeProcessor.java
Some comments and questions regarding the planner part. I still have a few classes to review.
core/trino-main/src/main/java/io/trino/sql/planner/QueryPlanner.java
core/trino-main/src/main/java/io/trino/sql/planner/iterative/rule/PruneMergeSourceColumns.java
core/trino-main/src/main/java/io/trino/sql/planner/optimizations/UnaliasSymbolReferences.java
core/trino-main/src/main/java/io/trino/sql/planner/optimizations/SymbolMapper.java
core/trino-spi/src/main/java/io/trino/spi/connector/ConnectorNodePartitioningProvider.java
...ino-blackhole/src/main/java/io/trino/plugin/blackhole/BlackHoleNodePartitioningProvider.java
This version works under emulation on M1 Macs.
This allows the engine to make the decision about how many nodes to use as appropriate, based on the number of workers or hash partition count session property. This is also required for MERGE so that the insert and update layouts can use the same mapping.
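One way to picture why a shared mapping matters: if a single bucket-to-node assignment is used for both layouts, a row routed by the insert layout and a row routed by the update layout land on the same node whenever they fall in the same bucket. The sketch below assumes a round-robin assignment and a modulo bucket function. The class BucketNodeMapSketch and both of those choices are illustrative assumptions, not the engine's actual code.

```java
// Hypothetical sketch of an engine-owned bucket-to-node mapping; not Trino code.
final class BucketNodeMapSketch {
    private final int[] bucketToNode;

    // The node count is chosen by the caller (for example, from the number of workers).
    BucketNodeMapSketch(int bucketCount, int nodeCount) {
        if (bucketCount <= 0 || nodeCount <= 0) {
            throw new IllegalArgumentException("bucketCount and nodeCount must be positive");
        }
        bucketToNode = new int[bucketCount];
        for (int bucket = 0; bucket < bucketCount; bucket++) {
            // Assumed policy: spread buckets over nodes round-robin.
            bucketToNode[bucket] = bucket % nodeCount;
        }
    }

    // Assumed bucket function: non-negative remainder of the key.
    int bucketFor(long key) {
        return (int) Math.floorMod(key, (long) bucketToNode.length);
    }

    // Both the insert layout and the update layout would consult this same method.
    int nodeFor(long key) {
        return bucketToNode[bucketFor(key)];
    }
}

final class BucketNodeMapDemo {
    public static void main(String[] args) {
        BucketNodeMapSketch mapping = new BucketNodeMapSketch(8, 3);
        long key = 10;
        int insertNode = mapping.nodeFor(key);
        int updateNode = mapping.nodeFor(key);
        System.out.println("insert node = " + insertNode + ", update node = " + updateNode);
    }
}
```

Because both lookups go through one mapping object, they cannot disagree, which is the property the commit message asks for.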
This commit adds support for SQL MERGE in the Trino engine. It introduces an enum RowChangeParadigm, which characterizes how a connector modifies rows. Hive and Iceberg will use the DELETE_ROW_AND_INSERT_ROW paradigm, since they represent an updated row as a deleted row and an inserted row. Kudu will use the CHANGE_ONLY_UPDATED_COLUMNS paradigm. Each paradigm corresponds to an implementation of the RowChangeProcessor interface. The intent is to retrofit SQL UPDATE to use the same RowChangeParadigm/Processor mechanism. The SQL MERGE implementation allows update of all columns, including partition or bucket columns, and the Trino engine performs redistribution to ensure that the updated rows end up on the appropriate nodes. MERGE processing is extensively documented in the new file in the developer documentation, supporting-merge.rst.
This commit adds SQL MERGE support in the Hive connector and a raft of MERGE tests to verify that it works.
This PR is a second take on implementing SQL MERGE. It consists of commits that add support for SQL MERGE in the Trino engine and in the Hive, Kudu, Raptor, Iceberg and Delta Lake connectors. The implementation is structured so that most of the work happens in the Trino engine, so adding support in a connector is pretty simple.
The SQL MERGE implementation allows update of all columns, including partition or bucket columns, and the Trino engine performs redistribution to ensure that the updated rows end up on the appropriate nodes.
The Trino engine commit introduces an enum RowChangeParadigm, which characterizes how a connector modifies rows. Hive uses, and Iceberg will use, the DELETE_ROW_AND_INSERT_ROW paradigm, since they represent an updated row as a deleted row and an inserted row. Kudu uses the CHANGE_ONLY_UPDATED_COLUMNS paradigm.
Each paradigm corresponds to an implementation of the RowChangeProcessor interface. After this PR is merged, the intent is to retrofit SQL UPDATE to use the same RowChangeParadigm/Processor mechanism.
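As a rough illustration of the paradigm-to-processor relationship, the sketch below models each paradigm with a small processor. The two enum constant names come from this PR; everything else (Operation, RowChange, the processor class names, and the factory) is a hypothetical simplification, not the SPI or engine code. In particular, real processors work on pages of blocks, not on lists of objects.

```java
import java.util.ArrayList;
import java.util.List;

// Constant names are taken from the PR description; the rest of this file is hypothetical.
enum RowChangeParadigm { DELETE_ROW_AND_INSERT_ROW, CHANGE_ONLY_UPDATED_COLUMNS }

enum Operation { INSERT, DELETE, UPDATE }

// A simplified row-level change: an operation applied to a row identified by a key.
final class RowChange {
    final Operation operation;
    final String rowKey;

    RowChange(Operation operation, String rowKey) {
        this.operation = operation;
        this.rowKey = rowKey;
    }

    @Override
    public String toString() {
        return operation + "(" + rowKey + ")";
    }
}

interface RowChangeProcessor {
    List<RowChange> process(List<RowChange> changes);
}

// Rewrites each UPDATE as a DELETE followed by an INSERT; other changes pass through.
final class DeleteAndInsertSketch implements RowChangeProcessor {
    @Override
    public List<RowChange> process(List<RowChange> changes) {
        List<RowChange> result = new ArrayList<>();
        for (RowChange change : changes) {
            if (change.operation == Operation.UPDATE) {
                result.add(new RowChange(Operation.DELETE, change.rowKey));
                result.add(new RowChange(Operation.INSERT, change.rowKey));
            }
            else {
                result.add(change);
            }
        }
        return result;
    }
}

// Leaves UPDATE changes as they are, for connectors that can update columns in place.
final class ChangeOnlyUpdatedColumnsSketch implements RowChangeProcessor {
    @Override
    public List<RowChange> process(List<RowChange> changes) {
        return new ArrayList<>(changes);
    }
}

final class RowChangeProcessors {
    private RowChangeProcessors() {}

    static RowChangeProcessor forParadigm(RowChangeParadigm paradigm) {
        switch (paradigm) {
            case DELETE_ROW_AND_INSERT_ROW:
                return new DeleteAndInsertSketch();
            case CHANGE_ONLY_UPDATED_COLUMNS:
                return new ChangeOnlyUpdatedColumnsSketch();
            default:
                throw new IllegalArgumentException("Unknown paradigm: " + paradigm);
        }
    }
}

final class RowChangeDemo {
    public static void main(String[] args) {
        List<RowChange> changes = List.of(
                new RowChange(Operation.UPDATE, "r1"),
                new RowChange(Operation.INSERT, "r2"));
        for (RowChangeParadigm paradigm : RowChangeParadigm.values()) {
            System.out.println(paradigm + ": " + RowChangeProcessors.forParadigm(paradigm).process(changes));
        }
    }
}
```

In this model, an update of row r1 under DELETE_ROW_AND_INSERT_ROW becomes a delete of r1 followed by an insert of r1, while under CHANGE_ONLY_UPDATED_COLUMNS it stays a single update.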
Extensive documentation on the internal MERGE architecture can be found in the developer doc supporting-merge.rst.
Fixes #7708