-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multi-join pushdown appends recursively _<number> to column names so that they exceed maximum alias length #18924
Conversation
This reverts commit 8fce268.
public static final int DEFAULT_COLUMN_ALIAS_LENGTH = 30; | ||
public static final int ORIGINAL_COLUMN_NAME_LENGTH = 24; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no change requested yet:
notice that the code lives in trino-base-jdbc but the constants are for Oracle database limitations.
This is why it was requested to accept the max length limit as an argument to the method (easiest to test and bind in Guice) or the class itself (still ok to test, harder to bind in Guice).
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/ColumnWithAliasFormatter.java
Outdated
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/ColumnWithAliasFormatter.java
Outdated
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/ColumnWithAliasFormatter.java
Outdated
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/ColumnWithAliasFormatter.java
Outdated
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/TestColumnWithAliasFormatter.java
Outdated
Show resolved
Hide resolved
plugin/trino-oracle/src/test/java/io/trino/plugin/oracle/TestOracleConnectorTest.java
Outdated
Show resolved
Hide resolved
@Test | ||
public void testPushdownJoinWithLongNameSucceeds() | ||
{ | ||
tryCleanupTemporaryTable(); | ||
try { | ||
String baseColumnName = "test_pushdown_" + randomNameSuffix(); | ||
String validColumnName = baseColumnName + "z".repeat(MAX_CHARS_COLUMN_ALIAS - baseColumnName.length()); | ||
|
||
assertUpdate(format(""" | ||
CREATE TABLE orders_1 as | ||
SELECT orderkey as %s, | ||
custkey, | ||
orderstatus, | ||
totalprice, | ||
orderdate, | ||
orderpriority, | ||
clerk, | ||
shippriority, | ||
comment | ||
FROM orders | ||
""", validColumnName), "VALUES 15000"); | ||
|
||
Session session = Session.builder(getSession()) | ||
.setCatalogSessionProperty("oracle", JOIN_PUSHDOWN_ENABLED, "true") | ||
.build(); | ||
assertQuery(session, | ||
format(""" | ||
SELECT c.custkey, o.%s, n.nationkey | ||
FROM orders_1 o JOIN customer c ON c.custkey = o.custkey | ||
JOIN nation n ON c.nationkey = n.nationkey | ||
""", validColumnName), | ||
""" | ||
SELECT c.custkey, o.orderkey, n.nationkey | ||
FROM orders o JOIN customer c ON c.custkey = o.custkey | ||
JOIN nation n ON c.nationkey = n.nationkey | ||
"""); | ||
} | ||
finally { | ||
tryCleanupTemporaryTable(); | ||
} | ||
} | ||
|
||
private void tryCleanupTemporaryTable() | ||
{ | ||
try { | ||
assertUpdate("DROP TABLE orders_1"); | ||
} | ||
catch (Exception ex) { | ||
ex.printStackTrace(); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Test | |
public void testPushdownJoinWithLongNameSucceeds() | |
{ | |
tryCleanupTemporaryTable(); | |
try { | |
String baseColumnName = "test_pushdown_" + randomNameSuffix(); | |
String validColumnName = baseColumnName + "z".repeat(MAX_CHARS_COLUMN_ALIAS - baseColumnName.length()); | |
assertUpdate(format(""" | |
CREATE TABLE orders_1 as | |
SELECT orderkey as %s, | |
custkey, | |
orderstatus, | |
totalprice, | |
orderdate, | |
orderpriority, | |
clerk, | |
shippriority, | |
comment | |
FROM orders | |
""", validColumnName), "VALUES 15000"); | |
Session session = Session.builder(getSession()) | |
.setCatalogSessionProperty("oracle", JOIN_PUSHDOWN_ENABLED, "true") | |
.build(); | |
assertQuery(session, | |
format(""" | |
SELECT c.custkey, o.%s, n.nationkey | |
FROM orders_1 o JOIN customer c ON c.custkey = o.custkey | |
JOIN nation n ON c.nationkey = n.nationkey | |
""", validColumnName), | |
""" | |
SELECT c.custkey, o.orderkey, n.nationkey | |
FROM orders o JOIN customer c ON c.custkey = o.custkey | |
JOIN nation n ON c.nationkey = n.nationkey | |
"""); | |
} | |
finally { | |
tryCleanupTemporaryTable(); | |
} | |
} | |
private void tryCleanupTemporaryTable() | |
{ | |
try { | |
assertUpdate("DROP TABLE orders_1"); | |
} | |
catch (Exception ex) { | |
ex.printStackTrace(); | |
} | |
} | |
@Test | |
public void testPushdownJoinWithLongNameSucceeds() | |
{ | |
String maximumLengthColumnIdentifier = "z".repeat(MAX_CHARS_COLUMN_ALIAS); | |
try (TestTable table = new TestTable(getQueryRunner()::execute, "long_identifier", "(%s bigint)".formatted(maximumLengthColumnIdentifier))) { | |
assertThat(query(joinPushdownEnabled(getSession()), """ | |
SELECT r.name, t.%s, n.name | |
FROM %s t JOIN region r ON r.regionkey = t.%s | |
JOIN nation n ON r.regionkey = n.regionkey""".formatted(maximumLengthColumnIdentifier, table.getName(), maximumLengthColumnIdentifier))) | |
.isFullyPushedDown(); | |
} | |
} | |
This seems to be sufficient to test.
Try to avoid creating data in tests since it can be slow, specially if using larger tables like orders. nation and region are best if your test doesn't care about size of tables.
Note: we need to verify that the query was indeed pushed down because otherwise it's possible that the connector would generate a plain SELECT with no synthetic ids and hence our test will pass.
Joins cannot always be pushed down even if join pushdown is enabled (e.g. if the join condition or some other operation in the query prevents pushdown).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also while this test by itself is sufficient to prove that the aliasing logic works it doesn't prove that we chose an optimal value for the max identifier length.
IMO we should have another test to verify the length we chose:
- Try create a column with length = max_length - must succeed
- Try create a column with length = max_length + 1 - must fail in expected manner
the reason why i believe this is useful is because for example we can use 30 as the max globally but it un-necessarily loses information in connectors which can actually support longer identifiers making reading explain plans harder than needed. Do you agree?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. That's a really neat API. Swapped the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can use 30 as the max globally but it un-necessarily loses information in connectors which can actually support longer identifiers making reading explain plans harder than needed
If that is the case, we can try to leave the old 'formatter', in case it's a non-Oracle implementation. Then the code will run exactly as was (possibly with a bug) for any jdbc that is non-Oracle. Not sure whether this is better (because we have the old logic that does not truncate, so it is easier to trace/debug), or worse (because there might still be a bug).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we know we'll follow up on this I don't think either choice makes a difference. End result after the follow up will be that each JDBC connector will use some formatter.
@hashhar, I have stepped back one tiny bit and started thinking whether:
I started analyzing the code with that in mind and here are my conclusions (I might be wrong, so please straighten this up if that is the case). Normally, when I deal with test code where DI is used, especially when guice is used, I see guice modules being used via annotating classes with @guice top-level. In those modules there's usually a mixture of production bindings but also a fraction of test-only modules where one binds specific constants for specific tests, or mocks for tests. Now, when browsing through TestOracleConnectorTest and its base class ie. BaseOracleConnectorTest, I noticed that all instantiations are made through 'new' constructor calls. Whenever there's a variation needed per Oracle/Postgres/etc., there's a method in the base class that is being overriden by the subsequent test class. This feels awkward to me and counter intuitive. Normally I'd expect that for instance TestOracleConnectorTest has a @guice annotation with a module specified that would look like:
and the testing class code would then be:
The fact that we use guice DI in production in the usual way through the bindings and modules, makes it harder to swap something out in the tests if we don't use guice in the tests at all. So my conclussion/suggestion is that if we go back to the analysis of [1] and [2], I believe that the problem at hand is that we would be able to agree on the variation of [1], however because we have a technical problem [2], we have divided ourselves into two camps. It's easier to say the test is not needed, because we the don't have to admit [2] :) Now, I understand that rewriting (or even getting an approval for rewriting) even a single test to use DI the way I was used to is probably out of scope/insane idea. At this point I believe the only reasonable thing we can do is to agree on the 'tactical' solution to deal with this problem. I know what I have written does not resolve anything. Just throwing it in, so that I can cross check whether it improves our understanding of the problem. |
3ed886a
to
a08b307
Compare
we'll I don't know how to answer regarding DI in tests, consistency comes first since multiple people work on the project and familiarity is important for open-source contributors. Rewriting the world is possible but there are complications which aren't visible at first sight and cause the rewrite to take longer than expected without bringing too much value. i.e. does using DI in tests solve a problem we have right now? See for example #9378 and #14027. It's been going on for 2 years now with not a lot of immediately visible benefits (even though I agree we should do it). |
Also note that DI includes constructor injection so using Guice or a setup method doesn't make a practical difference. And in this specific case the module won't save you repetition because if you look further you'll notice each test class needs a differently configured query runner so you'll have multiple modules anyway and the modules make the test setup lie farther away from the test code which doesn't seem beneficial. TL; DR: I fail to see what practical problem does DI in tests solve. |
a08b307
to
30ae72b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just some nitpicks
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/DefaultJdbcMetadata.java
Outdated
Show resolved
Hide resolved
{ | ||
this.jdbcClient = requireNonNull(jdbcClient, "jdbcClient is null"); | ||
this.jdbcQueryEventListeners = ImmutableSet.copyOf(requireNonNull(jdbcQueryEventListeners, "queryEventListeners is null")); | ||
this.builder = requireNonNull(builder, "aliasFormatter is null"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, match the message name and consider a better name for this property
plugin/trino-oracle/src/test/java/io/trino/plugin/oracle/TestOracleConnectorTest.java
Outdated
Show resolved
Hide resolved
30ae72b
to
d47d13a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - all comments are editorial comments now
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/DefaultJdbcMetadataFactory.java
Outdated
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/DefaultJdbcMetadataFactory.java
Outdated
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/DefaultJdbcMetadataFactory.java
Outdated
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/DefaultJdbcMetadataFactory.java
Outdated
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/SyntheticColumnHandleBuilder.java
Outdated
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/TestSyntheticColumnHandleBuilder.java
Outdated
Show resolved
Hide resolved
plugin/trino-ignite/src/main/java/io/trino/plugin/ignite/IgniteJdbcMetadataFactory.java
Outdated
Show resolved
Hide resolved
plugin/trino-ignite/src/main/java/io/trino/plugin/ignite/IgniteMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-phoenix5/src/main/java/io/trino/plugin/phoenix5/PhoenixConnectorFactory.java
Outdated
Show resolved
Hide resolved
plugin/trino-phoenix5/src/main/java/io/trino/plugin/phoenix5/PhoenixMetadata.java
Outdated
Show resolved
Hide resolved
d47d13a
to
75f893a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM % some argument renames + one nitpick
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/SyntheticColumnHandleBuilder.java
Outdated
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/TestDefaultJdbcMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/TestSyntheticColumnHandleBuilder.java
Outdated
Show resolved
Hide resolved
plugin/trino-ignite/src/main/java/io/trino/plugin/ignite/IgniteJdbcMetadataFactory.java
Outdated
Show resolved
Hide resolved
plugin/trino-ignite/src/main/java/io/trino/plugin/ignite/IgniteJdbcMetadataFactory.java
Outdated
Show resolved
Hide resolved
plugin/trino-ignite/src/main/java/io/trino/plugin/ignite/IgniteMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-ignite/src/main/java/io/trino/plugin/ignite/IgniteMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-phoenix5/src/main/java/io/trino/plugin/phoenix5/PhoenixMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-phoenix5/src/main/java/io/trino/plugin/phoenix5/PhoenixMetadata.java
Outdated
Show resolved
Hide resolved
75f893a
to
8dcbe36
Compare
8dcbe36
to
2ee0b52
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
I'll merge after CI is done.
Only comment I have is that the commit history is less than ideal. |
public DefaultJdbcMetadata(JdbcClient jdbcClient, | ||
boolean precalculateStatisticsForPushdown, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We format arguments like below when putting arguments on separate lines:
public DefaultJdbcMetadata(
JdbcClient jdbcClient,
boolean precalculateStatisticsForPushdown,
Set<JdbcQueryEventListener> jdbcQueryEventListeners,
SyntheticColumnHandleBuilder syntheticColumnBuilder)
Same for Ignite and Phoenix.
{ | ||
private final SyntheticColumnHandleBuilder syntheticColumnHandleBuilder = new SyntheticColumnHandleBuilder(); | ||
|
||
@DataProvider(name = "columns") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Helper method should be located after the caller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate or provide an example. I don't think I understand your comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean replacing the method order:
@Test(dataProvider = "columns")
public void testColumnAliasTruncation(String columnName, int nextSynthenticId, String expectedSyntheticColumnName)
{
JdbcColumnHandle column = getDefaultColumnHandleBuilder()
.setColumnName(columnName)
.build();
JdbcColumnHandle result = syntheticColumnHandleBuilder.get(column, nextSynthenticId);
assertThat(result.getColumnName()).isEqualTo(expectedSyntheticColumnName);
}
@DataProvider(name = "columns")
public static Object[][] testData()
{
return new Object[][] {
{"column_0", 999, "column_0_999"},
{"column_with_over_twenty_characters", 100, "column_with_over_twenty_ch_100"},
{"column_with_over_twenty_characters", MAX_VALUE, "column_with_over_tw_2147483647"}
};
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point is that we are reading from the top to bottom. That way you read more abstract things, then if you care you go deeper and understand details. This follows Clean Code book.
private final SyntheticColumnHandleBuilder syntheticColumnHandleBuilder = new SyntheticColumnHandleBuilder(); | ||
|
||
@DataProvider(name = "columns") | ||
public static Object[][] testData() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We usually use dataProvider
suffix for @DataProvider
. I would rename to columnDataProvider
or something.
.setColumnName("column_0") | ||
.build(); | ||
|
||
assertThatThrownBy(() -> syntheticColumnHandleBuilder.get(column, -2147483648)).isInstanceOf(VerifyException.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would better to verify the error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise here. Can you explain in more detail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assertThatThrownBy(() -> syntheticColumnHandleBuilder.get(column, -2147483648))
.isInstanceOf(VerifyException.class)
.hasMessage("nextSyntheticColumnId rolled over and is not monotonically increasing any more");
{ | ||
private final SyntheticColumnHandleBuilder syntheticColumnHandleBuilder = new SyntheticColumnHandleBuilder(); | ||
|
||
@DataProvider(name = "columns") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point is that we are reading from the top to bottom. That way you read more abstract things, then if you care you go deeper and understand details. This follows Clean Code book.
@@ -83,4 +88,16 @@ protected SqlExecutor onRemoteDatabase() | |||
{ | |||
return oracleServer::execute; | |||
} | |||
|
|||
@Test | |||
public void testPushdownJoinWithLongNameSucceeds() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have a test with multiple super long column names used in the same Trino query that are almost the same (zzz..zza
, zzz..zzb
, ...)? Such columns should represent different data. The thing I was thinking we could check is to validate that synthetic names do not generate conflicts.
Or is it an overkill?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, see the tests being added in #18984
fortunately the remote databases which silently truncate long identifiers (SQLServer, Postgres, Redshift) all prevent this bug from happening because they realise that the truncated alias is now ambiguous.
Acceptance Criteria
Out of Scope
How to Reproduce
The following enables join pushdown that otherwise would not happen [1], [2].
The following query
should trigger the breakpoint.
In ‘Threads & Variables’ view you should be able to observe the ‘… AS COLUMN_X_Y’ aliases for columns that have as many underscores with a follow up number, as the number of joins in the query.
Proposed solution
Divide the number of characters available for the alias (N) into two:
where N - 1 = P + M. Use the additional character '_' to visually divide 'Original Column Name Prefix' from 'Uniqueness Magic Number'.
Example
Assume N = 8, P = 4, M = 3
Query:
The aliases would then become:
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text: