Do case insensitive comparison between dereferenced fields and internal ORC field names #7350
We should definitely have a test for this.
FYI - this product test will fail without the code change introduced here.
I added a label to run all Hive product tests, made a small cleanup in a test, and added an additional assertion.
The build was almost green (except main and tests which seemed unrelated). I restarted all of it to be on the safe side.
testing/trino-product-tests/src/main/java/io/trino/tests/hive/TestHiveStorageFormats.java
@willmostly thanks for making the change, lgtm. Also, could you please squash your commits.
        toList())));
}
else {
    projectionsByColumnIndex = projections.stream()
            .collect(Collectors.groupingBy(
                    HiveColumnHandle::getBaseHiveColumnIndex,
                    mapping(
                            column -> column.getHiveColumnProjectionInfo().map(HiveColumnProjectionInfo::getDereferenceNames).orElse(ImmutableList.<String>of()),
                            column -> column.getHiveColumnProjectionInfo()
since this is now getting a bit longer at two places, we could extract it out to simplify. (I'd leave it up to you if you'd like to tackle it separately.)
projectionsByColumnName = projections.stream()
        .collect(Collectors.groupingBy(
                HiveColumnHandle::getBaseColumnName,
                mapping(OrcPageSourceFactory::getDereferencesAsList, toList())));

private static List<String> getDereferencesAsList(HiveColumnHandle column)
{
    return column.getHiveColumnProjectionInfo()
            .map(info -> info.getDereferenceNames().stream()
                    .map(dereference -> dereference.toLowerCase(ENGLISH))
                    .collect(toList()))
            .orElse(ImmutableList.<String>of());
}
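To make the suggested extraction easier to reason about in isolation, here is a minimal sketch of the same lower-casing step written against JDK types only. The class name `DereferenceNormalizer` and the method `lowerCaseDereferences` are hypothetical, and `Optional<List<String>>` merely stands in for what `getHiveColumnProjectionInfo()` followed by `getDereferenceNames()` would yield; this is not the code that was merged.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical stand-in for the extracted helper; uses only JDK types, not Trino classes.
final class DereferenceNormalizer
{
    private DereferenceNormalizer() {}

    // Lower-cases every dereference name with a fixed locale so the result
    // does not depend on the JVM default locale. An absent projection
    // (Optional.empty) maps to an empty list, mirroring the orElse branch above.
    static List<String> lowerCaseDereferences(Optional<List<String>> dereferenceNames)
    {
        return dereferenceNames
                .map(names -> names.stream()
                        .map(name -> name.toLowerCase(Locale.ENGLISH))
                        .collect(Collectors.toList()))
                .orElse(List.of());
    }
}
```

Using a fixed locale such as `Locale.ENGLISH` avoids surprises like the Turkish dotless i when the JVM default locale differs between machines.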
assertThat(query("SELECT c_struct.testCustId FROM " + tableName)).containsOnly(row("1234"));
assertThat(query("SELECT c_struct.testcustid FROM " + tableName)).containsOnly(row("1234"));
assertThat(query("SELECT c_struct.requestDate FROM " + tableName)).containsOnly(row("some day"));
setProjectionPushdownEnabled(onTrino().getConnection(), false);
onTrino().getConnection() and query() use different connections, so it seems that disabling the pushdown doesn't really get applied. Can you use onTrino().executeQuery? (Or something similar to AbstractTestHiveViews#setSessionProperty.)
…al ORC field names
It failed with permission denied when setting session properties in suite-2,3
CI hit #7535
Merged as 8bac015, thanks!
Even though this has been fixed. @willmostly @findepi I found out that. I was getting
Data
{"data":{"visits":"1","hits":"8","pageviews":"4","timeonsite":"107","sessionqualitydim":"99"}}
{"data":{"visits":"1","hits":"19","pageviews":"10","timeonsite":"250","sessionqualitydim":"56"}}
{"data":{"visits":"1","hits":"10","pageviews":"5","timeonsite":"1439","sessionqualitydim":"23"}}
{"data":{"visits":"1","hits":"10","pageviews":"5","sessionqualitydim":"45"}}
{"data":{"visits":"1","hits":"10","pageviews":"5","sessionqualitydim":"45"}}
{"data":{"visits":"1","hits":"19","pageviews":"10","sessionqualitydim":"56"}}
{"data":{"visits":"1","hits":"19","pageviews":"10","timeonsite":"250"}}
{"data":{"visits":"1","hits":"19","pageviews":"10","timeonsite":"250"}}
{"data":{"visits":"1","hits":"19","pageviews":"10","timeonsite":"250"}}
{"data":{"visits":"1","hits":"19","pageviews":"10","timeonsite":"250"}}
Terminal Output
@aakashnand please consider creating a new issue to avoid this being forgotten about.
@findepi my comment above was just a note, and the problem is solved after this PR. Should I still create a new issue and say that it was fixed by this PR? I just wanted to highlight that the problem affects not only camel-case field names but also all-lowercase ones.
Trino will currently return NULLs if table metadata contains a struct that uses caps, e.g.
entryId struct<custId:string,acctGuid:string,requestDate:string>
and the ORC internal field definition does not. Reading the code, I actually suspect that NULL would be returned even if the internal field name did match the casing in the table definition, but I have not tested this scenario.
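The mismatch described above can be illustrated with a small, self-contained sketch of case-insensitive field matching. The names `CaseInsensitiveFieldLookup`, `indexByLowerCase`, and `findField` are hypothetical and exist only for illustration; the sketch assumes that both the file-level field names and the requested name are lower-cased before comparison, and it does not reproduce the actual ORC reader code in Trino.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration of matching struct field names without regard to case.
final class CaseInsensitiveFieldLookup
{
    private CaseInsensitiveFieldLookup() {}

    // Index the field names found in the file by their lower-cased form,
    // mapping each one to its ordinal position inside the struct.
    static Map<String, Integer> indexByLowerCase(List<String> fileFieldNames)
    {
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i < fileFieldNames.size(); i++) {
            index.put(fileFieldNames.get(i).toLowerCase(Locale.ENGLISH), i);
        }
        return index;
    }

    // Look up a field requested by the table metadata (e.g. "custId").
    // Returns the ordinal of the matching file field, or -1 if there is none.
    static int findField(Map<String, Integer> index, String requestedName)
    {
        return index.getOrDefault(requestedName.toLowerCase(Locale.ENGLISH), -1);
    }
}
```

With an exact, case-sensitive comparison, a request for `custId` against a file field named `custid` finds nothing, which would be consistent with the NULLs reported here. Lower-casing both sides makes the lookup succeed whichever side carries the capitals.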