Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Coral-hive] Enable parsing spark created views without quoted keywords #503

Merged
merged 2 commits into from
Apr 23, 2024

Conversation

nagarathnam200
Copy link
Contributor

@nagarathnam200 nagarathnam200 commented Apr 17, 2024

Summary

[Coral-hive] Enable parsing spark created views without quoted keywords

Description

There are a subset of reserved words like timestamp which are keywords in hive but not in spark. When views are created using spark engine and these reserved words are used as column name alias, the expanded view text does not have them backquoted. This results in translation errors like

FailedPredicateException(identifier,{useSQL11ReservedKeywordsForIdentifier()}?)
        at com.linkedin.coral.hive.hive2rel.parsetree.parser.HiveParser_IdentifiersParser.identifier(HiveParser_IdentifiersParser.java:10351)
        at com.linkedin.coral.hive.hive2rel.parsetree.parser.HiveParser.identifier(HiveParser.java:40450)
        at com.linkedin.coral.hive.hive2rel.parsetree.parser.HiveParser_SelectClauseParser.selectItem(HiveParser_SelectClauseParser.java:2451)
        at com.linkedin.coral.hive.hive2rel.parsetree.parser.HiveParser_SelectClauseParser.selectList(HiveParser_SelectClauseParser.java:1148)
        at com.linkedin.coral.hive.hive2rel.parsetree.parser.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:877)
        at com.linkedin.coral.hive.hive2rel.parsetree.parser.HiveParser.selectClause(HiveParser.java:40433)
        at com.linkedin.coral.hive.hive2rel.parsetree.parser.HiveParser.selectStatement(HiveParser.java:36241)
        at com.linkedin.coral.hive.hive2rel.parsetree.parser.HiveParser.regularBody(HiveParser.java:36148)
        at com.linkedin.coral.hive.hive2rel.parsetree.parser.HiveParser.queryStatementExpressionBody(HiveParser.java:35180)
        at com.linkedin.coral.hive.hive2rel.parsetree.parser.HiveParser.queryStatementExpression(HiveParser.java:35057)
        at com.linkedin.coral.hive.hive2rel.parsetree.parser.HiveParser.execStatement(HiveParser.java:1529)
        at com.linkedin.coral.hive.hive2rel.parsetree.parser.HiveParser.statement(HiveParser.java:1089)
        at com.linkedin.coral.hive.hive2rel.parsetree.parser.CoralParseDriver.parse(CoralParseDriver.java:35)
        at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder.process(ParseTreeBuilder.java:108)
        at com.linkedin.coral.hive.hive2rel.HiveToRelConverter.toSqlNode(HiveToRelConverter.java:100)
        at com.linkedin.coral.common.ToRelConverter.processView(ToRelConverter.java:152)
        at com.linkedin.coral.common.ToRelConverter.convertView(ToRelConverter.java:123)
        at com.linkedin.coral.tools.SparkValidator.getCoralSparkTranslation(SparkValidator.java:59)
        at com.linkedin.coral.tools.SparkValidator.convertAndValidate(SparkValidator.java:39)
        at com.linkedin.coral.tools.SingleViewTranslation.translateTable(SingleViewTranslation.java:63)
        at com.linkedin.coral.tools.SingleViewTranslation.main(SingleViewTranslation.java:55)
Exception in thread "main" java.lang.RuntimeException: com.linkedin.coral.hive.hive2rel.parsetree.parser.ParseException: line 1:25 Failed to recognize predicate 'timestamp'. Failed rule: 'identifier' in selection target
        at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder.process(ParseTreeBuilder.java:111)
        at com.linkedin.coral.hive.hive2rel.HiveToRelConverter.toSqlNode(HiveToRelConverter.java:100)
        at com.linkedin.coral.common.ToRelConverter.processView(ToRelConverter.java:152)
        at com.linkedin.coral.common.ToRelConverter.convertView(ToRelConverter.java:123)
        at com.linkedin.coral.tools.SparkValidator.getCoralSparkTranslation(SparkValidator.java:59)
        at com.linkedin.coral.tools.SparkValidator.convertAndValidate(SparkValidator.java:39)
        at com.linkedin.coral.tools.SingleViewTranslation.translateTable(SingleViewTranslation.java:63)
        at com.linkedin.coral.tools.SingleViewTranslation.main(SingleViewTranslation.java:55)
Caused by: com.linkedin.coral.hive.hive2rel.parsetree.parser.ParseException: line 1:25 Failed to recognize predicate 'timestamp'. Failed rule: 'identifier' in selection target
        at com.linkedin.coral.hive.hive2rel.parsetree.parser.CoralParseDriver.parse(CoralParseDriver.java:38)
        at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder.process(ParseTreeBuilder.java:108)

This patch attempts to fix this error by checking the tableproperties of the view to get hints on if the view is created using spark and sets hive.support.sql11.reserved.keywords appropriately so that translation of these views will succeed.

Note that this patch intentionally uses the config string hive.support.sql11.reserved.keywords instead of the variable HiveConf.ConfVars.HIVE_SUPPORT_SQL11_RESERVED_KEYWORDS. This is because the HiveParser.g here is copied from Hive 1.2.4 and assumes the presence of this conf but this conf has been removed in later releases triggering
java.lang.NoSuchFieldError: HIVE_SUPPORT_SQL11_RESERVED_KEYWORDS when hive-common from latest hive versions is found in the classpath during translation.

Testing Done

Added unit tests and validates that the unit test fails without fix and passes with fix.
Validates that all the existing unit test passes.
Validated a custom build of this coral fix in spark-shell against a view with keyword timestamp and was able to get successful translation.
Tested on current production views and did not find any regression.

@@ -29,6 +42,17 @@ public ASTNode parse(String command) throws ParseException {
HiveLexerCoral lexer = new HiveLexerCoral(new ANTLRNoCaseStringStream(command));
TokenRewriteStream tokens = new TokenRewriteStream(lexer);
HiveParser parser = new HiveParser(tokens);
HiveConf hiveConf = new HiveConf();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not pass the hiveConf from HiveToRelConverter to CoralParseDriver? and then the following if-else logic can be performed in the CoralParseDriver's constructor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what advantage we would gather there. If we want to expose the hiveConf from user, we would have to extend HiveMetastoreClient.java to have an additional function to getConf and pass in the conf used in the creation of HiveMetastoreClient all the way down to CoralParseDriver and I am not sure if that fix is relevant to the issue being fixed in this patch.

Copy link
Collaborator

@ljfgem ljfgem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@ljfgem ljfgem merged commit a83b648 into linkedin:master Apr 23, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants