Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Apache Phoenix connector #76

Closed
wants to merge 1 commit into from
Closed

Add Apache Phoenix connector #76

wants to merge 1 commit into from

Conversation

vincentpoon
Copy link
Member

@vincentpoon vincentpoon commented Jan 25, 2019

@martint
Copy link
Member

martint commented Jan 25, 2019

Thanks @vincentpoon. We're going to start looking at this shortly. In the meantime, please make sure to sign the CLA so we can get your code merged.

@vincentpoon
Copy link
Member Author

@martint
Copy link
Member

martint commented Jan 26, 2019

@cla-bot check

Copy link
Member

@kokosing kokosing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just skimmed.

pom.xml Outdated Show resolved Hide resolved
pom.xml Outdated Show resolved Hide resolved
presto-docs/src/main/sphinx/connector/phoenix.rst Outdated Show resolved Hide resolved
presto-docs/src/main/sphinx/connector/phoenix.rst Outdated Show resolved Hide resolved
presto-docs/src/main/sphinx/connector/phoenix.rst Outdated Show resolved Hide resolved
@vincentpoon
Copy link
Member Author

Thanks for the initial review @kokosing , I've just pushed fixes to most of your suggestions and more cleanup.

@vincentpoon
Copy link
Member Author

@combineads Can you submit the CLA? https://github.com/prestosql/cla
Tried emailing you, but didn't get a response.

@findepi
Copy link
Member

findepi commented Feb 2, 2019

@vincentpoon please squash your commits into single commit. Place yourself as the commit author (git commit --amend --reset-author) and use this as the message:

Add Apache Phoenix connector

Based on @combineads (Byunghwa Yun <[email protected]>)
work published at https://github.com/prestodb/presto/pull/10536.

@cla-bot cla-bot bot added the cla-signed label Feb 4, 2019
@vincentpoon
Copy link
Member Author

@findepi Done, thanks for the suggestion

Copy link
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job!

I just skimmed the code.

@trinodb trinodb deleted a comment from cla-bot bot Feb 5, 2019
@trinodb trinodb deleted a comment from cla-bot bot Feb 5, 2019
@trinodb trinodb deleted a comment from cla-bot bot Feb 5, 2019
@trinodb trinodb deleted a comment from cla-bot bot Feb 5, 2019
@trinodb trinodb deleted a comment from cla-bot bot Feb 5, 2019
@vincentpoon
Copy link
Member Author

Thanks for the detailed review @findepi , just posting to give an update that I'm still working on a new revision.

Copy link
Member

@electrum electrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vincentpoon Thanks for contributing this. Please let us know when the new revision is ready. Feel free to message me on Slack if you have any questions.

pom.xml Outdated Show resolved Hide resolved
presto-docs/src/main/sphinx/connector/phoenix.rst Outdated Show resolved Hide resolved
presto-docs/src/main/sphinx/connector/phoenix.rst Outdated Show resolved Hide resolved
presto-docs/src/main/sphinx/connector/phoenix.rst Outdated Show resolved Hide resolved
presto-docs/src/main/sphinx/connector/phoenix.rst Outdated Show resolved Hide resolved
@vincentpoon
Copy link
Member Author

Still working on this :)
As you know, I based this work off someone else's old pull request. From the review comments, I've come to realize that much of that code was copy+pasted from base-jdbc and other places. I'm trying to rewrite large portions to fix all of that.
Thanks for the patience. BTW, I appreciate all the comments - the codebase is a pleasure to work with due to diligence from maintainers like yourselves in keeping the code clean and sharp!

@combineads
Copy link

@vincentpoon
I am so sorry. I could not confirm my email due to personal circumstances. I have submitted a CLA.
Please feel free to ask me if I can help. Thank you for your effort.

@vincentpoon
Copy link
Member Author

@findepi @electrum @kokosing

I've pushed a new revision. Would appreciate you taking a looking.
It's a major overhaul which removed a lot of duplicate code and reuses code from base-jdbc. To give you an idea, the patch file line count dropped from ~5500 to ~3000. I also fixed up the patch with all the prior review suggestions.

At the core, what we're doing is pretty simple - generate Phoenix splits (HBase scans) and wrap them in a ResultSet, and from then on treat each split as if it were from a jdbc connector. I did have to drop a few things, though: 1) ARRAY type, which I hope to complete as a broader patch in #317, 2) DELETE, which doesn't appear to be implemented for JDBC yet, and 3) Phoenix ON DUPLICATE KEY support.

Other notables: Connection properties are now read from a separate hbase-site.xml, which is the way it's typically done in Phoenix deployments. The phoenix.rst is cleaned up - I shortened all the descriptions and provided a link instead, as all the properties would generally be familiar to Phoenix/HBase users and more information can be found on the Phoenix/HBase websites anyways.

Thanks!
FYI @lhofhansl

.travis.yml Outdated Show resolved Hide resolved
pom.xml Outdated Show resolved Hide resolved
<dependencies>
<dependency>
<groupId>org.apache.phoenix</groupId>
<artifactId>presto-phoenix-client-shaded</artifactId>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thank you!

presto-phoenix/pom.xml Show resolved Hide resolved
@electrum electrum added the roadmap Top level issues for major efforts in the project label Mar 13, 2019
@vincentpoon
Copy link
Member Author

Thanks @kokosing , updated to address your comments

@combineads
Copy link

This phoenix connector occurs the below error when using 4.14.0-cdh5.14.2 phoenix version.

java.lang.AbstractMethodError: io.prestosql.plugin.jdbc.JdbcRecordSetProvider.getRecordSet(Lio/prestosql/spi/connector/ConnectorTransactionHandle;Lio/prestosql/spi/connector/ConnectorSession;Lio/prestosql/spi/connector/ConnectorSplit;Ljava/util/List;)Lio/prestosql/spi/connector/RecordSet;
	at io.prestosql.split.RecordPageSourceProvider.createPageSource(RecordPageSourceProvider.java:42)
	at io.prestosql.split.PageSourceManager.createPageSource(PageSourceManager.java:56)
	at io.prestosql.operator.TableScanOperator.getOutput(TableScanOperator.java:239)
	at io.prestosql.operator.Driver.processInternal(Driver.java:379)
	at io.prestosql.operator.Driver.lambda$processFor$8(Driver.java:283)
	at io.prestosql.operator.Driver.tryWithLock(Driver.java:675)
	at io.prestosql.operator.Driver.processFor(Driver.java:276)
	at io.prestosql.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1075)
	at io.prestosql.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
	at io.prestosql.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:484)
	at io.prestosql.$gen.Presto_306____20190329_083613_1.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)```

@Praveen2112
Copy link
Member

@combineads I guess you have to run it with the latest build of presto. I guess there is a recent changes added to the RecordSet API and old version of Presto (I assume this Presto-306 was running).

@dain
Copy link
Member

dain commented Apr 3, 2019

If this is ready for another review, can you resolve the conflict in the .travis.yml file, so travis will run?

@lhofhansl
Copy link
Member

lhofhansl commented Apr 6, 2019

I noticed now that when I add a table in Phoenix without a schema that SHOW SCHEMAS FROM <Phoenix-Catalog> throws an NPE.

Indeed BaseJdbcClient.getSchemaNames() expects that resultSet.getString("TABLE_SCHEM") always returns something non-null, which is not correct according to the JDBC documentation. (TABLE_SCHEM might be null).

java.lang.NullPointerException: undefined
	at io.prestosql.plugin.jdbc.BaseJdbcClient.getSchemaNames(BaseJdbcClient.java:129)
	at io.prestosql.plugin.phoenix.PhoenixClient.getSchemaNames(PhoenixClient.java:167)
	at io.prestosql.plugin.phoenix.PhoenixMetadata.listSchemaNames(PhoenixMetadata.java:102)
	at io.prestosql.metadata.MetadataManager.listSchemaNames(MetadataManager.java:313)
	at io.prestosql.connector.informationschema.InformationSchemaMetadata.calculatePrefixesWithSchemaName(InformationSchemaMetadata.java:277)
	at io.prestosql.connector.informationschema.InformationSchemaMetadata.getTableLayouts(InformationSchemaMetadata.java:241)
	at io.prestosql.metadata.MetadataManager.getLayout(MetadataManager.java:404)
	at io.prestosql.sql.planner.iterative.rule.PushPredicateIntoTableScan.pushFilterIntoTableScan(PushPredicateIntoTableScan.java:205)
	at io.prestosql.sql.planner.optimizations.AddExchanges$Rewriter.planTableScan(AddExchanges.java:540)
	at io.prestosql.sql.planner.optimizations.AddExchanges$Rewriter.visitTableScan(AddExchanges.java:507)
	at io.prestosql.sql.planner.optimizations.AddExchanges$Rewriter.visitTableScan(AddExchanges.java:133)
	at io.prestosql.sql.planner.plan.TableScanNode.accept(TableScanNode.java:131)
	at io.prestosql.sql.planner.optimizations.AddExchanges$Rewriter.planChild(AddExchanges.java:1166)
	at io.prestosql.sql.planner.optimizations.AddExchanges$Rewriter.visitSort(AddExchanges.java:413)
	at io.prestosql.sql.planner.optimizations.AddExchanges$Rewriter.visitSort(AddExchanges.java:133)
	at io.prestosql.sql.planner.plan.SortNode.accept(SortNode.java:74)
	at io.prestosql.sql.planner.optimizations.AddExchanges$Rewriter.planChild(AddExchanges.java:1166)
	at io.prestosql.sql.planner.optimizations.AddExchanges$Rewriter.visitOutput(AddExchanges.java:175)
	at io.prestosql.sql.planner.optimizations.AddExchanges$Rewriter.visitOutput(AddExchanges.java:133)
	at io.prestosql.sql.planner.plan.OutputNode.accept(OutputNode.java:82)
	at io.prestosql.sql.planner.optimizations.AddExchanges.optimize(AddExchanges.java:129)
	at io.prestosql.sql.planner.optimizations.StatsRecordingPlanOptimizer.optimize(StatsRecordingPlanOptimizer.java:51)
	at io.prestosql.sql.planner.LogicalPlanner.plan(LogicalPlanner.java:170)
	at io.prestosql.sql.planner.LogicalPlanner.plan(LogicalPlanner.java:159)
	at io.prestosql.execution.SqlQueryExecution.doAnalyzeQuery(SqlQueryExecution.java:423)
	at io.prestosql.execution.SqlQueryExecution.analyzeQuery(SqlQueryExecution.java:408)
	at io.prestosql.execution.SqlQueryExecution.startExecution(SqlQueryExecution.java:352)
	at io.prestosql.$gen.Presto_307_dirty____20190406_173206_1.run(Unknown Source)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Locally I fixed it by trivially by copying the BaseJdbcClient.getSchemaNames() code into PhoenixClient (and add the DEFAULT_SCHEMA).

@vincentpoon

presto-docs/src/main/sphinx/connector/phoenix.rst Outdated Show resolved Hide resolved
presto-phoenix/pom.xml Outdated Show resolved Hide resolved
presto-phoenix/pom.xml Outdated Show resolved Hide resolved
presto-phoenix/pom.xml Outdated Show resolved Hide resolved
presto-phoenix/src/test/resources/hbase-site.xml Outdated Show resolved Hide resolved
@kokosing
Copy link
Member

I would also consider to squash all the commits and open a new PR. This one already has more than 200 of comments and github is getting slow.

@vincentpoon
Copy link
Member Author

Pushed another update that addresses the comments. Also, this now uses the official phoenix-client, instead of the custom shaded jar we were producing for Presto. It required a few changes to the pom.xml
Multiple resource files for configuration is now supported as well.

@vincentpoon
Copy link
Member Author

Closed this since GitHub was loading slowly with this many comments. Continuing at #672

rice668 pushed a commit to rice668/trino that referenced this pull request Jan 31, 2023
rice668 added a commit to rice668/trino that referenced this pull request Jul 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed roadmap Top level issues for major efforts in the project
Development

Successfully merging this pull request may close these issues.

10 participants