PHOENIX-6859 Update phoenix5-spark3 README with PySpark code references #92
base: master
Conversation
💔 -1 overall
This message was automatically generated.

🎊 +1 overall
This message was automatically generated.
@virajjasani @tkhurana Please review the PR.

@stoty you might also be interested in this.

Thanks for the PR @Abhey.
Would it be possible to add the relevant changes to the Spark2 README?
> ## Configuring Spark to use the connector
> The phoenix5-spark3 plugin extends Phoenix's MapReduce support to allow Spark
I realize that we use "Plugin" on the website, but we should standardize on "Connector".
> In contrast, the phoenix-spark integration is able to leverage the underlying
> splits provided by Phoenix in order to retrieve and save data across multiple
> workers. All that's required is a database URL and a table name.
Use "select statement" instead of "table name"?
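For reference, a minimal PySpark read of the kind this PR documents might look like the sketch below; `TABLE1` and `phoenix-server:2181` are placeholders, and the `table`/`zkUrl` options are assumed from the connector's existing Scala examples:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("phoenix-read-example").getOrCreate()

# "TABLE1" and "phoenix-server:2181" are placeholders for a real
# Phoenix table name and ZK quorum.
df = (
    spark.read.format("phoenix")
    .option("table", "TABLE1")
    .option("zkUrl", "phoenix-server:2181")
    .load()
)
df.show()
```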
> splits provided by Phoenix in order to retrieve and save data across multiple
> workers. All that's required is a database URL and a table name.
> Optional SELECT columns can be given,
> as well as pushdown predicates for efficient filtering.
This sounds like you need to specify the pushdown predicates.
Can you rephrase so that it's apparent that pushdown is automatic?
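To make that concrete, a short PySpark sketch (continuing from a `df` loaded through the connector as above; the column names `ID` and `COL1` are illustrative): an ordinary DataFrame filter is pushed down to Phoenix automatically, with no extra options.

```python
# A plain DataFrame filter; the connector translates it into a Phoenix
# WHERE clause automatically, so nothing extra needs to be configured.
filtered = df.select("ID", "COL1").filter(df.ID > 100)
filtered.show()

# The pushed-down predicate is visible in the physical plan.
filtered.explain()
```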
> as well as pushdown predicates for efficient filtering.
>
> The choice of which method to use to access
> Phoenix comes down to each specific use case.
nit:
This is super important, and we should have much more on this (though not necessarily in this ticket)
> ## Setup
>
> To setup connector add `phoenix5-spark3-shaded` JAR as
In most cases, you don't want to add the connector to the maven/compile classpath; it tends to cause conflicts when upgrading.
We should move this to the end of the section, and add the caveat that this is only needed for the deprecated usages.
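For the non-deprecated usages, a runtime-only setup in PySpark could look like the sketch below; `spark.jars` is standard Spark configuration, and the JAR path is a placeholder:

```python
from pyspark.sql import SparkSession

# Ship the shaded connector JAR at runtime via standard Spark config
# rather than compiling against it; the path below is a placeholder.
spark = (
    SparkSession.builder.appName("phoenix-example")
    .config("spark.jars", "/path/to/phoenix5-spark3-shaded.jar")
    .getOrCreate()
)
```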
> The choice of which method to use to access
> Phoenix comes down to each specific use case.
>
> ## Setup
Nit: this assumes that Phoenix and HBase/Spark are both present and configured on the same nodes.
Maybe worth mentioning it?
> Scala example:
I know you didn't touch that part, but do we still need the SparkContext import?
> Scala example:
Maybe add comments to make it obvious that you need to use a real ZK quorum, like:
`// replace "phoenix-server:2181" with the real ZK quorum`
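The PySpark equivalent could carry the same hint as a comment; a hedged sketch, assuming the connector's `table`/`zkUrl` options, an existing `df`, and the `overwrite` save mode from the connector's documented upsert-style writes (`OUTPUT_TABLE` is a placeholder):

```python
# Replace "phoenix-server:2181" with the real ZK quorum of your cluster.
zk_quorum = "phoenix-server:2181"

(
    df.write.format("phoenix")
    .option("table", "OUTPUT_TABLE")
    .option("zkUrl", zk_quorum)
    .mode("overwrite")  # the connector upserts rows in this mode
    .save()
)
```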
Thanks, @stoty, for reviewing the PR. I will take a look at your review comments and address them.
@stoty One question for you: I created a Docker image with all the prerequisites for testing the Phoenix-Spark connector. Do you think it's a good idea to add a reference to it in the official documentation? Repository link -
This PR adds PySpark code references to the phoenix5-spark3 connector README.