Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PHOENIX-6859 Update phoenix5-spark3 README with PySpark code references #92

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

Abhey
Copy link

@Abhey Abhey commented Jan 23, 2023

This PR introduces the changes for adding the PySpark code references in the phoenix5-spark3 connector README.

@stoty
Copy link
Contributor

stoty commented Jan 23, 2023

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 6m 18s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
_ Patch Compile Tests _
-1 ❌ markdownlint 0m 1s The patch generated 30 new + 50 unchanged - 4 fixed = 80 total (was 54)
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
_ Other Tests _
+1 💚 asflicense 0m 31s The patch does not generate ASF License warnings.
7m 18s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-Connectors-PreCommit-GitHub-PR/job/PR-92/1/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #92
Optional Tests dupname asflicense markdownlint
uname Linux 8a914290e3a8 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev/phoenix-connectors-personality.sh
git revision master / 4da44c7
markdownlint https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-Connectors-PreCommit-GitHub-PR/job/PR-92/1/artifact/yetus-general-check/output/diff-patch-markdownlint.txt
Max. process+thread count 47 (vs. ulimit of 30000)
modules C: phoenix5-spark3 U: phoenix5-spark3
Console output https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-Connectors-PreCommit-GitHub-PR/job/PR-92/1/console
versions git=2.7.4 maven=3.3.9 markdownlint=0.22.0
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@stoty
Copy link
Contributor

stoty commented Jan 23, 2023

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 3s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
_ Patch Compile Tests _
-1 ❌ markdownlint 0m 2s The patch generated 33 new + 46 unchanged - 8 fixed = 79 total (was 54)
-1 ❌ whitespace 0m 0s The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
_ Other Tests _
+1 💚 asflicense 0m 31s The patch does not generate ASF License warnings.
2m 2s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-Connectors-PreCommit-GitHub-PR/job/PR-92/2/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #92
Optional Tests dupname asflicense markdownlint
uname Linux fdceb27a730b 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev/phoenix-connectors-personality.sh
git revision master / 4da44c7
markdownlint https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-Connectors-PreCommit-GitHub-PR/job/PR-92/2/artifact/yetus-general-check/output/diff-patch-markdownlint.txt
whitespace https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-Connectors-PreCommit-GitHub-PR/job/PR-92/2/artifact/yetus-general-check/output/whitespace-eol.txt
Max. process+thread count 47 (vs. ulimit of 30000)
modules C: phoenix5-spark3 U: phoenix5-spark3
Console output https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-Connectors-PreCommit-GitHub-PR/job/PR-92/2/console
versions git=2.7.4 maven=3.3.9 markdownlint=0.22.0
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@stoty
Copy link
Contributor

stoty commented Jan 23, 2023

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 4s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
_ Patch Compile Tests _
-1 ❌ markdownlint 0m 2s The patch generated 10 new + 28 unchanged - 26 fixed = 38 total (was 54)
-1 ❌ whitespace 0m 0s The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
_ Other Tests _
+1 💚 asflicense 0m 30s The patch does not generate ASF License warnings.
2m 2s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-Connectors-PreCommit-GitHub-PR/job/PR-92/3/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #92
Optional Tests dupname asflicense markdownlint
uname Linux fb3451f06b7c 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev/phoenix-connectors-personality.sh
git revision master / 4da44c7
markdownlint https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-Connectors-PreCommit-GitHub-PR/job/PR-92/3/artifact/yetus-general-check/output/diff-patch-markdownlint.txt
whitespace https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-Connectors-PreCommit-GitHub-PR/job/PR-92/3/artifact/yetus-general-check/output/whitespace-eol.txt
Max. process+thread count 47 (vs. ulimit of 30000)
modules C: phoenix5-spark3 U: phoenix5-spark3
Console output https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-Connectors-PreCommit-GitHub-PR/job/PR-92/3/console
versions git=2.7.4 maven=3.3.9 markdownlint=0.22.0
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@stoty
Copy link
Contributor

stoty commented Jan 23, 2023

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 6s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
_ Patch Compile Tests _
-1 ❌ markdownlint 0m 1s The patch generated 3 new + 19 unchanged - 35 fixed = 22 total (was 54)
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
_ Other Tests _
+1 💚 asflicense 0m 31s The patch does not generate ASF License warnings.
2m 4s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-Connectors-PreCommit-GitHub-PR/job/PR-92/4/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #92
Optional Tests dupname asflicense markdownlint
uname Linux 9e5b71556a5d 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev/phoenix-connectors-personality.sh
git revision master / 4da44c7
markdownlint https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-Connectors-PreCommit-GitHub-PR/job/PR-92/4/artifact/yetus-general-check/output/diff-patch-markdownlint.txt
Max. process+thread count 47 (vs. ulimit of 30000)
modules C: phoenix5-spark3 U: phoenix5-spark3
Console output https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-Connectors-PreCommit-GitHub-PR/job/PR-92/4/console
versions git=2.7.4 maven=3.3.9 markdownlint=0.22.0
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@stoty
Copy link
Contributor

stoty commented Jan 23, 2023

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 0s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
_ Patch Compile Tests _
+1 💚 markdownlint 0m 2s The patch generated 0 new + 15 unchanged - 39 fixed = 15 total (was 54)
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
_ Other Tests _
+1 💚 asflicense 0m 30s The patch does not generate ASF License warnings.
1m 59s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-Connectors-PreCommit-GitHub-PR/job/PR-92/5/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #92
Optional Tests dupname asflicense markdownlint
uname Linux 822a16ddacb3 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev/phoenix-connectors-personality.sh
git revision master / 4da44c7
Max. process+thread count 47 (vs. ulimit of 30000)
modules C: phoenix5-spark3 U: phoenix5-spark3
Console output https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-Connectors-PreCommit-GitHub-PR/job/PR-92/5/console
versions git=2.7.4 maven=3.3.9 markdownlint=0.22.0
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Abhey
Copy link
Author

Abhey commented Jan 23, 2023

@virajjasani @tkhurana Please review the PR.

@virajjasani
Copy link

@stoty you might also be interested in this

@virajjasani
Copy link

Thanks for the PR @Abhey

@virajjasani virajjasani self-requested a review January 23, 2023 18:04
@stoty
Copy link
Contributor

stoty commented Jan 23, 2023

Would it be possible to add the relevant changes to the Spark2 README ?


## Configuring Spark to use the connector
The phoenix5-spark3 plugin extends Phoenix's MapReduce support to allow Spark
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize the we use "Plugin" on the website, but we should standardize on "Connector"


In contrast, the phoenix-spark integration is able to leverage the underlying
splits provided by Phoenix in order to retrieve and save data across multiple
workers. All that’s required is a database URL and a table name.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use "select statement" instead of "table name" ?

splits provided by Phoenix in order to retrieve and save data across multiple
workers. All that’s required is a database URL and a table name.
Optional SELECT columns can be given,
as well as pushdown predicates for efficient filtering.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like you need tospecify the pushdown predicates.
Can you rephrase so that it's apparent that pushdown is automatic ?

as well as pushdown predicates for efficient filtering.

The choice of which method to use to access
Phoenix comes down to each specific use case.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:
This is super important, and we should have much more on this (though not necessarily in this ticket)


## Setup

To setup connector add `phoenix5-spark3-shaded` JAR as
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In most cases, you don't want to add the connector to the maven/compile classpath, it tends to cause conflicts when upgrading.

We should move this to emd of the section, and add the caveat that this is only needed for the deprecated usages.

The choice of which method to use to access
Phoenix comes down to each specific use case.

## Setup
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this assumes that Phoenix and HBase/Spark are are both present and configured on the same nodes.
Maybe worth mentioning it ?

Scala example:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you didn't touch that part, but do we still need the SparkContext import ?

Scala example:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add comments to make it obvious that you need to use a real ZK quorum , like
//replace "phoenix-server:2181" with the real ZK quorum

@Abhey
Copy link
Author

Abhey commented Jan 30, 2023

Thanks, @stoty for reviewing the PR. I will take a look at your review comments and address the same.

@Abhey
Copy link
Author

Abhey commented Jan 30, 2023

@stoty One question for you, I created a Docker Image with all the Pre-Requisites for testing the Phoenix-Spark connector. Do you think it's a good idea to add the reference to the same in the official documentation?

Repository Link -

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants