New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

PHOENIX-6859 Update phoenix5-spark3 README with PySpark code references #92

Open

Abhey wants to merge 6 commits into apache:master from Abhey:PHOENIX-6859

+155 −36

Abhey commented Jan 23, 2023

This PR introduces the changes for adding the PySpark code references in the phoenix5-spark3 connector README.

Abhey added 2 commits

January 23, 2023 18:34


          Update phoenix5-spark3 README with PySpark references

f79af31


          Minor code fixes in the 'ave from an external RDD with a schema to a …

553dd47

…Phoenix table' section

Contributor

stoty commented Jan 23, 2023

💔 -1 overall

Vote	Subsystem	Runtime	Comment
+0 🆗	reexec	6m 18s	Docker mode activated.
		_ Prechecks _
+1 💚	dupname	0m 0s	No case conflicting files found.
+1 💚	@author	0m 0s	The patch does not contain any @author tags.
		_ master Compile Tests _
		_ Patch Compile Tests _
-1 ❌	markdownlint	0m 1s	The patch generated 30 new + 50 unchanged - 4 fixed = 80 total (was 54)
+1 💚	whitespace	0m 0s	The patch has no whitespace issues.
		_ Other Tests _
+1 💚	asflicense	0m 31s	The patch does not generate ASF License warnings.
		7m 18s

Subsystem	Report/Notes
Docker	ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-Connectors-PreCommit-GitHub-PR/job/PR-92/1/artifact/yetus-general-check/output/Dockerfile
GITHUB PR	#92
Optional Tests	dupname asflicense markdownlint
uname	Linux 8a914290e3a8 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev/phoenix-connectors-personality.sh
git revision	master / `4da44c7`
markdownlint	https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-Connectors-PreCommit-GitHub-PR/job/PR-92/1/artifact/yetus-general-check/output/diff-patch-markdownlint.txt
Max. process+thread count	47 (vs. ulimit of 30000)
modules	C: phoenix5-spark3 U: phoenix5-spark3
Console output	https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-Connectors-PreCommit-GitHub-PR/job/PR-92/1/console
versions	git=2.7.4 maven=3.3.9 markdownlint=0.22.0
Powered by	Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.


          Add missing filtering statementes

6ed80fa

Contributor

stoty commented Jan 23, 2023

💔 -1 overall

Vote	Subsystem	Runtime	Comment
+0 🆗	reexec	1m 3s	Docker mode activated.
		_ Prechecks _
+1 💚	dupname	0m 0s	No case conflicting files found.
+1 💚	@author	0m 0s	The patch does not contain any @author tags.
		_ master Compile Tests _
		_ Patch Compile Tests _
-1 ❌	markdownlint	0m 2s	The patch generated 33 new + 46 unchanged - 8 fixed = 79 total (was 54)
-1 ❌	whitespace	0m 0s	The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
		_ Other Tests _
+1 💚	asflicense	0m 31s	The patch does not generate ASF License warnings.
		2m 2s

Subsystem	Report/Notes
Docker	ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-Connectors-PreCommit-GitHub-PR/job/PR-92/2/artifact/yetus-general-check/output/Dockerfile
GITHUB PR	#92
Optional Tests	dupname asflicense markdownlint
uname	Linux fdceb27a730b 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev/phoenix-connectors-personality.sh
git revision	master / `4da44c7`
markdownlint	https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-Connectors-PreCommit-GitHub-PR/job/PR-92/2/artifact/yetus-general-check/output/diff-patch-markdownlint.txt
whitespace	https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-Connectors-PreCommit-GitHub-PR/job/PR-92/2/artifact/yetus-general-check/output/whitespace-eol.txt
Max. process+thread count	47 (vs. ulimit of 30000)
modules	C: phoenix5-spark3 U: phoenix5-spark3
Console output	https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-Connectors-PreCommit-GitHub-PR/job/PR-92/2/console
versions	git=2.7.4 maven=3.3.9 markdownlint=0.22.0
Powered by	Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.


          Markdownn liniting fixes

5dd6f76

Contributor

stoty commented Jan 23, 2023

💔 -1 overall

Vote	Subsystem	Runtime	Comment
+0 🆗	reexec	1m 4s	Docker mode activated.
		_ Prechecks _
+1 💚	dupname	0m 0s	No case conflicting files found.
+1 💚	@author	0m 0s	The patch does not contain any @author tags.
		_ master Compile Tests _
		_ Patch Compile Tests _
-1 ❌	markdownlint	0m 2s	The patch generated 10 new + 28 unchanged - 26 fixed = 38 total (was 54)
-1 ❌	whitespace	0m 0s	The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
		_ Other Tests _
+1 💚	asflicense	0m 30s	The patch does not generate ASF License warnings.
		2m 2s

Subsystem	Report/Notes
Docker	ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-Connectors-PreCommit-GitHub-PR/job/PR-92/3/artifact/yetus-general-check/output/Dockerfile
GITHUB PR	#92
Optional Tests	dupname asflicense markdownlint
uname	Linux fb3451f06b7c 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev/phoenix-connectors-personality.sh
git revision	master / `4da44c7`
markdownlint	https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-Connectors-PreCommit-GitHub-PR/job/PR-92/3/artifact/yetus-general-check/output/diff-patch-markdownlint.txt
whitespace	https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-Connectors-PreCommit-GitHub-PR/job/PR-92/3/artifact/yetus-general-check/output/whitespace-eol.txt
Max. process+thread count	47 (vs. ulimit of 30000)
modules	C: phoenix5-spark3 U: phoenix5-spark3
Console output	https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-Connectors-PreCommit-GitHub-PR/job/PR-92/3/console
versions	git=2.7.4 maven=3.3.9 markdownlint=0.22.0
Powered by	Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.


          Markdown linting fixes

64bce1b

Contributor

stoty commented Jan 23, 2023

💔 -1 overall

Vote	Subsystem	Runtime	Comment
+0 🆗	reexec	1m 6s	Docker mode activated.
		_ Prechecks _
+1 💚	dupname	0m 0s	No case conflicting files found.
+1 💚	@author	0m 0s	The patch does not contain any @author tags.
		_ master Compile Tests _
		_ Patch Compile Tests _
-1 ❌	markdownlint	0m 1s	The patch generated 3 new + 19 unchanged - 35 fixed = 22 total (was 54)
+1 💚	whitespace	0m 0s	The patch has no whitespace issues.
		_ Other Tests _
+1 💚	asflicense	0m 31s	The patch does not generate ASF License warnings.
		2m 4s

Subsystem	Report/Notes
Docker	ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-Connectors-PreCommit-GitHub-PR/job/PR-92/4/artifact/yetus-general-check/output/Dockerfile
GITHUB PR	#92
Optional Tests	dupname asflicense markdownlint
uname	Linux 9e5b71556a5d 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev/phoenix-connectors-personality.sh
git revision	master / `4da44c7`
markdownlint	https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-Connectors-PreCommit-GitHub-PR/job/PR-92/4/artifact/yetus-general-check/output/diff-patch-markdownlint.txt
Max. process+thread count	47 (vs. ulimit of 30000)
modules	C: phoenix5-spark3 U: phoenix5-spark3
Console output	https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-Connectors-PreCommit-GitHub-PR/job/PR-92/4/console
versions	git=2.7.4 maven=3.3.9 markdownlint=0.22.0
Powered by	Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.


          Markdown final fixes

1521ce2

Contributor

stoty commented Jan 23, 2023

🎊 +1 overall

Vote	Subsystem	Runtime	Comment
+0 🆗	reexec	1m 0s	Docker mode activated.
		_ Prechecks _
+1 💚	dupname	0m 0s	No case conflicting files found.
+1 💚	@author	0m 0s	The patch does not contain any @author tags.
		_ master Compile Tests _
		_ Patch Compile Tests _
+1 💚	markdownlint	0m 2s	The patch generated 0 new + 15 unchanged - 39 fixed = 15 total (was 54)
+1 💚	whitespace	0m 0s	The patch has no whitespace issues.
		_ Other Tests _
+1 💚	asflicense	0m 30s	The patch does not generate ASF License warnings.
		1m 59s

Subsystem	Report/Notes
Docker	ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-Connectors-PreCommit-GitHub-PR/job/PR-92/5/artifact/yetus-general-check/output/Dockerfile
GITHUB PR	#92
Optional Tests	dupname asflicense markdownlint
uname	Linux 822a16ddacb3 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev/phoenix-connectors-personality.sh
git revision	master / `4da44c7`
Max. process+thread count	47 (vs. ulimit of 30000)
modules	C: phoenix5-spark3 U: phoenix5-spark3
Console output	https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-Connectors-PreCommit-GitHub-PR/job/PR-92/5/console
versions	git=2.7.4 maven=3.3.9 markdownlint=0.22.0
Powered by	Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

Author

Abhey commented Jan 23, 2023

@virajjasani @tkhurana Please review the PR.

virajjasani requested a review from stoty

January 23, 2023 18:04

virajjasani commented Jan 23, 2023

@stoty you might also be interested in this

virajjasani commented Jan 23, 2023

Thanks for the PR @Abhey

virajjasani self-requested a review

January 23, 2023 18:04

Contributor

stoty commented Jan 23, 2023

Would it be possible to add the relevant changes to the Spark2 README ?

stoty requested changes

View reviewed changes

phoenix5-spark3/README.md


		## Configuring Spark to use the connector
		The phoenix5-spark3 plugin extends Phoenix's MapReduce support to allow Spark

Contributor

stoty Jan 23, 2023

I realize the we use "Plugin" on the website, but we should standardize on "Connector"

phoenix5-spark3/README.md

+              In contrast, the phoenix-spark integration is able to leverage the underlying
+               splits provided by Phoenix in order to retrieve and save data across multiple
+               workers. All that’s required is a database URL and a table name.

Contributor

stoty Jan 23, 2023

use "select statement" instead of "table name" ?

phoenix5-spark3/README.md

+               splits provided by Phoenix in order to retrieve and save data across multiple
+               workers. All that’s required is a database URL and a table name.
+               Optional SELECT columns can be given,
+               as well as pushdown predicates for efficient filtering.

Contributor

stoty Jan 23, 2023

This sounds like you need tospecify the pushdown predicates.
Can you rephrase so that it's apparent that pushdown is automatic ?

phoenix5-spark3/README.md

+               as well as pushdown predicates for efficient filtering.
+              The choice of which method to use to access
+               Phoenix comes down to each specific use case.

Contributor

stoty Jan 23, 2023

nit:
This is super important, and we should have much more on this (though not necessarily in this ticket)

phoenix5-spark3/README.md


		## Setup

		To setup connector add `phoenix5-spark3-shaded` JAR as

Contributor

stoty Jan 23, 2023

In most cases, you don't want to add the connector to the maven/compile classpath, it tends to cause conflicts when upgrading.

We should move this to emd of the section, and add the caveat that this is only needed for the deprecated usages.

phoenix5-spark3/README.md

+              The choice of which method to use to access
+               Phoenix comes down to each specific use case.
+              ## Setup

Contributor

stoty Jan 23, 2023

Nit: this assumes that Phoenix and HBase/Spark are are both present and configured on the same nodes.
Maybe worth mentioning it ?

phoenix5-spark3/README.md

		Scala example:

Contributor

stoty Jan 23, 2023

I know you didn't touch that part, but do we still need the SparkContext import ?

phoenix5-spark3/README.md

		Scala example:

Contributor

stoty Jan 23, 2023

Maybe add comments to make it obvious that you need to use a real ZK quorum , like
//replace "phoenix-server:2181" with the real ZK quorum

Author

Abhey commented Jan 30, 2023

Thanks, @stoty for reviewing the PR. I will take a look at your review comments and address the same.

Author

Abhey commented Jan 30, 2023

@stoty One question for you, I created a Docker Image with all the Pre-Requisites for testing the Phoenix-Spark connector. Do you think it's a good idea to add the reference to the same in the official documentation?

Repository Link -

https://github.com/Abhey/phoenix-spark-connector-demo

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet