support PostgreSQL data source #62
Conversation
(cherry picked from commit 8bdff07)
Thanks so much @ianhhhhhhhhe for the amazing work ;-)
One test failed.
fix postgresql config bug
Codecov Report
@@             Coverage Diff              @@
##             master      #62      +/-   ##
============================================
+ Coverage     54.51%   55.14%   +0.62%
  Complexity       76       76
============================================
  Files            17       17
  Lines          1317     1342      +25
  Branches        250      254       +4
============================================
+ Hits            718      740      +22
+ Misses          474      473       -1
- Partials        125      129       +4
Continue to review full report at Codecov.
assert(postgresql.port == 5432)
assert(postgresql.user.equals("root"))
assert(postgresql.password.equals("nebula"))
assert(postgresql.database.equals("database"))
duplicate test
PulsarReader
}
import com.vesoft.exchange.common.config.{ClickHouseConfigEntry, Configs, DataSourceConfigEntry, FileBaseSourceConfigEntry, HBaseSourceConfigEntry, HiveSourceConfigEntry, JanusGraphSourceConfigEntry, KafkaSourceConfigEntry, MaxComputeConfigEntry, MySQLSourceConfigEntry, Neo4JSourceConfigEntry, PostgresSQLSourceConfigEntry, PulsarSourceConfigEntry, SinkCategory, SourceCategory}
import com.vesoft.nebula.exchange.reader.{CSVReader, ClickhouseReader, HBaseReader, HiveReader, JSONReader, JanusGraphReader, KafkaReader, MaxcomputeReader, MySQLReader, Neo4JReader, ORCReader, ParquetReader, PostgreSQLReader, PulsarReader}
please format the import
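For illustration, the long selector import on the reviewed line could be re-flowed like this (purely a reformatting of the identifiers already listed above; the exact wrapping is up to the project's formatter settings):

import com.vesoft.exchange.common.config.{
  ClickHouseConfigEntry,
  Configs,
  DataSourceConfigEntry,
  FileBaseSourceConfigEntry,
  HBaseSourceConfigEntry,
  HiveSourceConfigEntry,
  JanusGraphSourceConfigEntry,
  KafkaSourceConfigEntry,
  MaxComputeConfigEntry,
  MySQLSourceConfigEntry,
  Neo4JSourceConfigEntry,
  PostgresSQLSourceConfigEntry,
  PulsarSourceConfigEntry,
  SinkCategory,
  SourceCategory
}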
Neo4JSourceConfigEntry,
ServerDataSourceConfigEntry
}
import com.vesoft.exchange.common.config.{ClickHouseConfigEntry, HBaseSourceConfigEntry, HiveSourceConfigEntry, JanusGraphSourceConfigEntry, MaxComputeConfigEntry, MySQLSourceConfigEntry, Neo4JSourceConfigEntry, PostgresSQLSourceConfigEntry, ServerDataSourceConfigEntry}
please format the import
.option("user", postgreConfig.user)
.option("password", postgreConfig.password)
.load()
df.createOrReplaceTempView(postgreConfig.table)
How do you test the PostgreSQL data source? If your table is configured as table, but the table name in the sentence is db.table, can the sentence be executed successfully?
I tested with
df.createOrReplaceTempView("table")
session.sql("select * from db.table")
and got exceptions like:
Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or view not found: `db`.`table`; line 1 pos 14;
'Project [*]
+- 'UnresolvedRelation `db`.`table`
So please change the table name in the config's sentence to match the table configured in the config file.
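To make the failure concrete, here is a minimal sketch of the behaviour described above, assuming a local SparkSession and the placeholder view name src_table (neither is the PR's actual code):

import org.apache.spark.sql.SparkSession

object TempViewNameSketch {
  def main(args: Array[String]): Unit = {
    val session = SparkSession.builder().master("local[1]").appName("temp-view-sketch").getOrCreate()
    val df = session.range(3).toDF("id")

    // The temp view is registered under the bare name only, never under a db-qualified name.
    df.createOrReplaceTempView("src_table")

    // Works: the sentence references the same name passed to createOrReplaceTempView.
    session.sql("select * from src_table").show()

    // Fails with AnalysisException "Table or view not found: `db`.`src_table`",
    // because nothing is registered under the qualified name.
    // session.sql("select * from db.src_table").show()

    session.stop()
  }
}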
* @param password
* @param sentence
*/
case class PostgresSQLSourceConfigEntry(override val category: SourceCategory.Value,
Should we refactor PostgresSQLSourceConfigEntry to PostgreSQLSourceConfigEntry?
Can you also add the PostgreSQL data source for Spark 2.2 and Spark 2.4?
add postgresql dependency to avoid exception:
available postgresql versions are: 9.4.1207 - 9.4.1212, for example:
…n test, declare driver in pom
Sure ^ - ^
.option("password", postgreConfig.password)
.load()
df.createOrReplaceTempView(postgreConfig.table)
session.sql(sentence)
In the configs, if the user does not configure sentence, the default sentence value is "", which will throw spark.sql.catalyst.parser.ParseException when executing session.sql(sentence). So please add a check around session.sql:
if (!"".equals(sentence))
  session.sql(sentence)
What if I check it when the user sets the configs?
require(
host.trim.length != 0 &&
port > 0 &&
database.trim.length > 0 &&
table.trim.length > 0 &&
user.trim.length > 0 &&
sentence.trim.length > 0
)
It will save many if checks.
That's contradictory with the default sentence value: getOrElse(config, "sentence", "").
- For the default sentence value, I prefer to use null rather than "".
- Users are allowed to not configure the sentence, so we still need to handle both situations for sentence.
If sentence is not configured, executing select * from ${table} might be better.
If sentence is not configured, executing select * from ${table} might be better.

No need to execute select * from table; by default, a Spark SQL read from the DB reads all the data.
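Putting these points together, the PostgreSQL reader could end up looking roughly like the sketch below: when sentence is null or empty, the JDBC read is returned as-is; otherwise the temp view is registered under the configured table name and the sentence is executed. The name readPostgres and the url/dbtable/driver options are assumptions for illustration; only the user/password options, createOrReplaceTempView, and session.sql come from the snippets in this PR.

import org.apache.spark.sql.{DataFrame, SparkSession}

object PostgresReadSketch {
  // Hypothetical helper mirroring the reader logic discussed in this thread.
  def readPostgres(session: SparkSession,
                   host: String,
                   port: Int,
                   database: String,
                   table: String,
                   user: String,
                   password: String,
                   sentence: String): DataFrame = {
    val df = session.read
      .format("jdbc")
      .option("url", s"jdbc:postgresql://$host:$port/$database")
      .option("dbtable", table)
      .option("driver", "org.postgresql.Driver")
      .option("user", user)
      .option("password", password)
      .load()

    if (sentence == null || sentence.trim.isEmpty) {
      // No sentence configured: the plain JDBC read already returns all rows.
      df
    } else {
      // Register the view under the configured table name so the sentence can reference it.
      df.createOrReplaceTempView(table)
      session.sql(sentence)
    }
  }
}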
.option("password", postgreConfig.password)
.load()
df.createOrReplaceTempView(postgreConfig.table)
session.sql(sentence)
ditto
.option("password", postgreConfig.password)
.load()
df.createOrReplaceTempView(postgreConfig.table)
session.sql(sentence)
ditto
Great work, thanks~
spark_v3.0