fix version validation #88

Merged
merged 4 commits into from
Mar 7, 2023
92 changes: 57 additions & 35 deletions README.md
@@ -1,15 +1,15 @@
# Nebula Spark Connector
# NebulaGraph Spark Connector
[Chinese version](https://github.com/vesoft-inc/nebula-spark-connector/blob/master/README_CN.md)

## Introduction

Nebula Spark Connector 2.0/3.0 only supports Nebula Graph 2.x/3.x. If you are using Nebula Graph v1.x, please use [Nebula Spark Connector v1.0](https://github.com/vesoft-inc/nebula-java/tree/v1.0/tools/nebula-spark) .
NebulaGraph Spark Connector 2.0/3.0 only supports NebulaGraph 2.x/3.x. If you are using NebulaGraph v1.x, please use [NebulaGraph Spark Connector v1.0](https://github.com/vesoft-inc/nebula-java/tree/v1.0/tools/nebula-spark).

Nebula Spark Connector support spark 2.2 and 2.4.
NebulaGraph Spark Connector supports Spark 2.2 and 2.4.

## How to Compile

1. Package Nebula Spark Connector.
1. Package NebulaGraph Spark Connector.

```bash
$ git clone https://github.com/vesoft-inc/nebula-spark-connector.git
@@ -24,27 +24,43 @@ Nebula Spark Connector support spark 2.2 and 2.4.
```

After packaging, you can find the newly generated nebula-spark-connector-3.0-SNAPSHOT.jar under the nebula-spark-connector/nebula-spark-connector/target/ directory.

## New Features (Compared to Nebula Spark Connector 1.0)
## New Features (Compared to NebulaGraph Spark Connector 1.0)
* Supports more connection configurations, such as timeout, connectionRetry, and executionRetry (see the configuration sketch after this list).
* Supports more data configurations, such as whether vertexId can be written as vertex's property, whether srcId, dstId and rank can be written as edge's properties.
* Spark Reader supports non-property, all-property, and specific-property reads.
* Spark Reader Supports reading data from Nebula Graph to Graphx as VertexRD and EdgeRDD, it also supports String type vertexId.
* Nebula Spark Connector 2.0 uniformly uses SparkSQL's DataSourceV2 for data source expansion.
* Nebula Spark Connector 2.1.0 support UPDATE write mode to NebulaGraph, see [Update Vertex](https://docs.nebula-graph.io/2.0.1/3.ngql-guide/12.vertex-statements/2.update-vertex/) .
* Nebula Spark Connector 2.5.0 support DELETE write mode to NebulaGraph, see [Delete Vertex](https://docs.nebula-graph.io/master/3.ngql-guide/12.vertex-statements/4.delete-vertex/)
* Spark Reader supports reading data from NebulaGraph into GraphX as VertexRDD and EdgeRDD; it also supports String-type vertexId.
* NebulaGraph Spark Connector 2.0 uniformly uses SparkSQL's DataSourceV2 for data source expansion.
* NebulaGraph Spark Connector 2.1.0 supports the UPDATE write mode to NebulaGraph; see [Update Vertex](https://docs.nebula-graph.io/2.0.1/3.ngql-guide/12.vertex-statements/2.update-vertex/).
* NebulaGraph Spark Connector 2.5.0 supports the DELETE write mode to NebulaGraph; see [Delete Vertex](https://docs.nebula-graph.io/master/3.ngql-guide/12.vertex-statements/4.delete-vertex/).
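
A minimal sketch of how the connection and data options above fit together, in the builder style used throughout this README. The exact option names (`withTimeout`, `withConnectionRetry`, `withVidAsProp`, and so on) are assumptions to be checked against the example module, not part of this PR:

```
// Sketch only: the option names below are assumed, not taken from this PR.
val connConfig = NebulaConnectionConfig
  .builder()
  .withMetaAddress("127.0.0.1:9559")
  .withGraphAddress("127.0.0.1:9669")
  .withTimeout(6000)       // assumed: connect timeout in milliseconds
  .withConnectionRetry(2)  // assumed: retry count when connecting
  .build()

val writeVertexConfig = WriteNebulaVertexConfig
  .builder()
  .withSpace("test")
  .withTag("person")
  .withVidField("id")
  .withVidAsProp(true)     // assumed: also store the vertex id as a tag property
  .build()
```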

## How to Use

If you use Maven to manage your project, add the following dependency to your pom.xml:
If you use Maven to manage your project, add one of the following dependencies to your pom.xml:

```
<!-- connector for spark 2.4 -->
<dependency>
  <groupId>com.vesoft</groupId>
  <artifactId>nebula-spark-connector</artifactId>
  <version>3.0-SNAPSHOT</version>
</dependency>

<!-- connector for spark 2.2 -->
<dependency>
  <groupId>com.vesoft</groupId>
  <artifactId>nebula-spark-connector_2.2</artifactId>
  <version>3.0-SNAPSHOT</version>
</dependency>

<!-- connector for spark 3.0 -->
<dependency>
  <groupId>com.vesoft</groupId>
  <artifactId>nebula-spark-connector_3.0</artifactId>
  <version>3.0-SNAPSHOT</version>
</dependency>
```
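
If you use sbt instead of Maven, the equivalent coordinates would look roughly like the sketch below; this is an assumption derived from the Maven coordinates above, and `SNAPSHOT` versions may additionally require a snapshots resolver:

```
// build.sbt -- pick the artifact that matches your Spark version (sketch)
libraryDependencies += "com.vesoft" % "nebula-spark-connector" % "3.0-SNAPSHOT"        // Spark 2.4
// libraryDependencies += "com.vesoft" % "nebula-spark-connector_2.2" % "3.0-SNAPSHOT" // Spark 2.2
// libraryDependencies += "com.vesoft" % "nebula-spark-connector_3.0" % "3.0-SNAPSHOT" // Spark 3.0
```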

Write DataFrame `INSERT` into Nebula Graph as Vertices:
Write a DataFrame into NebulaGraph as vertices in `INSERT` mode:
```
val config = NebulaConnectionConfig
  .builder()
@@ -61,7 +77,7 @@ Nebula Spark Connector support spark 2.2 and 2.4.
  .build()
df.write.nebula(config, nebulaWriteVertexConfig).writeVertices()
```
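Because the diff collapses the middle of this snippet, here is a minimal end-to-end sketch of the same `INSERT` flow. The builder option names and import paths are assumptions drawn from the connector's example module, not part of this change:

```
import com.vesoft.nebula.connector.{NebulaConnectionConfig, WriteNebulaVertexConfig}
import com.vesoft.nebula.connector.connector._ // assumed package providing the df.write.nebula implicits

// df is an existing org.apache.spark.sql.DataFrame whose columns map to tag properties.
val config = NebulaConnectionConfig
  .builder()
  .withMetaAddress("127.0.0.1:9559")  // assumed default metad address
  .withGraphAddress("127.0.0.1:9669") // assumed default graphd address
  .build()

val nebulaWriteVertexConfig = WriteNebulaVertexConfig
  .builder()
  .withSpace("test")   // target graph space
  .withTag("person")   // target tag
  .withVidField("id")  // DataFrame column used as the vertex id
  .withBatch(1000)     // rows per write batch (assumed option)
  .build()

df.write.nebula(config, nebulaWriteVertexConfig).writeVertices()
```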
Write DataFrame `UPDATE` into Nebula Graph as Vertices:
Write a DataFrame into NebulaGraph as vertices in `UPDATE` mode:
```
val config = NebulaConnectionConfig
  .builder()
@@ -79,7 +95,7 @@ Nebula Spark Connector support spark 2.2 and 2.4.
  .build()
df.write.nebula(config, nebulaWriteVertexConfig).writeVertices()
```
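For the `UPDATE` mode, the only difference from the `INSERT` sketch above should be the write-mode switch on the vertex config; `withWriteMode` and `WriteMode.UPDATE` are assumed names to verify against the connector's API:

```
// Sketch: reuse `config` from the INSERT example; only the write mode changes.
val updateVertexConfig = WriteNebulaVertexConfig
  .builder()
  .withSpace("test")
  .withTag("person")
  .withVidField("id")
  .withWriteMode(WriteMode.UPDATE) // assumed enum value for the UPDATE write mode
  .build()
df.write.nebula(config, updateVertexConfig).writeVertices()
```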
Write DataFrame `DELETE` into Nebula Graph as Vertices:
Write a DataFrame into NebulaGraph as vertices in `DELETE` mode:
```
val config = NebulaConnectionConfig
  .builder()
@@ -96,7 +112,7 @@ Nebula Spark Connector support spark 2.2 and 2.4.
  .build()
df.write.nebula(config, nebulaWriteVertexConfig).writeVertices()
```
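Likewise for `DELETE`, a sketch under the same assumptions; only the vertex ids carried by the DataFrame matter here:

```
// Sketch: DELETE mode removes the vertices whose ids appear in the DataFrame.
val deleteVertexConfig = WriteNebulaVertexConfig
  .builder()
  .withSpace("test")
  .withTag("person")
  .withVidField("id")
  .withWriteMode(WriteMode.DELETE) // assumed enum value for the DELETE write mode
  .build()
df.write.nebula(config, deleteVertexConfig).writeVertices()
```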
Read vertices from Nebula Graph:
Read vertices from NebulaGraph:
```
val config = NebulaConnectionConfig
  .builder()
@@ -115,7 +131,7 @@ Nebula Spark Connector support spark 2.2 and 2.4.
val vertex = spark.read.nebula(config, nebulaReadVertexConfig).loadVerticesToDF()
```
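
The read snippet is also partly collapsed; here is a minimal sketch of reading vertices into a DataFrame, with option names assumed from the example module (only `limit`, `partitionNum`, `noColumn`, and `returnCols` are confirmed by the config code touched in this PR):

```
import com.vesoft.nebula.connector.{NebulaConnectionConfig, ReadNebulaConfig}
import com.vesoft.nebula.connector.connector._ // assumed package providing the spark.read.nebula implicits

// spark is the active SparkSession.
val readConnConfig = NebulaConnectionConfig
  .builder()
  .withMetaAddress("127.0.0.1:9559") // reading only needs the meta service address (assumed)
  .build()

val nebulaReadVertexConfig = ReadNebulaConfig
  .builder()
  .withSpace("test")
  .withLabel("person")          // tag to read
  .withNoColumn(false)          // false: return the properties listed in returnCols
  .withReturnCols(List("name")) // an empty list reads all properties
  .withLimit(10)                // rows fetched per scan batch
  .withPartitionNum(10)         // number of Spark partitions
  .build()

val vertexDF = spark.read.nebula(readConnConfig, nebulaReadVertexConfig).loadVerticesToDF()
```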

Read vertices and edges from Nebula Graph to construct Graphx's graph:
Read vertices and edges from NebulaGraph to construct a GraphX graph:
```
val config = NebulaConnectionConfig
  .builder()
@@ -148,7 +164,7 @@ Nebula Spark Connector support spark 2.2 and 2.4.
```
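
The GraphX example is largely collapsed here; a rough sketch of the load, assuming `loadVerticesToGraphx`/`loadEdgesToGraphx` and a separate `ReadNebulaConfig` built for an edge type (the edge label "follow" below is hypothetical):

```
import org.apache.spark.graphx.Graph

// Sketch: readConnConfig and nebulaReadVertexConfig as in the DataFrame example above;
// nebulaReadEdgeConfig is a ReadNebulaConfig built with an edge label, e.g. withLabel("follow").
val vertexRDD = spark.read.nebula(readConnConfig, nebulaReadVertexConfig).loadVerticesToGraphx()
val edgeRDD   = spark.read.nebula(readConnConfig, nebulaReadEdgeConfig).loadEdgesToGraphx()
val graph     = Graph(vertexRDD, edgeRDD) // GraphX graph assembled from NebulaGraph data
```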

For more information on usage, please refer to [Example](https://github.com/vesoft-inc/nebula-spark-connector/tree/master/example/src/main/scala/com/vesoft/nebula/examples/connector).

## PySpark with Nebula Spark Connector
## PySpark with NebulaGraph Spark Connector

Below is an example of calling the nebula-spark-connector jar package in PySpark.

@@ -276,7 +292,7 @@ For more options, i.e. delete edge with vertex being deleted, refer to [nebula/c
```
val DELETE_EDGE: String = "deleteEdge"
```

### Call Nebula Spark Connector in PySpark shell and .py file
### Call NebulaGraph Spark Connector in PySpark shell and .py file

Also, below are examples of how to run the above code in the PySpark shell or in Python code files:

@@ -307,21 +323,27 @@ df = spark.read.format(
```
    "partitionNumber", 1).load()
```

## Version match

There are the version correspondence between Nebula Spark Connector and Nebula:

| Nebula Spark Connector Version | Nebula Version |
|:------------------------------:|:--------------:|
| 2.0.0 | 2.0.0, 2.0.1 |
| 2.0.1 | 2.0.0, 2.0.1 |
| 2.1.0 | 2.0.0, 2.0.1 |
| 2.5.0 | 2.5.0, 2.5.1 |
| 2.5.1 | 2.5.0, 2.5.1 |
| 2.6.0 | 2.6.0, 2.6.1 |
| 2.6.1 | 2.6.0, 2.6.1 |
| 3.0.0 | 3.0.x, 3.1.x |
| 3.0-SNAPSHOT | nightly |
## Compatibility matrix

The version correspondence between NebulaGraph Spark Connector, NebulaGraph, and Spark is as follows:

| NebulaGraph Spark Connector Version | NebulaGraph Version | Spark Version |
|:-----------------------------------------:|:--------------:|:-------------:|
|nebula-spark-connector-2.0.0.jar | 2.0.0, 2.0.1 | 2.4.* |
|nebula-spark-connector-2.0.1.jar | 2.0.0, 2.0.1 | 2.4.* |
|nebula-spark-connector-2.1.0.jar | 2.0.0, 2.0.1 | 2.4.* |
|nebula-spark-connector-2.5.0.jar | 2.5.0, 2.5.1 | 2.4.* |
|nebula-spark-connector-2.5.1.jar | 2.5.0, 2.5.1 | 2.4.* |
|nebula-spark-connector-2.6.0.jar | 2.6.0, 2.6.1 | 2.4.* |
|nebula-spark-connector-2.6.1.jar | 2.6.0, 2.6.1 | 2.4.* |
|nebula-spark-connector-3.0.0.jar | 3.x | 2.4.* |
|nebula-spark-connector-3.3.0.jar | 3.x | 2.4.* |
|nebula-spark-connector_2.2-3.3.0.jar | 3.x | 2.2.* |
|nebula-spark-connector-3.4.0.jar | 3.x | 2.4.* |
|nebula-spark-connector_2.2-3.4.0.jar | 3.x | 2.2.* |
|nebula-spark-connector-3.0-SNAPSHOT.jar | nightly | 2.4.* |
|nebula-spark-connector_2.2-3.0-SNAPSHOT.jar| nightly | 2.2.* |
|nebula-spark-connector_3.0-3.0-SNAPSHOT.jar| nightly | 3.* |

## Performance
We use the LDBC dataset to test nebula-spark-connector's performance; here are the results.
@@ -332,7 +354,7 @@ We choose tag Comment and edge REPLY_OF for space sf30 and sf100 to test the con
The application's resources are: standalone mode with three workers, 2G driver-memory,
3 num-executors, 30G executor-memory, and 20 executor-cores.
The ReadNebulaConfig has a limit of 2000 and a partitionNum of 100,
the same partition number with nebula space parts.
the same partition number as the NebulaGraph space's parts.


|data type|ldbc 67.12 million with No Property| ldbc 220 million with No Property|ldbc 67.12 million with All Property|ldbc 220 million with All Property|
@@ -360,8 +382,8 @@ The writeConfig has 2000 batch sizes, and the DataFrame has 60 partitions.

## How to Contribute

Nebula Spark Connector is a completely opensource project, opensource enthusiasts are welcome to participate in the following ways:
NebulaGraph Spark Connector is a completely open-source project; open-source enthusiasts are welcome to participate in the following ways:

- Go to [Nebula Graph Forum](https://discuss.nebula-graph.com.cn/ "go to“Nebula Graph Forum") to discuss with other users. You can raise your own questions, help others' problems, share your thoughts.
- Go to the [NebulaGraph Forum](https://discuss.nebula-graph.com.cn/ "go to NebulaGraph Forum") to discuss with other users. You can raise your own questions, help solve others' problems, and share your thoughts.
- Write or improve documents.
- Submit code to add new features or fix bugs.
50 changes: 36 additions & 14 deletions README_CN.md
@@ -29,13 +29,28 @@ Nebula Spark Connector supports Spark 2.2 and 2.4.
* Nebula Spark Connector 2.5.0 added the DELETE write mode; see [Delete Vertex](https://docs.nebula-graph.com.cn/2.5.1/3.ngql-guide/12.vertex-statements/4.delete-vertex/).

## Usage
If you use Maven to manage your project, add the dependency to your pom.xml file:
If you use Maven to manage your project, add one of the following dependencies to your pom.xml file:
```
<!-- connector for spark 2.4 -->
<dependency>
  <groupId>com.vesoft</groupId>
  <artifactId>nebula-spark-connector</artifactId>
  <version>3.0-SNAPSHOT</version>
</dependency>

<!-- connector for spark 2.2 -->
<dependency>
  <groupId>com.vesoft</groupId>
  <artifactId>nebula-spark-connector_2.2</artifactId>
  <version>3.0-SNAPSHOT</version>
</dependency>

<!-- connector for spark 3.0 -->
<dependency>
  <groupId>com.vesoft</groupId>
  <artifactId>nebula-spark-connector_3.0</artifactId>
  <version>3.0-SNAPSHOT</version>
</dependency>
```

Write a DataFrame into Nebula Graph as vertices in `INSERT` mode:
@@ -305,19 +320,26 @@ df = spark.read.format(

## Version match
The version correspondence between Nebula Spark Connector and Nebula is as follows:

| Nebula Spark Connector Version | Nebula Version |
|:------------------------------:|:--------------:|
| 2.0.0 | 2.0.0, 2.0.1 |
| 2.0.1 | 2.0.0, 2.0.1 |
| 2.1.0 | 2.0.0, 2.0.1 |
| 2.5.0 | 2.5.0, 2.5.1 |
| 2.5.1 | 2.5.0, 2.5.1 |
| 2.6.0 | 2.6.0, 2.6.1 |
| 2.6.1 | 2.6.0, 2.6.1 |
| 3.0.0 | 3.0.0 |
| 3.0-SNAPSHOT | nightly |
The version correspondence between Nebula Spark Connector, Nebula, and Spark is as follows:

| Nebula Spark Connector Version | Nebula Version | Spark Version |
|:-----------------------------------------:|:--------------:|:-------------:|
|nebula-spark-connector-2.0.0.jar | 2.0.0, 2.0.1 | 2.4.* |
|nebula-spark-connector-2.0.1.jar | 2.0.0, 2.0.1 | 2.4.* |
|nebula-spark-connector-2.1.0.jar | 2.0.0, 2.0.1 | 2.4.* |
|nebula-spark-connector-2.5.0.jar | 2.5.0, 2.5.1 | 2.4.* |
|nebula-spark-connector-2.5.1.jar | 2.5.0, 2.5.1 | 2.4.* |
|nebula-spark-connector-2.6.0.jar | 2.6.0, 2.6.1 | 2.4.* |
|nebula-spark-connector-2.6.1.jar | 2.6.0, 2.6.1 | 2.4.* |
|nebula-spark-connector-3.0.0.jar | 3.x | 2.4.* |
|nebula-spark-connector-3.3.0.jar | 3.x | 2.4.* |
|nebula-spark-connector_2.2-3.3.0.jar | 3.x | 2.2.* |
|nebula-spark-connector-3.4.0.jar | 3.x | 2.4.* |
|nebula-spark-connector_2.2-3.4.0.jar | 3.x | 2.2.* |
|nebula-spark-connector-3.0-SNAPSHOT.jar | nightly | 2.4.* |
|nebula-spark-connector_2.2-3.0-SNAPSHOT.jar| nightly | 2.2.* |
|nebula-spark-connector_3.0-3.0-SNAPSHOT.jar| nightly | 3.* |


## Performance
We use the LDBC dataset to test Nebula-Spark-Connector's performance; the results are as follows:
@@ -784,7 +784,8 @@ object ReadNebulaConfig {
assert(space != null && !space.isEmpty, s"config space is empty.")
assert(label != null && !label.isEmpty, s"config label is empty.")
assert(limit > 0, s"config limit must be positive, your limit is $limit")
assert(partitionNum > 0, s"config partitionNum must be positive, your partitionNum is $limit")
assert(partitionNum > 0,
s"config partitionNum must be positive, your partitionNum is $partitionNum")
if (noColumn && returnCols.nonEmpty) {
LOG.warn(
s"noColumn is true, returnCols will be invalidate "
@@ -12,8 +12,8 @@ object SparkValidate {
val sparkVersion = SparkSession.getActiveSession.map(_.version).getOrElse("UNKNOWN")
if (sparkVersion != "UNKNOWN" && !supportedVersions.exists(sparkVersion.matches)) {
throw new RuntimeException(
s"""Your current spark version ${sparkVersion} is not supported by the current NebulaGraph Exchange.
| please visit https://github.com/vesoft-inc/nebula-exchange#version-match to know which Exchange you need.
s"""Your current spark version ${sparkVersion} is not supported by the current NebulaGraph Spark Connector.
| please visit https://github.com/vesoft-inc/nebula-spark-connector#version-match to know which Connector you need.
| """.stripMargin)
}
}
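
The heart of this PR is the version guard above. A small standalone illustration of how the regex-based check behaves (not code from this repository; the patterns are assumed to look like those in the compatibility matrix):

```
// Standalone illustration of the check used in SparkValidate.
val supportedVersions = Seq("2.4.*")   // patterns a Spark 2.4 artifact would accept (assumed)
val sparkVersion      = "3.2.1"        // version reported by the active SparkSession
val supported = supportedVersions.exists(sparkVersion.matches) // String.matches treats "2.4.*" as a regex
// supported == false here, so the connector would throw the RuntimeException shown above,
// pointing the user at the version-match table in the README.
```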