From ba4b4a62bb9f392130727f235420ddb60c22b0ee Mon Sep 17 00:00:00 2001 From: Anqi Date: Sun, 25 Jun 2023 11:49:46 +0800 Subject: [PATCH 1/4] add note for version restriction --- README-CN.md | 2 ++ README.md | 2 ++ 2 files changed, 4 insertions(+) diff --git a/README-CN.md b/README-CN.md index a4d8dcab..3742ff27 100644 --- a/README-CN.md +++ b/README-CN.md @@ -9,6 +9,8 @@ Exchange 仅支持 Nebula Graph 2.x 和 3.x。 Exchange 目前支持 Spark 2.2, Spark 2.4, Spark 3.0, 对应的工具包名分别是 nebula-exchange_spark_2.2,nebula-exchange_spark_2.4,nebula-exchange_spark_3.0。 +> 注意:3.4.0版本不支持 kafka 和 pulsar, 若需将 kafka 或 pulsar 数据导入 NebulaGraph,请使用 3.0.0 或 3.3.0 或 3.5.0 版本。 + ## 如何获取 1. 编译打包最新的 Exchange。 diff --git a/README.md b/README.md index 2d1bc25d..c00f10bb 100644 --- a/README.md +++ b/README.md @@ -9,6 +9,8 @@ If you want to import data for Nebula Graph v1.x,please use [Nebula Exchange v Exchange currently supports spark2.2, spark2.4 and spark3.0, and the corresponding toolkits are nebula-exchange_spark_2.2, nebula-exchange_spark_2.4, nebula-exchange_spark_3.0. +> note: 3.4.0 version does not support kafka and pulsar. Please use 3.0.0 or 3.3.0 or 3.5.0 to import Kafka/pulsar into NebulaGraph. + ## How to get 1. Package latest Exchange From f49c2d4b141d10f9de66984e292496caedcee487 Mon Sep 17 00:00:00 2001 From: Wey Gu Date: Sun, 25 Jun 2023 12:53:37 +0800 Subject: [PATCH 2/4] Polishing README.md --- README.md | 90 +++++++++++++++++++++++++++++-------------------------- 1 file changed, 48 insertions(+), 42 deletions(-) diff --git a/README.md b/README.md index c00f10bb..a5f530aa 100644 --- a/README.md +++ b/README.md @@ -1,19 +1,17 @@ -# Nebula Exchange +# NebulaGraph Exchange [中文版](https://github.com/vesoft-inc/nebula-exchange/blob/master/README-CN.md) - -Nebula Exchange (Exchange for short) is an Apache Spark application. It is used to migrate cluster data in bulk from Spark to Nebula Graph in a distributed environment. It supports migration of batch data and streaming data in various formats. 
-Exchange only supports Nebula Graph 2.x and 3.x. +NebulaGraph Exchange (referred to as Exchange) is an Apache Spark™ application used to migrate data in bulk from different sources to NebulaGraph in a distributed way(Spark). It supports a variety of batch or streaming data sources and allows direct writing to NebulaGraph through side-loading (SST Files). -If you want to import data for Nebula Graph v1.x,please use [Nebula Exchange v1.0](https://github.com/vesoft-inc/nebula-java/tree/v1.0/tools/exchange). +Exchange supports Spark versions 2.2, 2.4, and 3.0 along with their respective toolkits named: `nebula-exchange_spark_2.2`, `nebula-exchange_spark_2.4`, and `nebula-exchange_spark_3.0`. -Exchange currently supports spark2.2, spark2.4 and spark3.0, and the corresponding toolkits are nebula-exchange_spark_2.2, nebula-exchange_spark_2.4, nebula-exchange_spark_3.0. +> Note: +> - Exchange 3.4.0 does not support Apache Kafka and Apache Pulsar. Please use Exchange of version 3.0.0, 3.3.0, or 3.5.0 to load data from Apache Kafka or Apache Pulsar to NebulaGraph for now. +> - This repo covers only NebulaGraph 2.x and 3.x, for NebulaGraph v1.x, please use [NebulaGraph Exchange v1.0](https://github.com/vesoft-inc/nebula-java/tree/v1.0/tools/exchange). -> note: 3.4.0 version does not support kafka and pulsar. Please use 3.0.0 or 3.3.0 or 3.5.0 to import Kafka/pulsar into NebulaGraph. +## Build or Download Exchange -## How to get - -1. Package latest Exchange +1. 
Build the latest Exchange ```bash $ git clone https://github.com/vesoft-inc/nebula-exchange.git @@ -23,32 +21,39 @@ Exchange currently supports spark2.2, spark2.4 and spark3.0, and the correspondi $ mvn clean package -Dmaven.test.skip=true -Dgpg.skip -Dmaven.javadoc.skip=true -pl nebula-exchange_spark_3.0 -am -Pscala-2.12 -Pspark-3.0 ``` - After the packaging, you can see the newly generated nebula-exchange_spark_2.2-3.0-SNAPSHOT.jar under the nebula-exchange/nebula-exchange_spark_2.2/target/ directory, - nebula-exchange_spark_2.4-3.0-SNAPSHOT.jar under the nebula-exchange/nebula-exchange_spark_2.4/target/ directory, - nebula-exchange_spark_3.0-3.0-SNAPSHOT.jar under the nebula-exchange/nebula-exchange_spark_3.0/target/ directory. -2. Download from github artifact - - **release version:** - - https://github.com/vesoft-inc/nebula-exchange/releases - or https://nebula-graph.com.cn/release/?exchange - - **snapshot version:** + After packaging, the newly generated JAR files can be found in the following path: + - nebula-exchange/nebula-exchange_spark_2.2/target/ contains nebula-exchange_spark_2.2-3.0-SNAPSHOT.jar + - nebula-exchange/nebula-exchange_spark_2.4/target/ contains nebula-exchange_spark_2.4-3.0-SNAPSHOT.jar + - nebula-exchange/nebula-exchange_spark_3.0/target/ contains nebula-exchange_spark_3.0-3.0-SNAPSHOT.jar + +3. 
Download from the GitHub artifact - https://github.com/vesoft-inc/nebula-exchange/actions/workflows/deploy_snapshot.yml -## How to use + **Released Version:** -Import command: -``` + [GitHub Releases](https://github.com/vesoft-inc/nebula-exchange/releases) + or [Downloads](https://www.nebula-graph.io/release?exchange=) + + **Snapshot Version:** + + [GitHub Actions Artifacts](https://github.com/vesoft-inc/nebula-exchange/actions/workflows/snapshot.yml) + +## Get Started + +Here is an example command to run the Exchange: + +```bash $SPARK_HOME/bin/spark-submit --class com.vesoft.nebula.exchange.Exchange --master local nebula-exchange_spark_2.4-3.0-SNAPSHOT.jar -c /path/to/application.conf ``` -If your source is HIVE, import command is: -``` + +And when the source is **Hive**, run: + +```bash $SPARK_HOME/bin/spark-submit --class com.vesoft.nebula.exchange.Exchange --master local nebula-exchange_spark_2.4-3.0-SNAPSHOT.jar -c /path/to/application.conf -h ``` -Note:Submit Exchange with Yarn-Cluster mode, please use following command: -``` +Run the Exchange in **Yarn-Cluster** mode: + +```bash $SPARK_HOME/bin/spark-submit --class com.vesoft.nebula.exchange.Exchange \ --master yarn-cluster \ --files application.conf \ @@ -58,7 +63,8 @@ nebula-exchange_spark_2.4-3.0-SNAPSHOT.jar \ -c application.conf ``` -Note: When use Exchange to generate SST files, please add spark.sql.shuffle.partition config for Spark's shuffle operation: +Note: When using Exchange to generate SST files, please add `spark.sql.shuffle.partition` in `--conf` for Spark's shuffle operation: + ``` $SPARK_HOME/bin/spark-submit --class com.vesoft.nebula.exchange.Exchange \ --master local \ @@ -67,14 +73,14 @@ nebula-exchange_spark_2.4-3.0-SNAPSHOT.jar \ -c application.conf ``` -For more details about Exchange, please refer to [Exchange 2.0](https://docs.nebula-graph.io/2.6.2/16.eco-tools/1.nebula-exchange/) . 
+For more details, please refer to [NebulaGraph Exchange Docs](https://docs.nebula-graph.io/master/nebula-exchange/about-exchange/ex-ug-what-is-exchange/) -## Version match +## Version Compatibility Matrix -There are the version correspondence between Nebula Exchange and Nebula: +Here is the version correspondence between Exchange and NebulaGraph: -| Nebula Exchange Version | Nebula Version | Spark Version | -|:-----------------------:|:--------------:|:--------------:| +| Exchange Version | Nebula Version | Spark Version | +|:----------------:|:--------------:|:--------------:| |nebula-exchange-2.0.0.jar| 2.0.0, 2.0.1 |2.4.*| |nebula-exchange-2.0.1.jar| 2.0.0, 2.0.1 |2.4.*| |nebula-exchange-2.1.0.jar| 2.0.0, 2.0.1 |2.4.*| @@ -95,13 +101,13 @@ There are the version correspondence between Nebula Exchange and Nebula: |nebula-exchange_spark_2.4-3.0-SNAPSHOT.jar| nightly |2.4.*| |nebula-exchange_spark_3.0-3.0-SNAPSHOT.jar| nightly |`3.0.*`,`3.1.*`,`3.2.*`,`3.3.*`| -## New Features +## Feature History -1. Supports importing vertex data with String and Integer type IDs. -2. Supports importing data of the Null, Date, DateTime, and Time types(DateTime uses UTC, not local time). -3. Supports importing data from other Hive sources besides Hive on Spark. -4. Supports recording and retrying the INSERT statement after failures during data import. -5. Supports SST import, but not support property's default value yet. -6. Supports Spark 2.2, Spark 2.4 and Spark 3.0. +1. Exchange allows for the import of vertex data with both String and Integer type IDs. +2. Exchange also supports importing data of various types, including Null, Date, DateTime (using UTC instead of local time), and Time. +3. In addition to Hive on Spark, Exchange can import data from other Hive sources as well. +4. If there are failures during the data import process, Exchange supports recording and retrying the INSERT statement. +5. 
While SST import is supported by Exchange, property default values are not yet supported. +6. Exchange is compatible with Spark 2.2, Spark 2.4, and Spark 3.0. Refer to [application.conf](https://github.com/vesoft-inc/nebula-exchange/blob/master/exchange-common/src/test/resources/application.conf) as an example to edit the configuration file. From e30b73ba33ebf46d7a4e82d579f297a3166102ec Mon Sep 17 00:00:00 2001 From: Wey Gu Date: Sun, 25 Jun 2023 13:00:42 +0800 Subject: [PATCH 3/4] Update README-CN.md --- README-CN.md | 51 ++++++++++++++++++++++++++------------------------- 1 file changed, 26 insertions(+), 25 deletions(-) diff --git a/README-CN.md b/README-CN.md index 3742ff27..87cc754d 100644 --- a/README-CN.md +++ b/README-CN.md @@ -1,15 +1,14 @@ -# 欢迎使用 Nebula Exchange +# 欢迎使用 NebulaGraph Exchange [English](https://github.com/vesoft-inc/nebula-exchange/blob/master/README.md) -Nebula Exchange(简称为 Exchange)是一款 Apache Spark™ 应用,用于在分布式环境中将集群中的数据批量迁移到 Nebula Graph 中,能支持多种不同格式的批式数据和流式数据的迁移。 +NebulaGraph Exchange(以下简称 Exchange)是一款 Apache Spark™ 应用,用于在分布式环境中将集群中的数据批量迁移到 NebulaGraph 中,它能支持多种不同格式的批式数据和流式数据的迁移,它还支持直接与 SST File 方式的 NebulaGraph 写入。 -Exchange 仅支持 Nebula Graph 2.x 和 3.x。 -如果您正在使用 Nebula Graph v1.x,请使用 [Nebula Exchange v1.0](https://github.com/vesoft-inc/nebula-java/tree/v1.0/tools/exchange) ,或参考 Exchange 1.0 的使用文档 [Nebula Exchange 用户手册](https://docs.nebula-graph.com.cn/nebula-exchange/about-exchange/ex-ug-what-is-exchange/ "点击前往 Nebula Graph 网站")。 +Exchange 支持的 Spark 版本包括 2.2、2.4 和 3.0,对应的工具包名分别为 `nebula-exchange_spark_2.2`、`nebula-exchange_spark_2.4` 和 `nebula-exchange_spark_3.0`。 -Exchange 目前支持 Spark 2.2, Spark 2.4, Spark 3.0, 对应的工具包名分别是 nebula-exchange_spark_2.2,nebula-exchange_spark_2.4,nebula-exchange_spark_3.0。 - -> 注意:3.4.0版本不支持 kafka 和 pulsar, 若需将 kafka 或 pulsar 数据导入 NebulaGraph,请使用 3.0.0 或 3.3.0 或 3.5.0 版本。 +> 注意: +> - 3.4.0 版本不支持 kafka 和 pulsar, 若需将 kafka 或 pulsar 数据导入 NebulaGraph,请使用 3.0.0 或 3.3.0 或 3.5.0 版本。 +> - 本仓库仅支持 NebulaGraph 2.x 和 
3.x,如果您在使用 NebulaGraph v1.x,请使用 [NebulaExchange v1.0](https://github.com/vesoft-inc/nebula-java/tree/v1.0/tools/exchange) ,或参考 Exchange 1.0 的使用文档[NebulaExchange 用户手册](https://docs.nebula-graph.com.cn/nebula-exchange/about-exchange/ex-ug-what-is-exchange/ "点击前往 Nebula Graph 网站")。 ## 如何获取 @@ -23,27 +22,29 @@ Exchange 目前支持 Spark 2.2, Spark 2.4, Spark 3.0, 对应的工具包 $ mvn clean package -Dmaven.test.skip=true -Dgpg.skip -Dmaven.javadoc.skip=true -pl nebula-exchange_spark_3.0 -am -Pscala-2.12 -Pspark-3.0 ``` - 编译打包完成后,可以在 nebula-exchange/nebula-exchange_spark_2.2/target/ 目录下看到 nebula-exchange_spark_2.2-3.0-SNAPSHOT.jar 文件, - 在 nebula-exchange/nebula-exchange_spark_2.4/target/ 目录下看到 nebula-exchange_spark_2.4-3.0-SNAPSHOT.jar 文件, - 在 nebula-exchange/nebula-exchange_spark_3.0/target/ 目录下看到 nebula-exchange_spark_3.0-3.0-SNAPSHOT.jar 文件。 -2. 在官网或 github 下载 - - 正式版本: - - https://github.com/vesoft-inc/nebula-exchange/releases - 或 - https://nebula-graph.com.cn/release/?exchange - - 快照版本: (进入页面点击任意workflow后,snapshot版本的jar包在Artifacts中,根据需求自行下载) - - https://github.com/vesoft-inc/nebula-exchange/actions/workflows/deploy_snapshot.yml + 编译打包完成后,可以: + - 在 nebula-exchange/nebula-exchange_spark_2.2/target/ 目录下找到 nebula-exchange_spark_2.2-3.0-SNAPSHOT.jar 文件; + - 在 nebula-exchange/nebula-exchange_spark_2.4/target/ 目录下找到 nebula-exchange_spark_2.4-3.0-SNAPSHOT.jar 文件; + - 以及在 nebula-exchange/nebula-exchange_spark_3.0/target/ 目录下找到 nebula-exchange_spark_3.0-3.0-SNAPSHOT.jar 文件。 + +3. 
在官网或 GitHub 下载 + + **正式版本** + + [GitHub Releases](https://github.com/vesoft-inc/nebula-exchange/releases) + 或者 [Downloads](https://www.nebula-graph.com.cn/release?exchange=) + + **快照版本** + + 进入[GitHub Actions Artifacts](https://github.com/vesoft-inc/nebula-exchange/actions/workflows/snapshot.yml)页面点击任意 workflow 后,从 Artifacts 中,根据需求下载下载。 + ## 版本匹配 -Nebula Exchange 和 Nebula 的版本对应关系如下: +Exchange 和 NebulaGraph 的版本对应关系如下: -| Nebula Exchange Version | Nebula Version | Spark Version | -|:-----------------------:|:--------------:|:--------------:| +| Exchange Version | NebulaGraph Version | Spark Version | +|:----------------:|:-------------------:|:--------------:| |nebula-exchange-2.0.0.jar| 2.0.0, 2.0.1 |2.4.*| |nebula-exchange-2.0.1.jar| 2.0.0, 2.0.1 |2.4.*| |nebula-exchange-2.1.0.jar| 2.0.0, 2.0.1 |2.4.*| @@ -106,7 +107,7 @@ nebula-exchange_spark_2.4-3.0-SNAPSHOT.jar \ -c application.conf ``` -关于 Nebula Exchange 的更多说明,请参考 Exchange 2.0 的 [使用手册](https://docs.nebula-graph.com.cn/2.6.2/nebula-exchange/about-exchange/ex-ug-what-is-exchange/) 。 +关于 Nebula Exchange 的更多说明,请参考 Exchange 2.0 的[使用手册](https://docs.nebula-graph.com.cn/2.6.2/nebula-exchange/about-exchange/ex-ug-what-is-exchange/) 。 ## 贡献 From b6f17d3361ea115024fd9de5733ba3042313fd7c Mon Sep 17 00:00:00 2001 From: Wey Gu Date: Sun, 25 Jun 2023 13:03:16 +0800 Subject: [PATCH 4/4] Update README.md --- README.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/README.md b/README.md index a5f530aa..8a2d2470 100644 --- a/README.md +++ b/README.md @@ -103,11 +103,11 @@ Here is the version correspondence between Exchange and NebulaGraph: ## Feature History -1. Exchange allows for the import of vertex data with both String and Integer type IDs. -2. Exchange also supports importing data of various types, including Null, Date, DateTime (using UTC instead of local time), and Time. -3. In addition to Hive on Spark, Exchange can import data from other Hive sources as well. -4. 
If there are failures during the data import process, Exchange supports recording and retrying the INSERT statement. -5. While SST import is supported by Exchange, property default values are not yet supported. -6. Exchange is compatible with Spark 2.2, Spark 2.4, and Spark 3.0. +1. *Since 2.0* Exchange allows for the import of vertex data with both String and Integer type IDs. +2. *Since 2.0* Exchange also supports importing data of various types, including Null, Date, DateTime (using UTC instead of local time), and Time. +3. *Since 2.0* In addition to Hive on Spark, Exchange can import data from other Hive sources as well. +4. *Since 2.0* If there are failures during the data import process, Exchange supports recording and retrying the INSERT statement. +5. *Since 2.5* While SST import is supported by Exchange, property default values are not yet supported. +6. *Since 3.0* Exchange is compatible with Spark 2.2, Spark 2.4, and Spark 3.0. Refer to [application.conf](https://github.com/vesoft-inc/nebula-exchange/blob/master/exchange-common/src/test/resources/application.conf) as an example to edit the configuration file.
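
The toolkit naming and the `spark-submit` commands described in the README changes above can be sketched as a small shell helper. This is only an illustrative sketch, not part of Exchange: the function names `exchange_jar_for_spark` and `print_submit_command` are hypothetical, the `2.2.*` pattern is an assumption inferred from the toolkit name, and the partition count is a placeholder. The helper prints the command instead of running it. Note that Spark's configuration reference spells the shuffle setting `spark.sql.shuffle.partitions`.

```bash
#!/usr/bin/env bash
# Sketch only: these helper names are hypothetical and not shipped with Exchange.

# Map a Spark version string to the snapshot JAR name listed in the README.
# The 2.4.* and 3.0.*-3.3.* patterns follow the compatibility matrix;
# 2.2.* is an assumption based on the nebula-exchange_spark_2.2 toolkit name.
exchange_jar_for_spark() {
  case "$1" in
    2.2.*)                   echo "nebula-exchange_spark_2.2-3.0-SNAPSHOT.jar" ;;
    2.4.*)                   echo "nebula-exchange_spark_2.4-3.0-SNAPSHOT.jar" ;;
    3.0.*|3.1.*|3.2.*|3.3.*) echo "nebula-exchange_spark_3.0-3.0-SNAPSHOT.jar" ;;
    *) echo "unsupported Spark version: $1" >&2; return 1 ;;
  esac
}

# Print (do not run) a local-mode spark-submit command.
# $1 = Spark version, $2 = path to application.conf,
# $3 = optional shuffle partition count (placeholder value, for SST generation).
print_submit_command() {
  local spark_version="$1" conf_path="$2" partitions="${3:-}" jar extra=""
  jar="$(exchange_jar_for_spark "$spark_version")" || return 1
  if [ -n "$partitions" ]; then
    extra=" --conf spark.sql.shuffle.partitions=$partitions"
  fi
  echo "\$SPARK_HOME/bin/spark-submit --class com.vesoft.nebula.exchange.Exchange --master local${extra} $jar -c $conf_path"
}

print_submit_command 2.4.8 /path/to/application.conf
```

Printing the command first makes it easy to check that the JAR matches the cluster's Spark version before anything is submitted. Passing a third argument, for example `print_submit_command 3.1.2 application.conf 200`, adds the `--conf` flag used when generating SST files.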