From fe5be4e5967f0097bd81595899ffedd620f00170 Mon Sep 17 00:00:00 2001
From: Anqi
Date: Sun, 25 Jun 2023 14:49:58 +0800
Subject: [PATCH] add note for version restriction (#144)

* add note for version restriction

* Polishing README.md

* Update README-CN.md

* Update README.md

---------

Co-authored-by: Wey Gu
---
 README-CN.md | 49 +++++++++++++++--------------
 README.md    | 88 ++++++++++++++++++++++++++++------------------------
 2 files changed, 74 insertions(+), 63 deletions(-)

diff --git a/README-CN.md b/README-CN.md
index a4d8dcab..87cc754d 100644
--- a/README-CN.md
+++ b/README-CN.md
@@ -1,13 +1,14 @@
-# 欢迎使用 Nebula Exchange
+# 欢迎使用 NebulaGraph Exchange
 
 [English](https://github.com/vesoft-inc/nebula-exchange/blob/master/README.md)
 
-Nebula Exchange(简称为 Exchange)是一款 Apache Spark™ 应用,用于在分布式环境中将集群中的数据批量迁移到 Nebula Graph 中,能支持多种不同格式的批式数据和流式数据的迁移。
+NebulaGraph Exchange(以下简称 Exchange)是一款 Apache Spark™ 应用,用于在分布式环境中将集群中的数据批量迁移到 NebulaGraph 中,它能支持多种不同格式的批式数据和流式数据的迁移,它还支持直接与 SST File 方式的 NebulaGraph 写入。
 
-Exchange 仅支持 Nebula Graph 2.x 和 3.x。
-如果您正在使用 Nebula Graph v1.x,请使用 [Nebula Exchange v1.0](https://github.com/vesoft-inc/nebula-java/tree/v1.0/tools/exchange) ,或参考 Exchange 1.0 的使用文档 [Nebula Exchange 用户手册](https://docs.nebula-graph.com.cn/nebula-exchange/about-exchange/ex-ug-what-is-exchange/ "点击前往 Nebula Graph 网站")。
+Exchange 支持的 Spark 版本包括 2.2、2.4 和 3.0,对应的工具包名分别为 `nebula-exchange_spark_2.2`、`nebula-exchange_spark_2.4` 和 `nebula-exchange_spark_3.0`。
 
-Exchange 目前支持 Spark 2.2, Spark 2.4, Spark 3.0, 对应的工具包名分别是 nebula-exchange_spark_2.2,nebula-exchange_spark_2.4,nebula-exchange_spark_3.0。
+> 注意:
+> - 3.4.0 版本不支持 kafka 和 pulsar, 若需将 kafka 或 pulsar 数据导入 NebulaGraph,请使用 3.0.0 或 3.3.0 或 3.5.0 版本。
+> - 本仓库仅支持 NebulaGraph 2.x 和 3.x,如果您在使用 NebulaGraph v1.x,请使用 [NebulaExchange v1.0](https://github.com/vesoft-inc/nebula-java/tree/v1.0/tools/exchange) ,或参考 Exchange 1.0 的使用文档[NebulaExchange 用户手册](https://docs.nebula-graph.com.cn/nebula-exchange/about-exchange/ex-ug-what-is-exchange/ "点击前往 Nebula Graph 网站")。
 
 ## 如何获取
 
@@ -21,27 +22,29 @@ Exchange 目前支持 Spark 2.2, Spark 2.4, Spark 3.0, 对应的工具包
    $ mvn clean package -Dmaven.test.skip=true -Dgpg.skip -Dmaven.javadoc.skip=true -pl nebula-exchange_spark_3.0 -am -Pscala-2.12 -Pspark-3.0
    ```
 
-   编译打包完成后,可以在 nebula-exchange/nebula-exchange_spark_2.2/target/ 目录下看到 nebula-exchange_spark_2.2-3.0-SNAPSHOT.jar 文件,
-   在 nebula-exchange/nebula-exchange_spark_2.4/target/ 目录下看到 nebula-exchange_spark_2.4-3.0-SNAPSHOT.jar 文件,
-   在 nebula-exchange/nebula-exchange_spark_3.0/target/ 目录下看到 nebula-exchange_spark_3.0-3.0-SNAPSHOT.jar 文件。
-2. 在官网或 github 下载
-
-   正式版本:
-
-   https://github.com/vesoft-inc/nebula-exchange/releases
-   或
-   https://nebula-graph.com.cn/release/?exchange
-
-   快照版本: (进入页面点击任意workflow后,snapshot版本的jar包在Artifacts中,根据需求自行下载)
-
-   https://github.com/vesoft-inc/nebula-exchange/actions/workflows/deploy_snapshot.yml
+   编译打包完成后,可以:
+   - 在 nebula-exchange/nebula-exchange_spark_2.2/target/ 目录下找到 nebula-exchange_spark_2.2-3.0-SNAPSHOT.jar 文件;
+   - 在 nebula-exchange/nebula-exchange_spark_2.4/target/ 目录下找到 nebula-exchange_spark_2.4-3.0-SNAPSHOT.jar 文件;
+   - 以及在 nebula-exchange/nebula-exchange_spark_3.0/target/ 目录下找到 nebula-exchange_spark_3.0-3.0-SNAPSHOT.jar 文件。
+
+3. 在官网或 GitHub 下载
+
+   **正式版本**
+
+   [GitHub Releases](https://github.com/vesoft-inc/nebula-exchange/releases)
+   或者 [Downloads](https://www.nebula-graph.com.cn/release?exchange=)
+
+   **快照版本**
+
+   进入[GitHub Actions Artifacts](https://github.com/vesoft-inc/nebula-exchange/actions/workflows/snapshot.yml)页面点击任意 workflow 后,从 Artifacts 中,根据需求下载下载。
+
 
 ## 版本匹配
 
-Nebula Exchange 和 Nebula 的版本对应关系如下:
+Exchange 和 NebulaGraph 的版本对应关系如下:
 
-| Nebula Exchange Version | Nebula Version | Spark Version |
-|:-----------------------:|:--------------:|:--------------:|
+| Exchange Version | NebulaGraph Version | Spark Version |
+|:----------------:|:-------------------:|:--------------:|
 |nebula-exchange-2.0.0.jar| 2.0.0, 2.0.1 |2.4.*|
 |nebula-exchange-2.0.1.jar| 2.0.0, 2.0.1 |2.4.*|
 |nebula-exchange-2.1.0.jar| 2.0.0, 2.0.1 |2.4.*|
@@ -104,7 +107,7 @@ nebula-exchange_spark_2.4-3.0-SNAPSHOT.jar \
 -c application.conf
 ```
 
-关于 Nebula Exchange 的更多说明,请参考 Exchange 2.0 的 [使用手册](https://docs.nebula-graph.com.cn/2.6.2/nebula-exchange/about-exchange/ex-ug-what-is-exchange/) 。
+关于 Nebula Exchange 的更多说明,请参考 Exchange 2.0 的[使用手册](https://docs.nebula-graph.com.cn/2.6.2/nebula-exchange/about-exchange/ex-ug-what-is-exchange/) 。
 
 ## 贡献
 
diff --git a/README.md b/README.md
index 2d1bc25d..8a2d2470 100644
--- a/README.md
+++ b/README.md
@@ -1,17 +1,17 @@
-# Nebula Exchange
+# NebulaGraph Exchange
 
 [中文版](https://github.com/vesoft-inc/nebula-exchange/blob/master/README-CN.md)
-
-Nebula Exchange (Exchange for short) is an Apache Spark application. It is used to migrate cluster data in bulk from Spark to Nebula Graph in a distributed environment. It supports migration of batch data and streaming data in various formats.
-Exchange only supports Nebula Graph 2.x and 3.x.
+NebulaGraph Exchange (referred to as Exchange) is an Apache Spark™ application used to migrate data in bulk from different sources to NebulaGraph in a distributed way(Spark). It supports a variety of batch or streaming data sources and allows direct writing to NebulaGraph through side-loading (SST Files).
 
-If you want to import data for Nebula Graph v1.x,please use [Nebula Exchange v1.0](https://github.com/vesoft-inc/nebula-java/tree/v1.0/tools/exchange).
+Exchange supports Spark versions 2.2, 2.4, and 3.0 along with their respective toolkits named: `nebula-exchange_spark_2.2`, `nebula-exchange_spark_2.4`, and `nebula-exchange_spark_3.0`.
 
-Exchange currently supports spark2.2, spark2.4 and spark3.0, and the corresponding toolkits are nebula-exchange_spark_2.2, nebula-exchange_spark_2.4, nebula-exchange_spark_3.0.
+> Note:
+> - Exchange 3.4.0 does not support Apache Kafka and Apache Pulsar. Please use Exchange of version 3.0.0, 3.3.0, or 3.5.0 to load data from Apache Kafka or Apache Pulsar to NebulaGraph for now.
+> - This repo covers only NebulaGraph 2.x and 3.x, for NebulaGraph v1.x, please use [NebulaGraph Exchange v1.0](https://github.com/vesoft-inc/nebula-java/tree/v1.0/tools/exchange).
 
-## How to get
+## Build or Download Exchange
 
-1. Package latest Exchange
+1. Build the latest Exchange
 
    ```bash
    $ git clone https://github.com/vesoft-inc/nebula-exchange.git
@@ -21,32 +21,39 @@ Exchange currently supports spark2.2, spark2.4 and spark3.0, and the correspondi
    $ mvn clean package -Dmaven.test.skip=true -Dgpg.skip -Dmaven.javadoc.skip=true -pl nebula-exchange_spark_3.0 -am -Pscala-2.12 -Pspark-3.0
    ```
 
-   After the packaging, you can see the newly generated nebula-exchange_spark_2.2-3.0-SNAPSHOT.jar under the nebula-exchange/nebula-exchange_spark_2.2/target/ directory,
-   nebula-exchange_spark_2.4-3.0-SNAPSHOT.jar under the nebula-exchange/nebula-exchange_spark_2.4/target/ directory,
-   nebula-exchange_spark_3.0-3.0-SNAPSHOT.jar under the nebula-exchange/nebula-exchange_spark_3.0/target/ directory.
-2. Download from github artifact
-
-   **release version:**
-
-   https://github.com/vesoft-inc/nebula-exchange/releases
-   or https://nebula-graph.com.cn/release/?exchange
-
-   **snapshot version:**
+   After packaging, the newly generated JAR files can be found in the following path:
+   - nebula-exchange/nebula-exchange_spark_2.2/target/ contains nebula-exchange_spark_2.2-3.0-SNAPSHOT.jar
+   - nebula-exchange/nebula-exchange_spark_2.4/target/ contains nebula-exchange_spark_2.4-3.0-SNAPSHOT.jar
+   - nebula-exchange/nebula-exchange_spark_3.0/target/ contains nebula-exchange_spark_3.0-3.0-SNAPSHOT.jar
+
+3. Download from the GitHub artifact
 
-   https://github.com/vesoft-inc/nebula-exchange/actions/workflows/deploy_snapshot.yml
-## How to use
+   **Released Version:**
 
-Import command:
-```
+   [GitHub Releases](https://github.com/vesoft-inc/nebula-exchange/releases)
+   or [Downloads](https://www.nebula-graph.io/release?exchange=)
+
+   **Snapshot Version:**
+
+   [GitHub Actions Artifacts](https://github.com/vesoft-inc/nebula-exchange/actions/workflows/snapshot.yml)
+
+## Get Started
+
+Here is an example command to run the Exchange:
+
+```bash
 $SPARK_HOME/bin/spark-submit --class com.vesoft.nebula.exchange.Exchange --master local nebula-exchange_spark_2.4-3.0-SNAPSHOT.jar -c /path/to/application.conf
 ```
-If your source is HIVE, import command is:
-```
+
+And when the source is **Hive**, run:
+
+```bash
 $SPARK_HOME/bin/spark-submit --class com.vesoft.nebula.exchange.Exchange --master local nebula-exchange_spark_2.4-3.0-SNAPSHOT.jar -c /path/to/application.conf -h
 ```
 
-Note:Submit Exchange with Yarn-Cluster mode, please use following command:
-```
+Run the Exchange in **Yarn-Cluster** mode:
+
+```bash
 $SPARK_HOME/bin/spark-submit --class com.vesoft.nebula.exchange.Exchange \
 --master yarn-cluster \
 --files application.conf \
@@ -56,7 +63,8 @@ nebula-exchange_spark_2.4-3.0-SNAPSHOT.jar \
 -c application.conf
 ```
 
-Note: When use Exchange to generate SST files, please add spark.sql.shuffle.partition config for Spark's shuffle operation:
+Note: When using Exchange to generate SST files, please add `spark.sql.shuffle.partition` in `--conf` for Spark's shuffle operation:
+
 ```
 $SPARK_HOME/bin/spark-submit --class com.vesoft.nebula.exchange.Exchange \
 --master local \
@@ -65,14 +73,14 @@ nebula-exchange_spark_2.4-3.0-SNAPSHOT.jar \
 -c application.conf
 ```
 
-For more details about Exchange, please refer to [Exchange 2.0](https://docs.nebula-graph.io/2.6.2/16.eco-tools/1.nebula-exchange/) .
+For more details, please refer to [NebulaGraph Exchange Docs](https://docs.nebula-graph.io/master/nebula-exchange/about-exchange/ex-ug-what-is-exchange/)
 
-## Version match
+## Version Compatibility Matrix
 
-There are the version correspondence between Nebula Exchange and Nebula:
+Here is the version correspondence between Exchange and NebulaGraph:
 
-| Nebula Exchange Version | Nebula Version | Spark Version |
-|:-----------------------:|:--------------:|:--------------:|
+| Exchange Version | Nebula Version | Spark Version |
+|:----------------:|:--------------:|:--------------:|
 |nebula-exchange-2.0.0.jar| 2.0.0, 2.0.1 |2.4.*|
 |nebula-exchange-2.0.1.jar| 2.0.0, 2.0.1 |2.4.*|
 |nebula-exchange-2.1.0.jar| 2.0.0, 2.0.1 |2.4.*|
@@ -93,13 +101,13 @@ There are the version correspondence between Nebula Exchange and Nebula:
 |nebula-exchange_spark_2.4-3.0-SNAPSHOT.jar| nightly |2.4.*|
 |nebula-exchange_spark_3.0-3.0-SNAPSHOT.jar| nightly |`3.0.*`,`3.1.*`,`3.2.*`,`3.3.*`|
 
-## New Features
+## Feature History
 
-1. Supports importing vertex data with String and Integer type IDs.
-2. Supports importing data of the Null, Date, DateTime, and Time types(DateTime uses UTC, not local time).
-3. Supports importing data from other Hive sources besides Hive on Spark.
-4. Supports recording and retrying the INSERT statement after failures during data import.
-5. Supports SST import, but not support property's default value yet.
-6. Supports Spark 2.2, Spark 2.4 and Spark 3.0.
+1. *Since 2.0* Exchange allows for the import of vertex data with both String and Integer type IDs.
+2. *Since 2.0* Exchange also supports importing data of various types, including Null, Date, DateTime (using UTC instead of local time), and Time.
+3. *Since 2.0* In addition to Hive on Spark, Exchange can import data from other Hive sources as well.
+4. *Since 2.0* If there are failures during the data import process, Exchange supports recording and retrying the INSERT statement.
+5. *Since 2.5* While SST import is supported by Exchange, property default values are not yet supported.
+6. *Since 3.0* Exchange is compatible with Spark 2.2, Spark 2.4, and Spark 3.0.
 
 Refer to [application.conf](https://github.com/vesoft-inc/nebula-exchange/blob/master/exchange-common/src/test/resources/application.conf) as an example to edit the configuration file.
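
The version rules stated in this patch (one toolkit name per Spark version, and the Kafka/Pulsar gap in Exchange 3.4.0) can be sketched as a small shell helper. This is a hypothetical script, not part of the nebula-exchange repository: the function names `toolkit_for_spark` and `supports_kafka_pulsar` are invented for illustration, and mapping Spark 3.1 through 3.3 onto the `nebula-exchange_spark_3.0` toolkit is an assumption inferred from the `nebula-exchange_spark_3.0-3.0-SNAPSHOT.jar` row of the compatibility matrix.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative and not part of nebula-exchange.

# Print the Exchange toolkit name for a Spark version such as "2.4.8".
# Assumption: Spark 3.1-3.3 use the 3.0 toolkit, as the matrix row for
# nebula-exchange_spark_3.0-3.0-SNAPSHOT.jar suggests.
toolkit_for_spark() {
  case "$1" in
    2.2|2.2.*) echo "nebula-exchange_spark_2.2" ;;
    2.4|2.4.*) echo "nebula-exchange_spark_2.4" ;;
    3.[0-3]|3.[0-3].*) echo "nebula-exchange_spark_3.0" ;;
    *) echo "unsupported Spark version: $1" >&2; return 1 ;;
  esac
}

# Exit status 0: release named in the README note as usable for Kafka/Pulsar.
# Exit status 1: release 3.4.0, which the note says supports neither source.
# Exit status 2: release not mentioned in the note, so support is unknown.
supports_kafka_pulsar() {
  case "$1" in
    3.0.0|3.3.0|3.5.0) return 0 ;;
    3.4.0) return 1 ;;
    *) return 2 ;;
  esac
}
```

For example, `toolkit_for_spark 3.3.1` prints `nebula-exchange_spark_3.0`, and `supports_kafka_pulsar 3.4.0` returns a non-zero status. Releases the note does not name are reported as unknown rather than guessed.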