exchange import mode (#2349)
Co-authored-by: Chris Chen <[email protected]>
cooper-lzy and ChrisChen2023 authored Nov 14, 2023
1 parent f4a7bc3 commit a5d7543
Showing 2 changed files with 48 additions and 44 deletions.

After editing the configuration file, run the following commands to import specified source data into the NebulaGraph database.

## Import data

```bash
<spark_install_path>/bin/spark-submit --master "spark://HOST:PORT" --class com.vesoft.nebula.exchange.Exchange <nebula-exchange-2.x.y.jar_path> -c <application.conf_path>
```

The following table lists command parameters.

| Parameter | Required | Default value | Description |
| :--- | :--- | :--- | :--- |
| `--class` | Yes | - | Specify the main class of the driver. |
| `--master` | Yes | - | Specify the URL of the master process in a Spark cluster. For more information, see [master-urls](https://spark.apache.org/docs/latest/submitting-applications.html#master-urls). Optional values are:</br>`local`: Local mode. Run Spark applications on a single thread. Suitable for importing small data sets in a test environment.</br>`yarn`: Run Spark applications on a YARN cluster. Suitable for importing large data sets in a production environment.</br>`spark://HOST:PORT`: Connect to the specified Spark standalone cluster.</br>`mesos://HOST:PORT`: Connect to the specified Mesos cluster.</br>`k8s://HOST:PORT`: Connect to the specified Kubernetes cluster. |
| `-c`/`--config` | Yes | - | Specify the path of the configuration file. |
| `-h`/`--hive` | No | `false` | Specify whether to support importing data from Hive. |
| `-D`/`--dry` | No | `false` | Specify whether to check the format of the configuration file. This parameter only checks the format of the configuration file; it does not check the validity of the `tags` and `edges` configurations, and it does not import any data. Do not add this parameter when you need to import data. |
| `-r`/`--reload` | No | - | Specify the path of the reload file that needs to be reloaded. |

For more Spark parameter configurations, see [Spark Configuration](https://spark.apache.org/docs/latest/configuration.html#runtime-environment).
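To make the placeholders concrete, the following is a hypothetical sketch: the installation paths, the JAR version (`nebula-exchange-2.6.0.jar`), and the master URL `spark://192.168.10.100:7077` are assumed values for illustration, not values taken from this document. The first command uses `-D` to check only the format of the configuration file; the second runs the actual import on a standalone Spark cluster.

```bash
# Hypothetical paths, version, and master URL; replace them with your own values.
# 1. Dry run: check the configuration file format only. No data is imported.
/opt/spark/bin/spark-submit --master "local" \
  --class com.vesoft.nebula.exchange.Exchange \
  /opt/nebula-exchange/nebula-exchange-2.6.0.jar \
  -c /opt/nebula-exchange/application.conf -D

# 2. Actual import: submit the job to a standalone Spark cluster.
/opt/spark/bin/spark-submit --master "spark://192.168.10.100:7077" \
  --class com.vesoft.nebula.exchange.Exchange \
  /opt/nebula-exchange/nebula-exchange-2.6.0.jar \
  -c /opt/nebula-exchange/application.conf
```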

!!! note

    - The version number of a JAR file is subject to the name of the JAR file that is actually compiled.

    - If users use the [yarn mode](https://spark-reference-doc-cn.readthedocs.io/zh_CN/latest/deploy-guide/running-on-yarn.html) to submit a job, see the following command, **especially the two `--conf` settings in the example**. `--files` distributes `application.conf` to the working directories of the driver and the executors, and the two `--conf` settings add that directory (`./`) to their classpaths so that Exchange can find the configuration file.

      ```bash
      $SPARK_HOME/bin/spark-submit --master yarn \
      --class com.vesoft.nebula.exchange.Exchange \
      --files application.conf \
      --conf spark.driver.extraClassPath=./ \
      --conf spark.executor.extraClassPath=./ \
      <nebula-exchange-2.x.y.jar_path> \
      -c application.conf
      ```

## Import the reload file

If some data fails to be imported, the failed data is stored in the reload file. Use the `-r` parameter to import the data in the reload file again.

```bash
<spark_install_path>/bin/spark-submit --master "spark://HOST:PORT" --class com.vesoft.nebula.exchange.Exchange <nebula-exchange-2.x.y.jar_path> -c <application.conf_path> -r "<reload_file_path>"
```
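The reload path depends on where the failed records were written. As a hedged example, many Exchange configuration templates direct failed data to the directory set by `error.output`, commonly `/tmp/errors`; assuming that path and the same hypothetical installation as above, the reload submission would look like this:

```bash
# Hypothetical sketch: assumes failed records were written to /tmp/errors
# and that the same application.conf from the original import is reused.
/opt/spark/bin/spark-submit --master "spark://192.168.10.100:7077" \
  --class com.vesoft.nebula.exchange.Exchange \
  /opt/nebula-exchange/nebula-exchange-2.6.0.jar \
  -c /opt/nebula-exchange/application.conf \
  -r "/tmp/errors"
```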

If the import still fails, ask for help on the [official forum](https://github.com/vesoft-inc/nebula/discussions).

After editing the configuration file, run the following commands to import data from the specified source into the {{nebula.name}} database.

## Import data

```bash
<spark_install_path>/bin/spark-submit --master "spark://HOST:PORT" --class com.vesoft.nebula.exchange.Exchange <nebula-exchange-2.x.y.jar_path> -c <application.conf_path>
```

The parameters are described as follows.

| Parameter | Required | Default value | Description |
| :--- | :--- | :--- | :--- |
| `--class` | Yes | - | Specify the main class of the driver. |
| `--master` | Yes | - | Specify the master URL of the Spark cluster. For more information, see [master-urls](https://spark.apache.org/docs/latest/submitting-applications.html#master-urls). Optional values are:</br>`local`: Local mode. Run Spark applications on a single thread. Suitable for importing small data sets in a test environment.</br>`yarn`: Run Spark applications on a YARN cluster. Suitable for importing large data sets in a production environment.</br>`spark://HOST:PORT`: Connect to the specified Spark standalone cluster.</br>`mesos://HOST:PORT`: Connect to the specified Mesos cluster.</br>`k8s://HOST:PORT`: Connect to the specified Kubernetes cluster. |
| `-c`/`--config` | Yes | - | Specify the path of the configuration file. |
| `-h`/`--hive` | No | `false` | Add this parameter to support importing data from Hive. |
| `-D`/`--dry` | No | `false` | Specify whether to check the format of the configuration file. This parameter only checks the format of the configuration file; it does not check the validity of the `tags` and `edges` configurations, and it does not import any data. Do not add this parameter when you need to import data. |
| `-r`/`--reload` | No | - | Specify the path of the reload file that needs to be reloaded. |

For more Spark parameter configurations, see [Spark Configuration](https://spark.apache.org/docs/latest/configuration.html#runtime-environment).

!!! note

    - The version number of a JAR file is subject to the name of the JAR file that is actually compiled.

    - If users use the [yarn mode](https://spark-reference-doc-cn.readthedocs.io/zh_CN/latest/deploy-guide/running-on-yarn.html) to submit a job, see the following example, **especially the two `--conf` settings in it**. `--files` distributes `application.conf` to the working directories of the driver and the executors, and the two `--conf` settings add that directory (`./`) to their classpaths so that Exchange can find the configuration file.

      ```bash
      $SPARK_HOME/bin/spark-submit --master yarn \
      --class com.vesoft.nebula.exchange.Exchange \
      --files application.conf \
      --conf spark.driver.extraClassPath=./ \
      --conf spark.executor.extraClassPath=./ \
      <nebula-exchange-2.x.y.jar_path> \
      -c application.conf
      ```

## Import the reload file

If some data fails to be imported, the failed data is stored in the reload file. Use the `-r` parameter to import the data in the reload file again.

```bash
<spark_install_path>/bin/spark-submit --master "spark://HOST:PORT" --class com.vesoft.nebula.exchange.Exchange <nebula-exchange-2.x.y.jar_path> -c <application.conf_path> -r "<reload_file_path>"
```

If the import still fails, ask for help on the [forum](https://discuss.nebula-graph.com.cn/).
