From b4fc6962097f42f8ab1a52941bde34a86c8fd46f Mon Sep 17 00:00:00 2001
From: cooper-lzy <78672629+cooper-lzy@users.noreply.github.com>
Date: Mon, 18 Dec 2023 10:29:10 +0800
Subject: [PATCH] Add UTF-8 encoding hints for Spark-related products (#2395)

https://github.com/vesoft-inc/nebula-docs/issues/2391
---
 docs-2.0-en/connector/nebula-spark-connector.md | 9 +++++++++
 docs-2.0-en/graph-computing/nebula-algorithm.md | 9 +++++++++
 docs-2.0-en/import-export/nebula-exchange/ex-ug-FAQ.md | 4 ++--
 .../parameter-reference/ex-ug-para-import-command.md | 9 +++++++++
 docs-2.0-zh/connector/nebula-spark-connector.md | 9 +++++++++
 docs-2.0-zh/graph-computing/nebula-algorithm.md | 9 +++++++++
 docs-2.0-zh/import-export/nebula-exchange/ex-ug-FAQ.md | 4 ++--
 .../parameter-reference/ex-ug-para-import-command.md | 9 +++++++++
 8 files changed, 58 insertions(+), 4 deletions(-)

diff --git a/docs-2.0-en/connector/nebula-spark-connector.md b/docs-2.0-en/connector/nebula-spark-connector.md
index d6ddd511d0c..e44fec90d29 100644
--- a/docs-2.0-en/connector/nebula-spark-connector.md
+++ b/docs-2.0-en/connector/nebula-spark-connector.md
@@ -125,6 +125,15 @@ dataframe.write.nebula().writeEdges()
 
 `nebula()` receives two configuration parameters, including connection configuration and read-write configuration.
 
+!!! note
+
+    If the property values contain Chinese characters, encoding errors may occur. Add the following options when submitting the Spark task:
+
+    ```
+    --conf spark.driver.extraJavaOptions=-Dfile.encoding=utf-8
+    --conf spark.executor.extraJavaOptions=-Dfile.encoding=utf-8
+    ```
+
 ### Reading data from NebulaGraph
 
 ```scala
diff --git a/docs-2.0-en/graph-computing/nebula-algorithm.md b/docs-2.0-en/graph-computing/nebula-algorithm.md
index a11e2bd6e26..60af50e27cc 100644
--- a/docs-2.0-en/graph-computing/nebula-algorithm.md
+++ b/docs-2.0-en/graph-computing/nebula-algorithm.md
@@ -105,6 +105,15 @@ After the compilation, a similar file `nebula-algorithm-3.x.x.jar` is generated
 
 ## How to use
 
+!!! note
+
+    If the property values contain Chinese characters, encoding errors may occur. Add the following options when submitting the Spark task:
+
+    ```
+    --conf spark.driver.extraJavaOptions=-Dfile.encoding=utf-8
+    --conf spark.executor.extraJavaOptions=-Dfile.encoding=utf-8
+    ```
+
 ### Use algorithm interface (recommended)
 
 The `lib` repository provides 10 common graph algorithms.
diff --git a/docs-2.0-en/import-export/nebula-exchange/ex-ug-FAQ.md b/docs-2.0-en/import-export/nebula-exchange/ex-ug-FAQ.md
index 7f020297610..7bceadce763 100644
--- a/docs-2.0-en/import-export/nebula-exchange/ex-ug-FAQ.md
+++ b/docs-2.0-en/import-export/nebula-exchange/ex-ug-FAQ.md
@@ -107,9 +107,9 @@ Check that the NebulaGraph service port is configured correctly.
 
 Check whether the version of Exchange is the same as that of NebulaGraph. For more information, see [Limitations](about-exchange/ex-ug-limitations.md).
 
-### Q: How to correct the messy code when importing Hive data into NebulaGraph?
+### Q: How to fix encoding errors when importing data in a Spark environment?
 
-It may happen if the property value of the data in Hive contains Chinese characters. The solution is to add the following options before the JAR package path in the import command:
+This may happen if the property values of the data contain Chinese characters. The solution is to add the following options before the JAR package path in the import command:
 
 ```bash
 --conf spark.driver.extraJavaOptions=-Dfile.encoding=utf-8
diff --git a/docs-2.0-en/import-export/nebula-exchange/parameter-reference/ex-ug-para-import-command.md b/docs-2.0-en/import-export/nebula-exchange/parameter-reference/ex-ug-para-import-command.md
index dc6a63e893b..05bf926a315 100644
--- a/docs-2.0-en/import-export/nebula-exchange/parameter-reference/ex-ug-para-import-command.md
+++ b/docs-2.0-en/import-export/nebula-exchange/parameter-reference/ex-ug-para-import-command.md
@@ -8,6 +8,15 @@ After editing the configuration file, run the following commands to import speci
 /bin/spark-submit --master "spark://HOST:PORT" --class com.vesoft.nebula.exchange.Exchange -c
 ```
 
+!!! note
+
+    If the property values contain Chinese characters, encoding errors may occur. Add the following options when submitting the Spark task:
+
+    ```
+    --conf spark.driver.extraJavaOptions=-Dfile.encoding=utf-8
+    --conf spark.executor.extraJavaOptions=-Dfile.encoding=utf-8
+    ```
+
 The following table lists command parameters.
 
 | Parameter | Required | Default value | Description |
diff --git a/docs-2.0-zh/connector/nebula-spark-connector.md b/docs-2.0-zh/connector/nebula-spark-connector.md
index dc4f377984b..33edc42f379 100644
--- a/docs-2.0-zh/connector/nebula-spark-connector.md
+++ b/docs-2.0-zh/connector/nebula-spark-connector.md
@@ -126,6 +126,15 @@ dataframe.write.nebula().writeEdges()
 
 `nebula()`接收两个配置参数,包括连接配置和读写配置。
 
+!!! note
+
+    如果数据的属性值包含中文字符,可能出现乱码。请在提交 Spark 任务时加上以下选项:
+
+    ```
+    --conf spark.driver.extraJavaOptions=-Dfile.encoding=utf-8
+    --conf spark.executor.extraJavaOptions=-Dfile.encoding=utf-8
+    ```
+
 ### 从 {{nebula.name}} 读取数据
 
 ```scala
diff --git a/docs-2.0-zh/graph-computing/nebula-algorithm.md b/docs-2.0-zh/graph-computing/nebula-algorithm.md
index b6614855e45..13b40156ecf 100644
--- a/docs-2.0-zh/graph-computing/nebula-algorithm.md
+++ b/docs-2.0-zh/graph-computing/nebula-algorithm.md
@@ -106,6 +106,15 @@ NebulaGraph Algorithm 实现图计算的流程如下:
 
 ## 使用方法
 
+!!! note
+
+    如果数据的属性值包含中文字符,可能出现乱码。请在提交 Spark 任务时加上以下选项:
+
+    ```
+    --conf spark.driver.extraJavaOptions=-Dfile.encoding=utf-8
+    --conf spark.executor.extraJavaOptions=-Dfile.encoding=utf-8
+    ```
+
 ### 调用算法接口(推荐)
 
 `lib`库中提供了 10 种常用图计算算法,用户可以通过编程调用的形式调用算法。
diff --git a/docs-2.0-zh/import-export/nebula-exchange/ex-ug-FAQ.md b/docs-2.0-zh/import-export/nebula-exchange/ex-ug-FAQ.md
index c91708d8f08..1e9818d522b 100644
--- a/docs-2.0-zh/import-export/nebula-exchange/ex-ug-FAQ.md
+++ b/docs-2.0-zh/import-export/nebula-exchange/ex-ug-FAQ.md
@@ -107,9 +107,9 @@ nebula-exchange-3.0.0.jar \
 
 检查 Exchange 版本与 {{nebula.name}} 版本是否匹配,详细信息可参考[使用限制](about-exchange/ex-ug-limitations.md)。
 
-### Q:将 Hive 中的数据导入 {{nebula.name}} 时出现乱码如何解决?
+### Q:Spark 环境中导入数据时出现乱码如何解决?
 
-如果 Hive 中数据的属性值包含中文字符,可能出现该情况。解决方案是在导入命令中的 JAR 包路径前加上以下选项:
+如果数据的属性值包含中文字符,可能出现乱码。解决方案是在导入命令中的 JAR 包路径前加上以下选项:
 
 ```bash
 --conf spark.driver.extraJavaOptions=-Dfile.encoding=utf-8
diff --git a/docs-2.0-zh/import-export/nebula-exchange/parameter-reference/ex-ug-para-import-command.md b/docs-2.0-zh/import-export/nebula-exchange/parameter-reference/ex-ug-para-import-command.md
index 1e969071337..2304e20c27f 100644
--- a/docs-2.0-zh/import-export/nebula-exchange/parameter-reference/ex-ug-para-import-command.md
+++ b/docs-2.0-zh/import-export/nebula-exchange/parameter-reference/ex-ug-para-import-command.md
@@ -8,6 +8,15 @@
 /bin/spark-submit --master "spark://HOST:PORT" --class com.vesoft.nebula.exchange.Exchange -c
 ```
 
+!!! note
+
+    如果数据的属性值包含中文字符,可能出现乱码。请在提交 Spark 任务时加上以下选项:
+
+    ```
+    --conf spark.driver.extraJavaOptions=-Dfile.encoding=utf-8
+    --conf spark.executor.extraJavaOptions=-Dfile.encoding=utf-8
+    ```
+
 参数说明如下。
 
 | 参数 | 是否必需 | 默认值 | 说明 |
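For reference, this is roughly how the two options documented in this patch fit into a full `spark-submit` invocation. The master URL, JAR file name, and configuration path below are illustrative placeholders, not values taken from the patch; the sketch prints the assembled command instead of executing it, so it works without a Spark installation:

```shell
#!/bin/sh
# The two JVM options this patch documents: force UTF-8 file encoding on
# both the Spark driver and the executors, so multi-byte property values
# (e.g. Chinese characters) are not mangled during import.
UTF8_OPTS='--conf spark.driver.extraJavaOptions=-Dfile.encoding=utf-8 --conf spark.executor.extraJavaOptions=-Dfile.encoding=utf-8'

# Placeholder master, JAR, and config file -- substitute your own values.
# Note the options go before the JAR package path, as the FAQ entries state.
printf '%s\n' "spark-submit --master spark://127.0.0.1:7077 $UTF8_OPTS --class com.vesoft.nebula.exchange.Exchange nebula-exchange.jar -c application.conf"
```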