Spark related products add utf8 encoding hints #2395

Merged (1 commit, Dec 18, 2023)
9 changes: 9 additions & 0 deletions docs-2.0-en/connector/nebula-spark-connector.md
@@ -125,6 +125,15 @@ dataframe.write.nebula().writeEdges()

`nebula()` receives two configuration parameters, including connection configuration and read-write configuration.

!!! note

    If the property values contain Chinese characters, garbled text may appear. Add the following options when submitting the Spark task:

    ```bash
    --conf spark.driver.extraJavaOptions=-Dfile.encoding=utf-8
    --conf spark.executor.extraJavaOptions=-Dfile.encoding=utf-8
    ```
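Putting these options together, a complete `spark-submit` invocation might look like the sketch below. The master URL, application class, and JAR file name are placeholders for illustration, not values from this change:

```bash
# Hypothetical submission; only the two --conf options come from the docs above.
spark-submit --master "spark://HOST:PORT" \
  --conf spark.driver.extraJavaOptions=-Dfile.encoding=utf-8 \
  --conf spark.executor.extraJavaOptions=-Dfile.encoding=utf-8 \
  --class com.example.YourSparkApp \
  your-spark-app.jar
```

Both options are needed because property values pass through JVMs on the driver and on each executor, and either side can corrupt non-UTF-8 text if its default file encoding is not UTF-8.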

### Reading data from NebulaGraph

```scala
9 changes: 9 additions & 0 deletions docs-2.0-en/graph-computing/nebula-algorithm.md
@@ -105,6 +105,15 @@ After the compilation, a similar file `nebula-algorithm-3.x.x.jar` is generated

## How to use

!!! note

    If the property values contain Chinese characters, garbled text may appear. Add the following options when submitting the Spark task:

    ```bash
    --conf spark.driver.extraJavaOptions=-Dfile.encoding=utf-8
    --conf spark.executor.extraJavaOptions=-Dfile.encoding=utf-8
    ```

### Use algorithm interface (recommended)

The `lib` repository provides 10 common graph algorithms.
4 changes: 2 additions & 2 deletions docs-2.0-en/import-export/nebula-exchange/ex-ug-FAQ.md
@@ -107,9 +107,9 @@ Check that the NebulaGraph service port is configured correctly.

Check whether the version of Exchange is the same as that of NebulaGraph. For more information, see [Limitations](about-exchange/ex-ug-limitations.md).

- ### Q: How to correct the messy code when importing Hive data into NebulaGraph?
+ ### Q: How to correct the encoding error when importing data in a Spark environment?

- It may happen if the property values of the data in Hive contain Chinese characters. The solution is to add the following options before the JAR package path in the import command:
+ It may happen if the property values of the data contain Chinese characters. The solution is to add the following options before the JAR package path in the import command:

```bash
--conf spark.driver.extraJavaOptions=-Dfile.encoding=utf-8
@@ -8,6 +8,15 @@ After editing the configuration file, run the following commands to import specified source data
<spark_install_path>/bin/spark-submit --master "spark://HOST:PORT" --class com.vesoft.nebula.exchange.Exchange <nebula-exchange-2.x.y.jar_path> -c <application.conf_path>
```

!!! note

    If the property values contain Chinese characters, garbled text may appear. Add the following options when submitting the Spark task:

    ```bash
    --conf spark.driver.extraJavaOptions=-Dfile.encoding=utf-8
    --conf spark.executor.extraJavaOptions=-Dfile.encoding=utf-8
    ```
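Combined with the `spark-submit` command shown above, and following the Exchange FAQ's guidance that the options go before the JAR package path, the full command might look like this sketch:

```bash
# Sketch: same command as above with the two encoding options inserted
# before the JAR package path; paths and versions remain placeholders.
<spark_install_path>/bin/spark-submit --master "spark://HOST:PORT" \
  --class com.vesoft.nebula.exchange.Exchange \
  --conf spark.driver.extraJavaOptions=-Dfile.encoding=utf-8 \
  --conf spark.executor.extraJavaOptions=-Dfile.encoding=utf-8 \
  <nebula-exchange-2.x.y.jar_path> -c <application.conf_path>
```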

The following table lists command parameters.

| Parameter | Required | Default value | Description |
9 changes: 9 additions & 0 deletions docs-2.0-zh/connector/nebula-spark-connector.md
@@ -126,6 +126,15 @@ dataframe.write.nebula().writeEdges()

`nebula()`接收两个配置参数,包括连接配置和读写配置。

!!! note

    If the property values of the data contain Chinese characters, garbled text may appear. Add the following options when submitting the Spark task:

    ```bash
    --conf spark.driver.extraJavaOptions=-Dfile.encoding=utf-8
    --conf spark.executor.extraJavaOptions=-Dfile.encoding=utf-8
    ```

### Reading data from {{nebula.name}}

```scala
9 changes: 9 additions & 0 deletions docs-2.0-zh/graph-computing/nebula-algorithm.md
@@ -106,6 +106,15 @@ The workflow for graph computing with NebulaGraph Algorithm is as follows:

## How to use

!!! note

    If the property values of the data contain Chinese characters, garbled text may appear. Add the following options when submitting the Spark task:

    ```bash
    --conf spark.driver.extraJavaOptions=-Dfile.encoding=utf-8
    --conf spark.executor.extraJavaOptions=-Dfile.encoding=utf-8
    ```

### Calling the algorithm interface (recommended)

The `lib` repository provides 10 common graph algorithms, which users can invoke programmatically.
4 changes: 2 additions & 2 deletions docs-2.0-zh/import-export/nebula-exchange/ex-ug-FAQ.md
@@ -107,9 +107,9 @@ nebula-exchange-3.0.0.jar \

Check whether the Exchange version matches the {{nebula.name}} version. For details, see [Limitations](about-exchange/ex-ug-limitations.md).

- ### Q: How to resolve garbled text when importing data from Hive into {{nebula.name}}?
+ ### Q: How to resolve garbled text when importing data in a Spark environment?

- If the property values of the data in Hive contain Chinese characters, this may occur. The solution is to add the following options before the JAR package path in the import command:
+ If the property values of the data contain Chinese characters, garbled text may appear. The solution is to add the following options before the JAR package path in the import command:

```bash
--conf spark.driver.extraJavaOptions=-Dfile.encoding=utf-8
@@ -8,6 +8,15 @@
<spark_install_path>/bin/spark-submit --master "spark://HOST:PORT" --class com.vesoft.nebula.exchange.Exchange <nebula-exchange-2.x.y.jar_path> -c <application.conf_path>
```

!!! note

    If the property values of the data contain Chinese characters, garbled text may appear. Add the following options when submitting the Spark task:

    ```bash
    --conf spark.driver.extraJavaOptions=-Dfile.encoding=utf-8
    --conf spark.executor.extraJavaOptions=-Dfile.encoding=utf-8
    ```

The parameters are described below.

| Parameter | Required | Default value | Description |