add kerberos in hdfs (#321) (#2484)
* add kerberos in hdfs

https://confluence.nebula-graph.io/pages/viewpage.action?pageId=97847199

* Update ex-ug-import-from-csv.md

* Update ex-ug-import-from-csv.md

* update

* update

* update

* update
cooper-lzy authored Feb 27, 2024
1 parent cd882ae commit 472456a
Showing 10 changed files with 280 additions and 0 deletions.
@@ -365,6 +365,34 @@ ${SPARK_HOME}/bin/spark-submit --master "local" --class com.vesoft.nebula.excha

You can search for `batchSuccess.<tag_name/edge_name>` in the command output to check the number of successes. For example, `batchSuccess.follow: 300`.
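
If the submission output is captured to a file, the success counters can also be pulled out with standard tools. The sketch below fabricates a placeholder log line to stay self-contained; in practice the log comes from the `spark-submit` command's output:

```shell
# Placeholder log content simulating Exchange output; in practice this
# comes from the stdout of the spark-submit command above.
printf 'batchSuccess.follow: 300\nbatchFailure.follow: 0\n' > exchange.log

# Extract only the success counters.
grep 'batchSuccess' exchange.log
```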

#### Access HDFS data with Kerberos authentication

When Kerberos is used for security authentication, you can access the HDFS data in either of the following ways.

- Specify the Kerberos configuration file in the command

Set the `--conf` and `--files` options in the `spark-submit` command, for example:

```bash
${SPARK_HOME}/bin/spark-submit --master xxx --num-executors 2 --executor-cores 2 --executor-memory 1g \
--conf "spark.driver.extraJavaOptions=-Djava.security.krb5.conf=./krb5.conf" \
--conf "spark.executor.extraJavaOptions=-Djava.security.krb5.conf=./krb5.conf" \
--files /local/path/to/xxx.keytab,/local/path/to/krb5.conf \
--class com.vesoft.nebula.exchange.Exchange \
exchange.jar -c xx.conf
```

The file path in `--conf` can be specified in either of the following ways:

- As an absolute path. The file must exist at that same path on every YARN or Spark machine.
- As a relative path, for example `./krb5.conf` (recommended in YARN mode). Resource files uploaded via `--files` are placed in the working directory of the Java virtual machine or the JAR.

The files passed to `--files` must exist on the machine where the `spark-submit` command is executed.
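
Before submitting, you can check on that machine that the keytab and `krb5.conf` are usable, with the standard MIT Kerberos client tools. This is a sketch; the principal `user@EXAMPLE.COM` is a placeholder, not a value from this document:

```shell
# Inspect the principals stored in the keytab (placeholder path from above).
klist -kt /local/path/to/xxx.keytab

# Obtain a ticket non-interactively with the keytab, using the same
# krb5.conf that will be shipped to Spark via --files.
# "user@EXAMPLE.COM" is a placeholder; use a principal from the keytab.
KRB5_CONFIG=/local/path/to/krb5.conf kinit -kt /local/path/to/xxx.keytab user@EXAMPLE.COM

# Confirm that the credential cache now holds a valid ticket.
klist
```

If `kinit` succeeds here, later failures in the Spark job are more likely caused by path or upload issues than by the credentials themselves.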

- Without command-line options

Deploy Spark and the Kerberos-authenticated Hadoop in the same cluster so that they share HDFS and YARN, and then add `export HADOOP_HOME=<hadoop_home_path>` to `spark-env.sh` in Spark.
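
The shared-cluster setup can be sketched as the following `spark-env.sh` fragment. The paths are placeholders for your own installation, not values from this document:

```shell
# Hypothetical installation path; adjust to your environment.
export HADOOP_HOME=/opt/hadoop

# Optional but common: point Spark at the cluster's Hadoop configuration
# (core-site.xml, hdfs-site.xml) so it inherits the Kerberos settings.
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
```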

### Step 5: (optional) Validate data

To verify that the data has been imported, execute a query in a NebulaGraph client (such as NebulaGraph Studio). For example:
