From 472456a02e6b1857628e272cfbd15d6a77f8d542 Mon Sep 17 00:00:00 2001
From: cooper-lzy <78672629+cooper-lzy@users.noreply.github.com>
Date: Tue, 27 Feb 2024 11:34:48 +0800
Subject: [PATCH] add kerberos in hdfs (#321) (#2484)

* add kerberos in hdfs

https://confluence.nebula-graph.io/pages/viewpage.action?pageId=97847199

* Update ex-ug-import-from-csv.md

* Update ex-ug-import-from-csv.md

* update

* update

* update

* update
---
 .../use-exchange/ex-ug-import-from-csv.md     | 28 +++++++++++++++++++
 .../use-exchange/ex-ug-import-from-hive.md    | 28 +++++++++++++++++++
 .../use-exchange/ex-ug-import-from-json.md    | 28 +++++++++++++++++++
 .../use-exchange/ex-ug-import-from-orc.md     | 28 +++++++++++++++++++
 .../use-exchange/ex-ug-import-from-parquet.md | 28 +++++++++++++++++++
 .../use-exchange/ex-ug-import-from-csv.md     | 28 +++++++++++++++++++
 .../use-exchange/ex-ug-import-from-hive.md    | 28 +++++++++++++++++++
 .../use-exchange/ex-ug-import-from-json.md    | 28 +++++++++++++++++++
 .../use-exchange/ex-ug-import-from-orc.md     | 28 +++++++++++++++++++
 .../use-exchange/ex-ug-import-from-parquet.md | 28 +++++++++++++++++++
 10 files changed, 280 insertions(+)

diff --git a/docs-2.0-en/import-export/nebula-exchange/use-exchange/ex-ug-import-from-csv.md b/docs-2.0-en/import-export/nebula-exchange/use-exchange/ex-ug-import-from-csv.md
index d2df9a842b2..31d77137462 100644
--- a/docs-2.0-en/import-export/nebula-exchange/use-exchange/ex-ug-import-from-csv.md
+++ b/docs-2.0-en/import-export/nebula-exchange/use-exchange/ex-ug-import-from-csv.md
@@ -365,6 +365,34 @@ ${SPARK_HOME}/bin/spark-submit --master "local" --class com.vesoft.nebula.excha
 You can search for `batchSuccess.` in the command output to check the number of successes. For example, `batchSuccess.follow: 300`.
 
+#### Access HDFS data with Kerberos authentication
+
+When using Kerberos for security authentication, you can access the HDFS data in one of the following ways.
+
+- Configure the Kerberos configuration file in the command
+
+  Configure `--conf` and `--files` in the command. For example:
+
+  ```bash
+  ${SPARK_HOME}/bin/spark-submit --master xxx --num-executors 2 --executor-cores 2 --executor-memory 1g \
+  --conf "spark.driver.extraJavaOptions=-Djava.security.krb5.conf=./krb5.conf" \
+  --conf "spark.executor.extraJavaOptions=-Djava.security.krb5.conf=./krb5.conf" \
+  --files /local/path/to/xxx.keytab,/local/path/to/krb5.conf \
+  --class com.vesoft.nebula.exchange.Exchange \
+  exchange.jar -c xx.conf
+  ```
+
+  The file path in `--conf` can be configured in either of the following ways:
+
+  - Configure the absolute path to the file. All YARN or Spark machines must have the file in the same path.
+  - (Recommended in YARN mode) Configure the relative path to the file (for example, `./krb5.conf`). The resource files uploaded via `--files` are located in the working directory of the Java virtual machine or JAR.
+
+  The files specified in `--files` must be stored on the machine where the `spark-submit` command is executed.
+
+- Without command-line configuration
+
+  Deploy Spark and the Kerberos-authenticated Hadoop in the same cluster so that they share HDFS and YARN, and then add the configuration `export HADOOP_HOME=` to `spark-env.sh` in Spark.
+
 ### Step 5: (optional) Validate data
 
 Users can verify that data has been imported by executing a query in the NebulaGraph client (for example, NebulaGraph Studio).
 For example:

diff --git a/docs-2.0-en/import-export/nebula-exchange/use-exchange/ex-ug-import-from-hive.md b/docs-2.0-en/import-export/nebula-exchange/use-exchange/ex-ug-import-from-hive.md
index 3724828385b..c37c88c0977 100644
--- a/docs-2.0-en/import-export/nebula-exchange/use-exchange/ex-ug-import-from-hive.md
+++ b/docs-2.0-en/import-export/nebula-exchange/use-exchange/ex-ug-import-from-hive.md
@@ -398,6 +398,34 @@ ${SPARK_HOME}/bin/spark-submit --master "local" --class com.vesoft.nebula.excha
 You can search for `batchSuccess.` in the command output to check the number of successes. For example, `batchSuccess.follow: 300`.
 
+#### Access HDFS data with Kerberos authentication
+
+When using Kerberos for security authentication, you can access the HDFS data in one of the following ways.
+
+- Configure the Kerberos configuration file in the command
+
+  Configure `--conf` and `--files` in the command. For example:
+
+  ```bash
+  ${SPARK_HOME}/bin/spark-submit --master xxx --num-executors 2 --executor-cores 2 --executor-memory 1g \
+  --conf "spark.driver.extraJavaOptions=-Djava.security.krb5.conf=./krb5.conf" \
+  --conf "spark.executor.extraJavaOptions=-Djava.security.krb5.conf=./krb5.conf" \
+  --files /local/path/to/xxx.keytab,/local/path/to/krb5.conf \
+  --class com.vesoft.nebula.exchange.Exchange \
+  exchange.jar -c xx.conf
+  ```
+
+  The file path in `--conf` can be configured in either of the following ways:
+
+  - Configure the absolute path to the file. All YARN or Spark machines must have the file in the same path.
+  - (Recommended in YARN mode) Configure the relative path to the file (for example, `./krb5.conf`). The resource files uploaded via `--files` are located in the working directory of the Java virtual machine or JAR.
+
+  The files specified in `--files` must be stored on the machine where the `spark-submit` command is executed.
+
+- Without command-line configuration
+
+  Deploy Spark and the Kerberos-authenticated Hadoop in the same cluster so that they share HDFS and YARN, and then add the configuration `export HADOOP_HOME=` to `spark-env.sh` in Spark.
+
 ### Step 5: (optional) Validate data
 
 Users can verify that data has been imported by executing a query in the NebulaGraph client (for example, NebulaGraph Studio). For example:

diff --git a/docs-2.0-en/import-export/nebula-exchange/use-exchange/ex-ug-import-from-json.md b/docs-2.0-en/import-export/nebula-exchange/use-exchange/ex-ug-import-from-json.md
index baafe025ce0..ac889c17f97 100644
--- a/docs-2.0-en/import-export/nebula-exchange/use-exchange/ex-ug-import-from-json.md
+++ b/docs-2.0-en/import-export/nebula-exchange/use-exchange/ex-ug-import-from-json.md
@@ -377,6 +377,34 @@ ${SPARK_HOME}/bin/spark-submit --master "local" --class com.vesoft.nebula.excha
 You can search for `batchSuccess.` in the command output to check the number of successes. For example, `batchSuccess.follow: 300`.
 
+#### Access HDFS data with Kerberos authentication
+
+When using Kerberos for security authentication, you can access the HDFS data in one of the following ways.
+
+- Configure the Kerberos configuration file in the command
+
+  Configure `--conf` and `--files` in the command. For example:
+
+  ```bash
+  ${SPARK_HOME}/bin/spark-submit --master xxx --num-executors 2 --executor-cores 2 --executor-memory 1g \
+  --conf "spark.driver.extraJavaOptions=-Djava.security.krb5.conf=./krb5.conf" \
+  --conf "spark.executor.extraJavaOptions=-Djava.security.krb5.conf=./krb5.conf" \
+  --files /local/path/to/xxx.keytab,/local/path/to/krb5.conf \
+  --class com.vesoft.nebula.exchange.Exchange \
+  exchange.jar -c xx.conf
+  ```
+
+  The file path in `--conf` can be configured in either of the following ways:
+
+  - Configure the absolute path to the file. All YARN or Spark machines must have the file in the same path.
+  - (Recommended in YARN mode) Configure the relative path to the file (for example, `./krb5.conf`). The resource files uploaded via `--files` are located in the working directory of the Java virtual machine or JAR.
+
+  The files specified in `--files` must be stored on the machine where the `spark-submit` command is executed.
+
+- Without command-line configuration
+
+  Deploy Spark and the Kerberos-authenticated Hadoop in the same cluster so that they share HDFS and YARN, and then add the configuration `export HADOOP_HOME=` to `spark-env.sh` in Spark.
+
 ### Step 5: (optional) Validate data
 
 Users can verify that data has been imported by executing a query in the NebulaGraph client (for example, NebulaGraph Studio). For example:

diff --git a/docs-2.0-en/import-export/nebula-exchange/use-exchange/ex-ug-import-from-orc.md b/docs-2.0-en/import-export/nebula-exchange/use-exchange/ex-ug-import-from-orc.md
index 878f9edbe62..ead698e99f5 100644
--- a/docs-2.0-en/import-export/nebula-exchange/use-exchange/ex-ug-import-from-orc.md
+++ b/docs-2.0-en/import-export/nebula-exchange/use-exchange/ex-ug-import-from-orc.md
@@ -341,6 +341,34 @@ ${SPARK_HOME}/bin/spark-submit --master "local" --class com.vesoft.nebula.excha
 You can search for `batchSuccess.` in the command output to check the number of successes. For example, `batchSuccess.follow: 300`.
 
+#### Access HDFS data with Kerberos authentication
+
+When using Kerberos for security authentication, you can access the HDFS data in one of the following ways.
+
+- Configure the Kerberos configuration file in the command
+
+  Configure `--conf` and `--files` in the command. For example:
+
+  ```bash
+  ${SPARK_HOME}/bin/spark-submit --master xxx --num-executors 2 --executor-cores 2 --executor-memory 1g \
+  --conf "spark.driver.extraJavaOptions=-Djava.security.krb5.conf=./krb5.conf" \
+  --conf "spark.executor.extraJavaOptions=-Djava.security.krb5.conf=./krb5.conf" \
+  --files /local/path/to/xxx.keytab,/local/path/to/krb5.conf \
+  --class com.vesoft.nebula.exchange.Exchange \
+  exchange.jar -c xx.conf
+  ```
+
+  The file path in `--conf` can be configured in either of the following ways:
+
+  - Configure the absolute path to the file. All YARN or Spark machines must have the file in the same path.
+  - (Recommended in YARN mode) Configure the relative path to the file (for example, `./krb5.conf`). The resource files uploaded via `--files` are located in the working directory of the Java virtual machine or JAR.
+
+  The files specified in `--files` must be stored on the machine where the `spark-submit` command is executed.
+
+- Without command-line configuration
+
+  Deploy Spark and the Kerberos-authenticated Hadoop in the same cluster so that they share HDFS and YARN, and then add the configuration `export HADOOP_HOME=` to `spark-env.sh` in Spark.
+
 ### Step 5: (optional) Validate data
 
 Users can verify that data has been imported by executing a query in the NebulaGraph client (for example, NebulaGraph Studio).
 For example:

diff --git a/docs-2.0-en/import-export/nebula-exchange/use-exchange/ex-ug-import-from-parquet.md b/docs-2.0-en/import-export/nebula-exchange/use-exchange/ex-ug-import-from-parquet.md
index 1fa5e400c43..efaea5cd7f3 100644
--- a/docs-2.0-en/import-export/nebula-exchange/use-exchange/ex-ug-import-from-parquet.md
+++ b/docs-2.0-en/import-export/nebula-exchange/use-exchange/ex-ug-import-from-parquet.md
@@ -342,6 +342,34 @@ ${SPARK_HOME}/bin/spark-submit --master "local" --class com.vesoft.nebula.excha
 You can search for `batchSuccess.` in the command output to check the number of successes. For example, `batchSuccess.follow: 300`.
 
+#### Access HDFS data with Kerberos authentication
+
+When using Kerberos for security authentication, you can access the HDFS data in one of the following ways.
+
+- Configure the Kerberos configuration file in the command
+
+  Configure `--conf` and `--files` in the command. For example:
+
+  ```bash
+  ${SPARK_HOME}/bin/spark-submit --master xxx --num-executors 2 --executor-cores 2 --executor-memory 1g \
+  --conf "spark.driver.extraJavaOptions=-Djava.security.krb5.conf=./krb5.conf" \
+  --conf "spark.executor.extraJavaOptions=-Djava.security.krb5.conf=./krb5.conf" \
+  --files /local/path/to/xxx.keytab,/local/path/to/krb5.conf \
+  --class com.vesoft.nebula.exchange.Exchange \
+  exchange.jar -c xx.conf
+  ```
+
+  The file path in `--conf` can be configured in either of the following ways:
+
+  - Configure the absolute path to the file. All YARN or Spark machines must have the file in the same path.
+  - (Recommended in YARN mode) Configure the relative path to the file (for example, `./krb5.conf`). The resource files uploaded via `--files` are located in the working directory of the Java virtual machine or JAR.
+
+  The files specified in `--files` must be stored on the machine where the `spark-submit` command is executed.
+
+- Without command-line configuration
+
+  Deploy Spark and the Kerberos-authenticated Hadoop in the same cluster so that they share HDFS and YARN, and then add the configuration `export HADOOP_HOME=` to `spark-env.sh` in Spark.
+
 ### Step 5: (optional) Validate data
 
 Users can verify that data has been imported by executing a query in the NebulaGraph client (for example, NebulaGraph Studio). For example:

diff --git a/docs-2.0-zh/import-export/nebula-exchange/use-exchange/ex-ug-import-from-csv.md b/docs-2.0-zh/import-export/nebula-exchange/use-exchange/ex-ug-import-from-csv.md
index fe1660d7e91..5f05feb456c 100644
--- a/docs-2.0-zh/import-export/nebula-exchange/use-exchange/ex-ug-import-from-csv.md
+++ b/docs-2.0-zh/import-export/nebula-exchange/use-exchange/ex-ug-import-from-csv.md
@@ -362,6 +362,34 @@ ${SPARK_HOME}/bin/spark-submit --master "local" --class com.vesoft.nebula.excha
 用户可以在返回信息中搜索`batchSuccess.`,确认成功的数量。例如`batchSuccess.follow: 300`。
 
+#### 访问 Kerberos 认证的 HDFS
+
+使用 Kerberos 进行安全认证时,需使用以下两种方式之一访问 Kerberos 认证的 HDFS。
+
+- 在命令中设置 Kerberos 配置文件
+
+  在命令中配置`--conf`和`--files`,例如:
+
+  ```bash
+  ${SPARK_HOME}/bin/spark-submit --master xxx --num-executors 2 --executor-cores 2 --executor-memory 1g \
+  --conf "spark.driver.extraJavaOptions=-Djava.security.krb5.conf=./krb5.conf" \
+  --conf "spark.executor.extraJavaOptions=-Djava.security.krb5.conf=./krb5.conf" \
+  --files /local/path/to/xxx.keytab,/local/path/to/krb5.conf \
+  --class com.vesoft.nebula.exchange.Exchange \
+  exchange.jar -c xx.conf
+  ```
+
+  `--conf`中的文件路径有如下两种配置方式:
+
+  - 配置文件的绝对路径。要求所有 YARN 或者 Spark 机器相同路径下都有对应文件。
+  - (YARN 模式下推荐)配置文件的相对路径(例如`./krb5.conf`)。通过`--files`上传的资源文件就在 Java 虚拟机或者 JAR 的工作目录下。
+
+  `--files`中的文件必须存储在执行`spark-submit`命令的机器上。
+
+- 不使用命令
+
+  将 Spark 和 Kerberos 认证的 Hadoop 部署在相同集群内,共用 HDFS 和 YARN,然后在 Spark 的`spark-env.sh`中增加配置`export HADOOP_HOME=`。
+
 ### 步骤 5:(可选)验证数据
 
 用户可以在 {{nebula.name}} 客户端(例如 NebulaGraph Studio)中执行查询语句,确认数据是否已导入。例如:

diff --git a/docs-2.0-zh/import-export/nebula-exchange/use-exchange/ex-ug-import-from-hive.md b/docs-2.0-zh/import-export/nebula-exchange/use-exchange/ex-ug-import-from-hive.md
index 413a345f2ed..afea2114dd6 100644
--- a/docs-2.0-zh/import-export/nebula-exchange/use-exchange/ex-ug-import-from-hive.md
+++ b/docs-2.0-zh/import-export/nebula-exchange/use-exchange/ex-ug-import-from-hive.md
@@ -393,6 +393,34 @@ ${SPARK_HOME}/bin/spark-submit --master "local" --class com.vesoft.nebula.excha
 用户可以在返回信息中搜索`batchSuccess.`,确认成功的数量。例如`batchSuccess.follow: 300`。
 
+#### 访问 Kerberos 认证的 HDFS
+
+使用 Kerberos 进行安全认证时,需使用以下两种方式之一访问 Kerberos 认证的 HDFS。
+
+- 在命令中设置 Kerberos 配置文件
+
+  在命令中配置`--conf`和`--files`,例如:
+
+  ```bash
+  ${SPARK_HOME}/bin/spark-submit --master xxx --num-executors 2 --executor-cores 2 --executor-memory 1g \
+  --conf "spark.driver.extraJavaOptions=-Djava.security.krb5.conf=./krb5.conf" \
+  --conf "spark.executor.extraJavaOptions=-Djava.security.krb5.conf=./krb5.conf" \
+  --files /local/path/to/xxx.keytab,/local/path/to/krb5.conf \
+  --class com.vesoft.nebula.exchange.Exchange \
+  exchange.jar -c xx.conf
+  ```
+
+  `--conf`中的文件路径有如下两种配置方式:
+
+  - 配置文件的绝对路径。要求所有 YARN 或者 Spark 机器相同路径下都有对应文件。
+  - (YARN 模式下推荐)配置文件的相对路径(例如`./krb5.conf`)。通过`--files`上传的资源文件就在 Java 虚拟机或者 JAR 的工作目录下。
+
+  `--files`中的文件必须存储在执行`spark-submit`命令的机器上。
+
+- 不使用命令
+
+  将 Spark 和 Kerberos 认证的 Hadoop 部署在相同集群内,共用 HDFS 和 YARN,然后在 Spark 的`spark-env.sh`中增加配置`export HADOOP_HOME=`。
+
 ### 步骤 5:(可选)验证数据
 
 用户可以在 {{nebula.name}} 客户端(例如 NebulaGraph Studio)中执行查询语句,确认数据是否已导入。例如:

diff --git a/docs-2.0-zh/import-export/nebula-exchange/use-exchange/ex-ug-import-from-json.md b/docs-2.0-zh/import-export/nebula-exchange/use-exchange/ex-ug-import-from-json.md
index 54a9ae6702c..88dcf51717e 100644
--- a/docs-2.0-zh/import-export/nebula-exchange/use-exchange/ex-ug-import-from-json.md
+++ b/docs-2.0-zh/import-export/nebula-exchange/use-exchange/ex-ug-import-from-json.md
@@ -368,6 +368,34 @@ ${SPARK_HOME}/bin/spark-submit --master "local" --class com.vesoft.nebula.excha
 用户可以在返回信息中搜索`batchSuccess.`,确认成功的数量。例如`batchSuccess.follow: 300`。
 
+#### 访问 Kerberos 认证的 HDFS
+
+使用 Kerberos 进行安全认证时,需使用以下两种方式之一访问 Kerberos 认证的 HDFS。
+
+- 在命令中设置 Kerberos 配置文件
+
+  在命令中配置`--conf`和`--files`,例如:
+
+  ```bash
+  ${SPARK_HOME}/bin/spark-submit --master xxx --num-executors 2 --executor-cores 2 --executor-memory 1g \
+  --conf "spark.driver.extraJavaOptions=-Djava.security.krb5.conf=./krb5.conf" \
+  --conf "spark.executor.extraJavaOptions=-Djava.security.krb5.conf=./krb5.conf" \
+  --files /local/path/to/xxx.keytab,/local/path/to/krb5.conf \
+  --class com.vesoft.nebula.exchange.Exchange \
+  exchange.jar -c xx.conf
+  ```
+
+  `--conf`中的文件路径有如下两种配置方式:
+
+  - 配置文件的绝对路径。要求所有 YARN 或者 Spark 机器相同路径下都有对应文件。
+  - (YARN 模式下推荐)配置文件的相对路径(例如`./krb5.conf`)。通过`--files`上传的资源文件就在 Java 虚拟机或者 JAR 的工作目录下。
+
+  `--files`中的文件必须存储在执行`spark-submit`命令的机器上。
+
+- 不使用命令
+
+  将 Spark 和 Kerberos 认证的 Hadoop 部署在相同集群内,共用 HDFS 和 YARN,然后在 Spark 的`spark-env.sh`中增加配置`export HADOOP_HOME=`。
+
 ### 步骤 5:(可选)验证数据
 
 用户可以在 {{nebula.name}} 客户端(例如 NebulaGraph Studio)中执行查询语句,确认数据是否已导入。例如:

diff --git a/docs-2.0-zh/import-export/nebula-exchange/use-exchange/ex-ug-import-from-orc.md b/docs-2.0-zh/import-export/nebula-exchange/use-exchange/ex-ug-import-from-orc.md
index d09071db933..4d72efec53a 100644
--- a/docs-2.0-zh/import-export/nebula-exchange/use-exchange/ex-ug-import-from-orc.md
+++ b/docs-2.0-zh/import-export/nebula-exchange/use-exchange/ex-ug-import-from-orc.md
@@ -341,6 +341,34 @@ ${SPARK_HOME}/bin/spark-submit --master "local" --class com.vesoft.nebula.excha
 用户可以在返回信息中搜索`batchSuccess.`,确认成功的数量。例如`batchSuccess.follow: 300`。
 
+#### 访问 Kerberos 认证的 HDFS
+
+使用 Kerberos 进行安全认证时,需使用以下两种方式之一访问 Kerberos 认证的 HDFS。
+
+- 在命令中设置 Kerberos 配置文件
+
+  在命令中配置`--conf`和`--files`,例如:
+
+  ```bash
+  ${SPARK_HOME}/bin/spark-submit --master xxx --num-executors 2 --executor-cores 2 --executor-memory 1g \
+  --conf "spark.driver.extraJavaOptions=-Djava.security.krb5.conf=./krb5.conf" \
+  --conf "spark.executor.extraJavaOptions=-Djava.security.krb5.conf=./krb5.conf" \
+  --files /local/path/to/xxx.keytab,/local/path/to/krb5.conf \
+  --class com.vesoft.nebula.exchange.Exchange \
+  exchange.jar -c xx.conf
+  ```
+
+  `--conf`中的文件路径有如下两种配置方式:
+
+  - 配置文件的绝对路径。要求所有 YARN 或者 Spark 机器相同路径下都有对应文件。
+  - (YARN 模式下推荐)配置文件的相对路径(例如`./krb5.conf`)。通过`--files`上传的资源文件就在 Java 虚拟机或者 JAR 的工作目录下。
+
+  `--files`中的文件必须存储在执行`spark-submit`命令的机器上。
+
+- 不使用命令
+
+  将 Spark 和 Kerberos 认证的 Hadoop 部署在相同集群内,共用 HDFS 和 YARN,然后在 Spark 的`spark-env.sh`中增加配置`export HADOOP_HOME=`。
+
 ### 步骤 5:(可选)验证数据
 
 用户可以在 {{nebula.name}} 客户端(例如 NebulaGraph Studio)中执行查询语句,确认数据是否已导入。例如:

diff --git a/docs-2.0-zh/import-export/nebula-exchange/use-exchange/ex-ug-import-from-parquet.md b/docs-2.0-zh/import-export/nebula-exchange/use-exchange/ex-ug-import-from-parquet.md
index dc4f5c9fa06..f6435a095fc 100644
--- a/docs-2.0-zh/import-export/nebula-exchange/use-exchange/ex-ug-import-from-parquet.md
+++ b/docs-2.0-zh/import-export/nebula-exchange/use-exchange/ex-ug-import-from-parquet.md
@@ -341,6 +341,34 @@ ${SPARK_HOME}/bin/spark-submit --master "local" --class com.vesoft.nebula.excha
 用户可以在返回信息中搜索`batchSuccess.`,确认成功的数量。例如`batchSuccess.follow: 300`。
 
+#### 访问 Kerberos 认证的 HDFS
+
+使用 Kerberos 进行安全认证时,需使用以下两种方式之一访问 Kerberos 认证的 HDFS。
+
+- 在命令中设置 Kerberos 配置文件
+
+  在命令中配置`--conf`和`--files`,例如:
+
+  ```bash
+  ${SPARK_HOME}/bin/spark-submit --master xxx --num-executors 2 --executor-cores 2 --executor-memory 1g \
+  --conf "spark.driver.extraJavaOptions=-Djava.security.krb5.conf=./krb5.conf" \
+  --conf "spark.executor.extraJavaOptions=-Djava.security.krb5.conf=./krb5.conf" \
+  --files /local/path/to/xxx.keytab,/local/path/to/krb5.conf \
+  --class com.vesoft.nebula.exchange.Exchange \
+  exchange.jar -c xx.conf
+  ```
+
+  `--conf`中的文件路径有如下两种配置方式:
+
+  - 配置文件的绝对路径。要求所有 YARN 或者 Spark 机器相同路径下都有对应文件。
+  - (YARN 模式下推荐)配置文件的相对路径(例如`./krb5.conf`)。通过`--files`上传的资源文件就在 Java 虚拟机或者 JAR 的工作目录下。
+
+  `--files`中的文件必须存储在执行`spark-submit`命令的机器上。
+
+- 不使用命令
+
+  将 Spark 和 Kerberos 认证的 Hadoop 部署在相同集群内,共用 HDFS 和 YARN,然后在 Spark 的`spark-env.sh`中增加配置`export HADOOP_HOME=`。
+
 ### 步骤 5:(可选)验证数据
 
 用户可以在 {{nebula.name}} 客户端(例如 NebulaGraph Studio)中执行查询语句,确认数据是否已导入。例如:
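The Kerberos flags that this patch documents can be assembled in a small wrapper script. The sketch below is illustrative only and not part of the patch: `KEYTAB` and `KRB5_CONF` reuse the placeholder paths from the documentation (`/local/path/to/xxx.keytab` is a placeholder, not a real keytab), and `kerberos_opts` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: emit the Kerberos-related spark-submit options described in the
# patch above. KEYTAB and KRB5_CONF are placeholder paths.
KEYTAB=/local/path/to/xxx.keytab
KRB5_CONF=/local/path/to/krb5.conf

# One option per line. The relative name ./krb5.conf is used for the JVM
# options because resources passed via --files land in each YARN
# container's working directory, while --files itself needs local paths
# on the machine running spark-submit.
kerberos_opts() {
  echo "--conf spark.driver.extraJavaOptions=-Djava.security.krb5.conf=./krb5.conf"
  echo "--conf spark.executor.extraJavaOptions=-Djava.security.krb5.conf=./krb5.conf"
  echo "--files ${KEYTAB},${KRB5_CONF}"
}

kerberos_opts
```

In a real run these lines would be spliced into the `spark-submit` invocation before `--class com.vesoft.nebula.exchange.Exchange`.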