From 5dcbedc01f4bc9c4bf45f1a7000e6544a009ce19 Mon Sep 17 00:00:00 2001
From: Le-Zheng
Date: Tue, 6 Sep 2022 09:18:08 +0800
Subject: [PATCH 01/10] tdx e2e readme

---
 ppml/tdx/README.md | 109 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 109 insertions(+)
 create mode 100644 ppml/tdx/README.md

diff --git a/ppml/tdx/README.md b/ppml/tdx/README.md
new file mode 100644
index 00000000000..10b10ec54a5
--- /dev/null
+++ b/ppml/tdx/README.md
@@ -0,0 +1,109 @@
+# Privacy Preserving Machine Learning (PPML) TDX User Guide
+
+TDX-based Trusted Big Data ML allows the user to run end-to-end big data analytics applications and BigDL model training, on a local Spark instance or on a distributed cluster, on Intel Trust Domain Extensions (Intel TDX).
+
+- [Before running the code](#before-running-the-code)
+- [Run as Spark Local Mode](#run-as-spark-local-mode)
+- [Run as Spark on Kubernetes Mode](#run-as-spark-on-kubernetes-mode)
+
+## Before running the code
+#### 1. Prepare the key
+The PPML in BigDL needs secured keys to enable Spark security features such as Authentication, RPC Encryption, Local Storage Encryption and TLS, so you need to prepare the secure keys and keystores. In this tutorial, you can generate the keys and keystores with root permission (for testing only; you need to input a security password for the keys).
+
+```bash
+git clone https://github.com/intel-analytics/BigDL.git
+bash ppml/scripts/generate-keys.sh
+```
+It will generate the keystores in `./keys`.
+#### 2. Prepare the password
+Next, you need to store the password you used for key generation in a secured file.
+
+```bash
+bash ppml/scripts/generate-password.sh used_password_when_generate_keys
+```
+It will generate the password file in `./password`.
+
+## Run as Spark Local Mode
+### 1. Start the client container to run applications in spark local mode
+```bash
+export KEYS_PATH=YOUR_LOCAL_KEYS_PATH
+export LOCAL_IP=YOUR_LOCAL_IP
+export DOCKER_IMAGE=intelanalytics/bigdl-k8s:latest
+sudo docker run -itd \
+  --privileged \
+  --net=host \
+  -v $KEYS_PATH:/opt/spark/work-dir/keys \
+  --name=spark-local-client \
+  -e LOCAL_IP=$LOCAL_IP \
+  $DOCKER_IMAGE bash
+```
+Run `docker exec -it spark-local-client bash` to enter the container.
+### 2. Run applications in spark local mode
+For example, to run Spark Pi:
+```bash
+bash spark-submit-with-ppml-tdx-local.sh \
+ --master local[4] \
+ --name spark-pi \
+ --class org.apache.spark.examples.SparkPi \
+ --conf spark.executor.instances=1 \
+ local:///opt/spark/examples/jars/spark-examples_2.12-3.1.2.jar
+```
+## Run as Spark on Kubernetes Mode
+### 1. Start the client container to run applications in spark K8s mode
+#### 1.1 Prepare the keys and password
+Please refer to the previous sections on how to [prepare the key](#1-prepare-the-key) and [prepare the password](#2-prepare-the-password).
+```bash
+bash ../../../scripts/generate-keys.sh
+bash ../../../scripts/generate-password.sh YOUR_PASSWORD
+kubectl apply -f keys/keys.yaml
+kubectl apply -f password/password.yaml
+```
+#### 1.2 Prepare the k8s configurations
+##### 1.2.1 Create the RBAC
+```bash
+kubectl create serviceaccount spark
+kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=default:spark --namespace=default
+```
+##### 1.2.2 Generate k8s config file
+```bash
+kubectl config view --flatten --minify > /YOUR_DIR/kubeconfig
+```
+##### 1.2.3 Create k8s secret
+```bash
+kubectl create secret generic spark-secret --from-literal secret=YOUR_PASSWORD
+```
+The secret created (YOUR_PASSWORD) should be the same as the password you specified in section 1.1 for generating the key.
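Kubernetes stores secret values base64-encoded rather than encrypted. As a quick local sanity check (plain shell, no cluster required; `YOUR_PASSWORD` is just the placeholder used above), you can reproduce the encode/decode round-trip that `kubectl create secret` and `kubectl get secret` perform on the value:

```shell
# Encode the password the way Kubernetes stores it, then decode it back.
encoded=$(printf '%s' 'YOUR_PASSWORD' | base64)
decoded=$(printf '%s' "$encoded" | base64 --decode)
echo "$decoded"
```

On a real cluster the stored value can be read back the same way, e.g. `kubectl get secret spark-secret -o jsonpath='{.data.secret}' | base64 --decode`.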
+#### 1.3 Start the client container
+```bash
+export K8S_MASTER=k8s://$(sudo kubectl cluster-info | grep 'https.*6443' -o -m 1)
+export KEYS_PATH=YOUR_LOCAL_KEYS_PATH
+export SECURE_PASSWORD_PATH=YOUR_LOCAL_PASSWORD_PATH
+export KUBECONFIG_PATH=KUBECONFIG_PATH
+export LOCAL_IP=YOUR_LOCAL_IP
+export DOCKER_IMAGE=intelanalytics/bigdl-k8s:latest
+sudo docker run -itd \
+  --privileged \
+  --net=host \
+  -v $KUBECONFIG_PATH:/root/.kube/config \
+  -v $KEYS_PATH:/opt/spark/work-dir/keys \
+  -v $SECURE_PASSWORD_PATH:/opt/spark/work-dir/password \
+  --name=spark-k8s-client \
+  -e LOCAL_IP=$LOCAL_IP \
+  -e RUNTIME_SPARK_MASTER=$K8S_MASTER \
+  -e RUNTIME_K8S_SERVICE_ACCOUNT=spark \
+  -e RUNTIME_K8S_SPARK_IMAGE=$DOCKER_IMAGE \
+  $DOCKER_IMAGE bash
+```
+Run `docker exec -it spark-k8s-client bash` to enter the container.
+### 2. Run application in spark K8S mode
+#### 2.1 Run application in K8S client mode
+For example, to run Spark Pi:
+```bash
+spark-submit-with-ppml-tdx-k8s.sh
+```
+#### 2.2 Run application in K8s cluster mode
+For example, to run Spark Pi:
+```bash
+spark-submit-with-ppml-tdx-k8s.sh
+```

From 5450765236436b584654264f3ca570fd634619de Mon Sep 17 00:00:00 2001
From: Le-Zheng
Date: Wed, 14 Sep 2022 10:50:54 +0800
Subject: [PATCH 02/10] update

---
 ppml/tdx/README.md | 62 +++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 58 insertions(+), 4 deletions(-)

diff --git a/ppml/tdx/README.md b/ppml/tdx/README.md
index 10b10ec54a5..fa71ea48f07 100644
--- a/ppml/tdx/README.md
+++ b/ppml/tdx/README.md
@@ -98,12 +98,66 @@ sudo docker run -itd \
 Run `docker exec -it spark-k8s-client bash` to enter the container.
 ### 2. Run application in spark K8S mode
 #### 2.1 Run application in K8S client mode
-For example, to run Spark Pi:
+
 ```bash
-spark-submit-with-ppml-tdx-k8s.sh
+export secure_password=.. && \
+bash spark-submit-with-ppml-tdx-k8s.sh --master k8s://https://x.x.x.x:6443 \
+--deploy-mode client \
+--name spark-tdx \
+--conf spark.driver.host=x.x.x.x \
+--conf spark.driver.port=54321 \
+--conf spark.driver.memory=8g \
+--conf spark.executor.cores=8 \
+--conf spark.executor.memory=8g \
+--conf spark.executor.instances=1 \
+--conf spark.cores.max=8 \
+--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.nfsvolumeclaim.options.claimName=nfsvolumeclaim \
+--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.nfsvolumeclaim.mount.path=/bigdl2.0/data \
+--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.nfsvolumeclaim.options.claimName=nfsvolumeclaim \
+--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.nfsvolumeclaim.mount.path=/bigdl2.0/data \
+--class com.intel.analytics.bigdl.ppml.examples.SimpleQuerySparkExample \
+--jars /bigdl2.0/data/bigdl-ppml-spark_3.1.2-2.1.0-SNAPSHOT-jar-with-dependencies.jar \
+/bigdl2.0/data/bigdl-ppml-spark_3.1.2-2.1.0-SNAPSHOT-jar-with-dependencies.jar \
+--inputPath /bigdl2.0/data/people/encrypted \
+--outputPath /bigdl2.0/data/people/people_encrypted_output \
+--inputPartitionNum 8 \
+--outputPartitionNum 8 \
+--inputEncryptModeValue AES/CBC/PKCS5Padding \
+--outputEncryptModeValue AES/CBC/PKCS5Padding \
+--primaryKeyPath /bigdl2.0/data/20line_data_keys/primaryKey \
+--dataKeyPath /bigdl2.0/data/20line_data_keys/dataKey \
+--kmsType SimpleKeyManagementService \
+--simpleAPPID xx \
+--simpleAPPKEY xx
 ```
 #### 2.2 Run application in K8s cluster mode
-For example, to run Spark Pi:
+
 ```bash
-spark-submit-with-ppml-tdx-k8s.sh
+export secure_password=.. && \
+bash spark-submit-with-ppml-tdx-k8s.sh --master k8s://https://x.x.x.x:6443 \
+--deploy-mode cluster \
+--name spark-tdx \
+--conf spark.driver.memory=8g \
+--conf spark.executor.cores=8 \
+--conf spark.executor.memory=8g \
+--conf spark.executor.instances=1 \
+--conf spark.cores.max=8 \
+--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.nfsvolumeclaim.options.claimName=nfsvolumeclaim \
+--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.nfsvolumeclaim.mount.path=/bigdl2.0/data \
+--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.nfsvolumeclaim.options.claimName=nfsvolumeclaim \
+--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.nfsvolumeclaim.mount.path=/bigdl2.0/data \
+--class com.intel.analytics.bigdl.ppml.examples.SimpleQuerySparkExample \
+--jars /bigdl2.0/data/bigdl-ppml-spark_3.1.2-2.1.0-SNAPSHOT-jar-with-dependencies.jar \
+/bigdl2.0/data/bigdl-ppml-spark_3.1.2-2.1.0-SNAPSHOT-jar-with-dependencies.jar \
+--inputPath /bigdl2.0/data/people/encrypted \
+--outputPath /bigdl2.0/data/people/people_encrypted_output \
+--inputPartitionNum 8 \
+--outputPartitionNum 8 \
+--inputEncryptModeValue AES/CBC/PKCS5Padding \
+--outputEncryptModeValue AES/CBC/PKCS5Padding \
+--primaryKeyPath /bigdl2.0/data/20line_data_keys/primaryKey \
+--dataKeyPath /bigdl2.0/data/20line_data_keys/dataKey \
+--kmsType SimpleKeyManagementService \
+--simpleAPPID xx \
+--simpleAPPKEY xx
 ```

From a8abf40be943f37bb3294ab8d71174a7608437cd Mon Sep 17 00:00:00 2001
From: Le-Zheng <30695225+Le-Zheng@users.noreply.github.com>
Date: Wed, 14 Sep 2022 14:35:58 +0800
Subject: [PATCH 03/10] Update README.md

---
 ppml/tdx/README.md | 75 ++++++++++------------------------------------
 1 file changed, 15 insertions(+), 60 deletions(-)

diff --git a/ppml/tdx/README.md b/ppml/tdx/README.md
index fa71ea48f07..ab929fea85c 100644
--- a/ppml/tdx/README.md
+++ b/ppml/tdx/README.md
@@ -1,14 +1,13 @@
 # Privacy Preserving Machine Learning (PPML) TDX User Guide
-TDX-based Trusted Big Data ML allows the user to run end-to-end big data analytics applications and BigDL model training, on a local Spark instance or on a distributed cluster, on Intel Trust Domain Extensions (Intel TDX).
+TDX-based Trusted Big Data ML allows the user to run end-to-end big data analytics applications and BigDL model training on a distributed cluster on Intel Trust Domain Extensions (Intel TDX).
 
 - [Before running the code](#before-running-the-code)
-- [Run as Spark Local Mode](#run-as-spark-local-mode)
 - [Run as Spark on Kubernetes Mode](#run-as-spark-on-kubernetes-mode)
 
 ## Before running the code
 #### 1. Prepare the key
-The PPML in BigDL needs secured keys to enable Spark security features such as Authentication, RPC Encryption, Local Storage Encryption and TLS, so you need to prepare the secure keys and keystores. In this tutorial, you can generate the keys and keystores with root permission (for testing only; you need to input a security password for the keys).
+BigDL PPML needs secured keys to enable Spark security features such as Authentication, RPC Encryption, Local Storage Encryption and TLS, so you need to prepare the secure keys and keystores. In this tutorial, you can generate the keys and keystores with root permission (for testing only; you need to input a security password for the keys).
 
 ```bash
 git clone https://github.com/intel-analytics/BigDL.git
 bash ppml/scripts/generate-keys.sh
@@ -23,32 +22,6 @@ bash ppml/scripts/generate-password.sh used_password_when_generate_keys
 ```
 It will generate the password file in `./password`.
 
-## Run as Spark Local Mode
-### 1. Start the client container to run applications in spark local mode
-```bash
-export KEYS_PATH=YOUR_LOCAL_KEYS_PATH
-export LOCAL_IP=YOUR_LOCAL_IP
-export DOCKER_IMAGE=intelanalytics/bigdl-k8s:latest
-sudo docker run -itd \
-  --privileged \
-  --net=host \
-  -v $KEYS_PATH:/opt/spark/work-dir/keys \
-  --name=spark-local-client \
-  -e LOCAL_IP=$LOCAL_IP \
-  $DOCKER_IMAGE bash
-```
-Run `docker exec -it spark-local-client bash` to enter the container.
-### 2. Run applications in spark local mode
-For example, to run Spark Pi:
-```bash
-bash spark-submit-with-ppml-tdx-local.sh \
- --master local[4] \
- --name spark-pi \
- --class org.apache.spark.examples.SparkPi \
- --conf spark.executor.instances=1 \
- local:///opt/spark/examples/jars/spark-examples_2.12-3.1.2.jar
-```
 ## Run as Spark on Kubernetes Mode
 ### 1. Start the client container to run applications in spark K8s mode
 #### 1.1 Prepare the keys and password
@@ -82,20 +55,10 @@ export SECURE_PASSWORD_PATH=YOUR_LOCAL_PASSWORD_PATH
 export KUBECONFIG_PATH=KUBECONFIG_PATH
 export LOCAL_IP=YOUR_LOCAL_IP
 export DOCKER_IMAGE=intelanalytics/bigdl-k8s:latest
-sudo docker run -itd \
-  --privileged \
-  --net=host \
-  -v $KUBECONFIG_PATH:/root/.kube/config \
-  -v $KEYS_PATH:/opt/spark/work-dir/keys \
-  -v $SECURE_PASSWORD_PATH:/opt/spark/work-dir/password \
-  --name=spark-k8s-client \
-  -e LOCAL_IP=$LOCAL_IP \
-  -e RUNTIME_SPARK_MASTER=$K8S_MASTER \
-  -e RUNTIME_K8S_SERVICE_ACCOUNT=spark \
-  -e RUNTIME_K8S_SPARK_IMAGE=$DOCKER_IMAGE \
-  $DOCKER_IMAGE bash
+
+kubectl apply -f tdx-client.yaml
 ```
-Run `docker exec -it spark-k8s-client bash` to enter the container.
+Run `kubectl exec spark-local-client -- /bin/bash` to enter the client pod.
 ### 2. Run application in spark K8S mode
 #### 2.1 Run application in K8S client mode
 
 ```bash
 export secure_password=.. && \
 bash spark-submit-with-ppml-tdx-k8s.sh --master k8s://https://x.x.x.x:6443 \
 --deploy-mode client \
 --name spark-tdx \
 --conf spark.driver.host=x.x.x.x \
 --conf spark.driver.port=54321 \
 --conf spark.driver.memory=8g \
 --conf spark.executor.cores=8 \
 --conf spark.executor.memory=8g \
 --conf spark.executor.instances=1 \
 --conf spark.cores.max=8 \
---conf spark.kubernetes.driver.volumes.persistentVolumeClaim.nfsvolumeclaim.options.claimName=nfsvolumeclaim \
---conf spark.kubernetes.driver.volumes.persistentVolumeClaim.nfsvolumeclaim.mount.path=/bigdl2.0/data \
---conf spark.kubernetes.executor.volumes.persistentVolumeClaim.nfsvolumeclaim.options.claimName=nfsvolumeclaim \
---conf spark.kubernetes.executor.volumes.persistentVolumeClaim.nfsvolumeclaim.mount.path=/bigdl2.0/data \
 --class com.intel.analytics.bigdl.ppml.examples.SimpleQuerySparkExample \
 --jars /bigdl2.0/data/bigdl-ppml-spark_3.1.2-2.1.0-SNAPSHOT-jar-with-dependencies.jar \
 /bigdl2.0/data/bigdl-ppml-spark_3.1.2-2.1.0-SNAPSHOT-jar-with-dependencies.jar \
---inputPath /bigdl2.0/data/people/encrypted \
---outputPath /bigdl2.0/data/people/people_encrypted_output \
+--inputPath /people/encrypted \
+--outputPath /people/people_encrypted_output \
 --inputPartitionNum 8 \
 --outputPartitionNum 8 \
 --inputEncryptModeValue AES/CBC/PKCS5Padding \
 --outputEncryptModeValue AES/CBC/PKCS5Padding \
---primaryKeyPath /bigdl2.0/data/20line_data_keys/primaryKey \
---dataKeyPath /bigdl2.0/data/20line_data_keys/dataKey \
+--primaryKeyPath /keys/primaryKey \
+--dataKeyPath /keys/dataKey \
 --kmsType SimpleKeyManagementService \
 --simpleAPPID xx \
 --simpleAPPKEY xx
 ```
 #### 2.2 Run application in K8s cluster mode
 
 ```bash
 export secure_password=.. && \
 bash spark-submit-with-ppml-tdx-k8s.sh --master k8s://https://x.x.x.x:6443 \
 --deploy-mode cluster \
 --name spark-tdx \
 --conf spark.driver.memory=8g \
 --conf spark.executor.cores=8 \
 --conf spark.executor.memory=8g \
 --conf spark.executor.instances=1 \
 --conf spark.cores.max=8 \
---conf spark.kubernetes.driver.volumes.persistentVolumeClaim.nfsvolumeclaim.options.claimName=nfsvolumeclaim \
---conf spark.kubernetes.driver.volumes.persistentVolumeClaim.nfsvolumeclaim.mount.path=/bigdl2.0/data \
---conf spark.kubernetes.executor.volumes.persistentVolumeClaim.nfsvolumeclaim.options.claimName=nfsvolumeclaim \
---conf spark.kubernetes.executor.volumes.persistentVolumeClaim.nfsvolumeclaim.mount.path=/bigdl2.0/data \
 --class com.intel.analytics.bigdl.ppml.examples.SimpleQuerySparkExample \
 --jars /bigdl2.0/data/bigdl-ppml-spark_3.1.2-2.1.0-SNAPSHOT-jar-with-dependencies.jar \
 /bigdl2.0/data/bigdl-ppml-spark_3.1.2-2.1.0-SNAPSHOT-jar-with-dependencies.jar \
---inputPath /bigdl2.0/data/people/encrypted \
---outputPath /bigdl2.0/data/people/people_encrypted_output \
+--inputPath /people/encrypted \
+--outputPath /people/people_encrypted_output \
 --inputPartitionNum 8 \
 --outputPartitionNum 8 \
 --inputEncryptModeValue AES/CBC/PKCS5Padding \
 --outputEncryptModeValue AES/CBC/PKCS5Padding \
---primaryKeyPath /bigdl2.0/data/20line_data_keys/primaryKey \
---dataKeyPath /bigdl2.0/data/20line_data_keys/dataKey \
+--primaryKeyPath /keys/primaryKey \
+--dataKeyPath /keys/dataKey \
 --kmsType SimpleKeyManagementService \
---simpleAPPID xx \
---simpleAPPKEY xx
+--simpleAPPID $simpleAPPID \
+--simpleAPPKEY $simpleAPPKEY
 ```

From 44766e44f60c49493eca8df7845b39708ca72083 Mon Sep 17 00:00:00 2001
From: Le-Zheng
Date: Wed, 14 Sep 2022 15:50:54 +0800
Subject: [PATCH 04/10] update

---
 ppml/tdx/README.md | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/ppml/tdx/README.md b/ppml/tdx/README.md
index ab929fea85c..8c71cbc4da7 100644
--- a/ppml/tdx/README.md
+++ b/ppml/tdx/README.md
@@ -64,7 +64,8 @@ Run `kubectl exec spark-local-client -- /bin/bash` to enter the client pod.
 ```bash
 export secure_password=.. && \
-bash spark-submit-with-ppml-tdx-k8s.sh --master k8s://https://x.x.x.x:6443 \
+bash spark-submit-with-ppml-tdx-k8s.sh \
+--master k8s://https://x.x.x.x:6443 \
 --deploy-mode client \
 --name spark-tdx \
@@ -93,7 +94,8 @@
 ```bash
 export secure_password=.. && \
-bash spark-submit-with-ppml-tdx-k8s.sh --master k8s://https://x.x.x.x:6443 \
+bash spark-submit-with-ppml-tdx-k8s.sh \
+--master k8s://https://x.x.x.x:6443 \
 --deploy-mode cluster \
 --name spark-tdx \
 --conf spark.driver.memory=8g \

From 62b862d2dcb3c2c58065ac1622bd1f30146c62f2 Mon Sep 17 00:00:00 2001
From: Le-Zheng <30695225+Le-Zheng@users.noreply.github.com>
Date: Thu, 15 Sep 2022 09:06:01 +0800
Subject: [PATCH 05/10] Update README.md

---
 ppml/tdx/README.md | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/ppml/tdx/README.md b/ppml/tdx/README.md
index 8c71cbc4da7..f23b9ad7db1 100644
--- a/ppml/tdx/README.md
+++ b/ppml/tdx/README.md
@@ -2,8 +2,10 @@
 
 TDX-based Trusted Big Data ML allows the user to run end-to-end big data analytics applications and BigDL model training on a distributed cluster on Intel Trust Domain Extensions (Intel TDX).
 
-- [Before running the code](#before-running-the-code)
-- [Run as Spark on Kubernetes Mode](#run-as-spark-on-kubernetes-mode)
+### Overview Architecture
+![image](https://user-images.githubusercontent.com/30695225/190288851-fd852a51-f193-444c-bdea-1edad8375dd1.png)
+### BigDL PPML on TDX CC
+![image](https://user-images.githubusercontent.com/30695225/190289025-dfcb3d01-9eed-4676-9df5-8412bd845894.png)
 
 ## Before running the code
 #### 1. Prepare the key
@@ -51,10 +53,9 @@ The secret created (YOUR_PASSWORD) should be the same as the password you specified in section 1.1 for generating the key.
 ```bash
 export K8S_MASTER=k8s://$(sudo kubectl cluster-info | grep 'https.*6443' -o -m 1)
 export KEYS_PATH=YOUR_LOCAL_KEYS_PATH
-export SECURE_PASSWORD_PATH=YOUR_LOCAL_PASSWORD_PATH
 export KUBECONFIG_PATH=KUBECONFIG_PATH
 export LOCAL_IP=YOUR_LOCAL_IP
-export DOCKER_IMAGE=intelanalytics/bigdl-k8s:latest
+export DOCKER_IMAGE=intelanalytics/bigdl-tdx-client:latest
 
 kubectl apply -f tdx-client.yaml
 ```
@@ -87,8 +88,8 @@ bash spark-submit-with-ppml-tdx-k8s.sh \
 --primaryKeyPath /keys/primaryKey \
 --dataKeyPath /keys/dataKey \
 --kmsType SimpleKeyManagementService \
---simpleAPPID xx \
---simpleAPPKEY xx
+--simpleAPPID $simpleAPPID \
+--simpleAPPKEY $simpleAPPKEY
 ```
 #### 2.2 Run application in K8s cluster mode

From 2c106a98960befc5f7788bd83f845e433eca6bac Mon Sep 17 00:00:00 2001
From: Le-Zheng
Date: Thu, 15 Sep 2022 19:50:10 +0800
Subject: [PATCH 06/10] update

---
 ppml/tdx/README.md | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/ppml/tdx/README.md b/ppml/tdx/README.md
index f23b9ad7db1..627b88ec409 100644
--- a/ppml/tdx/README.md
+++ b/ppml/tdx/README.md
@@ -59,9 +59,21 @@ export DOCKER_IMAGE=intelanalytics/bigdl-tdx-client:latest
 
 kubectl apply -f tdx-client.yaml
 ```
-Run `kubectl exec spark-local-client -- /bin/bash` to enter the client pod.
-### 2. Run application in spark K8S mode
-#### 2.1 Run application in K8S client mode
+Run `kubectl exec -it spark-local-client -- /bin/bash` to enter the client pod.
+
+## 2. Run as Spark Local Mode
+For example, to run Spark Pi:
+```bash
+bash spark-submit-with-ppml-tdx-local.sh \
+ --master local[4] \
+ --name spark-pi \
+ --class org.apache.spark.examples.SparkPi \
+ --conf spark.executor.instances=1 \
+ local:///opt/spark/examples/jars/spark-examples_2.12-3.1.2.jar
+```
+
+### 3. Run application in spark K8S mode
+#### 3.1 Run application in K8S client mode
 
 ```bash
 export secure_password=.. && \
@@ -93,7 +103,7 @@ bash spark-submit-with-ppml-tdx-k8s.sh \
-#### 2.2 Run application in K8s cluster mode
+#### 3.2 Run application in K8s cluster mode
 
 ```bash
 export secure_password=.. && \

From 6f1680b800855bbd63438776e4106ef918d15b38 Mon Sep 17 00:00:00 2001
From: Le-Zheng
Date: Thu, 15 Sep 2022 20:05:34 +0800
Subject: [PATCH 07/10] update

---
 ppml/tdx/README.md | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/ppml/tdx/README.md b/ppml/tdx/README.md
index 627b88ec409..856ad1cd6e1 100644
--- a/ppml/tdx/README.md
+++ b/ppml/tdx/README.md
@@ -7,6 +7,9 @@ TDX-based Trusted Big Data ML allows the user to run end-to-end big data analytics applications and BigDL model training on a distributed cluster on Intel Trust Domain Extensions (Intel TDX).
 ### BigDL PPML on TDX CC
 ![image](https://user-images.githubusercontent.com/30695225/190289025-dfcb3d01-9eed-4676-9df5-8412bd845894.png)
 
+## Prepare TDX CC Environment
+Need to install TDX environment.
+
 ## Before running the code
 #### 1. Prepare the key
@@ -57,9 +60,10 @@ export KUBECONFIG_PATH=KUBECONFIG_PATH
 export LOCAL_IP=YOUR_LOCAL_IP
 export DOCKER_IMAGE=intelanalytics/bigdl-tdx-client:latest
 
+# modify tdx-client.yaml
 kubectl apply -f tdx-client.yaml
 ```
-Run `kubectl exec -it spark-local-client -- /bin/bash` to enter the client pod.
+Run `kubectl exec -it YOUR_CLIENT_POD -- /bin/bash` to enter the client pod.
 
 ## 2. Run as Spark Local Mode
 For example, to run Spark Pi:
@@ -89,8 +93,8 @@ bash spark-submit-with-ppml-tdx-k8s.sh \
 --conf spark.executor.instances=1 \
 --conf spark.cores.max=8 \
 --class com.intel.analytics.bigdl.ppml.examples.SimpleQuerySparkExample \
---jars /bigdl2.0/data/bigdl-ppml-spark_3.1.2-2.1.0-SNAPSHOT-jar-with-dependencies.jar \
-/bigdl2.0/data/bigdl-ppml-spark_3.1.2-2.1.0-SNAPSHOT-jar-with-dependencies.jar \
+--jars /ppml/trusted-big-data-ml/work/data/bigdl-ppml-spark_3.1.2-2.1.0-SNAPSHOT-jar-with-dependencies.jar \
+/ppml/trusted-big-data-ml/work/data/bigdl-ppml-spark_3.1.2-2.1.0-SNAPSHOT-jar-with-dependencies.jar \
 --inputPath /people/encrypted \
@@ -117,8 +121,8 @@ bash spark-submit-with-ppml-tdx-k8s.sh \
 --conf spark.executor.instances=1 \
 --conf spark.cores.max=8 \
 --class com.intel.analytics.bigdl.ppml.examples.SimpleQuerySparkExample \
---jars /bigdl2.0/data/bigdl-ppml-spark_3.1.2-2.1.0-SNAPSHOT-jar-with-dependencies.jar \
-/bigdl2.0/data/bigdl-ppml-spark_3.1.2-2.1.0-SNAPSHOT-jar-with-dependencies.jar \
+--jars /ppml/trusted-big-data-ml/work/data/bigdl-ppml-spark_3.1.2-2.1.0-SNAPSHOT-jar-with-dependencies.jar \
+/ppml/trusted-big-data-ml/work/data/bigdl-ppml-spark_3.1.2-2.1.0-SNAPSHOT-jar-with-dependencies.jar \
 --inputPath /people/encrypted \

From fb13a995a604eb05b731fea8a7cbeabe74f08059 Mon Sep 17 00:00:00 2001
From: Le-Zheng <30695225+Le-Zheng@users.noreply.github.com>
Date: Fri, 16 Sep 2022 09:35:23 +0800
Subject: [PATCH 08/10] Update README.md

---
 ppml/tdx/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ppml/tdx/README.md b/ppml/tdx/README.md
index 856ad1cd6e1..c7478bc4c8c 100644
--- a/ppml/tdx/README.md
+++ b/ppml/tdx/README.md
@@ -4,7 +4,7 @@ TDX-based Trusted Big Data ML allows the user to run end-to-end big data analytics applications and BigDL model training on a distributed cluster on Intel Trust Domain Extensions (Intel TDX).
 ### Overview Architecture
 ![image](https://user-images.githubusercontent.com/30695225/190288851-fd852a51-f193-444c-bdea-1edad8375dd1.png)
-### BigDL PPML on TDX CC
+### BigDL PPML on TDX-CC
 ![image](https://user-images.githubusercontent.com/30695225/190289025-dfcb3d01-9eed-4676-9df5-8412bd845894.png)

From 2d7223ccdf9b0e35b09b149be81a91cb2ee32abe Mon Sep 17 00:00:00 2001
From: Le-Zheng <30695225+Le-Zheng@users.noreply.github.com>
Date: Fri, 16 Sep 2022 10:22:39 +0800
Subject: [PATCH 09/10] Update README.md

---
 ppml/tdx/README.md | 48 ++++++++++++++++++++++++++++------------------
 1 file changed, 29 insertions(+), 19 deletions(-)

diff --git a/ppml/tdx/README.md b/ppml/tdx/README.md
index c7478bc4c8c..bc3c6f0e5ad 100644
--- a/ppml/tdx/README.md
+++ b/ppml/tdx/README.md
@@ -27,6 +27,27 @@ bash ppml/scripts/generate-password.sh used_password_when_generate_keys
 ```
 It will generate the password file in `./password`.
 
+## Run as Spark Local Mode
+Start the client pod:
+```bash
+export KEYS_PATH=YOUR_LOCAL_KEYS_PATH
+export DOCKER_IMAGE=intelanalytics/bigdl-tdx-client:latest
+
+# modify tdx-client.yaml
+kubectl apply -f tdx-client.yaml
+```
+Run `kubectl exec -it YOUR_CLIENT_POD -- /bin/bash` to enter the client pod.
+
+For example, to run Spark Pi:
+```bash
+bash spark-submit-with-ppml-tdx-local.sh \
+ --master local[4] \
+ --name spark-pi \
+ --class org.apache.spark.examples.SparkPi \
+ --conf spark.executor.instances=1 \
+ local:///opt/spark/examples/jars/spark-examples_2.12-3.1.2.jar
+```
+
 ## Run as Spark on Kubernetes Mode
 ### 1. Start the client container to run applications in spark K8s mode
 #### 1.1 Prepare the keys and password
-## 2. Run as Spark Local Mode
-For example, to run Spark Pi:
-```bash
-bash spark-submit-with-ppml-tdx-local.sh \
- --master local[4] \
- --name spark-pi \
- --class org.apache.spark.examples.SparkPi \
- --conf spark.executor.instances=1 \
- local:///opt/spark/examples/jars/spark-examples_2.12-3.1.2.jar
-```
-
-### 3. Run application in spark K8S mode
-#### 3.1 Run application in K8S client mode
-
+### 2. Run application in spark K8S mode
+#### 2.1 Run application in K8S client mode
+Sample submit command for the Simple Query example.
 ```bash
 export secure_password=.. && \
 bash spark-submit-with-ppml-tdx-k8s.sh \
 --master k8s://https://x.x.x.x:6443 \
 --deploy-mode client \
 --name spark-tdx \
 --conf spark.driver.host=x.x.x.x \
 --conf spark.driver.port=54321 \
 --conf spark.driver.memory=8g \
 --conf spark.executor.cores=8 \
 --conf spark.executor.memory=8g \
 --conf spark.executor.instances=1 \
 --conf spark.cores.max=8 \
 --class com.intel.analytics.bigdl.ppml.examples.SimpleQuerySparkExample \
---jars /ppml/trusted-big-data-ml/work/data/bigdl-ppml-spark_3.1.2-2.1.0-SNAPSHOT-jar-with-dependencies.jar \
-/ppml/trusted-big-data-ml/work/data/bigdl-ppml-spark_3.1.2-2.1.0-SNAPSHOT-jar-with-dependencies.jar \
+--jars ${BIGDL_HOME}/jars/bigdl-ppml-spark_3.1.2-*-jar-with-dependencies.jar \
+${BIGDL_HOME}/jars/bigdl-ppml-spark_3.1.2-*-jar-with-dependencies.jar \
 --inputPath /people/encrypted \
 --outputPath /people/people_encrypted_output \
 --inputPartitionNum 8 \
 --outputPartitionNum 8 \
 --inputEncryptModeValue AES/CBC/PKCS5Padding \
 --outputEncryptModeValue AES/CBC/PKCS5Padding \
 --primaryKeyPath /keys/primaryKey \
 --dataKeyPath /keys/dataKey \
 --kmsType SimpleKeyManagementService \
 --simpleAPPID $simpleAPPID \
 --simpleAPPKEY $simpleAPPKEY
 ```
-#### 3.2 Run application in K8s cluster mode
+#### 2.2 Run application in K8s cluster mode
 
 ```bash
 export secure_password=.. && \
 bash spark-submit-with-ppml-tdx-k8s.sh \
 --master k8s://https://x.x.x.x:6443 \
 --deploy-mode cluster \
 --name spark-tdx \
 --conf spark.driver.memory=8g \
 --conf spark.executor.cores=8 \
 --conf spark.executor.memory=8g \
 --conf spark.executor.instances=1 \
 --conf spark.cores.max=8 \
 --class com.intel.analytics.bigdl.ppml.examples.SimpleQuerySparkExample \
---jars /ppml/trusted-big-data-ml/work/data/bigdl-ppml-spark_3.1.2-2.1.0-SNAPSHOT-jar-with-dependencies.jar \
-/ppml/trusted-big-data-ml/work/data/bigdl-ppml-spark_3.1.2-2.1.0-SNAPSHOT-jar-with-dependencies.jar \
+--jars ${BIGDL_HOME}/jars/bigdl-ppml-spark_3.1.2-*-jar-with-dependencies.jar \
+${BIGDL_HOME}/jars/bigdl-ppml-spark_3.1.2-*-jar-with-dependencies.jar \
 --inputPath /people/encrypted \
 --outputPath /people/people_encrypted_output \
 --inputPartitionNum 8 \
 --outputPartitionNum 8 \
 --inputEncryptModeValue AES/CBC/PKCS5Padding \
 --outputEncryptModeValue AES/CBC/PKCS5Padding \
 --primaryKeyPath /keys/primaryKey \
 --dataKeyPath /keys/dataKey \
 --kmsType SimpleKeyManagementService \
 --simpleAPPID $simpleAPPID \
 --simpleAPPKEY $simpleAPPKEY
 ```

From 105a32a746ea5ec50c7e8bef3b1ffc7da8136500 Mon Sep 17 00:00:00 2001
From: Le-Zheng <30695225+Le-Zheng@users.noreply.github.com>
Date: Fri, 16 Sep 2022 13:29:42 +0800
Subject: [PATCH 10/10] Add introduction for tdx environment

---
 ppml/tdx/README.md | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/ppml/tdx/README.md b/ppml/tdx/README.md
index bc3c6f0e5ad..371754c5f40 100644
--- a/ppml/tdx/README.md
+++ b/ppml/tdx/README.md
@@ -8,7 +8,34 @@ TDX-based Trusted Big Data ML allows the user to run end-to-end big data analytics applications and BigDL model training on a distributed cluster on Intel Trust Domain Extensions (Intel TDX).
 ![image](https://user-images.githubusercontent.com/30695225/190289025-dfcb3d01-9eed-4676-9df5-8412bd845894.png)
 
 ## Prepare TDX CC Environment
-Need to install TDX environment.
+[`Confidential Containers`](https://github.com/confidential-containers/documentation/blob/main/Overview.md) (CC) is an open source community working to enable cloud native confidential computing by leveraging [`Trusted Execution Environments`](https://en.wikipedia.org/wiki/Trusted_execution_environment) (TEE) to protect containers and data.
+
+The TEE seeks to protect the application and data from outside threats, with the application owner having complete control of all communication across the TEE boundary.
+The application is considered a single complete entity; once supplied with the resources it requires, the TEE protects those resources (memory and CPU) from the infrastructure, and all communication across the TEE boundary remains under the control of the application owner.
+
+Confidential Containers supports multiple TEE technologies, such as Intel SGX and Intel TDX. [`Intel Trust Domain Extensions`](https://www.intel.com/content/www/us/en/developer/articles/technical/intel-trust-domain-extensions.html) (Intel TDX) introduces new architectural elements to help deploy hardware-isolated virtual machines (VMs) called trust domains (TDs). Intel TDX is designed to isolate VMs from the virtual-machine manager (VMM)/hypervisor and any other non-TD software on the platform to protect TDs from a broad range of software.
+
+Combining the advantages of Intel TDX and Confidential Containers, TDX-CC provides transparent deployment of unmodified containers and allows cloud-native application owners to enforce application security requirements.
+
+To deploy an actual workload with TDX-CC, you need to prepare the environment in two parts: the **hardware environment** and **Kata CCv0**.
+
+### Hardware Environment
+1. Configure Hardware
+   CPU and firmware need to be upgraded to the latest release version. Some jumpers must be set to enable TDX on an Archer City or Vulcan City board.
+2. Configure BIOS
+   TDX should be enabled in the BIOS. This step must be performed every time the BIOS is flashed.
+3. Build and install packages
+   Packages for the host kernel, guest kernel, QEMU and libvirt should be built and installed first.
+4. Setup TDX Guest Image
+   A proper guest image utilizing the guest kernel, GRUB, and shim should be built.
+5. Launch TD Guests
+   It is time to launch TD guests. The Launch TD Guest section leads you step by step through creating and launching TD guests.
+6. Verify statuses
+   The Verify TDX Status section provides guidance on how to verify whether TDX is initializing on both the host and the guest.
+7. Test TDX
+   TDX tests are used to validate basic functionality of the TDX software stack. The tests focus on TDVM lifecycle management and environment validation.
+### **Kata CCv0**
+Refer to [`ccv0.sh`](https://github.com/kata-containers/kata-containers/blob/CCv0/docs/how-to/ccv0.sh) to install Kata CCv0.
+To ensure the successful creation of Kata confidential containers, please follow [`how-to-build-and-test-ccv0`](https://github.com/kata-containers/kata-containers/blob/CCv0/docs/how-to/how-to-build-and-test-ccv0.md#using-kubernetes-for-end-to-end-provisioning-of-a-kata-confidential-containers-pod-with-an-unencrypted-image) to verify the setup.
 
 ## Before running the code
 #### 1. Prepare the key
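The `generate-keys.sh` script referenced under "Prepare the key" is not reproduced in these patches. As a rough, hypothetical illustration of the kind of material such a script produces (the real script also builds Java keystores and uses its own file names, algorithms, and the security password discussed above), a private key and self-signed certificate can be created with plain `openssl`:

```shell
# Illustrative sketch only: create an RSA key and a self-signed certificate.
# File names and parameters here are assumptions, not those of generate-keys.sh.
mkdir -p keys
openssl req -x509 -newkey rsa:3072 -nodes \
  -keyout keys/server.key -out keys/server.crt \
  -days 365 -subj "/CN=ppml-test" 2>/dev/null
ls keys
```

Keys produced this way would still need to be imported into keystores and protected with a password before they match what the Spark security settings in this guide expect.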