From 0ffb0176f542fa61634156dbb3d0a9cda5f050e4 Mon Sep 17 00:00:00 2001 From: Edwin Yu Date: Tue, 10 Sep 2024 01:42:16 +0000 Subject: [PATCH 1/4] Update README for unified setup --- README.md | 163 +++++++++++++++++++++++++++--------------------------- 1 file changed, 80 insertions(+), 83 deletions(-) diff --git a/README.md b/README.md index 2c48bd3..fb6015d 100644 --- a/README.md +++ b/README.md @@ -1,10 +1,6 @@ -# MMC.AI Setup Guide +# Memory Machine AI Setup Guide -## Installation prerequisites - -NVIDIA’s DeepOps project uses Ansible to deploy Kubernetes onto host machines. Ansible is an automation tool that allows system administrators to run commands on multiple machines, while interacting with only one host, called the “provisioning machine.” - -#### Setting up user accounts +## Setting up user accounts for Ansible A user with `sudo` permissions is needed on each host where Kubernetes will be installed. @@ -22,7 +18,7 @@ sudo usermod -aG sudo mmai-admin echo "mmai-admin ALL=(ALL:ALL) NOPASSWD: ALL" > /etc/sudoers.d/mmai-admin ``` -#### Enabling private-key SSH +## Enabling private-key SSH To allow Ansible to connect to remote hosts without querying for a password, private-key SSH connections must be enabled. From the provisioning machine, follow these steps: ```bash @@ -38,25 +34,22 @@ ssh-copy-id @ These instructions come from [NVIDIA’s guide on Ansible](https://github.com/NVIDIA/deepops/blob/master/docs/deepops/ansible.md#passwordless-configuration-using-ssh-keys), which contains more information. -## Ansible Installation with DeepOps +## [OPTIONAL] Installing Kubernetes via DeepOps The following set of commands will install Ansible on the provisioning machine. They must be run as a regular user. 
```bash git clone https://github.com/NVIDIA/deepops.git -cd ./deepops +cd deepops git checkout 23.08 ./scripts/setup.sh ``` -## Editing Ansible Configurations +### Ansible configuration Once Ansible installation is complete, `deepops/config/inventory` must be configured by the system admin. - -#### `deepops/config/inventory` - This file defines which hosts will be used for Kubernetes installation. -Within there are four relevant headers: +Within, there are four relevant groups: - **`[all]`** A list of the hosts that will participate in the Kubernetes cluster. @@ -76,57 +69,24 @@ Within there are four relevant headers: - **`[kube-node]`** Should contain the cluster's "worker nodes" -- that is, nodes that do not appear in `[kube-master]`, but are expected to run workloads. -## Installing Kubernetes +### Kubernetes installation script Once Ansible configuration is complete, copy these commands into your terminal to install Kubernetes: ```bash -wget -O logging.sh https://raw.githubusercontent.com/MemVerge/mmc.ai-setup/main/logging.sh -wget -O deepops-setup.sh https://raw.githubusercontent.com/MemVerge/mmc.ai-setup/main/deepops-setup.sh -chmod +x deepops-setup.sh -./deepops-setup.sh +curl -Lf https://raw.githubusercontent.com/MemVerge/mmc.ai-setup/main/deepops-setup.sh | bash ``` -## Installing Kubeflow - -Download and run `kubeflow-setup.sh` on a node with kubectl and kustomize: -```bash -wget -O logging.sh https://raw.githubusercontent.com/MemVerge/mmc.ai-setup/main/logging.sh -wget -O git-clone.sh https://raw.githubusercontent.com/MemVerge/mmc.ai-setup/main/git-clone.sh -wget -O kubeflow-setup.sh https://raw.githubusercontent.com/MemVerge/mmc.ai-setup/main/kubeflow-setup.sh -chmod +x kubeflow-setup.sh -./kubeflow-setup.sh -``` - -The following command prints the port for the Kubeflow Central Dashboard: -```bash -echo $(kubectl get svc istio-ingressgateway -n istio-system -o jsonpath='{.spec.ports[?(@.port==80)].nodePort}') -``` - -Using this port, the URL `http://:` 
will fetch the Kubeflow Central Dashboard, where `` is the IPv4 address of any node on the cluster. - - -## Installing NVIDIA GPU Operator - -Download and run `nvidia-gpu-operator-setup.sh` on the node used to manage Helm installations: -```bash -wget -O logging.sh https://raw.githubusercontent.com/MemVerge/mmc.ai-setup/main/logging.sh -wget -O gpu-operator-values.yaml https://raw.githubusercontent.com/MemVerge/mmc.ai-setup/main/gpu-operator-values.yaml -wget -O nvidia-gpu-operator-setup.sh https://raw.githubusercontent.com/MemVerge/mmc.ai-setup/main/nvidia-gpu-operator-setup.sh -chmod +x nvidia-gpu-operator-setup.sh -./nvidia-gpu-operator-setup.sh -``` - -## Installing MMC.AI +## Installing Memory Machine AI > **Important:** > The following prerequisites are necessary if you did not follow the instructions above: -> 1. Kubernetes set up. -> 2. [Default StorageClass](https://kubernetes.io/docs/concepts/storage/storage-classes/#default-storageclass) set up in cluster. -> 3. [Kubeflow](https://www.kubeflow.org/docs/started/installing-kubeflow/) installed in cluster. -> 4. NVIDIA GPU Operator installed via Helm in cluster with overrides from `gpu-operator-values.yaml`. -> 5. Node(s) in cluster with [Helm](https://helm.sh/docs/intro/quickstart/) installed. +> 1. User accounts for Ansible set up. +> 2. Private-key SSH enabled. +> 3. Kubernetes cluster set up. +> 4. [Default StorageClass](https://kubernetes.io/docs/concepts/storage/storage-classes/#default-storageclass) in Kubernetes cluster set up. + +### [INTERNAL] Helm login secrets -#### (Internal) Helm Login Secrets In order to download the pre-release packages, MemVerge team members must authenticate with the Github container registry. 
First, create a personal access token on this Github page: https://github.com/settings/tokens @@ -145,48 +105,88 @@ helm registry login ghcr.io/memverge/charts # Password: ``` -### Image Pull Secrets +### Image pull secrets Copy the `mmcai-ghcr-secret.yaml` file provided by MemVerge to the node with `kubectl` access (i.e., the "control plane node"). Then, deploy its image pull credentials to the cluster like so: ```bash kubectl apply -f mmcai-ghcr-secret.yaml ``` -### Cluster Components +### Ansible configuraiton + +In an inventory file (which can be named anything), configure the two groups: +- **`[all]`** + A list of the hosts that will participate in the Kubernetes cluster. + For example: + ``` + [all] + ansible_host= + ansible_host= + # The following will configure the local machine as a target: + # host-1 ansible_host=localhost + ``` + In order to have the Kubernetes node names match with the names of the servers in the cluster, it is best to let `` be the domain name of the remote host. You can determine a host's domain by running the `hostname` command (without the optional `-f` flag, which prints the fully qualified domain name) on each machine. +- **`[mmai_database]`** + Memory Machine AI MySQL database (single) node. The specified node will be used for a database. + For example: + ``` + [mmai_database] + ansible_host= + ``` + +This file will be used by the Memory Machine AI installation script. + +### Memory Machine AI installation script + +Download and run the interactive `mmcai-setup.sh` script on the control plane node. 
+ +You will have a chance to confirm your changes after making your selections: -#### Billing Database -Download and run `mysql-pre-setup.sh` on the control plane node: ```bash -wget -O logging.sh https://raw.githubusercontent.com/MemVerge/mmc.ai-setup/main/logging.sh -wget -O mysql-pre-setup.sh https://raw.githubusercontent.com/MemVerge/mmc.ai-setup/main/mysql-pre-setup.sh -chmod +x mysql-pre-setup.sh -./mysql-pre-setup.sh +curl -Lf https://raw.githubusercontent.com/MemVerge/mmc.ai-setup/main/mmcai-setup.sh | bash ``` -#### MMC.AI Cluster and Management Planes -Download and run `mmcai-setup.sh` on the control plane node: -``` bash -wget -O logging.sh https://raw.githubusercontent.com/MemVerge/mmc.ai-setup/main/logging.sh -wget -O mmcai-setup.sh https://raw.githubusercontent.com/MemVerge/mmc.ai-setup/main/mmcai-setup.sh -chmod +x mmcai-setup.sh -./mmcai-setup.sh -# Answer prompts as needed. +If MMC.AI Manager is installed, the MMC.AI dashboard should be accessible at `http://:32323`. + +If Kubeflow is installed, the following command should print the port for the Kubeflow Central Dashboard: + +```bash +echo $(kubectl get svc istio-ingressgateway -n istio-system -o jsonpath='{.spec.ports[?(@.port==80)].nodePort}') ``` -Once deployed, the MMC.AI dashboard should be accessible at `http://:32323`. +Using this port, the URL `http://:` will fetch the Kubeflow Central Dashboard, where `` is the IPv4 address of any node on the cluster. + +# Memory Machine AI Reset Guide -# MMC.AI Reset Guide +## Resetting Memory Machine AI -If the MMC.AI installation is in a bad state, you can perform a full reinstall with the following script. The `ghcr` secret file from above must be provided to the script via the `-f` option. +If the Memory Machine AI installation is in a bad state, you can perform a full reinstall with the following script. The `ghcr` secret file from above must be provided to the script via the `-f` option. 
```bash -wget -O logging.sh https://raw.githubusercontent.com/MemVerge/mmc.ai-setup/main/logging.sh -wget -O mmcai-reset.sh https://raw.githubusercontent.com/MemVerge/mmc.ai-setup/main/mmcai-teardown.sh -chmod +x mmcai-teardown.sh +curl -Lfo mmcai-reset.sh https://raw.githubusercontent.com/MemVerge/mmc.ai-setup/main/mmcai-reset.sh +chmod +x mmcai-reset.sh ./mmcai-reset.sh -f mmcai-ghcr-secret.yaml ``` -# MMC.AI Teardown Guide +# Memory Machine AI Teardown Guide + +## Uninstalling Memory Machine AI + +### Ansible configuraiton + +In an inventory file (which can be named anything), configure the group: +- **`[mmai_database]`** + Memory Machine AI MySQL database (single or multiple) nodes. Databases on the specified nodes will be removed. + For example: + ``` + [mmai_database] + ansible_host= + ansible_host= + ``` + +This file will be used by the Memory Machine AI uninstallation script. + +### Memory Machine AI uninstallation script Download and run the interactive `mmcai-teardown.sh` script on the control plane node. 
@@ -195,8 +195,5 @@ If you have nothing else installed in the cluster and want to remove everything, You will have a chance to confirm your changes after making your selections: ```bash -wget -O logging.sh https://raw.githubusercontent.com/MemVerge/mmc.ai-setup/main/logging.sh -wget -O mmcai-teardown.sh https://raw.githubusercontent.com/MemVerge/mmc.ai-setup/main/mmcai-teardown.sh -chmod +x mmcai-teardown.sh -./mmcai-teardown.sh +curl -Lf https://raw.githubusercontent.com/MemVerge/mmc.ai-setup/main/mmcai-teardown.sh | bash ``` From 96adb8c01b92889bdb1d790f0a049332fce75539 Mon Sep 17 00:00:00 2001 From: Edwin Yu Date: Tue, 10 Sep 2024 18:00:31 +0000 Subject: [PATCH 2/4] Fix typos and add clarity --- README.md | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/README.md b/README.md index fe7138c..86d0bd5 100644 --- a/README.md +++ b/README.md @@ -49,7 +49,7 @@ git checkout 23.08 Once Ansible installation is complete, `deepops/config/inventory` must be configured by the system admin. This file defines which hosts will be used for Kubernetes installation. -Within, there are four relevant groups: +Within, there are four relevant host groups: - **`[all]`** A list of the hosts that will participate in the Kubernetes cluster. @@ -112,10 +112,13 @@ Copy the `mmcai-ghcr-secret.yaml` file provided by MemVerge to the node with `ku kubectl apply -f mmcai-ghcr-secret.yaml ``` -### Ansible configuraiton +### Ansible configuration -In an inventory file (which can be named anything), configure the two groups: +In an inventory file (which can be named anything), configure two host groups: - **`[all]`** + > **Note:** + > The [all] group in this section should be identical to the one in `deepops/config/inventory` if you installed Kubernetes via DeepOps. + A list of the hosts that will participate in the Kubernetes cluster. 
For example: ``` @@ -160,9 +163,9 @@ Using this port, the URL `http://:` will fetch the Kubeflow Centr ## Uninstalling Memory Machine AI -### Ansible configuraiton +### Ansible configuration -In an inventory file (which can be named anything), configure the group: +In an inventory file (which can be named anything), configure the host group: - **`[mmai_database]`** Memory Machine AI MySQL database (single or multiple) nodes. Databases on the specified nodes will be removed. For example: From ae1a9b82033f527637f9f7289d580a1a4cdc5a6e Mon Sep 17 00:00:00 2001 From: Edwin Yu Date: Fri, 13 Sep 2024 01:19:38 +0000 Subject: [PATCH 3/4] Don't curl pipe bash interactive scripts --- README.md | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 86d0bd5..c1b2f0b 100644 --- a/README.md +++ b/README.md @@ -73,7 +73,9 @@ Within, there are four relevant host groups: Once Ansible configuration is complete, copy these commands into your terminal to install Kubernetes: ```bash -curl -Lf https://raw.githubusercontent.com/MemVerge/mmc.ai-setup/main/deepops-setup.sh | bash +curl -LfO https://raw.githubusercontent.com/MemVerge/mmc.ai-setup/main/deepops-setup.sh +chmod +x deepops-setup.sh +./deepops-setup.sh ``` ## Installing Memory Machine AI @@ -146,7 +148,9 @@ Download and run the interactive `mmcai-setup.sh` script on the control plane no You will have a chance to confirm your changes after making your selections: ```bash -curl -Lf https://raw.githubusercontent.com/MemVerge/mmc.ai-setup/main/mmcai-setup.sh | bash +curl -LfO https://raw.githubusercontent.com/MemVerge/mmc.ai-setup/main/mmcai-setup.sh +chmod +x mmcai-setup.sh +./mmcai-setup.sh ``` If MMC.AI Manager is installed, the MMC.AI dashboard should be accessible at `http://:32323`. 
@@ -186,5 +190,7 @@ If you have nothing else installed in the cluster and want to remove everything, You will have a chance to confirm your changes after making your selections: ```bash -curl -Lf https://raw.githubusercontent.com/MemVerge/mmc.ai-setup/main/mmcai-teardown.sh | bash +curl -LfO https://raw.githubusercontent.com/MemVerge/mmc.ai-setup/main/mmcai-teardown.sh +chmod +x mmcai-teardown.sh +./mmcai-teardown.sh ``` From 8f18f5f134f7c9eceb2d5208562e3f3e0c648b3c Mon Sep 17 00:00:00 2001 From: Edwin Yu Date: Fri, 20 Sep 2024 21:05:01 +0000 Subject: [PATCH 4/4] Don't cURL scripts --- README.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/README.md b/README.md index c1b2f0b..482c00c 100644 --- a/README.md +++ b/README.md @@ -73,8 +73,8 @@ Within, there are four relevant host groups: Once Ansible configuration is complete, copy these commands into your terminal to install Kubernetes: ```bash -curl -LfO https://raw.githubusercontent.com/MemVerge/mmc.ai-setup/main/deepops-setup.sh -chmod +x deepops-setup.sh +git clone https://github.com/MemVerge/mmc.ai-setup +cd mmc.ai-setup ./deepops-setup.sh ``` @@ -148,8 +148,8 @@ Download and run the interactive `mmcai-setup.sh` script on the control plane no You will have a chance to confirm your changes after making your selections: ```bash -curl -LfO https://raw.githubusercontent.com/MemVerge/mmc.ai-setup/main/mmcai-setup.sh -chmod +x mmcai-setup.sh +git clone https://github.com/MemVerge/mmc.ai-setup +cd mmc.ai-setup ./mmcai-setup.sh ``` @@ -190,7 +190,7 @@ If you have nothing else installed in the cluster and want to remove everything, You will have a chance to confirm your changes after making your selections: ```bash -curl -LfO https://raw.githubusercontent.com/MemVerge/mmc.ai-setup/main/mmcai-teardown.sh -chmod +x mmcai-teardown.sh +git clone https://github.com/MemVerge/mmc.ai-setup +cd mmc.ai-setup ./mmcai-teardown.sh ```
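For reference, the two host groups that the `Ansible configuration` section asks for before running `mmcai-setup.sh` can be sketched as below. This is a minimal illustration only: the host names (`host-1`, `host-2`) and the addresses (taken from the 192.0.2.0/24 documentation range) are hypothetical, the `<name> ansible_host=<address>` entry shape is inferred from the commented `host-1 ansible_host=localhost` example in the README, and the group check is not part of the mmc.ai-setup scripts.

```shell
#!/usr/bin/env bash
set -eu

# Write a throwaway inventory file; the README notes it can be named anything.
inventory="$(mktemp)"

cat > "$inventory" <<'EOF'
[all]
# Hypothetical hosts; replace with the real host names and addresses.
host-1 ansible_host=192.0.2.11
host-2 ansible_host=192.0.2.12

[mmai_database]
# A single node for the Memory Machine AI MySQL database.
host-1 ansible_host=192.0.2.11
EOF

# Illustrative sanity check: confirm both group headers are present
# as whole lines before handing the file to the installation script.
for group in all mmai_database; do
  if grep -qxF "[$group]" "$inventory"; then
    echo "found [$group]"
  else
    echo "missing [$group]" >&2
    exit 1
  fi
done

echo "inventory written to $inventory"
```

If Kubernetes was installed via DeepOps, the `[all]` entries here would be copied from `deepops/config/inventory`, per the note added in PATCH 2/4.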