diff --git a/README.md b/README.md
index b211199..482c00c 100644
--- a/README.md
+++ b/README.md
@@ -1,10 +1,6 @@
-# MMC.AI Setup Guide
+# Memory Machine AI Setup Guide
 
-## Installation prerequisites
-
-NVIDIA’s DeepOps project uses Ansible to deploy Kubernetes onto host machines. Ansible is an automation tool that allows system administrators to run commands on multiple machines, while interacting with only one host, called the “provisioning machine.”
-
-#### Setting up user accounts
+## Setting up user accounts for Ansible
 
 A user with `sudo` permissions is needed on each host where Kubernetes will be installed.
 
@@ -22,7 +18,7 @@ sudo usermod -aG sudo mmai-admin
 echo "mmai-admin ALL=(ALL:ALL) NOPASSWD: ALL" > /etc/sudoers.d/mmai-admin
 ```
 
-#### Enabling private-key SSH
+## Enabling private-key SSH
 
 To allow Ansible to connect to remote hosts without querying for a password, private-key SSH connections must be enabled. From the provisioning machine, follow these steps:
 ```bash
@@ -38,25 +34,22 @@ ssh-copy-id @
 
 These instructions come from [NVIDIA’s guide on Ansible](https://github.com/NVIDIA/deepops/blob/master/docs/deepops/ansible.md#passwordless-configuration-using-ssh-keys), which contains more information.
 
-## Ansible Installation with DeepOps
+## [OPTIONAL] Installing Kubernetes via DeepOps
 
 The following set of commands will install Ansible on the provisioning machine.
 They must be run as a regular user.
 ```bash
 git clone https://github.com/NVIDIA/deepops.git
-cd ./deepops
+cd deepops
 git checkout 23.08
 ./scripts/setup.sh
 ```
 
-## Editing Ansible Configurations
+### Ansible configuration
 
 Once Ansible installation is complete, `deepops/config/inventory` must be configured by the system admin.
-
-#### `deepops/config/inventory`
-
 This file defines which hosts will be used for Kubernetes installation.
-Within there are four relevant headers:
+Within, there are four relevant host groups:
 - **`[all]`**
   A list of the hosts that will participate in the Kubernetes cluster.
 
@@ -76,57 +69,26 @@ Within there are four relevant headers:
 - **`[kube-node]`**
   Should contain the cluster's "worker nodes" -- that is, nodes that do not appear in `[kube-master]`, but are expected to run workloads.
 
-## Installing Kubernetes
+### Kubernetes installation script
 
 Once Ansible configuration is complete, copy these commands into your terminal to install Kubernetes:
 ```bash
-wget -O logging.sh https://raw.githubusercontent.com/MemVerge/mmc.ai-setup/main/logging.sh
-wget -O deepops-setup.sh https://raw.githubusercontent.com/MemVerge/mmc.ai-setup/main/deepops-setup.sh
-chmod +x deepops-setup.sh
+git clone https://github.com/MemVerge/mmc.ai-setup
+cd mmc.ai-setup
 ./deepops-setup.sh
 ```
 
-## Installing Kubeflow
-
-Download and run `kubeflow-setup.sh` on a node with kubectl and kustomize:
-```bash
-wget -O logging.sh https://raw.githubusercontent.com/MemVerge/mmc.ai-setup/main/logging.sh
-wget -O git-clone.sh https://raw.githubusercontent.com/MemVerge/mmc.ai-setup/main/git-clone.sh
-wget -O kubeflow-setup.sh https://raw.githubusercontent.com/MemVerge/mmc.ai-setup/main/kubeflow-setup.sh
-chmod +x kubeflow-setup.sh
-./kubeflow-setup.sh
-```
-
-The following command prints the port for the Kubeflow Central Dashboard:
-```bash
-echo $(kubectl get svc istio-ingressgateway -n istio-system -o jsonpath='{.spec.ports[?(@.port==80)].nodePort}')
-```
-
-Using this port, the URL `http://:` will fetch the Kubeflow Central Dashboard, where `` is the IPv4 address of any node on the cluster.
-
-
-## Installing NVIDIA GPU Operator
-
-Download and run `nvidia-gpu-operator-setup.sh` on the node used to manage Helm installations:
-```bash
-wget -O logging.sh https://raw.githubusercontent.com/MemVerge/mmc.ai-setup/main/logging.sh
-wget -O gpu-operator-values.yaml https://raw.githubusercontent.com/MemVerge/mmc.ai-setup/main/gpu-operator-values.yaml
-wget -O nvidia-gpu-operator-setup.sh https://raw.githubusercontent.com/MemVerge/mmc.ai-setup/main/nvidia-gpu-operator-setup.sh
-chmod +x nvidia-gpu-operator-setup.sh
-./nvidia-gpu-operator-setup.sh
-```
-
-## Installing MMC.AI
+## Installing Memory Machine AI
 
 > **Important:**
 > The following prerequisites are necessary if you did not follow the instructions above:
-> 1. Kubernetes set up.
-> 2. [Default StorageClass](https://kubernetes.io/docs/concepts/storage/storage-classes/#default-storageclass) set up in cluster.
-> 3. [Kubeflow](https://www.kubeflow.org/docs/started/installing-kubeflow/) installed in cluster.
-> 4. NVIDIA GPU Operator installed via Helm in cluster with overrides from `gpu-operator-values.yaml`.
-> 5. Node(s) in cluster with [Helm](https://helm.sh/docs/intro/quickstart/) installed.
+> 1. User accounts for Ansible set up.
+> 2. Private-key SSH enabled.
+> 3. Kubernetes cluster set up.
+> 4. [Default StorageClass](https://kubernetes.io/docs/concepts/storage/storage-classes/#default-storageclass) in Kubernetes cluster set up.
+
+### [INTERNAL] Helm login secrets
 
-#### (Internal) Helm Login Secrets
 In order to download the pre-release packages, MemVerge team members must authenticate with the Github container registry.
 First, create a personal access token on this Github page: https://github.com/settings/tokens
 
@@ -145,37 +107,81 @@ helm registry login ghcr.io/memverge/charts
 # Password:
 ```
 
-### Image Pull Secrets
+### Image pull secrets
 
 Copy the `mmcai-ghcr-secret.yaml` file provided by MemVerge to the node with `kubectl` access (i.e., the "control plane node").
 Then, deploy its image pull credentials to the cluster like so:
 ```bash
 kubectl apply -f mmcai-ghcr-secret.yaml
 ```
 
-### Cluster Components
+### Ansible configuration
+
+In an inventory file (which can be named anything), configure two host groups:
+- **`[all]`**
+  > **Note:**
+  > The [all] group in this section should be identical to the one in `deepops/config/inventory` if you installed Kubernetes via DeepOps.
+
+  A list of the hosts that will participate in the Kubernetes cluster.
+  For example:
+  ```
+  [all]
+  ansible_host=
+  ansible_host=
+  # The following will configure the local machine as a target:
+  # host-1 ansible_host=localhost
+  ```
+  In order to have the Kubernetes node names match with the names of the servers in the cluster, it is best to let `` be the domain name of the remote host. You can determine a host's domain by running the `hostname` command (without the optional `-f` flag, which prints the fully qualified domain name) on each machine.
+- **`[mmai_database]`**
+  Memory Machine AI MySQL database (single) node. The specified node will be used for a database.
+  For example:
+  ```
+  [mmai_database]
+  ansible_host=
+  ```
+
+This file will be used by the Memory Machine AI installation script.
+
+### Memory Machine AI installation script
+
+Download and run the interactive `mmcai-setup.sh` script on the control plane node.
+
+You will have a chance to confirm your changes after making your selections:
-#### Billing Database
-Download and run `mysql-pre-setup.sh` on the control plane node:
 ```bash
-wget -O logging.sh https://raw.githubusercontent.com/MemVerge/mmc.ai-setup/main/logging.sh
-wget -O mysql-pre-setup.sh https://raw.githubusercontent.com/MemVerge/mmc.ai-setup/main/mysql-pre-setup.sh
-chmod +x mysql-pre-setup.sh
-./mysql-pre-setup.sh
+git clone https://github.com/MemVerge/mmc.ai-setup
+cd mmc.ai-setup
+./mmcai-setup.sh
 ```
 
-#### MMC.AI Cluster and Management Planes
-Download and run `mmcai-setup.sh` on the control plane node:
-``` bash
-wget -O logging.sh https://raw.githubusercontent.com/MemVerge/mmc.ai-setup/main/logging.sh
-wget -O mmcai-setup.sh https://raw.githubusercontent.com/MemVerge/mmc.ai-setup/main/mmcai-setup.sh
-chmod +x mmcai-setup.sh
-./mmcai-setup.sh
-# Answer prompts as needed.
+If MMC.AI Manager is installed, the MMC.AI dashboard should be accessible at `http://:32323`.
+
+If Kubeflow is installed, the following command should print the port for the Kubeflow Central Dashboard:
+
+```bash
+echo $(kubectl get svc istio-ingressgateway -n istio-system -o jsonpath='{.spec.ports[?(@.port==80)].nodePort}')
 ```
 
-Once deployed, the MMC.AI dashboard should be accessible at `http://:32323`.
+Using this port, the URL `http://:` will fetch the Kubeflow Central Dashboard, where `` is the IPv4 address of any node on the cluster.
+
+# Memory Machine AI Teardown Guide
+
+## Uninstalling Memory Machine AI
+
+### Ansible configuration
+
+In an inventory file (which can be named anything), configure the host group:
+- **`[mmai_database]`**
+  Memory Machine AI MySQL database (single or multiple) nodes. Databases on the specified nodes will be removed.
+  For example:
+  ```
+  [mmai_database]
+  ansible_host=
+  ansible_host=
+  ```
+
+This file will be used by the Memory Machine AI uninstallation script.
 
-# MMC.AI Teardown Guide
+### Memory Machine AI uninstallation script
 
 Download and run the interactive `mmcai-teardown.sh` script on the control plane node.
 
@@ -184,8 +190,7 @@ If you have nothing else installed in the cluster and want to remove everything,
 
 You will have a chance to confirm your changes after making your selections:
 ```bash
-wget -O logging.sh https://raw.githubusercontent.com/MemVerge/mmc.ai-setup/main/logging.sh
-wget -O mmcai-teardown.sh https://raw.githubusercontent.com/MemVerge/mmc.ai-setup/main/mmcai-teardown.sh
-chmod +x mmcai-teardown.sh
+git clone https://github.com/MemVerge/mmc.ai-setup
+cd mmc.ai-setup
 ./mmcai-teardown.sh
 ```
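
The "Ansible configuration" sections added in this change expect an inventory file that declares specific host groups (`[all]` and `[mmai_database]` for setup, `[mmai_database]` for teardown). A minimal sketch of a pre-flight check is shown below. The function name `check_inventory_groups` and the inventory path are hypothetical and are not part of the `mmc.ai-setup` repository. The check assumes each group header sits alone on its own line, with no trailing whitespace or `:children`/`:vars` suffix.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of mmc.ai-setup): verify that an
# Ansible inventory file declares every host group named on the command line.
# Usage: check_inventory_groups <inventory-file> <group>...
check_inventory_groups() {
    local inventory="$1"
    shift
    local missing=0
    local group
    for group in "$@"; do
        # -F treats the header as a literal string; -x requires a whole-line match.
        if ! grep -Fxq "[${group}]" "$inventory"; then
            echo "missing host group: [${group}]" >&2
            missing=1
        fi
    done
    # Returns 0 when every group was found, 1 otherwise.
    return "$missing"
}
```

Under these assumptions, running `check_inventory_groups ./inventory all mmai_database` before `./mmcai-setup.sh` would report a missing group before the interactive script starts, rather than partway through it.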