README doc review for root, cmd, and examples
nick-stroud committed May 20, 2022
Commit 21209fb (1 parent: 4e9e98a)
Showing 3 changed files with 43 additions and 47 deletions.
**README.md** (27 additions, 24 deletions)
HPC Toolkit is open-source software offered by Google Cloud which makes it
easy for customers to deploy HPC environments on Google Cloud.

HPC Toolkit allows customers to deploy turnkey HPC environments (compute,
networking, storage, etc.) following Google Cloud best practices, in a repeatable
manner. The HPC Toolkit is designed to be highly customizable and extensible,
and intends to address the HPC deployment needs of a broad range of customers.


## HPC Toolkit Components

The HPC Toolkit has been designed to simplify the process of deploying an HPC
cluster on Google Cloud. The block diagram below describes the individual
components of the HPC Toolkit.

```mermaid
graph LR
subgraph HPC Environment Configuration
A(1. Provided Blueprint Examples) --> B(2. HPC Blueprint)
end
B --> D
subgraph Creating an HPC Deployment
end
```

1. **Provided Blueprint Examples** – A set of vetted reference blueprints can be
found in the ./examples and ./community/examples directories. These can be
used to create a predefined deployment for a cluster or as a starting point
for creating a custom deployment.
2. **HPC Blueprint** – The primary interface to the HPC Toolkit is an HPC
Blueprint file. This is a YAML file that defines which modules to use and how
to customize them.
3. **HPC Modules** – The building blocks of a deployment directory are the
modules. Modules can be found in the ./modules and ./community/modules
directories. They are composed of Terraform, Packer and/or script files that
meet the expectations of the gHPC engine.
4. **gHPC Engine** – The gHPC engine converts the blueprint file into a
self-contained deployment directory, as sketched in the example after this
list.
5. **Deployment Directory** – A self-contained directory that can be used to
deploy a cluster onto Google Cloud. This is the output of the gHPC engine.
6. **HPC environment on GCP** – After deployment, an HPC environment will be
available on Google Cloud.
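
To make these pieces concrete, the sketch below shows the general shape of a
blueprint that the gHPC engine turns into a deployment directory. It is
illustrative only: the module IDs and variable values are placeholders, and the
exact schema and module paths should be checked against the vetted blueprints
in ./examples for your version of the Toolkit.

```yaml
# Hypothetical blueprint sketch; see ./examples for vetted, tested blueprints.
blueprint_name: demo-cluster

vars:
  project_id: my-gcp-project       # replace with your project ID
  deployment_name: demo-deployment
  region: us-central1
  zone: us-central1-c

deployment_groups:
- group: primary
  modules:
  # A new VPC network for the cluster.
  - id: network1
    source: modules/network/vpc
  # A Filestore instance mounted at /home, attached to the network above.
  - id: homefs
    source: modules/file-system/filestore
    use: [network1]
    settings:
      local_mount: /home
```
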
Many of the above examples are easily executed within a Cloud Shell environment.
Be aware that Cloud Shell has [several limitations][cloud-shell-limitations],
in particular an inactivity timeout that will close running shells after 20
minutes. Please consider it only for blueprints that are quickly deployed.

## Blueprint Warnings and Errors

To view the Cloud Billing reports for your Cloud Billing account:
1. In the Google Cloud Console, go to
[`Billing`](https://console.cloud.google.com/billing/overview).
2. At the prompt, choose the Cloud Billing account for which you'd like to view
reports. The Billing Overview page opens for the selected billing account.
3. In the Billing navigation menu, select `Reports`.

On the right side, expand the Filters view and then filter by label, specifying
the key `ghpc_deployment` (or `ghpc_blueprint`) and the desired value.
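
Because these labels are applied to the resources the Toolkit deploys, they can
also be used outside of the billing reports. A rough sketch with `gcloud` (the
project and deployment names are placeholders):

```shell
# List compute instances created by a particular Toolkit deployment.
gcloud compute instances list \
  --project my-gcp-project \
  --filter="labels.ghpc_deployment=demo-deployment"
```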

can be found in the [Slurm on Google Cloud User Guide][slurm-on-gcp-ug],
specifically the section titled "Create Service Accounts".

After creating the service account, it can be set via the
"compute_node_service_account" and "controller_service_account" settings on the
`compute_node_service_account` and `controller_service_account` settings on the
[slurm-on-gcp controller module][slurm-on-gcp-con] and the
"login_service_account" setting on the
[slurm-on-gcp login module][slurm-on-gcp-login].
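
As a hedged sketch of what this looks like in a blueprint (the module sources,
IDs, and service account email are placeholders; the setting names are the ones
mentioned above):

```yaml
# Hypothetical snippet: attach a pre-created service account to the Slurm modules.
- id: slurm_controller
  source: community/modules/scheduler/SchedMD-slurm-on-gcp-controller
  use: [network1, compute_partition]
  settings:
    controller_service_account: hpc-sa@my-gcp-project.iam.gserviceaccount.com
    compute_node_service_account: hpc-sa@my-gcp-project.iam.gserviceaccount.com

- id: slurm_login
  source: community/modules/scheduler/SchedMD-slurm-on-gcp-login-node
  use: [network1, slurm_controller]
  settings:
    login_service_account: hpc-sa@my-gcp-project.iam.gserviceaccount.com
```
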
Here are some common reasons for the deployment to fail:
* **Filestore resource limit:** When regularly deploying Filestore instances
with a new VPC you may see an error during deployment such as:
`System limit for internal resources has been reached`. See
[this doc](https://cloud.google.com/filestore/docs/troubleshooting#system_limit_for_internal_resources_has_been_reached_error_when_creating_an_instance)
for the solution.
* **Required permission not found:**
* Example: `Required 'compute.projects.get' permission for 'projects/... forbidden`
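
One way to address a missing permission such as the example above is to grant a
role containing it to the account performing the deployment. This is only a
sketch: the project, member, and role are placeholders to adapt to your own IAM
policy (`roles/compute.viewer` includes `compute.projects.get`).

```shell
gcloud projects add-iam-policy-binding my-gcp-project \
  --member="user:you@example.com" \
  --role="roles/compute.viewer"
```
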
List of dependencies:
* make
* git

### macOS Additional Dependencies

When building the ghpc binary on a Mac there may be some special considerations.

When you call `make` for the first time you may be asked to install the Xcode
developer tools. Alternatively, you can build `ghpc` directly using
`go build ghpc.go`.

If you choose to use `make`, you should install `coreutils` and `findutils`,
which are available from common package managers on macOS such as Homebrew,
MacPorts, and conda.
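
A rough sketch of the workflow using Homebrew (package names assume Homebrew;
adjust for your package manager):

```shell
# Install the build dependencies, then build ghpc from source.
brew install coreutils findutils go
git clone https://github.com/GoogleCloudPlatform/hpc-toolkit
cd hpc-toolkit
make
./ghpc --version
```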

### Notes on Packer

The Toolkit supports Packer templates in the contemporary [HCL2 file
format][pkrhcl2] and not in the legacy JSON file format. We require the use of
**cmd/README.md** (1 addition, 1 deletion)
## ghpc

`ghpc` is the tool used by Cloud HPC Toolkit to create deployments of HPC
clusters; it is also referred to as the gHPC Engine.

### Usage - ghpc

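Run `./ghpc --help` for the full command reference. As a rough sketch of the
typical flow (the blueprint path, deployment name, and group name are
placeholders that depend on your blueprint and Toolkit version):

```shell
# Turn a blueprint into a self-contained deployment directory ...
./ghpc create examples/hpc-cluster-small.yaml
# ... then deploy it with Terraform from the group subdirectory it reports.
cd <deployment_name>/primary
terraform init
terraform apply
```
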
**examples/README.md** (15 additions, 22 deletions)

## Instructions

Ensure the `project_id`, `zone`, and `region` deployment variables are set
correctly under `vars` before using an example blueprint.

> **_NOTE:_** Deployment variables defined under `vars` are automatically passed
> to modules if the modules have an input that matches the variable name.
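
As a small illustrative sketch of such a `vars` block (all values are
placeholders), any module with an input named `project_id`, `region`, or `zone`
then receives these values without repeating them in its settings:

```yaml
vars:
  project_id: my-gcp-project
  deployment_name: demo-deployment
  region: us-central1
  zone: us-central1-c
```
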
The community blueprints are contributed by the community (including the HPC
Toolkit team, partners, etc.) and are labeled with the community badge
(![community-badge]). The community blueprints are located in the
[community folder](../community/examples/).

Blueprints that are still in development and less stable are also labeled with
the experimental badge (![experimental-badge]).

### [hpc-cluster-small.yaml] ![core-badge]

Creates a basic auto-scaling Slurm cluster with mostly default settings. The
blueprint also creates a new VPC network and a Filestore instance mounted at
`/home`.


### [hpc-cluster-high-io.yaml] ![core-badge]

Creates a Slurm cluster with tiered file systems for higher performance. It
connects to the default VPC of the project and creates two partitions and a
login node.


### [hpc-cluster-intel-select.yaml] ![community-badge]

This example provisions a Slurm cluster automating the [steps to comply with the
Intel Select Solutions for Simulation & Modeling Criteria][intelselect]. It is
more extensively discussed in a dedicated [README for Intel
examples][intel-examples-readme].

### [spack-gromacs.yaml] ![community-badge] ![experimental-badge]

Spack is an HPC software package manager. This example creates a small Slurm
cluster with software installed using the
[spack-install module](../community/modules/scripts/spack-install/README.md). The
controller will install and configure spack, and install gromacs in a shared
location (/sw) via filestore. This build leverages the startup-script module
and can be applied in any cluster by using the output of spack-install or
startup-script modules.

The installation will occur as part of the Slurm startup script; a warning
message will be displayed upon SSHing to the login node indicating
that configuration is still active. To track the status of the overall
startup script, run the following command on the login node:
```shell
sudo tail -f /var/log/spack.log
```

Once the Slurm and Spack configuration is complete, spack will be available on
the login node. To use spack in the controller or compute nodes, the following
command must be run first:

```shell
# Assumed path: Spack's setup script under the shared /sw install location used
# by this example; adjust to the install location set in your blueprint.
source /sw/spack/share/spack/setup-env.sh
spack load gromacs
```
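
As a rough usage sketch once the package is loaded (the `gmx_mpi` binary name
and the `srun` flags are assumptions, not taken from the example blueprint):

```shell
# Run GROMACS through Slurm on a compute node.
srun -N 1 -n 1 gmx_mpi --version
```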

> **_NOTE:_** Installing spack compilers and libraries in this example can take
> hours to run on startup. To decrease this time in future deployments, consider
> including a spack build cache as described in the comments of the example.

[spack-gromacs.yaml]: ../community/examples/spack-gromacs.yaml


### Literal Variables

Literal variables are not interpreted by `ghpc` directly, but rather embedded in the
underlying module. Literal variables should only be used by those familiar
with the underlying module technology (Terraform or Packer); no validation
will be done before deployment to ensure that they are referencing
something that exists.

Literal variables are occasionally needed when referring to the data structure
of the underlying module. For example, to refer to the subnetwork self link from
a `vpc` module through Terraform itself:

```yaml
subnetwork_self_link: ((module.network1.primary_subnetwork.self_link))
```
