Merge pull request #15 from jameshcorbett/dws-directive-breakdown
DWS: end-to-end error-free interactions
mergify[bot] authored Jan 31, 2023
2 parents 5caa6f7 + dd34628 commit a3c86f8
Showing 57 changed files with 2,970 additions and 2,381 deletions.
12 changes: 0 additions & 12 deletions .github/workflows/main.yml
@@ -38,18 +38,6 @@ jobs:
          ref: ${{ github.event.pull_request.head.sha }}
          fetch-depth: 0

      - name: Create k8s Kind Cluster
        uses: helm/[email protected]
        with:
          config: ./kind/kind-config.yaml

      - name: Install CRDs
        run: |
          # Add the NNF CRD
          kubectl apply -f ./k8s/NearNodeFlash.yaml
          # Add all of the NNF instances
          kubectl apply -f ./k8s/
      - name: docker-run-checks
        run: ./src/test/docker/docker-run-checks.sh -i ${{ matrix.image }} --

16 changes: 10 additions & 6 deletions README.md
@@ -8,11 +8,11 @@ communicate with the developers.
Welcome to Flux-specific plugins and scripts for the
DOE [CORAL2](https://procurement.ornl.gov/rfp/CORAL2/) systems.

Flux CORAL2 consists (or will consist) of several plugins, including:
Flux CORAL2 consists of several plugins, including:
- A Fluxion resource match plugin to make performant selections of compute and [storage nodes](https://www.hpcwire.com/2021/02/18/livermores-el-capitan-supercomputer-hpe-rabbit-storage-nodes/)
- `job-shell` plugin to support Cray MPI bootstrap via libpals
- `job-manager` jobtap plugin to insert a job dependency on `#DW` string validation and translation
- Script to validate `#DW` strings via the DataWarp Services (DWS) K8s API and then insert translated rules into user's submitted jobspec
- Script to validate `#DW` strings via the Data Workflow Services (DWS) K8s API and then insert translated rules into user's submitted jobspec
- Script to generate a Fluxion JGF file from DWS inventory API
- `job-exec` plugin to transition jobs through the DWS job lifecycle and passthrough `DW_*` env vars to job environment
- flux-core `resource` module plugin to track inventory status changes via the DWS API
@@ -23,12 +23,16 @@ Flux CORAL2 requires an installed flux-core and flux-sched package. Instruction
for building/accessing these packages can be found in
[Flux's documentation](https://flux-framework.readthedocs.io/en/latest/quickstart.html#building-the-code).

### Running Flux CORAL2
### Building Data Workflow Services (DWS)

Flux CORAL2 requires a K8s server with the DWS CRDs and the NNF objects contained in `./k8s`.

Flux CORAL2 requires a K8s server with the NNF CRD and the NNF objects contained in `./k8s`.
When developing locally, the best way to achieve this is to use the NNF team's [nnf-deploy](github.com/NearNodeFlash/nnf-deploy) tool. Follow the instructions in the readme to build the tool and to [deploy it to a kind cluster](github.com/NearNodeFlash/nnf-deploy#kind-cluster). You will need to have [go](https://go.dev/) and [kind](https://kind.sigs.k8s.io/) installed.

When developing locally, the suggested way to achieve this is to start a [kind](https://kind.sigs.k8s.io/) cluster with `kind create cluster --config=./kind/kind-config.yaml`. You can then install the CRD with `kubectl apply -f ./k8s/NearNodeFlash.yaml` and the objects with `kubectl apply -f ./k8s/`.
The next step is to add the objects to your cluster with `kubectl apply -f ./k8s`.

### Running Flux CORAL2

Once you have a k8s server up and running and the proper credentials in your `~/.kube/config` file, you can then launch a container that runs the full testsuite with `./src/test/docker/docker-run-checks.sh --`. This script not only builds a test container on top of the flux-sched image and running the testsuite, but it also handles mounting and tweaking the k8s credentials on your host to work within the container. You can also use the container interactively by passing the `-I` flag to `docker-run-checks.sh`.
Once you have a k8s server up and running and the proper credentials in your `~/.kube/config` file, you can launch a container that runs the full testsuite with `./src/test/docker/docker-run-checks.sh --`. This script not only builds a test container on top of the flux-sched image and running the testsuite, but it also handles mounting and tweaking the k8s credentials on your host to work within the container. You can also use the container interactively by passing the `-I` flag to `docker-run-checks.sh`.

Note: Some of the tests interact with a single, shared K8s cluster. While the tests as a whole are designed to be idempotent, their interactions with K8s are not atomic. Thus it is advised that the testsuite be run serially (e.g., `make check`) as opposed to in parallel (e.g., `make -j check`) to avoid tests conflicting with one another's expected K8s state.
40 changes: 0 additions & 40 deletions k8s/NearNodeFlash.yaml

This file was deleted.

51 changes: 51 additions & 0 deletions k8s/flux-test-storage0.yaml
@@ -0,0 +1,51 @@
apiVersion: dws.cray.hpe.com/v1alpha1
spec:
  state: Enabled
status:
  access:
    computes:
    - name: Compute0
      status: Ready
    - name: Compute1
      status: Ready
    - name: Compute2
      status: Ready
    - name: Compute3
      status: Ready
    - name: Compute4
      status: Ready
    - name: Compute5
      status: Ready
    - name: Compute6
      status: Ready
    - name: Compute7
      status: Ready
    - name: Compute8
      status: Ready
    - name: Compute9
      status: Ready
    - name: Compute10
      status: Ready
    - name: Compute11
      status: Ready
    - name: Compute12
      status: Ready
    - name: Compute13
      status: Ready
    - name: Compute14
      status: Ready
    - name: Compute15
      status: Ready
    protocol: PCIe
    servers:
    - name: Rabbit
      status: Ready
  capacity: 39582418599936
  status: Ready
  type: NVMe
kind: Storage
metadata:
  labels:
    dws.cray.hpe.com/storage: Rabbit
  name: flux-test-storage0
  namespace: default
51 changes: 51 additions & 0 deletions k8s/flux-test-storage1.yaml
@@ -0,0 +1,51 @@
apiVersion: dws.cray.hpe.com/v1alpha1
spec:
  state: Enabled
status:
  access:
    computes:
    - name: Compute16
      status: Ready
    - name: Compute17
      status: Ready
    - name: Compute18
      status: Ready
    - name: Compute19
      status: Ready
    - name: Compute20
      status: Ready
    - name: Compute21
      status: Ready
    - name: Compute22
      status: Ready
    - name: Compute23
      status: Ready
    - name: Compute24
      status: Ready
    - name: Compute25
      status: Ready
    - name: Compute26
      status: Ready
    - name: Compute27
      status: Ready
    - name: Compute28
      status: Ready
    - name: Compute29
      status: Ready
    - name: Compute30
      status: Ready
    - name: Compute31
      status: Ready
    protocol: PCIe
    servers:
    - name: Rabbit
      status: Ready
  capacity: 39582418599936
  status: Ready
  type: NVMe
kind: Storage
metadata:
  labels:
    dws.cray.hpe.com/storage: Rabbit
  name: flux-test-storage1
  namespace: default
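
The two `Storage` manifests above follow a regular pattern: each lists one `Rabbit` server, 16 compute nodes numbered consecutively, and the same capacity. A minimal Python sketch of that pattern follows. It is not part of this commit; `make_storage` and `summarize` are hypothetical helper names, and the nesting under `status` is an assumption, since the diff view does not preserve indentation.

```python
# Hypothetical helpers, not part of the commit. They rebuild the shape of
# k8s/flux-test-storage0.yaml and k8s/flux-test-storage1.yaml as plain dicts.
# Assumption: "access", "capacity", "status" and "type" all sit under the
# top-level "status" key, and "protocol"/"servers" sit under "access".

COMPUTES_PER_RABBIT = 16          # Compute0-15 in storage0, Compute16-31 in storage1
CAPACITY_BYTES = 39582418599936   # capacity value taken from the manifests


def make_storage(index):
    """Return a dict mirroring k8s/flux-test-storage<index>.yaml."""
    first = index * COMPUTES_PER_RABBIT
    computes = [
        {"name": f"Compute{n}", "status": "Ready"}
        for n in range(first, first + COMPUTES_PER_RABBIT)
    ]
    return {
        "apiVersion": "dws.cray.hpe.com/v1alpha1",
        "kind": "Storage",
        "metadata": {
            "labels": {"dws.cray.hpe.com/storage": "Rabbit"},
            "name": f"flux-test-storage{index}",
            "namespace": "default",
        },
        "spec": {"state": "Enabled"},
        "status": {
            "access": {
                "computes": computes,
                "protocol": "PCIe",
                "servers": [{"name": "Rabbit", "status": "Ready"}],
            },
            "capacity": CAPACITY_BYTES,
            "status": "Ready",
            "type": "NVMe",
        },
    }


def summarize(storage):
    """Return (name, number of compute nodes, capacity in TiB)."""
    status = storage["status"]
    return (
        storage["metadata"]["name"],
        len(status["access"]["computes"]),
        status["capacity"] / 2**40,
    )


if __name__ == "__main__":
    for i in range(2):
        print(summarize(make_storage(i)))
```

Under these assumptions, the capacity of 39582418599936 bytes works out to exactly 36 TiB per storage object.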
28 changes: 0 additions & 28 deletions k8s/nnf-x01c1.yaml

This file was deleted.

28 changes: 0 additions & 28 deletions k8s/nnf-x01c2.yaml

This file was deleted.

28 changes: 0 additions & 28 deletions k8s/nnf-x01c3.yaml

This file was deleted.

28 changes: 0 additions & 28 deletions k8s/nnf-x01c4.yaml

This file was deleted.

28 changes: 0 additions & 28 deletions k8s/nnf-x01c5.yaml

This file was deleted.

28 changes: 0 additions & 28 deletions k8s/nnf-x01c6.yaml

This file was deleted.
