first steps at getting started docs

acrlabs · Mar 1, 2024 · bbf48c8
1 parent 0d96e54
commit bbf48c8
Showing 15 changed files with 547 additions and 81 deletions.
diff --git a/README.md b/README.md
@@ -30,69 +30,16 @@ This package provides the following components:
 
 [![Watch the video](https://img.youtube.com/vi/Q1XpH1H4It8/hqdefault.jpg)](https://www.youtube.com/watch?v=Q1XpH1H4It8)
 
-## Installation
+## Documentation
 
-### Prerequisites
+Full [documentation for SimKube](https://appliedcomputing.io/docs/simkube/index.html) is available on Applied
+Computing's website.  Here are some quick links to select topics:
 
-The following prereqs are required for all components:
-
-- Rust >= 1.71 (needed if you want to build outside of Docker)
-- Docker
-- kubectl >= 1.27
-- Kubernetes >= 1.27
-
-Additional prerequisites are necessary for your simulation cluster:
-
-- [KWOK](https://kwok.sigs.k8s.io) >= 0.4.0
-- [CertManager](https://cert-manager.io) for setting up mutating webhook certificates
-- [The Promtheus operator](https://github.com/prometheus-operator/prometheus-operator); we recommend configuring this
-  via the [kube-prometheus](https://github.com/prometheus-operator/kube-prometheus) project
-
-### Optional Prerequisites
-
-SimKube uses [🔥Config](https://github.com/acrlabs/fireconfig) to generate Kubernetes manifests from definitions located
-in `./k8s/`.  If you want to use this mechanism for generating Kubernetes manifests, you will need to install the
-following additional dependencies:
-
-- Python 3.10
-- Python Poetry (https://python-poetry.org/docs/)
-
-Additionally, if you want to run SimKube on a local development cluster, [kind](https://kind.sigs.k8s.io) >= 0.19 is the
-supported tooling for doing so.
-
-If you want to test autoscaling, SimKube currently supports either the [Kubernetes Cluster Autoscaler](https://github.com/kubernetes/autoscaler)
-or [Karpenter](https://karpenter.sh).  You will need to install and configure these applications to use the
-corresponding KWOK provider.  For the Kubernetes Cluster Autoscaler, a KWOK [cloud provider](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler/cloudprovider/kwok)
-is available, and for Karpenter, a basic [KWOK provider](https://github.com/kubernetes-sigs/karpenter/tree/main/kwok) is
-used.  Configuring these applications in your environment is beyond the scope of this documentation.
-
-If you intend to save metrics or data from a simulation, you will need to configure Prometheus with a [remote write
-endpoint](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write).  One option here is
-the [prom2parquet writer](https://github.com/acrlabs/prom2parquet).  See [the docs](docs/sk-ctrl.md) for more
-information on how to set this up.
-
-### Building
-
-To build all SimKube artifacts for the first time run:
-
-- `git submodule init && git submodule update`
-- `make build` from the root of this repository.
-
-For all subsequent builds of SimKube artifacts, run only `make build` from the root of this repository.
-
-### Docker images
-
-To build and push Docker images for all the artifacts, run `DOCKER_REGISTRY=path_to_your_registry:5000 make image`
-
-### Deploying
-
-You will need a KUBECONFIG file with cluster admin permissions; `make run` will use 🔥Config to generate the Kubernetes
-manifests and deploy all SimKube artifacts to the specified cluster.
-
-### Cleaning up
-
-All build artifacts are placed in the `.build/` directory.  You can remove this directory or run `make clean` to clean
-up.
+- [Installation](https://appliedcomputing.io/docs/simkube/intro/installation.html)
+- [Autoscaling](http://appliedcomputing.io/docs/simkube/adv/autoscaling.html)
+- [Metrics Collection](http://appliedcomputing.io/docs/simkube/adv/metrics.html)
+- [Component Reference](http://appliedcomputing.io/docs/simkube/sk-ctrl.html)
+- [Developing SimKube](http://appliedcomputing.io/docs/simkube/dev/contributing.html)
 
 ## Contributing
 
@@ -121,7 +68,3 @@ default](https://docs.github.com/en/site-policy/github-terms/github-terms-of-ser
 > Due to the uncertain nature of copyright and IP law, this repository does not accept contributions that have been all
 > or partially generated with GitHub Copilot or other LLM-based code generation tools.  Please disable any such tools
 > before authoring changes to this project.
-
-### Contributor's Guide
-
-Please see the [Contributor's Guide](./docs/contributing.md) for more information on setting up and building SimKube.
diff --git a/ctrl/controller.rs b/ctrl/controller.rs
@@ -199,7 +199,6 @@ pub(super) async fn cleanup(ctx: &SimulationContext, sim: &Simulation) {
         }
     }
     if let Err(e) = prom_api.delete(&ctx.prometheus_name, &Default::default()).await {
-        println!("{e:?}");
         if matches!(e, Api(ErrorResponse { code: 404, .. })) {
             warn!("prometheus object not found; maybe already cleaned up?");
         } else {

diff --git a/docs/adv/autoscaling.md b/docs/adv/autoscaling.md
@@ -0,0 +1,92 @@
+<!--
+project: SimKube
+template: docs.html
+-->
+
+# Autoscaling with SimKube
+
+When running your simulations, you probably don't want to have to manually create a bunch of KWOK nodes every time.
+Fortunately, both [Cluster Autoscaler](https://github.com/kubernetes/autoscaler) and [Karpenter](https://karpenter.sh)
+(the two most popular cluster autoscalers for Kubernetes) support KWOK, which means you can have them autoscale your
+simulated cluster so you don't have to manually create virtual nodes.
+
+## Cluster Autoscaler instructions
+
+You will need to run Cluster Autoscaler with the `--cloud-provider kwok` argument.  The KWOK Cluster Autoscaler provider
+expects two ConfigMaps to be present; the first tells the KWOK cloudprovider what to use for its Cluster Autoscaler
+NodeGroups:
+
+```yaml
+# provider-config.yml
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: kwok-provider-config
+  namespace: kube-system
+data:
+  config: |
+    ---
+    apiVersion: v1alpha1
+    readNodesFrom: configmap
+    nodegroups:
+      fromNodeLabelKey: "kwok-nodegroup"
+    configmap:
+      name: kwok-provider-templates
+```
+
+The second enumerates the node types that the KWOK cloudprovider supports:
+
+```yaml
+# provider-templates.yml
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: kwok-provider-templates
+  namespace: kube-system
+data:
+  templates: |
+    ---
+    apiVersion: v1
+    kind: List
+    items:
+      - apiVersion: v1
+        kind: Node
+        metadata:
+          annotations:
+            kwok.x-k8s.io/node: fake
+          labels:
+            kwok-nodegroup: node-group-1
+            node.kubernetes.io/instance-type: c5d.9xlarge
+            topology.kubernetes.io/zone: us-west-1a
+            type: virtual
+        status:
+          allocatable:
+            cpu: 31
+            ephemeral-storage: 900Gi
+            memory: 71Gi
+            pods: 110
+          capacity:
+            cpu: 36
+            ephemeral-storage: 900Gi
+            memory: 72Gi
+            pods: 110
+```
+
+The KWOK cloudprovider will automatically apply a `kwok-provider=true:NoSchedule` taint to the nodes it generates.
+SimKube will likewise apply the corresponding toleration to the virtual pods it creates.
+
+For more information on running and configuring KWOK for Cluster Autoscaler, see the
+[README](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler/cloudprovider/kwok).
+
+## Karpenter instructions
+
+The core [Karpenter repo](https://github.com/kubernetes-sigs/karpenter) includes a KWOK provider for Karpenter.  There
+are some initial instructions in there for installing the Karpenter+KWOK binary into your cluster.  Once it's installed,
+it will automatically use KWOK to scale up nodes in the cluster just like Cluster Autoscaler.  As with Cluster
+Autoscaler, KWOK applies the `kwok-provider=true:NoSchedule` taint to the nodes it creates.
+
+> [!NOTE]
+> Unlike Cluster Autoscaler, Karpenter does not take in a list of Kubernetes Node specs to determine what instances it
+> launches.  Instead, it uses a hard-coded list of "generic" instance types which roughly map to standard instance
+> offerings by the major cloud providers.  There is an [open PR](https://github.com/kubernetes-sigs/karpenter/pull/1048)
+> to enable configuring node types via an injected file.
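
The taint and toleration relationship described in the autoscaling docs above can be illustrated with a small, self-contained sketch. It follows the standard Kubernetes matching rules (an unset key with `Exists` matches every taint, an unset effect matches every effect). The `Taint`, `Toleration`, `parse_taint`, and `tolerates` names are hypothetical and written only for this illustration; they are not SimKube's or KWOK's actual types, and the toleration SimKube really applies is not shown in this commit.

```rust
/// A node taint such as `kwok-provider=true:NoSchedule`.
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, PartialEq)]
struct Taint {
    key: String,
    value: String,
    effect: String,
}

/// Toleration operators defined by Kubernetes.
#[derive(Debug, Clone, PartialEq)]
enum Operator {
    Equal,
    Exists,
}

/// A pod toleration. `None` for `key` or `effect` acts as a wildcard.
#[derive(Debug, Clone, PartialEq)]
struct Toleration {
    key: Option<String>,
    operator: Operator,
    value: Option<String>,
    effect: Option<String>,
}

/// Parse the `key=value:effect` shorthand (the value part is optional).
fn parse_taint(s: &str) -> Option<Taint> {
    let (kv, effect) = s.rsplit_once(':')?;
    let (key, value) = kv.split_once('=').unwrap_or((kv, ""));
    if key.is_empty() || effect.is_empty() {
        return None;
    }
    Some(Taint {
        key: key.to_string(),
        value: value.to_string(),
        effect: effect.to_string(),
    })
}

/// Return true if `tol` tolerates `taint` under the Kubernetes rules.
fn tolerates(tol: &Toleration, taint: &Taint) -> bool {
    // An explicit effect must match; an unset effect matches any effect.
    if let Some(effect) = &tol.effect {
        if effect != &taint.effect {
            return false;
        }
    }
    // An explicit key must match; an unset key is only valid with `Exists`.
    match &tol.key {
        Some(k) if k != &taint.key => return false,
        None if tol.operator != Operator::Exists => return false,
        _ => {}
    }
    match tol.operator {
        Operator::Exists => true,
        Operator::Equal => tol.value.as_deref().unwrap_or("") == taint.value.as_str(),
    }
}
```

Under these assumptions, a toleration with key `kwok-provider`, operator `Equal`, value `true`, and effect `NoSchedule` passes `tolerates` for the taint parsed from `kwok-provider=true:NoSchedule`, while the same toleration with value `false` does not.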
diff --git a/docs/adv/metrics.md b/docs/adv/metrics.md
@@ -0,0 +1,8 @@
+<!--
+project: SimKube
+template: docs.html
+-->
+
+# Metrics Collection and Data Analysis
+
+Coming Soon
diff --git a/docs/sk-ctrl.md b/docs/components/sk-ctrl.md
diff --git a/docs/sk-driver.md b/docs/components/sk-driver.md
@@ -14,7 +14,7 @@ selectors, and tolerations to ensure that the simulated pods end up on virtual n
 
 ```
 Usage: sk-driver [OPTIONS] --sim-name <SIM_NAME> --sim-root <SIM_ROOT> --virtual-ns-prefix <VIRTUAL_NS_PREFIX> \
-       --cert-path <CERT_PATH> --key-path <KEY_PATH> --trace-path <TRACE_PATH>
+    --cert-path <CERT_PATH> --key-path <KEY_PATH> --trace-mount-path <TRACE_MOUNT_PATH>
 
 Options:
       --sim-name <SIM_NAME>
@@ -23,7 +23,7 @@ Options:
       --admission-webhook-port <ADMISSION_WEBHOOK_PORT>  [default: 8888]
       --cert-path <CERT_PATH>
       --key-path <KEY_PATH>
-      --trace-path <TRACE_PATH>
+      --trace-mount-path <TRACE_MOUNT_PATH>
   -v, --verbosity <VERBOSITY>                            [default: info]
   -h, --help                                             Print help
 ```

diff --git a/docs/sk-tracer.md b/docs/components/sk-tracer.md
diff --git a/docs/skctl.md b/docs/components/skctl.md
@@ -110,7 +110,11 @@ that isn't accepted or is parsed incorrectly, please [file an issue](https://git
 ```
 run a simulation
 
-Usage: skctl run [OPTIONS] --name <NAME>
+Usage: skctl run [OPTIONS] --name <NAME> [DURATION]
+
+Arguments:
+  [DURATION]
+          duration of the simulation
 
 Options:
       --name <NAME>
@@ -121,6 +125,16 @@ Options:
 
           [default: simkube]
 
+      --metrics-namespace <METRICS_NAMESPACE>
+          namespace to launch monitoring utilities in
+
+          [default: monitoring]
+
+      --metrics-service-account <METRICS_SERVICE_ACCOUNT>
+          service account with monitoring permissions
+
+          [default: prometheus-k8s]
+
       --trace-file <TRACE_FILE>
           location of the trace file for sk-driver to read
 
@@ -136,11 +150,9 @@ Options:
 ## skctl snapshot
 
 ```
-Usage: skctl snapshot [OPTIONS] --config-file <CONFIG_FILE> <TRACE_DURATION>
+take a point-in-time snapshot of a cluster (does not require sk-tracer to be running)
 
-Arguments:
-  <TRACE_DURATION>
-          duration of the generated trace file
+Usage: skctl snapshot [OPTIONS] --config-file <CONFIG_FILE>
 
 Options:
   -c, --config-file <CONFIG_FILE>

diff --git a/docs/api_changes.md b/docs/dev/api_changes.md
diff --git a/docs/contributing.md b/docs/dev/contributing.md
diff --git a/docs/faq.md b/docs/faq.md
@@ -0,0 +1,8 @@
+<!--
+project: SimKube
+template: docs.html
+-->
+
+# Frequently Asked Questions
+
+Coming Soon
diff --git a/docs/intro/concepts.md b/docs/intro/concepts.md
@@ -0,0 +1,45 @@
+<!--
+project: SimKube
+template: docs.html
+-->
+
+# SimKube Concepts
+
+SimKube is designed to allow users to simulate the behaviour of Kubernetes control plane components in a safe, isolated
+local environment.  It is a "record-and-replay" simulator, which means that users can record the behaviour of a production
+cluster, save that data, and replay it later in a simulated environment for analysis.  Below we describe some of the key
+concepts of SimKube.
+
+## What components are simulated?
+
+Typically when we talk about the Kubernetes control plane, we are talking about the API server, scheduler, and
+controller manager.  SimKube expands this definition to include anything that can impact the behaviour of a cluster,
+including projects like Cluster Autoscaler, descheduler, and others.
+
+SimKube accomplishes this by running in a cluster with a real control plane; however, all pod behaviours are mocked out
+using [Kubernetes WithOut Kubelet (KWOK)](https://kwok.sigs.k8s.io).  This means that anything that happens _inside_ a
+pod is effectively out of scope of the simulation.  KWOK does have utilities to mock out some aspects of pod lifecycle,
+but these are not (currently) supported by SimKube.  Crucially, this means that simulations that rely on the Horizontal
+Pod Autoscaler (for example) will not currently work.
+
+Note also that, unlike some simulation solutions, we are not mocking out any aspects of the control plane.  This means
+that simulations of cluster behaviour take place in real-time, and we do not have any hooks into or control over what
+messages are seen by various control plane components.  Thus, running the exact same simulation repeatedly may yield
+different results on each run, depending on timing fluctuations and other challenges of distributed systems.
+
+## How does it work?
+
+SimKube has a number of components that it uses to record data and run simulations:
+
+- _Tracer_: The `sk-tracer` program is a lightweight pod that runs in the cluster you wish to record the behaviour of.
+  It saves cluster events into an in-memory event stream, that is, a timeline of "important" changes in the cluster.  You
+  can configure `sk-tracer` to tell it what events you consider important.  Subsets of this event stream can be saved
+  into a _trace file_, which can be replayed later in a simulated environment.
+- _Controller_: The simulation controller `sk-ctrl` runs in a separate Kubernetes cluster and is responsible for setting
+  up simulations in that cluster.  The separate cluster can be a local cluster running on your laptop using
+  [kind](https://kind.sigs.k8s.io), or it can be a test cluster running in the cloud.  The only requirement is that the
+  simulation cluster must have all the components present that you wish to simulate.  The controller watches for `Simulation`
+  custom resources to configure and start a new simulation run.  It sets up metrics collection and other required tools,
+  and then creates a simulation driver Job, which actually reads the specified trace file and replays the events within
+  against the simulated cluster.
+- _CLI_: SimKube comes with a CLI utility called `skctl`, which can be used to export trace data from the cluster under
+  study, as well as to run new simulations in your simulated environment.
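
To make the record-and-replay idea in the concepts doc above more concrete, here is a minimal sketch of the two steps it describes: cutting a subset out of a recorded event timeline, and working out how long a driver would wait between events so the replay keeps the original real-time spacing. The `TraceEvent` type, its fields, and both function names are assumptions made up for this illustration; the actual SimKube trace format and driver logic are not shown in this commit and may differ.

```rust
use std::time::Duration;

/// One recorded change in the cluster. Hypothetical type for illustration;
/// `ts` is the recording time in seconds since the Unix epoch.
#[derive(Debug, Clone, PartialEq)]
struct TraceEvent {
    ts: u64,
    description: String,
}

/// Select the events recorded in the half-open window `[start, end)`,
/// sorted by timestamp. This mirrors saving a subset of the event stream
/// into a trace file.
fn export_trace(stream: &[TraceEvent], start: u64, end: u64) -> Vec<TraceEvent> {
    let mut trace: Vec<TraceEvent> = stream
        .iter()
        .filter(|e| e.ts >= start && e.ts < end)
        .cloned()
        .collect();
    trace.sort_by_key(|e| e.ts);
    trace
}

/// For each event in a timestamp-sorted trace, compute how long to wait
/// after the previous event before applying it. The first event is applied
/// immediately, so the replay preserves the gaps between events rather than
/// the absolute recording times.
fn replay_delays(trace: &[TraceEvent]) -> Vec<Duration> {
    let mut delays = Vec::with_capacity(trace.len());
    let mut previous: Option<u64> = None;
    for event in trace {
        let gap = match previous {
            Some(prev) => event.ts.saturating_sub(prev),
            None => 0,
        };
        delays.push(Duration::from_secs(gap));
        previous = Some(event.ts);
    }
    delays
}
```

A driver built on this sketch would call `std::thread::sleep` with each delay before applying the matching event. Because the control plane itself is real and runs in real time, two replays of the same trace can still diverge, as the concepts doc notes.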