first steps at getting started docs
drmorr0 committed Mar 1, 2024
1 parent 0d96e54 commit bbf48c8
Showing 15 changed files with 547 additions and 81 deletions.
73 changes: 8 additions & 65 deletions README.md
@@ -30,69 +30,16 @@ This package provides the following components:

[![Watch the video](https://img.youtube.com/vi/Q1XpH1H4It8/hqdefault.jpg)](https://www.youtube.com/watch?v=Q1XpH1H4It8)

## Installation
## Documentation

### Prerequisites
Full [documentation for SimKube](https://appliedcomputing.io/docs/simkube/index.html) is available on Applied
Computing's website. Here are some quick links to select topics:

The following prereqs are required for all components:

- Rust >= 1.71 (needed if you want to build outside of Docker)
- Docker
- kubectl >= 1.27
- Kubernetes >= 1.27

Additional prerequisites are necessary for your simulation cluster:

- [KWOK](https://kwok.sigs.k8s.io) >= 0.4.0
- [CertManager](https://cert-manager.io) for setting up mutating webhook certificates
- [The Prometheus operator](https://github.com/prometheus-operator/prometheus-operator); we recommend configuring this
via the [kube-prometheus](https://github.com/prometheus-operator/kube-prometheus) project

### Optional Prerequisites

SimKube uses [🔥Config](https://github.com/acrlabs/fireconfig) to generate Kubernetes manifests from definitions located
in `./k8s/`. If you want to use this mechanism for generating Kubernetes manifests, you will need to install the
following additional dependencies:

- Python 3.10
- Python Poetry (https://python-poetry.org/docs/)

Additionally, if you want to run SimKube on a local development cluster, [kind](https://kind.sigs.k8s.io) >= 0.19 is the
supported tooling for doing so.

If you want to test autoscaling, SimKube currently supports either the [Kubernetes Cluster Autoscaler](https://github.com/kubernetes/autoscaler)
or [Karpenter](https://karpenter.sh). You will need to install and configure these applications to use the
corresponding KWOK provider. For the Kubernetes Cluster Autoscaler, a KWOK [cloud provider](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler/cloudprovider/kwok)
is available, and for Karpenter, a basic [KWOK provider](https://github.com/kubernetes-sigs/karpenter/tree/main/kwok) is
used. Configuring these applications in your environment is beyond the scope of this documentation.

If you intend to save metrics or data from a simulation, you will need to configure Prometheus with a [remote write
endpoint](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write). One option here is
the [prom2parquet writer](https://github.com/acrlabs/prom2parquet). See [the docs](docs/sk-ctrl.md) for more
information on how to set this up.

### Building

To build all SimKube artifacts for the first time run:

- `git submodule init && git submodule update`
- `make build` from the root of this repository.

For all subsequent builds of SimKube artifacts, run only `make build` from the root of this repository.

### Docker images

To build and push Docker images for all the artifacts, run `DOCKER_REGISTRY=path_to_your_registry:5000 make image`

### Deploying

You will need a KUBECONFIG file with cluster admin permissions; `make run` will use 🔥Config to generate the Kubernetes
manifests and deploy all SimKube artifacts to the specified cluster.

### Cleaning up

All build artifacts are placed in the `.build/` directory. You can remove this directory or run `make clean` to clean
up.
- [Installation](https://appliedcomputing.io/docs/simkube/intro/installation.html)
- [Autoscaling](http://appliedcomputing.io/docs/simkube/adv/autoscaling.html)
- [Metrics Collection](http://appliedcomputing.io/docs/simkube/adv/metrics.html)
- [Component Reference](http://appliedcomputing.io/docs/simkube/sk-ctrl.html)
- [Developing SimKube](http://appliedcomputing.io/docs/simkube/dev/contributing.html)

## Contributing

@@ -121,7 +68,3 @@ default](https://docs.github.com/en/site-policy/github-terms/github-terms-of-ser
> Due to the uncertain nature of copyright and IP law, this repository does not accept contributions that have been all
> or partially generated with GitHub Copilot or other LLM-based code generation tools. Please disable any such tools
> before authoring changes to this project.
### Contributor's Guide

Please see the [Contributor's Guide](./docs/contributing.md) for more information on setting up and building SimKube.
1 change: 0 additions & 1 deletion ctrl/controller.rs
@@ -199,7 +199,6 @@ pub(super) async fn cleanup(ctx: &SimulationContext, sim: &Simulation) {
}
}
if let Err(e) = prom_api.delete(&ctx.prometheus_name, &Default::default()).await {
println!("{e:?}");
if matches!(e, Api(ErrorResponse { code: 404, .. })) {
warn!("prometheus object not found; maybe already cleaned up?");
} else {
92 changes: 92 additions & 0 deletions docs/adv/autoscaling.md
@@ -0,0 +1,92 @@
<!--
project: SimKube
template: docs.html
-->

# Autoscaling with SimKube

When running simulations, you probably don't want to manually create a bunch of KWOK nodes every time. Fortunately,
both [Cluster Autoscaler](https://github.com/kubernetes/autoscaler) and [Karpenter](https://karpenter.sh) (the two
most popular cluster autoscalers for Kubernetes) support KWOK, which means you can have them autoscale your simulated
cluster and create virtual nodes for you.

## Cluster Autoscaler instructions

You will need to run Cluster Autoscaler with the `--cloud-provider kwok` argument. The KWOK Cluster Autoscaler provider
expects two ConfigMaps to be present; the first tells the KWOK cloudprovider what to use for its Cluster Autoscaler
NodeGroups:

```yaml
# provider-config.yml
apiVersion: v1
kind: ConfigMap
metadata:
  name: kwok-provider-config
  namespace: kube-system
data:
  config: |
    ---
    apiVersion: v1alpha1
    readNodesFrom: configmap
    nodegroups:
      fromNodeLabelKey: "kwok-nodegroup"
    configmap:
      name: kwok-provider-templates
```
The second enumerates the node types that the KWOK cloudprovider supports:
```yaml
# provider-templates.yml
apiVersion: v1
kind: ConfigMap
metadata:
  name: kwok-provider-templates
  namespace: kube-system
data:
  templates: |
    ---
    apiVersion: v1
    kind: List
    items:
      - apiVersion: v1
        kind: Node
        metadata:
          annotations:
            kwok.x-k8s.io/node: fake
          labels:
            # key must match fromNodeLabelKey in kwok-provider-config
            kwok-nodegroup: node-group-1
            node.kubernetes.io/instance-type: c5d.9xlarge
            topology.kubernetes.io/zone: us-west-1a
            type: virtual
        status:
          allocatable:
            cpu: 31
            ephemeral-storage: 900Gi
            memory: 71Gi
            pods: 110
          capacity:
            cpu: 36
            ephemeral-storage: 900Gi
            memory: 72Gi
            pods: 110
```
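
To make the role of `fromNodeLabelKey` more concrete, here is a minimal Rust sketch of grouping node templates by the
value of a label. It is an illustration only, not the Cluster Autoscaler's implementation: `NodeTemplate`, `template`,
and `group_by_label` are hypothetical names, and skipping templates that lack the label is an assumption made for the
sketch.

```rust
use std::collections::BTreeMap;

/// Hypothetical, stripped-down stand-in for a Kubernetes Node template.
struct NodeTemplate {
    name: String,
    labels: BTreeMap<String, String>,
}

/// Helper for building a template with an optional `kwok-nodegroup` label.
fn template(name: &str, nodegroup: Option<&str>) -> NodeTemplate {
    let mut labels = BTreeMap::new();
    if let Some(group) = nodegroup {
        labels.insert("kwok-nodegroup".to_string(), group.to_string());
    }
    NodeTemplate { name: name.to_string(), labels }
}

/// Group template names by the value of `label_key`.
/// Assumption for this sketch: templates without the label are skipped.
fn group_by_label<'a>(nodes: &'a [NodeTemplate], label_key: &str) -> BTreeMap<String, Vec<&'a str>> {
    let mut groups: BTreeMap<String, Vec<&'a str>> = BTreeMap::new();
    for node in nodes {
        if let Some(value) = node.labels.get(label_key) {
            groups.entry(value.clone()).or_default().push(node.name.as_str());
        }
    }
    groups
}

fn main() {
    let nodes = vec![
        template("c5d-a", Some("node-group-1")),
        template("c5d-b", Some("node-group-1")),
        template("unlabelled", None),
    ];
    // Every template sharing a label value ends up in the same group.
    for (group, members) in group_by_label(&nodes, "kwok-nodegroup") {
        println!("{group}: {members:?}");
    }
}
```

In this sketch, both labelled templates land in `node-group-1`, which mirrors how the label value in the templates
ConfigMap names the node group a template belongs to.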
The KWOK cloudprovider automatically applies a `kwok-provider=true:NoSchedule` taint to the nodes it generates. SimKube
likewise applies the corresponding toleration to the virtual pods it creates.
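
As a rough illustration of why the toleration matters, the following Rust sketch implements a simplified version of
the Kubernetes taint/toleration matching rule. It is not SimKube's or the scheduler's code: all type and function names
are hypothetical, and the toleration shown is just one that would match the `kwok-provider=true:NoSchedule` taint, not
necessarily the exact one SimKube applies.

```rust
/// Taint effects, mirroring the three values Kubernetes defines.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Effect {
    NoSchedule,
    PreferNoSchedule,
    NoExecute,
}

/// Hypothetical, simplified taint.
struct Taint {
    key: String,
    value: String,
    effect: Effect,
}

#[allow(dead_code)]
enum Operator {
    Equal,
    Exists,
}

/// Hypothetical, simplified toleration.
struct Toleration {
    key: Option<String>,    // None matches every key (simplification)
    operator: Operator,
    value: Option<String>,  // ignored when the operator is Exists
    effect: Option<Effect>, // None matches every effect
}

/// The taint described in the text above.
fn kwok_taint() -> Taint {
    Taint {
        key: "kwok-provider".to_string(),
        value: "true".to_string(),
        effect: Effect::NoSchedule,
    }
}

/// Returns true if `toleration` tolerates `taint` under the simplified rules.
fn tolerates(toleration: &Toleration, taint: &Taint) -> bool {
    let key_ok = toleration.key.as_deref().map_or(true, |k| k == taint.key);
    let effect_ok = toleration.effect.map_or(true, |e| e == taint.effect);
    let value_ok = match toleration.operator {
        Operator::Exists => true,
        Operator::Equal => toleration.value.as_deref().unwrap_or("") == taint.value,
    };
    key_ok && effect_ok && value_ok
}

fn main() {
    let toleration = Toleration {
        key: Some("kwok-provider".to_string()),
        operator: Operator::Equal,
        value: Some("true".to_string()),
        effect: Some(Effect::NoSchedule),
    };
    println!("tolerated: {}", tolerates(&toleration, &kwok_taint()));
}
```

Pods without a matching toleration would not be scheduled onto the tainted virtual nodes, which is what keeps ordinary
workloads off them.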

For more information on running and configuring KWOK for Cluster Autoscaler, see the
[README](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler/cloudprovider/kwok).

## Karpenter instructions

The core [karpenter repo](https://github.com/kubernetes-sigs/karpenter) includes a KWOK provider for karpenter, along
with some initial instructions for installing the karpenter+KWOK binary into your cluster. Once it's installed, it will
automatically use KWOK to scale up nodes in the cluster just like Cluster Autoscaler. As with Cluster Autoscaler, the
KWOK provider applies the `kwok-provider=true:NoSchedule` taint to the nodes it creates.

> [!NOTE]
> Unlike Cluster Autoscaler, karpenter does not take in a list of Kubernetes Node specs to determine what instances it
> launches. Instead, it uses a hard-coded list of "generic" instance types which roughly map to standard instance
> offerings by the major cloud providers. There is an [open PR](https://github.com/kubernetes-sigs/karpenter/pull/1048)
> to enable configuring node types via an injected file.
8 changes: 8 additions & 0 deletions docs/adv/metrics.md
@@ -0,0 +1,8 @@
<!--
project: SimKube
template: docs.html
-->

# Metrics Collection and Data Analysis

Coming Soon
File renamed without changes.
4 changes: 2 additions & 2 deletions docs/sk-driver.md → docs/components/sk-driver.md
@@ -14,7 +14,7 @@ selectors, and tolerations to ensure that the simulated pods end up on virtual n

```
Usage: sk-driver [OPTIONS] --sim-name <SIM_NAME> --sim-root <SIM_ROOT> --virtual-ns-prefix <VIRTUAL_NS_PREFIX> \
--cert-path <CERT_PATH> --key-path <KEY_PATH> --trace-path <TRACE_PATH>
--cert-path <CERT_PATH> --key-path <KEY_PATH> --trace-mount-path <TRACE_MOUNT_PATH>
Options:
--sim-name <SIM_NAME>
@@ -23,7 +23,7 @@ Options:
--admission-webhook-port <ADMISSION_WEBHOOK_PORT> [default: 8888]
--cert-path <CERT_PATH>
--key-path <KEY_PATH>
--trace-path <TRACE_PATH>
--trace-mount-path <TRACE_MOUNT_PATH>
-v, --verbosity <VERBOSITY> [default: info]
-h, --help Print help
```
File renamed without changes.
22 changes: 17 additions & 5 deletions docs/skctl.md → docs/components/skctl.md
@@ -110,7 +110,11 @@ that isn't accepted or is parsed incorrectly, please [file an issue](https://git
```
run a simulation
Usage: skctl run [OPTIONS] --name <NAME>
Usage: skctl run [OPTIONS] --name <NAME> [DURATION]
Arguments:
[DURATION]
duration of the simulation
Options:
--name <NAME>
@@ -121,6 +125,16 @@ Options:
[default: simkube]
--metrics-namespace <METRICS_NAMESPACE>
namespace to launch monitoring utilities in
[default: monitoring]
--metrics-service-account <METRICS_SERVICE_ACCOUNT>
service account with monitoring permissions
[default: prometheus-k8s]
--trace-file <TRACE_FILE>
location of the trace file for sk-driver to read
@@ -136,11 +150,9 @@ Options:
## skctl snapshot

```
Usage: skctl snapshot [OPTIONS] --config-file <CONFIG_FILE> <TRACE_DURATION>
take a point-in-time snapshot of a cluster (does not require sk-tracer to be running)
Arguments:
<TRACE_DURATION>
duration of the generated trace file
Usage: skctl snapshot [OPTIONS] --config-file <CONFIG_FILE>
Options:
-c, --config-file <CONFIG_FILE>
File renamed without changes.
File renamed without changes.
8 changes: 8 additions & 0 deletions docs/faq.md
@@ -0,0 +1,8 @@
<!--
project: SimKube
template: docs.html
-->

# Frequently Asked Questions

Coming Soon
45 changes: 45 additions & 0 deletions docs/intro/concepts.md
@@ -0,0 +1,45 @@
<!--
project: SimKube
template: docs.html
-->

# SimKube Concepts

SimKube is designed to allow users to simulate the behaviour of Kubernetes control plane components in a safe, isolated
local environment. It is a "record-and-replay" simulator, which means that users can record the behaviour of a production
cluster, save that data, and replay it later for analysis. Below we describe some of the key concepts of SimKube.

## What components are simulated?

Typically when we talk about the Kubernetes control plane, we are talking about the API server, scheduler, and
controller manager. SimKube expands this definition to include anything that can impact the behaviour of a cluster,
including projects like Cluster Autoscaler, descheduler, and others.

SimKube accomplishes this by running in a cluster with a real control plane; however, all pod behaviours are mocked out
using [Kubernetes WithOut Kubelet (KWOK)](https://kwok.sigs.k8s.io). This means that anything that happens _inside_ a
pod is effectively out of scope of the simulation. KWOK does have utilities to mock out some aspects of pod lifecycle,
but these are not (currently) supported by SimKube. Crucially, this means that simulations that rely on the Horizontal
Pod Autoscaler (for example) will not currently work.

Note also that, unlike some simulation solutions, we are not mocking out any aspects of the control plane. This means
that simulations of cluster behaviour take place in real-time, and we do not have any hooks into or control over what
messages are seen by various control plane components. Thus, running the exact same simulation repeatedly may yield
different results on each run, depending on timing fluctuations and other challenges of distributed systems.

## How does it work?

SimKube has a number of components that it uses to record data and run simulations:

- _Tracer_: The `sk-tracer` program is a lightweight pod that runs in the cluster whose behaviour you wish to record.
It saves cluster events into an in-memory event stream, that is, a timeline of "important" changes in the cluster. You
can configure `sk-tracer` to tell it what events you consider important. Subsets of this event stream can be saved
into a _trace file_, which can be replayed later in a simulated environment.
- _Controller_: The simulation controller `sk-ctrl` runs in a separate Kubernetes cluster and is responsible for setting
up simulations in that cluster. The separate cluster can be a local cluster running on your laptop using
[kind](https://kind.sigs.k8s.io), or it can be a test cluster running in the cloud. The only requirement is that the
simulation cluster must have all the components present that you wish to simulate. The controller watches for
`Simulation` custom resources to configure and start a new simulation run. It sets up metrics collection and other
required tools, and then creates a simulation driver Job, which actually reads the specified trace file and replays
the events within against the simulated cluster.
- _CLI_: SimKube comes with a CLI utility called `skctl`, which can be used to export trace data from the cluster under
study, as well as to run new simulations in your simulated environment.
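
The record-and-replay flow described above can be sketched in a few lines of Rust. This is a conceptual illustration
under stated assumptions, not SimKube's actual data structures or trace file format: `ClusterEvent`, `EventStream`,
`export`, and `replay` are hypothetical names, timestamps are plain seconds, and the exported window is assumed to be
half-open.

```rust
/// Hypothetical event: when a change happened and what it was.
#[derive(Debug, Clone, PartialEq)]
struct ClusterEvent {
    ts: u64,        // seconds since the start of recording (assumption)
    change: String, // human-readable description of the change
}

/// Hypothetical in-memory timeline of "important" cluster changes.
#[derive(Default)]
struct EventStream {
    events: Vec<ClusterEvent>,
}

impl EventStream {
    /// Append an event to the timeline (the "record" half).
    fn record(&mut self, ts: u64, change: &str) {
        self.events.push(ClusterEvent { ts, change: change.to_string() });
    }

    /// Export the subset of events with `start <= ts < end` as a trace.
    fn export(&self, start: u64, end: u64) -> Vec<ClusterEvent> {
        self.events
            .iter()
            .filter(|e| e.ts >= start && e.ts < end)
            .cloned()
            .collect()
    }
}

/// Replay a trace in order by handing each event to `apply` (the "replay" half).
fn replay(trace: &[ClusterEvent], mut apply: impl FnMut(&ClusterEvent)) {
    for event in trace {
        apply(event);
    }
}

fn main() {
    let mut stream = EventStream::default();
    stream.record(10, "deployment/web created");
    stream.record(25, "deployment/web scaled to 5 replicas");
    stream.record(90, "deployment/web deleted");

    // Save only the first minute of the timeline, then replay it.
    let trace = stream.export(0, 60);
    replay(&trace, |e| println!("t={}s: {}", e.ts, e.change));
}
```

In SimKube's terms, `record` loosely corresponds to what the tracer does, `export` to saving a subset of the stream as
a trace file, and `replay` to the driver applying those events against the simulated cluster.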