Updated spec with model pipelines #95

Merged (2 commits, Sep 13, 2023)
**docs/Impact Engine Framework.md** (+2 −1)

```diff
@@ -28,7 +28,8 @@ A Framework to **M**odel, **M**easure, si**M**ulate and **M**onitor the environmental impacts of software
 - ✅ [Impl (Impact YAML)](Impl%20(Impact%20YAML).md)
 - 👷🏽‍♂️[Impact Model](Impact%20Model.md)
 - ✅ [Impact Model Plugin](Impact%20Model%20Plugin.md)
-- ✅ [Computation Pipeline](Computation%20Pipeline.md)
+- ✅ [Model Pipeline](Model%20Pipeline.md)
+- 👷🏽‍♂️ [Computation Pipeline](Computation%20Pipeline.md)
 - 👷🏽‍♂️ Computation Pipeline Plugin 🔥
 - 👷🏽‍♂️ [Impact Metric](Impact%20Metric) 🔥
 - ✍🏽 [Rimpl](Rimpl.md)
```
**docs/Model Pipeline.md** (new file, +364)

---
author: Asim Hussain (@jawache)
abstract: How models are chained together into a pipeline in order to compute the impacts of a component
---
# Model Pipeline

## Introduction

As [Peter H. Salus](https://en.wikipedia.org/wiki/Peter_H._Salus "Peter H. Salus") said in his book [A Quarter-Century of Unix](https://en.wikipedia.org/wiki/Unix_philosophy#cite_note-taoup-ch1s6-1), the Unix philosophy is:
- Write programs that do one thing and do it well.
- Write programs to work together.
- Write programs to handle text streams, because that is a universal interface.

Our approach to models in the [Impact Engine Framework](Impact%20Engine%20Framework.md) is the same.
- Each model will do one thing and do it well.
- Models work with each other.
- The [Impl (Impact YAML)](Impl%20(Impact%20YAML).md) text format is the communication medium between models.

Calculating a component's impacts often requires using multiple models in sequence. Each model takes as input the outputs of the previous model in the chain, all working together to calculate impacts from observations.

```mermaid
flowchart LR
Observations --> Model1 --> Model2 --> Model3 --> Impacts
```

At the start of the chain, we input source observations. Each model in the chain takes the observations as input and modifies them in some way before passing them along to the next model in the chain.

The nature of the modification is flexible and defined by the model. Some model plugins will calculate an impact metric (for example, energy from utilization) and add it to the observation before passing it on to the next model. Other models will enrich the observation with metadata required by later models, for example, adding grid carbon intensity values.
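To make this concrete, here is a minimal sketch of what a model plugin might look like in TypeScript. The `ModelPlugin` interface and its method signature are illustrative assumptions, not the framework's actual API:

```typescript
// An observation is a loosely typed bag of key/value pairs, mirroring the
// YAML observations shown throughout this document.
type Observation = Record<string, unknown>;

// Hypothetical plugin interface: each model receives the observations emitted
// by the previous model and returns an enriched or transformed copy.
interface ModelPlugin {
  name: string;
  execute(observations: Observation[]): Promise<Observation[]>;
}
```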

## Initialization

All the model plugins used in any component in the graph are configured at the top of the [Impl (Impact YAML)](Impl%20(Impact%20YAML).md) in the initialize section. There are multiple places to add configuration for model plugins; any configuration added here is global and applied to every use of the model throughout the graph.

```yaml
.
.
config:
  models:
    - name: <model-name-1>
      kind: builtin | plugin | shell
      path: <path-to-ts-module-to-load>
      config:
        <key>: <value>
    - name: <model-name-2>
      kind: builtin | plugin | shell
      path: <path-to-ts-module-to-load>
      config:
        <key>: <value>
.
.
graph: ~
```

## Usage

In component nodes, we then configure the models we want to use in a pipeline like so:

```yaml
backend:
  pipeline:
    - model-1
    - model-2
    - model-3
  config:
    model-1:
      <key>: <value>
    model-2:
      <key>: <value>
    model-3:
      <key>: <value>
  observations:
    - timestamp: 2023-07-06T00:00
      duration: 5
      cpu: 33
    - timestamp: 2023-07-06T00:05
      duration: 5
      cpu: 23
    - timestamp: 2023-07-06T00:10
      duration: 5
      cpu: 11
```


- `pipeline` defines the models we apply and the order in which we apply them.
- `config` in this part of the graph is the configuration for each model in the pipeline (if any is required). Since we have multiple models, we define each model's config independently.
- `observations` are the source observations, the values we are pumping into the start of this pipeline.

> **Review discussion:** A reviewer asked whether time normalization and aggregation count as models in this spec or are defined separately, since the old spec defined them in the `pipeline` field. The author's position: aggregation is now baked into the framework rather than being plugin-configurable, since it only needs to perform a standard aggregation that is the same for every use case. Time normalization can be implemented as a model like everything else, with slight tweaks to the model initialization rules: it needs information about the whole graph to figure out the time buckets, but once it has that, it can be used as any other model.


## Example

Let's look at a simple pseudo-example, beginning with a pipeline like so:

```yaml
pipeline:
  - instance-metadata
  - tdp
  - teads-curve
```

Using the above, we can combine multiple smaller models to calculate the energy consumed by this observation:

```yaml
observations:
  - timestamp: 2023-07-06T00:00
    vendor: aws
    instance-type: m5d.large
    duration: 5
    cpu-util: 33
```

### `instance-metadata`

This model plugin takes as input an *instance type* and outputs the name of the physical processor of the underlying architecture as well as other valuable metadata, like so:

```yaml
observations:
  - timestamp: 2023-07-06T00:00
    vendor: aws
    instance-type: m5d.large
    duration: 5
    cpu-util: 33
```

to

```yaml
observations:
  - timestamp: 2023-07-06T00:00
    vendor: aws
    instance-type: m5d.large
    duration: 5
    cpu-util: 33
    physical-processor: Intel Xeon Platinum 8175 # <-- output
    used-cores: 1 # <-- output
    total-cores: 26 # <-- output
```

> **Review note:** Having hyphens in the middle of key names causes parsers to behave differently; in JavaScript in particular, dot-style object access doesn't work on such keys, and bracket (array) syntax is needed.


### `tdp`

This model takes as input details about a physical processor and does a lookup against a [database](https://www.intel.com/content/www/us/en/products/sku/120506/intel-xeon-platinum-8170-processor-35-75m-cache-2-10-ghz/specifications.html) to obtain its TDP value (a measure of maximum power consumption), like so:

```yaml
observations:
  - timestamp: 2023-07-06T00:00
    vendor: aws
    instance-type: m5d.large
    duration: 5
    cpu-util: 33
    physical-processor: Intel Xeon Platinum 8175 # <-- input
    used-cores: 1
    total-cores: 26
```

to

```yaml
observations:
  - timestamp: 2023-07-06T00:00
    vendor: aws
    instance-type: m5d.large
    duration: 5
    cpu-util: 33
    physical-processor: Intel Xeon Platinum 8175
    used-cores: 1
    total-cores: 26
    tdp: 165 # <-- output
```
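Both `instance-metadata` and `tdp` are essentially lookup models. A minimal sketch of the idea, reusing the `Observation` type from the earlier sketch; the table contents are just the values from this example, and the function names are illustrative:

```typescript
// Illustrative lookup tables, seeded with the values from the example above.
const instanceMetadata: Record<
  string,
  { processor: string; usedCores: number; totalCores: number }
> = {
  "m5d.large": { processor: "Intel Xeon Platinum 8175", usedCores: 1, totalCores: 26 },
};

const tdpByProcessor: Record<string, number> = {
  "Intel Xeon Platinum 8175": 165, // watts, from the example above
};

// Each model returns a copy of its input enriched with the looked-up fields.
// Note the bracket access: hyphenated keys can't use dot syntax in JavaScript.
function instanceMetadataModel(obs: Observation): Observation {
  const meta = instanceMetadata[obs["instance-type"] as string];
  return {
    ...obs,
    "physical-processor": meta.processor,
    "used-cores": meta.usedCores,
    "total-cores": meta.totalCores,
  };
}

function tdpModel(obs: Observation): Observation {
  return { ...obs, tdp: tdpByProcessor[obs["physical-processor"] as string] };
}
```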


### `teads-curve`

TEADS is a well-known set of [generalized power curve coefficients](https://medium.com/teads-engineering/building-an-aws-ec2-carbon-emissions-dataset-3f0fd76c98ac). Given a few inputs, it can estimate the energy consumed for a given utilization. The coefficients are here for reference:

| Utilization | Coefficient |
| ----------- | ----------- |
| Idle | 0.12 |
| 10% | 0.32 |
| 50% | 0.75 |
| 100% | 1.02 |

So if the TDP is 100W and the utilization is 50%, then according to the TEADS curve, the power drawn would be 0.75 * 100 = 75W. The curve is generalized, so it is of limited use in deciding which architecture is better. Still, it is general enough to be applied in many contexts where data is limited.

Using a `teads-curve` model, we'd be able to estimate energy like so:

```yaml
observations:
  - timestamp: 2023-07-06T00:00
    vendor: aws
    instance-type: m5d.large
    duration: 5 # <-- input
    cpu-util: 33 # <-- input
    physical-processor: Intel Xeon Platinum 8175
    used-cores: 1 # <-- input
    total-cores: 26 # <-- input
    tdp: 165 # <-- input
```

to

```yaml
observations:
  - timestamp: 2023-07-06T00:00
    vendor: aws
    instance-type: m5d.large
    duration: 5
    cpu-util: 33
    physical-processor: Intel Xeon Platinum 8175
    used-cores: 1
    total-cores: 26
    tdp: 165
    energy: 0.004 # <-- output
```

The energy used by this instance for 5s at 33% utilization is about 0.004 Wh.
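For illustration, here is a sketch of how such a model might arrive at a figure in that ballpark, assuming linear interpolation between the published coefficients and scaling the TDP by the share of cores in use (both are assumptions on our part; the real model's methodology may differ):

```typescript
// TEADS coefficients: CPU utilization (%) -> share of TDP drawn.
const curve: [number, number][] = [
  [0, 0.12], // idle
  [10, 0.32],
  [50, 0.75],
  [100, 1.02],
];

// Linearly interpolate the power coefficient for a given utilization.
function coefficient(util: number): number {
  for (let i = 1; i < curve.length; i++) {
    const [u0, c0] = curve[i - 1];
    const [u1, c1] = curve[i];
    if (util <= u1) return c0 + ((util - u0) / (u1 - u0)) * (c1 - c0);
  }
  return curve[curve.length - 1][1];
}

// Energy in Wh for one observation.
function energyWh(
  tdp: number,
  util: number,
  usedCores: number,
  totalCores: number,
  durationSec: number,
): number {
  const power = coefficient(util) * tdp * (usedCores / totalCores); // watts
  return (power * durationSec) / 3600; // watt-seconds -> Wh
}

// ~0.005 Wh for 5s at 33% on a 165W, 26-core processor using 1 core,
// in the same ballpark as the 0.004 Wh in the example above.
console.log(energyWh(165, 33, 1, 26, 5));
```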

## Use Cases

There are many advantages to composing a model pipeline from many smaller model plugins.

### Re-usability

Why implement some functionality in your model plugin if another is available that already implements the functionality you need, is widespread, and is well-supported?

For example, several existing models (Boavizta, Cloud Carbon Footprint, and Climatiq, to name a few) all implement the same functionality of mapping cloud instance types, like `m5d.large`, to physical processors like `Intel Xeon Platinum 8175`. Internally, they are all doing the same thing: looking up what `m5d.large` is and which CPU microarchitecture it's linked to. However, the methodology each uses, the accuracy and recency of the data in their lookup tables, and how they fall back when no data is available all differ.

If a single model plugin existed that did nothing but map cloud instance types to microarchitectures, and did that one job very well, other models could rely on it as a prerequisite instead of implementing the functionality themselves:

```yaml
pipeline:
  - instance-metadata
  - ccf
```

> [!important] Ecosystem of plugins
> The model plugins provide the bulk of the functionality for the impact engine. A small set of tightly focused model plugins can be combined in multiple ways to meet many use cases.

### Upgradability

Any time you upgrade one model plugin, every other plugin that depends on it can take advantage of those changes.

Sticking with the above example of the `instance-metadata` plugin: a new version can be released whenever a cloud provider updates its instance types or offerings. If you update your installation, every plugin that depends on that data can take advantage of the update.

### Consistency

Again, sticking with the `instance-metadata` example: breaking functionality out into many smaller plugins allows for more consistency. This is especially true for plugins that do lookups against data.

Rather than each plugin determining its own metadata, one plugin can provide the metadata required by several subsequent plugins. And because that plugin exports its data into the observations, we can see exactly what data every subsequent plugin is using.

### Debuggability

Since each model outputs a copy of its inputs, we can easily debug a calculation through its chain by dumping out the intermediate observations.
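For instance, a pipeline runner might log the observations emitted at each stage. A sketch, reusing the hypothetical `ModelPlugin` interface from earlier:

```typescript
// Run a pipeline, logging the observations emitted at each stage so a
// calculation can be traced model by model.
async function runPipeline(
  pipeline: ModelPlugin[],
  observations: Observation[],
): Promise<Observation[]> {
  let current = observations;
  for (const model of pipeline) {
    current = await model.execute(current);
    console.debug(`after ${model.name}:`, JSON.stringify(current, null, 2));
  }
  return current;
}
```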
### Simulation

You can create models that **simulate** making changes to the system. For example, you could create a model called `change-instance-type`, which adjusts the data being passed through to **simulate** as if it was run on another cloud instance.

Through a set of simulation plugins, you can investigate **what-if scenarios** and see the impact of a change before investing any time in it, like so:

```yaml
component:
  pipeline:
    - change-instance-type
    - instance-metadata
    - tdp
    - teads-curve
  config:
    change-instance-type:
      to-instance: m5d.xlarge
  observations:
    - timestamp: 2023-07-06T00:00
      duration: 5
      cpu-util: 33
      vendor: aws
      instance-type: m5d.large
```

The `change-instance-type` model would convert the above to this:

```yaml
component:
  pipeline:
    - change-instance-type
    - instance-metadata
    - tdp
    - teads-curve
  config:
    change-instance-type:
      to-instance: m5d.xlarge
  observations:
    - timestamp: 2023-07-06T00:00
      duration: 5
      cpu-util: 16.5 # <-- updated
      vendor: aws
      instance-type: m5d.xlarge # <-- updated
```

`m5d.xlarge` is the same CPU but twice the size of `m5d.large`, so this plugin halves the utilization to mirror what it might be on the new instance type.
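A sketch of what such a simulation model might do; the halving heuristic comes straight from the description above, while everything else is an illustrative assumption:

```typescript
// Hypothetical simulation model: pretend the workload ran on a different
// instance type. Halving utilization mirrors moving to an instance twice
// the size, per the description above.
function changeInstanceType(obs: Observation, toInstance: string): Observation {
  return {
    ...obs,
    "instance-type": toInstance,
    "cpu-util": (obs["cpu-util"] as number) / 2,
  };
}
```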
> **Review discussion:** A reviewer noted that utilization here is an observation and needs to come from the software itself; one alternative would be for the simulation to output the change in embodied emissions caused by switching instance types. The author agreed there is still a lot to think through: over time a dictionary of model terms will likely evolve (extract models, simulation models, adapter models), shaped by the kinds of models people actually create. On the related concern that "impacts" is being used for something that doesn't match the original definition of the term, the author suggested there isn't yet enough information to choose an alternative, proposed revisiting the language around the decarb event and the alpha release, and moved the discussion to issue #149 since it doesn't block merging this spec.

The rest of the pipeline would then be the same. No other plugin would need to be changed; they would all function as before.

### Modelling Managed Services

We need to be able to measure the energy consumed by a processor since *eventually* everything is executed on a processor. However, these days most services are consumed through higher-level managed services (for example, AWS Lambda). In these managed services, you are abstracted away from the underlying processor, utilization, and instances. How do we measure the impacts of these managed services?

We first have to start with observations. If we take AWS Lambda (or any FaaS), then the observations we might receive are along the time/space dimension, like so:

```yaml
observations:
  - timestamp: 2023-07-06T00:00
    duration: 5
    gb-s: 1005
```

Most cloud FaaS offerings bill by gigabyte-seconds: the number of seconds your function runs multiplied by the GB of memory it uses in that period (for example, a 2 GB function running for 500 s consumes 1,000 GB-s). You don't know the underlying instance or its utilization.

There are *currently* no models that translate GB-s to energy and embodied carbon. However, by chaining several models together into a pipeline, we can *translate* GB-s into an equivalent utilization on an instance type and then compute impacts using a pipeline similar to the ones we've used before.

Imagine we had an adapter model called `aws-lambda-to-instance`, which transformed `gb-s` into an observation that can be computed using an existing set of models, like so:

```yaml
component:
  pipeline:
    - aws-lambda-to-instance
    - instance-metadata
    - tdp
    - teads-curve
  observations:
    - timestamp: 2023-07-06T00:00
      duration: 5
      gb-s: 1005
```

`aws-lambda-to-instance` might first transform the observation to:

```yaml
component:
  pipeline:
    - aws-lambda-to-instance
    - instance-metadata
    - tdp
    - teads-curve
  observations:
    - timestamp: 2023-07-06T00:00
      duration: 5
      gb-s: 1005
      vendor: aws # <-- new
      instance-type: m5d.large # <-- new
      cpu-util: 33 # <-- new
```

The observation is now in a format that can be computed using the rest of the pipeline.
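A sketch of what such an adapter might look like. How GB-s maps to an equivalent instance and utilization is entirely a question of the model's methodology, so the mapped values below are placeholder assumptions taken from the example:

```typescript
// Hypothetical managed-service adapter: translate a gb-s observation into an
// instance-shaped observation the rest of the pipeline understands.
function awsLambdaToInstance(obs: Observation): Observation {
  // Placeholder heuristics: a real model would choose the instance type and
  // estimate utilization from the vendor's published FaaS architecture.
  return {
    ...obs,
    vendor: "aws",
    "instance-type": "m5d.large",
    "cpu-util": 33,
  };
}
```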

Using **Managed Services Adaptor Models** (MSAM), we can quickly model higher-level managed services.

Another possible future is one where a model is created that directly translates `gb-s` to `energy`, perhaps maintained and released by AWS themselves, like so:

```yaml
component:
  pipeline:
    - aws-lambda
  observations:
    - timestamp: 2023-07-06T00:00
      duration: 5
      gb-s: 1005
```

But with model pipelines, we don't need to wait; in the meantime, we can fill in the gaps with adapter models.