Updated spec with model pipelines #95

Merged: jawache merged 2 commits into dev from updated-spec-with-model-pipelines on Sep 13, 2023

jawache commented Aug 25, 2023

Types of changes

  • Enhancement (project structure, spelling, grammar, formatting)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

A description of the changes proposed in the Pull Request

  • Updated the spec to explain model plugin pipelines.

jmcook1186 left a comment

LGTM. I wonder in some cases whether the distinction between chaining models and simply cleaning/preprocessing data from the impl might get a bit blurry? Probably just semantics.
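
To make the question concrete, here is a minimal TypeScript sketch, not the framework's actual API, of a pipeline in which a simple cleaning step and an enrichment step share one interface. The `Model` type, `runPipeline`, `normalizeVendor`, `addCpuSeconds`, and the `cpu-seconds` key are all hypothetical names chosen for illustration, and the arithmetic is a placeholder rather than a real impact formula.

```typescript
// Hypothetical shapes: an observation is a flat key/value record,
// and a model is any function from observation to observation.
type Observation = Record<string, string | number>;
type Model = (obs: Observation) => Observation;

// A "simple processor": it only tidies an existing field.
const normalizeVendor: Model = (obs) => ({
  ...obs,
  vendor: String(obs["vendor"]).trim().toLowerCase(),
});

// An enrichment step: it derives a new value from existing fields.
// Placeholder arithmetic, assumed for illustration only.
const addCpuSeconds: Model = (obs) => ({
  ...obs,
  "cpu-seconds": (Number(obs["cpu-util"]) / 100) * Number(obs["duration"]),
});

// Apply the models in the order they are listed in the pipeline.
function runPipeline(pipeline: Model[], input: Observation): Observation {
  return pipeline.reduce((obs, model) => model(obs), input);
}

const result = runPipeline([normalizeVendor, addCpuSeconds], {
  vendor: " AWS ",
  duration: 5,
  "cpu-util": 33,
});
console.log(result);
```

Because both steps have the same signature, the pipeline itself cannot tell a preprocessor from a model, which is where the distinction gets blurry.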

```


- `pipeline` defines the models we apply and the order in which we use them.
Contributor

Do time normalization, aggregation, etc. count as models in this spec, or are they defined separately? In the old spec they were defined in the pipeline field.

Contributor Author

I think some things, such as aggregation, are now simply baked into the framework and are not pluggable or configurable. Aggregation now literally only needs to aggregate, which will be the same for every use case.

Time normalization can be done in the models field, but it does mean that the model needs to be initialised differently: a time-normalization model needs information about the whole graph so it can figure out the time buckets. Once it knows that, it can be used like any other model.

Contributor Author

I realize time normalization can be implemented as a model like everything else (it needs a slight tweak to the model initialization rules, but it can work). Aggregation is now just a standard aggregation and can be baked into the framework.

docs/Model Pipeline.md (outdated)

jawache commented Aug 30, 2023

> LGTM. I wonder in some cases whether the distinction between chaining models and simply cleaning/preprocessing data from the impl might get a bit blurry? Probably just semantics.

That's a good point. I've been calling everything a model. I think it's worthwhile still calling them models but having the understanding some will be simple processors?

jawache mentioned this pull request Aug 31, 2023
vendor: aws
instance-type: m5d.large
duration: 5
cpu-util: 33
Contributor

Having hyphens in the middle of keys causes parsers to behave differently. In JavaScript especially, the dot access syntax we use will not work for these keys, and we will need to use bracket notation instead.

duration: 5
gb-s: 1005
vendor: aws # <-- new
instance_type: m5d.large # <-- new
Contributor

This declaration uses underscores (`instance_type`), unlike the hyphenated `instance-type` used elsewhere in this spec.

@srini1978
Contributor

> > LGTM. I wonder in some cases whether the distinction between chaining models and simply cleaning/preprocessing data from the impl might get a bit blurry? Probably just semantics.
>
> That's a good point. I've been calling everything a model. I think it's worthwhile still calling them models but having the understanding some will be simple processors?

The distinction between the two is this: the first is a component that enriches the observation(s) and passes them on to the next model in the pipeline, and the second is a component that actually calculates an impact metric. Strictly speaking, if we want to brand an impact plugin as a software component that calculates impact metrics only, then do we want to call the first kind Model Data Enrichment components? Here I am using Model Data to mean our Observation.

instance_type: m5d.xlarge # <-- updated
```

`m5d.xlarge` is the same CPU but twice the size of `m5d.large`, so this plugin halves the utilization to mirror what might be on the new instance type.
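
A minimal sketch of the kind of simulation step described above, assuming observations are flat key/value records. `simulateInstanceType` and the `RELATIVE_SIZE` table are hypothetical; only the `m5d.large` to `m5d.xlarge` doubling comes from the spec text. Hyphenated keys such as `cpu-util` have to be read with bracket access in TypeScript, which is the concern raised about hyphens earlier in this review.

```typescript
// Hypothetical observation shape; key names follow the snippets in this PR.
type Observation = Record<string, string | number>;

// Assumed relative sizes. Only the large -> xlarge doubling is stated in the spec.
const RELATIVE_SIZE: Record<string, number> = {
  "m5d.large": 1,
  "m5d.xlarge": 2,
};

// Hypothetical simulation step: swap the instance type and rescale
// utilization by the ratio of the old size to the new size.
function simulateInstanceType(obs: Observation, target: string): Observation {
  const current = String(obs["instance_type"]);
  const from = RELATIVE_SIZE[current];
  const to = RELATIVE_SIZE[target];
  if (from === undefined || to === undefined) {
    throw new Error(`unknown instance type: ${current} or ${target}`);
  }
  // obs.cpu-util would parse as (obs.cpu - util), so bracket access is required.
  const cpuUtil = Number(obs["cpu-util"]);
  return {
    ...obs,
    instance_type: target,
    "cpu-util": cpuUtil * (from / to),
  };
}

const before: Observation = {
  vendor: "aws",
  instance_type: "m5d.large",
  duration: 5,
  "cpu-util": 33,
};
const after = simulateInstanceType(before, "m5d.xlarge");
console.log(after["cpu-util"]); // 16.5
```

The function returns a new record rather than mutating its input, so the original observation stays available for comparison with the simulated one.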
Contributor

Here the utilization is an observation, so it should come only from the software. What we could do instead is this: when the simulation changes the instance type, the impact model can output sim-m, which is the change in embodied emissions.

Contributor Author

There is still a lot we need to think through here. Eventually, I think a dictionary of model terms will form, for example:

  • Extract Model
  • Simulation Model
  • Adapter Model

I don't think we need to figure this out now. Over time we'll evolve a language around this, but it will be determined by the types of models that get created.

RE: the confusion about using impacts as a term for something that doesn't match our original definition of impacts. I agree, but I'm not confident we have enough information yet to decide on an alternative approach. I suspect that once we create more models and IMPLs, even more nuance might appear and an alternative language might surface.

Shall we aim to review this language around the decarb event and maybe our alpha release?

Contributor Author

Moving this discussion to an issue since I think it's not blocking the merging of this spec. #149

jawache mentioned this pull request Sep 7, 2023

jawache commented Sep 7, 2023

To do: review @srini1978's comments offline and confirm whether there are any updates or objections before merging.

jawache merged commit 7957df2 into dev Sep 13, 2023
jawache deleted the updated-spec-with-model-pipelines branch September 13, 2023 19:19