-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow different aggregation methods for time and component aggregation #991
Comments
@narekhovhannisyan please take a look and lmk if this all makes sense |
@jmcook1186 Seems good to me, moving to in progress |
if only 0one aggregation method given, then apply to both (shouldn't be a breaking change) |
@narekhovhannisyan added detail on component skipping to issue description |
one blocking issue to discuss between @narekhovhannisyan and @jmcook1186 before a PR can be produced. |
What
Enable aggregation to use a different method (sum, avg, copy, none) for "horizontal" (time) and "vertical" (component) aggregation for a single parameter. Also rename horizontal and vertical to time and component aggregation respectively.
Why
There are cases where we need to average across time series but sum across components, and vice-versa. A critical example is the SCI score - we have to be able to take an average of many snapshots of SCI taken within a single time series, but then sum across the components in a tree to give an overall SCI value.
Context
The SCI value is a rate.
If we have a functional unit of e.g. visits, and we have values for that per timestep, then we can gather an SCI score per timestep by doing
carbon/visits
. This is what our SCI plugin does in each timestep, in each component.However, now let’s say we want to do this over three components.
We have per-timestep SCI in units of gCO2e/visit in each of three components to aggregate up to a single value.
We don’t want to sum across time, because what we end up with is not SCI - we’ll end up with an inflated rate that doesn’t represent the actual rate at any point during our times series, but instead a spuriously high one.
E.g. if you did 60 mph for an hour, you would cover 60 miles and your average speed would be 60 mph and your max speed would also be 60 mph, but if we added up the speed of your car measured every minute for an hour long journey, we’d end up saying you went 3600 mph. We're effectively doing this with SCI.
So instead, we actually want to set the aggregation method to
avg
, or we want to add a normalization step where we do generate a time-totalled SCI by setting the aggregation method tosum
but then we divide by number of time-steps retroactively (which ends up being the exact same thing).No problem, then, for calculating the average SCI per component, but now we want to aggregate across components. Now we really DO want to sum the SCI values together to yield one overarching value for the whole tree, but oh dear we already set our aggregation method to
avg
. So we can only get a spuriously LOW estimate in the top level aggregation because we are forced to average where we want to sum.So, because we have a single aggregation method that covers both time and component aggregation, and we can’t do operations over values after they are aggregated - we can’t calculate SCI in a multi-component manifest inside IF.
To resolve this, we need to be able to configure the aggregation method for vertical/component and horizontal/time aggregation independently.
While we are here, we should also rename horizontal aggregation --> time aggregation, and vertical aggregation --> component aggregation, so that they are unambiguous.
I propose the parameter-metadata config is updated in all plugin source code the parameter-metadata type definition that allows for metadata overwriting, so that
aggregation-method
is an object with two fields:time
andcomponent
which acceptsum
,avg
,none
orcopy
enum variants.In plugin source code:
In manifest (when setting param metadata)
And the aggregation config should be updated to accept
both
,time
andcomponent
, rather thanboth
,horizontal
andvertical
.Skipping components
While we are updating the aggregation feature, we should also support skipping named components from the aggregation. This is necessary to enable cross-component arithmetic. For example, imagine we have a component that is used to import
page-visit
data from an API, and we then want to use that as a functional unit in an SCI calculation across our manifest. If we aggregate using our current feature, we'll throw an exception because one of our components (the one withpage-visits
) won't havecarbon
values, so aggregation will fail. We don't want that - we just want to ignore that component in our aggregation.So ideally we'll have aggregation config that supports
skip-components
, looking something like:Error out if the names given in the aggregation config do not map to component names in the tree.
Prerequisites/resources
n/a
SoW (scope of work)
Acceptance criteria
GIVEN the changes are implemented
WHEN I run the following manifest:
THEN I get the following result:
The text was updated successfully, but these errors were encountered: