Allow different aggregation methods for time and component aggregation #991

Closed
1 of 5 tasks
Tracked by #949
jmcook1186 opened this issue Aug 26, 2024 · 5 comments · Fixed by #1008
Labels
core-only This issue is reserved for the IF core team only

@jmcook1186
Contributor

jmcook1186 commented Aug 26, 2024

What

Enable aggregation to use different methods (sum, avg, copy, none) for "horizontal" (time) and "vertical" (component) aggregation of a single parameter. Also rename horizontal and vertical aggregation to time and component aggregation, respectively.

Why
There are cases where we need to average across a time series but sum across components, and vice versa. A critical example is the SCI score: we have to be able to take an average of many snapshots of SCI taken within a single time series, but then sum across the components in a tree to give an overall SCI value.

Context

The SCI value is a rate.

If we have a functional unit of, e.g., visits, and we have values for it per timestep, then we can calculate an SCI score per timestep as carbon/visits. This is what our SCI plugin does in each timestep, in each component.
However, now let’s say we want to do this over three components.

We have per-timestep SCI in units of gCO2e/visit in each of three components to aggregate up to a single value.
We don’t want to sum across time, because what we end up with is not SCI: we’ll get an inflated rate that doesn’t represent the actual rate at any point during our time series, but instead a spuriously high one.

E.g., if you drove at 60 mph for an hour, you would cover 60 miles; your average speed would be 60 mph and your maximum speed would also be 60 mph. But if we added up your car's speed measured every minute over that hour-long journey, we’d end up saying you went 3600 mph. We're effectively doing this with SCI.
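The speed arithmetic can be checked directly. This is a minimal TypeScript sketch, assuming sixty per-minute readings of a constant 60 mph; it only illustrates why summing a rate over time is misleading.

```typescript
// Sixty per-minute readings of a constant 60 mph speed over a one-hour journey.
const speedsMph: number[] = [];
for (let minute = 0; minute < 60; minute++) {
  speedsMph.push(60);
}

// Summing a rate across time inflates it: 60 readings x 60 mph = 3600 "mph".
const summedSpeed = speedsMph.reduce((acc, v) => acc + v, 0);

// Averaging the same readings recovers the actual rate.
const averageSpeed = summedSpeed / speedsMph.length;

console.log(summedSpeed); // 3600
console.log(averageSpeed); // 60
```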

So instead, we actually want to set the aggregation method to avg, or add a normalization step: generate a time-totalled SCI by setting the aggregation method to sum, then retroactively divide by the number of time steps (which ends up being exactly the same thing).
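To make the equivalence concrete, here is a small sketch that computes per-timestep SCI as carbon/visits using the component-2 inputs from the acceptance-criteria manifest, then compares a time 'sum', a time 'avg', and 'sum' normalized by the number of time steps. The helper name `total` is purely illustrative and is not part of IF.

```typescript
// Per-timestep SCI (gCO2e per visit) for one component, computed as carbon / visits.
// Inputs mirror component-2 in the acceptance-criteria manifest.
const carbon: number[] = [0.0007, 0.0007, 0.0007];
const visits: number[] = [228, 216, 203];
const sciPerStep: number[] = carbon.map((c, i) => c / visits[i]);

// Hypothetical helper: plain sum of a list of numbers.
const total = (xs: number[]): number => xs.reduce((acc, x) => acc + x, 0);

// 'sum' over time: roughly three times the real rate, so not an SCI.
const timeSummed = total(sciPerStep);

// 'avg' over time.
const timeAveraged = total(sciPerStep) / sciPerStep.length;

// 'sum' over time, then normalized by the number of time steps.
const normalized = timeSummed / sciPerStep.length;

console.log(timeSummed, timeAveraged, normalized);
```

The averaged and normalized values come out identical, about 3.253e-6 gCO2e/visit, which matches the per-component `aggregated.sci` in the expected output.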

No problem, then, for calculating the average SCI per component, but now we want to aggregate across components. Here we really DO want to sum the SCI values together to yield one overarching value for the whole tree, but, oh dear, we have already set our aggregation method to avg. So we can only get a spuriously LOW estimate in the top-level aggregation, because we are forced to average where we want to sum.

So, because a single aggregation method covers both time and component aggregation, and we can’t do operations over values after they are aggregated, we can’t calculate SCI in a multi-component manifest inside IF.

To resolve this, we need to be able to configure the aggregation method for vertical/component and horizontal/time aggregation independently.
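A minimal sketch of what independent configuration buys, assuming a simplified two-stage model: each component's series is first reduced with the time method, then the per-component results are reduced with the component method. The names `applyMethod` and `aggregateTree` are hypothetical, the treatment of 'copy' (carry the last value) and 'none' (no aggregate) is an assumption, and with aligned time steps the avg-then-sum order gives the same result as summing per timestep and then averaging.

```typescript
type AggregationMethod = 'sum' | 'avg' | 'none' | 'copy';

interface AggregationMethods {
  time: AggregationMethod;
  component: AggregationMethod;
}

const total = (xs: number[]): number => xs.reduce((acc, x) => acc + x, 0);

// Reduce a list of values with one method.
// Assumption for this sketch: 'copy' keeps the last value, 'none' yields no aggregate.
function applyMethod(values: number[], method: AggregationMethod): number | undefined {
  if (method === 'sum') return total(values);
  if (method === 'avg') return total(values) / values.length;
  if (method === 'copy') return values[values.length - 1];
  return undefined; // 'none'
}

// Time-aggregate each component's series, then combine results with the component method.
function aggregateTree(series: number[][], methods: AggregationMethods): number | undefined {
  const perComponent: number[] = [];
  for (const s of series) {
    const v = applyMethod(s, methods.time);
    if (v === undefined) return undefined;
    perComponent.push(v);
  }
  return applyMethod(perComponent, methods.component);
}

// Per-timestep SCI for the two components in the acceptance-criteria expected output.
const sciSeries: number[][] = [
  [3.070175438596491e-6, 3.2407407407407406e-6, 3.4482758620689654e-6],
  [3.070175438596491e-6, 3.2407407407407406e-6, 3.4482758620689654e-6],
];

console.log(aggregateTree(sciSeries, { time: 'avg', component: 'sum' })); // ~6.506e-6
console.log(aggregateTree(sciSeries, { time: 'avg', component: 'avg' })); // ~3.253e-6 (too low)
console.log(aggregateTree(sciSeries, { time: 'sum', component: 'sum' })); // ~1.952e-5 (inflated)
```

Only the mixed configuration reproduces the tree-level `aggregated.sci` of the expected output; either single method gives a value that is too low or too high.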

While we are here, we should also rename horizontal aggregation --> time aggregation, and vertical aggregation --> component aggregation, so that they are unambiguous.

I propose that the parameter-metadata config in all plugin source code, and the parameter-metadata type definition that allows metadata to be overwritten, be updated so that aggregation-method is an object with two fields, time and component, each accepting one of the enum variants sum, avg, none, or copy.
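The proposed shape could be typed roughly as follows. This is a sketch only: `ParameterMetadataEntry`, `AGGREGATION_METHODS`, and the two guard functions are hypothetical names, not the existing IF type definitions. The runtime guard is included because values read from a YAML manifest are untyped until checked.

```typescript
type AggregationMethod = 'sum' | 'avg' | 'none' | 'copy';

// Proposed: one method per aggregation direction.
interface AggregationMethods {
  time: AggregationMethod;
  component: AggregationMethod;
}

// Illustrative per-parameter metadata entry; the real IF type may differ.
interface ParameterMetadataEntry {
  description: string;
  unit: string;
  'aggregation-method': AggregationMethods;
}

const AGGREGATION_METHODS: AggregationMethod[] = ['sum', 'avg', 'none', 'copy'];

function isAggregationMethod(value: unknown): value is AggregationMethod {
  return typeof value === 'string' && AGGREGATION_METHODS.indexOf(value as AggregationMethod) !== -1;
}

// Runtime check for an aggregation-method value read from a manifest.
function isAggregationMethods(value: unknown): value is AggregationMethods {
  if (typeof value !== 'object' || value === null) {
    return false;
  }
  const candidate = value as { time?: unknown; component?: unknown };
  return isAggregationMethod(candidate.time) && isAggregationMethod(candidate.component);
}

// The sci output metadata from the proposal, typed against the new shape.
const sciOutput: ParameterMetadataEntry = {
  description: 'carbon expressed in terms of the given functional unit',
  unit: 'gCO2e',
  'aggregation-method': { time: 'avg', component: 'sum' },
};

console.log(isAggregationMethods(sciOutput['aggregation-method'])); // true
console.log(isAggregationMethods({ time: 'avg' })); // false
```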

In plugin source code:

// types from the IF core package
import {
  ConfigParams,
  ExecutePlugin,
  MappingParams,
  ParameterMetadata,
  PluginParametersMetadata,
} from '@grnsft/if-core/types';

export const Sci = (
  config: ConfigParams,
  parametersMetadata: PluginParametersMetadata,
  mapping: MappingParams
): ExecutePlugin => {
  const metadata = {
    kind: 'execute',
    inputs: {
      ...({
        carbon: {
          description: 'an amount of carbon emitted into the atmosphere',
          unit: 'gCO2e',
          // proposed: separate methods for time and component aggregation
          'aggregation-method': {
            time: 'sum',
            component: 'sum',
          },
        },
        'functional-unit': {
          description:
            'the name of the functional unit in which the final SCI value should be expressed, e.g. requests, users',
          unit: 'none',
          'aggregation-method': {
            time: 'sum',
            component: 'sum',
          },
        },
      } as ParameterMetadata),
      ...parametersMetadata?.inputs,
    },
    outputs: parametersMetadata?.outputs || {
      sci: {
        description: 'carbon expressed in terms of the given functional unit',
        unit: 'gCO2e',
        // SCI is a rate: average it over time, but sum it across components
        'aggregation-method': {
          time: 'avg',
          component: 'sum',
        },
      },
    },
  };

  // ... rest of the plugin (execute implementation) unchanged
};

In the manifest (when setting parameter metadata):

  sci:
    path: builtin
    method: Sci
    config:
      functional-unit: site-visits
    parameter-metadata:
      outputs:
        sci:
          unit: gCO2 / visit
          description: software carbon intensity
          aggregation-method: 
            time: avg
            component: sum

The aggregation config should also be updated so that its type accepts both, time, and component, rather than both, horizontal, and vertical.

Skipping components

While we are updating the aggregation feature, we should also support skipping named components from the aggregation. This is necessary to enable cross-component arithmetic. For example, imagine we have a component that is used to import page-visit data from an API, and we then want to use that as a functional unit in an SCI calculation across our manifest. If we aggregate using our current feature, we'll throw an exception because one of our components (the one with page-visits) won't have carbon values, so aggregation will fail. We don't want that - we just want to ignore that component in our aggregation.

So ideally we'll have an aggregation config that supports skip-components, looking something like:

aggregation:
  metrics:
    - carbon
    - sci
  type: both
  skip-components:
    - page-visits # this maps to a component name

Error out if any name listed under skip-components does not map to a component name in the tree.
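A sketch of the skip-components behaviour described above, under stated assumptions: `TreeNode` is a deliberately minimal, hypothetical node shape (not the IF type), names are validated against every component at any depth, and only the direct children of the node being aggregated are filtered. The function names are illustrative.

```typescript
type AggregationType = 'both' | 'time' | 'component';

// Proposed aggregation config, including the optional skip-components list.
interface AggregationConfig {
  metrics: string[];
  type: AggregationType;
  'skip-components'?: string[];
}

// Minimal, hypothetical tree-node shape for this sketch.
interface TreeNode {
  children?: { [name: string]: TreeNode };
}

// Collect every component name in the tree, at any depth.
function collectComponentNames(node: TreeNode, acc: string[] = []): string[] {
  const children = node.children || {};
  Object.keys(children).forEach(name => {
    acc.push(name);
    collectComponentNames(children[name], acc);
  });
  return acc;
}

// Error out on unknown names; otherwise return the direct children to aggregate.
function componentsToAggregate(tree: TreeNode, config: AggregationConfig): string[] {
  const known = collectComponentNames(tree);
  const skip = config['skip-components'] || [];
  const unknown = skip.filter(name => known.indexOf(name) === -1);
  if (unknown.length > 0) {
    throw new Error(`skip-components: no component named ${unknown.join(', ')} in the tree`);
  }
  return Object.keys(tree.children || {}).filter(name => skip.indexOf(name) === -1);
}

const exampleTree: TreeNode = {
  children: { 'component-1': {}, 'component-2': {}, 'page-visits': {} },
};
const config: AggregationConfig = {
  metrics: ['carbon', 'sci'],
  type: 'both',
  'skip-components': ['page-visits'],
};

console.log(componentsToAggregate(exampleTree, config)); // [ 'component-1', 'component-2' ]
```

A misspelled entry such as `page-visit` makes `componentsToAggregate` throw instead of silently aggregating every component.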

Prerequisites/resources
n/a

SoW (scope of work)

  • enable independent aggregation method configuration
  • enable component skipping
  • documentation updated
  • renaming of aggregation types applied across docs
  • test cases updated

Acceptance criteria

GIVEN the changes are implemented

WHEN I run the following manifest:

name: GSF Website SCI
description: Generates SCI score (gCO2eq/visit) for greensoftware.foundation website
tags:
aggregation:
  metrics:
    - carbon
    - sci
  type: both
  
initialize:
  plugins:
    sci:
      kind: plugin
      method: Sci
      path: "builtin"
      config:
        functional-unit: site-visits
      parameter-metadata:
        inputs:
          carbon:
            description: carbon emitted in gCO2e
            unit: gCO2e
            aggregation-method: 
              time: 'sum'
              component: 'sum'
          site-visits:
            description: times site was visited
            unit: visit
            aggregation-method: 
              time: 'sum'
              component: 'sum'
        outputs:
          sci:
            description: software carbon intensity
            unit: gCO2 / visit
            aggregation-method: 
              time: 'avg'
              component: 'sum'

tree:
  children:
    component-1:
      pipeline:
        compute:
          - sci
      defaults:
      inputs:
        - timestamp: '2024-07-22T00:00:00'
          duration: 86400
          site-visits: 228
          carbon: 0.0027
        - timestamp: '2024-07-23T00:00:00'
          duration: 86400
          site-visits: 216
          carbon: 0.0027
        - timestamp: '2024-07-24T00:00:00'
          duration: 86400
          site-visits: 203
          carbon: 0.0027

    component-2:
      pipeline:
        compute:
          - sci
      defaults:
      inputs:
        - timestamp: '2024-07-22T00:00:00'
          duration: 86400
          site-visits: 228
          carbon: 0.0007
        - timestamp: '2024-07-23T00:00:00'
          duration: 86400
          site-visits: 216
          carbon: 0.0007
        - timestamp: '2024-07-24T00:00:00'
          duration: 86400
          site-visits: 203
          carbon: 0.0007

THEN I get the following result:

name: GSF Website SCI
description: Generates SCI score (gCO2eq/visit) for greensoftware.foundation website
tags:
aggregation:
  metrics:
    - carbon
    - sci
  type: both
  
initialize:
  plugins:
    sci:
      kind: plugin
      method: Sci
      path: "builtin"
      config:
        functional-unit: site-visits
      parameter-metadata:
        inputs:
          carbon:
            description: carbon emitted in gCO2e
            unit: gCO2e
            aggregation-method: 
              time: 'sum'
              component: 'sum'
          site-visits:
            description: times site was visited
            unit: visit
            aggregation-method: 
              time: 'sum'
              component: 'sum'
        outputs:
          sci:
            description: software carbon intensity
            unit: gCO2 / visit
            aggregation-method: 
              time: 'avg'
              component: 'sum'

tree:
  children:
    component-1:
      pipeline:
        compute:
          - sci
      defaults:
      inputs:
        - timestamp: '2024-07-22T00:00:00'
          duration: 86400
          site-visits: 228
          carbon: 0.0027
        - timestamp: '2024-07-23T00:00:00'
          duration: 86400
          site-visits: 216
          carbon: 0.0027
        - timestamp: '2024-07-24T00:00:00'
          duration: 86400
          site-visits: 203
          carbon: 0.0027
      outputs:
        - timestamp: '2024-07-22T00:00:00'
          duration: 86400
          site-visits: 228
          carbon: 0.0007
          sci: 3.070175438596491e-06
        - timestamp: '2024-07-23T00:00:00'
          duration: 86400
          site-visits: 216
          carbon: 0.0007
          sci: 3.2407407407407406e-06
        - timestamp: '2024-07-24T00:00:00'
          duration: 86400
          site-visits: 203
          carbon: 0.0007
          sci: 3.4482758620689654e-06
      aggregated:
        carbon: 0.0021
        sci:  3.2530640138020657e-06

    component-2:
      pipeline:
        compute:
          - sci
      defaults:
      inputs:
        - timestamp: '2024-07-22T00:00:00'
          duration: 86400
          site-visits: 228
          carbon: 0.0007
        - timestamp: '2024-07-23T00:00:00'
          duration: 86400
          site-visits: 216
          carbon: 0.0007
        - timestamp: '2024-07-24T00:00:00'
          duration: 86400
          site-visits: 203
          carbon: 0.0007
      outputs:
        - timestamp: '2024-07-22T00:00:00'
          duration: 86400
          site-visits: 228
          carbon: 0.0007
          sci: 3.070175438596491e-06
        - timestamp: '2024-07-23T00:00:00'
          duration: 86400
          site-visits: 216
          carbon: 0.0007
          sci: 3.2407407407407406e-06
        - timestamp: '2024-07-24T00:00:00'
          duration: 86400
          site-visits: 203
          carbon: 0.0007
          sci: 3.4482758620689654e-06
      aggregated:
        carbon: 0.0021
        sci:  3.2530640138020657e-06
  outputs:
    - timestamp: '2024-07-22T00:00:00'
      duration: 86400
      site-visits: 228
      carbon: 0.0014
      sci: 6.140350877192982e-06
    - timestamp: '2024-07-23T00:00:00'
      duration: 86400
      site-visits: 216
      carbon: 0.0014
      sci: 6.481481481481481e-06
    - timestamp: '2024-07-24T00:00:00'
      duration: 86400
      site-visits: 203
      carbon: 0.0014
      sci: 6.896551724137931e-06
  aggregated:
    carbon: 0.0042
    sci: 6.506128027604131e-06
@jmcook1186
Contributor Author

@narekhovhannisyan please take a look and lmk if this all makes sense

@zanete zanete added this to the IF Watchers milestone Aug 27, 2024
@zanete zanete added the core-only This issue is reserved for the IF core team only label Aug 27, 2024
@narekhovhannisyan
Member

@jmcook1186 Seems good to me, moving to in progress

@zanete zanete moved this from In Refinement to In Progress in IF Aug 29, 2024
@zanete

zanete commented Aug 29, 2024

If only one aggregation method is given, apply it to both (this shouldn't be a breaking change).
@jmcook1186, please provide per-plugin info on the aggregation methods.
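A minimal sketch of this backward-compatible fallback, assuming the aggregation-method value can be either a single legacy string or the new `{ time, component }` object. The function name `normalizeAggregationMethod` is hypothetical.

```typescript
type AggregationMethod = 'sum' | 'avg' | 'none' | 'copy';

interface AggregationMethods {
  time: AggregationMethod;
  component: AggregationMethod;
}

// Accept either a single legacy method (e.g. 'sum') or the new { time, component } object.
// A single method is applied to both time and component aggregation, so existing
// manifests keep their current behaviour.
function normalizeAggregationMethod(
  value: AggregationMethod | AggregationMethods
): AggregationMethods {
  if (typeof value === 'string') {
    return { time: value, component: value };
  }
  return value;
}

console.log(normalizeAggregationMethod('avg')); // { time: 'avg', component: 'avg' }
console.log(normalizeAggregationMethod({ time: 'avg', component: 'sum' })); // { time: 'avg', component: 'sum' }
```

Normalizing once, at the point where metadata is read, lets the rest of the aggregation code assume the object form everywhere.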

@zanete zanete added the blocked The issue is blocked and cannot proceed. label Aug 30, 2024
@zanete zanete moved this from In Progress to Blocked in IF Aug 30, 2024
@zanete zanete moved this from Blocked to In Progress in IF Sep 2, 2024
@zanete zanete removed the blocked The issue is blocked and cannot proceed. label Sep 2, 2024
@jmcook1186
Contributor Author

@narekhovhannisyan added detail on component skipping to issue description

@zanete

zanete commented Sep 9, 2024

One blocking issue to discuss between @narekhovhannisyan and @jmcook1186 before a PR can be produced.

@narekhovhannisyan narekhovhannisyan linked a pull request Sep 9, 2024 that will close this issue
@zanete zanete assigned jmcook1186 and manushak and unassigned manushak Sep 10, 2024
@github-project-automation github-project-automation bot moved this from Pending Review to Done in IF Sep 10, 2024
@zanete zanete mentioned this issue Sep 16, 2024
Projects
Status: Done

4 participants