time series arithmetic between the different elements #994
Comments
@jawache please review this solution
@narekhovhannisyan and @jmcook1186, so for now we can live with multiple API calls and lots of human massaging of data. Historically we've discussed this in much, much earlier versions of IF, and the solution we landed on was to implement some internal caching feature. End users configure the plugins as they want, and we optimize by caching results for plugins and then returning the same results "for the same query". That approach still needs a lot of refinement, since it's not actually that straightforward. However, I think it's a bit premature to optimize in this way: the impact is low (a few repeated API calls and/or copy/paste of data), and the proposed solution, I think, can have a lot of unintended consequences (global data, automatic copying of sub-trees) which we'll be stuck with for a long time.
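For reference, the caching idea mentioned above could be sketched as memoising a plugin on its query. This assumes "the same query" means an identical config and input array, which is exactly the part described as not straightforward: `JSON.stringify` is sensitive to property order, and a live API may return different data for the same query over time. `withCache` and the `Plugin` type are hypothetical, not part of IF.

```typescript
// Hypothetical shapes; the issue does not define a plugin interface.
type Observation = Record<string, string | number>;
type Plugin = (config: Record<string, unknown>, inputs: Observation[]) => Observation[];

// Wrap a plugin so repeated calls with an identical query reuse the first result.
function withCache(plugin: Plugin): Plugin {
  const cache = new Map<string, Observation[]>();
  return (config, inputs) => {
    // Naive query key: property order changes the key, so two equivalent
    // configs written in a different order would miss the cache.
    const key = JSON.stringify([config, inputs]);
    const hit = cache.get(key);
    if (hit !== undefined) {
      return hit; // callers share this array, so they must not mutate it
    }
    const result = plugin(config, inputs);
    cache.set(key, result);
    return result;
  };
}

// Usage: a stand-in importer that counts how often it would hit an API.
let apiCalls = 0;
const pageViewsImporter: Plugin = (_config, inputs) => {
  apiCalls += 1;
  return inputs.map((obs) => ({ ...obs, "page-views": 100 }));
};

const cachedImporter = withCache(pageViewsImporter);
const query: Observation[] = [{ timestamp: "2023-07-06T00:00Z", duration: 3600 }];
cachedImporter({ site: "example.org" }, query);
cachedImporter({ site: "example.org" }, query);
console.log(apiCalls); // 1
```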
@narekhovhannisyan I saw this in the text above: "We might have to assert that --observe plugins across the whole tree are executed before any --compute plugins are executed, otherwise we have ordering requirements for certain compute plugins (e.g. we could try to execute a sci that relies on some functional unit in another component where those values haven't been imported yet)." Can you confirm whether we are running I had assumed that with our new pure functional architecture these would have been written as their own pure functions?
@jawache Yeah
OK, I accept the feedback on cross-component interactions for now. Let me just quickly add some colour to the response re The For example, it's not uncommon to need to chain input plugins together. Maybe we want to look up a processor name and then look up its TDP using two external API calls; unless the two APIs have identical naming systems, we need to insert the This is why it can be harder than it seems to separate out the
@jmcook1186 I agree, it's a fluffy decision whether something is an observe plugin or a compute plugin. The original intention of breaking up the pipeline was to make the decision about where to put the TimeSync and Grouping plugins more obvious; they were both supposed to be baked into the regroup step. Another reason was verification: plugins that require non-public data, API keys, logins etc. can go in observe, so someone "verifying" can just rerun compute without needing all the same permissions, keys, etc. But it's making less and less sense; for instance, the WattTime plugin would still only work after grouping, yet given the above we would want it to fit in observe. This is starting to feel like a code smell to me; perhaps we need to revisit. My gut is telling me there is a super elegant and simple solution to do with an alternative view on time (global time window, fixed duration, dunno?) and atomic observations (treating each observation independently of the time series it's part of negates the need to group to get a unique time series). It's on the tip of my tongue but just out of reach.
Why: Sub-issue of #949, needed in order to create a realistic manifest file for the GSF website.
What: We need the ability to carry out simple arithmetic between the different elements.
Context
Let's say we have a component containing a time series for the number of page views per hour, which may have been populated using an importer plugin for, e.g., Google Analytics.
We also have a separate component, e.g. `web-server`, that has impacts for energy and carbon in the same time intervals as the page visits. Now we want to calculate our SCI score by dividing `carbon` in each observation in the `web-server` component by the page views in the `page-views` component. We can't, because all the information we need to process an observation and create a new output value has to exist within the same component as that observation.
This is problematic because it suggests we either have to know the page views in advance and manually add them everywhere we need them across our manifest, or run some importer plugin for every component in the tree that wants to access that data, leading to a lot of repetition, points of failure and unnecessary carbon expenditure.
What this amounts to is that today, unless we want to make manual interventions to the manifest, we cannot use time series data for our functional unit in SCI calculations.
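A minimal TypeScript sketch of the arithmetic being asked for, assuming observations are flat records keyed by parameter name and that both components share the same timestamps. The `Observation` type and `divideAcrossComponents` function are hypothetical illustrations, not part of the Impact Framework API, and the input values are made up.

```typescript
// Hypothetical observation shape: flat parameters keyed by name.
type Observation = Record<string, string | number>;

// For each observation in `target`, divide `numerator` by `denominator` taken
// from the observation with the same timestamp in `source`, writing `output`.
function divideAcrossComponents(
  target: Observation[],
  source: Observation[],
  numerator: string,
  denominator: string,
  output: string
): Observation[] {
  // Index the other component's series by timestamp for direct lookup.
  const byTimestamp = new Map<string | number, Observation>();
  for (const obs of source) {
    byTimestamp.set(obs.timestamp, obs);
  }
  return target.map((obs) => {
    const match = byTimestamp.get(obs.timestamp);
    if (match === undefined) {
      // The series are not synced; fail rather than guess a value.
      throw new Error(`no ${denominator} value at ${obs.timestamp}`);
    }
    const divisor = Number(match[denominator]);
    if (!Number.isFinite(divisor) || divisor === 0) {
      throw new Error(`invalid ${denominator} value at ${obs.timestamp}`);
    }
    const result: Observation = { ...obs };
    result[output] = Number(obs[numerator]) / divisor;
    return result;
  });
}

// Illustrative values only.
const webServer: Observation[] = [
  { timestamp: "2023-07-06T00:00Z", duration: 3600, carbon: 120 },
  { timestamp: "2023-07-06T01:00Z", duration: 3600, carbon: 90 },
];
const pageViews: Observation[] = [
  { timestamp: "2023-07-06T00:00Z", duration: 3600, "page-views": 600 },
  { timestamp: "2023-07-06T01:00Z", duration: 3600, "page-views": 300 },
];
const withSci = divideAcrossComponents(webServer, pageViews, "carbon", "page-views", "sci");
console.log(withSci.map((obs) => obs.sci)); // [ 0.2, 0.3 ]
```

Failing on a missing timestamp, rather than interpolating, reflects the concern that imported data may not be synced with the existing set of timestamps.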
Here's what we want to be able to do:
We might have to assert that `--observe` plugins across the whole tree are executed before any `--compute` plugins are executed; otherwise we have ordering requirements for certain compute plugins (e.g. we could try to execute a `sci` plugin that relies on some functional unit in another component where those values haven't been imported yet).
Note: Why not just use the importer inside each component and add the page visits to each observation?
A few reasons. First, it's a wasteful way to get the data: it would require an external API call per component for data we already have, which is time, energy and carbon inefficient. Also, it's plausible the response could change from one component to another. It also requires that the data arriving from the importer is already synced with the existing set of timestamps, which it may or may not be, and this would be tricky to handle internally. These are the reasons I think separate components plus cross-component operations are the way to go.
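One way to read the ordering requirement above, sketched under assumptions: each component carries a pipeline of steps tagged `observe` or `compute`, and compute steps can read other components' outputs by name. `ComponentNode`, `PluginStep` and `runTwoPhase` are hypothetical names, not IF APIs. In the example tree, `web-server` is listed before `page-views`, so a single depth-first pass would run `sci` before the importer; running every observe step first avoids that.

```typescript
// Hypothetical types for illustration; not the Impact Framework's own.
type Observation = Record<string, string | number>;
type Phase = "observe" | "compute";

interface PluginStep {
  name: string;
  phase: Phase;
  // Compute steps may read other components' outputs from `tree`.
  run: (inputs: Observation[], tree: Map<string, Observation[]>) => Observation[];
}

interface ComponentNode {
  name: string;
  inputs: Observation[];
  pipeline: PluginStep[];
  children?: ComponentNode[];
}

// Collect every component in the tree, depth first.
function collect(node: ComponentNode, into: ComponentNode[] = []): ComponentNode[] {
  into.push(node);
  for (const child of node.children ?? []) collect(child, into);
  return into;
}

// Run all observe steps across the whole tree before any compute step.
function runTwoPhase(root: ComponentNode): Map<string, Observation[]> {
  const components = collect(root);
  const tree = new Map<string, Observation[]>();
  for (const c of components) tree.set(c.name, c.inputs);

  const phases: Phase[] = ["observe", "compute"];
  for (const phase of phases) {
    for (const c of components) {
      for (const step of c.pipeline) {
        if (step.phase === phase) {
          tree.set(c.name, step.run(tree.get(c.name) ?? [], tree));
        }
      }
    }
  }
  return tree;
}

// `web-server` is listed first, so a single pass would run `sci` too early.
const manifestTree: ComponentNode = {
  name: "root",
  inputs: [],
  pipeline: [],
  children: [
    {
      name: "web-server",
      inputs: [{ timestamp: "t0", carbon: 50 }],
      pipeline: [
        {
          name: "sci",
          phase: "compute",
          run: (inputs, tree) =>
            inputs.map((obs) => {
              const views = (tree.get("page-views") ?? []).find((v) => v.timestamp === obs.timestamp);
              if (views === undefined) throw new Error(`no page views at ${obs.timestamp}`);
              return { ...obs, sci: Number(obs.carbon) / Number(views["page-views"]) };
            }),
        },
      ],
    },
    {
      name: "page-views",
      inputs: [],
      pipeline: [
        { name: "importer", phase: "observe", run: () => [{ timestamp: "t0", "page-views": 100 }] },
      ],
    },
  ],
};

const results = runTwoPhase(manifestTree);
console.log(results.get("web-server")?.[0].sci); // 0.5
```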
Narek's implementation notes
To let the framework know that we will want to reuse the observed value in other child components, we have to pass the `store-result: true` flag to the plugin config in the `initialize` section, like this:
In the pipeline, the user can reference the name of the plugin and the name of the component whose data should be reused:
**Note from @jmcook1186**: I prefer something like `global: true` compared to `store-result: true`. Then we can invoke using `global: page-views` rather than using the original component name.
Meanwhile, the framework will check whether the name in the compute section is present in the plugins storage: if it is, the plugin is executed from scratch; otherwise, the framework checks the results storage to see if there is any data saved by a previous child component.
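A hedged sketch of the lookup described above: a name in the pipeline is first checked against the plugins storage and executed if found; otherwise it is treated as a reference to results saved earlier. The `PipelineResolver` class, the `storeResult` field (standing in for `store-result: true` or `global: true`) and keying saved results by component name are assumptions for illustration; the issue does not fix a key format. Merging stored values into the current component by timestamp is also an assumption.

```typescript
// Hypothetical types; the issue does not define these interfaces.
type Observation = Record<string, string | number>;

interface RegisteredPlugin {
  // Mirrors the proposed `store-result: true` (or `global: true`) flag.
  storeResult: boolean;
  fn: (inputs: Observation[]) => Observation[];
}

class PipelineResolver {
  // Plugins declared in the initialize section, keyed by plugin name.
  private readonly plugins = new Map<string, RegisteredPlugin>();
  // Outputs saved by earlier components, keyed by component name (an assumption).
  private readonly results = new Map<string, Observation[]>();

  register(name: string, plugin: RegisteredPlugin): void {
    this.plugins.set(name, plugin);
  }

  runStep(stepName: string, component: string, inputs: Observation[]): Observation[] {
    const plugin = this.plugins.get(stepName);
    if (plugin !== undefined) {
      // Name found in plugins storage: execute it from scratch.
      const outputs = plugin.fn(inputs);
      if (plugin.storeResult) {
        this.results.set(component, outputs);
      }
      return outputs;
    }
    // Otherwise look for data saved by a previous component under this name.
    const stored = this.results.get(stepName);
    if (stored === undefined) {
      throw new Error(`"${stepName}" is neither a plugin nor a stored result`);
    }
    // Merge stored values into matching timestamps; local values win on conflict.
    return inputs.map((obs) => ({
      ...(stored.find((s) => s.timestamp === obs.timestamp) ?? {}),
      ...obs,
    }));
  }
}

// Usage: the importer runs once; web-server reuses its stored output.
let importerCalls = 0;
const resolver = new PipelineResolver();
resolver.register("page-views-importer", {
  storeResult: true,
  fn: () => {
    importerCalls += 1; // stands in for one external API call
    return [{ timestamp: "t0", "page-views": 100 }];
  },
});

resolver.runStep("page-views-importer", "page-views", []);
const merged = resolver.runStep("page-views", "web-server", [{ timestamp: "t0", carbon: 50 }]);
console.log(merged[0]["page-views"], importerCalls); // 100 1
```

Looking stored results up under a dedicated name, in the spirit of the `global: page-views` suggestion, would also keep a plugin and a component that happen to share a name from colliding in this scheme.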
Scope of work:
Acceptance Criteria
Scenario 1
GIVEN the cross-component operations are working
WHEN I run the following manifest:
I get the following output: