
Build the input workflow #289

Closed
9 of 20 tasks
clizbe opened this issue Nov 22, 2023 · 21 comments
Assignees
Labels
Type: epic Epic issues (collection of smaller tasks towards a goal)

Comments

@clizbe
Member

clizbe commented Nov 22, 2023

Build the basic input workflow from raw data to the model.

See discussion #288

Considerations

Meta considerations

  • Do we need parallel execution of pipelines?
  • Maybe supporting parallel jobs with shared inputs is sufficient

Capabilities/Usability requirements

WHAT WE WANT
Build the network once (in a while)
Use draft networks to build new networks
Sufficient flexibility for ad-hoc code for experimentation
Definition of temporal stuff
Definition of scenarios (what is included here?)
Scope: just model or parts of pipeline (which parts?)
Definition of solver specifications
Be able to mix data sources (ESDL + ENTSO-E for example)
Self-hosted Tulipa database (in case sources change/vanish, & reduce re-pulling/processing data)
Export ESDL to simplified representation that is compatible with Tulipa

@clizbe clizbe added the Type: epic Epic issues (collection of smaller tasks towards a goal) label Nov 22, 2023
@suvayu suvayu added Zone: pre-processing Zone: data & import Interfacing between database and Julia labels Nov 22, 2023
@abelsiqueira
Member

Does this include the representative periods and the assets and flows partitions, or is it just for the data sources?

@suvayu
Member

suvayu commented Nov 23, 2023

The representative periods come from an algorithm, so that should be included, but optionally: a scenario might not require the algorithm and use fixed periods instead, or the algorithm may have already run once with unchanged input, in which case it need not run again.

As for the flow partitions, aren't they derivable from the profiles? If so, then that would also be along the lines of "compute if input changes".
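The "compute if input changes" idea could be sketched with a content hash over the inputs; everything below (function names, cache shape) is a hypothetical illustration, not Tulipa code:

```python
import hashlib
import json

def input_fingerprint(data: dict) -> str:
    # Hash a canonical JSON serialization of the inputs.
    canon = json.dumps(data, sort_keys=True).encode()
    return hashlib.sha256(canon).hexdigest()

def run_step(data: dict, cache: dict, compute):
    # Re-run `compute` only when the inputs' fingerprint is new;
    # otherwise return the cached result from the previous run.
    key = input_fingerprint(data)
    if key not in cache:
        cache[key] = compute(data)
    return cache[key]
```

The same pattern would cover both the representative-periods algorithm and any derived partitions: unchanged input, no recomputation.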

@clizbe
Member Author

clizbe commented Nov 23, 2023

@Lokkij I tagged you on this one too if you're interested. You're of course our source for ESDL knowledge but I thought you might also be interested in this stuff. :)

@clizbe clizbe mentioned this issue Nov 23, 2023
8 tasks
@clizbe clizbe removed Zone: pre-processing Zone: data & import Interfacing between database and Julia labels Nov 23, 2023
@suvayu
Member

suvayu commented Nov 23, 2023

Is it possible to filter out attributes not used in Tulipa when exporting ESDL to JSON?

I thought we decided against this because that would be a choice on the Tulipa side and not the ESDL side?

@clizbe I'm guessing you left that comment? Best to discuss in the thread instead of editing the top post.

I see that my wording is pretty unclear. AFAICT, there are two levels of filtering: the top level covers things that are not in Tulipa because of fundamental modelling choices, e.g. no connections, so having the Port attributes in ESDL may never make sense. The next level is any finer choices that we make, which evolve over time.

In this case I mean the top-level fundamental choices. But maybe I'm over thinking it, and doing everything in one go is simpler.

@clizbe
Member Author

clizbe commented Nov 28, 2023

Yes, I think part of it will be specifying the type of ESDL file that Tulipa accepts - which variables should be filled, etc. Then there will probably be a step converting that ESDL into the form Tulipa likes, which will include throwing out anything else and maybe some conversion trickery. I would prefer the ESDL file to look normal before conversion, and that we don't build really weird ESDLs - but we'll see what works.

@Lokkij

Lokkij commented Nov 28, 2023

Is it possible to filter out attributes not used in Tulipa when exporting ESDL to JSON?

I thought we decided against this because that would be a choice on the Tulipa side and not the ESDL side?

Usually the approach here is to leave attributes in ESDL and simply not read them from the model if you don't need them. In our case, I would keep the filtering as close to Tulipa as possible. That will likely make it easier to write back results to ESDL while keeping the original attributes intact.
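Keeping the filtering as close to Tulipa as possible could be as simple as reading a filtered view of the full ESDL-derived record; the attribute names below are invented for illustration:

```python
# Hypothetical attribute whitelist on the Tulipa side. The full
# ESDL-derived record is kept intact, so results can later be written
# back to ESDL without losing the original attributes.
TULIPA_ATTRS = {"name", "capacity", "efficiency"}

def tulipa_view(asset: dict) -> dict:
    # Return only the attributes Tulipa consumes; `asset` is not modified.
    return {k: v for k, v in asset.items() if k in TULIPA_ATTRS}
```

This matches the "leave attributes in ESDL and simply not read them" approach: the filter lives at the read boundary, not in the exported file.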

Do we need a local data store?

What would the local data store be used for? To store temporary in-between data, or something else?

@suvayu
Member

suvayu commented Nov 28, 2023 via email

@clizbe
Member Author

clizbe commented Nov 28, 2023

Just saw this at a Spine meeting and thought it would be super handy to have something similar! (Maybe you had this in mind already, but it's new to me.) From what I understand it shows where specific data is coming from and the lines sort of indicate how it's processed?
[image: diagram of data sources with lines indicating how the data is processed]

@Lokkij

Lokkij commented Nov 28, 2023

As my understanding goes, for larger datasets we will have to connect to InfluxDB (or similar) and download for Tulipa to read. There will also be intermediate steps (e.g. different ways to compute representative days), etc. I doubt we want to download the dataset every time, or recompute unchanged steps every time.

Ah, so a sort of local DB to store data while doing other operations? I wouldn't expect our data to be so big as to need it, honestly - you can fit a lot of profiles in a few GBs of RAM. But maybe I'm missing something?

Just saw this at a Spine meeting and thought it would be super handy to have something similar! (Maybe you had this in mind already, but it's new to me.) From what I understand it shows where specific data is coming from and the lines sort of indicate how it's processed?

To me this looks like a class diagram, very similar to the diagrams for ESDL. The ESDL documentation has diagrams for all classes, for example: https://energytransition.github.io/#router/doc-content/687474703a2f2f7777772e746e6f2e6e6c2f6573646c/PowerPlant.html

@clizbe
Member Author

clizbe commented Nov 29, 2023

@datejada @gnawin @clizbe
Add some use-cases of how you're going to use the model and what your workflow is so they have a better idea of what we need.
"I want to run the model from the train" is valid. :)

@clizbe
Member Author

clizbe commented Nov 30, 2023

Use Cases
I would like to be able to:

  • summarize/visualize my input data (in tables or graphs), such as total wind capacity, transport line capacities, available technologies.
  • make transport capacities in certain areas unlimited, while still constraining others.
  • set up multiple scenarios to run in parallel or (otherwise) series - set and forget.
  • visualize output data from one scenario, as well as compare multiple scenarios.
  • keep track of what model version and what data was used for a particular run/analysis - reproducibility.
  • easily specify scenario parameters for multiple scenarios.
  • occasionally add new data / data sources.
  • specify which data sources to use to build a scenario.
  • run the model somewhere that I can go about other work while it runs.
  • know when the model is finished running.
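The "set and forget" point could look something like this sketch, launching hypothetical scenario definitions concurrently and collecting all results at the end (`run_scenario` is a stand-in, not a real Tulipa call):

```python
from concurrent.futures import ThreadPoolExecutor

def run_scenario(params: dict) -> dict:
    # Stand-in for a real model run; returns a labelled result.
    return {"scenario": params["name"], "objective": params["demand"] * 2.0}

def run_all(scenarios: list) -> list:
    # Launch all scenarios and block until every one has finished,
    # so you can set it running and come back for the results.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(run_scenario, scenarios))
```

A real version would also log start/finish times per scenario, which covers the "know when the model is finished running" wish.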

My current workflow for running scenarios is:

  • Duplicate a "default" Access dataset - this has everything needed to do a run.
  • In Excel, process scenario-unique (new) data, so it works with the model.
  • In Access, filter for and delete any data that will be replaced by the new data.
  • Copy and paste the new data into the dataset.
  • Go into the model, Browse for the dataset, Load it, Run the model.
  • Check frequently if the model has finished running.
  • Export data to Excel to make graphs (although Wester is building a UI to make this nicer).
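For comparison, the filter-delete-replace steps of this workflow map directly onto two SQL statements; the table and columns below are invented for illustration (SQLite here, but the statements are generic):

```python
import sqlite3

# A toy "default dataset" with everything needed for a run.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE demand (region TEXT, year INTEGER, value REAL)")
con.executemany(
    "INSERT INTO demand VALUES (?, ?, ?)",
    [("NL", 2030, 100.0), ("BE", 2030, 80.0)],
)

# Filter for and delete the rows the new data will replace...
con.execute("DELETE FROM demand WHERE region = 'NL'")
# ...then paste in the scenario-specific replacement.
con.execute("INSERT INTO demand VALUES ('NL', 2030, 120.0)")
```

The same two statements replace the Access filter-and-delete plus copy-paste steps, and they are scriptable per scenario.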

Pros/Cons of Access

  • Can easily see data (once you know where it is)
  • Easy to learn how to edit
  • Takes a long time to edit
  • Sometimes you don't know where the data is
  • Huge tables make it slow even loading/filtering

@suvayu
Member

suvayu commented Dec 1, 2023

Ah, so a sort of local DB to store data while doing other operations? I wouldn't expect our data to be so big as to need it, honestly - you can fit a lot of profiles in a few GBs of RAM. But maybe I'm missing something?

I guess that's pretty small. However, I would really like to support a workflow that doesn't require you to be online. But if people say there's no such need, we can drop it.

Edit: the more I think about it, the more I think we need it, e.g. for running different scenarios it makes no sense to download the same data repeatedly, even if it is small. So the question is: should the local store also be accessible to normal users for inspection and analysis? Based on @clizbe's points, I think it should be.
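A minimal local store along these lines, using SQLite so normal users could also inspect it with plain SQL (the schema is invented for illustration):

```python
import sqlite3

def open_store(db_path: str) -> sqlite3.Connection:
    # Open (or create) the local store; ":memory:" works for testing.
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS profiles (asset TEXT, hour INTEGER, value REAL)"
    )
    return con

def cache_profiles(con: sqlite3.Connection, rows) -> None:
    # Store downloaded rows so later scenario runs don't re-download them.
    con.executemany("INSERT INTO profiles VALUES (?, ?, ?)", rows)
    con.commit()
```

Users could then run e.g. `SELECT asset, AVG(value) FROM profiles GROUP BY asset` against the same file, which ties in with the inspection-and-analysis point above.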

@suvayu
Member

suvayu commented Dec 1, 2023

Pros/Cons of Access

  • Can easily see data (once you know where it is)
  • Easy to learn how to edit
  • Takes a long time to edit
  • Sometimes you don't know where the data is
  • Huge tables make it slow even loading/filtering

@clizbe Do you know SQL? Is it fair to expect someone who is doing analysis to know/learn a bit of SQL?

@clizbe
Member Author

clizbe commented Jan 15, 2024

@suvayu Sorry I don't know if I responded in person.
Learning SQL is totally feasible. I don't think our current modellers know it. (I've used it once.)

@clizbe
Member Author

clizbe commented Feb 6, 2024

Compiling the model takes a lot of time (a Julia thing), with subsequent runs going faster. How are we dealing with this in the workflow? Is the stable version of Tulipa something that compiles once and can then take any data through it? Or will the scenario define a model that needs precompiling before doing multiple runs?

@suvayu
Member

suvayu commented Feb 6, 2024

I think this request needs to be separated according to use case. For example, if you changed an input dataset, naively, you have to rerun. However if you say "I'm doing a sensitivity study, and my changes are only limited to X" then theoretically the repetitions need not start from scratch. But I think that's a very advanced feature which requires deep technical research. AFAIU, this is in @g-moralesespana and @datejada's wishlist (GUSS in GAMS). But there could be simpler use cases between these two extremes.

That said, I'm not sure whether this would fall under the purview of pipeline/workflow or of model building. My hunch is it'll depend on the use case.

I hope that makes sense :-P

@clizbe
Member Author

clizbe commented Feb 7, 2024

Yeah I figured I'd comment here in case it's a simple answer, but it's probably a bigger discussion.

This is becoming an issue with Spine, so it's good to think about it early.

@datejada
Member

For the ENTSO-E database I found this, but I'm not sure if we have access (or if we could have)... it might be interesting to explore:

https://www.linkedin.com/posts/activity-7140005469414133760-f4XH/?utm_source=share&utm_medium=member_desktop

@datejada
Member

@nope82 commented the following about ENTSO-E:

From a quick check, it seems that this PEMMDB is only accessible to TSOs (author's comment: "Sadly no (data transparency), it is only for sharing between TSO members"). When looking for access to the data, I only found a regulation from the ERAA study, from ACER, requesting the PEMMDB data:

"On 23 November 2021, ACER requested ENTSO-E to provide all input data for the ERAA 2021. On 2 December 2021, ENTSO-E provided ACER with access to the pan-European market modelling database (PEMMDB) and the assumptions for the economic viability assessment (EVA)".

So it seems that ENTSO-E would be the only one that could grant access, and that access appears to be a one-time grant for specific data (or would need recurring requests) rather than completely open access to the data.

@clizbe
Member Author

clizbe commented Jul 29, 2024

@clizbe Reorganize the info here and close this issue

@clizbe
Member Author

clizbe commented Sep 19, 2024

Stale issue - ongoing efforts moved to other places (links provided)
