Fix timeslice-related bugs #518

tsmbland · 2024-10-09T15:20:46Z

This fixes a few subtle bugs which were very difficult to spot but were messing up some of the results. It's all to do with timeslices and broadcasting...

There are many types of variable in MUSE (supply, demand, capacity, costs, technodata variables etc.), some of which may differ between timeslices (e.g. supply) and some of which are constant for the year (e.g. capacity). MUSE uses xarray dataarrays/datasets to represent these, which will either have a timeslice dimension (multi-index) or not depending on whether the data is timesliced.

In a timesliced dataset, values for each timeslice correspond to the time-period of the timeslice. For example, if you had a timesliced supply dataset, the value for each timeslice would correspond to the total supply over the length of the timeslice. If you had a supply dataset without a timeslice dimension, the value would correspond to supply across the year (i.e. the sum across timeslices).

You can use the convert_timeslice function to extend a non-timesliced dataset over the timeslice dimension. There are a couple of ways you can do this, depending on the variable (although see #516). For example, if you had supply data for the year, you could split this up so that each timeslice has a proportional supply value according to its length, so that the sum across timeslices is equal to the original yearly value. Conversely, if you had prices data, you'd want to copy/broadcast your yearly price across all timeslices rather than splitting it up.
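As a rough illustration of the two behaviours, here is a toy sketch in plain Python. It is not MUSE's actual convert_timeslice: the helper names `split_by_length` and `broadcast_value`, the timeslice names and the equal timeslice lengths are all made up for the example.

```python
# Hypothetical timeslice lengths as fractions of the year (they sum to 1).
TIMESLICE_FRACTIONS = {
    "winter": 0.25,
    "spring": 0.25,
    "summer": 0.25,
    "autumn": 0.25,
}


def split_by_length(yearly_value, fractions=TIMESLICE_FRACTIONS):
    """Split a yearly total (e.g. supply) in proportion to timeslice length."""
    return {ts: yearly_value * frac for ts, frac in fractions.items()}


def broadcast_value(yearly_value, fractions=TIMESLICE_FRACTIONS):
    """Copy a yearly value (e.g. a price) unchanged to every timeslice."""
    return {ts: yearly_value for ts in fractions}


supply = split_by_length(100.0)  # each timeslice gets its share of the total
price = broadcast_value(3.0)     # each timeslice gets the same price
```

The split preserves the yearly total (the timeslice values sum back to 100), whereas the broadcast preserves the value itself in every timeslice.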

That's pretty much the background, which is all straightforward enough. However, in practice, there are lots of subtleties, and it's very easy to make mistakes when dealing with a mix of timesliced and non-timesliced objects. This PR addresses three such mistakes, which seem to be responsible for the issue described in #512.

The first is to do with the utilization factor. The UF can either be specified by the user for the year as a whole or separately for each timeslice, and the internal dataarray will either have a timeslice dimension or not depending on how the UF is specified in the input data. Unfortunately, this means you have to be very careful with some of the maths. Let's consider a simple example with four timeslices, an output rate of 1/year and a utilization factor of 1 in all timeslices. We want to calculate the maximum output in each timeslice, which is simply the output rate for the timeslice multiplied by the utilization factor for the timeslice. We can use convert_timeslice to split the yearly output rate over the four timeslices, but then how should we deal with the UF?

This is how it's currently coded:
max_production = convert_timeslice(fixed_outputs * utilization_factor)

But this means that you end up with a different answer depending on whether the utilization factor is specified for the year as a whole or for each timeslice (even if each timeslice has the same UF):

With a single utilization factor:

convert_timeslice(1 * 1)
-> convert_timeslice(1)
-> [0.25, 0.25, 0.25, 0.25]

With timeslice-level utilization factor:

convert_timeslice(1 * [1, 1, 1, 1])
-> convert_timeslice([1, 1, 1, 1])
-> [1, 1, 1, 1]

However, if we change this to the following, the result no longer depends on how the UF is specified:
max_production = convert_timeslice(fixed_outputs) * utilization_factor

With a single utilization factor:

convert_timeslice(1) * 1
-> [0.25, 0.25, 0.25, 0.25] * 1
-> [0.25, 0.25, 0.25, 0.25]

With timeslice-level utilization factor:

convert_timeslice(1) * [1, 1, 1, 1]
-> [0.25, 0.25, 0.25, 0.25] * [1, 1, 1, 1]
-> [0.25, 0.25, 0.25, 0.25]

Very subtle, and would never be spotted without digging into the code in debug mode, but this can make a big difference to the results.
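The arithmetic above can be reproduced with a toy model in plain Python. This is only a sketch: `toy_convert_timeslice` and `multiply` are hypothetical stand-ins that mimic the behaviour described here (a yearly scalar is split across four equal timeslices, data that is already timesliced passes through unchanged), not the real MUSE or xarray functions.

```python
FRACTIONS = [0.25, 0.25, 0.25, 0.25]  # four equal timeslices (assumption)


def toy_convert_timeslice(value):
    """Toy stand-in: split a yearly scalar, pass timesliced data through."""
    if isinstance(value, list):
        return value  # already timesliced, nothing to split
    return [value * f for f in FRACTIONS]


def multiply(a, b):
    """Elementwise product, broadcasting a scalar against a list."""
    if isinstance(a, list) and isinstance(b, list):
        return [x * y for x, y in zip(a, b)]
    if isinstance(a, list):
        return [x * b for x in a]
    if isinstance(b, list):
        return [a * y for y in b]
    return a * b


fixed_outputs = 1
uf_yearly = 1
uf_timesliced = [1, 1, 1, 1]

# Old ordering: the result depends on how the UF was specified.
old_yearly = toy_convert_timeslice(multiply(fixed_outputs, uf_yearly))
old_timesliced = toy_convert_timeslice(multiply(fixed_outputs, uf_timesliced))

# New ordering: both forms of the UF give the same answer.
new_yearly = multiply(toy_convert_timeslice(fixed_outputs), uf_yearly)
new_timesliced = multiply(toy_convert_timeslice(fixed_outputs), uf_timesliced)
```

With the old ordering the two results differ by a factor of four in this example; with the new ordering they are identical.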

The second is to do with the maximum_production function, which would sometimes return timesliced data and sometimes aggregated yearly data depending on the context in which it was called. Again, you have to be very careful with the maths when using this data.

Let's say you're subtracting production from demand to calculate excess demand. If your demand data is timesliced and your production data isn't, then the subtraction operation will broadcast your production data across the timeslices before subtracting from demand. This means you end up subtracting the full yearly production from each timesliced demand figure, which obviously isn't appropriate. You could use convert_timeslice on the production data beforehand, but I think the better approach is to ensure that max_production always returns timesliced data, which I've done here. The net result of this was to fix an issue of exactly this kind in the _inner_split function (which seems to manifest only when not using retrofit agents). Again, near impossible to spot without some serious debugging.
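A small numerical sketch of the excess-demand problem, in plain Python with invented numbers and equal-length timeslices (none of these names or values come from MUSE):

```python
demand = {"winter": 40.0, "spring": 20.0, "summer": 10.0, "autumn": 30.0}
yearly_production = 80.0
fractions = {ts: 0.25 for ts in demand}  # hypothetical equal-length timeslices

# What implicit broadcasting does: the full yearly figure is subtracted
# from every timeslice.
excess_broadcast = {ts: d - yearly_production for ts, d in demand.items()}

# What was intended: split production across timeslices first, then subtract.
production = {ts: yearly_production * f for ts, f in fractions.items()}
excess_split = {ts: demand[ts] - production[ts] for ts in demand}
```

In the broadcast version every timeslice appears to be heavily oversupplied, because 80 units are subtracted four times. In the split version the timeslice values sum to yearly demand minus yearly production, as you would expect.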

The third issue is to do with the ordering of timeslices. When the technodata_timeslices csv is loaded (contains timeslice-level UF and MSF data), the timeslices are sorted into alphabetical order in the resulting dataset, whereas all other timesliced objects match the order specified in the settings file. This isn't a problem when you're performing operations on the xarray object like above, as xarray will use timeslice names to match datasets. But this may be a problem when preparing numpy arrays for the solver, as the ordering in these arrays will match the ordering of the xarray object it's derived from.

I've adjusted the read_technodata_timeslices function to sort the object when it's read, which should clear up any inconsistencies.
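To show why ordering matters once names are dropped, here is a minimal sketch in plain Python. The timeslice names, UF values and variable names are invented for the example; it only illustrates the general hazard of exporting by position, not the actual reader code.

```python
# Order given in a hypothetical settings file.
settings_order = ["winter", "spring", "summer", "autumn"]

# Timeslice-level UF as it might come out of a reader that sorts alphabetically.
uf_alphabetical = {"autumn": 0.9, "spring": 0.5, "summer": 0.2, "winter": 1.0}

# Positional export without reordering: the values no longer line up
# with settings_order.
misaligned = list(uf_alphabetical.values())

# Reorder by name before exporting to a plain array.
aligned = [uf_alphabetical[ts] for ts in settings_order]
```

Operations that match on names are unaffected by the ordering, but anything that consumes `misaligned` by position would apply the autumn UF to winter.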

Overall, the issue described in #512 appears to be fixed.

However, given how complicated and difficult to spot these bugs were, I'm not completely convinced that I've fixed everything. I think we need a more radical approach to guarantee that these kinds of errors can never exist.

A good start would be to decide, for each variable, whether it should exist in timesliced form or yearly form, and make sure this is consistent everywhere in the code regardless of the input data. For example:

supply/demand: should always be timesliced
capacity: should always be yearly
commodity prices: should always be timesliced
technology variables apart from UF/MSF: should always be yearly
UF/MSF: should always be timesliced

I'm not really sure how best to do this, apart from scattering a load of assert statements throughout the code which could be a bit messy.
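One possible shape for such a check, as a hedged sketch only: a single helper with a lookup table, rather than ad hoc asserts. The table contents, the variable names and `check_timeslicing` are hypothetical, and the helper assumes only that the data object exposes a `dims` attribute, as xarray objects do.

```python
from types import SimpleNamespace

# Hypothetical policy table based on the list above.
EXPECTED_TIMESLICED = {
    "supply": True,
    "demand": True,
    "capacity": False,
    "prices": True,
    "utilization_factor": True,
}


def check_timeslicing(name, data):
    """Raise if `data` does not follow the agreed form for variable `name`.

    `data` only needs a `dims` attribute, as xarray objects have.
    """
    has_timeslice = "timeslice" in data.dims
    if has_timeslice != EXPECTED_TIMESLICED[name]:
        expected = "timesliced" if EXPECTED_TIMESLICED[name] else "yearly"
        raise ValueError(f"{name} should be {expected}, got dims {tuple(data.dims)}")
    return data


# Stand-ins for xarray objects, just to exercise the check.
capacity = SimpleNamespace(dims=("asset", "year"))
supply_missing_timeslice = SimpleNamespace(dims=("asset", "year"))

check_timeslicing("capacity", capacity)  # passes: capacity is yearly
try:
    check_timeslicing("supply", supply_missing_timeslice)
    supply_check_failed = False
except ValueError:
    supply_check_failed = True  # supply without a timeslice dimension is rejected
```

Because the helper returns its input, it could be called at function boundaries (or wrapped in a decorator) so the checks live in one place.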

The codebase is also far looser than it needs to be in terms of potentially allowing different objects to have different timeslicing schemes. I don't think there will ever be a need for this (except maybe the legacy sectors), so I think we should make use of the global TIMESLICE object (which is set according to what's specified in the settings file), and make sure that all timesliced objects match this scheme. This is the basis of #519, although that's still a work in progress.

I'd be a lot happier if automatic broadcasting across the timeslice dimension was banned, because this can lead to some very hard to spot bugs. But I'm not really sure how we can do that.

The results have changed for all models, but the changes are mostly very small. I have also modified the min/max timeslice tutorial to make it clearer what the expected results are supposed to be.

You can see the tutorial notebooks here

Closes #512


tsmbland · 2024-10-18T15:58:08Z

@dalonsoa Any ideas how to make the code more robust so that this sort of thing can never happen again? It's really hard to spot these kinds of bugs just by looking at the code, and obviously the tests aren't good enough to catch them...

dalonsoa

These changes all look sensible to me, therefore approving. But, as you said, there might be plenty of other places where the use of timeslices and broadcasting is inconsistent - or plainly wrong - and very difficult to catch.

There was an attempt to add an arithmetic_broadcast=False global flag to prevent automatic broadcasting. Unfortunately, there were some odd issues related to dask and the change was reverted.

I'm not sure how good practice this would be or if it would work, but you could implement the changes in the above PR - some of them - as a patch. Something along the lines of

# In muse/__main__.py

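def patched_broadcast_compat_data(self, other):
    # Private xarray helpers: their location may differ between xarray versions
    from xarray.core.utils import is_duck_array
    from xarray.core.variable import Variable, _broadcast_compat_variables

    # Refuse any operation that would need implicit broadcasting
    if (isinstance(other, Variable) and self.dims != other.dims) or (
        is_duck_array(other) and self.ndim != other.ndim
    ):
        raise ValueError(
            "Broadcasting is necessary but automatic broadcasting is disabled globally."
        )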

    if all(hasattr(other, attr) for attr in ["dims", "data", "shape", "encoding"]):
        # `other` satisfies the necessary Variable API for broadcast_variables
        new_self, new_other = _broadcast_compat_variables(self, other)
        self_data = new_self.data
        other_data = new_other.data
        dims = new_self.dims
    else:
        # rely on numpy broadcasting rules
        self_data = self.data
        other_data = other
        dims = self.dims
    return self_data, other_data, dims

...

if "__main__" == __name__:
    from unittest.mock import patch
    with patch("xarray.core.variable._broadcast_compat_data", patched_broadcast_compat_data):
        run()

Assuming it works, it won't affect the tests, but it should pick up any attempt to do automatic broadcasting when actually running a model, which is what we want in the end.

As I said, I've no idea if this would work, but it should be possible to do something along these lines.

Also, obviously, include the explanation you give in the PR in a developers documentation section.

dalonsoa · 2024-10-22T12:13:36Z

src/muse/outputs/mca.py

+                capacity
+                * convert_timeslice(
+                    techs.fixed_outputs,
+                    demand.timeslice,


I think this is what confuses me the most about timeslices, especially now that you have shed some light on it: when do you use the timeslice of one particular array, like here, and when the global TIMESLICE? If they are the same, I'd always use the global one, for clarity. And if they are not... why are they not, and how can we know?

I'm pretty sure these are all equivalent to TIMESLICE (or at least they should be), so there's no reason not to use the global. This is part of what I'm doing in #519, although that PR has become a bit too big so I might try to break this specific change into its own PR

Actually it's more complicated than this. TIMESLICE is the global timeslicing scheme from the timeslices section of the settings file. However, MUSE does allow you to have different timeslicing schemes for different sectors (see here). I imagine this is so you can have less granularity in some sectors. For example, in the oil sector, you may only care about meeting demands at the seasonal level, so your timeslices might be "winter", "summer" etc. rather than "winter.weekend.morning", "winter.weekday.night" etc. In this case, maybe the timeslicing of the array isn't going to match up with TIMESLICE. Not sure whether it actually works like that though...

In that case, #519 is a waste of time as it gets rid of this functionality, although I do still want to tidy up how timeslices are dealt with as it's a complete mess at the moment.

tsmbland · 2024-10-22T14:04:52Z

Thanks! This is really useful. I'm going to give this a try and see how I get on

tsmbland added 7 commits October 9, 2024 15:36

Fix use of UF and MSF in constraints

7defafb

Fix a few more UF/MSF occurences

b332cd9

Bump version

2ebb8a3

Fix max_production function

a61a036

FIx timeslice reader

b7eb0de

Update results files

cd630fe

Update example results

8f2d7fe

tsmbland changed the title ~~Fix supply issue~~ Fix timeslice-related bugs Oct 17, 2024

tsmbland changed the base branch from develop to v1.3 October 17, 2024 17:14

tsmbland added 3 commits October 17, 2024 19:03

Merge branch 'dispatch_production' into fix_supply_issue

6340ab0

Fix minimum_production function

333f147

Update results files

268877a

tsmbland changed the base branch from v1.3 to dispatch_production October 18, 2024 07:48

tsmbland and others added 9 commits October 18, 2024 08:48

Merge branch 'dispatch_production' into fix_supply_issue

8708956

Update results files

98d1bd3

Merge branch 'fix_supply_issue' of https://github.com/EnergySystemsMo…

4bcb467

…dellingLab/MUSE_OS into fix_supply_issue

Remove redundant convert_timeslice calls from tests

c7f49dd

Fix some tests

51bc7a9

Fix some more tests

4f057a4

Fix error in unmet_forecasted_demand

dcc14e9

Fix remaining tests

9030be1

Update tutorial

8e8b5bf

tsmbland marked this pull request as ready for review October 18, 2024 14:38

Base automatically changed from dispatch_production to v1.2.2 October 18, 2024 15:41

tsmbland requested a review from dalonsoa October 18, 2024 15:58

tsmbland linked an issue Oct 18, 2024 that may be closed by this pull request

Results are different when specifying utilization factor at the timeslice level #512

Closed

tsmbland mentioned this pull request Oct 21, 2024

Fix timeslice-related bugs (take 2) #520

Closed

8 tasks

dalonsoa approved these changes Oct 22, 2024


This was referenced Oct 24, 2024

Fix more broadcasting errors #533

Closed

Fix more broadcasting errors #534

Merged

tsmbland added 2 commits October 25, 2024 12:07

Use local timesliced objects rather than TIMESLICE

b17a9b7

Fix doctest

3407cfe

tsmbland merged commit 0eea1ca into v1.2.2 Oct 25, 2024
11 of 14 checks passed

tsmbland deleted the fix_supply_issue branch October 25, 2024 11:26

This was referenced Oct 25, 2024

v1.2.2 #535

Merged

Results are different when specifying utilization factor at the timeslice level #512

Closed

tsmbland mentioned this pull request Nov 7, 2024

Simplify the use of timeslices #519

Merged
