
159 interpolation bug #160

Merged: barneydobson merged 9 commits into main from 159-interpolation-bug on May 23, 2024

Conversation

@barneydobson (Collaborator) commented May 13, 2024

Description

This PR fixes a bug in interpolation identified in #76 and includes some general performance improvements.

  • pbias should compare means in case the two variables are not the same length (e.g., pipe diameter sizes).
  • Aggregation now occurs in align_by_shape rather than median_coef_by_shape; there is no point interpolating a load of values that you are about to aggregate, so aggregate first.
  • Interpolation is now applied per sub_id rather than to the whole results df (see the sketch below) <- this was the bug (and a dumb one...).
  • Added and tested outlet_pbias_diameter in metrics.

Fixes #159
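
As an illustration of the per-sub_id interpolation change (a sketch only; the results, sub_id, date and value names are placeholders rather than the package's actual columns):

```python
import pandas as pd

def interpolate_by_sub_id(results: pd.DataFrame, value_col: str = "value") -> pd.DataFrame:
    """Interpolate missing values within each sub_id group separately.

    Interpolating the whole results df in one go lets values bleed across
    subcatchments; grouping first keeps each series independent.
    """
    results = results.sort_values(["sub_id", "date"]).copy()
    results[value_col] = results.groupby("sub_id")[value_col].transform(
        lambda s: s.interpolate(method="linear")
    )
    return results
```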

Dobson added 2 commits May 13, 2024 14:58
- Use mean for pbias in case different size
- Add outlet_pbias_diameter
- Update demo_config to use all available metrics
fix interpolation
@barneydobson linked an issue on May 13, 2024 that may be closed by this pull request
@barneydobson requested review from cheginit and dalonsoa and removed the request for cheginit on May 13, 2024 14:14
@cheginit (Collaborator)

I am not sure I understand the logic behind computing p-bias for two variables with different lengths.

@dalonsoa (Collaborator) left a comment

Looks good to me.

As usual, you're changing or adding code, but there are no tests associated with those changes. In the end, you will end up with a lot of untested code, which is no good, and fixing that will take ages. It is much better to create tests as you code.

Also, you often put together in the same PR changes that are totally unrelated. I understand why that happens, but again that is not good practice. PRs should address conceptually related aspects of the code.

@barneydobson (Collaborator, Author)

> I am not sure I understand the logic behind computing p-bias for two variables with different lengths.

@cheginit For a distribution, e.g., of pipe diameters, so that I can see whether the diameters produced are over- or under-estimates in general.

@barneydobson (Collaborator, Author)

> Looks good to me.
>
> As usual, you're changing or adding code, but there are no tests associated with those changes. In the end, you will end up with a lot of untested code, which is no good, and fixing that will take ages. It is much better to create tests as you code.
>
> Also, you often put together in the same PR changes that are totally unrelated. I understand why that happens, but again that is not good practice. PRs should address conceptually related aspects of the code.

@dalonsoa thanks for pointing this out. I think the functions are included in coverage since they are called by the metrics, but I see I hadn't tested them separately (which is probably how I missed this bug); I will add tests for them.

@dalonsoa (Collaborator)

If you change sum to mean in a piece of code and the tests still pass without you also updating them, then they are not very good tests: they are not correctly capturing and assessing that the functionality of the function they test is correct. The same applies to the other bits that you changed. Coverage is a very misleading quantity: it is very easy to have tests that give you 100% coverage but totally fail to assess the validity of the code. It can serve as a guide, but you need to review each test on its own.

@barneydobson (Collaborator, Author)

> If you change sum to mean in a piece of code and the tests still pass without you also updating them, then they are not very good tests: they are not correctly capturing and assessing that the functionality of the function they test is correct. The same applies to the other bits that you changed. Coverage is a very misleading quantity: it is very easy to have tests that give you 100% coverage but totally fail to assess the validity of the code. It can serve as a guide, but you need to review each test on its own.

But because one is divided by the other, sum and mean give the same output; the only difference is that the mean also works when the series are different lengths (I have just added a test for that).
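
For the record, the check is roughly this (a sketch only; the mean_based_pbias helper and the numbers are illustrative, not the package's actual function or test):

```python
import numpy as np

def mean_based_pbias(sim, obs):
    """Hypothetical helper: (mean(sim) - mean(obs)) / mean(obs)."""
    return (np.mean(sim) - np.mean(obs)) / np.mean(obs)

def test_matches_sum_form_for_equal_lengths():
    sim = np.array([1.5, 2.5, 3.5])
    obs = np.array([1.0, 2.0, 3.0])
    # With equal lengths the 1/n factors cancel, so the sum-based and
    # mean-based forms give the same ratio.
    sum_form = (sim.sum() - obs.sum()) / obs.sum()
    assert np.isclose(mean_based_pbias(sim, obs), sum_form)

def test_works_for_different_lengths():
    sim = np.full(4, 2.0)  # mean 2.0
    obs = np.full(7, 4.0)  # mean 4.0
    assert np.isclose(mean_based_pbias(sim, obs), -0.5)
```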

@cheginit (Collaborator)

Ok, I think I figured out the source of confusion. As far as I know, the definition of p-bias is $\dfrac{\sum(y_\mathrm{sim} - y_\mathrm{obs})}{\sum y_\mathrm{obs}} \times 100$ (not $\sum y_\mathrm{sim} - \sum y_\mathrm{obs}$) which is why the lengths of the variables must be the same.

@barneydobson (Collaborator, Author) commented May 14, 2024

Yep, you're correct there. My adjustment gives the same value (minus the ×100) for two timeseries, but also gives a similarly helpful number for two distributions.

I guess it would be sensible to name it something different to avoid confusion: relative_difference or something?

Edit: it looks like "normalized mean bias error" or "relative error" can mean this.
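
Written out, the quantity I am describing (minus the ×100 factor of the p-bias above) is

$$\frac{\bar{y}_\mathrm{sim} - \bar{y}_\mathrm{obs}}{\bar{y}_\mathrm{obs}}, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i,$$

which reduces to the standard p-bias (divided by 100) whenever the two series have the same length.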

@cheginit (Collaborator)

By distribution, do you mean spatial distribution? For example, for comparing the spatial distribution of pipe sizes? If that is the case, you need to use metrics like spatial autocorrelation (for which you can use the pysal package with Queen or Rook contiguity).
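
Roughly along these lines (a sketch only, not a prescription; the GeoDataFrame, file name and "diameter" column are hypothetical placeholders):

```python
import geopandas as gpd
from libpysal.weights import Queen
from esda.moran import Moran

# Hypothetical polygon layer of subcatchments with a "diameter" attribute.
subs = gpd.read_file("subcatchments.geojson")

w = Queen.from_dataframe(subs)   # Queen-contiguity spatial weights
w.transform = "r"                # row-standardise the weights
mi = Moran(subs["diameter"], w)  # Moran's I for the diameter attribute

print(mi.I, mi.p_sim)            # statistic and permutation p-value
```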

@barneydobson (Collaborator, Author)

Sorry, no, I mean a probability distribution or similar; probably an ambiguous word to use here.

E.g., if I have 5 x 300 mm pipes and 3 x 500 mm pipes, then the relative error is (300 - 500)/500. Its magnitude should, I think, be directly proportional to the KS statistic, but it retains directionality (which is helpful for understanding, e.g., whether you are under- or over-estimating pipe sizes).
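
Spelling out that arithmetic (reading the 300 mm pipes as simulated and the 500 mm pipes as observed):

$$\frac{\bar{y}_\mathrm{sim} - \bar{y}_\mathrm{obs}}{\bar{y}_\mathrm{obs}} = \frac{300 - 500}{500} = -0.4,$$

with the negative sign indicating that pipe sizes are being underestimated.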

@cheginit (Collaborator)

I see. I think this particular example doesn't really provide a measure of over or underestimation. In pipe network design, comparing discharge capacity and the spatial distribution of pipe geometric properties offers a more granular measure of under or overestimation. A simple measure for pipe sizes can be the number of pipes for each pipe diameter, i.e., categorical value counts.
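
A minimal sketch of that value-counts comparison (the diameters here are made-up illustrative numbers):

```python
import pandas as pd

# Hypothetical pipe diameters in mm for the synthesised and real networks.
sim_diam = pd.Series([300, 300, 300, 300, 300, 500, 500])
obs_diam = pd.Series([300, 300, 500, 500, 500, 600])

# Number of pipes per diameter category, aligned so that diameters missing
# from one network show up as zero counts.
counts = (
    pd.concat(
        {"sim": sim_diam.value_counts(), "obs": obs_diam.value_counts()},
        axis=1,
    )
    .fillna(0)
    .astype(int)
    .sort_index()
)
print(counts)
```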

Dobson added 3 commits May 16, 2024 21:22
remove nearest... can't reproduce the issue it was trying to fix
@barneydobson merged commit a9f20b7 into main on May 23, 2024
4 checks passed
@barneydobson deleted the 159-interpolation-bug branch on May 23, 2024 at 10:43

Successfully merging this pull request may close these issues: grid and subcatchment interpolation bug (#159).