159 interpolation bug #160
Conversation
- Use mean for pbias in case of different sizes
- Add outlet_pbias_diameter
- Update demo_config to use all available metrics
fix interpolation
I am not sure if I understand the logic behind computing p-bias for two variables with different lengths?
Looks good to me.
As usual, you're changing or adding code, but there are no tests associated with those changes. In the end, you will end up with a lot of untested code, which is no good, and fixing that will take you ages. It is much better to create tests as you code.
Also, you often put together in the same PR changes that are totally unrelated. I understand why that happens, but again that's not good practice. A PR should address conceptually related aspects of the code.
For a distribution, e.g., of pipe diameters, so that I can see whether the diameters produced are over- or under-estimates in general.
@dalonsoa thanks for pointing this out. I think the functions are included in coverage since they are called by the metrics, but I see I hadn't tested them separately (which is probably how I didn't notice this bug) - will add tests for them.
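For example, here is a minimal pytest sketch of the kind of unit test that could cover these helpers; the `pbias` function below is defined inline for illustration only, and the project's actual signature may differ:

```python
import numpy as np


def pbias(sim, obs):
    # Hypothetical mean-based percent-bias helper, defined here only for this sketch.
    return (np.mean(sim) - np.mean(obs)) / np.mean(obs)


def test_pbias_identical_series():
    # Identical inputs should give zero bias.
    x = np.array([1.0, 2.0, 3.0])
    assert pbias(x, x) == 0.0


def test_pbias_different_lengths():
    # Inputs of different lengths should still be comparable via their means.
    sim = np.array([300.0] * 5)
    obs = np.array([500.0] * 3)
    assert np.isclose(pbias(sim, obs), -0.4)
```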
add more testing
If you change ...
But because they are being divided by each other, they should give the same output - the only difference is that ...
Ok, I think I figured out the source of confusion. As far as I know, the definition of p-bias is ...
Yep - you're correct there. My adjustment gives the same value (less the x100) for two timeseries, but gives a similarly helpful number for two distributions. I guess it would be sensible to name it something different to avoid confusion. Edit: it looks like "normalized mean bias error" or "relative error" can mean this.
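To make the distinction concrete, here is a minimal sketch of the two flavours being discussed; the function names are illustrative, the sign convention for p-bias varies between sources, and this is not the project's actual implementation:

```python
import numpy as np


def pbias_timeseries(sim, obs):
    # Conventional percent bias for two equal-length timeseries (includes the x100 factor).
    # Note: some sources flip the sign convention (obs - sim instead of sim - obs).
    return 100.0 * np.sum(np.asarray(sim) - np.asarray(obs)) / np.sum(np.asarray(obs))


def relative_error_of_means(sim, obs):
    # Mean-based variant: gives the same value (less the x100) for equal-length
    # series, but stays meaningful when the two samples have different sizes,
    # and keeps the sign so the direction of the bias is visible.
    return (np.mean(sim) - np.mean(obs)) / np.mean(obs)
```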
By distribution, do you mean spatial distribution? For example, for comparing the spatial distribution of pipe sizes? If that's the case, you need to use metrics like spatial autocorrelation (which you can use ...
Sorry, no, I mean a probability distribution - probably an ambiguous word to use here. E.g., if I have 5 x 300mm pipes and 3 x 500mm pipes, then the relative error is (300 - 500)/500. Its magnitude should, I think, be directly proportional to the KS statistic, but it retains directionality (which is helpful to understand, e.g., whether you are under- or over-estimating pipe sizes).
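As a small illustration of that arithmetic (using scipy's two-sample KS test purely for comparison; this is a sketch, not a claim about the project's code):

```python
import numpy as np
from scipy.stats import ks_2samp

sim = np.array([300.0] * 5)  # 5 x 300 mm pipes
obs = np.array([500.0] * 3)  # 3 x 500 mm pipes

rel_err = (sim.mean() - obs.mean()) / obs.mean()  # (300 - 500) / 500 = -0.4
ks_stat = ks_2samp(sim, obs).statistic            # 1.0 here, since the samples do not overlap

print(f"relative error of means: {rel_err:.2f}, KS statistic: {ks_stat:.2f}")
```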
I see. I think this particular example doesn't really provide a measure of over- or under-estimation. In pipe network design, comparing discharge capacity and the spatial distribution of pipe geometric properties offers a more granular measure of under- or over-estimation. A simple measure for pipe sizes can be the number of pipes for each pipe diameter, i.e., categorical value counts.
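A minimal sketch of that value-counts comparison with pandas (the series and column names are hypothetical):

```python
import pandas as pd

sim = pd.Series([300] * 5, name="diameter_mm")  # synthesised network
obs = pd.Series([500] * 3, name="diameter_mm")  # reference network

# Number of pipes for each diameter, aligned on the diameter categories.
counts = pd.concat(
    [sim.value_counts(), obs.value_counts()], axis=1, keys=["sim", "obs"]
).fillna(0)
print(counts)
```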
remove nearest... can't reproduce the issue it was trying to fix
Description
Fixes a bug in interpolation identified in #76 and includes some general performance improvements.
- `pbias` should compare `mean` in case the two variables are not the same length (e.g., in diameter size)
- `align_by_shape` rather than `median_coef_by_shape` - there is no point interpolating a bunch of stuff that you are about to aggregate; you should aggregate first.
- Interpolation is applied per `sub_id` rather than on the whole `results` df... <- this was the bug (and a dumb one...); see the sketch after this list.
- Add `outlet_pbias_diameter` in `metrics`.

Fixes #159
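For illustration, a minimal pandas sketch of the per-`sub_id` point above; the column names and the interpolation call are assumptions for this sketch, not the project's actual code:

```python
import pandas as pd

# Toy results frame: a missing value at the boundary of each sub-catchment.
results = pd.DataFrame(
    {
        "sub_id": ["a", "a", "b", "b"],
        "flow": [1.0, None, None, 30.0],
    }
)

# Buggy behaviour: interpolating the whole column lets values from one
# sub_id influence the values filled in another.
whole = results["flow"].interpolate()

# Fixed behaviour: interpolate within each sub_id so sub-catchments stay independent.
per_group = results.groupby("sub_id")["flow"].transform(lambda s: s.interpolate())
```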