`compute_expected_counts` gives incorrect values for HANSWT ext #88

veenstrajelmer · 2024-06-20T21:07:34Z

The expected counts are higher than the actual counts for HANSWT extremes, even though there are no NaNs or missing timesteps. This is because the expected-counts computation derives the frequency from the median time difference instead of the mean. For HOEKVHLD this goes unnoticed, since the expected counts are lower than the actual counts there, but for HANSWT the median-based frequency results in expected counts that are higher than the actual number of values.

import os
import pandas as pd
import hatyan
import kenmerkendewaarden as kw
from kenmerkendewaarden.tidalindicators import compute_actual_counts, compute_expected_counts

# set logging level to INFO to get log messages
import logging
logging.getLogger("kenmerkendewaarden").setLevel(level="INFO")

tstop_dt = pd.Timestamp(2021,1,1, tz="UTC+01:00")

dir_base = r'p:\11210325-005-kenmerkende-waarden\work'
dir_meas = os.path.join(dir_base,'measurements_wl_18700101_20240101')
# dir_meas = r"c:\Users\veenstra\Downloads\measurements_wl_18700101_20240101"
current_station = 'HANSWT'

print("loading meas")
data_pd_HWLW_all = kw.read_measurements(dir_output=dir_meas, station=current_station, extremes=True)
data_pd_HWLW_all_12 = hatyan.calc_HWLW12345to12(data_pd_HWLW_all)  # convert codes 12345 to 12 by taking the minimum of 3, 4 and 5 as 2 (lowest low water)

# computing counts
act_count_peryear = compute_actual_counts(data_pd_HWLW_all_12, freq="Y")
exp_count_peryear = compute_expected_counts(data_pd_HWLW_all_12, freq="Y")

# the max timediff is 9 hours and there are no nans, so there are no gaps
print("num nans:", data_pd_HWLW_all_12["values"].isnull().sum())
print("\nmax timediff:", (data_pd_HWLW_all_12.index[1:] - data_pd_HWLW_all_12.index[:-1]).max())

# however, the expected counts are higher because we compute the median frequency, not the mean
print("\nactual counts")
print(act_count_peryear)

print("\nexpected counts")
print(exp_count_peryear)

Gives:

num nans: 0

max timediff: 0 days 09:00:00

actual counts
time
1880     711
1881    1410
1882    1411
1883    1411
1884    1414

2019    1410
2020    1416
2021    1410
2022    1411
2023       3
Freq: A-DEC, Name: values, Length: 144, dtype: int64

expected counts
time
1880    1424.432432
1881    1420.540541
1882    1420.540541
1883    1420.540541
1884    1424.432432
    
2019    1420.540541
2020    1424.432432
2021    1420.540541
2022    1420.540541
2023    1401.600000
Freq: A-DEC, Length: 144, dtype: float64
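
The printed numbers are consistent with an expected count equal to the period length divided by the median time difference: 525600 min / 370 min ≈ 1420.54 for a 365-day year, and 527040 min / 370 min ≈ 1424.43 for a leap year. This is an inference from the output above, not a reading of the library source. Below is a minimal sketch on synthetic data (the interval pattern and variable names are made up for illustration) showing how a median-based frequency can exceed the actual count even when the series has no gaps:

```python
import numpy as np
import pandas as pd

# Hypothetical time differences between consecutive extremes, in minutes.
# Three out of four are 370 min, so the median is 370 while the mean is 372.5.
pattern = [370, 370, 370, 380]
offsets_min = np.concatenate([[0], np.cumsum(np.tile(pattern, 400))])
times = pd.Timestamp("2021-01-01") + pd.to_timedelta(offsets_min, unit="min")

# keep one full calendar year; the synthetic series has no gaps
times_2021 = times[times.year == 2021]
actual_count = len(times_2021)

# time differences within the year, as float minutes
diff_min = np.diff(times_2021.values) / np.timedelta64(1, "m")
period_min = pd.Timedelta(days=365) / pd.Timedelta(minutes=1)

# assumed formula: expected count = period length / typical time difference
expected_median = period_min / np.median(diff_min)
expected_mean = period_min / np.mean(diff_min)

print("actual count:          ", actual_count)
print("expected (median freq):", round(expected_median, 2))
print("expected (mean freq):  ", round(expected_mean, 2))
```

With this skewed distribution the median-based estimate overshoots the actual count, while the mean-based estimate stays within about one value of it.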

Todo:

- Consider using `scipy.stats.trim_mean`, which excludes x percent of the data before taking the mean. This seemed likely to give the desired behaviour for frequency estimation, but it still gives values that are slightly too high, so it does not.
- Consider counting only HWs in case of extremes, since the tidal periods might be more constant than the durations of rise and fall. The current approach would also not work for extremes with aggers, but this alternative might. Unfortunately, 12 years still have higher expected counts than actual counts, even after applying `.floor()` to the expected counts. Issue: the function receives only a series of values, without the HWLW code.
- Apply `compute_expected_counts()` to HW only, separately, in `calc_tidalindicators_HWLW()`. Also do this in `calc_havengetallen()`.
- Add a test case for this edge case. Or add a test case for HOEKVHLD extremes, since the expected counts are too low there (for the same reason, but counts that are too low did not cause a problem, so this went unnoticed).
- Update docstrings: `compute_expected_counts` has an edge case for months/years with only a value on the first and last timestep. The derived frequency will be 15/183 days, which results in 2 expected counts. This causes the mean to be seen as valid, while it is not.
- Also prevent the duplicate `test_calc_wltidalindicators`.
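
The idea of counting only HWs can be illustrated with a small synthetic example. This is a sketch under stated assumptions: the tide is given a constant period of 745 minutes with an asymmetric rise and fall, the `HWLWcode` column name and the codes 1 (HW) and 2 (LW) are assumed to follow the hatyan convention, and `timediffs_minutes` is a hypothetical helper, not part of kenmerkendewaarden.

```python
import numpy as np
import pandas as pd

# Hypothetical tide: constant period of 745 min, fall of 395 min and
# rise of 350 min (assumed numbers, for illustration only).
n_tides = 800
hw_min = np.arange(n_tides) * 745
lw_min = hw_min + 395
t0 = pd.Timestamp("2021-01-01")

df = pd.DataFrame(
    {
        "values": np.r_[np.full(n_tides, 2.0), np.full(n_tides, -2.0)],
        "HWLWcode": np.r_[np.full(n_tides, 1), np.full(n_tides, 2)],  # assumed column name
    },
    index=t0 + pd.to_timedelta(np.r_[hw_min, lw_min], unit="min"),
).sort_index()


def timediffs_minutes(index):
    """Time differences between consecutive timestamps, in minutes."""
    return np.diff(index.values) / np.timedelta64(1, "m")


diff_all = timediffs_minutes(df.index)
diff_hw = timediffs_minutes(df.loc[df["HWLWcode"] == 1].index)

print("all extremes, min/max timediff:", diff_all.min(), diff_all.max())
print("HW only, min/max timediff:     ", diff_hw.min(), diff_hw.max())
```

In this idealised case the HW-to-HW intervals are constant while the intervals between all extremes alternate, so a median taken over HW only is less sensitive to the rise/fall asymmetry. Real measurements are noisier, which is consistent with the note above that some years still ended up with expected counts above the actual counts.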


veenstrajelmer changed the title ~~compute_expected_counts give incorrect values for HANSWT ext~~ compute_expected_counts gives incorrect values for HANSWT ext Jun 20, 2024

This was referenced Jun 20, 2024

Create 0.2.0 release #21

Closed

clearer error messages for slotgem #82

Closed

Improve slotgemiddelden #85

Open

veenstrajelmer mentioned this issue Aug 22, 2024

Create 0.3.0 release #75

Closed

9 tasks

veenstrajelmer mentioned this issue Oct 1, 2024

Create 0.4.0 release #136

Open

46 tasks

veenstrajelmer linked a pull request Oct 7, 2024 that will close this issue

88 compute expected counts gives incorrect values for hanswt ext #146

Merged

veenstrajelmer closed this as completed in #146 Oct 7, 2024
