The expected counts are higher than the actual counts for HANSWT extremes, even though there are no NaNs or missing timesteps. This is because the frequency in the expected counts computation is based on the median time difference instead of the mean. For HOEKVHLD this causes no problem, since the expected values are lower than the actual ones, but for HANSWT the median frequency results in expected values that are higher than the actual number of values.
import os
import pandas as pd
import hatyan
import kenmerkendewaarden as kw
from kenmerkendewaarden.tidalindicators import compute_actual_counts, compute_expected_counts

# set logging level to INFO to get log messages
import logging
logging.getLogger("kenmerkendewaarden").setLevel(level="INFO")

tstop_dt = pd.Timestamp(2021,1,1, tz="UTC+01:00")

dir_base = r'p:\11210325-005-kenmerkende-waarden\work'
dir_meas = os.path.join(dir_base,'measurements_wl_18700101_20240101')
# dir_meas = r"c:\Users\veenstra\Downloads\measurements_wl_18700101_20240101"

current_station = 'HANSWT'

print("loading meas")
data_pd_HWLW_all = kw.read_measurements(dir_output=dir_meas, station=current_station, extremes=True)
# convert 12345 to 12 by taking the minimum of 345 as 2 (lowest low water)
data_pd_HWLW_all_12 = hatyan.calc_HWLW12345to12(data_pd_HWLW_all)

# computing counts
act_count_peryear = compute_actual_counts(data_pd_HWLW_all_12, freq="Y")
exp_count_peryear = compute_expected_counts(data_pd_HWLW_all_12, freq="Y")

# the max timediff is 9 hours and there are no nans, so there are no gaps
print("num nans:", data_pd_HWLW_all_12["values"].isnull().sum())
print("\nmax timediff:", (data_pd_HWLW_all_12.index[1:] - data_pd_HWLW_all_12.index[:-1]).max())

# however, the expected counts are higher because we compute the median frequency, not the mean
print("\nactual counts")
print(act_count_peryear)
print("\nexpected counts")
print(exp_count_peryear)
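The mechanism can also be illustrated without the measurement files. The sketch below is a simplified stand-in and not the actual `compute_expected_counts` implementation: the gap pattern and the `expected_count` helper are hypothetical, chosen only to show that a median-based typical gap overestimates the count when the time differences are skewed.

```python
import statistics

# Hypothetical gaps (hours) between consecutive extremes: mostly 6 h, with
# every fourth gap 7 h. Synthetic data, not taken from the HANSWT measurements.
gap_hours = [6.0, 6.0, 6.0, 7.0] * 100

span_hours = sum(gap_hours)        # time between the first and last extreme
actual_count = len(gap_hours) + 1  # n gaps separate n + 1 timestamps


def expected_count(span, typical_gap):
    """Simplified stand-in: number of timestamps if every gap were `typical_gap`."""
    return span / typical_gap + 1


exp_median = expected_count(span_hours, statistics.median(gap_hours))
exp_mean = expected_count(span_hours, statistics.fmean(gap_hours))

print("actual:", actual_count)
print("expected (median gap):", round(exp_median, 1))
print("expected (mean gap):", round(exp_mean, 1))
```

In this synthetic series the median gap is shorter than the mean gap, so the median-based estimate ends up above the actual count although no timestep is missing, while the mean-based estimate matches it.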
Todo:
Consider using scipy.stats.trim_mean, which excludes a given fraction of the data at both ends before taking the mean. This seemed likely to give the desired behaviour for frequency estimation, but it still gives values that are slightly too high, so it does not.
Consider counting only HWs in the case of extremes, since the tidal period might be more constant than the duration of rise/fall (duur stijging/daling). The current approach would also not work for extremes with aggers, but this alternative might. Unfortunately, 12 years still result in higher expected counts than actual counts, even after using .floor() on the expected counts. Issue: the function receives only a series of values, without the HWLW code.
Apply compute_expected_counts() to the HWs only, separately, in calc_tidalindicators_HWLW(). Also do this in calc_havengetallen().
Add a test case for this edge case. Alternatively, add a test case for HOEKVHLD extremes, since the expected counts are too low there (for the same reason, but too-low values did not cause a problem, so this went unnoticed).
Update docstrings: compute_expected_counts has an edge case for months/years with a value only on the first and last timestep. The derived frequency will be 15/183 days and this will result in 2 expected counts. This causes the mean to be seen as valid, while it is not.
Also prevent the duplicate test_calc_wltidalindicators.
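To illustrate why a trimmed mean may not be enough, here is a minimal sketch on synthetic gaps. The `trimmed_mean` helper is a hand-rolled stand-in that mirrors the idea of `scipy.stats.trim_mean` (cut the same fraction from both ends of the sorted data); the gap pattern and the 10% cut are assumptions for illustration, not values from the HANSWT data.

```python
import statistics


def trimmed_mean(values, proportion_to_cut):
    """Mean after dropping `proportion_to_cut` of the sorted values at each end."""
    ordered = sorted(values)
    n_cut = int(proportion_to_cut * len(ordered))
    return statistics.fmean(ordered[n_cut:len(ordered) - n_cut])


# Hypothetical skewed gaps (hours): mostly 6 h, every fourth gap 7 h.
gap_hours = [6.0, 6.0, 6.0, 7.0] * 100
span_hours = sum(gap_hours)
actual_count = len(gap_hours) + 1

# Expected counts if every gap equalled the trimmed mean or the plain mean.
exp_trimmed = span_hours / trimmed_mean(gap_hours, 0.1) + 1
exp_mean = span_hours / statistics.fmean(gap_hours) + 1

print("actual:", actual_count)
print("expected (trimmed mean gap):", round(exp_trimmed, 1))
print("expected (mean gap):", round(exp_mean, 1))
```

Because the symmetric cut removes proportionally more of the rarer long gaps, the trimmed mean in this example stays below the plain mean, and the resulting expected count is still somewhat higher than the actual count.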
veenstrajelmer changed the title from "compute_expected_counts give incorrect values for HANSWT ext" to "compute_expected_counts gives incorrect values for HANSWT ext" on Jun 20, 2024