Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

39 add overschrijding hydra statistics #106

Merged
merged 23 commits into from
Jul 2, 2024

Conversation

Mares2022
Copy link
Collaborator

I have mainly modified KWK_process.py. I have added a function called set_hydra_nl_table() to set the Hydra-NL original tables into the dist_vali_exc dict dictionary. The set_table() function was also added to set validation data and the old exceedance frequency Hydra-NL function.

The pull request also includes the tables generated in Hydra-NL in .xls and .csv format for the stations available + Hydra-NL documentation.

The pull request has two jupyter notebooks, the first one called geometries was used KWK_geometries.ipynb was used to calculate the stations locations. However, this code is not working anymore after the changes made in the main source code for me. The same error is shown when running KWK_getcheckdata.py from this branch and from the main branch. Something to check later on to be sure there is no problem with the environments.

The second jupyter notebook KWK_process.ipynb can be ignored and not included in the main branch, as I only used it for testing purposes. I would like to keep the code anyway if possible in the main branch for future development.

Mares2022 added 16 commits June 24, 2024 15:43
Add station data
Stations data
Geometries code
Change in process code
Plotting stations with KWK_getcheckdata.py
Adding stations data
Updating tables to avoid string symbols error
Update KWK_proces.py with functions to read Hydra-NL data. This is working for HOEKVHLD only.
Update KWK_proces.py with functions to read Hydra-NL data. This is working for all the stations with Hydra-NL data.
Updating code KWK_process.py code to work with multiple stations
Adding all the Hydra-NL station list to the code
Cleaning repository
Adding station files in .csv format.
@Mares2022 Mares2022 requested a review from veenstrajelmer July 2, 2024 07:14
@Mares2022 Mares2022 self-assigned this Jul 2, 2024
@Mares2022 Mares2022 linked an issue Jul 2, 2024 that may be closed by this pull request
3 tasks
Copy link

codecov bot commented Jul 2, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 87.96%. Comparing base (65adefe) to head (44eb596).
Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #106      +/-   ##
==========================================
- Coverage   87.97%   87.96%   -0.02%     
==========================================
  Files           9        9              
  Lines        1156     1155       -1     
==========================================
- Hits         1017     1016       -1     
  Misses        139      139              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@veenstrajelmer
Copy link
Collaborator

veenstrajelmer commented Jul 2, 2024

Thanks for this PR, I do have some additional requests:

  • no data in the repository, in this case all csv/xls/json files. You can put a link to the p-drive in the example script. I know this is hard-coded and a bit ugly, but at least then it is clear that it is a dependent data source. I also see several copies of the csv-files on the P-drive, please try to avoid this to keep everything transparent.
  • it is good to work with a single source of truth, this is complex when putting code in py as well as ipynb, I would suggest to remove the ipynb files. If you want to work interactively with vscode, you can also run the .py files as if they were notebooks. I do not know how exactly, but I have seen it happening.
  • it is not clear to me where the csv files come from. I expect you create them from the hydra-xls files with one of the ipynb files (I have not looked at them). If this is indeed the case, it would be cleaner to just read the hydra-xls file directly. If the csv files come directly from hydra-nl, the xls files can be removed right?
  • minimize the diff (e.g. revert the change in KWK_getcheckdata.py since it does not contribute to this PR)

@Mares2022
Copy link
Collaborator Author

Mares2022 commented Jul 2, 2024

Adressing @veenstrajelmer requests.

  • no data in the repository, in this case all csv/xls/json files. I have removed now the files and saved them in the Pdrive: P:\11210325-005-kenmerkende-waarden\work\data_hydraNL

  • it is good to work with a single source of truth, this is complex when putting code in py as well as ipynb, I would suggest to remove the ipynb files. I have removed the KWK_process.ipynb. The KWK_geometries.ipynb has an interactive map that requires a notebook to be displayed. In case this code is not needed anymore I will erase it, but converting to .py cannot be done. Let me know if I should erase it.

  • it is not clear to me where the csv files come from. I expect you create them from the hydra-xls files with one of the ipynb files (I have not looked at them). If this is indeed the case, it would be cleaner to just read the hydra-xls file directly. Files in .xls have symbols in a code that did not allow me to open them with pandas. The created excel files are also in an old format. The best approach based on my experience will be to create .csv files from the python Hydra-NL API, or request the developers to provide the data in these formats with standard code symbols. The csv were created manually in excel.

  • minimize the diff (e.g. revert the change in KWK_getcheckdata.py since it does not contribute to this PR). I will take it back to the main branch version.

…ckdata.py.

Removing files from the repository and removing changes in KWK_getcheckdata.py.
@veenstrajelmer
Copy link
Collaborator

veenstrajelmer commented Jul 2, 2024

Files in .xls have symbols in a code that did not allow me to open them with pandas.

I have looked into reading this old xls format and found this solution that works:

import pandas as pd
file_xls = r"p:\11210325-005-kenmerkende-waarden\work\data_hydraNL\DENHDR.xls"
df = pd.read_table(file_xls, encoding='latin-1')

…ing='latin-1') function.

Change to read the .xls files with df = pd.read_table(file_xls, encoding='latin-1') function. Creating .csv and reading them is not anymore needed.
@Mares2022
Copy link
Collaborator Author

A new commit including df = pd.read_table(file_xls, encoding='latin-1') was created. Now we are able to read the original Hydra-NL files in .xls format.

@veenstrajelmer
Copy link
Collaborator

veenstrajelmer commented Jul 2, 2024

Thanks for the updates! I have now looked into KWK_geometries.ipynb and it can indeed be useful for this station selection (until we move to Hydra-RING API). However, could you trim it down so it only contains the code necessary for the interactive map? Also, you could avoid saving the geojson since the gdf variable is already being generated from the DDL catalog and you do not seem to use it elsewhere.

Mares2022 added 2 commits July 2, 2024 14:29
Integrating validation data. A message indicating if the path existed or not can be good to implement to inform the users if the plots integrated Hydra-NL data or validation data.
Taking back temporal changes for testing plot function in KWK_getcheckdata.py
@Mares2022
Copy link
Collaborator Author

Mares2022 commented Jul 2, 2024

  • KWK_process.ipynb needs to be cleaned and included in the main branch for further visualisation of stations.
  • Validation data for Hoak van Holland. The extension in the path was incorrect before. I was changed from .xls to .csv
  • A message indicating if the hydra-nl or validation file path exists or not must be implemented to inform the users if the overschrijding function integrated Hydra-NL data or validation data (if available) to avoid the above mentioned issue.

Copy link

sonarqubecloud bot commented Jul 2, 2024

@veenstrajelmer veenstrajelmer merged commit 73a0732 into main Jul 2, 2024
5 of 6 checks passed
@veenstrajelmer veenstrajelmer deleted the 39-add-overschrijding-hydra-statistics branch July 2, 2024 14:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

add overschrijding hydra statistics
2 participants