-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use StormEvents / Searvey to Compare ADCIRC Results with Real-world Data #50
Comments
@brey @g7192 @SorooshMani-NOAA Hi George, ** Regarding the work for Jane and I don’t know what she is working on, but I would suggest the time-series clean up problem. I am posting some resources below and I am available for elaboration. This could be a literature review and/or prototyping exercise. Cheers, G ** https://arxiv.org/pdf/2004.08284.pdf |
@saeed-moghimi-noaa @SorooshMani-NOAA @brey Currently, I've been reading documentation about the OceanModeling packages, specifically StormEvents, as well as about time series, xarrays, and statistical methods in general. Here is my working code in Python: import stormevents
import xarray
import numpy as np
from stormevents.coops import coops_stations
from stormevents.coops import COOPS_Station
from datetime import datetime, timedelta
from shapely.geometry import Polygon
from stormevents.coops import coops_stations_within_region
from stormevents.coops import coops_product_within_region
from matplotlib import pyplot as plt
# Retrieve simulation data (combined water level)
file = '/Desktop/estofs.t06z.points.cwl.nc'
sim_data = xarray.open_dataset(file, engine="netcdf4")
sim_data.data_vars
sim_data.dims
# Obtain time series for a station at random from simulation data
sim_station_name = sim_data.station_name[4]
sim_station_lon = sim_data.x[4]
sim_station_lat = sim_data.y[4]
sim_timeseries = sim_data.zeta[:, 4]
sim_timeseries_df = sim_timeseries.to_dataframe()
station_id = int(np.reshape(sim_station_name.astype(str).str.split("")[2].to_numpy(), 1)[0])
# Use StormEvents to obtain time series for the same station using the real data
station = COOPS_Station(station_id)
start_date = sim_timeseries_df.index[0]
end_date = sim_timeseries_df.index[len(sim_timeseries_df.index) - 1]
real_df = station.product('water_level', start_date=start_date, end_date=end_date).to_dataframe()
real_timeseries_df = real_df.iloc[:, 0]
real_timeseries_df = real_timeseries_df.reset_index(level='nos_id')
real_timeseries_df = real_timeseries_df.iloc[:, 1]
# Re-index simulation data time series based on real data time series
sim_timeseries_df = sim_timeseries_df.reindex(index=real_timeseries_df.index, method='nearest').drop_duplicates()
# Plot both time series (simulation data and real data) on the same graph
plt.plot(sim_timeseries_df, label='Simulation Data Time Series')
plt.plot(real_timeseries_df, label='Real Data Time Series')
plt.title('Water Level Time Series Comparison')
plt.xlabel('Date/Time')
plt.legend()
plt.show()
# Compute some statistics to compare simulation data and real data
# Can convert both time series from dataframes to numpy arrays first
real_timeseries_array = real_timeseries_df.to_numpy()
sim_timeseries_array = sim_timeseries_df.to_numpy()
# Bias
# Positive values of bias indicate negatively-biased modelled data
# Negative values of bias indicate positively-biased modelled data
bias = real_timeseries_df.mean() - sim_timeseries_df.mean()
# Or, can compute bias or mean error in this way since expectation is linear
mean_error = np.nanmean(real_timeseries_array - sim_timeseries_array)
# Now, remove bias between the two time series and re-plot
plt.plot(sim_timeseries_df + bias, label='Simulation Data Time Series (Bias-corrected)')
plt.plot(real_timeseries_df, label='Real Data Time Series')
plt.title('Water Level Time Series Comparison (Bias-corrected)')
plt.xlabel('Date/Time')
plt.legend()
plt.show()
# Mean Absolute Error
mean_abs_error = np.nanmean(abs(real_timeseries_array - sim_timeseries_array))
# Root Mean Square Error (RMSE)
rmse = np.sqrt(np.nanmean((real_timeseries_array - sim_timeseries_array) ** 2))
# Scatter Index or Percentage RMSE
scatter_index = (rmse / np.nanmean(real_timeseries_array)) * 100
# Side note: we can also obtain water level data for stations within a given region using the Polygon() function
polygon = Polygon([(-67.204002, 44.654499), (-70.563301, 43.32), (-71.16481, 41.707001)])
coops_product_within_region(
'water_level',
region=polygon,
start_date=datetime(2021, 2, 2),
end_date=datetime(2021, 2, 9, 18)
) |
Thanks @g7192 . |
I took the jpg out of code section and it showed up. Looks very good. To compare the signal you also can remove the bias. Something like: bias = data.mean() - model.mean()
plt.plot(sim_timeseries_df + bias, label='Simulation Data Timeseries')
plt.plot(real_timeseries_df, label='Real Data Timeseries') |
My new code reflects the following changes that I made:
Updated code: import xarray
import numpy as np
import time
from stormevents.coops import coops_stations
from stormevents.coops import COOPS_Station
from datetime import datetime, timedelta
from shapely.geometry import Polygon
from stormevents.coops import coops_stations_within_region
from stormevents.coops import coops_product_within_region
from matplotlib import pyplot as plt
# Retrieve simulation data (combined water level)
file = '/Desktop/estofs.t06z.points.cwl.nc'
sim_data = xarray.open_dataset(file, engine="netcdf4")
sim_data.data_vars
sim_data.dims
def main():
# Obtain time series for a station at random from simulation data
stats_dict = {}
for i in range(100):
sim_station_name = sim_data.station_name[i]
sim_station_lon = sim_data.x[i]
sim_station_lat = sim_data.y[i]
sim_timeseries = sim_data.zeta[:, i]
sim_timeseries_df = sim_timeseries.to_dataframe()
station_id = int(np.reshape(sim_station_name.astype(str).str.split("")[2].to_numpy(), 1)[0])
stats_dict[station_id] = {}
# Use StormEvents to obtain time series for the same station using the real data
station = COOPS_Station(station_id)
start_date = sim_timeseries_df.index[0]
end_date = sim_timeseries_df.index[len(sim_timeseries_df.index) - 1]
real_df = station.product('water_level', start_date=start_date, end_date=end_date).to_dataframe()
if real_df.empty:
continue
else:
real_timeseries_df = real_df.iloc[:, 0]
real_timeseries_df = real_timeseries_df.reset_index(level='nos_id')
real_timeseries_df = real_timeseries_df.iloc[:, 1]
# Re-index real data time series based on simulation data time series
real_timeseries_df = real_timeseries_df.reindex(index=sim_timeseries_df.index, method='nearest').drop_duplicates()
# Plot both time series (simulation data and real data) on the same graph
plt.plot(sim_timeseries_df, label='Simulation Data Time Series')
plt.plot(real_timeseries_df, label='Real Data Time Series')
plt.title('Water Level Time Series Comparison')
plt.xlabel('Date/Time')
plt.legend()
plt.show()
# Compute some statistics to compare simulation data and real data
# Can convert both time series from dataframes to numpy arrays first
real_timeseries_array = real_timeseries_df.to_numpy()
sim_timeseries_array = sim_timeseries_df.to_numpy()
# Bias or Mean Error
# Positive values of bias indicate negatively-biased modelled data
# Negative values of bias indicate positively-biased modelled data
bias = np.nanmean(real_timeseries_array - sim_timeseries_array)
stats_dict[station_id]['Bias'] = bias
# Now, remove bias between the two time series and re-plot
plt.plot(sim_timeseries_df + bias, label='Simulation Data Time Series (Bias-corrected)')
plt.plot(real_timeseries_df, label='Real Data Time Series')
plt.title('Water Level Time Series Comparison (Bias-corrected)')
plt.xlabel('Date/Time')
plt.legend()
plt.show()
# Mean Absolute Error
mean_abs_error = np.nanmean(abs(real_timeseries_array - sim_timeseries_array))
stats_dict[station_id]['Mean Absolute Error'] = mean_abs_error
# Root Mean Square Error (RMSE)
rmse = np.sqrt(np.nanmean((real_timeseries_array - sim_timeseries_array) ** 2))
stats_dict[station_id]['RMSE'] = rmse
# Scatter Index or Percentage RMSE
scatter_index = (rmse / np.nanmean(real_timeseries_array)) * 100
stats_dict[station_id]['Scatter Index'] = scatter_index
print(stats_dict)
# Side note: we can also obtain water level data for stations within a given region using the Polygon() function
polygon = Polygon([(-67.204002, 44.654499), (-70.563301, 43.32), (-71.16481, 41.707001)])
coops_product_within_region(
'water_level',
region=polygon,
start_date=datetime(2021, 2, 2),
end_date=datetime(2021, 2, 9, 18)
)
if __name__ == '__main__':
starttime = time.time()
main()
endtime = time.time()
print(f"Total time = {-starttime+endtime}") |
Thank you for the update @g7192, keep up the good work! Let's discuss the next steps in our next tag up. |
@g7192 |
#!/usr/bin/python
""" Import libraries and script options """
import os, sys
import warnings
import time
import pyschism
import numpy as np
import pandas as pd
from pyschism.mesh import Hgrid
from tqdm.notebook import tqdm_notebook
os.system("clear") # Clear the screen
warnings.filterwarnings("ignore") # turn off warnings
# ========================== Globals ========================== #
# Path to reqiured files:
#case = 'irene_delaware_2011/tide_only/spinup_run/' # 'irene_delaware_2011' 'florence_atlantic_2018'
case = 'irene_delaware_2011/VIMS_domain/'
PathtoWork = '/scratch2/COASTAL/coastal/save/Bahram.Khazaei/work/' # work directory
PathToCodes = PathtoWork + 'source_codes/schism_src/my_schism_utility/' # schism utility functions
PathToMyFunctions = PathToCodes + 'utility_functions/'
PathToMySchismFunctions = PathtoWork +'source_codes/schism_src/my_schism_utility/utility_functions/'
PathtoRunDir = '/scratch2/COASTAL/coastal/save/Bahram.Khazaei/work/run_dir/' # my run directory
#PathToPriorOutputs = PathtoRunDir + case + '../ensemble/outputs_ensemble0/'
PathToPriorOutputs = PathtoRunDir + case + '[email protected]/'
#PathToPriorOutputs = PathtoRunDir + case + 'files_backup_original/outputs_original/'
PathToObsrv = PathToCodes + 'validation_tools/coops_water_level_observations/water_levels_delaware_2011_months_4_10.nc'
# Ensemble options:
Nens = 10 # Number of ensembles
# selected points for time series plotting:
# 1) can be defined manually, for example: points = [(-75.20,39), (-74, 38), (-65,25)]
# 2) can be read from water level observation files
points=[]
# Define station list for stations argument or for all stations in obs file use: stations=None
select_stations = [8548989, 8539094, 8545240, 8540433, 8551762, 8551910, 8537121, 8555889, 8536110, 8557380]
pointsx = [-74.7517, -74.8700, -75.1407, -75.4100, -75.5883, -75.5717, -75.3750, -75.1133, -74.9668, -75.1200]
pointsy = [ 40.1367, 40.0817, 39.9333, 39.8117, 39.5817, 39.5583, 39.3050, 38.9867, 38.9667, 38.7816]
ajustments= [-5.5, -6.25, -2.25, -7.75, -7.85, -1.45, -6.75, -6.75, -1.70, -1.65]
# COLOR Coding:
C_RED = '\33[3;1;31m'
C_BLUE = '\33[3;1;34m'
C_BOLD = '\33[1m'
C_CLEAR = '\33[0m'
# ====================== End of Globals ======================= #
""" Simulations """
start_time = time.time()
print(C_RED+"☑ Script started ... "+C_CLEAR)
# Append path and import required functions:
sys.path.append(PathToMyFunctions)
sys.path.append(PathToMySchismFunctions)
from SchismFunctions import read_SchismWaterElevation2D
from SchismFunctions import read_ObservedWaterLevel
from PythonFunctions import find_indices2D, ModelSkillsAll, PlotSkillHeatmap, PlotEnsembleSkillHeatmap
# Read the grid first:
print(' → Reading the \033[1m'+'hgrid.gr3'+'\033[0m file for grid information ...')
hgrid = Hgrid.open(PathtoRunDir + case + 'hgrid.gr3', crs='EPSG:4326')
# Read observations:
checkpoint_time = time.time()
df_obs, data_obs, NOSstationIDs = read_ObservedWaterLevel(PathToObsrv, stations=select_stations, obs_adjustment=ajustments) # stations=None
# Read simulations:
# Create an array of coordinates for stations locations and get their mesh node indices:
#pointsx, pointsy = data_obs.x.values, data_obs.y.values
points = np.column_stack([pointsx, pointsy])
points_indices = find_indices2D(points, hgrid.x, hgrid.y) # Indices for slected stations at the base run
df_sim, times, t0, dt = read_SchismWaterElevation2D(PathToPriorOutputs, points_indices, stationIDs=NOSstationIDs, old_version=None)
checkpoint_time = time.time() - checkpoint_time
print('☑ Total time to read the nc files: \033[1m', int(checkpoint_time), 'seconds\033[0m!')
# Calculate Model Skills for all stations:
SkillStats , ij = [], 0
for nos_station in df_obs.columns:
# merge obs and sim by time:
merge = df_sim[[nos_station]].merge(df_obs[[nos_station]], on = 'date', how = 'inner',suffixes=('_sim', '_obs')).dropna()
# get sim and obs values of merged dataframe:
sim = merge.iloc[:][str(nos_station)+'_sim'].values
obs = merge.iloc[:][str(nos_station)+'_obs'].values
# calculate skills:
SkillStats0 = ModelSkillsAll(sim, obs, stationID=nos_station, Index=ij)
SkillStats.append(SkillStats0)
ij = ij + 1
# Put all stations in one dataframe:
SkillStats= pd.concat(SkillStats, axis=1)
# Plot Skill Matrix:
ModelSkillFileName = 'irene_skills_tide_only.png'
PlotSkillHeatmap(SkillStats, units='m', cmap_type='coolwarm', Nbins=5, DPI=None, save_file_name=ModelSkillFileName)
checkpoint_time = time.time() - checkpoint_time
print('☑ Total time to calculate model skills: \033[1m', int(checkpoint_time), 'seconds\033[0m!')
# Calculate Model Skills for all stations and in ensemble of simulations:
# Read simulations:
# Create an array of coordinates for stations locations and get their mesh node indices:
pointsx, pointsy = data_obs.x.values, data_obs.y.values
points = np.column_stack([pointsx, pointsy])
points_indices = find_indices2D(points, hgrid.x, hgrid.y) # Indices for slected stations at the base run
# Read all ensembles:
Nens = 10 ##########
BD0, MSE0, RMSE0, NRMSE0, NSE0, RHO0 = [], [], [], [], [], []
for i in range(0, Nens):
print('☑ Reading \033[1m'+'Ensemble Number: '+str(int(i))+'\033[0m')
PathToEnsembleOutputs = PathtoRunDir + case + '/../ensemble/outputs_ensemble'+str(i)+'/'
df_sim, times, t0, dt = read_SchismWaterElevation2D(PathToEnsembleOutputs, points_indices, stationIDs=NOSstationIDs)
# Calculate Stats:
SkillStats , ij = [], 0
for nos_station in df_obs.columns:
# merge obs and sim by time:
merge = df_sim[[nos_station]].merge(df_obs[[nos_station]], on = 'date', how = 'inner',suffixes=('_sim', '_obs')).dropna()
# get sim and obs values of merged dataframe:
sim = merge.iloc[:][str(nos_station)+'_sim'].values
obs = merge.iloc[:][str(nos_station)+'_obs'].values
# calculate skills:
SkillStats0 = ModelSkillsAll(sim, obs, stationID=nos_station, Index=ij)
SkillStats.append(SkillStats0)
ij = ij + 1
# Put all stations in one dataframe and read desired skill metrics for all stations,
# then append the previous ensembles:
SkillStats= pd.concat(SkillStats, axis=1)
BD0.append(np.array(SkillStats.iloc[7,:]))
MSE0.append(np.array(SkillStats.iloc[9,:]))
RMSE0.append(np.array(SkillStats.iloc[10,:]))
NRMSE0.append(np.array(SkillStats.iloc[11,:]))
NSE0.append(np.array(SkillStats.iloc[8,:]))
RHO0.append(np.array(SkillStats.iloc[5,:]))
# convert metrics to array:
# For version with 6 metrics
# SkillStats3D = [NOSstationIDs,
# np.array(BD0), np.array(MSE0), np.array(RMSE0),
# np.array(NRMSE0), np.array(NSE0), np.array(RHO0)]
SkillStats3D = [NOSstationIDs,
np.array(BD0), np.array(RMSE0),
np.array(NSE0)+0.5, np.array(RHO0)+0.5]
checkpoint_time = time.time() - checkpoint_time
print('☑ Total time to calculate model skills: \033[1m', int(checkpoint_time), 'seconds\033[0m!')
# Plot Skill Matrix:
ModelSkillFileName = 'skill_test_ensmeble.png'
PlotEnsembleSkillHeatmap(SkillStats3D, units='m', cmap_type='coolwarm', Nbins=5, DPI=None, save_file_name=ModelSkillFileName)
print(C_RED+"☑ Script completed successfully!"+C_CLEAR) |
@g7192 following our discussion with @saeed-moghimi-noaa, let's do this:
|
@saeed-moghimi-noaa @SorooshMani-NOAA storm = StormEvent('FLORENCE', 2018) Upon running the above line of code, I get the following error: Do you have any suggestions? Thanks! |
@g7192 can you try installing the pip install certifi and then try getting storm information again |
@saeed-moghimi-noaa @SorooshMani-NOAA import xarray
import numpy as np
import re
from stormevents.coops import coops_stations
from stormevents.coops import COOPS_Station
from datetime import datetime
from matplotlib import pyplot as plt
from stormevents import StormEvent
# Identify stations affected by the storm Elsa using StormEvents and pick a particular station for further analysis
storm = StormEvent('elsa', 2021)
storm.track()
elsa_data = storm.coops_product_within_isotach('water_level', wind_speed=34, start_date='2021-07-02 12:00:00', end_date='2021-07-10 06:00:00')
station_id = np.array([elsa_data.nos_id[1]])[0]
station = COOPS_Station(station_id)
# Retrieve simulation data (combined water level) for specific station chosen above
file = '/Desktop/estofs.t00z.points.cwl-4.nc'
sim_data = xarray.open_dataset(file, engine="netcdf4")
sim_station_names = np.array(sim_data.station_name.astype(str)).tolist()
sim_station_index = sim_station_names.index([i for i in sim_station_names if re.findall(station_id.astype(str), i)][0])
sim_timeseries = sim_data.zeta[:, sim_station_index]
sim_timeseries_df = sim_timeseries.to_dataframe()
# Use StormEvents to obtain time series for the same station using Elsa data
start_date = sim_timeseries_df.index[0]
end_date = sim_timeseries_df.index[len(sim_timeseries_df.index) - 1]
elsa_df = station.product('water_level', start_date=start_date, end_date=end_date).to_dataframe()
elsa_timeseries_df = elsa_df.iloc[:, 0]
elsa_timeseries_df = elsa_timeseries_df.reset_index(level='nos_id')
elsa_timeseries_df = elsa_timeseries_df.iloc[:, 1]
# Re-index Elsa data time series based on simulation data time series
elsa_timeseries_df = elsa_timeseries_df.reindex(index=sim_timeseries_df.index, method='nearest').drop_duplicates()
# Plot both time series (simulation data and Elsa data) together
plt.plot(sim_timeseries_df, label='Simulation Data Time Series')
plt.plot(elsa_timeseries_df, label='Elsa Data Time Series')
plt.title('Water Level Time Series Comparison')
plt.xlabel('Date/Time')
plt.legend()
plt.show()
# Compute some statistics to compare simulation data and Elsa data
# Can convert both time series from dataframes to numpy arrays first
elsa_timeseries_array = elsa_timeseries_df.to_numpy()
sim_timeseries_array = sim_timeseries_df.to_numpy()
# Bias or Mean Error
# Positive values of bias indicate negatively-biased modelled data
# Negative values of bias indicate positively-biased modelled data
bias = np.nanmean(elsa_timeseries_array - sim_timeseries_array)
bias
# Now, remove bias between the two time series and re-plot
plt.plot(sim_timeseries_df + bias, label='Simulation Data Time Series (Bias-corrected)')
plt.plot(elsa_timeseries_df, label='Elsa Data Time Series')
plt.title('Water Level Time Series Comparison (Bias-corrected)')
plt.xlabel('Date/Time')
plt.legend()
plt.show()
# Mean Absolute Error
mean_abs_error = np.nanmean(abs(elsa_timeseries_array - sim_timeseries_array))
mean_abs_error
# Root Mean Square Error (RMSE)
rmse = np.sqrt(np.nanmean((elsa_timeseries_array - sim_timeseries_array) ** 2))
rmse
# Scatter Index or Percentage RMSE
scatter_index = (rmse / np.nanmean(elsa_timeseries_array)) * 100
scatter_index |
@g7192 |
@saeed-moghimi-noaa @SorooshMani-NOAA It seems that, even though these figures are from a station affected by Elsa, simulation data and real (Elsa) data line up quite well. |
@g7192, please document your progress in this "issue". This would then serve as a good showcase for using StormEvents and/or Searvey.
Just for the sake of completeness:
The text was updated successfully, but these errors were encountered: