Overview

jrocha edited this page Jan 29, 2019 · 1 revision

PyForecast is a statistical modeling tool for predicting seasonal inflows and streamflows. The tool collects meteorological and hydrologic datasets, analyzes hundreds to thousands of predictor subsets, and returns well-performing statistical regressions between predictors and streamflows.

Data is collected from web services located at NOAA, RCC-ACIS, NRCS, Reclamation, and USGS servers, and is stored locally on the user’s machine. Data can be updated with current values at any time, allowing the user to make current water-year forecasts using equations developed with the program.

After potential predictor datasets are downloaded and manipulated, the tool allows the user to develop statistically significant regression equations using multiple regression, principal components regression, z-score regression, general regression neural networks, and Gaussian process regression. Equations are developed using a combination of parallelized sequential forward selection and cross validation, both described in the Statistical Methodologies section of this document.
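As a rough illustration of how forward selection and cross validation fit together, the sketch below runs a single, non-parallel forward selection scored by leave-one-out cross-validated RMSE on an ordinary least squares fit. The function names, the scoring metric, the stopping rule, and the synthetic data are all assumptions made for this example; they are not taken from PyForecast's source.

```python
import numpy as np


def loo_cv_rmse(X, y):
    """Leave-one-out cross-validated RMSE of an ordinary least squares fit."""
    n = len(y)
    errors = []
    for i in range(n):
        train = np.arange(n) != i
        # Fit intercept + slopes on every year except year i
        A = np.column_stack([np.ones(train.sum()), X[train]])
        coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        # Predict the held-out year and record the error
        prediction = np.concatenate([[1.0], X[i]]) @ coef
        errors.append(y[i] - prediction)
    return float(np.sqrt(np.mean(np.square(errors))))


def forward_select(X, y):
    """Greedily add the predictor that most improves the cross-validated score."""
    selected, best = [], np.inf
    remaining = list(range(X.shape[1]))
    while remaining:
        scores = {j: loo_cv_rmse(X[:, selected + [j]], y) for j in remaining}
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best:
            break  # no remaining predictor improves the score
        best = scores[j_best]
        selected.append(j_best)
        remaining.remove(j_best)
    return selected, best


# Synthetic data (hypothetical): 30 years, 4 candidate predictors,
# with only column 1 actually related to the predictand.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = 3.0 * X[:, 1] + rng.normal(scale=0.1, size=30)
selected, score = forward_select(X, y)
```

A parallelized variant would start several such searches from different initial predictor sets and keep the best-scoring equations.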

Summary Tab

The Summary Tab displays a summary of the regression equations, predictors, and the forecast point for the user to quickly review equations they’ve developed, as well as view current water year forecasts.

The Summary tab will populate after the first forecast model is completely developed. To view a completed forecast model, expand the forecast month in the “Chosen Models” table, and choose the equation to view.

To generate a current water year forecast, right-click the forecast entry and choose "Generate Current Forecast". If the equation's predictors are already available for the current water year (e.g. the current date is March 5th and the user selects a March 1st equation), the program generates a forecast from the current water year's data, provided that data exists in the forecast file. The current forecast is plotted with its 90% prediction interval.

Stations Tab

The Stations Tab allows users to locate datasets that may be valuable for their analysis. Users can find SNOTEL stations and snow courses, reservoirs, and stream gages, as well as PRISM and NRCC gridded temperature and precipitation data and climate indices.

Stations are found by navigating in the station map to the area of interest and browsing through the station markers. If the user decides that a particular station might be useful in their analysis, they can choose the ‘Add Site’ button in the station pop-up to add the station to the selected datasets table. (Stations can later be removed from the selected datasets table by right-clicking a station and choosing ‘Delete table row’).

Additionally, watershed-averaged gridded datasets can be downloaded by entering the 8-digit HUC identification number next to the desired gridded dataset in the ‘Other Datasets’ pane and choosing the ‘Add’ button. Only valid HUCs can be entered. More information on the NRCC gridded dataset can be found at NRCC's webpage. PRISM documentation is located at PRISM's webpage.

Lastly, users can choose to include climate indices in their analysis by selecting the relevant indices next to the ‘Climate’ drop-down in the ‘Other Datasets’ pane. Each index has been shown to be correlated with a particular region's precipitation and streamflow patterns. More information is available at the Climate Prediction Center.

Additional datasets can be added using user-defined dataloaders. To define a new dataloader, use the 'Edit Dataloaders' option in the file menu. After a valid dataloader has been saved, users can add custom datasets using the 'Define Custom Dataset' button. More information on this process can be found in the 'Custom Dataloaders' section of this document's appendix.

Data Tab

The Data Tab allows users to download, view, update, and import period-of-record (POR) data for the stations they selected in the Stations Tab.

When the user clicks 'Download' for the first time, daily data is downloaded for each dataset selected in the Stations Tab, beginning with the first day of the first water year in the specified POR and ending with the current day's value.

Users should note that the program will only download data that is available. For example, if a user specifies a POR of 30 years and the requested dataset only has 10 years of data, the software will return 20 years of NaN values and 10 years of actual data for that dataset.
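The padding behavior in that example can be pictured with a pandas `reindex`. This is only a sketch of the resulting shape of the data, with hypothetical dates and values; it does not show how PyForecast itself builds the table.

```python
import pandas as pd

# Hypothetical dataset with only 10 water years of daily data
available = pd.Series(
    1.0, index=pd.date_range("2009-10-01", "2019-09-30", freq="D")
)

# Requested 30-year period of record, starting on the first day of a water year
por = pd.date_range("1989-10-01", "2019-09-30", freq="D")

# Dates with no data become NaN; existing values are kept
padded = available.reindex(por)
```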

Users can specify whether or not they wish to preprocess data as it is downloaded. Options include:

  • Fill NaN's: If any station other than a SNOTEL or snow course station has a gap of missing data shorter than 4 days, cubic splines are used to interpolate the missing values. See the pandas and SciPy documentation for more information.
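The gap rule in the 'Fill NaN's' option can be sketched with pandas as follows. This is one possible implementation under stated assumptions, not PyForecast's code: the function name `fill_short_gaps` and its `max_gap` and `method` parameters are hypothetical, and the `"cubic"` method requires SciPy to be installed.

```python
import pandas as pd


def fill_short_gaps(series, max_gap=3, method="cubic"):
    """Interpolate only runs of NaN that are at most `max_gap` values long."""
    is_nan = series.isna()
    # Give every run of consecutive NaN (or non-NaN) values its own id
    run_id = (is_nan != is_nan.shift()).cumsum()
    run_length = is_nan.groupby(run_id).transform("size")
    short_gap = is_nan & (run_length <= max_gap)
    # Interpolate interior gaps, then keep the result only for the short ones
    interpolated = series.interpolate(method=method, limit_area="inside")
    return series.where(~short_gap, interpolated)
```

Calling `fill_short_gaps(daily_series)` on a daily series leaves gaps of 4 days or longer as NaN, so longer outages remain visible in the data table.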

Users may wish to add their own datasets from Excel spreadsheets or CSV files. Formatted datasets with .xlsx and .csv file extensions can be added using the 'Import' button. Spreadsheets should be formatted as follows before they are imported:

  • The first row should contain data headers, such as the name of the dataset.
  • The remaining rows should contain datetimes in the first column and the dataset's values in the second column.

An example spreadsheet might look like this:
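The layout described in the bullets above can be sketched in Python. The header text, the values, and the ISO date format below are illustrative assumptions, not a format confirmed by PyForecast:

```python
import csv
import io

# First row: headers. Remaining rows: datetime, then the dataset's value.
rows = [
    ["Datetime", "Example Creek Inflow (cfs)"],  # hypothetical dataset name
    ["2018-10-01", 12.4],
    ["2018-10-02", 12.1],
    ["2018-10-03", 11.9],
]

# Write the rows as CSV text (a real file would be saved with a .csv extension)
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
csv_text = buffer.getvalue()
```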

Once the data has finished downloading, the Data Table populates. Users can view a data column by clicking on one or more column headers in the data table.

Users can also delete columns from the data table if a downloaded dataset is not serially complete or is otherwise unusable. To delete a dataset from the data table, right-click the relevant column and choose 'Delete Table Column'. This also deletes the dataset from the Stations Tab.

Forecast Options Tab

The Forecast Options Tab allows users to specify the structure and frequency of their forecast equations.

To begin the process of generating forecast equations, users should fill out the "Set Options" section and click "Apply Options". At this point the program will begin generating predictors by resampling daily data into monthly and weekly datasets. When finished, the program will display a complete list of predictors in the "All Available Predictors" tree.

The 'Set Options' pane allows users to specify the time periods and frequencies associated with their forecasts for a particular reservoir or stream.

  • Forecast Period: Specify the forecast period. For example, if you wish to forecast the inflow volume between April and July for a reservoir, set the forecast period to 'April' and 'July'. This effectively sets the left-hand side of your forecast equations (e.g. Apr-Jul inflow = some equation...).
  • Forecast Frequency: Specify how often you will be generating forecast equations. 'Monthly' will generate one forecast equation per month, and 'Bimonthly' generates 2 forecast equations per month.
  • Forecasts start on: Specify the first forecast date. This is the date when you expect to begin forecasting the streamflow or inflow.
  • Forecast Target: Specify which dataset you are forecasting. Only streamflow or inflow datasets can be forecast. Therefore, if you are trying to forecast a custom dataset (imported or from a web service), be sure to set its parameter to 'inflow' or 'streamflow'.
  • Accumulate Precipitation: Specify whether you wish to do seasonal accumulations for datasets with 'Accumulation' resampling methods. For example, if you included a SNOTEL precipitation dataset and wanted to generate a predictor that tracks the water-year-to-date precipitation total, you would check the 'Yes' box and specify 'October' in the 'Accumulate From' drop-down menu.
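A water-year-to-date accumulation of the kind described under 'Accumulate Precipitation' can be sketched with a grouped cumulative sum. The function name `accumulate_from`, its `start_month` argument, and the constant daily increments are hypothetical; the sketch assumes the input holds daily precipitation increments rather than already-accumulated totals.

```python
import pandas as pd


def accumulate_from(daily, start_month=10):
    """Running total that resets on the first day of `start_month` each year."""
    index = daily.index
    # Label each day with the year in which its accumulation period ends;
    # with start_month=10 this label is the water year.
    period = index.year.to_numpy() + (index.month.to_numpy() >= start_month)
    return daily.groupby(period).cumsum()


# Hypothetical data: 0.1 inches of precipitation every day for two water years
idx = pd.date_range("2016-10-01", "2018-09-30", freq="D")
daily_precip = pd.Series(0.1, index=idx)
wytd = accumulate_from(daily_precip, start_month=10)
```

The total restarts on October 1st of each year, so the value on any forecast date is the precipitation accumulated since the start of that water year.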

Users can additionally specify notes and their name.

When the 'Apply Options' button is pressed, PyForecast will apply resampling methods to the daily data and generate seasonal forecast predictors, with water-year indices. The generated predictors will populate the 'All Available Predictors' table. Users can view individual predictor data by dragging the predictor into the time series plot below the table.
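The resampling step can be pictured with pandas. The sketch below is an assumption about the general technique, not PyForecast's implementation: it uses a hypothetical daily series, takes monthly and weekly means, and labels each month with a water year that begins on October 1st.

```python
import pandas as pd

# Hypothetical daily dataset covering one water year (values 0, 1, 2, ...)
idx = pd.date_range("2017-10-01", "2018-09-30", freq="D")
daily = pd.Series(range(len(idx)), index=idx, dtype=float)

# Resample the daily values into monthly and weekly averages
monthly_mean = daily.resample("MS").mean()
weekly_mean = daily.resample("W").mean()

# Water-year index: October through December belong to the next calendar year
water_year = monthly_mean.index.year + (monthly_mean.index.month >= 10).astype(int)
```

Other resampling methods, such as a sum or a first-of-period value, would replace `.mean()` with the corresponding aggregation.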

The predictands (the actual seasonal flow volumes) are also generated at this time and can be found in the 'Equation Pools' dictionary. A dictionary entry is created for each forecast equation, and each entry contains the predictand used in that equation's regression. For example, with an April-July forecast period, the predictand of the 'January 15th' equation is the April-July inflow volume observed in each year of the period of record, while the predictand of the 'June 1st' equation is the observed June-July inflow volume. Users can view a predictand by dragging it into the time series plot beneath the 'All Available Predictors' table.
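One way to picture the predictand calculation is as a per-year sum of daily flow over the months that remain in the forecast period. The function name `seasonal_volume` and the constant daily volumes are hypothetical, and unit conversion from flow rate to volume is left out of the sketch.

```python
import pandas as pd


def seasonal_volume(daily_flow, start_month, end_month):
    """Sum daily values from start_month through end_month for each year."""
    months = daily_flow.index.month
    season = daily_flow[(months >= start_month) & (months <= end_month)]
    return season.groupby(season.index.year).sum()


# Hypothetical daily inflow volumes: 1.0 unit per day for two calendar years
idx = pd.date_range("2018-01-01", "2019-12-31", freq="D")
daily_inflow = pd.Series(1.0, index=idx)

# Predictand for an equation issued before April: the full April-July volume
apr_jul = seasonal_volume(daily_inflow, 4, 7)
# Predictand for a June 1st equation: only the remaining June-July volume
jun_jul = seasonal_volume(daily_inflow, 6, 7)
```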

Users can view correlations between predictors and predictands by dragging two datasets into the time series plot and selecting the 'Correlation' button. A simple linear regression is run between the two datasets and the coefficient of determination is reported.
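The coefficient of determination for a simple linear regression can be computed as in the sketch below. The function name `r_squared` and the sample values are hypothetical; the calculation is the standard one, 1 minus the ratio of residual to total sum of squares.

```python
import numpy as np


def r_squared(x, y):
    """Coefficient of determination of a least squares line fit of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Hypothetical predictor (e.g. snow water equivalent) and predictand (volume)
predictor = [10.0, 14.0, 9.0, 20.0, 16.0, 12.0]
predictand = [105.0, 150.0, 98.0, 190.0, 170.0, 118.0]
r2 = r_squared(predictor, predictand)
```

For a regression with a single predictor, this value equals the square of the Pearson correlation coefficient between the two datasets.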

Regression Tab

The Regression Tab allows the user to develop statistical forecast equations using cross-validated regression schemes.

The user chooses a regression model by selecting the appropriate tab:

  • MLR: Multiple Linear Regression
  • PCAR: Principal Components Regression
  • ZSCR: Z-Score Regression
  • GRNN: General Regression Neural Network
  • GPR: Gaussian Process Regression

Density Analysis Tab
