This is a graduate course taught as GEOG696c (the physical geography seminar) at the University of Arizona. The class was last taught in Fall 2023. The full syllabus is available here.
This course is designed as a graduate-level class in a workshop format to give students a theoretical framework, practical experience, expert knowledge, and statistical tools for analyzing spatiotemporal datasets. It is fundamentally about building tools and practical understanding so that students can knowledgeably apply these techniques in their own research. Topics include basic matrix algebra and statistics, exploratory data analysis, field correlation and regression analysis, autocorrelation and its statistical consequences in time and space, parametric and non-parametric significance testing and error analysis, empirical orthogonal functions including rotation, singular spectrum analysis, maximum covariance and canonical correspondence analysis, and traditional and multitaper spectral analysis. The course includes instruction and training in Python and in the use and manipulation of large multi-dimensional datasets.
The major outcome for the class for each student will be a new and independent analysis of a substantial space-time dataset, a formal manuscript describing the motivation, methods, and results of this analysis, and a professional oral presentation. Students are encouraged to bring with them or seek out data relevant to their research to use for their final project. Ideally, students' final projects will provide the material for a thesis chapter and/or peer-reviewed article.
The course was inspired by and initially based on an objective analysis/spatiotemporal data analysis class taught in MATLAB by Mike Evans.
August 23 to August 31 - Introduction to Python, NumPy, Pandas, Matplotlib (Part 1 and Part 2), and Linear Algebra
August 31 to September 12 - Variance, covariance, and correlation -- Part 1 and Part 2 -- plus an introduction to xarray
September 14 to September 21 - Introduction to Empirical Orthogonal Functions -- Part 1 and Part 2 -- plus mapping with Matplotlib and Cartopy, and singular value decomposition on the covariance matrix vs. the data matrix.
September 21 to September 28 - EOF significance, meaningfulness, and interpretation, with short lectures on missingness and randomness.
September 28 to October 10 - EOF interpretation and orthogonal rotation
October 17 to October 26 - Analysis of coupled fields, field correlation, an alternative field correlation approach if you need local significance levels, compositing, and field significance and false discovery.
October 31 to November 7 - Spectral analysis, simple periodic signals and auto-correlation, and singular spectrum analysis
November 9 to November 14 - Frameworks for spatiotemporal data analysis
November 14 to December 5 - Student project work and presentations
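As a small illustration of the September 14 to September 21 topic (singular value decomposition on the covariance matrix vs. the data matrix), here is a minimal sketch using NumPy. It is not taken from the course notebooks: the data are synthetic random numbers, and every variable name (`X`, `Xa`, `C`, and so on) is a hypothetical choice made for this example. It assumes rows are time steps and columns are locations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data matrix (an assumption for illustration):
# 200 time steps (rows) by 5 locations (columns), with correlated columns.
n_time, n_space = 200, 5
X = rng.standard_normal((n_time, n_space)) @ rng.standard_normal((n_space, n_space))

# Remove the time mean at each location so we work with anomalies.
Xa = X - X.mean(axis=0)

# Route 1: eigendecomposition of the (space x space) covariance matrix.
C = Xa.T @ Xa / (n_time - 1)
eigvals, eigvecs = np.linalg.eigh(C)          # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]             # re-sort to descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Route 2: SVD of the anomaly data matrix itself.
U, s, Vt = np.linalg.svd(Xa, full_matrices=False)
svd_vals = s**2 / (n_time - 1)                # squared singular values, scaled

# Fraction of total variance carried by each mode.
explained = svd_vals / svd_vals.sum()

# The eigenvalues should agree, and the spatial patterns should agree
# up to an arbitrary sign on each mode.
print("eigenvalues agree:", np.allclose(eigvals, svd_vals))
print("patterns agree (up to sign):",
      np.allclose(np.abs(eigvecs), np.abs(Vt.T), atol=1e-6))
print("explained variance fractions:", np.round(explained, 3))
```

The two routes give the same modes because the right singular vectors of the anomaly matrix are the eigenvectors of its covariance matrix. Working from the data matrix avoids forming the covariance matrix explicitly, which can matter when the number of locations is large.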
My own programming career started in FORTRAN and moved to MATLAB, a language I've now spent almost 25 years using effectively and (mostly) without complaint. But with an increasing number of jobs for earth and environmental sciences students outside academia and the rise of Python as the de facto language of data science, I decided to migrate this class from MATLAB to Python in 2023 (my own research code is also moving, more slowly, in this direction). This has involved some growing pains (for me!), but in the end I hope that the chance to learn spatiotemporal statistics in a language so widely used across so many fields will be worth the extra trouble for the students who take the class. For earth scientists relatively new to Python, Martin Trauth's book Python Recipes for Earth Sciences provides a useful and broad introduction solidly grounded in the types of analyses many of us are familiar with.
Anaconda is a Python distribution and package manager that installs base Python along with a large number of packages for data analysis and exploration, but it is quite large. Since not all of these packages are always required, a 'lite' version of Anaconda called Miniconda is also available. Miniconda gives you base Python and all of the Anaconda management functions, but has a much smaller initial download size and installation time because it installs few packages (which means you'll need to install any additional packages you need yourself). Once installed, both Anaconda and Miniconda are referred to (and called from the shell, terminal, or command line) simply as conda. A cheatsheet of conda commands can be found here.
I personally use Anaconda, but instructions for installing via either are available in the following links:
This page from DataCamp contains useful and straightforward information on getting Python installed on both Windows and Mac.
This page from Notable.io also provides simple instructions for getting up and running in Python and Jupyter notebooks (note that this website might not be available in the near future).
This YouTube video from Visual Studio Code (the integrated coding environment we'll use in this class) can get you up and running pretty quickly. They show installation in Windows, so macOS will be slightly different. We'll also go through this live in class on August 24th.
Here are the basic steps from the video:
- Install Python using Miniconda (recommended for this class: https://docs.conda.io/en/main/miniconda.html) or the full individual Anaconda distribution (https://www.anaconda.com/download) for your operating system. Both are free. Note that if you have an older operating system, the current versions of Miniconda might not work on your system. A simple comparison of the benefits and drawbacks of Anaconda vs. Miniconda can be found here.
- For Windows users: install IPython from the newly installed Conda prompt or from within Python.
- Install Jupyter Notebooks: https://jupyter.org/install
- Install Visual Studio Code itself (https://code.visualstudio.com/download) for your system
- Within Visual Studio Code, install the Python extensions (from Microsoft)
- Test your system
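One possible way to carry out the final "Test your system" step is a short script like the sketch below, run from a terminal or a Jupyter notebook in VS Code. This is not the test shown in the video: the function name `check_packages` and the package list are assumptions made for this example, with the list drawn from the packages named in the course schedule.

```python
import importlib
import sys

# Packages named in the course schedule (an assumed list; edit to suit your setup).
COURSE_PACKAGES = ["numpy", "pandas", "matplotlib", "xarray", "cartopy"]


def check_packages(names):
    """Map each package name to its version string, or None if it cannot be imported."""
    status = {}
    for name in names:
        try:
            module = importlib.import_module(name)
            # Not every module defines __version__, so fall back to "unknown".
            status[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            status[name] = None
    return status


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, version in check_packages(COURSE_PACKAGES).items():
        print(f"{name:12s}", version if version else "NOT INSTALLED")
```

Any package reported as not installed can then be added with conda from the shell, terminal, or command line before the first class.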
Although not strictly required for this course, I encourage you to use Git and GitHub to streamline your access to and use of the notebooks created for this class, as well as to advance your own development of reproducible and readily shareable code. Here are some good places to start:
- Software Carpentry's Version Control with Git
- Jonathan King's Github Tutorial. Jonathan took this course as a graduate student.
- Create a virtual environment for the course and provide additional instruction on using conda environments for Python programming and analysis
- Make course materials available via a link in Google Colaboratory
- Update introductory notebooks with additional examples of specific programming challenges and tasks encountered in the class (based on Fall 2023 feedback and observations)
- Modify and refine Python homework exercises based on Fall 2023 feedback and lessons learned
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 License (CC BY-NC 3.0 US).
You are welcome to use any of this material, so long as it is for non-commercial purposes.
This course is designed to open up statistical black boxes and reveal the 'big pile of linear algebra' inside. But of course there are many existing packages that can perform the analyses done in this class and extensions of these in additional dimensions and other applications. Here are just a few you might find useful:
- cf-xarray - a wrapper for using CF attributes on xarray objects
- scikit-learn - has methods for PCA, CCA, imputation, independent component analysis (ICA), and much much more
- xeofs - under active development by Niclas Rieger; as of this writing (November 2023) it covers basic EOF methods, multiple-field EOF (including MCA), and rotated EOF (see more here: https://xeofs.readthedocs.io/en/latest/)
- Pyleoclim - extensive package for the analysis of paleoclimate data, including code for the false discovery rate
- Geopandas - extends Pandas data types to work with vector geospatial data (including Shapefiles) with operations similar to those available in GIS software
- fiona - library for reading and writing GIS data formats
- rasterio - read and write to raster data formats like GeoTIFF
- Xee - an Xarray extension for Google Earth Engine
- Nitime - built for time series analysis in neuroscience, with useful spectral analysis functions such as the multitaper method
- multitaper - multitaper spectral methods in Python (article here)
- mcssa - univariate Monte Carlo singular spectrum analysis in Python
- pymssa - (multichannel) singular spectrum analysis in Python
- ecopy - includes methods for PCA, Correspondence Analysis, and ordination
- GeoCAT - in theory a port of the NCAR Command Language to Python
Did you find this course material useful? Want to share ideas? Find some bugs? Feel free to contact me at [email protected]