-
Notifications
You must be signed in to change notification settings - Fork 1
/
Read_NetCDF_tutorial_v3.Rmd
190 lines (149 loc) · 9.26 KB
/
Read_NetCDF_tutorial_v3.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
# How to open and work with NetCDF data in R
In this tutorial we will open some geospatial data that is stored in a netCDF file. We will select the variable and time range of interest, and export the data to a GeoTIFF so that we can continue the analysis in R or other geospatial software.
## 1. Open and read an example NetCDF dataset
First, we need some data to play with. As an example, we will use some data on trends in vegetation greenness in the Arctic created by Guay et al.
The data citation is:
Guay, K.C., P.S.A. Beck, and S.J. Goetz. 2015. Long-Term Arctic Growing Season NDVI Trends from GIMMS 3g, 1982-2012. ORNL DAAC, Oak Ridge, Tennessee, USA. https://doi.org/10.3334/ORNLDAAC/1275
This dataset provides normalized difference vegetation index (NDVI) data for the Arctic growing season derived primarily with data from Advanced Very High Resolution Radiometer (AVHRR) sensors onboard several NOAA satellites over the years 1982 through 2012. The NDVI data, which show vegetation activity, were averaged annually for the arctic growing season (June, July and August) in each year. The data are circumpolar in coverage at 8-km resolution and limited to greater than 20 degrees N.
Specifically, we will use the file "gimms3g_ndvi_1982-2012.nc4" from this dataset. The netcdf file is available from the ORNL DAAC at the following link: https://daac.ornl.gov/daacdata/global_vegetation/GIMMS3g_NDVI_Trends/data/gimms3g_ndvi_1982-2012.nc4
You will need to log in using your NASA Earthdata login. Remember to place the file into the same working directory where your R script is located.
## Load the required R packages
Reading data stored in netCDF files is quite straightforward in R, provided that you load the appropriate packages. You may need to install these packages if you have not used them before.
```{r load_packages, warning=FALSE, message=FALSE}
library(ncdf4) # package for netcdf manipulation
library(raster) # package for raster manipulation
library(rgdal) # package for geospatial analysis
library(ggplot2) # package for plotting
```
## Read in the netCDF file contents
Use nc_open to read the data into a data structure I called nc_data. Print the metadata about this file to a text file.
```{r in_data}
# open the file
nc_data <- nc_open('gimms3g_ndvi_1982-2012.nc4')
# save the print(nc) dump to a text file
{
sink('gimms3g_ndvi_1982-2012_metadata.txt')
print(nc_data)
sink()
}
```
From this output we see that there are two variables: time_bnds, which contains the start and end date of each observation, and NDVI, which is the variable of interest. NDVI has three dimensions: [lon,lat,time].
There are four dimensions in the file: lat, lon, time, and "nv", which is used to record the beginning and end of the time range. We can ignore "nv" and focus on the three dimensions that are used to organize the NDVI data.
There are 10 global attributes which provide metadata information about the file.
We need to capture these data in the lat, lon, and time dimensions. The following code reads the latitudes, longitudes, and time of each NDVI observation and saves them in memory.
```{r dimensions}
# get the lat, lon, and time coordinates
lon <- ncvar_get(nc_data, "lon")
lat <- ncvar_get(nc_data, "lat", verbose = F)
t <- ncvar_get(nc_data, "time")
head(lon) # look at the first few entries in the longitude vector
```
Read in the data from the NDVI variable and verify the dimensions of the array. There should be 4320 lons, 840 lats, and 31 times
```{r read_NDVI}
ndvi.array <- ncvar_get(nc_data, "NDVI") # store the data in a 3-dimensional array
dim(ndvi.array) # print the dimensions to the screen
```
Other pertinent information about the NDVI variable:
Lets's see what fill value was used for missing data.
```{r fillVal}
fillvalue <- ncatt_get(nc_data, "NDVI", "_FillValue") # get the fill value
fillvalue # print to screen
```
The fill value is -9999.
All done reading in the data. We can close the netCDF file.
```{r close_nc}
nc_close(nc_data)
```
## Working with the data
So, now we have the entire array of NDVI values for 4320 x 840 grid cells over each of 31 years in R. What can we do with it?
First, a little housekeeping. Let's replace all those pesky fill values with the R-standard 'NA'.
```{r housekeeping}
ndvi.array[ndvi.array == fillvalue$value] <- NA
```
Let's get one year of the NDVI data and plot it.
Time is the third dimension of the "ndvi.array". The first time slice represents the growing season of 1982.
```{r one_year}
ndvi.slice <- ndvi.array[, , 1] # get the first time slice
```
Just to make sure everything is working correctly, we can take a look at the dimensions of this time slice. The dimensions should be 4320 longitudes by 840 latitudes.
```{r check_dim}
dim(ndvi.slice) # print dimensions to screen
```
Ok, everything checks out, so we can go ahead and save this data in a raster. Note that we provide the coordinate reference system "CRS" in the standard well-known text format. For this data set, it is the common WGS84 system.
```{r rasterize}
# save this time slice as a raster
r <- raster(t(ndvi.slice), xmn=min(lon), xmx=max(lon), ymn=min(lat), ymx=max(lat), crs=CRS("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs+ towgs84=0,0,0"))
```
We will need to transpose and flip to orient the data correctly. The best way to figure this out is through trial and error, but remember that most netCDF files record spatial data from the bottom left corner.
```{r raster_orient}
r <- flip(r, direction='y')
```
## Plotting
Finally, we can plot the raster to take a look at the NDVI in 1982. Remember this data is cut off below 20 degrees North.
```{r plot_raster}
plot(r)
dev.copy(png,'GIMMS3g_1982_NDVI.png')
dev.off()
```
![Arctic Growing Season NDVI, 1982](GIMMS3g_1982_NDVI.png)
## Saving to a GeoTIFF
If this looks good, then let's save it to a GeoTIFF file.
```{r save_raster}
writeRaster(r, "GIMMS3g_1982.tif", "GTiff", overwrite=TRUE)
```
## 2. Extract data at a study site
Maybe you want to get a timeseries of NDVI at a study location, such as the Toolik Lake Field Station in Alaska (Latitude: 68.6275, Longitude: -149.5975).
First, we will need to convert the entire 3d array of data to a raster brick. **Note, this step may take several minutes.**
```{r rasterize_brick}
r_brick <- brick(ndvi.array, xmn=min(lat), xmx=max(lat), ymn=min(lon), ymx=max(lon), crs=CRS("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs+ towgs84=0,0,0"))
# note that you may have to play around with the transpose (the t() function) and flip() before the data are oriented correctly. In this example, the netcdf file recorded latitude on the X and longitude on the Y, so both a transpose and a flip in the y direction were required.
r_brick <- flip(t(r_brick), direction='y')
```
Extract timeseries of data at the Toolik Lake study location from the raster brick using the 'extract()' function.
```{r get_Toolik}
toolik_lon <- -149.5975
toolik_lat <- 68.6275
toolik_series <- extract(r_brick, SpatialPoints(cbind(toolik_lon,toolik_lat)), method='simple')
```
This timeseries is in a simple vector indexed only by the raster layer ID, so let's put it in an easier-to-use dataframe form and then plot the timeseries.
```{r plot_Toolik}
toolik_df <- data.frame(year= seq(from=1982, to=2012, by=1), NDVI=t(toolik_series)) # save as a dataframe
# plot and save
ggplot(data=toolik_df, aes(x=year, y=NDVI, group=1)) +
geom_line() + # make this a line plot
ggtitle("Growing season NDVI at Toolik Lake Station") + # Set title
theme_bw() # use the black and white theme
ggsave('GIMMS3g_Toolik.png')
```
![Growing season NDVI at Toolik Lake Station](GIMMS3g_Toolik.png)
Wow! The Toolik Lake site really is getting greener over time.
## 3. Difference in NDVI between two time periods
The authors of this dataset identified long-term trends in NDVI over the 31 year period from 1982 to 2012. Their paper in Global Change Biology found some interesting patterns:
You can check out the paper here:
Guay, K. C., Beck, P. S. A., Berner, L. T., Goetz, S. J., Baccini, A. and Buermann, W. (2014), Vegetation productivity patterns at high northern latitudes: a multi-sensor satellite data assessment. Glob Change Biol, 20: 3147-3158. https://doi.org/10.1111/gcb.12647
Let's look at the difference in NDVI between 1982 and 2012.
The 'ndvi.slice' array has the data from 1982. Let's get the data from 2012, the 31st time slice.
```{r 2012_data}
ndvi.slice.2012 <- ndvi.array[, , 31] # get the 31st time slice
```
Now take the difference between them.
```{r difference}
ndvi.diff <- ndvi.slice.2012 - ndvi.slice
```
Save the difference map as a raster.
```{r rasterize_diffs}
r_diff <- raster(t(ndvi.diff), xmn=min(lon), xmx=max(lon), ymn=min(lat), ymx=max(lat), crs=CRS("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs+ towgs84=0,0,0"))
```
Re-orient the data for geotiff.
```{r raster_orient_diffs}
r_diff <- flip(r_diff, direction='y')
```
And plot.
```{r plot_raster_diffs}
plot(r_diff)
dev.copy(png,'GIMMS3g_NDVI_diff.png')
dev.off()
```
![Difference between growing season NDVI in 2010 vs 1982](GIMMS3g_NDVI_diff.png)
The areas that were greener in 2012 compared to 1982 are represented by positive numbers (green in this color scheme). Note that this is not a proper timeseries analysis (like the authors did in their paper), it is only showing the difference between two particular years.
Congratulations, now you know how to open and read data from a netCDF file.