-
Notifications
You must be signed in to change notification settings - Fork 1
/
weatherdata.Rmd
459 lines (306 loc) · 25 KB
/
weatherdata.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
## Weather data
The weather data is a time series data, data taken at discrete time intervals. To show an example of a time series data, we plot the observed 2 meter temperature, called T2Obs, in degree Kelvin for the years from 2011 to 2015. As we know, the observed temperature shows a seasonal pattern, i.e, varying from high temperatures in Summer to low temperatures in Winter and also a diurnal pattern varying according to the hours of the day.
```{r, warning=FALSE, message=F, echo=TRUE}
library(ggplot2)
timeseries <- read.table(file = "data/timeseries_example.csv", sep = ",", header = TRUE)
timeseries$timestamp = as.Date(strptime(timeseries$timestamp, "%Y-%m-%d%H:%M:%S"))
qplot(timeseries$timestamp, timeseries$T2Obs, xlab="year", ylab="Observed temperature (Kelvin)")
```
## The description of the weather data
The weather data we use in this assignment is based on ECMWF’s 10-day forecasts for the Kumpula station (Station id 2998). There are 5 station and time related variables (station_id, season, analysis_time, forecast period and timestamp) and 25 predictor variables which describe, for example, temperature, dew point, relative humidity forecasts etc. We have also two response variables, the actual observations of the two-meter temperature denoted as T2_Obs and the dew point temperature D2 at 2 meters named as D2Obs. The ECMWF 10-day forecast includes forecasts for T2 and D2. We have the weather data from 2011-12-01 to 2015-10-01.
The T2Obs and D2Obs are available form [FMI opendata portal] (https://en.ilmatieteenlaitos.fi/open-data). We use the model data from the [Atmospheric Model high resolution 10-day forecast (HRES)](http://www.ecmwf.int/en/forecasts/datasets/set-i#I-i-a_fc) from ECMWF.
The variables of the raw forecast model from ECMWF (for the Single level -forecast) are defined [here](http://www.ecmwf.int/en/forecasts/datasets/set-i#I-i-a_fc). The definitions of the predictor variables we use are given below.
```{r table2, echo=FALSE, message=FALSE, warnings=FALSE, results='asis'}
tabl <- "
|No.| Abbreviation | Description |
|----|--------------|---------------|
| 1 | station_id | ECMWF id for the stations (Check) |
|2 | season | Winter, Spring, Summer and Autumn( 1-4) (Check) |
|3 | analysis_time| (Check) OOUTC and 12UTC (Coordinated Universal Time) |
|4 | forecast_period| Forecast period. (Check) |
|5 |timestamp| Timestamps 2011-12-06 12:00:00 to 2015-10-01 12:00:00|
|6 | T2_Obs| Observed temperature at 2 meters |
|7 | MSL | Mean sea level pressure |
|8| T2 | 2 meter temperature (ECMWF:2T) |
|9| D2 | 2 meter dewpoint temperature (ECMWF: 2D) |
|10 |U10 |10 meter U-velocity (ECMWF: 10U) |
|11 |V10 |10 meter V-velocity (ECMWF: 10V) || | | |
|12 | TP| Total precipitation |
|13 | LCC| Low cloud cover|
|14 | MCC| Medium cloud cover|
|15 | HCC| High cloud cover|
|16 | SKT| Skin temperature (temperature on the surface of the Earth) |
|17 |FG10_3 | 10 meter wind gust in the last 3 hours (ECMWF:10FG3|
|18 | MX2T3|Maximum temperature at 2 meter in the last 3 hours |
|19 | RH_950| Relative humidity at 950hPa|
|20 | T_950|Temperature at 950hPa|
|21 | RH_925| Relative humidity at 925hPa|
|22 | T_925|Temperature at 925hPa|
|23 | RH_850| Relative humidity at 850hPa|
|24 | T_850|Temperature at 850hPa|
|25 | RH_700| Relative humidity at 700hPa|
|26 | T_700|Temperature at 700hPa|
|27 | RH_500| Relative humidity at 500hPa|
|28 | T_500|Temperature at 500hPa|
|29 | T2_M1| 2 meter temperature at previous forecast period|
|30| T_950_M1| Temperature at 950hPa at previous forecast period|
|31| T_925_M1| Temperature at 925hPa at previous forecast period|
|32 |DECLINATION | Declination|
|33 | D2_M1| 2 meter dew point temperature at previous forecast period|
|34 | D2_Obs| Observed dew point temperature at 2 meters |
"
cat(tabl) # output the table in a format good for HTML/PDF/docx conversion
```
## Loading the data
The original data file is in "station2998_all_data_with_timestamps.csv". It has 299520 observations of 34 variables. Unfortunately the size of this file exceeds the GitHub's recommended maximum file size, I cannot upload it to my GitHub directory. The file has missing values in it, represented by "NA". We remove the variables with a large number of NAs (no. of NAs > no. of observations/4). Two variables "FG10_3" (10 meter wind gust in the last 3 hours) and "MX2T3" (Maximum temperature at 2 meter in the last 3 hours) are removed as they have large number of NAs. We remove the variables of constant variance, but this file has no variable with constant variance (except the station_id and it is not removed). We also remove the NAs from the other 32 variables. The program, create_weather_data.R does the above functions and save the "cleaned" data in the file "data/station2998_T2_D2_Obs_model_data_with_timestamps.csv"
The "cleaned" weather data, T2_D2_Obs_model_data_with_timestamps, is loaded from the file "data/station2998_T2_D2_Obs_model_data_with_timestamps.csv". Weather data is a multidimensional data. It has 290982 observations of 32 variables. The first five columns of the weather data are the station_id, season, analysis_time, forecast period and timestamps. We choose the Kumpula station which has a station id of 2998. We have data for all the four seasons, namely, Winter, Spring, Summer and Autumn represented by the values 1,2,3, and 4 respectively.
We are using 10-day forecasts, we have 2 to 65 different forecast periods ranging from +0 to +240. Between +0 and +144, the step size is 3 hours, and between +144 and +240, the step size is 6 hours. When the forecast period has a value 2, we get a 3-hour forecast data and if its value is 8, we get a 24-hour or one day forecast data. The forecast period with value 65 represents a 10-day forecast. The forecasts are created two times a day at 00:00 UTC (Coordinated Universal Time) and 12:00 UTC. This variable is called analysis time has two values 1 and 2 corresponding to 00:00UTC and 12:00 UTC.
The response variable T2Obs is at column 6 and the other response variable D2Obs is the last column of the weather data. The columns 8 and 9 gives the ECMF model output for T2 and D2. From the ECMWF model we also have the data for temperature and relative humidity at various pressure levels such as 950 hPa(hecto Pascal), 925 hPa, 850 hPa, 700 hPa and 500 hPa. We also have data on temperature T2_M1, T_950_M1 (temperature at 950 hPa), T_925_M1 and dew point temperature D2_M1 of the corresponding variables at the previous forecast period.
ECMWF model also provides the temperature at the surface of Earth called skin temperature (SKT), wind velocity at 10 meters (U_10 and V_10). Declination data is available and it is defined as the angular distance north or south from the celestial equator measured along a great circle passing through the celestial pole.
we load the weather data, T2_D2_Obs_model_data_with_timestamps and below can see the structure of the data.
```{r, warning=FALSE, message=F, echo=TRUE}
# loading the data from the file
T2_D2_Obs_model_data_with_timestamps = read.table(file = "data/station2998_T2_D2_Obs_model_data_with_timestamps.csv", sep = ",", header = TRUE)
dim(T2_D2_Obs_model_data_with_timestamps)
colnames(T2_D2_Obs_model_data_with_timestamps)
str(T2_D2_Obs_model_data_with_timestamps)
```
## Summary of the weather data set
We can find the minimum, maximum, and quantile values of the weather data for the columns 6:32 is shown here. We can see that the T2Obs has a mean value of 280K and varies from a minimum value of 247.7 K to a maximum of 303.6K. D2Obs has a mean value of 280K and varies from 244.8K to a maximum of 293.1K
```{r, warning=FALSE, message=F, echo=TRUE}
# summary of the weather dataset
summary(T2_D2_Obs_model_data_with_timestamps[,-c(1:5)])
```
## Visualizing the data
Here we plot the histogram of the Observed temperature at 2 m, Temp at 2 m - ECMWF model and Temp at 2 meters - previous forecast period. We can also see Observed dew point temperature at 2 m, dew point temp at 2 m - ECMWF model and dew point temp at 2 meters - previous forecast period. Unfortunately I am not able to reduce the font size of the title, so the full titles are not shown in the figures.
```{r, warning=FALSE, message=F, echo=TRUE}
library(ggplot2)
library(gridExtra)
p1 <- qplot(T2_D2_Obs_model_data_with_timestamps$T2Obs, geom= "histogram", binwidth = 2, xlab = "T2Obs",
fill = I("red3"), col = I("gray"), main="Observed temp at 2 m")
p2 <- qplot(T2_D2_Obs_model_data_with_timestamps$T2, geom= "histogram", binwidth = 2, xlab = "T2",
fill = I("tomato"), col = I("gray"), main = "Temp at 2 m - ECMWF model")
p3 <- qplot(T2_D2_Obs_model_data_with_timestamps$T2_M1, geom= "histogram", binwidth = 2, xlab = "T2_M1",
fill = I("hotpink"), col = I("gray"), main = "Temp at 2 meters - previous forecast period")
p4 <- qplot(T2_D2_Obs_model_data_with_timestamps$D2Obs, geom= "histogram", binwidth = 2, xlab = "D2Obs",
fill = I("blue"), col = I("gray"), main = "Observed Dewpoint temp at 2 m")
p5 <- qplot(T2_D2_Obs_model_data_with_timestamps$D2, geom= "histogram", binwidth = 2, xlab = "D2",
fill = I("dodgerblue"), col = I("gray"), main = "Dewpoint temp at 2 m -ECMWF model")
p6 <- qplot(T2_D2_Obs_model_data_with_timestamps$D2_M1, geom= "histogram", binwidth = 2, xlab = "D2_M1",
fill = I("deepskyblue"), col = I("gray"), main = "Dewpoint temp at 2 m -previous forecast period")
grid.arrange(p1, p2, p3, p4,p5,p6, ncol = 3, nrow = 2)
```
```{r, warning=FALSE, message=F, echo=TRUE}
p1 <- qplot(T2_D2_Obs_model_data_with_timestamps$U10, geom= "histogram", binwidth = 2, xlab = "U10",
fill = I("yellow"), col = I("gray"), main="10 meter U-velocity")
p2 <- qplot(T2_D2_Obs_model_data_with_timestamps$V10, geom= "histogram", binwidth = 2, xlab = "V10",
fill = I("yellowgreen"), col = I("gray"), main = "10 meter U-velocity")
grid.arrange(p1, p2, ncol = 2, nrow = 1)
```
```{r, warning=FALSE, message=F, echo=TRUE}
p1 <- qplot(T2_D2_Obs_model_data_with_timestamps$TP, geom= "histogram", binwidth = 2, xlab = "TP",
fill = I("bisque"), col = I("white"), main="Total precipitation")
p2 <- qplot(T2_D2_Obs_model_data_with_timestamps$LCC, geom= "histogram", binwidth = 2, xlab = "LCC",
fill = I("gray60"), col = I("white"), main = " Low cloud cover")
p3 <- qplot(T2_D2_Obs_model_data_with_timestamps$MCC, geom= "histogram", binwidth = 2, xlab = "MCC",
fill = I("gray45"), col = I("white"), main = " Medium cloud cover")
p4 <- qplot(T2_D2_Obs_model_data_with_timestamps$HCC, geom= "histogram", binwidth = 2, xlab = "HCC",
fill = I("black"), col = I("white"), main = "High cloud cover")
grid.arrange(p1, p2, p3, p4, ncol = 2, nrow = 2)
```
```{r, warning=FALSE, message=F, echo=TRUE}
p1 <- qplot(T2_D2_Obs_model_data_with_timestamps$RH_950, geom= "histogram", binwidth = 2, xlab = "RH_950",
fill = I("forestgreen"), col = I("gray"), main="Relative humidity at 950 hPa")
p2 <- qplot(T2_D2_Obs_model_data_with_timestamps$RH_925, geom= "histogram", binwidth = 2, xlab = "RH_925",
fill = I("chartreuse"), col = I("gray"), main="Relative humidity at 925 hPa")
p3 <- qplot(T2_D2_Obs_model_data_with_timestamps$T_950, geom= "histogram", binwidth = 2, xlab = "T_950",
fill = I("blueviolet"), col = I("gray"), main = "T2: Temperature at 950 hPa")
p4 <- qplot(T2_D2_Obs_model_data_with_timestamps$T_925, geom= "histogram", binwidth = 2, xlab = "T_925",
fill = I("orchid2"), col = I("gray"), main = "T2: Temperature at 925 hPa")
grid.arrange(p1, p2, p3, p4, ncol = 2, nrow = 2)
```
```{r, warning=FALSE, message=F, echo=TRUE}
p1 <- qplot(T2_D2_Obs_model_data_with_timestamps$RH_850, geom= "histogram", binwidth = 2, xlab = "RH_850",
fill = I("brown4"), col = I("gray"), main="Relative humidity at 850 hPa")
p2 <- qplot(T2_D2_Obs_model_data_with_timestamps$RH_700, geom= "histogram", binwidth = 2, xlab = "RH_700",
fill = I("brown1"), col = I("gray"), main="Relative humidity at 700 hPa")
p3 <- qplot(T2_D2_Obs_model_data_with_timestamps$RH_500, geom= "histogram", binwidth = 2, xlab = "RH_500",
fill = I("darksalmon"), col = I("gray"), main="Relative humidity at 500 hPa")
p4 <- qplot(T2_D2_Obs_model_data_with_timestamps$T_850, geom= "histogram", binwidth = 2, xlab = "T_850",
fill = I("hotpink4"), col = I("gray"), main = "T2: Temperature at 850 hPa")
p5 <- qplot(T2_D2_Obs_model_data_with_timestamps$T_700, geom= "histogram", binwidth = 2, xlab = "T_700",
fill = I("hotpink1"), col = I("gray"), main = "T2: Temperature at 700 hPa")
p6 <- qplot(T2_D2_Obs_model_data_with_timestamps$T_500, geom= "histogram", binwidth = 2, xlab = "T_500",
fill = I("thistle1"), col = I("gray"), main = "T2: Temperature at 500 hPa")
grid.arrange(p1, p2, p3, p4, p5, p6, ncol = 3, nrow = 2)
```
```{r, warning=FALSE, message=F, echo=TRUE}
p1 <- qplot(T2_D2_Obs_model_data_with_timestamps$T_950_M1, geom= "histogram", binwidth = 2, xlab = "RH_700",
fill = I("purple"), col = I("gray"), main="Temperature at 950 hPa at previous forecast period")
p2 <- qplot(T2_D2_Obs_model_data_with_timestamps$T_925_M1, geom= "histogram", binwidth = 2, xlab = "RH_500",
fill = I("magenta"), col = I("gray"), main="Temperature at 925 hPa at previous forecast period")
p3 <- qplot(T2_D2_Obs_model_data_with_timestamps$SKT, geom= "histogram", binwidth = 2, xlab = "SKT",
fill = I("limegreen"), col = I("gray"), main="Skin temperature")
p4 <- qplot(T2_D2_Obs_model_data_with_timestamps$DECLINATION, geom= "histogram", binwidth = 2, xlab = "DECLINATION",
fill = I("turquoise"), col = I("gray"), main = "Declination")
grid.arrange(p1, p2, p3, p4, ncol = 2, nrow = 2)
```
### Correlation matrix and Scatter plots
As our aim is to develop a linear regression model for T2 and D2, it will be a good idea to study the correlation between the variables of the weather data.
Below we can see the correlation between the variables of weather data. We describe in detail about the correlations when we go to the next section on the linear regression models.
```{r, warning=FALSE, message=F, echo=TRUE}
cor(as.matrix(T2_D2_Obs_model_data_with_timestamps[,6:ncol(T2_D2_Obs_model_data_with_timestamps)]))
library(corrplot); library(dplyr)
cor_matrix<-cor(T2_D2_Obs_model_data_with_timestamps[,6:ncol(T2_D2_Obs_model_data_with_timestamps)]) %>% round(digits=2)
# visualize the correlation matrix
corrplot(cor_matrix, method="number", type = "upper", cl.pos = "b", tl.pos = "d", tl.cex = 0.6)
```
### Scatter plots and distributions
When we try to plot the scatter plot for even winter data with all the variables, it was very difficult to visualize the data as we have 26 variables.
So we take a smaller set of variables such as "T2Obs", "MSL", "T2","D2", "U10","V10","TP", "LCC","MCC", "HCC","SKT","RH_950","T_950","T2_M1","T_950_M1","DECLINATION", "D2_M1","D2Obs" and do the scatter plot.
```{r, warning=FALSE, message=F, echo=TRUE}
library (ggplot2)
library (GGally)
df_winter_00_08 <- T2_D2_Obs_model_data_with_timestamps[which(T2_D2_Obs_model_data_with_timestamps$forecast_period == 8 & T2_D2_Obs_model_data_with_timestamps$analysis_time == 1 & T2_D2_Obs_model_data_with_timestamps$season == 1),]
df_winter_00_08_small <- df_winter_00_08[, c(6, 7,8, 9,10, 11,12,13,14,15,16,17,18,27,28,30,31,32)]
ggpairs(df_winter_00_08_small[,1:ncol(df_winter_00_08_small)], mapping = aes(), lower = list(combo = wrap("facethist", bins = 20))) + ggtitle("Matrix of scatter plots and distributions of weather data")
```
## T2 : Air temperature at 2 meters
We plot below the T2Obs with the observations. We may not infer much from the plot other than a seasonal change in the observed temperature at 2 meters. So we separate the data into small chunks based on the seasons and the forecast periods.
```{r, warning=FALSE, message=F, echo=TRUE}
plot(T2_D2_Obs_model_data_with_timestamps$T2Obs, main = "Observed temperature at 2 meters", xlab = "Observations", ylab ="T2Obs")
```
### Data for analysis time 00UTC and forecast period 8
Below we can see the weather data for all the seasons where the analysis time in 00:00 UTC and for one-day forecast.
```{r, warning=FALSE, message=F, echo=TRUE}
df_00_08 <- T2_D2_Obs_model_data_with_timestamps[which(T2_D2_Obs_model_data_with_timestamps$forecast_period == 8 & T2_D2_Obs_model_data_with_timestamps$analysis_time == 1),]
df_00_08$timestamp = as.Date(strptime(df_00_08$timestamp, "%Y-%m-%d%H:%M:%S"))
qplot(df_00_08$timestamp, df_00_08$T2Obs, xlab="year", ylab="T2Obs")
```
### Data for winter and analysis time 00UTC and forecast period 8
Below we can see the weather data for the winter season where the analysis time in 00:00 UTC and for one-day forecast.
```{r, warning=FALSE, message=F, echo=TRUE}
df_winter_00_08 <- T2_D2_Obs_model_data_with_timestamps[which(T2_D2_Obs_model_data_with_timestamps$forecast_period == 8 & T2_D2_Obs_model_data_with_timestamps$analysis_time == 1 & T2_D2_Obs_model_data_with_timestamps$season == 1),]
df_winter_00_08$timestamp = as.Date(strptime(df_winter_00_08$timestamp, "%Y-%m-%d%H:%M:%S"))
qplot(df_winter_00_08$timestamp, df_winter_00_08$T2Obs, xlab="year", ylab="T2Obs")
```
### QQ plot of winter temp
This plot is interesting, we can see that the low temperatures do not have a linear variation.
```{r, warning=FALSE, message=F, echo=TRUE}
qqnorm(df_winter_00_08$T2Obs); qqline(df_winter_00_08$T2Obs)
```
### QQ plot of winter ECMWF model T2
We can observe that the T2 from ECMWF model differs from the T2Obs.
```{r, warning=FALSE, message=F, echo=TRUE}
qqnorm(df_winter_00_08$T2); qqline(df_winter_00_08$T2)
```
### Data for winter and analysis time 00UTC and forecast period 10 days
```{r, warning=FALSE, message=F, echo=TRUE}
df_winter_00_65 <- T2_D2_Obs_model_data_with_timestamps[which(T2_D2_Obs_model_data_with_timestamps$forecast_period == 65 & T2_D2_Obs_model_data_with_timestamps$analysis_time == 1 & T2_D2_Obs_model_data_with_timestamps$season == 3),]
df_winter_00_65$timestamp = as.Date(strptime(df_winter_00_65$timestamp, "%Y-%m-%d%H:%M:%S"))
qplot(df_winter_00_65$timestamp, df_winter_00_65$T2Obs, xlab="year", ylab="T2Obs")
```
### Data for summer and analysis time 00UTC and forecast period 10 days
```{r, warning=FALSE, message=F, echo=TRUE}
df_summer_00_65 <- T2_D2_Obs_model_data_with_timestamps[which(T2_D2_Obs_model_data_with_timestamps$forecast_period == 65 & T2_D2_Obs_model_data_with_timestamps$analysis_time == 3 & T2_D2_Obs_model_data_with_timestamps$season == 3),]
df_summer_00_65$timestamp = as.Date(strptime(df_summer_00_65$timestamp, "%Y-%m-%d%H:%M:%S"))
qplot(df_summer_00_65$timestamp, df_summer_00_65$T2Obs, xlab="year", ylab="T2Obs")
```
### QQ plot of winter temp T2Obs 10 day forecast
```{r, warning=FALSE, message=F, echo=TRUE}
qqnorm(df_winter_00_65$T2Obs); qqline(df_winter_00_65$T2Obs)
```
### QQ plot of winter temp T2 from ECMWF model for the 10 day forecast
```{r, warning=FALSE, message=F, echo=TRUE}
qqnorm(df_winter_00_65$T2); qqline(df_winter_00_65$T2)
```
### Data for summer and analysis time 00UTC and forecast period 8
We also shows similar graphs for the summer temperatures.
```{r, warning=FALSE, message=F, echo=TRUE}
df_summer_00_08 <- T2_D2_Obs_model_data_with_timestamps[which(T2_D2_Obs_model_data_with_timestamps$forecast_period == 8 & T2_D2_Obs_model_data_with_timestamps$analysis_time == 1 & T2_D2_Obs_model_data_with_timestamps$season == 3),]
df_summer_00_08$timestamp = as.Date(strptime(df_summer_00_08$timestamp, "%Y-%m-%d%H:%M:%S"))
qplot(df_summer_00_08$timestamp, df_summer_00_08$T2Obs, xlab="year", ylab="T2Obs")
```
### QQ plot of summer observed T2Obs
```{r, warning=FALSE, message=F, echo=TRUE}
qqnorm(df_summer_00_08$T2Obs); qqline(df_summer_00_08$T2Obs)
```
### QQ plot of summer ECMWF model T2
```{r, warning=FALSE, message=F, echo=TRUE}
qqnorm(df_summer_00_08$T2); qqline(df_summer_00_08$T2)
```
### Data for summer and analysis time 00UTC and forecast period 8
```{r, warning=FALSE, message=F, echo=TRUE}
library(ggplot2)
df_summer_00_08 <- T2_D2_Obs_model_data_with_timestamps[which(T2_D2_Obs_model_data_with_timestamps$forecast_period == 8 & T2_D2_Obs_model_data_with_timestamps$analysis_time == 1 & T2_D2_Obs_model_data_with_timestamps$season == 3),]
df_summer_00_08$timestamp = as.Date(strptime(df_summer_00_08$timestamp, "%Y-%m-%d%H:%M:%S"))
qplot(df_summer_00_08$timestamp, df_summer_00_08$T2Obs, xlab="year", ylab="T2Obs")
```
### QQ plot of summer observed T2Obs
```{r, warning=FALSE, message=F, echo=TRUE}
qqnorm(df_summer_00_08$T2Obs); qqline(df_summer_00_08$T2Obs)
```
### QQ plot of summer ECMWF model T2
```{r, warning=FALSE, message=F, echo=TRUE}
qqnorm(df_summer_00_08$T2); qqline(df_summer_00_08$T2)
```
### T2Obs vs T2 from ECMWF model for Winter forecast period 08
Now we plot some graphs depicting the relation ship between T2Obs ad T2 from ECMWF for winter for one-day forecast. We can observe that at high temperatures there is a linear relationship than for the low values of T2Obs ad T2 from ECMWF. We used to method lm to plot this relationship.
```{r, warning=FALSE, message=F, echo=TRUE}
T2Obs <- df_winter_00_08$T2Obs
T2 <- df_winter_00_08$T2
ggplot(df_winter_00_08, aes(x = T2Obs, y = T2)) + geom_point() + xlab("ECMWF model T2") + ylab("T2Obs") + ggtitle("T2 observed vs ECMWF model T2 for Winter ") + geom_smooth(method = "lm")
```
#### D2Obs vs D2 from ECMWF model for Winter forecast period 08
A similar kind of graph we can observe between D2Obs and D2 from ECMWF for winter for one-day forecast
```{r, warning=FALSE, message=F, echo=TRUE}
D2Obs <- df_winter_00_08$D2Obs
D2 <- df_winter_00_08$D2
ggplot(df_winter_00_08, aes(x = T2Obs, y = T2)) + geom_point() + xlab("ECMWF model D2") + ylab("D2Obs") + ggtitle("D2 observed vs ECMWF model D2 for Winter") + geom_smooth(method = "lm")
```
#### T2Obs vs T2 from ECMWF model for Winter forecast period 65
we can observe that for 10-day forecast, the points of T2Obs and T2 from ECMWF model are more scattered. A similar observation is there between D2Obs and D2 from ECMWF model for the 10 day forecast.
```{r, warning=FALSE, message=F, echo=TRUE}
T2Obs <- df_winter_00_65$T2Obs
T2 <- df_winter_00_65$T2
ggplot(df_winter_00_65, aes(x = T2Obs, y = T2)) + geom_point() + xlab("ECMWF model T2") + ylab("T2Obs") + ggtitle("T2 observed vs ECMWF model T2 for Winter ") + geom_smooth(method = "lm")
```
#### D2Obs vs D2 from ECMWF model for Winter forecast period 65
```{r, warning=FALSE, message=F, echo=TRUE}
D2Obs <- df_winter_00_65$D2Obs
D2 <- df_winter_00_65$D2
ggplot(df_winter_00_65, aes(x = T2Obs, y = T2)) + geom_point() + xlab("ECMWF model D2") + ylab("D2Obs") + ggtitle("D2 observed vs ECMWF model D2 for Winter") + geom_smooth(method = "lm")
```
#### T2Obs vs T2 from ECMWF model for Summer forecast period 08
The following plots show the relation ship between T2Obs and T2 from ECMWF model and T2Obs and T2 from ECMWF model for one-day and 10 day forecasts.
```{r, warning=FALSE, message=F, echo=TRUE}
T2Obs <- df_winter_00_08$T2Obs
T2 <- df_summer_00_08$T2
ggplot(df_summer_00_08, aes(x = T2Obs, y = T2)) + geom_point() + xlab("ECMWF model T2") + ylab("T2Obs") + ggtitle("T2 observed vs ECMWF model T2 for Summer") + geom_smooth(method = "lm")
```
#### D2Obs vs D2 from ECMWF model for Summer forecast period 08
```{r, warning=FALSE, message=F, echo=TRUE}
D2Obs <- df_winter_00_08$D2Obs
D2 <- df_summer_00_08$D2
ggplot(df_summer_00_08, aes(x = T2Obs, y = D2)) + geom_point() + xlab("ECMWF model D2") + ylab("D2Obs") + ggtitle("D2 observed vs ECMWF model D2 for Summer") + geom_smooth(method = "lm")
```
#### T2Obs vs T2 from ECMWF for Summer forecast period 65
```{r, warning=FALSE, message=F, echo=TRUE}
T2Obs <- df_winter_00_65$T2Obs
T2 <- df_summer_00_65$T2
ggplot(df_summer_00_65, aes(x = T2Obs, y = T2)) + geom_point() + xlab("ECMWF model T2") + ylab("T2Obs") + ggtitle("T2 observed vs ECMWF model T2 for Summer") + geom_smooth(method = "lm")
```
#### D2Obs vs D2 from ECMWF for Summer forecast period 65
```{r, warning=FALSE, message=F, echo=TRUE}
D2Obs <- df_winter_00_65$D2Obs
D2 <- df_summer_00_65$D2
ggplot(df_summer_00_65, aes(x = T2Obs, y = D2)) + geom_point() + xlab("ECMWF model D2") + ylab("D2Obs") + ggtitle("D2 observed vs ECMWF model D2 for Summer") + geom_smooth(method = "lm")
```
#### The difference in T2Obs and T2 from ECMWF model for all the forecast periods and for all seasons.
Now we plot quantiles of the difference in T2Obs and T2 from ECMWF model for all the forecast periods. We Can see that as the length of the forecast period increases, the variation in the quantiles increases both for high temperatures and low temperatures.
```{r, warning=FALSE, message=F, echo=TRUE}
T2Obs_T2 <- T2_D2_Obs_model_data_with_timestamps$T2Obs - T2_D2_Obs_model_data_with_timestamps$T2
df_T2Obs_T2 <- cbind(T2_D2_Obs_model_data_with_timestamps, T2Obs_T2)
colnames(df_T2Obs_T2[ncol(df_T2Obs_T2)]) <- "T2Obs_T2"
ggplot(df_T2Obs_T2, aes(x = factor(df_T2Obs_T2$forecast_period), y = df_T2Obs_T2$T2Obs_T2))+ xlab("forecast_period(2-65)") + ylab("T2Obs - T2") + ggtitle("Difference in the Observed temperature and the temperature from ECMWF model") + geom_boxplot(width = 1, colour="blue")
```