-
Notifications
You must be signed in to change notification settings - Fork 0
/
Seasonal Difference Exploration_revised.Rmd
165 lines (138 loc) · 8.96 KB
/
Seasonal Difference Exploration_revised.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
---
title: "Seasonal Difference Exploration"
author: "Hao Zheng (hz2770)"
date: "5/6/2022"
output: pdf_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(kableExtra)
```
## Explore the seasonal differences
```{r}
load("./dt_long.RData")
load("./ID_in.RData")
load("./beta.res.postmean.RData")
dt_season <-
dt_long %>%
drop_na() %>%
filter(ID %in% ID_in) %>%
distinct(ID, .keep_all = TRUE) %>%
dplyr::select(ID, Season, Month, Nature) %>%
mutate(Month = factor(Month, levels = month.name))
```
```{r}
season_diff <-
merge(dt_season, beta.res.postmean, by = c("ID")) %>%
janitor::clean_names()
colnames(season_diff)[2] <- "year"
# Beta0
intercept.fit <- lm(intercept ~ month + year + nature, data = season_diff)
# Beta1
wind_prev.fit <- lm(wind_prev ~ month + year + nature, data = season_diff)
# Beta2
lat_change.fit <- lm(lat_change ~ month + year + nature, data = season_diff)
# Beta3
long_change.fit <- lm(long_change ~ month + year + nature, data = season_diff)
#Beta4
wind_change.fit <- lm(wind_change ~ month + year + nature, data = season_diff)
summary(intercept.fit)
summary(wind_prev.fit)
summary(lat_change.fit)
summary(long_change.fit)
summary(wind_change.fit)
```
```{r, results = "asis"}
sum0 <- summary(intercept.fit)$coefficients[,c(1,4)]
sum1 <- summary(wind_prev.fit)$coefficients[,c(1,4)]
sum2 <- summary(lat_change.fit)$coefficients[,c(1,4)]
sum3 <- summary(long_change.fit)$coefficients[,c(1,4)]
sum4 <- summary(wind_change.fit)$coefficients[,c(1,4)]
total_sum = cbind(sum0, sum1, sum2, sum3, sum4)
save(total_sum, file = "./tables/total_sum.RData")
kable(cbind(sum0, sum1, sum2, sum3, sum4)) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed")) %>%
add_header_above(c(" " = 1, "$\\beta_0$" = 2, "$\\beta_1$" = 2, "$\\beta_2$" = 2, "$\\beta_3$" = 2, "$\\beta_4$" = 2))
```
Now based on the estimated Bayesion model from previous questions, we need to explore the seasonal difference. We can fit 5 models using 5 estimated beta values against the three predictors: $X_{i,1}$: the month of the year the ith hurricane started, $X_{i,2}$:the year of the ith hurricane and $X_{i,3}$: the nature of the ith hurricane. The beta values obtained from previous Gibbs Sampler MCMC method contains the mean value of $\beta_{0,i}$, $\beta_{1,i}$, $\beta_{2,i}$, $\beta_{3,i}$ and $\beta_{4,i}$ for each of the 697 unique hurricanes, which is of the size 697 * 5.
According to the summary, the R squared value for all the five fitted linear models are quite small, which may indicate bad fit. In addition, most coefficients for the model are not significant with a p-value larger than 0.05. However, for those significant coefficients, we could infer a potential relationship between the certain predictors and the beta coefficients respectively. We should consult the previous Bayesion model:
$$Y_{i}(t+6) =\beta_{0,i}+\beta_{1,i}Y_{i}(t) + \beta_{2,i}\Delta_{i,1}(t)+
\beta_{3,i}\Delta_{i,2}(t) +\beta_{4,i}\Delta_{i,3}(t) + \epsilon_{i}(t)$$
to interpret the change of the influence on $Y_{i,t+6}$ as the value of the predictor changes.
For the fitted coefficients of $\beta_{0}$ to $\beta_{4}$, the intercept cannot show information about seasonal difference since they indicate when holding all the predictors zero, the value for the corresponding $\beta$. We can only observe that the year is quite significant in the model for $\beta_{0}$, $\beta_{1}$ with both negative estimates close to zero. Therefore, as the year increase, the coefficient of the intercept and $Y_{i,t}$ may decrease a little, which means for the Bayesian model, the wind speed when holding all the variables zero and the effect the previous wind speed has will decrease over years. Apart from seasonal difference, some other predictors are quite significant, such as natureET for $\beta_{2}$, natureTS for $\beta_{3}$.
```{r}
# Try to fit the beta model only with the four seasons
season_diff <- as.data.frame(season_diff) %>%
mutate(month = recode(month, April = "Spring"),
month = recode(month, May = "Spring"),
month = recode(month, June = "Summer"),
month = recode(month, July = "Summer"),
month = recode(month, August = "Summer"),
month = recode(month, September = "Autumn"),
month = recode(month, October = "Autumn"),
month = recode(month, November = "Autumn"),
month = recode(month, December = "Winter"),
month = recode(month, January = "Winter"),
month = factor(month, levels = c("Spring", "Summer", "Autumn", "Winter")))
colnames(season_diff)[3] <- "season"
# Beta0
intercept.fit.2 <- lm(intercept ~ season, data = season_diff)
# Beta1
wind_prev.fit.2 <- lm(wind_prev ~ season, data = season_diff)
# Beta2
lat_change.fit.2 <- lm(lat_change ~ season, data = season_diff)
# Beta3
long_change.fit.2 <- lm(long_change ~ season, data = season_diff)
# Beta4
wind_change.fit.2 <- lm(wind_change ~ season, data = season_diff)
sum0_2 <- summary(intercept.fit.2)$coefficients[,c(1,4)]
sum1_2 <- summary(wind_prev.fit.2)$coefficients[,c(1,4)]
sum2_2 <- summary(lat_change.fit.2)$coefficients[,c(1,4)]
sum3_2 <- summary(long_change.fit.2)$coefficients[,c(1,4)]
sum4_2 <- summary(wind_change.fit.2)$coefficients[,c(1,4)]
season_sum = as.data.frame(rbind(sum0_2, sum1_2, sum2_2, sum3_2, sum4_2))
season_sum = season_sum %>% mutate(
coefficient = gsub("\\.", "", gsub(".\\d","",gsub("X.","",rownames(season_sum)))),
response = c(rep("$\\beta_0$",4), rep("$\\beta_1$",4), rep("$\\beta_2$",4), rep("$\\beta_3$",4), rep("$\\beta_4$",4))
) %>% dplyr::select(c("response","coefficient",everything()))
row.names(season_sum) = NULL
save(season_sum, file = "./tables/season_sum.RData")
kable(season_sum, digits = 3, escape = F)
```
We also try to represent the months as four seasons and fit a model for $\beta$ against them. Each model has three dummy variables corresponding to the three seasons except Spring. The latter three rows of estimate shows how the value of $\beta$ differentiate between Spring and the other three seasons respectively. If with a rather small p-value, we can conclude the existence of seasonal difference. Therefore, by constructing model in this way, we find that $\beta_1$ and $\beta_4$ will increase a little as season changes from Spring to Summer, then to Autumn, which means a season difference of the effect $Y_{i,t}$ and $\Delta_{i,3}(t)$ has on the wind speed. For $\beta_2$, $\beta_3$, Summer and Autumn may lead to a slightly smaller effect of $\Delta_{i,1}(t)$, $\Delta_{i,2}(t)$ have on the wind speed compared to Spring.
```{r}
# Try to fit the beta model only with the year
# Beta0
intercept.fit.new <- lm(intercept ~ year, data = season_diff)
# Beta1
wind_prev.fit.new <- lm(wind_prev ~ year, data = season_diff)
# Beta2
lat_change.fit.new <- lm(lat_change ~ year, data = season_diff)
# Beta3
long_change.fit.new <- lm(long_change ~ year, data = season_diff)
#Beta4
wind_change.fit.new <- lm(wind_change ~ year, data = season_diff)
summary(intercept.fit.new)
summary(wind_prev.fit.new)
summary(lat_change.fit.new)
summary(long_change.fit.new)
summary(wind_change.fit.new)
sum0.new <- summary(intercept.fit.new)$coefficients[,c(1,4)]
sum1.new <- summary(wind_prev.fit.new)$coefficients[,c(1,4)]
sum2.new <- summary(lat_change.fit.new)$coefficients[,c(1,4)]
sum3.new <- summary(long_change.fit.new)$coefficients[,c(1,4)]
sum4.new <- summary(wind_change.fit.new)$coefficients[,c(1,4)]
kable(cbind(sum0.new, sum1.new, sum2.new, sum3.new, sum4.new), format = "latex", digits = 3) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed")) %>%
add_header_above(c(" " = 1, "$\\beta_0$" = 2, "$\\beta_1$" = 2, "$\\beta_2$" = 2, "$\\beta_3$" = 2, "$\\beta_4$" = 2))
year_sum = as.data.frame(rbind(sum0.new, sum1.new, sum2.new, sum3.new, sum4.new))
year_sum = year_sum %>% mutate(
coefficient = gsub("\\.", "", gsub(".\\d","",gsub("X.","",rownames(year_sum)))),
response = c(rep("$\\beta_0$",2), rep("$\\beta_1$",2), rep("$\\beta_2$",2), rep("$\\beta_3$",2), rep("$\\beta_4$",2))
) %>% dplyr::select(c("response","coefficient",everything()))
row.names(year_sum) = NULL
save(year_sum, file = "./tables/year_sum.RData")
```
Now fit linear models for $\beta$ against the season variables (corresponding to the year) to seek for potential evidence of the statement :"the wind speed has been increasing over years". In order to analyze this question, need to inspect on model which corresponds to the wind speed and the year. As we can see from the table, though for some $\beta$ model, coefficient for year is significant, the estimates are all really close to zero. Therefore, we can infer that as the year increases, the wind speed may decrease a little, which is contrary to the statement. However, it's quite match with the results shown in the figures in the initial EDA session, which indicates the mean wind speed tends to decrease over years.
In conclusion, for different months, there is no significant differences observed. Over years, the effect the wind speed 6 months ago has on the current wind speed may decrease a little.