---
title: "EDA"
author: "Renjie Wei, Xinran Sun"
date: '2022-05-03'
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
```{r}
library(ggplot2)
# Load the hurricane track data
dt <- read.csv("./hurrican703.csv")
# Median wind speed (knots) within each 1 x 1 degree longitude/latitude bin
ggplot(data = dt, aes(x = Longitude, y = Latitude)) +
  stat_summary_2d(aes(z = Wind.kt), fun = median, binwidth = c(1, 1), show.legend = TRUE)
# + ggtitle("Distribution of Hurricanes at Different Location")
library(data.table)
dt <- as.data.table(dt)
summary(dt)
```
In the hurrican703 dataset, we have 22038 observations of 702 hurricanes, ranging from 1950 to 2013. Latitude ranges from 5 to 70.7 and longitude from -107.7 to 13.5. The lowest observed wind speed is 10 knots and the highest is 165 knots.
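As a rough illustration of how the figures above can be read off the data, the sketch below computes the same kinds of quantities on a small toy data frame. The values in `toy_obs` are made up; only the column names (`ID`, `Season`, `Latitude`, `Longitude`, `Wind.kt`) follow the ones used elsewhere in this report.

```{r}
# Toy data with hypothetical values; column names follow hurrican703.csv
toy_obs <- data.frame(
  ID        = c("A", "A", "B", "B", "B"),
  Season    = c(1950, 1950, 2013, 2013, 2013),
  Latitude  = c(5.0, 6.1, 30.2, 31.0, 70.7),
  Longitude = c(-107.7, -106.0, -40.5, -38.9, 13.5),
  Wind.kt   = c(10, 25, 90, 165, 60)
)

n_obs      <- nrow(toy_obs)               # number of observations (rows)
n_storms   <- length(unique(toy_obs$ID))  # number of distinct hurricanes
year_range <- range(toy_obs$Season)       # first and last year
wind_range <- range(toy_obs$Wind.kt)      # lowest and highest wind speed

n_obs
n_storms
year_range
wind_range
```

Replacing `toy_obs` with `dt` should give the corresponding numbers for the real data.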
```{r}
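library(tidyverse)
# Build lagged features within each hurricane (consecutive rows are 6 hours apart)
dt_long <- dt %>%
  dplyr::group_by(ID) %>%
  mutate(Wind_prev = lag(Wind.kt, 1),                  # wind speed 6 hours earlier
         Lat_change = Latitude - lag(Latitude, 1),     # latitude change over the last 6 hours
         Long_change = Longitude - lag(Longitude, 1),  # longitude change over the last 6 hours
         Wind_prev_prev = lag(Wind.kt, 2)) %>%         # wind speed 12 hours earlier
  # wind speed change between 12 and 6 hours earlier
  mutate(Wind_change = Wind_prev - Wind_prev_prev)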
save(dt_long, file = "dt_long.RData")
```
From the original dataset, we built a new dataset that contains five more variables: wind speed 6 hours earlier, wind speed 12 hours earlier, latitude change compared to 6 hours earlier, longitude change compared to 6 hours earlier, and the wind speed change between 12 hours earlier and 6 hours earlier. These variables will help us build the model in the following part.
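To make the lagging step concrete, here is a minimal sketch on toy data. The values in `toy_tracks` are hypothetical and the storm IDs `"A"` and `"B"` are placeholders; the point is only that `lag()` inside `group_by(ID)` never carries a value over from one hurricane to the next.

```{r}
library(dplyr)

# Toy track data (hypothetical values): two storms, rows 6 hours apart
toy_tracks <- data.frame(
  ID      = c("A", "A", "A", "A", "B", "B", "B"),
  Wind.kt = c(30, 35, 45, 40, 25, 25, 30)
)

toy_lagged <- toy_tracks %>%
  group_by(ID) %>%
  mutate(
    Wind_prev      = lag(Wind.kt, 1),            # wind speed 6 hours earlier
    Wind_prev_prev = lag(Wind.kt, 2),            # wind speed 12 hours earlier
    Wind_change    = Wind_prev - Wind_prev_prev  # change between those two times
  ) %>%
  ungroup()

toy_lagged
```

The first row of each storm has no `Wind_prev`, and the first two rows have no `Wind_change`, so these rows carry `NA` and would typically drop out of any model that uses the lagged variables.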
```{r}
# Month and count
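# One row per (Month, ID) pair, so each hurricane is counted once per month
storms_month_name = distinct(group_by(select(dt_long, Month, ID), Month))
storms_month_name %>%
  ungroup() %>%  # drop the Month grouping so the column can be converted
  mutate(Month = factor(Month, levels = month.name)) %>%
  ggplot(aes(x = Month)) +
  geom_bar()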
# + ggtitle("Count of Hurricanes in Each Month")
# Average wind speed over all observations in each month
dt_long %>%
  dplyr::group_by(Month) %>%
  mutate(avg_speed = mean(Wind.kt)) %>%
  distinct(Month, avg_speed) %>%
  ungroup() %>%  # drop the Month grouping so the column can be converted
  mutate(Month = factor(Month, levels = month.name)) %>%
  ggplot(aes(x = Month, y = avg_speed)) +
  geom_point() +
  scale_y_continuous("Average Speed (knot)")
# + ggtitle("Average Speed (knot) of Hurricanes in Each Month")
```
We use a bar plot to examine the number of hurricanes in each month. From this plot, we can see that September is the month with the most hurricanes, while there are no hurricanes in February and March. Hurricanes in September also have the highest average wind speed as we can see in the average speed plot.
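The monthly counts behind the bar plot can also be tabulated directly. The sketch below uses made-up rows in `toy_months` and assumes, as in the chunk above, that `Month` holds full English month names; `count(..., .drop = FALSE)` is one way to keep months with no hurricanes in the table as explicit zeros.

```{r}
library(dplyr)

# Toy data (hypothetical): storm "A" spans two months, so it is counted in both
toy_months <- data.frame(
  ID    = c("A", "A", "B", "B", "C"),
  Month = c("August", "September", "September", "September", "January")
)

toy_month_counts <- toy_months %>%
  distinct(ID, Month) %>%                               # one row per hurricane and month
  mutate(Month = factor(Month, levels = month.name)) %>%
  count(Month, .drop = FALSE)                           # keep empty months as n = 0

toy_month_counts
```

In this toy table every month appears, including those with a count of zero, which is how months without hurricanes would show up.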
```{r}
# Season and count
storms_season_name = distinct(group_by(select(dt_long, Season, ID), Season))
ggplot(data = storms_season_name) +
  geom_bar(aes(x = Season)) +
  scale_x_continuous("Year")
# + ggtitle("Count of Hurricanes in Each Year")
dt_long %>%
  dplyr::group_by(Season) %>%
  mutate(avg_speed = mean(Wind.kt)) %>%
  distinct(Season, avg_speed) %>%
  ggplot(aes(x = Season, y = avg_speed)) +
  geom_point() +
  geom_smooth(method = "loess") +
  scale_y_continuous("Average Speed (knot)") +
  scale_x_continuous("Year")
# + ggtitle("Average Speed (knot) of Hurricanes in Each Year")
```
Grouping the hurricanes by year, we can see that, in general, more hurricanes are recorded in recent years than 50 years ago. However, the average wind speed appears to follow a decreasing trend.
```{r}
# Nature and count
storms_nature_name = distinct(group_by(select(dt_long, Nature, ID), Nature))
ggplot(data = storms_nature_name) +
  geom_bar(aes(x = Nature))
# + ggtitle("Count of Hurricanes in Each Nature")
dt_long %>%
  dplyr::group_by(Nature) %>%
  mutate(avg_speed = mean(Wind.kt)) %>%
  distinct(Nature, avg_speed) %>%
  ggplot(aes(x = Nature, y = avg_speed)) +
  geom_point() +
  scale_y_continuous("Average Speed (knot)")
# + ggtitle("Average Speed (knot) of Hurricanes in Each Nature")
```
We also compare hurricanes of different natures. In our dataset, there are 1214 distinct hurricane-nature combinations. This number is larger than the number of hurricanes because some hurricanes have different natures at different times. From the count plot, we see that more than half of these combinations fall in the Tropical Storm category. This nature also has the highest average wind speed, at about 60 knots, while disturbance and not rated hurricanes have average wind speeds of around 20 knots.
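The reason the number of nature ratings exceeds the number of hurricanes can be seen in a small sketch. The rows in `toy_nature` are hypothetical, and the `Nature` codes used here are placeholders for illustration rather than values checked against the dataset.

```{r}
library(dplyr)

# Toy data (hypothetical): storms "A" and "B" change nature during their lifetime
toy_nature <- data.frame(
  ID     = c("A", "A", "A", "B", "B", "C"),
  Nature = c("TS", "TS", "ET", "TS", "DS", "TS")
)

# One row per (hurricane, nature) combination, as in storms_nature_name
toy_pairs <- distinct(toy_nature, ID, Nature)

n_toy_storms <- n_distinct(toy_nature$ID)  # number of hurricanes
n_toy_pairs  <- nrow(toy_pairs)            # number of hurricane-nature combinations

n_toy_storms
n_toy_pairs
count(toy_pairs, Nature)                   # combinations per nature
```

Here three hurricanes produce five combinations, because two of them appear under more than one nature.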