forked from gillianriches/CellGPSDataToTrips
-
Notifications
You must be signed in to change notification settings - Fork 0
/
03-method.Rmd
78 lines (54 loc) · 3.69 KB
/
03-method.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
---
output:
pdf_document: default
html_document: default
---
# Methods
```{r setup2, include = FALSE}
# load R libraries here; the `include` flag in the chunk options above tells
# whether to print the results or not. Usually you don't want to print the
# library statements, or any code on the pdf.
# Main Packages ========
# I use these in every doc
library(tidyverse)
library(knitr)
library(kableExtra)
library(modelsummary)
options(dplyr.summarise.inform = FALSE)
# Other packages ------
# These sometimes get used, and sometimes don't.
library(mlogit)
# Instructions and options =========
# prints missing data in tables as blank space
options(knitr.kable.NA = '')
# tells kableExtra to not load latex table packages in the chunk output
options(kableExtra.latex.load_packages = FALSE)
options(modelsummary_format_numeric_latex = "plain")
# round and format numbers that get printed in the text of the article.
inline_hook <- function(x) {
if (is.numeric(x)) {
format(x, digits = 3, big.mark = ",")
} else x
}
knitr::knit_hooks$set(inline = inline_hook)
knitr::opts_chunk$set(echo = TRUE, cache = TRUE)
# options for latex-only output
if(knitr::is_latex_output()) {
knitr::opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE)
}
```
## Data
The GPS data used to determine the most accurate delta_t parameter came from one volunteer in
the Utah County area and was taken over a period of six months. For the purposes of this question, only 10 days worth of GPS data will be processed.
## Models
Ultimately, to choose the most accurate delta_t parameter, the number of clusters calculated by the algorithm (algorithm clusters) was compared to the number of clusters it appeared there should be for that day (manual clusters). The method for how those clusters were created are described below.
### Algorithm Clusters
Once the data was cleaned and properly formatted, it was run through a DBSCAN-entropy hybrid algorithm largely based on the method created by Gong et al. in 2018 [@GongInspiration]. The concepts from Gong's DBSCAN-Entropy algorithm were taken and written in R using the *dbscan* package [@dbscanR] and a new *gpsactivs* package written by Dr. Gregory Macfarlane that has yet to be published to CRAN for public use.
For this algorithm, the four parameters of minpts, eps, entr_t, delta_t are required. Based on the current literature, the three constant parameters were set as follows:
* mintpts = 3
* eps = 25 meters
* entr_t = 1.0
To compare the accuracies of different delta_t parameters, 20 draws were done for each of the 10 days. Each draw kept the same constant parameters as listed above, and the delta_t parameter was randomly selected from a range of 1 to 400 seconds. By the end of running this algorithm, each of the 10 days had 20 outputs for the number of clusters as determined by the randomly selected delta_t parameter.
### Manual Clusters
To get the number of "manual clusters" per day, maps of the raw GPS data were created using the *sf* package in R [@sfR]. Those maps were then referenced as the researchers made their own GeoJSON files that stored the geometric locations of where potential trips looked like they were happening. One GeoJSON file was created for each day.
Finally those GeoJSON files were read into R and appended onto the table including the algorithm's calculated number of clusters. For each of the 10 days, the 20 algorithm possibilities for number of clusters was compared to the number of manual clusters picked in the GeoJSON files. From this, the percent error was calculated and the delta_t parameter that consistently gave the lowest error across all 10 days was deemed to be the most accurate delta_t parameter to use in this DBSCAN-Entropy Algorithm.