Principle of Urban Informatics 2017

This is a class homework repository for Baiyue Cao (bc1561).

Description:

This class covers the basics of data-driven urban research. I acquired computational skills, basic knowledge of statistical analysis and error analysis, good practices for handling data and big data, and communication and visualization skills. I learned how to formulate a question relevant to Urban Science, find appropriate data to answer it, prepare and analyze the data, reach an answer at whatever confidence level the data support, and communicate both the answer and my confidence in it.

Key Words/techniques:

Research reproducibility: Git, virtual environment, virtual machine, version control, hypothesis formulation
Data ETL: Pandas, Geopandas, SQL, API
Statistical tests: Anderson-Darling test (AD), Kullback–Leibler divergence (KL), Chi-square, Kolmogorov–Smirnov test (KS)
Clustering: PCA, Kmeans, Gaussian Mixture
Time Series: Fourier transform
Linear modeling: OLS, WLS, GLS
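
As a rough illustration of one technique listed above: the two-sample KS statistic is the largest vertical gap between two empirical CDFs. The sketch below is not taken from the homework notebooks. It uses only the Python standard library, and the function name `ks_statistic` and the synthetic Gaussian samples are assumptions made for illustration.

```python
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: the maximum gap between two empirical CDFs.

    Hypothetical helper for illustration; it returns only the statistic,
    not a p-value.
    """
    a_sorted, b_sorted = sorted(a), sorted(b)
    n, m = len(a_sorted), len(b_sorted)
    d = 0.0
    # Both empirical CDFs are step functions, so the largest gap
    # is reached at one of the observed data points.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / n  # fraction of a that is <= x
        cdf_b = bisect_right(b_sorted, x) / m  # fraction of b that is <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Synthetic data (an assumption, not course data): two samples from the
# same distribution, and one sample shifted by one standard deviation.
rng = random.Random(0)
base = [rng.gauss(0, 1) for _ in range(500)]
same = [rng.gauss(0, 1) for _ in range(500)]
shifted = [rng.gauss(1, 1) for _ in range(500)]

d_same = ks_statistic(base, same)
d_shifted = ks_statistic(base, shifted)
print("same distribution:", d_same)
print("shifted distribution:", d_shifted)
```

A larger statistic is stronger evidence against the null hypothesis that both samples come from the same distribution. Turning the statistic into a p-value requires the KS distribution, which a statistics library would normally supply.
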
Key data set:

Content:

Setting up a virtual environment and formulating a null hypothesis link
Extracting data from the MTA API link
Demonstrating the central limit theorem with visualization, and exploring Citi Bike data link
Replication study of the effectiveness of the NYC Post-Prison Employment Program: formulating a null hypothesis and conducting statistical tests link
Running KS, AD, KL, and Chi-square tests on sample data; creating OLS and WLS models link
Visualizing the NYC LL84 dataset and comparing a linear model with a polynomial model link
Using CartoDB and SQL queries for data ETL link
Visualization practice with NYC HIV demographics data link
Reviewing visualization; using Geopandas to plot a choropleth of broadband access percentage in NYC along with LinkNYC data, using the American Community Survey API and LinkNYC open data link
Time-series techniques: smoothing, detrending, stationarity and non-stationarity, homoscedastic and heteroscedastic noise, and vectorization. Also clustered user behavior using PCA feature selection and Kmeans link
Clustering NYC zip codes using business activity time series data from the Census Bureau API: data whitening, then Kmeans and Gaussian Mixture clustering link
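
The central limit theorem item above can be sketched without any course data. The example below is a minimal illustration, not the homework solution: it draws from a synthetic exponential distribution rather than Citi Bike data, and the helper name `sample_means` and the chosen sample sizes are assumptions.

```python
import random
import statistics


def sample_means(draw, sample_size, n_samples):
    """Return the means of n_samples independent samples of size sample_size.

    `draw` is any zero-argument function that returns one random value.
    Hypothetical helper for illustration.
    """
    return [
        statistics.mean([draw() for _ in range(sample_size)])
        for _ in range(n_samples)
    ]


rng = random.Random(42)


def draw():
    # Skewed parent distribution: exponential with mean 1 and std 1.
    return rng.expovariate(1.0)


means_by_size = {}
spreads = {}
for size in (1, 10, 100):
    means = sample_means(draw, size, n_samples=1000)
    means_by_size[size] = means
    spreads[size] = statistics.stdev(means)
    print(size, statistics.mean(means), spreads[size])
```

As the sample size grows, the sample means should concentrate around the parent mean of 1, with a spread shrinking roughly as 1/sqrt(sample size). A histogram of each list in `means_by_size` would show the distribution of means becoming more symmetric and bell-shaped even though the parent distribution is skewed.
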

Note:

Special shout out to Federica Bianco for this amazing class.

