I worked with the Superstore Sales Dataset. Part 1 covers data cleaning, preparation, and exploratory data analysis. Part 2 covers the main task: forecasting future sales using time-series analysis. I identified the overall trend, found seasonality, and tested for cyclic contributions. I then examined the residuals and used this information, along with numerous visuals, to predict an additional week's worth of data.
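As an illustration of the trend, seasonality, and forecast steps described above, here is a minimal sketch. It is not the project's actual code: the daily series is synthetic (a hypothetical stand-in for the Superstore data), and the linear trend and day-of-week seasonality are assumed purely for the example.

```python
import numpy as np
import pandas as pd

# Synthetic daily sales: a hypothetical stand-in, NOT the Superstore data.
rng = np.random.default_rng(0)
dates = pd.date_range("2017-01-01", periods=140, freq="D")
t = np.arange(len(dates))
dow = dates.dayofweek.to_numpy()  # Monday=0 ... Sunday=6
weekly_pattern = np.array([0, 5, 10, 5, 0, -10, -10], dtype=float)  # assumed
sales = pd.Series(
    200 + 0.5 * t + weekly_pattern[dow] + rng.normal(0, 3, len(dates)),
    index=dates,
)

# Trend: straight line fitted by least squares.
slope, intercept = np.polyfit(t, sales.to_numpy(), 1)
trend = pd.Series(intercept + slope * t, index=dates)

# Seasonality: average of the detrended series for each day of the week.
detrended = sales - trend
seasonal_by_dow = detrended.groupby(dow).mean()
seasonal = pd.Series(seasonal_by_dow.loc[dow].to_numpy(), index=dates)

# Residuals: whatever trend and seasonality do not explain.
residuals = sales - trend - seasonal

# Forecast one additional week: extrapolated trend plus seasonal component.
future_dates = pd.date_range(dates[-1] + pd.Timedelta(days=1), periods=7, freq="D")
future_t = np.arange(len(dates), len(dates) + 7)
future_dow = future_dates.dayofweek.to_numpy()
forecast = pd.Series(
    intercept + slope * future_t + seasonal_by_dow.loc[future_dow].to_numpy(),
    index=future_dates,
)
print(forecast.round(1))
```

The forecast here only extends the fitted trend and seasonal averages; the residuals are treated as unpredictable noise, so the quality of the prediction depends on how much of the signal those two components capture.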
I identified a slight trend and some seasonality in the data. Unfortunately, once these were removed, a lot of "noise" was left: the lag features revealed no cycles I could identify, and the residuals were randomly distributed. This is expected, for the most part, with sales data. We see some seasonality with quarters and holidays, depending on what is being sold, but spikes or lulls can occur for any number of reasons.
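One way to check for cycles with lag features can be sketched as follows. This is an illustrative example, not the analysis itself: the residuals are simulated white noise, and the choice of 7 lags and the rough 2/sqrt(n) threshold are assumptions made for the sketch.

```python
import numpy as np
import pandas as pd

# Simulated residuals (pure noise), standing in for the real detrended,
# deseasonalized sales. Hypothetical data for illustration only.
rng = np.random.default_rng(1)
residuals = pd.Series(rng.normal(0, 1, 200), name="resid")

# Lag features: column lag_k holds the residual from k steps earlier.
lags = range(1, 8)
lagged = pd.DataFrame({f"lag_{k}": residuals.shift(k) for k in lags})

# Correlation between the series and each of its lags (autocorrelation).
lag_corr = pd.Series({k: residuals.autocorr(lag=k) for k in lags})

# Rough rule of thumb: for white noise, autocorrelations mostly fall
# within about +/- 2/sqrt(n). Lags outside that band deserve a closer look.
band = 2 / np.sqrt(len(residuals))
suspect = lag_corr[lag_corr.abs() > band]

print(lag_corr.round(3))
print("Lags outside the noise band:", list(suspect.index))
```

If few or no lags fall outside the band, the lagged values carry little predictive information, which is consistent with residuals that look randomly distributed.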
Programming, Python, Statistics, Numpy, Pandas, Matplotlib, Scikit-learn, Dataframes, Data Modeling, EDA, Data Visualization, Data Reporting, Time-Series Analysis, Seaborn, Supervised ML, StatsModels