My goal for this project was to gain experience using APIs to extract data and to analyze the sentiment of video comments. I took a lot of inspiration from Nate (StrataScratch) to start this project.
The first step I took was to read through the YouTube API documentation to understand what kind of data I could gather. Looking through it sparked the idea for a project where I would gather the comments from a video and figure out the sentiment behind them. Then I could give each video a score based on positive, negative, or neutral sentiment. Once I had an idea of what I wanted this project to look like, it was time to start coding and learning.
The information I decided to gather was channel name, video title, upload date, video ID (needed to extract comments), views, number of likes, and number of comments. To extract the data, type the channel ID into the third cell; it is the last part of a channel URL such as "https://www.youtube.com/channel/UCrPseYLGpNygVi34QpGNqpA". Since the DataFrame append method has been deprecated in pandas (not Python itself) and is removed in newer versions, I needed to alter the code that Nate presented in his video to use the concat function instead.
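The switch from append to concat described above can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: the helper name append_row, the column names, and the sample values are all made up for the example.

```python
import pandas as pd


def append_row(df, row):
    """Return df with one extra row, using pd.concat instead of DataFrame.append.

    `row` is a plain dict mapping column names to values (hypothetical shape).
    """
    new_row = pd.DataFrame([row])
    # Skip the concat when df is still empty; some recent pandas versions
    # warn about concatenating empty frames.
    if df.empty:
        return new_row
    return pd.concat([df, new_row], ignore_index=True)


# Made-up sample data, only to show the call pattern.
videos = pd.DataFrame(columns=["channel_name", "title", "video_id", "views"])
videos = append_row(videos, {"channel_name": "Example", "title": "First",
                             "video_id": "abc123", "views": 100})
videos = append_row(videos, {"channel_name": "Example", "title": "Second",
                             "video_id": "def456", "views": 250})
print(videos)
```

When many rows are collected, it is usually cheaper to gather the dicts in a list and build the DataFrame once at the end, but the one-row version above is the closest drop-in replacement for the old append call.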
Running the sixth cell passes the empty DataFrame through the functions that collect the data. I also created a variable that puts the channel name in the file name when the DataFrame is saved as a CSV file, which makes the file easier to locate and keep track of.
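One way to build a CSV file name from the channel name might look like the sketch below. The function names, the "_videos.csv" suffix, and the character clean-up rule are assumptions made for illustration; the notebook may name its files differently.

```python
import re
from pathlib import Path


def csv_path_for(channel_name, folder="."):
    """Build a file path such as ./My_Channel_videos.csv (hypothetical naming scheme)."""
    # Replace anything that is not a letter, digit, underscore, or hyphen,
    # so the channel name is safe to use as a file name.
    safe = re.sub(r"[^A-Za-z0-9_-]+", "_", channel_name).strip("_")
    if not safe:
        safe = "channel"  # fallback when nothing usable is left
    return Path(folder) / f"{safe}_videos.csv"


def save_videos(df, channel_name, folder="."):
    """Write a pandas DataFrame to the CSV path for this channel and return the path."""
    path = csv_path_for(channel_name, folder)
    df.to_csv(path, index=False)  # index=False keeps the row index out of the file
    return path


print(csv_path_for("My Channel!"))
```

Cleaning the name first avoids problems with channels whose names contain spaces, slashes, or emoji.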
The second Jupyter notebook contains the code to extract the sentiment values of a YouTube video's comments. The first step is to import the required libraries, including the Natural Language Toolkit (NLTK), which supplies the sentiment values. Then the CSV file is imported so that each video ID can be used in the API call URL. A loop iterates over each row, grabbing the video ID and collecting that video's comments.
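Collecting the comments for one video ID could be sketched as below. It assumes the YouTube Data API v3 commentThreads endpoint and its nextPageToken paging field; the function names are hypothetical, the API key is a placeholder you would supply yourself, and the fetch parameter exists only so the loop can be tried without a network call.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://www.googleapis.com/youtube/v3/commentThreads"


def build_url(video_id, api_key, page_token=None):
    """Build the request URL for one page of comment threads."""
    params = {"part": "snippet", "videoId": video_id,
              "maxResults": 100, "key": api_key}
    if page_token:
        params["pageToken"] = page_token
    return f"{API_URL}?{urllib.parse.urlencode(params)}"


def fetch_json(url):
    """Download a URL and parse the JSON body."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def collect_comments(video_id, api_key, fetch=fetch_json):
    """Follow nextPageToken until the API stops returning one."""
    comments = []
    page_token = None
    while True:
        page = fetch(build_url(video_id, api_key, page_token))
        for item in page.get("items", []):
            snippet = item["snippet"]["topLevelComment"]["snippet"]
            comments.append(snippet["textDisplay"])
        page_token = page.get("nextPageToken")
        if not page_token:
            break
    return comments
```

In the notebook this would be called once per row of the imported CSV, passing that row's video ID. Note that this sketch only reads top-level comments; replies are not included, so the total can come out lower than the comment count shown on the video.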
The sentiment analyzer from the NLTK library is used to collect the positive, neutral, and negative sentiment values. I later decided to also collect the compound value, which is calculated by summing the valence scores of the words in a comment and normalizing the result to a range of -1 to 1. The code loops over the API result pages, grabbing each comment. As of now it doesn't seem to collect every comment, which I plan to troubleshoot in the future. Finally, the sentiment values are grouped by video ID and rounded to two decimal places, returning a DataFrame with five columns. The last function takes the sentiment DataFrame and joins it with the imported DataFrame to complete the process.
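The grouping and joining steps could look something like the sketch below. It assumes the per-comment scores are averaged per video (the aggregation is my assumption), and it uses made-up scores in the shape that NLTK's SentimentIntensityAnalyzer.polarity_scores returns (neg, neu, pos, compound) so that it runs without downloading the VADER lexicon. The function and column names are hypothetical.

```python
import pandas as pd

SCORE_COLUMNS = ["neg", "neu", "pos", "compound"]


def summarize_sentiment(comment_scores):
    """Average each score per video and round to two decimal places."""
    return (
        comment_scores.groupby("video_id")[SCORE_COLUMNS]
        .mean()          # assumption: mean of the per-comment scores
        .round(2)
        .reset_index()   # video_id plus four scores = five columns
    )


def add_sentiment(videos, sentiment):
    """Join the per-video sentiment onto the imported video table."""
    return videos.merge(sentiment, on="video_id", how="left")


# Made-up scores; in the notebook each row would come from something like
# SentimentIntensityAnalyzer().polarity_scores(comment_text).
comment_scores = pd.DataFrame([
    {"video_id": "abc123", "neg": 0.0, "neu": 0.5, "pos": 0.5, "compound": 0.6},
    {"video_id": "abc123", "neg": 0.2, "neu": 0.8, "pos": 0.0, "compound": -0.3},
    {"video_id": "def456", "neg": 0.0, "neu": 1.0, "pos": 0.0, "compound": 0.0},
])
videos = pd.DataFrame([
    {"video_id": "abc123", "title": "First"},
    {"video_id": "def456", "title": "Second"},
])

result = add_sentiment(videos, summarize_sentiment(comment_scores))
print(result)
```

A left join keeps every video from the imported table, so a video whose comments could not be collected still appears, with empty sentiment values.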
This project was fun to take on, and I definitely learned new skills. My future goal is to find another API where I can collect data and do a similar in-depth analysis, something like Zillow home prices or car prices. If you happen to stumble upon this project or use it yourself, I would appreciate any suggestions you may have!
- Diego 🍩