
This project utilises the features of the YouTube Data API to retrieve data from YouTube channels, playlists, videos, and comments and store it in a data lake. It also interacts with a PostgreSQL database to store the retrieved data.


RajaSoundari/Youtube-Data-Harvesting-And-Warehousing


Youtube-Data-Harvesting-And-Warehousing

YouTube Data Harvesting and Warehousing is a project that lets users access and analyse data from multiple YouTube channels. It uses SQL, MongoDB, and Streamlit to build a user-friendly application for retrieving, saving, and querying YouTube channel and video data.

TOOLS AND LIBRARIES USED:

This project requires the following components:

STREAMLIT:

The Streamlit library is used to create a user-friendly UI through which users interact with the programme and carry out data retrieval and analysis operations.

PYTHON:

Python is a powerful programming language renowned for being easy to learn and understand. Python is the primary language employed in this project for the development of the complete application, including data retrieval, processing, analysis, and visualisation.

GOOGLE API CLIENT:

The googleapiclient library in Python facilitates communication with different Google APIs. Its primary purpose in this project is to interact with YouTube's Data API v3, allowing the retrieval of essential information such as channel details, video specifics, and comments. With googleapiclient, developers can access YouTube's extensive data resources through code.
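As an illustration of the kind of call described above, here is a minimal sketch of fetching one channel's details. It is not the project's actual code: the function names `build_youtube_client` and `get_channel_details` and the keys of the returned dictionary are assumptions made for this example. The `channels().list(...)` call and the `snippet`, `statistics`, and `contentDetails` parts are part of the YouTube Data API v3.

```python
from typing import Any


def build_youtube_client(api_key: str) -> Any:
    """Create a YouTube Data API v3 client (needs google-api-python-client)."""
    # Imported lazily so the parsing helper below has no hard dependency.
    from googleapiclient.discovery import build

    return build("youtube", "v3", developerKey=api_key)


def get_channel_details(youtube: Any, channel_id: str) -> dict:
    """Return a flat summary of one channel, or {} if the channel is not found."""
    response = (
        youtube.channels()
        .list(part="snippet,statistics,contentDetails", id=channel_id)
        .execute()
    )
    items = response.get("items", [])
    if not items:
        return {}
    item = items[0]
    stats = item["statistics"]
    # The API returns counts as strings; subscriberCount may be absent
    # when a channel hides its subscriber count.
    return {
        "channel_id": item["id"],
        "channel_name": item["snippet"]["title"],
        "subscribers": int(stats.get("subscriberCount", 0)),
        "views": int(stats.get("viewCount", 0)),
        "total_videos": int(stats.get("videoCount", 0)),
        "uploads_playlist": item["contentDetails"]["relatedPlaylists"]["uploads"],
    }
```

A typical use would be `get_channel_details(build_youtube_client(API_KEY), channel_id)`, where `API_KEY` is your own Google API key. The uploads playlist id is kept because it is the usual starting point for listing a channel's videos.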

MONGODB ATLAS:

MongoDB Atlas is a fully managed cloud database service for MongoDB. In this project, it stores the data obtained from YouTube's Data API v3 and acts as the data lake, providing reliable and scalable storage and retrieval without the overhead of running a database server.
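The storage step could look roughly like the sketch below, which upserts one harvested document so that re-harvesting a channel overwrites the old copy instead of duplicating it. The database name `youtube_data`, the collection name `channels`, the `channel_id` field, and both function names are assumptions for illustration; `MongoClient` and `replace_one(..., upsert=True)` are standard pymongo calls.

```python
from typing import Any


def connect_collection(uri: str, db_name: str = "youtube_data",
                       coll_name: str = "channels") -> Any:
    """Open a collection on MongoDB Atlas (needs pymongo)."""
    # Imported lazily so store_channel can be used with any collection-like object.
    from pymongo import MongoClient

    return MongoClient(uri)[db_name][coll_name]


def store_channel(collection: Any, channel_doc: dict) -> None:
    """Insert or replace one channel document, keyed by its channel id."""
    # Using the channel id as _id makes the write idempotent.
    doc = {**channel_doc, "_id": channel_doc["channel_id"]}
    collection.replace_one({"_id": doc["_id"]}, doc, upsert=True)
```

Here `uri` would be the connection string shown in the Atlas console for your cluster.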

POSTGRESQL:

PostgreSQL is an open-source, advanced, and highly scalable database management system (DBMS) known for its reliability and extensive features. It provides a platform for storing and managing structured data, with support for various data types and advanced SQL capabilities. In this project, it is the SQL database into which data from the data lake is migrated for querying and analysis.
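A migration from the data lake into PostgreSQL might be sketched as follows. The `channels` table, its columns, the document field names, and the function names are hypothetical and chosen only for this example; the project's real schema is not shown in this README. The `%s` placeholders follow psycopg2's parameter style, and `ON CONFLICT ... DO UPDATE` is PostgreSQL's upsert syntax.

```python
from typing import Any, Iterable

# Hypothetical warehouse table for channel-level data.
CREATE_SQL = """
CREATE TABLE IF NOT EXISTS channels (
    channel_id   VARCHAR(64) PRIMARY KEY,
    channel_name TEXT,
    subscribers  BIGINT,
    views        BIGINT,
    total_videos INTEGER
)
"""

# Re-running the migration updates existing rows instead of failing.
UPSERT_SQL = """
INSERT INTO channels (channel_id, channel_name, subscribers, views, total_videos)
VALUES (%s, %s, %s, %s, %s)
ON CONFLICT (channel_id) DO UPDATE SET
    channel_name = EXCLUDED.channel_name,
    subscribers  = EXCLUDED.subscribers,
    views        = EXCLUDED.views,
    total_videos = EXCLUDED.total_videos
"""


def to_row(doc: dict) -> tuple:
    """Flatten one data-lake document into a tuple matching UPSERT_SQL."""
    return (
        doc["channel_id"],
        doc.get("channel_name"),
        doc.get("subscribers", 0),
        doc.get("views", 0),
        doc.get("total_videos", 0),
    )


def migrate_channels(conn: Any, docs: Iterable[dict]) -> int:
    """Write documents to PostgreSQL in one transaction; return the row count."""
    rows = [to_row(d) for d in docs]
    cur = conn.cursor()
    try:
        cur.execute(CREATE_SQL)
        cur.executemany(UPSERT_SQL, rows)
        conn.commit()
    except Exception:
        conn.rollback()  # leave the warehouse unchanged on failure
        raise
    finally:
        cur.close()
    return len(rows)
```

With psycopg2 and pymongo, `conn` would come from `psycopg2.connect(host=..., dbname=..., user=..., password=...)` and `docs` from `collection.find()`.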

YOUTUBE DATA SCRAPING AND ITS ETHICAL PERSPECTIVE:

When engaging in the scraping of YouTube content, it is crucial to approach it ethically and responsibly. Respecting YouTube's terms and conditions, obtaining appropriate authorization, and adhering to data protection regulations are fundamental considerations. The collected data must be handled responsibly, ensuring privacy, confidentiality, and preventing any form of misuse or misrepresentation. Furthermore, it is important to take into account the potential impact on the platform and its community, striving for a fair and sustainable scraping process. By following these ethical guidelines, we can uphold integrity while extracting valuable insights from YouTube data.

REQUIRED LIBRARIES:

  1. googleapiclient.discovery

  2. streamlit

  3. psycopg2

  4. pymongo

  5. pandas

FEATURES:

The following functions are available in the YouTube Data Harvesting and Warehousing application:

  1. Retrieval of channel and video data from YouTube using the YouTube API.

  2. Storage of data in a MongoDB database as a data lake.

  3. Migration of data from the data lake to a SQL database for efficient querying and analysis.

  4. Search and retrieval of data from the SQL database using different search options.

You can view a video of this work on my LinkedIn: https://www.linkedin.com/posts/raja-soundari-640152281_data-project-content-activity-7082046816392728576-TXv-?utm_source=share&utm_medium=member_desktop
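For the fourth feature (search and retrieval from the SQL database), one possible shape is a fixed set of named queries that the UI offers as search options. The sketch below assumes a hypothetical `channels` table with `channel_name`, `subscribers`, and `total_videos` columns; the question labels and the `run_query` helper are likewise invented for this example. It works with any DB-API connection, such as one from psycopg2.

```python
from typing import Any

# Hypothetical search options mapped to the SQL that answers them.
QUERIES = {
    "Channels by subscriber count":
        "SELECT channel_name, subscribers FROM channels "
        "ORDER BY subscribers DESC",
    "Channels with the most videos":
        "SELECT channel_name, total_videos FROM channels "
        "ORDER BY total_videos DESC",
}


def run_query(conn: Any, question: str) -> list:
    """Run one predefined query and return the rows as a list of dicts."""
    sql = QUERIES[question]  # only predefined SQL is ever executed
    cur = conn.cursor()
    try:
        cur.execute(sql)
        # cursor.description holds one entry per column; item 0 is its name.
        columns = [col[0] for col in cur.description]
        return [dict(zip(columns, row)) for row in cur.fetchall()]
    finally:
        cur.close()
```

The list of dictionaries can be passed to `pandas.DataFrame(...)` and displayed in Streamlit. Keeping the SQL in a fixed mapping means user input only selects a query and is never interpolated into it.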
