This repository contains configuration files for web scraping the Hacker News Website.

Cherukuri-Thanu/WebScraping-HN

WebScraping-HN

Description

This project is a custom scraper for the Hacker News website. It extracts news articles from multiple Hacker News pages, keeps only those with more than 99 upvotes, and sorts them by upvote count. The output is a curated list of popular news items.

Features

  • Scrapes multiple Hacker News pages.
  • Filters articles with more than 99 upvotes.
  • Sorts articles based on upvote count.
  • Uses BeautifulSoup for HTML parsing.
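
The filter-and-sort step described above can be sketched as follows. This is a minimal illustration, not the code in main.py: the `Story` record shape, the `popular_stories` name, and the highest-first sort order are assumptions made for the example, and the records are assumed to have been parsed from the page beforehand.

```python
from typing import TypedDict


class Story(TypedDict):
    # Hypothetical record shape for one scraped article.
    title: str
    link: str
    votes: int


# Articles must have MORE than this many upvotes to be kept.
MIN_VOTES = 99


def popular_stories(stories: list[Story], min_votes: int = MIN_VOTES) -> list[Story]:
    """Keep stories with more than `min_votes` upvotes, highest count first."""
    kept = [story for story in stories if story["votes"] > min_votes]
    # Descending order is an assumption; the README only says "sorted".
    return sorted(kept, key=lambda story: story["votes"], reverse=True)


if __name__ == "__main__":
    sample: list[Story] = [
        {"title": "Below threshold", "link": "https://example.com/a", "votes": 99},
        {"title": "Just above", "link": "https://example.com/b", "votes": 100},
        {"title": "Very popular", "link": "https://example.com/c", "votes": 250},
    ]
    for story in popular_stories(sample):
        print(story["votes"], story["title"], story["link"])
```

Note that an article with exactly 99 upvotes is dropped, since the threshold is "more than 99".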

How to Use

  1. Clone this repository.
  2. Install the required dependencies, requests and beautifulsoup4 (for example, with pip install requests beautifulsoup4).
  3. Add URLs of the Hacker News pages you want to scrape in URLs_list.txt.
  4. Run the script: python main.py.
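
As a rough sketch of step 3, the URL list could be read as shown below. The one-URL-per-line layout of URLs_list.txt and the `load_urls` helper are assumptions made for illustration; check main.py for the format the script actually expects.

```python
from pathlib import Path


def load_urls(path: str = "URLs_list.txt") -> list[str]:
    """Return the non-empty lines of the file, with surrounding whitespace removed.

    Assumes one URL per line (hypothetical format, not confirmed by the README).
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


if __name__ == "__main__":
    # Example file contents under the one-URL-per-line assumption:
    #   https://news.ycombinator.com/news
    #   https://news.ycombinator.com/news?p=2
    for url in load_urls():
        print(url)
```

Skipping blank lines keeps a trailing newline or spacing between entries from producing empty requests.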

Requirements

  • Python 3.x
  • requests
  • beautifulsoup4

Contact

Thanuja Cherukuri - [[email protected]]
