
GitHub Stars scraper

A web scraper for your own GitHub Stars, useful for further analysis or for rediscovering long-forgotten repositories.

For now, the script simply writes the scraping results to a JSON file, output.json, in the same directory as the script.
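As a starting point for the further analysis mentioned above, here is a minimal sketch that loads output.json and tallies entries by a field. The README does not document the file's structure, so this assumes a top-level JSON array of objects. The function names `load_stars` and `count_by`, and the `language` field in the usage note, are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def load_stars(path="output.json"):
    """Load the scraper's results.

    Assumption: the top-level JSON value is a list of objects (dicts).
    """
    with Path(path).open(encoding="utf-8") as fh:
        data = json.load(fh)
    if not isinstance(data, list):
        raise ValueError("expected a JSON list of starred repositories")
    return data


def count_by(records, field):
    """Count records by the value of `field`.

    Records that lack the field are grouped under None.
    """
    return Counter(record.get(field) for record in records)
```

For example, `count_by(load_stars(), "language").most_common(5)` would list the five most frequent values, provided such a field actually exists in your output.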

The script is very plain and simple, and it uses the Selenium browser automation library.

Contributions are welcome.

Requirements

  • Chrome browser
  • chromedriver
  • Python 3.9+
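To check the Python and chromedriver requirements before a run, a small stdlib-only helper like the following could be used. It is a hypothetical sketch, not part of the project: it checks the interpreter version and whether an executable chromedriver is found, either at a path you pass in or on PATH. It does not verify that Chrome itself is installed.

```python
import os
import shutil
import sys


def check_requirements(chromedriver=None, min_python=(3, 9)):
    """Report whether the local environment looks ready for the scraper.

    chromedriver: optional explicit path; if omitted, PATH is searched.
    """
    driver = chromedriver or shutil.which("chromedriver")
    return {
        # Tuple comparison against sys.version_info, e.g. (3, 12, 1, ...) >= (3, 9)
        "python_ok": sys.version_info >= min_python,
        # False when no driver was found or the file is not executable
        "chromedriver_found": bool(driver) and os.access(driver, os.X_OK),
    }


if __name__ == "__main__":
    print(check_requirements())
```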

Installation

The instructions here are for Unix-based operating systems (specifically macOS).

Download a stable chromedriver build. Ideally, its version should match the version of your currently installed Chrome browser.

cd ~/Downloads
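# replace 126.0.6478.63 with the version matching your Chrome;
# on Apple Silicon Macs, use mac-arm64 instead of mac-x64
curl -OL https://storage.googleapis.com/chrome-for-testing-public/126.0.6478.63/mac-x64/chromedriver-mac-x64.zip
unzip chromedriver-mac-x64.zip

# on macOS you might need to remove the quarantine attribute
cd chromedriver-mac-x64
xattr -d com.apple.quarantine chromedriver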
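Since the chromedriver version should match the installed Chrome, the sketch below builds a Chrome for Testing chromedriver URL following the same bucket layout as the curl command, and compares the major versions from two `--version` output strings. The helper names are hypothetical, and it assumes a major-version match is sufficient, which is the usual rule of thumb. On macOS, the Chrome string would typically come from `"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" --version` and the driver string from `./chromedriver --version`.

```python
import re

# Bucket used by the curl command in the installation steps.
CFT_BASE = "https://storage.googleapis.com/chrome-for-testing-public"


def chromedriver_url(version, platform="mac-x64"):
    """Build the chromedriver zip URL for a Chrome for Testing version and platform."""
    return f"{CFT_BASE}/{version}/{platform}/chromedriver-{platform}.zip"


def major_version(version_output):
    """Extract the major version from output such as 'Google Chrome 126.0.6478.63'."""
    # The first dotted four-part number is taken as the version.
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    return int(match.group(1))


def versions_match(chrome_output, driver_output):
    """True when Chrome and chromedriver share the same major version."""
    return major_version(chrome_output) == major_version(driver_output)
```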

Scraper setup

git clone https://github.com/den-is/gh-stars-scraper.git

cd gh-stars-scraper

python3 -m venv venv
source venv/bin/activate

pip install -r requirements.txt
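After activating the environment, one way to confirm that the interpreter in use is the venv's and not the system Python is to compare `sys.prefix` with `sys.base_prefix`, which differ inside a venv created by `python3 -m venv`. A minimal sketch; the function name is illustrative:

```python
import sys


def in_virtualenv():
    """True when running inside a venv created with `python3 -m venv`."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base interpreter.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtualenv active:", in_virtualenv())
```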

Running scraper

  • Create a .env file from the provided .env.example and fill in the correct values.
  • Run the script: python3 main.py
  • Wait for the script to open a browser window and navigate to the login page.
  • Enter your OTP code (I hope you have protected your GitHub account with 2FA).
  • Watch the magic happen.
  • Do not interact with the web page while the script is running.
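The README does not say which keys .env.example contains or how main.py loads them. The following is a hypothetical stdlib-only reader that parses simple KEY=VALUE lines and reports keys present in .env.example but missing or empty in .env, which can catch configuration mistakes before the browser opens. The project's own loader may behave differently (for example, if it relies on a library such as python-dotenv).

```python
from pathlib import Path


def read_env(path=".env"):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments.

    Deliberately minimal: no variable expansion, multi-line values,
    or `export` prefixes. Matching surrounding quotes are stripped.
    """
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Strip one pair of matching single or double quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def missing_keys(env, example):
    """Keys defined in the example but absent or empty in env, sorted."""
    return sorted(key for key in example if not env.get(key))
```

Usage: `missing_keys(read_env(".env"), read_env(".env.example"))` returns the keys that still need values.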

Known issues

  • Occasionally, for reasons not yet understood, the script fails to fetch the lists a specific repository belongs to.
  • Web pages evolve, and new features and versions are released constantly. As a result, the scraper's logic will inevitably degrade over time and require maintenance.
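One possible mitigation for the intermittent failure to fetch a repository's lists is to retry the lookup a few times with a growing delay. This is a generic sketch, not how the scraper currently works: `fetch` stands for a hypothetical function that reads the lists for one repository, and in a Selenium context `exceptions` might be narrowed to something like `selenium.common.exceptions.TimeoutException`.

```python
import time


def with_retries(fetch, attempts=3, delay=1.0, backoff=2.0, exceptions=(Exception,)):
    """Call fetch() up to `attempts` times, sleeping between failures.

    The wait grows geometrically (delay, delay * backoff, ...).
    If every attempt fails, the last exception is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    wait = delay
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except exceptions:
            if attempt == attempts:
                raise
            time.sleep(wait)
            wait *= backoff
```

Catching every `Exception` by default keeps the sketch generic; narrowing `exceptions` avoids silently retrying on genuine bugs.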