Automates the process of repeatedly searching for a website via scraped proxy IP and search keywords

rootVIII/proxy_web_crawler

Search for a website with a different proxy each time

This script automates searching for a website by keyword on the DuckDuckGo search engine, working through the results page after page.

Pass a complete URL and at least one keyword as command-line arguments to run the program:
python proxy_crawler.py -u <url> -k <keyword(s)>
python proxy_crawler.py -u "https://www.whatsmyip.org" -k "my ip"

Add the -x option to run headless (no GUI):
python proxy_crawler.py -u "https://www.whatsmyip.org" -k "my ip" -x
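
The options above could be parsed with Python's standard argparse module. This is only a minimal sketch, not the script's actual code: the short flags -u, -k, and -x come from the examples above, while the long names (--url, --keyword, --headless) and the build_parser helper are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the -u, -k and -x options (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="proxy_crawler.py")
    # Long option names are assumed; only the short flags appear in the README.
    parser.add_argument("-u", "--url", required=True,
                        help="complete URL of the website to look for")
    parser.add_argument("-k", "--keyword", required=True,
                        help="keyword(s) to search for, quoted if more than one")
    parser.add_argument("-x", "--headless", action="store_true",
                        help="run the browser without a GUI")
    return parser


# Parse an explicit argument list equivalent to the headless example above.
args = build_parser().parse_args(
    ["-u", "https://www.whatsmyip.org", "-k", "my ip", "-x"]
)
print(args.url, args.keyword, args.headless)
```

Marking -u and -k as required makes argparse exit with a usage message when either is missing, which matches the "complete URL and at least one keyword" rule.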

  • A list of proxies is first scraped from sslproxies.org
  • Then, using a new proxy socket for each iteration, the specified keyword(s) are searched for until the desired website is found
  • The website is then visited, and one random link within it is clicked
  • The bot is slowed down on purpose, and will also run fairly slowly because of the proxy connection
  • Browser windows may open and close repeatedly during runtime (due to connection errors) until a healthy/valid proxy is encountered
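
The scrape-then-rotate flow described in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not the script's real implementation: it assumes the proxy list is an HTML table with the IP and port in adjacent <td> cells, and the names extract_proxies and first_success are hypothetical. The sample HTML uses documentation-only IP addresses, and no network request is made.

```python
import re
from typing import Callable, Iterable, List, Optional

# Assumed layout: each table row holds the IP and the port in adjacent <td> cells.
ROW_PATTERN = re.compile(
    r"<td>(\d{1,3}(?:\.\d{1,3}){3})</td>\s*<td>(\d{2,5})</td>"
)


def extract_proxies(html: str) -> List[str]:
    """Return every 'ip:port' pair found in the given HTML text."""
    return [f"{ip}:{port}" for ip, port in ROW_PATTERN.findall(html)]


def first_success(proxies: Iterable[str],
                  attempt: Callable[[str], bool]) -> Optional[str]:
    """Try each proxy in turn and return the first one whose attempt succeeds."""
    for proxy in proxies:
        try:
            if attempt(proxy):
                return proxy
        except Exception:
            # A dead proxy raises a connection error; move on to the next one.
            continue
    return None


sample_html = (
    "<tr><td>203.0.113.7</td><td>8080</td></tr>"
    "<tr><td>198.51.100.23</td><td>3128</td></tr>"
)
proxies = extract_proxies(sample_html)
# Stand-in for a real search attempt: pretend only the port-3128 proxy works.
working = first_success(proxies, lambda proxy: proxy.endswith(":3128"))
print(proxies, working)
```

In a real run, the attempt callable would open a browser through the given proxy and perform the search; passing it in as a parameter keeps the retry loop independent of the browser code.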

  • Requirements:
    • Python 3
    • selenium
    • Firefox browser
    • geckodriver
  • Download the latest geckodriver release from Mozilla
  • Unzip the file and place the geckodriver executable in a directory on your PATH
  • Ensure selenium is installed: pip install -r requirements.txt
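
With selenium, Firefox, and geckodriver in place, a proxied and optionally headless browser session might be configured along these lines. This is a hedged sketch rather than the script's own code: firefox_proxy_prefs and build_firefox are hypothetical helper names, and the approach shown (setting Firefox's network.proxy.* preferences through selenium's Firefox Options) is one common way to route a session through a proxy.

```python
from typing import Dict, Union


def firefox_proxy_prefs(proxy: str) -> Dict[str, Union[str, int]]:
    """Map an 'ip:port' string to Firefox manual-proxy preferences."""
    host, port = proxy.rsplit(":", 1)
    return {
        "network.proxy.type": 1,  # 1 selects manual proxy configuration
        "network.proxy.http": host,
        "network.proxy.http_port": int(port),
        "network.proxy.ssl": host,
        "network.proxy.ssl_port": int(port),
    }


def build_firefox(proxy: str, headless: bool = False):
    """Start Firefox through the given proxy (requires selenium and geckodriver)."""
    # Imported lazily so firefox_proxy_prefs works without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    if headless:
        options.add_argument("-headless")
    for name, value in firefox_proxy_prefs(proxy).items():
        options.set_preference(name, value)
    return webdriver.Firefox(options=options)


prefs = firefox_proxy_prefs("203.0.113.7:8080")
print(prefs)
```

Calling build_firefox("203.0.113.7:8080", headless=True) would launch a browser, so it is left out of the example; remember to call quit() on the returned driver when finished.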

Author: rootVIII 2018-2023
