This repository includes all the Python code I used to gather data from a novel translation archive website. It contains 5 Python files, which consist of 4 different classes and 1 main script.
- logger
  This file contains a `Logger` class, which keeps track of warnings and errors that may occur during runtime and writes them to a file called `logs.log`.
- pool
  This file has a function called `create_pool()`, which is called by `Proxer` to generate a set (a pool) of random proxies and headers. This pool is later passed as a parameter to the HTTP GET request so that the bot looks more like a normal user and is less likely to get blocked.
- proxer
  The `Proxer` class in this file keeps track of the IP and header rotations for `main`, performs the HTTP request, and then returns the HTML response.
- novelparser
  Includes the `BeautifulSoup` code to find the desired information, including the maximum number of pages, the titles on each page, and the details for each novel.
- main
  Contains a loop that fetches the HTML response using `Proxer().open_site()` and parses it using various functions from `NovelParser`.
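
The pool and rotation idea described above can be sketched roughly as follows. This is a minimal illustration and not the code in this repository: the names `build_pool` and `Rotator`, the proxy addresses, and the user-agent strings are all hypothetical stand-ins for `create_pool()` and `Proxer`, and the returned dictionary is shaped on the assumption that the request is made with `requests.get(url, proxies=..., headers=...)`.

```python
import itertools
import random

# Hypothetical placeholder values (documentation-range IPs, made-up user agents).
PROXIES = ["203.0.113.10:8080", "203.0.113.11:3128", "203.0.113.12:8000"]
USER_AGENTS = [
    "ExampleBrowser/1.0 (hypothetical)",
    "ExampleBrowser/2.0 (hypothetical)",
    "OtherBrowser/5.1 (hypothetical)",
]


def build_pool(size=3, seed=None):
    """Return a list of randomly paired proxy/header entries."""
    rng = random.Random(seed)  # a seed makes the pool reproducible for testing
    return [
        {
            "proxy": rng.choice(PROXIES),
            "headers": {"User-Agent": rng.choice(USER_AGENTS)},
        }
        for _ in range(size)
    ]


class Rotator:
    """Cycle through a pool so that each request uses the next entry."""

    def __init__(self, pool):
        if not pool:
            raise ValueError("pool must contain at least one entry")
        self._rotation = itertools.cycle(pool)

    def next_kwargs(self):
        """Return keyword arguments for the next HTTP GET request."""
        entry = next(self._rotation)
        proxy_url = "http://" + entry["proxy"]
        return {
            "proxies": {"http": proxy_url, "https": proxy_url},
            "headers": entry["headers"],
        }
```

Under these assumptions, a fetch loop would call something like `requests.get(url, **rotator.next_kwargs())` once per page, so consecutive requests go out with different proxy and header combinations.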