content-scraper-py

Scrape a list-of-posts page, then scrape the content of each post.

This tool collects the list of post URLs from an archive page or a similar listing page.
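The README says the tool drives Chrome through ChromeDriver, but it does not show how post links are picked out of the archive page. Here is a minimal sketch of that step that uses Python's standard-library `html.parser` instead of Selenium. It assumes the archive HTML has already been fetched and that each post link sits inside an element with a hypothetical `post-title` CSS class. Real themes use different markup, so adjust `container_class` to match.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class PostLinkParser(HTMLParser):
    """Collect links found inside elements that carry a given CSS class."""

    def __init__(self, base_url, container_class):
        super().__init__()
        self.base_url = base_url
        self.container_class = container_class
        self.depth = 0  # > 0 while inside a matching container element
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag not in VOID_TAGS:
            if self.depth:
                self.depth += 1
            elif self.container_class in (attrs.get("class") or "").split():
                self.depth = 1
        href = attrs.get("href")
        if self.depth and tag == "a" and href:
            url = urljoin(self.base_url, href)  # resolve relative links
            if url not in self.urls:  # keep page order, drop duplicates
                self.urls.append(url)

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1


def extract_post_urls(html, base_url, container_class="post-title"):
    """Return the post URLs found in an archive page's HTML (hypothetical helper)."""
    parser = PostLinkParser(base_url, container_class)
    parser.feed(html)
    parser.close()
    return parser.urls


archive_html = """
<div class="post-title"><a href="/2019/01/hello-world/">Hello</a></div>
<div class="sidebar"><a href="/about/">About</a></div>
<h2 class="post-title"><a href="https://example.com/2019/02/second/">Second</a></h2>
"""
print(extract_post_urls(archive_html, "https://example.com/blog/"))
# ['https://example.com/2019/01/hello-world/', 'https://example.com/2019/02/second/']
```

Tracking nesting depth, rather than only looking at the `<a>` tag, lets the class sit on a wrapper such as a `<div>` or `<h2>`. The sidebar link is skipped because its container lacks the class.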

For each post, you can get the raw content (the plain text, including the title) or the HTML version (only the content area, with the title returned as a separate string). You can also get the content translated into Indonesian.
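The README does not show how the tool separates the title from the body. The following hedged sketch illustrates the "raw content" idea with the standard library's `html.parser`. It assumes the first `<h1>` is the post title and that the body sits in an element with a hypothetical `entry-content` class, which is common in blog themes but not stated by the README. The HTML version and the Indonesian translation are not covered here.

```python
from html.parser import HTMLParser

# Elements with no closing tag; they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}
SKIP_TAGS = {"script", "style"}  # their text is never post content
BLOCK_TAGS = {"p", "div", "li", "h1", "h2", "h3", "h4", "blockquote", "pre"}


class ContentParser(HTMLParser):
    def __init__(self, content_class):
        super().__init__()
        self.content_class = content_class
        self.depth = 0        # > 0 while inside the content container
        self.skip = 0         # > 0 while inside <script>/<style>
        self.in_title = False
        self.title_parts = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            if tag == "br" and self.depth:
                self.text_parts.append("\n")
            return
        if self.depth:
            self.depth += 1
        elif self.content_class in (dict(attrs).get("class") or "").split():
            self.depth = 1
        if tag in SKIP_TAGS:
            self.skip += 1
        elif tag == "h1" and not self.title_parts:
            self.in_title = True  # assumption: first <h1> is the post title

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1
        if tag in SKIP_TAGS and self.skip:
            self.skip -= 1
        elif tag == "h1":
            self.in_title = False
        if tag in BLOCK_TAGS and self.depth:
            self.text_parts.append("\n")  # keep paragraphs on separate lines

    def handle_data(self, data):
        if self.skip:
            return
        if self.in_title:
            self.title_parts.append(data)
        elif self.depth:
            self.text_parts.append(data)


def extract_post(html, content_class="entry-content"):
    """Return (title, text) for one post page (hypothetical helper)."""
    parser = ContentParser(content_class)
    parser.feed(html)
    parser.close()
    title = " ".join("".join(parser.title_parts).split())
    lines = "".join(parser.text_parts).splitlines()
    text = "\n".join(" ".join(line.split()) for line in lines if line.strip())
    return title, text


page = ('<h1>My Post</h1><div class="entry-content"><p>Hello <b>world</b>!</p>'
        '<script>track()</script><p>Second   line</p></div><p>Footer</p>')
title, text = extract_post(page)
print(f"{title}\n{text}")  # raw content: the title followed by the body text
```

Inline tags such as `<b>` are joined without extra spaces, while block tags start a new line. Script text and anything outside the container (the footer here) are dropped.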

Setup

Install Python on your PC if you haven't already. Get it here:

https://www.python.org/downloads/

Install the required libraries:

pip install -r requirements.txt

Download ChromeDriver and put it in the driver/ folder. Get it here:
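Before running the examples, a small pre-flight check can confirm that the driver is in place. This is a sketch, not part of the tool. It assumes the binary keeps its default name (`chromedriver.exe` on Windows, `chromedriver` elsewhere) and that paths are relative to the project root. The helper names are hypothetical.

```python
import os
import sys
from pathlib import Path


def chromedriver_path(driver_dir="driver", os_name=sys.platform):
    """Return the expected ChromeDriver path inside driver_dir.

    Assumption: the binary keeps its default name, which has an .exe
    suffix on Windows and none on Linux/macOS.
    """
    name = "chromedriver.exe" if os_name.startswith("win") else "chromedriver"
    return Path(driver_dir) / name


def check_chromedriver(driver_dir="driver", os_name=sys.platform):
    """Return the driver path, or raise a helpful error if it is unusable."""
    path = chromedriver_path(driver_dir, os_name)
    if not path.is_file():
        raise FileNotFoundError(
            f"ChromeDriver not found at {path}. Download it from "
            "http://chromedriver.chromium.org/downloads and place it there."
        )
    # On Linux/macOS the downloaded binary must also have its execute bit set.
    if not os_name.startswith("win") and not os.access(path, os.X_OK):
        raise PermissionError(f"{path} is not executable; try: chmod +x {path}")
    return path
```

Calling `check_chromedriver()` from the project root returns the path or explains what is missing. If you drive Chrome yourself with Selenium 4, that path can be passed as `Service(executable_path=str(path))` from `selenium.webdriver.chrome.service`.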

http://chromedriver.chromium.org/downloads

How to Use It

Open the example files. There are two of them:

The first one gets the list of URLs to scrape. The second one scrapes the content of every URL in that list.

Where Are the Results

You can find the results in the results/ folder.

Update Plan

I plan to update this tool to insert your backlinks into the content.