FriedrichSal/pdf-crawl

PDF Crawl

Web service for retrieving a list of URLs from a web domain. The URLs are classified into PDF and other content.

How to use

Install dependencies with `pipenv install`, then start the service with:

    pipenv run python app.py -p 8080
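The `-p` flag above presumably selects the port the service listens on. A minimal sketch of that argument handling with the standard library (the function name and default are hypothetical, not the repository's actual code):

```python
import argparse

def parse_port(argv):
    # Hypothetical sketch of the -p flag shown in the run command;
    # the actual app.py may handle arguments differently.
    parser = argparse.ArgumentParser(description="PDF Crawl service")
    parser.add_argument("-p", "--port", type=int, default=5000,
                        help="port for the web service to listen on")
    return parser.parse_args(argv).port
```

For example, `parse_port(["-p", "8080"])` yields `8080`, matching the command above.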

You can then query it from localhost like so:

    curl localhost:8080/crawl -d "url=https://www.centralpark-hamburg.de" -X POST
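The same request can be issued from Python with the standard library; a sketch that builds the form-encoded POST shown in the curl example (the helper name is an assumption):

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_crawl_request(base, target_url, layers=None):
    # Build the same POST body curl sends with -d:
    # url=<target>[&layers=<n>], form-encoded.
    fields = {"url": target_url}
    if layers is not None:
        fields["layers"] = layers
    data = urlencode(fields).encode("ascii")
    return Request(f"{base}/crawl", data=data, method="POST")

req = build_crawl_request("http://localhost:8080",
                          "https://www.centralpark-hamburg.de")
# urllib.request.urlopen(req) would send it; reading the response is
# omitted here since it requires the service to be running.
```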

By default, the search is two layers deep. You can go one level deeper by passing `layers=3`:

    curl localhost:8080/crawl -d "url=https://www.centralpark-hamburg.de&layers=3" -X POST
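The README does not show the crawl logic itself. A plausible sketch of a depth-limited, breadth-first crawl that splits discovered links into PDFs and other content (the `fetch_links` callback and the result shape are assumptions for illustration, not the repository's implementation):

```python
from collections import deque

def crawl(start_url, fetch_links, layers=2):
    # Breadth-first crawl, visiting pages at most `layers` levels deep.
    # `fetch_links(url)` must return the links found on that page; it is
    # injected as a callback so this sketch stays testable offline.
    seen = {start_url}
    frontier = deque([(start_url, 0)])
    pdf, other = [], []
    while frontier:
        url, depth = frontier.popleft()
        if depth >= layers:
            continue
        for link in fetch_links(url):
            if link in seen:
                continue
            seen.add(link)
            if link.lower().endswith(".pdf"):
                pdf.append(link)  # PDFs are collected, not crawled further
            else:
                other.append(link)
                frontier.append((link, depth + 1))
    return {"pdf": pdf, "other": other}
```

With a mocked site graph, `crawl("https://a.de", lambda u: site.get(u, []), layers=2)` returns links found within two layers, separated by type.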

There is also a deployed instance (on Google Cloud Run, per the URL) that can be queried directly:

    curl https://pdf-crawl-5ekifxtyca-ew.a.run.app/crawl -d "url=https://www.centralpark-hamburg.de" -X POST
