FriedrichSal/pdf-crawl

PDF Crawl

Web service for retrieving a list of URLs from a web domain. The URLs are classified into PDF and other content.

How to use

Install dependencies with `pipenv install`, then start the service with:

    pipenv run python app.py -p 8080
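The `-p` flag above presumably selects the port the service listens on. A minimal sketch of that argument handling with the standard library (the function name and default are hypothetical, not the repository's actual code):

```python
import argparse

def parse_port(argv):
    # Hypothetical sketch of the -p flag shown in the run command;
    # the actual app.py may handle arguments differently.
    parser = argparse.ArgumentParser(description="PDF Crawl service")
    parser.add_argument("-p", "--port", type=int, default=5000,
                        help="port for the web service to listen on")
    return parser.parse_args(argv).port
```

For example, `parse_port(["-p", "8080"])` yields `8080`, matching the command above.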

You can then query it from localhost like so:

    curl localhost:8080/crawl -d "url=https://www.centralpark-hamburg.de" -X POST
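The same request can be issued from Python with the standard library; a sketch that builds the form-encoded POST shown in the curl example (the helper name is an assumption):

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_crawl_request(base, target_url, layers=None):
    # Build the same POST body curl sends with -d:
    # url=<target>[&layers=<n>], form-encoded.
    fields = {"url": target_url}
    if layers is not None:
        fields["layers"] = layers
    data = urlencode(fields).encode("ascii")
    return Request(f"{base}/crawl", data=data, method="POST")

req = build_crawl_request("http://localhost:8080",
                          "https://www.centralpark-hamburg.de")
# urllib.request.urlopen(req) would send it; reading the response is
# omitted here since it requires the service to be running.
```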

By default, the search is two layers deep. You can go one level deeper by passing `layers=3`:

    curl localhost:8080/crawl -d "url=https://www.centralpark-hamburg.de&layers=3" -X POST
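The README does not show the crawl logic itself. A plausible sketch of a depth-limited, breadth-first crawl that splits discovered links into PDFs and other content (the `fetch_links` callback and the result shape are assumptions for illustration, not the repository's implementation):

```python
from collections import deque

def crawl(start_url, fetch_links, layers=2):
    # Breadth-first crawl, visiting pages at most `layers` levels deep.
    # `fetch_links(url)` must return the links found on that page; it is
    # injected as a callback so this sketch stays testable offline.
    seen = {start_url}
    frontier = deque([(start_url, 0)])
    pdf, other = [], []
    while frontier:
        url, depth = frontier.popleft()
        if depth >= layers:
            continue
        for link in fetch_links(url):
            if link in seen:
                continue
            seen.add(link)
            if link.lower().endswith(".pdf"):
                pdf.append(link)  # PDFs are collected, not crawled further
            else:
                other.append(link)
                frontier.append((link, depth + 1))
    return {"pdf": pdf, "other": other}
```

With a mocked site graph, `crawl("https://a.de", lambda u: site.get(u, []), layers=2)` returns links found within two layers, separated by type.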

There is also a deployed instance (on Google Cloud Run, per the URL) that can be queried directly:

    curl https://pdf-crawl-5ekifxtyca-ew.a.run.app/crawl -d "url=https://www.centralpark-hamburg.de" -X POST
