Web crawler using a Tor proxy, Elasticsearch and Scrapy
Requirements:
- Docker
- Docker Compose
- an internet connection
Set the start URLs for the crawler in docker-compose.yml under scrapy / urls. Separate multiple URLs with commas.
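Before editing docker-compose.yml, you can sanity-check a comma-separated URL list with a small shell function. This is a hypothetical helper, not part of this project. The README does not say whether the crawler trims spaces around the commas, so the check conservatively rejects any entry that contains whitespace, is empty, or lacks an http:// or https:// scheme.

```shell
#!/bin/sh
# Hypothetical pre-flight check for a comma-separated list of start URLs.
# Prints each valid URL on its own line (stdout), reports bad entries on
# stderr, and returns 1 if any entry is empty, contains whitespace, or does
# not start with http:// or https://.
check_urls() {
  status=0
  old_ifs=$IFS
  IFS=','   # split on commas only, not on spaces
  set -f    # disable globbing so "?" or "*" inside URLs are left alone
  for url in $1; do   # intentionally unquoted: word-split on IFS
    case $url in
      *[[:space:]]*|'') printf "bad: '%s'\n" "$url" >&2; status=1 ;;
      http://*|https://*) printf '%s\n' "$url" ;;
      *) printf "bad: '%s'\n" "$url" >&2; status=1 ;;
    esac
  done
  set +f
  IFS=$old_ifs
  return $status
}
```

For example, check_urls "https://example.com,https://example.org" prints both URLs and returns 0, while a value written as "https://example.com, https://example.org" (space after the comma) is reported and returns 1.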
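Start all containers in the background (detached mode):
docker-compose up -d
The crawler starts working automatically.
To stop the containers without removing them, and to start them again later:
docker-compose stop
docker-compose start
To stop and remove the containers and their network:
docker-compose down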
To search your index, use Kibana. Open this address in your browser:
http://localhost:5601
Create an index pattern named "crawler" and select "timestamp" as the time field.
Then click "Discover" in the left sidebar to see all crawled pages. To search for an entry, type your keywords into the Kibana search bar at the top. For an introduction to Kibana, see https://www.elastic.co/guide/en/kibana/current/introduction.html
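Kibana is not the only way to read the data: Elasticsearch can also be queried directly over HTTP. The following is a minimal sketch under two assumptions the README does not confirm: Elasticsearch's HTTP port 9200 is published on localhost, and the crawled pages live in an index named exactly "crawler" (the README only names the Kibana index pattern). Check the ports section of docker-compose.yml before relying on it. It uses the Elasticsearch URI search API (the _search endpoint with the q and size query parameters); search_crawler and ES_URL are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: query the "crawler" index directly instead of via Kibana.
# Assumption: Elasticsearch listens on localhost:9200 (override with ES_URL).
ES_URL="${ES_URL:-http://localhost:9200}"

# Usage: search_crawler "keywords" [max_hits]
search_crawler() {
  # --get turns the --data-urlencode pairs into an encoded query string,
  # e.g. .../crawler/_search?q=tor%20network&size=10
  curl --silent --get "$ES_URL/crawler/_search" \
    --data-urlencode "q=$1" \
    --data-urlencode "size=${2:-10}"
}
```

For example, search_crawler "tor network" 5 returns the raw JSON search response; the matching pages are listed under hits.hits.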