Virtualenv (this is installed by default on Ubuntu)
$ git clone <repository-url>
$ virtualenv vm-scrapy-api
$ source vm-scrapy-api/bin/activate
Note: enter the project folder
$ cd scrapy-api
$ pip install -r requirements.txt
$ fab migrations
$ fab runserver
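For reference, fab tasks like these usually just wrap Django's management commands. The snippet below is only a guess at what the project's fabfile might contain (assuming Fabric 1.x), not its actual contents:

# fabfile.py (hypothetical sketch, assuming Fabric 1.x)
from fabric.api import local

def migrations():
    # create and apply the database migrations
    local("python manage.py makemigrations")
    local("python manage.py migrate")

def runserver():
    # start the Django development server on port 8000
    local("python manage.py runserver 0.0.0.0:8000")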
Endpoints: Use your browser, curl, or Postman (a nice tool for testing).
GET (not implemented yet): Lists all the spiders you have sent to crawl.
http://localhost:8000/requisicao/api/v1/requisicao/
POST: Sends a spider to a specific URL to count how many times a word appears on that page. It returns a JSON file, with the name you choose or a default one, containing the count, the URLs you accessed, the name of the file, and the date (see the example request after the parameter list below).
http://localhost:8000/requisicao/api/v1/requisicao/crawler/
{
"key": "diego",
"url": "https://github.com/Diegow3b",
"json": "word_count.json"
}
Parameters:
- key: The word you want to count
- url: The URL of the page you want to crawl
- json: Optional. The JSON file to write to; if you do not choose one, it will be created as 'crawler_palavras.json' in the root folder.
Note: Since the spider crawls the page's source code and not the rendered view, the count reflects occurrences in the source, which may differ from how many times you see the word on the rendered page.
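If you prefer to call the endpoint from Python instead of Postman or curl, here is a minimal sketch using the requests library. Sending the body with json= (a JSON-encoded request body) is an assumption about what the endpoint expects; a form-encoded body via data= would be the alternative:

import requests

payload = {
    "key": "diego",                        # the word to count
    "url": "https://github.com/Diegow3b",  # the page to crawl
    "json": "word_count.json",             # optional output file name
}

# POST the crawl request; assumes the server from `fab runserver`
# is listening on localhost:8000
response = requests.post(
    "http://localhost:8000/requisicao/api/v1/requisicao/crawler/",
    json=payload,
)
print(response.status_code, response.json())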
Sending spiders from the command line
$ scrapy crawl letras_spider -o nameofjsonfile.json -t json -a url=www.urltocrawl.com -a key=wordyouwantcount -a json=nameofjsonfile.json
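For context, Scrapy passes each -a name=value pair as a keyword argument to the spider's __init__, which is how url, key, and json reach the spider. The snippet below is only an illustrative sketch of that mechanism, not the project's actual letras_spider:

import scrapy

class LetrasSpider(scrapy.Spider):
    name = "letras_spider"

    def __init__(self, url=None, key=None, json=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # each -a argument arrives here as a keyword argument
        self.start_urls = [url]
        self.key = key
        self.json_file = json or "crawler_palavras.json"

    def parse(self, response):
        # count occurrences of the key word in the raw page source,
        # which is why the count can differ from the rendered page
        count = response.text.lower().count(self.key.lower())
        yield {"key": self.key, "url": response.url, "count": count}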
For more help, reach out to me via my GitHub or email.