Virtualenv (this is installed by default on Ubuntu)
$ git clone <repository-url>
$ virtualenv vm-scrapy-api
$ source vm-scrapy-api/bin/activate
Note: enter the project folder
$ cd scrapy-api
$ pip install -r requirements.txt
$ fab migrations
$ fab runserver
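For reference, fab tasks like these usually just wrap Django's management commands. The snippet below is only a guess at what the project's fabfile might contain (assuming Fabric 1.x), not its actual contents:

# fabfile.py (hypothetical sketch, assuming Fabric 1.x)
from fabric.api import local

def migrations():
    # create and apply the database migrations
    local("python manage.py makemigrations")
    local("python manage.py migrate")

def runserver():
    # start the Django development server on port 8000
    local("python manage.py runserver 0.0.0.0:8000")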
Endpoints: Use your browser, curl, or Postman (a nice tool for testing).
GET (not implemented yet): Lists all the spiders you have sent to crawl.
http://localhost:8000/requisicao/api/v1/requisicao/
POST: Sends a spider to a specific URL to count how many times a word appears on that page. It returns a JSON file, with the name you choose or a default one, containing the count, the URLs you accessed, the name of the file, and the date (see the example request after the parameter list below).
http://localhost:8000/requisicao/api/v1/requisicao/crawler/
{
"key": "diego",
"url": "https://github.com/Diegow3b",
"json": "word_count.json"
}
Parameters:
- key: The word you want to count
- url: The URL of the page you want to crawl
- json: Optional. The JSON file to write to; if you do not choose one, it will be created as 'crawler_palavras.json' in the root folder.
Note: Since the spider crawls the page's source code and not the rendered view, the count reflects occurrences in the source, which may differ from how many times you see the word on the rendered page.
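If you prefer to call the endpoint from Python instead of Postman or curl, here is a minimal sketch using the requests library. Sending the body with json= (a JSON-encoded request body) is an assumption about what the endpoint expects; a form-encoded body via data= would be the alternative:

import requests

payload = {
    "key": "diego",                        # the word to count
    "url": "https://github.com/Diegow3b",  # the page to crawl
    "json": "word_count.json",             # optional output file name
}

# POST the crawl request; assumes the server from `fab runserver`
# is listening on localhost:8000
response = requests.post(
    "http://localhost:8000/requisicao/api/v1/requisicao/crawler/",
    json=payload,
)
print(response.status_code, response.json())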
Sending spiders from the command line
$ scrapy crawl letras_spider -o nameofjsonfile.json -t json -a url=www.urltocrawl.com -a key=wordyouwantcount -a json=nameofjsonfile.json
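For context, Scrapy passes each -a name=value pair as a keyword argument to the spider's __init__, which is how url, key, and json reach the spider. The snippet below is only an illustrative sketch of that mechanism, not the project's actual letras_spider:

import scrapy

class LetrasSpider(scrapy.Spider):
    name = "letras_spider"

    def __init__(self, url=None, key=None, json=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # each -a argument arrives here as a keyword argument
        self.start_urls = [url]
        self.key = key
        self.json_file = json or "crawler_palavras.json"

    def parse(self, response):
        # count occurrences of the key word in the raw page source,
        # which is why the count can differ from the rendered page
        count = response.text.lower().count(self.key.lower())
        yield {"key": self.key, "url": response.url, "count": count}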
For more help, reach out to me via my GitHub or email.