A collection of helpers for running scrapers built with Scrapy on ScraperWiki.
Example:

    from scrapy.conf import settings
    from scrapyrwiki import run_spider

    def main():
        # MySpider is your spider class, defined elsewhere in the script
        run_spider(MySpider(), settings)

    if __name__ == '__main__':
        main()
To save scraped items to the ScraperWiki datastore, just add "scrapyrwiki.pipelines.ScraperWikiPipeline" to ITEM_PIPELINES.
Example:

    from scrapy.conf import settings
    from scrapyrwiki import run_spider

    def scraperwiki():
        options = {
            # how many items to buffer before saving (assumed semantics)
            'SW_SAVE_BUFFER': 5,
            # unique key column(s) per item class, to avoid duplicate rows
            'SW_UNIQUE_KEYS': {"MyItem": ['url']},
            'ITEM_PIPELINES': ['scrapyrwiki.pipelines.ScraperWikiPipeline'],
        }
        settings.overrides.update(options)
        run_spider(MySpider(), settings)

    if __name__ == 'scraper':
        scraperwiki()
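
For context, a minimal item definition matching the SW_UNIQUE_KEYS mapping above might look like the following sketch. The MyItem name and the url field are carried over from the example; the title field and the exact save semantics are assumptions:

    from scrapy.item import Item, Field

    class MyItem(Item):
        # 'url' is listed under SW_UNIQUE_KEYS above, so it presumably
        # acts as the unique key when rows are saved to the datastore
        url = Field()
        # any other fields are assumed to become ordinary columns
        title = Field()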
To test a spider, just launch it with run_tests.
Example:

    from scrapyrwiki import run_tests
    from scrapy.conf import settings

    run_tests(MySpider(), "output.xml", settings)
Note: for testing, the HTTP cache is used. The directory from which the script is launched must contain a scrapy.cfg file (Scrapy needs it to recognize the scraper directory) and a .scrapy directory with the HTTP cache db.
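
For reference, the expected layout is roughly the following sketch; the myscraper name is illustrative, and the httpcache subdirectory name is an assumption based on Scrapy's default HTTP cache storage:

    myscraper/
    ├── scrapy.cfg        # marks the scraper directory for Scrapy
    └── .scrapy/
        └── httpcache/    # cached responses replayed during the tests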
The output is in xUnit format and has been tested on Jenkins.
Install scrapy-sentry and set the SENTRY_DSN environment variable to your Sentry key. Scrapyrwiki will handle everything else for you.
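
For example, a minimal setup sketch (the DSN value is a placeholder; use the one from your Sentry project):

    pip install scrapy-sentry
    export SENTRY_DSN="https://public:secret@sentry.example.com/1"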