Comment by paulproteus:
About two bugs per second report HTTP 504 Gateway Timeout.
The way Scrapy handles this now is in the retry middleware
(http://doc.scrapy.org/en/0.12/topics/downloader-middleware.html#module-scrapy.contrib.downloadermiddleware.retry),
which re-queues the request but doesn't insist on any delay before trying it
again.
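(For reference, the stock middleware's behavior is driven by a couple of
settings; the names below are from the Scrapy docs, though I haven't checked
the 0.12 defaults, so treat the values as examples:)

    # settings.py -- knobs for the stock retry middleware
    RETRY_TIMES = 2                           # retries per request, after the first attempt
    RETRY_HTTP_CODES = [500, 502, 503, 504]   # which statuses trigger a retry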
It'd be nice to have a custom RetryMiddleware that did per-domain backoff. (Note
that we're sort of abusing the Scrapy architecture; we're supposed to have one
"spider" class per domain, but instead we only have one.)
One way to do this is to provide a custom subclass of
scrapy.contrib.downloadermiddleware.retry.RetryMiddleware and then override the
_retry method.
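For illustration, here's a rough, untested sketch of what that subclass might
look like. The class and _retry signature come from the middleware linked
above; the backoff constants, the shared dicts, and the Deferred trick in
process_request are my own assumptions, and the exact hook behavior may differ
across Scrapy versions:

    import time
    from urlparse import urlparse  # Python 2, matching Scrapy 0.12's era

    from twisted.internet import defer, reactor
    from scrapy.contrib.downloadermiddleware.retry import RetryMiddleware

    class BackoffRetryMiddleware(RetryMiddleware):
        """RetryMiddleware with per-domain exponential backoff (sketch)."""

        # Hypothetical tuning knobs, not real Scrapy settings.
        INITIAL_DELAY = 5.0   # seconds to wait after the first failure
        MAX_DELAY = 300.0     # cap so one flaky domain can't stall forever

        # One middleware instance serves the whole crawl, so class
        # attributes are enough to share state across requests.
        delays = {}           # domain -> current backoff delay
        blocked_until = {}    # domain -> earliest time to hit it again

        def _retry(self, request, reason, spider):
            # Double this domain's delay on every retry, then fall back
            # to the stock _retry to build and return the retry request.
            domain = urlparse(request.url).netloc
            delay = self.delays.get(domain, self.INITIAL_DELAY)
            self.delays[domain] = min(delay * 2, self.MAX_DELAY)
            self.blocked_until[domain] = time.time() + delay
            return RetryMiddleware._retry(self, request, reason, spider)

        def process_request(self, request, spider):
            # Hold any request to a backed-off domain until its window
            # expires. Returning a Deferred postpones the download; the
            # robots.txt middleware uses the same trick.
            domain = urlparse(request.url).netloc
            wait = self.blocked_until.get(domain, 0) - time.time()
            if wait > 0:
                d = defer.Deferred()
                reactor.callLater(wait, d.callback, None)
                return d

        def process_response(self, request, response, spider):
            # A successful fetch resets that domain's backoff; the stock
            # process_response then decides whether to retry.
            if response.status < 400:
                self.delays.pop(urlparse(request.url).netloc, None)
            return RetryMiddleware.process_response(
                self, request, response, spider)

To use it we'd swap it in for the stock middleware in settings.py, something
along these lines (the module path here is made up):

    DOWNLOADER_MIDDLEWARES = {
        'scrapy.contrib.downloadermiddleware.retry.RetryMiddleware': None,
        'ourproject.middleware.BackoffRetryMiddleware': 500,
    }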
That should let us more reliably crawl some of the sites that are quite finicky.
Status: unread
Nosy List: paulproteus
Priority: wish
Imported from roundup ID: 793 (view archived page)
Last modified: 2012-11-20.16:04:43