
Scrapyrt scrape multiple spiders asynchronously at once instead of overwhelming the server with requests #130

Open
xaander1 opened this issue Jun 16, 2021 · 2 comments

@xaander1

@pawelmhm requesting the ability to scrape multiple spiders asynchronously at once instead of overwhelming the server with requests.
Here is what I mean:

{
    "request": {
        "url":["https://www.site1.com","https://www.site2.com","https://www.site3.com"] ,
        "callback": "parse_product",
        "dont_filter": "True"
    },
    "spider_name": ["Site1","Site2","Site3"]
}

Enabling the ability to scrape multiple spiders at once in real-time.

The alternative would be to write an API, using requests, that programmatically sends these requests one by one asynchronously and then combines the results, which I feel is a bit messy and resource intensive... built-in support would be nice.
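
To be concrete, the workaround looks roughly like this (a minimal sketch, assuming a Scrapyrt instance running locally on the default port 9080, aiohttp for the concurrent calls, and placeholder spider names and URLs):

import asyncio

import aiohttp

SCRAPYRT_URL = "http://localhost:9080/crawl.json"

# Placeholder spider names and start URLs.
JOBS = [
    ("Site1", "https://www.site1.com"),
    ("Site2", "https://www.site2.com"),
    ("Site3", "https://www.site3.com"),
]

async def crawl(session, spider_name, url):
    # One Scrapyrt request per spider; the caller gathers and merges the results.
    params = {"spider_name": spider_name, "url": url}
    async with session.get(SCRAPYRT_URL, params=params) as resp:
        return await resp.json()

async def main():
    async with aiohttp.ClientSession() as session:
        responses = await asyncio.gather(
            *(crawl(session, spider, url) for spider, url in JOBS)
        )
    # Combine the per-spider "items" lists into one result set.
    items = [item for resp in responses for item in resp.get("items", [])]
    print(f"collected {len(items)} items from {len(JOBS)} spiders")

if __name__ == "__main__":
    asyncio.run(main())

It works, but every client has to reimplement this fan-out/merge step; that's why built-in batch support seems worthwhile.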

@pawelmhm
Member

It sounds interesting. I think some sort of batch processing would be good here. In your example it will be difficult to know which spider should crawl which URL, but maybe we could support something like this:

{ "request": [
    {"url": "http://example1", "spider": "spider1"}, 
   {"url2": "http://example2", "spider": "spider2"}
]

so essentially request as a list. But we'd have to think about how to do it; changes would have to be made in CrawlManager and CrawlResources.
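
For illustration, a client could then submit the whole batch in one call, something like this sketch (the list-valued "request" field is only the proposal here, it is not supported yet; the spider names and URLs are placeholders):

import requests

# Proposed (not yet supported) batch payload: "request" as a list of
# url/spider pairs instead of a single request object.
payload = {
    "request": [
        {"url": "http://example1", "spider": "spider1"},
        {"url": "http://example2", "spider": "spider2"},
    ]
}

response = requests.post("http://localhost:9080/crawl.json", json=payload)
response.raise_for_status()
print(response.json())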

@xaander1
Author

xaander1 commented Jan 20, 2022

@pawelmhm How long until this is implemented?
