Add process_spider_output_async() to the spider middleware. #91

wRAR · 2024-12-27T14:24:12Z

Fixes #72.

I wonder how we can test the change on Scrapy Cloud before merging this, though?

Gallaecio · 2024-12-27T14:45:27Z

I wonder how we can test the change on Scrapy Cloud before merging this, though?

I believe you can install a newer version of the entrypoint through requirements.txt; I think the only thing that does not update properly is the execution of binaries.

wRAR · 2024-12-27T14:46:40Z

sh_scrapy/middlewares.py

@@ -1,5 +1,6 @@
 # -*- coding: utf-8 -*-
 import itertools
+from warnings import warn


This is for the call in HubstorageDownloaderMiddleware.from_crawler(), added in some older change.

wRAR · 2024-12-27T15:32:02Z

sh_scrapy/middlewares.py

@@ -28,11 +29,22 @@ def process_spider_output(self, response, result, spider):
        parent = self._seen_requests.pop(response.request, None)
        for x in result:
            if isinstance(x, Request):


The check is not extracted because we tentatively decided that, as a general pattern for such middlewares, we prefer to keep the "process_request" and "process_item" logic separate.
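To illustrate the pattern described above, here is a minimal sketch of a middleware that offers both a synchronous and an asynchronous output hook. It is not the sh_scrapy code: the `Request` and `Response` classes are stand-ins so the example runs without Scrapy, and `ParentTrackingMiddleware`, `_process_request`, the `"parent"` meta key, and `collect` are hypothetical names chosen for illustration. The `isinstance` check is repeated in both methods rather than pulled into a shared helper, and only the per-request logic is shared.

```python
import asyncio


class Request:
    """Stand-in for scrapy.Request so the sketch runs without Scrapy installed."""

    def __init__(self, url):
        self.url = url
        self.meta = {}


class Response:
    """Stand-in for scrapy.http.Response; only the .request attribute is used."""

    def __init__(self, request):
        self.request = request


class ParentTrackingMiddleware:
    """Hypothetical middleware; names and the 'parent' meta key are illustrative."""

    def __init__(self):
        # Maps a request to an identifier of the request that produced it.
        self._seen_requests = {}

    def _process_request(self, request, parent):
        # Per-request logic lives in one helper shared by both code paths.
        request.meta["parent"] = parent

    def process_spider_output(self, response, result, spider):
        parent = self._seen_requests.pop(response.request, None)
        for x in result:
            if isinstance(x, Request):  # check kept inline, not extracted
                self._process_request(x, parent)
            yield x

    async def process_spider_output_async(self, response, result, spider):
        parent = self._seen_requests.pop(response.request, None)
        async for x in result:
            if isinstance(x, Request):  # same check, repeated on purpose
                self._process_request(x, parent)
            yield x


async def collect(async_iterable):
    """Drain an async iterable into a list (helper for trying the async path)."""
    return [x async for x in async_iterable]
```

The async path can be tried with `asyncio.run(collect(mw.process_spider_output_async(response, some_async_generator(), None)))`, where `some_async_generator` is any async generator yielding requests and items.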

wRAR · 2025-01-06T07:47:40Z

I confirmed this works: if the project has an async middleware, there is normally a warning, and there is no warning if this branch is installed explicitly.
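As a rough local sanity check, one could verify that a middleware class exposes the async hook as an async generator function. The sketch below uses only the standard library; `supports_async_output`, `SyncOnly`, and `Universal` are hypothetical names, and this is not presented as the check Scrapy itself performs before emitting its warning.

```python
import inspect


def supports_async_output(middleware_cls):
    """Return True if the class defines process_spider_output_async as an async generator."""
    method = getattr(middleware_cls, "process_spider_output_async", None)
    return method is not None and inspect.isasyncgenfunction(method)


class SyncOnly:
    # Only the synchronous hook is defined.
    def process_spider_output(self, response, result, spider):
        yield from result


class Universal:
    # Both hooks are defined, so sync and async results can be handled.
    def process_spider_output(self, response, result, spider):
        yield from result

    async def process_spider_output_async(self, response, result, spider):
        async for x in result:
            yield x
```

Here `supports_async_output(SyncOnly)` is false and `supports_async_output(Universal)` is true.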

Add process_spider_output_async() to the spider middleware.

b96b375

wRAR requested review from kmike, Gallaecio and elacuesta December 27, 2024 14:29

Gallaecio approved these changes Dec 27, 2024

wRAR commented Dec 27, 2024

kmike reviewed Dec 27, 2024

kmike approved these changes Dec 27, 2024

Gallaecio mentioned this pull request Dec 30, 2024

Clean the logs zytedata/zyte-spider-templates-project#30

Draft

10 tasks

Remove an extra empty line.

15073f3
