Saving scraped items in a feed #147

runa · 2023-06-02T14:00:25Z

Hi! Thanks for your work on Scrapyrt!

I've discovered that spiders served by Scrapyrt don't save their output to the feeds configured under FEEDS in the spider's custom_settings. Is it possible to change this behavior so that spiders served by Scrapyrt respect this setting?

Thanks!

pawelmhm · 2024-02-23T07:40:45Z

@runa can you add some sample code to reproduce this and provide more details? I tested with this simple spider:

import scrapy


class ToScrapeCSSSpider(scrapy.Spider):
    name = "toscrape-css"
    start_urls = [
        'http://quotes.toscrape.com/',
    ]
    custom_settings = {
        'FEEDS': {
            'items.json': {
                'format': 'json'
            }
        }
    }

    def parse(self, response):
        for quote in response.css("div.quote"):
            yield {
                'text': quote.css("span.text::text").extract_first(),
                'author': quote.css("small.author::text").extract_first(),
                'tags': quote.css("div.tags > a.tag::text").extract()
            }

        next_page_url = response.css("li.next > a::attr(href)").extract_first()
        if next_page_url is not None:
            yield scrapy.Request(response.urljoin(next_page_url))

and when it is scheduled with ScrapyRT:

curl --location 'http://localhost:9080/crawl.json' \
--header 'Content-Type: application/json' \
--data '{
    "request": {
        "url": "https://quotes.toscrape.com/"
    },
    "spider_name": "toscrape-css"
}'

I see that an items.json file is generated in the spider project's directory. Is there a specific feed that is failing for you?
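For anyone who wants to repeat this check from Python instead of curl, here is a minimal sketch using only the standard library. It assumes ScrapyRT is listening on localhost:9080 as in the curl command above, that the feed path `items.json` resolves relative to the directory ScrapyRT was started from, and that the response JSON carries an `items` list. The helper names (`build_crawl_request`, `feed_file_size`, `run_check`) are made up for illustration and are not part of ScrapyRT.

```python
import json
import urllib.request
from pathlib import Path


def build_crawl_request(base_url, spider_name, start_url):
    """Build the same POST request as the curl command above (hypothetical helper)."""
    payload = {"request": {"url": start_url}, "spider_name": spider_name}
    return urllib.request.Request(
        f"{base_url}/crawl.json",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def feed_file_size(feed_path):
    """Return the feed file size in bytes, or None if the file does not exist."""
    path = Path(feed_path)
    return path.stat().st_size if path.exists() else None


def run_check(base_url="http://localhost:9080", feed_path="items.json"):
    """Schedule the spider, then report whether the feed file was written."""
    req = build_crawl_request(base_url, "toscrape-css", "https://quotes.toscrape.com/")
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Assumption: the response body contains an "items" list.
    print("items in response:", len(body.get("items", [])))
    print("feed file size:", feed_file_size(feed_path))
```

Calling `run_check()` from the spider project's directory while ScrapyRT is running should print the number of items in the response and the size of the feed file; a size of `None` would mean the file was not found at that path.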

pawelmhm added the "more info needed" label (original poster should provide more details to allow us to identify the problem) · Mar 4, 2024
