Apparently it's normal for the Apify platform to restart an actor when it is about to migrate to another host. The Scrapy integration doesn't handle this case: when an actor built with Scrapy gets interrupted, it restarts from the beginning. This wastes resources, puts extra load on the target websites, and leads to timeouts, effectively ruining that particular actor run.

The issue has been discussed on Discord with the advice being:

> ...it looks like the official Scrapy - Apify integration just allows you to run the Scrapy project on the platform but nothing more, so no state persistence. In that case you need to take care of that on your own

Elsewhere, @janbuchar mentions:

> The Scrapy integration just uses the cloud storage when you run it on Apify, and that is persistent by design.

I'm filing this issue to find out how things stand and whether you think this is something the integration should take care of.

I think it should. As an actor creator using Scrapy, so far I haven't needed to know many specifics of the platform. I created a Scrapy project, added the integration, deployed it to Apify, and it pretty much worked.

However, if any actor can be interrupted at any time (apparently a completely normal thing for the platform to do), and the interruption ruins the scraper run, then I'd argue the integration is incomplete: it doesn't do enough to help you build a project that runs successfully on the platform.
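For comparison, here is roughly what "taking care of that on your own" could involve. This is a minimal, framework-free Python sketch under stated assumptions, not Scrapy or Apify code: `FakeKeyValueStore`, `CrawlState`, `persist_state`, `restore_or_start`, `crawl` and the `CRAWL_STATE` key are all hypothetical names, the dict-backed store stands in for a persistent cloud key-value store, and the migration is simulated by stopping after a fixed number of requests. Scrapy's own pause/resume mechanism (the `JOBDIR` setting) writes its state to local disk, which I'm assuming does not survive a move to another host, so something along these lines would have to save the pending queue and the already-seen URLs to storage that does.

```python
import json
from dataclasses import dataclass, field


class FakeKeyValueStore:
    """Hypothetical in-memory stand-in for a persistent key-value store."""

    def __init__(self):
        self._data = {}

    def set_value(self, key, value):
        # Store JSON text so the round trip mirrors real serialization.
        self._data[key] = json.dumps(value)

    def get_value(self, key):
        raw = self._data.get(key)
        return None if raw is None else json.loads(raw)


STATE_KEY = "CRAWL_STATE"  # hypothetical key name


@dataclass
class CrawlState:
    pending: list = field(default_factory=list)  # URLs still to fetch, in order
    seen: set = field(default_factory=set)       # URLs already fetched

    def to_dict(self):
        return {"pending": self.pending, "seen": sorted(self.seen)}

    @classmethod
    def from_dict(cls, data):
        return cls(pending=list(data["pending"]), seen=set(data["seen"]))


def persist_state(store, state):
    """Hypothetical handler to call when a migration is announced."""
    store.set_value(STATE_KEY, state.to_dict())


def restore_or_start(store, start_urls):
    """Resume from persisted state if there is any, otherwise start fresh."""
    saved = store.get_value(STATE_KEY)
    if saved is not None:
        return CrawlState.from_dict(saved)
    return CrawlState(pending=list(start_urls))


def crawl(store, start_urls, fetched_log, stop_after=None):
    """Process pending URLs; simulate a migration after `stop_after` fetches.

    Returns True if the crawl finished, False if it was interrupted.
    """
    state = restore_or_start(store, start_urls)
    count = 0
    while state.pending:
        url = state.pending.pop(0)
        if url in state.seen:
            continue
        fetched_log.append(url)  # stands in for an actual HTTP request
        state.seen.add(url)
        count += 1
        if stop_after is not None and count == stop_after:
            persist_state(store, state)  # migration imminent: save and stop
            return False
    persist_state(store, state)
    return True


if __name__ == "__main__":
    store = FakeKeyValueStore()
    urls = ["https://example.com/1", "https://example.com/2", "https://example.com/3"]
    log = []
    first = crawl(store, urls, log, stop_after=2)  # interrupted run
    second = crawl(store, urls, log)               # restarted run resumes
    print(first, second, log)
    # prints: False True ['https://example.com/1', 'https://example.com/2', 'https://example.com/3']
```

The restarted run fetches only the URL the interrupted run never reached, which is what a resumed run after a migration should presumably do. In a real actor, `persist_state` would have to be wired to whatever migration notification the platform provides, and state saved only at that moment is best-effort: a request in flight when the actor stops could still be repeated.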