Update documentation of Scrapy Actor template (#261)
vdusek authored Jan 10, 2024
1 parent 09bdd99 commit b0d2a99
Showing 9 changed files with 15 additions and 45 deletions.
2 changes: 1 addition & 1 deletion templates/python-beautifulsoup/requirements.txt
@@ -1,7 +1,7 @@
 # Feel free to add your Python dependencies below. For formatting guidelines, see:
 # https://pip.pypa.io/en/latest/reference/requirements-file-format/

-apify ~= 1.5.0
+apify ~= 1.5.1
 beautifulsoup4 ~= 4.12.2
 httpx ~= 0.25.2
 types-beautifulsoup4 ~= 4.12.0.7
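Reviewer note on the `~=` pins: the compatible-release operator accepts any version that is at least the pinned one and shares all but its last release segment, so `apify ~= 1.5.1` admits `1.5.9` but not `1.6.0`. The sketch below illustrates that rule for plain numeric versions only; `parse_release` and `is_compatible` are hypothetical helper names, not part of pip or the Apify SDK, and pre-, post-, and dev-releases are deliberately ignored.

```python
from __future__ import annotations


def parse_release(version: str) -> tuple[int, ...]:
    """Split a plain numeric version such as '1.5.1' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def is_compatible(spec: str, candidate: str) -> bool:
    """Approximate `~= spec` for plain numeric versions (illustrative only)."""
    spec_parts = parse_release(spec)
    cand_parts = parse_release(candidate)
    if len(spec_parts) < 2:
        raise ValueError("~= needs at least two release segments")

    # Pad with zeros so that '1.5' compares the same as '1.5.0'.
    width = max(len(spec_parts), len(cand_parts))
    spec_padded = spec_parts + (0,) * (width - len(spec_parts))
    cand_padded = cand_parts + (0,) * (width - len(cand_parts))

    # Candidate must be >= the pin and match every segment except the last.
    prefix = spec_parts[:-1]
    return cand_padded >= spec_padded and cand_padded[: len(prefix)] == prefix


if __name__ == "__main__":
    for candidate in ("1.5.0", "1.5.1", "1.5.9", "1.6.0"):
        print(candidate, is_compatible("1.5.1", candidate))
```

For real dependency resolution, a maintained version-parsing library is the safer choice; this helper only mirrors the rule for the simple pins used in these requirements files.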
2 changes: 1 addition & 1 deletion templates/python-empty/requirements.txt
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# Feel free to add your Python dependencies below. For formatting guidelines, see:
# https://pip.pypa.io/en/latest/reference/requirements-file-format/

apify ~= 1.5.0
apify ~= 1.5.1
2 changes: 1 addition & 1 deletion templates/python-playwright/requirements.txt
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
# Feel free to add your Python dependencies below. For formatting guidelines, see:
# https://pip.pypa.io/en/latest/reference/requirements-file-format/

apify ~= 1.5.0
apify ~= 1.5.1
playwright ~= 1.39.0
2 changes: 1 addition & 1 deletion templates/python-scrapy/requirements.txt
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# Feel free to add your Python dependencies below. For formatting guidelines, see:
# https://pip.pypa.io/en/latest/reference/requirements-file-format/

apify[scrapy] ~= 1.5.0
apify[scrapy] ~= 1.5.1
nest-asyncio ~= 1.5.8
scrapy ~= 2.11.0
23 changes: 4 additions & 19 deletions templates/python-scrapy/src/main.py
@@ -12,26 +12,11 @@
 modifications. For instance, removing Apify-Scrapy components from the settings will break the integration
 between Scrapy and Apify.
-Known limitations to be aware of:
----------------------------------
-1. Asynchronous spiders and Twisted & AsyncIO integration
-Asynchronous spiders (and possibly other components) may encounter challenges due to the Twisted & AsyncIO
-integration. If you need to execute a coroutine within the Spider, it's recommended to use Apify's custom
-nested event loop. See the code example below or find inspiration from Apify's Scrapy components, such as
-[ApifyScheduler](https://github.com/apify/apify-sdk-python/blob/v1.3.0/src/apify/scrapy/scheduler.py#L109).
-```
-from apify.scrapy.utils import nested_event_loop
-nested_event_loop.run_until_complete(my_coroutine())
-```
-2. Single spider limitation
-The current implementation supports the execution of only one Spider per project.
-Issue: https://github.com/apify/actor-templates/issues/202
+Documentation:
+--------------
+For an in-depth description of the Apify-Scrapy integration process, our Scrapy components, known limitations and
+other stuff, please refer to the following documentation page: https://docs.apify.com/cli/docs/integrating-scrapy.
 """

 from __future__ import annotations
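Reviewer note on the removed limitation text: the deleted docstring recommended running coroutines from spider code via `nested_event_loop.run_until_complete(...)`. The sketch below shows the general pattern of driving a coroutine to completion from synchronous code with a dedicated event loop, using only the standard library. It is a stand-in, not Apify's implementation: `dedicated_loop`, `fetch_value`, and `run_from_sync_code` are hypothetical names, and the sketch assumes no other event loop is already running in the calling thread (the templates pin `nest-asyncio`, presumably to lift that restriction).

```python
import asyncio

# Hypothetical stand-in for a loop reserved for calls made from sync code.
dedicated_loop = asyncio.new_event_loop()


async def fetch_value() -> int:
    """Placeholder coroutine; a real spider would await I/O here."""
    await asyncio.sleep(0)  # Yield control once to mimic asynchronous work.
    return 42


def run_from_sync_code() -> int:
    """Block until the coroutine finishes and return its result."""
    return dedicated_loop.run_until_complete(fetch_value())


result = run_from_sync_code()
dedicated_loop.close()  # Release the loop once it is no longer needed.
```

Reusing one long-lived loop avoids creating a new loop per call; the trade-off is that the caller blocks until the coroutine completes.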
2 changes: 1 addition & 1 deletion templates/python-selenium/requirements.txt
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
# Feel free to add your Python dependencies below. For formatting guidelines, see:
# https://pip.pypa.io/en/latest/reference/requirements-file-format/

apify ~= 1.5.0
apify ~= 1.5.1
selenium ~= 4.14.0
2 changes: 1 addition & 1 deletion templates/python-start/requirements.txt
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
# Feel free to add your Python dependencies below. For formatting guidelines, see:
# https://pip.pypa.io/en/latest/reference/requirements-file-format/

apify ~= 1.5.0
apify ~= 1.5.1
beautifulsoup4 ~= 4.12.2
httpx ~= 0.25.2
types-beautifulsoup4 ~= 4.12.0.7
2 changes: 1 addition & 1 deletion wrappers/python-scrapy/requirements_apify.txt
@@ -1,6 +1,6 @@
 # Add your dependencies here.
 # See https://pip.pypa.io/en/latest/reference/requirements-file-format/
 # for how to format them
-apify[scrapy] ~= 1.5.0
+apify[scrapy] ~= 1.5.1
 nest-asyncio ~= 1.5.8
 scrapy ~= 2.11.0
23 changes: 4 additions & 19 deletions wrappers/python-scrapy/{projectFolder}/main.template.py
@@ -12,26 +12,11 @@
 modifications. For instance, removing Apify-Scrapy components from the settings will break the integration
 between Scrapy and Apify.
-Known limitations to be aware of:
----------------------------------
-1. Asynchronous spiders and Twisted & AsyncIO integration
-Asynchronous spiders (and possibly other components) may encounter challenges due to the Twisted & AsyncIO
-integration. If you need to execute a coroutine within the Spider, it's recommended to use Apify's custom
-nested event loop. See the code example below or find inspiration from Apify's Scrapy components, such as
-[ApifyScheduler](https://github.com/apify/apify-sdk-python/blob/v1.3.0/src/apify/scrapy/scheduler.py#L109).
-```
-from apify.scrapy.utils import nested_event_loop
-nested_event_loop.run_until_complete(my_coroutine())
-```
-2. Single spider limitation
-The current implementation supports the execution of only one Spider per project.
-Issue: https://github.com/apify/actor-templates/issues/202
+Documentation:
+--------------
+For an in-depth description of the Apify-Scrapy integration process, our Scrapy components, known limitations and
+other stuff, please refer to the following documentation page: https://docs.apify.com/cli/docs/integrating-scrapy.
 """

 from __future__ import annotations