-
Notifications
You must be signed in to change notification settings - Fork 331
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix the usage of Configuration class #670
Comments
Thanks for bringing this up! If you could write out the attributes in question here, we can discuss the appropriate steps together. |
The fields that are used:
The fields that are not used:
The |
In JS, this is used by
In JS, this is used by
In JS, this is used by
This may be set in https://github.com/apify/apify-actor-docker/blob/master/node-playwright/Dockerfile or by the actor developer in environment variables, but I don't think it actually has any effect 🤔 @vladfrangu do you know more?
These should probably be consumed by
These should probably be handled by
I must have missed that one. The only use case I can think of is when you want to debug something locally against multiple inputs and don't want to bother with renaming files... which is valid |
Agreed, thanks @janbuchar for the investigation. We should also note, that E.g. config = Configuration(headless=False)
crawler = PlaywrightCrawler(config=config, headless=True) then Also, we should split this issue into many smaller ones. |
### Description - The `service_container` module has been completely refactored. It introduces changes to its usage, and resulting in many changes across the code base. - While it remains possible to pass "services" directly to components as before, components now rely on the `service_container` internally. They no longer store service instances themselves. - A new `force` flag has been added to the `service_container`'s setters, which is especially useful for testing purposes. - This also quite simplifies the `memory_storage_client`. - We now have only `set_storage_client`, the same approach as for the `event_manager`. This is more flexible (allows more envs than just local & cloud). And in the SDK Actor, we can them based on the `is_at_home`. - This is a breaking change, but it affects only the `service_container` interface. ### Open discussion - [x] Should we go further and remove the option to pass configuration, event managers, or storage clients directly to components - requiring them to be set exclusively via the `service_container`? Thoughts are welcome. - No - [x] Better name for `service_container`? `service_locator`? - `service_locator` ### Issues - Closes: #699 - Closes: #539 - Closes: #369 - It also unlocks the: - #670, - and maybe apify/apify-sdk-python#324 (comment). ### Testing - Existing tests, including those covering the `service_container`, have been updated to reflect these changes. - New tests covering the `MemoryStorageClient` respects the `Configuration`. ### Manual reproduction - This code snippet demonstrates that the `Crawler` and `MemoryStorageClient` respects the custom `Configuration`. - Note: Some fields remain non-working for now. These will be addressed in a future PR, as this refactor is already quite big. However, with the new architecture, those updates should now be easy. ```python import asyncio from crawlee.configuration import Configuration from crawlee.http_crawler import HttpCrawler, HttpCrawlingContext from crawlee.service_container import set_configuration async def main() -> None: config = Configuration(persist_storage=False, write_metadata=False) set_configuration(config) # or Crawler(config=config) crawler = HttpCrawler() @crawler.router.default_handler async def request_handler(context: HttpCrawlingContext) -> None: context.log.info(f'Processing {context.request.url} ...') await context.push_data({'url': context.request.url}) await crawler.run(['https://crawlee.dev/']) if __name__ == '__main__': asyncio.run(main()) ``` ### Checklist - [x] CI passed
Our
Configuration
class currently contains many fields that are unused and/or undocumented. Review each field, decide whether it should be used, removed, or consolidated, and add missing doc strings.The text was updated successfully, but these errors were encountered: