Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add get_run_state method to BasicCrawler #925

Open
Pijukatel opened this issue Jan 21, 2025 · 4 comments
Open

Add get_run_state method to BasicCrawler #925

Pijukatel opened this issue Jan 21, 2025 · 4 comments
Assignees
Labels
enhancement New feature or request. t-tooling Issues with this label are in the ownership of the tooling team.

Comments

@Pijukatel
Copy link
Contributor

Pijukatel commented Jan 21, 2025

BasicCrawler has several internal state flags and also it's internal component _autoscaled_pool has specific states. Add public method that will analyze internal state flags and state of _autoscaled_pool and return public assessment of the crawler state.

Discussed here:
#921 (comment)
(Update the test discussed there using the newly added get_run_state method.)

@Pijukatel Pijukatel added enhancement New feature or request. t-tooling Issues with this label are in the ownership of the tooling team. labels Jan 21, 2025
@Pijukatel Pijukatel self-assigned this Jan 21, 2025
@B4nan
Copy link
Member

B4nan commented Jan 21, 2025

The name feels a bit too similar to crawler.useState(), which I can imagine we are also missing in python?

https://github.com/apify/crawlee/blob/02a598c2a501957f04ca3a2362bcee289ef861c0/packages/basic-crawler/src/internals/basic-crawler.ts#L989

@Pijukatel Pijukatel changed the title Add getState method to BasicCrawler Add get_state method to BasicCrawler Jan 21, 2025
@Pijukatel
Copy link
Contributor Author

The name feels a bit too similar to crawler.useState(), which I can imagine we are also missing in python?

https://github.com/apify/crawlee/blob/02a598c2a501957f04ca3a2362bcee289ef861c0/packages/basic-crawler/src/internals/basic-crawler.ts#L989

I admit that crawler.useState and crawler.getState could make users think that it is related functionality. Sadly useState is somewhat strange name and I would be in favor of making the name more explicit to describe what it is actually doing, something along the line getGlobalState, or getSharedState or getSharedValues or getAutosavedAtate or useSharedValues ...

crawler.getState -> returning for example "Running" seems pretty obvious on the other hand. So I am not much in favor of choosing worse name to not collide with useState

Maybe crawler.getStatus or crawler.getCrawlingStatus would be explicit enough and not colliding with useState?

I see that in JS there is some code related to StatusMessage, but that actually seems to be somehow related to crawler internal state as well.

@B4nan
Copy link
Member

B4nan commented Jan 21, 2025

Even if we rename it, this sounds like a bad idea to introduce something completely different with such a similar name. And I would personally not rename it. Its a very common thing to use, unlike this new method you are proposing here. I would rather pick a longer name for that one. getCrawlingStatus sounds fine'ish.

@janbuchar
Copy link
Collaborator

How does get_run_state sound?

@Pijukatel Pijukatel changed the title Add get_state method to BasicCrawler Add get_run_state method to BasicCrawler Jan 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request. t-tooling Issues with this label are in the ownership of the tooling team.
Projects
None yet
Development

No branches or pull requests

3 participants