Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: cleanup purging behavior for CLI #729

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

vladfrangu
Copy link
Member

@vladfrangu vladfrangu commented Jan 15, 2025

Closes #590

BREAKING CHANGE:

Purging in projects using Crawlee for SDK v3+ now purge by default, so using --purge flag is no longer necessary. Use --no-purge to keep the storage folder intact.
The purge-queue, purge-dataset and purge-key-value-store flags have been removed, and the logic of all three was combined into the purge flag.

@vladfrangu vladfrangu requested a review from B4nan as a code owner January 15, 2025 19:12
@github-actions github-actions bot added this to the 106th sprint - Tooling team milestone Jan 15, 2025
@github-actions github-actions bot added the t-tooling Issues with this label are in the ownership of the tooling team. label Jan 15, 2025
Comment on lines -199 to -213
// TODO: deprecate these flags
if (this.flags.purgeQueue && !this.flags.purge) {
await purgeDefaultQueue();
info({ message: 'Default local request queue was purged.' });
}

if (this.flags.purgeDataset && !this.flags.purge) {
await purgeDefaultDataset();
info({ message: 'Default local dataset was purged.' });
}

if (this.flags.purgeKeyValueStore && !this.flags.purge) {
await purgeDefaultKeyValueStore();
info({ message: 'Default local key-value store was purged.' });
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's hope this won't be missed by anyone 🙃 worst case we can add it back

btw I hope oclif wont silently ignore undeclared flags?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will throw for unknown flags... Do we want to define them as hidden flags so it doesn't throw for users?

Code - 2025-01-16 at 11 33 22

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's good, if they are not doing anything we need to tell potential users

src/lib/projects/ApifySDKAnalyzer.ts Outdated Show resolved Hide resolved
src/commands/run.ts Outdated Show resolved Hide resolved
CRAWLEE_PURGE_ON_START = '1';
}
// Crawlee does auto purging
if (projectTypes.includes(PROJECT_TYPES.CRAWLEE)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like this wont detect SDK v3 without crawlee explicitly installed, which should have this behaviour too

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, SDK v3 usually also has crawlee defined as a dependency in some way, so it will match, no? But then again, if someone installs just SDK v3, and doesn't use crawlee at all, we wouldn't purge, no?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this will catch transitive dependencies too? you can use the SDK without explicitly installing crawlee, that's my concern

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and doesn't use crawlee at all, we wouldn't purge, no?

wdym? SDK v3 should behave just like crawlee project, it also autopurges

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you use crawlee crawlers, otherwise they don't, no? iirc our purging logic only runs when starting a crawler?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, you dont have to use crawlers at all, same with JS variant

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, we did it on open. eek, ok, will need to adjust the analyzer

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whats the version for python SDK, or does that diverge from crawlee for storages?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SDK v2 should work the same, thats where we used python crawlee internally (which is responsible for this)

@netmilk
Copy link
Contributor

netmilk commented Jan 16, 2025

Hi guys, I've just realized that if we want the CLI to behave consistently with how the Apify platform works, we shouldn't purge entirely by default, but instead the CLI should just keep adding local runs for later introspection and comparison. I know you're making it on par with Crawlee's behavior, but I think this will introduce further divergence.

@B4nan
Copy link
Member

B4nan commented Jan 16, 2025

I am not sure about that, regular actor developer runs, runs, and runs. They would end up with a lot of junk locally, I am not sure people would like such a default.

Also, I am not sure if this is a "divergence", we are not doing that now either, the CLI never considered multiple storage folders, neither the actual consumers of the storage folder (crawlee/sdk). We could surely just rename the existing storage folder instead of wiping it, but should that really be a default?

@B4nan B4nan changed the title refactor: cleanup purging behavior for CLI fix: cleanup purging behavior for CLI Jan 16, 2025
@netmilk
Copy link
Contributor

netmilk commented Jan 16, 2025

@B4nan Understood. Divergence from the platform behavior is what I'm concerned about. You run it there and the results always persist by default under a new run. But unifying the behavior of our local tooling is definitely a way to go.

@B4nan
Copy link
Member

B4nan commented Jan 16, 2025

@vladfrangu lets describe the BCs in the PR description

@vladfrangu
Copy link
Member Author

I also found out you can group flags in the help menu. Hows this look? @netmilk @B4nan

Code - 2025-01-16 at 11 51 25

@B4nan
Copy link
Member

B4nan commented Jan 16, 2025

nit: should be Node.js, not node.js

@B4nan
Copy link
Member

B4nan commented Jan 16, 2025

also i double checked with others, python works the same as JS, auto purge for crawlee or SDK (even without crawlee) v2+

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
t-tooling Issues with this label are in the ownership of the tooling team.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Clean up the purging logic for crawlee projects
3 participants