-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: cleanup purging behavior for CLI #729
base: master
Are you sure you want to change the base?
Conversation
// TODO: deprecate these flags | ||
if (this.flags.purgeQueue && !this.flags.purge) { | ||
await purgeDefaultQueue(); | ||
info({ message: 'Default local request queue was purged.' }); | ||
} | ||
|
||
if (this.flags.purgeDataset && !this.flags.purge) { | ||
await purgeDefaultDataset(); | ||
info({ message: 'Default local dataset was purged.' }); | ||
} | ||
|
||
if (this.flags.purgeKeyValueStore && !this.flags.purge) { | ||
await purgeDefaultKeyValueStore(); | ||
info({ message: 'Default local key-value store was purged.' }); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's hope this won't be missed by anyone 🙃 worst case we can add it back
btw I hope oclif wont silently ignore undeclared flags?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's good, if they are not doing anything we need to tell potential users
CRAWLEE_PURGE_ON_START = '1'; | ||
} | ||
// Crawlee does auto purging | ||
if (projectTypes.includes(PROJECT_TYPES.CRAWLEE)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like this wont detect SDK v3 without crawlee explicitly installed, which should have this behaviour too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, SDK v3 usually also has crawlee defined as a dependency in some way, so it will match, no? But then again, if someone installs just SDK v3, and doesn't use crawlee at all, we wouldn't purge, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so this will catch transitive dependencies too? you can use the SDK without explicitly installing crawlee, that's my concern
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and doesn't use crawlee at all, we wouldn't purge, no?
wdym? SDK v3 should behave just like crawlee project, it also autopurges
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you use crawlee crawlers, otherwise they don't, no? iirc our purging logic only runs when starting a crawler?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, you dont have to use crawlers at all, same with JS variant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto purge is implemented on storage level
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh right, we did it on open
. eek, ok, will need to adjust the analyzer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whats the version for python SDK, or does that diverge from crawlee for storages?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SDK v2 should work the same, thats where we used python crawlee internally (which is responsible for this)
Hi guys, I've just realized that if we want the CLI to behave consistently with how the Apify platform works, we shouldn't purge entirely by default, but instead the CLI should just keep adding local runs for later introspection and comparison. I know you're making it on par with Crawlee's behavior, but I think this will introduce further divergence. |
I am not sure about that, regular actor developer runs, runs, and runs. They would end up with a lot of junk locally, I am not sure people would like such a default. Also, I am not sure if this is a "divergence", we are not doing that now either, the CLI never considered multiple storage folders, neither the actual consumers of the storage folder (crawlee/sdk). We could surely just rename the existing storage folder instead of wiping it, but should that really be a default? |
@B4nan Understood. Divergence from the platform behavior is what I'm concerned about. You run it there and the results always persist by default under a new run. But unifying the behavior of our local tooling is definitely a way to go. |
@vladfrangu lets describe the BCs in the PR description |
nit: should be |
also i double checked with others, python works the same as JS, auto purge for crawlee or SDK (even without crawlee) v2+ |
…wlee but has apify >=3
Closes #590
BREAKING CHANGE:
Purging in projects using Crawlee for SDK v3+ now purge by default, so using
--purge
flag is no longer necessary. Use--no-purge
to keep the storage folder intact.The
purge-queue
,purge-dataset
andpurge-key-value-store
flags have been removed, and the logic of all three was combined into thepurge
flag.