Actor.getInput returns null #586

Closed · tugkan opened this issue Jul 2, 2024 · 12 comments · Fixed by #591
Assignees: vladfrangu
Labels: t-tooling (Issues with this label are in the ownership of the tooling team.)

Comments

tugkan commented Jul 2, 2024

After the latest release, the await Actor.getInput() call got broken. It returns null and cannot retrieve the input properly.

The apify run -p command breaks the getInput function; npm run src/main.js returns the input properly. Possibly some library clash is happening.

Apify CLI version: apify-cli/0.20.0 darwin-arm64 node-v18.17.1
apify lib version: 3.2.3
crawlee version: 3.5.4

Also, I tried scaffolding a fresh project with the Apify CLI tool, but I can still reproduce the error.
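
A minimal reproduction sketch (assuming the default Apify SDK v3 project layout with src/main.js and an INPUT.json under storage/key_value_stores/default; the exact template code may differ):

import { Actor } from 'apify';

await Actor.init();

// with CLI 0.20.0 and apify run -p, this logs null instead of the stored input
const input = await Actor.getInput();
console.log('input:', input);

await Actor.exit();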

B4nan (Member) commented Jul 2, 2024

cc @vladfrangu, sounds like we now wipe the INPUT.json file from KVS too, we need to keep that one.
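
A hypothetical sketch of what keeping it would mean during a local purge (illustrative only, not the actual CLI implementation; the function name just mirrors the CLI's purgeDefaultKeyValueStore helper):

import { readdir, rm } from 'node:fs/promises';
import { join } from 'node:path';

// Wipe the default key-value store on disk, but keep the INPUT record so
// Actor.getInput() still has something to read after the purge.
async function purgeDefaultKeyValueStore(dir = './storage/key_value_stores/default') {
    for (const entry of await readdir(dir)) {
        // INPUT.json (or INPUT with any other extension) must survive the purge
        if (entry.split('.')[0] === 'INPUT') continue;
        await rm(join(dir, entry), { recursive: true, force: true });
    }
}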

B4nan (Member) commented Jul 2, 2024

FYI @tugkan you should no longer need the -p flag, in fact it wasn't needed since v3, crawlee purges the default storages automatically now.

tugkan (Author) commented Jul 2, 2024

@B4nan Thank you.

I might be wrong, but I recently simulated the migration event by running apify run. If that's the case, how can we test the migration in local runs?

B4nan (Member) commented Jul 2, 2024

How were you doing that? I wasn't aware apify run could do such a thing.

If you press ctrl + c, crawlee will simulate a migration - it will run the same handler as if the migrating event had fired. You don't really need the Apify CLI to run things these days (as long as you have the proxy password in your env vars).

https://github.com/apify/crawlee/blob/master/packages/basic-crawler/src/internals/basic-crawler.ts#L897
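
For anyone wanting to hook into that locally, a minimal sketch using the public Apify SDK v3 events API (the handler body is made up for illustration):

import { Actor } from 'apify';

await Actor.init();

Actor.on('migrating', async () => {
    // persist any extra state here; pressing ctrl + c in a local run
    // exercises the same code path as a real migration on the platform
    console.log('migrating event fired, persisting state...');
});

// ... crawler setup and run ...

await Actor.exit();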

@vladfrangu vladfrangu self-assigned this Jul 2, 2024
tugkan (Author) commented Jul 2, 2024

@B4nan It is not a direct simulation, but we were doing something like this:

  1. Run the actor by executing apify run -p as usual
  2. ctrl + c to stop the process.
  3. Execute apify run to re-run the process without purging Request Queues or KVs.

B4nan (Member) commented Jul 2, 2024

Yeah, but that's crawlee doing this, not the CLI. An alternative is using npm start instead of apify run -p, and CRAWLEE_PURGE_ON_START=0 npm start to restart. Crawlee also provides a CLI which you can use for the same thing via npx crawlee run --no-purge; all it does is swap the CRAWLEE_PURGE_ON_START env var, just like the Apify CLI.

https://github.com/apify/apify-cli/blob/master/src/commands/run.ts#L149-L164
https://github.com/apify/crawlee/blob/master/packages/cli/src/commands/RunProjectCommand.ts#L34-L36
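
Roughly what that env var toggles inside crawlee, shown via the public Configuration API (a sketch, not the CLI source):

import { Configuration } from 'crawlee';

// CRAWLEE_PURGE_ON_START=0 corresponds to purgeOnStart: false on the global config
Configuration.getGlobalConfig().set('purgeOnStart', false);

// equivalently, set the env var before crawlee initializes its storage:
// process.env.CRAWLEE_PURGE_ON_START = '0';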

tugkan (Author) commented Jul 2, 2024

@B4nan Gotcha! Thank you.

@fnesveda fnesveda added the t-tooling Issues with this label are in the ownership of the tooling team. label Jul 3, 2024
mvolfik (Collaborator) commented Jul 4, 2024

+1, also hit by this. Additionally, this is printed before the apify run -p process ends:

INFO  CheerioCrawler: Finished! Total 1 requests: 1 succeeded, 0 failed. {"terminal":true}
    Error: ENOENT: no such file or directory, stat '/tmp/test34/storage/key_value_stores/default/INPUT_CLI-1720097672747.json'
    Code: ENOENT

However, running without -p does not work for me either; all the data remains there. This is easily verifiable by adding

const ds = await Actor.openDataset('default');
console.log(await ds.getData());

right after Actor.init() in a new project, created by

14:59:05.212 volfmatej@flyer /tmp> apify create test35
? Choose the programming language of your new Actor: TypeScript
? Choose a template [...]: Crawlee + Cheerio
? Do you want to install the following template?
 Crawlee + Cheerio:
 A scraper example that uses Cheerio to parse HTML. It's fast, but it can't run the website's JavaScript or pass JS anti-scraping challenges. Install template

Imo this is quite serious breakage - neither apify run nor apify run -p works. Downgrading the CLI to 0.19.5 fixes this issue, but the store is still not purged by apify run (without -p).


Crawlee 3.10.5, cli 0.20.0
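
The same kind of check works for the default key-value store (a sketch using the standard KeyValueStore API, not code from the project template):

const kvs = await Actor.openKeyValueStore();
await kvs.forEachKey(async (key) => console.log('leftover key:', key));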

mvolfik (Collaborator) commented Jul 4, 2024

> FYI @tugkan you should no longer need the -p flag, in fact it wasn't needed since v3, crawlee purges the default storages automatically now.

let CRAWLEE_PURGE_ON_START = '0';
// Purge stores
// TODO: this needs to be cleaned up heavily - ideally logic should be in the project analyzers
if (this.flags.purge) {
    switch (projectType) {
        case PROJECT_TYPES.PRE_CRAWLEE_APIFY_SDK: {
            await Promise.all([purgeDefaultQueue(), purgeDefaultKeyValueStore(), purgeDefaultDataset()]);
            info({ message: 'All default local stores were purged.' });
            break;
        }
        case PROJECT_TYPES.CRAWLEE:
        default: {
            CRAWLEE_PURGE_ON_START = '1';
        }
    }
}

This really does sound like the default is no-purge unless the purge flag is enabled.

vladfrangu (Member) commented

That is correct right now, and it will be cleaned up properly once #590 is done.

mvolfik (Collaborator) commented Jul 4, 2024

Aha, I see (now I vaguely remember I already reported something about that).

But if this is the case, then let's not reply to issues saying that -p is not needed because purging is the default behavior, when that is currently not true.

So until #591 is released, the two options that work as they should (and will stay the same after #590) are:

Is that correct?

B4nan (Member) commented Jul 4, 2024

> or running env APIFY_TOKEN= npm start?

I think it needs to be APIFY_PROXY_PASSWORD, not just the token, but I might be wrong about this bit. I know it does work with the proxy password; that's how I've been running things since we shipped crawlee.

Btw, I was saying -p is not needed because that is how it was supposed to work for quite some time, but as you pointed out, that is not the case. We plan to change this behavior (to align it with how crawlee works) and ship 0.21 next week, where it will work this way.

The apify run -p issue will be fixed in a few minutes in v0.20.1.
