Playwright requires installation via npx playwright install #2275

Open
1 task
mnmkng opened this issue Jan 8, 2024 · 18 comments
Labels
bug Something isn't working. t-tooling Issues with this label are in the ownership of the tooling team.

Comments

@mnmkng
Member

mnmkng commented Jan 8, 2024

Which package is this bug report for? If unsure which one to select, leave blank

@crawlee/playwright (PlaywrightCrawler)

Issue description

When you install a project with the following package.json, it fails on first start and asks you to run npx playwright install.

It's not a great first experience to get a huge error on first run, so we should either:

  1. ensure that Playwright browsers are installed together with @crawlee/playwright or
  2. document everywhere, most importantly on the Crawlee homepage, that this command needs to be run before Playwright can be started.

To reproduce this, you will likely first need to run npx playwright uninstall to get into a "new user state".

This probably also impacts all our CLI templates.

Code sample

{
    "name": "my-module",
    "version": "0.0.1",
    "dependencies": {
        "crawlee": "^3.0.0",
        "playwright": "*"
    },
    "type": "module",
    "scripts": {
        "start": "node main.js"
    },
    "author": "Me!"
}

Package version

3.7.1

Node.js version

v18.12.1

Operating system

MacOS

Apify platform

  • Tick me if you encountered this issue on the Apify platform

I have tested this on the next release

no

Other context

No response

@mnmkng mnmkng added the bug Something isn't working. label Jan 8, 2024
@B4nan
Member

B4nan commented Jan 8, 2024

So you can reproduce this from some template, or by installing crawlee into an empty project? Because the templates are working fine on my end.

I believe the browsers are installed via postinstall hook nowadays, cc @vladfrangu

@vladfrangu
Member

vladfrangu commented Jan 8, 2024

Yep, both apify and crawlee templates have a postinstall hook (that also ensures it won't run in our docker images, but will run everywhere else)

We should probably document the CLI command to users who are upgrading to newer playwright or are making new projects without our CLI. Could even just make a command in CLI to auto fix old projects (npx crawlee migrate-new-playwright?)

@mnmkng
Member Author

mnmkng commented Jan 8, 2024

Hmm, probably not worth introducing a new command just to wrap an existing playwright command that's documented in the error.

So all the "default" and "new user" paths of installing crawlee are covered with this then? And I was just unlucky because I reinstalled an old project?

@vladfrangu
Member

This is fixed for any users who create their project via apify create or crawlee create. Otherwise, the postinstall hook needs to be added to the project manually (which is why I suggested making a command for it, to automate it for users).
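For anyone patching an older project by hand, here is a minimal sketch of what adding such a hook could look like. The helper name addPostinstallHook is hypothetical, and the command is the one printed in Playwright's error message; the exact hook used by the apify and crawlee templates is not shown in this thread and may differ.

```javascript
// Hypothetical helper: returns a copy of a parsed package.json with a
// postinstall hook that downloads the Playwright browsers.
// Assumption: "npx playwright install" (the command from Playwright's error
// message) is used as the hook; the official templates may use another one.
function addPostinstallHook(pkg, command = "npx playwright install") {
  const scripts = { ...(pkg.scripts ?? {}) };
  const existing = scripts.postinstall;
  if (!existing) {
    // No hook yet: add ours.
    scripts.postinstall = command;
  } else if (!existing.includes(command)) {
    // Keep whatever hook is already there and run ours afterwards.
    scripts.postinstall = `${existing} && ${command}`;
  }
  return { ...pkg, scripts };
}

// The package.json from the issue description, reduced to the relevant fields.
const pkg = {
  name: "my-module",
  dependencies: { crawlee: "^3.0.0", playwright: "*" },
  scripts: { start: "node main.js" },
};

const patched = addPostinstallHook(pkg);
console.log(JSON.stringify(patched.scripts, null, 2));
```

The helper leaves the input object untouched and is safe to run twice, so it could be applied to a project without first checking whether the hook is already present.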

@B4nan
Member

B4nan commented Jan 8, 2024

Can't we have it on the @crawlee/playwright package?

@vladfrangu
Member

Well...we install the package all the time, so running the command when people don't use playwright isn't ideal either... Not sure what the best solution is

@B4nan
Member

B4nan commented Jan 8, 2024

Hmm but in the end, we want this to work with the crawlee package too, same for puppeteer. The browsers used to be installed before too, right?

@B4nan
Member

B4nan commented Jan 8, 2024

Can we have some env var to skip the downloads in the postinstall script? I'd probably just install them all the time and allow opting out, that was the previous behavior before all this mess happened.
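A rough sketch of the "install by default, allow opting out" idea, assuming a postinstall script written in JavaScript. The variable name CRAWLEE_SKIP_BROWSER_INSTALL and both function names are hypothetical and only illustrate the shape of the check; nothing here is an existing Crawlee option.

```javascript
// Decide whether the postinstall script should download browsers.
// CRAWLEE_SKIP_BROWSER_INSTALL is a hypothetical variable name.
function shouldInstallBrowsers(env) {
  const flag = (env.CRAWLEE_SKIP_BROWSER_INSTALL ?? "").trim().toLowerCase();
  return !["1", "true", "yes"].includes(flag);
}

// Run the install command unless the user opted out.
// `run` is injected so the decision logic can be exercised without
// actually downloading anything.
function runPostinstall(env, run) {
  if (!shouldInstallBrowsers(env)) {
    return "skipped";
  }
  run("npx playwright install");
  return "installed";
}

// In a real postinstall script the runner could be wired up like this:
//   import { execSync } from "node:child_process";
//   runPostinstall(process.env, (cmd) => execSync(cmd, { stdio: "inherit" }));

// Dry run with a runner that only records the command.
const commands = [];
console.log(runPostinstall({}, (cmd) => commands.push(cmd)));
console.log(runPostinstall({ CRAWLEE_SKIP_BROWSER_INSTALL: "1" }, (cmd) => commands.push(cmd)));
```

Defaulting to "install" keeps the first-run experience working, while Docker images or CI jobs that already ship browsers can set the variable to skip the download.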

@B4nan B4nan added the t-tooling Issues with this label are in the ownership of the tooling team. label Jan 15, 2024
@RowanAldean

I am getting the dreaded:

╔═════════════════════════════════════════════════════════════════════════╗
║ Looks like Playwright Test or Playwright was just installed or updated. ║
║ Please run the following command to download new browsers:              ║
║                                                                         ║
║     npx playwright install                                              ║
║                                                                         ║
║ <3 Playwright Team                                                      ║
╚═════════════════════════════════════════════════════════════════════════╝

The only code change was adding a new express route. I also defined 2 request queues after skimming some posts online. Running locally, everything works as expected, but this issue is occurring on GCP Cloud Run.

I am using apify/actor-node-playwright-chrome:18 in my Dockerfile.

My logs show this error:

browserType.launchPersistentContext: Executable doesn't exist at /home/myuser/pw-browsers/chromium-1091/chrome-linux/chrome

and having pulled down the image and running locally via docker I can confirm that the only browsers present in pw-browsers are the following:

# cd pw-browsers
# ls
chrome  chromium-1097  ffmpeg-1009
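The mismatch between the build the error asks for and the builds in the image can be spelled out with a small check. This is only an illustrative sketch: the function names are hypothetical, and the inputs are the error line and directory listing quoted in this comment.

```javascript
// Extract the browser build directory that Playwright tried to launch,
// e.g. "chromium-1091", from an "Executable doesn't exist at ..." message.
function expectedBrowserDir(errorMessage) {
  const match = errorMessage.match(/pw-browsers\/([^/\s]+)\//);
  return match ? match[1] : null;
}

// Compare the expected build with the directories actually present.
function diagnose(errorMessage, installedDirs) {
  const expected = expectedBrowserDir(errorMessage);
  if (expected === null) {
    return { expected: null, present: false, otherBuilds: [] };
  }
  // "chromium-1091" -> "chromium-"; matches other builds of the same browser.
  const prefix = expected.slice(0, expected.lastIndexOf("-") + 1);
  return {
    expected,
    present: installedDirs.includes(expected),
    otherBuilds: installedDirs.filter((dir) => dir.startsWith(prefix) && dir !== expected),
  };
}

const message =
  "browserType.launchPersistentContext: Executable doesn't exist at " +
  "/home/myuser/pw-browsers/chromium-1091/chrome-linux/chrome";
const result = diagnose(message, ["chrome", "chromium-1097", "ffmpeg-1009"]);
console.log(result);
// { expected: 'chromium-1091', present: false, otherBuilds: [ 'chromium-1097' ] }
```

A different Chromium build being present suggests the installed playwright package and the image were built against different Playwright versions, rather than the browsers being missing outright.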

New route:

app.get("/lemon", async (req, res, message) => {
  
  const targetLink = req.query.link;
  if (!targetLink) {
    throw new Error('The link query parameter is required in order to know which lemon to crawl.');
  }

  const startUrl = `${targetLink}`;
  console.log(`We've received the lemon to crawl as: ${startUrl}`);


  const crawler = new PlaywrightCrawler(
    {
      requestHandler: router.getHandler('TANGY_LEMON'),
      minConcurrency: 5,
      requestQueue: lemonRequestQueue
    },
    new Configuration({
      persistStorage: false,
    })
  );

  await crawler.run([startUrl]);

  const crawlerOutput = await crawler.getData();  

  return res.send(crawlerOutput);
});

Any advice on how to resolve or if this is unrelated would be amazing. Before I had simply followed the documentation instructions with a top-level express.js route. I am using a specific handler needed only for lemon as the top-level route is scraping a more broad tree of pages where the final outcome is lemon but I need to be able to request a specific crawl of a lemon using my route. Also please don't bully the choice of a query param here, quick & lazy was the thought.

@vladfrangu
Member

Sounds like your playwright version doesn't match the one we use when building images. You should specify it in the image version tag (so you'd have apify/actor-node-playwright-chrome:18-1.40.0 for playwright 1.40.0, as an example). That should solve the issue, but please follow up if it doesn't.
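As a sketch of that check, assuming the tag layout shown in this comment (Node.js version, then an optional dash and Playwright version). The function names are hypothetical, and "1.45.0" is a made-up installed version used only to show a mismatch.

```javascript
// Split an image reference such as
// "apify/actor-node-playwright-chrome:18-1.40.0" into its tag parts.
// Assumption: the tag is "<node>" or "<node>-<playwright>".
function parseImageTag(image) {
  const colon = image.lastIndexOf(":");
  const tag = colon === -1 ? "" : image.slice(colon + 1);
  const dash = tag.indexOf("-");
  return dash === -1
    ? { node: tag, playwright: null }
    : { node: tag.slice(0, dash), playwright: tag.slice(dash + 1) };
}

// Report whether the image pins a Playwright version and whether it
// matches the version installed in node_modules.
function checkImage(image, installedPlaywright) {
  const { playwright } = parseImageTag(image);
  if (playwright === null) return "unpinned";
  return playwright === installedPlaywright ? "match" : "mismatch";
}

console.log(checkImage("apify/actor-node-playwright-chrome:18", "1.40.0"));        // unpinned
console.log(checkImage("apify/actor-node-playwright-chrome:18-1.40.0", "1.40.0")); // match
console.log(checkImage("apify/actor-node-playwright-chrome:18-1.40.0", "1.45.0")); // mismatch
```

An "unpinned" result means the image carries whichever Playwright version was current when it was built, so the browsers inside it can drift away from the package version in the project.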

@RowanAldean

In fairness, I was using a wildcard for the playwright version in my package.json - I fixed it to ^1.40.0 as is the case for @playwright/test and still not resolved :(

Error message is the same as above regarding missing browser chromium-1091

@vladfrangu
Member

If you use a range like that it'll still install the latest version that matches, you'd need to either use ~ for the range or a fixed version 😅
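To illustrate the difference between the range styles, here is a deliberately simplified matcher. It is not the real semver implementation: it only handles "*", "^x.y.z", "~x.y.z" and exact versions, assumes a major version of 1 or higher, and ignores pre-release tags. The version list is illustrative.

```javascript
// Simplified semver range check (see the assumptions above).
function parse(version) {
  return version.split(".").map(Number);
}

function satisfies(version, range) {
  if (range === "*") return true;
  const op = range[0] === "^" || range[0] === "~" ? range[0] : "";
  const [bMajor, bMinor, bPatch] = parse(range.slice(op.length));
  const [major, minor, patch] = parse(version);
  const atLeastBase =
    major > bMajor ||
    (major === bMajor && (minor > bMinor || (minor === bMinor && patch >= bPatch)));
  // ^1.40.0 allows any 1.x.y at or above 1.40.0.
  if (op === "^") return major === bMajor && atLeastBase;
  // ~1.40.0 allows only 1.40.x at or above 1.40.0.
  if (op === "~") return major === bMajor && minor === bMinor && patch >= bPatch;
  // No operator: exact match only.
  return major === bMajor && minor === bMinor && patch === bPatch;
}

// Illustrative list of versions a registry might offer.
const available = ["1.40.0", "1.40.1", "1.41.2", "1.45.0"];
for (const range of ["*", "^1.40.0", "~1.40.0", "1.40.0"]) {
  const matching = available.filter((v) => satisfies(v, range));
  console.log(range, "->", matching.join(", "));
}
```

With this list, both "*" and "^1.40.0" accept 1.45.0, "~1.40.0" stops at 1.40.1, and only the exact version guarantees the same Playwright release as an image tagged 18-1.40.0.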

If you're able to make a reproducible sample in a repository that'd help a bunch too!

@RowanAldean

I will move this to the relevant thread, having narrowed the problem down to the Dockerfile, or at least to this element of my pipeline.

Confirmed by simply rebuilding and redeploying an unchanged project (i.e. expected to be equivalent to a rollback): I still get the same error about the missing chromium-1091 browser. I will now try pinning the playwright version, using the latest apify docker image, or both (please don't make me create and serve my own base image... such overkill, as suggested in the above thread by another user).

@hengliu0919

I am getting the same error:

2024-06-30T20:00:31.110Z Error occurred browserType.launch: Executable doesn't exist at /home/myuser/pw-browsers/chromium-1117/chrome-linux/chrome
2024-06-30T20:00:31.112Z ╔═════════════════════════════════════════════════════════════════════════╗
2024-06-30T20:00:31.114Z ║ Looks like Playwright Test or Playwright was just installed or updated. ║
2024-06-30T20:00:31.115Z ║ Please run the following command to download new browsers:              ║
2024-06-30T20:00:31.117Z ║                                                                         ║
2024-06-30T20:00:31.120Z ║     npx playwright install                                              ║
2024-06-30T20:00:31.121Z ║                                                                         ║
2024-06-30T20:00:31.123Z ║ <3 Playwright Team                                                      ║
2024-06-30T20:00:31.125Z ╚═════════════════════════════════════════════════════════════════════════╝
2024-06-30T20:00:31.126Z     at scheduleLadder (/home/myuser/dist/main.js:295:34)
2024-06-30T20:00:31.128Z     at main (/home/myuser/dist/main.js:375:26)
2024-06-30T20:00:31.130Z     at /home/myuser/async file:/home/myuser/dist/main.js:378:1 {
2024-06-30T20:00:31.133Z   name: 'Error'
2024-06-30T20:00:31.135Z }

@n-sviridenko

Same here

@al6x

al6x commented Nov 21, 2024

Same problem. I installed crawlee with bun install crawlee playwright, and it fails with this error.

@mnmkng
Member Author

mnmkng commented Dec 30, 2024

@B4nan should we close this, since it's been solved and a recurrence means a wrong configuration, or is this still an issue?

@B4nan
Member

B4nan commented Dec 30, 2024

I guess we are still missing proper docs around this, so let's keep it open for a bit more. I also wanted to pin the browser versions in docker tags in templates and polish all of that finally.

7 participants