
[Self-Host] CPU and memory usage climbs very high, and the resources are not released after the crawl task stops #722

Open
zhiweijie opened this issue Oct 1, 2024 · 2 comments

@zhiweijie

Describe the Issue
I deployed the V1 version on a VPS. It ran well at first, but after a while the system resources became overloaded and the firecrawl worker stopped accepting crawl tasks. I have to restart the container to make it work normally again. Is there something wrong with my configuration?

To Reproduce
Steps to reproduce the issue:

  1. Start the container
  2. Run about 30 crawl tasks in succession
  3. Server memory usage climbs from 13% to 85%
  4. Memory does not return to 13% after the crawl jobs finish

Expected Behavior
2 CPU cores and 4 GB of RAM should be sufficient for this workload, and memory should be released once the crawl jobs complete.

Screenshots
(Screenshot attached: CleanShot 2024-10-01 at 10.18.27@2x)

Environment (please complete the following information):

  • OS: Debian OS 10
  • Firecrawl Version: V1.0.0
  • Node.js Version: -
  • Docker Version (if applicable): 26.1.4
  • Database Type and Version: --

Configuration
Only the default ports were changed; everything else is the default configuration.

```yaml
name: firecrawl

x-common-service: &common-service
  build: apps/api
  networks:
    - backend
  environment:
    - REDIS_URL=${REDIS_URL:-redis://redis:6379}
    - REDIS_RATE_LIMIT_URL=${REDIS_URL:-redis://redis:6379}
    - PLAYWRIGHT_MICROSERVICE_URL=${PLAYWRIGHT_MICROSERVICE_URL:-http://playwright-service:8721}
    - USE_DB_AUTHENTICATION=${USE_DB_AUTHENTICATION}
    - PORT=${PORT:-8722}
    - NUM_WORKERS_PER_QUEUE=${NUM_WORKERS_PER_QUEUE}
    - OPENAI_API_KEY=${OPENAI_API_KEY}
    - OPENAI_BASE_URL=${OPENAI_BASE_URL}
    - MODEL_NAME=${MODEL_NAME:-gpt-4o}
    - SLACK_WEBHOOK_URL=${SLACK_WEBHOOK_URL}
    - LLAMAPARSE_API_KEY=${LLAMAPARSE_API_KEY}
    - LOGTAIL_KEY=${LOGTAIL_KEY}
    - BULL_AUTH_KEY=${BULL_AUTH_KEY}
    - TEST_API_KEY=${TEST_API_KEY}
    - POSTHOG_API_KEY=${POSTHOG_API_KEY}
    - POSTHOG_HOST=${POSTHOG_HOST}
    - SUPABASE_ANON_TOKEN=${SUPABASE_ANON_TOKEN}
    - SUPABASE_URL=${SUPABASE_URL}
    - SUPABASE_SERVICE_TOKEN=${SUPABASE_SERVICE_TOKEN}
    - SCRAPING_BEE_API_KEY=${SCRAPING_BEE_API_KEY}
    - HOST=${HOST:-0.0.0.0}
    - SELF_HOSTED_WEBHOOK_URL=${SELF_HOSTED_WEBHOOK_URL}
    - LOGGING_LEVEL=${LOGGING_LEVEL}
  extra_hosts:
    - "host.docker.internal:host-gateway"

services:
  playwright-service:
    build: apps/playwright-service
    environment:
      - PORT=8721
      - PROXY_SERVER=${PROXY_SERVER}
      - PROXY_USERNAME=${PROXY_USERNAME}
      - PROXY_PASSWORD=${PROXY_PASSWORD}
      - BLOCK_MEDIA=${BLOCK_MEDIA}
    networks:
      - backend

  api:
    <<: *common-service
    depends_on:
      - redis
      - playwright-service
    ports:
      - "8722:8722"
    command: [ "pnpm", "run", "start:production" ]

  worker:
    <<: *common-service
    depends_on:
      - redis
      - playwright-service
      - api
    command: [ "pnpm", "run", "workers" ]

  redis:
    image: redis:alpine
    networks:
      - backend
    command: redis-server --bind 0.0.0.0

networks:
  backend:
    driver: bridge
```
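A mitigation worth trying (an assumption on my side, not a confirmed fix) is to cap memory per service, so that a leaking browser container gets OOM-killed and restarted instead of starving the whole VPS. With the Docker Compose plugin this could be sketched as:

```yaml
# Hypothetical overrides -- the limit values are guesses sized for a 2-core / 4 GB VPS.
services:
  playwright-service:
    mem_limit: 1536m        # restart the browser container before it exhausts the host
    restart: unless-stopped
  worker:
    mem_limit: 1024m
    restart: unless-stopped
```

With `restart: unless-stopped`, a container killed at its memory limit comes back automatically, which at least avoids the manual restarts described above.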

@nickscamara
Member

@mogery can you look into this when you get a chance?

@timkley

timkley commented Dec 16, 2024

We use a similar setup; Playwright seems to hog all the resources sometimes (this has only happened to me when using the crawl endpoint). I can see about 20 processes in htop, but they never seem to finish. Could a missing timeout be causing the browsers to never terminate?

Our VPS also has 2 vCPUs and 4 GB of RAM, and the setup looks nearly identical to OP's.

Let me know if additional logs would be helpful.
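If orphaned browser processes are the suspect, a quick check is to count Chromium-like processes on the host and sum their resident memory. This is only a diagnostic sketch; the `chrom` pattern is an assumption about which browser binary the Playwright service actually launches:

```shell
# Count Chromium-like processes and sum their resident memory (RSS, in KB).
# The "chrom" pattern is a guess -- adjust it to the browser Playwright uses.
count=$(ps -eo comm | awk '/chrom/ {n++} END {print n + 0}')
total_kb=$(ps -eo rss,comm | awk '/chrom/ {sum += $1} END {print sum + 0}')
echo "browser processes: ${count}, total RSS: ${total_kb} KB"
```

If the count keeps growing after crawls finish, a missing browser or page timeout is a plausible cause.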
