
[Self-Host] Max retries local error #930

Open
aanokh opened this issue Nov 28, 2024 · 1 comment
Comments

aanokh commented Nov 28, 2024

I am trying to self-host Firecrawl, but I am running into a strange error. Here is my code:

from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="hi", api_url="http://localhost:3002")

# Crawl a website:
crawl_status = app.crawl_url(
  'https://www.scu.edu/engineering/', 
  params={
    'limit': 500, 
    'scrapeOptions': {'formats': ['markdown', 'html']}
  },
  poll_interval=100
)
print(crawl_status)

I get the following error after it runs for a bit:

urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='localhost', port=3002): Max retries exceeded with url: /v1/crawl/8db68f10-fbf2-4911-a806-8faa06f478d2?skip=221 (Caused by SSLError(SSLError(1, '[SSL] record layer failure (_ssl.c:1020)')))

If I decrease the limit to 100 pages, everything works fine. The error only appears on larger crawls, once the client starts following the paginated status URL. Any tips on how to fix this? Thank you!


mogery commented Dec 16, 2024

This is because the `next` URL returned by the crawl status API always uses `https`, even when the self-hosted instance is served over plain `http`. Working on it.
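Until a fix lands, one possible client-side workaround (a sketch, not part of the Firecrawl SDK) is to rewrite the scheme and host of the paginated `next` URL so it matches the `api_url` you configured. The helper name `fix_next_url` below is hypothetical:

```python
from urllib.parse import urlparse, urlunparse

def fix_next_url(next_url: str, api_url: str = "http://localhost:3002") -> str:
    """Rewrite the scheme and host of a paginated `next` URL to match
    the configured api_url, working around the hardcoded https scheme
    returned by the crawl status API."""
    base = urlparse(api_url)   # take scheme + host:port from the configured API URL
    nxt = urlparse(next_url)   # keep path, query, etc. from the returned next URL
    return urlunparse(
        (base.scheme, base.netloc, nxt.path, nxt.params, nxt.query, nxt.fragment)
    )
```

If you poll `/v1/crawl/<id>` yourself instead of relying on `crawl_url`'s built-in polling, you can pass each returned `next` URL through this helper before following it, so the request stays on `http://localhost:3002` instead of failing the TLS handshake.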

@mogery mogery self-assigned this Dec 16, 2024
@linear linear bot added the Improvement label Dec 16, 2024