Bug: FireCrawlLoader - Got exception due to failed crawl job but it was indeed a success #27063

Open
5 tasks done
bytrangle opened this issue Oct 3, 2024 · 2 comments
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature

Comments

@bytrangle

bytrangle commented Oct 3, 2024

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

from langchain_community.document_loaders import FireCrawlLoader
loader = FireCrawlLoader(
  url="https://firecrawl.dev",
  mode="crawl",
)
docs = loader.load()
docs[0]

I have set the FIRECRAWL_API_KEY environment variable.

Error Message and Stack Trace (if applicable)

Traceback (most recent call last):
  File "/home/thoa/Documents/dev/demos/firecrawl/chat-with-website.py", line 23, in <module>
    for doc in docs_lazy:
  File "/home/thoa/.local/lib/python3.10/site-packages/langchain_community/document_loaders/firecrawl.py", line 112, in lazy_load
    firecrawl_docs = self.firecrawl.crawl_url(self.url, params=self.params)
  File "/home/thoa/.local/lib/python3.10/site-packages/firecrawl/firecrawl.py", line 133, in crawl_url
    return self._monitor_job_status(id, headers, poll_interval)
  File "/home/thoa/.local/lib/python3.10/site-packages/firecrawl/firecrawl.py", line 360, in _monitor_job_status
    raise Exception(f'Crawl job failed or was stopped. Status: {status_data["status"]}')
Exception: Crawl job failed or was stopped. Status: failed

Description

I'm trying to use FireCrawlLoader to crawl a website. I should get printed output like:

Document(metadata={'ogUrl': 'https://www.firecrawl.dev/', 'title': 'Home - Firecrawl', 'robots': 'follow, index', 'ogImage': 'https://www.firecrawl.dev/og.png?123', 'ogTitle': 'Firecrawl', 'sitemap': {'lastmod': '2024-08-12T00:28:16.681Z', 'changefreq': 'weekly'}, 'keywords': 'Firecrawl,Markdown,Data,Mendable,Langchain', 'sourceURL': 'https://www.firecrawl.dev/', 'ogSiteName': 'Firecrawl', 'description': 'Firecrawl crawls and converts any website into clean markdown.' ...)

Instead, I got an error saying that the crawl job failed or was stopped, but when I checked the Activity Logs in FireCrawl, the crawl had succeeded.

The error can be traced to the function _monitor_job_status in FireCrawl's Python SDK. I'm not sure whether the bug is in LangChain's FireCrawl integration or in FireCrawl's Python SDK.

System Info

System Information

OS: Linux
OS Version: #62-Ubuntu SMP Tue Nov 22 19:54:14 UTC 2022
Python Version: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]

Package Information

langchain_core: 0.3.6
langchain: 0.3.1
langchain_community: 0.3.1
langsmith: 0.1.129
langchain_text_splitters: 0.3.0

Optional packages not installed

langgraph
langserve

Other Dependencies

aiohttp: 3.10.6
async-timeout: 4.0.3
dataclasses-json: 0.6.7
httpx: 0.27.2
jsonpatch: 1.33
numpy: 1.26.4
orjson: 3.10.7
packaging: 24.1
pydantic: 2.9.2
pydantic-settings: 2.5.2
PyYAML: 5.4.1
requests: 2.25.1
SQLAlchemy: 2.0.35
tenacity: 8.5.0
typing-extensions: 4.12.2

@dosubot dosubot bot added the 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature label Oct 3, 2024
@FarhanChowdhury248

@bytrangle I believe this issue is due to mendableai/firecrawl#720. You should be able to resolve it by using the workaround provided in the discussion or updating to include the associated PR fix. The former works in my case.
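
Before deciding between the workaround and an upgrade, it may help to confirm which versions are actually installed. The snippet below is a minimal sketch using only the standard library. It assumes the relevant PyPI distribution names are `firecrawl-py` and `langchain-community`; the helper `installed_versions` is a hypothetical name used here for illustration. Which versions contain the fix is not stated in this thread, so compare the output against the linked issue and PR.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed distribution names; adjust if your environment differs.
    found = installed_versions(["firecrawl-py", "langchain-community"])
    for name, version in found.items():
        print(f"{name}: {version or 'not installed'}")
```

Run it with the same interpreter that runs the failing script (Python 3.10 in the report above), so the versions shown match what the loader actually imports.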

@rafaelsideguide
Contributor

Closed by #26548
