DiscovAI Crawl API 🕷️🔍

One API to scrape everything you need from URLs for your AI tool and vector database.

🚧 Work in Progress 🚧

🌟 Features

Given a URL, the API extracts and processes the following:

🧼 Clean HTML (JavaScript and CSS removed)
📝 LLM-friendly Markdown conversion
🚫 Ad-free, cookie banner-free, and dialog-free content
📸 Website screenshots (auto-saved to AWS S3 or Cloudflare R2)
🤖 LLM-generated SEO-friendly content
🔑 LLM-extracted key information (summary, features, FAQs, etc.)
🧠 Ready-to-use embeddings for vector database integration (auto-saved to the database)
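
As an illustration of what the embeddings are for, vector databases typically rank results by a similarity score between vectors. The sketch below computes cosine similarity between two embedding arrays. It assumes embeddings are plain arrays of numbers of equal length; the function name `cosineSimilarity` is hypothetical and not part of this API.

```typescript
// Minimal sketch: cosine similarity between two embedding vectors.
// Assumption: embeddings are flat number arrays of the same length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("Vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat its similarity as 0.
  if (normA === 0 || normB === 0) {
    return 0;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

In practice a vector database performs this comparison internally; the function is only useful for quick local checks on returned embeddings.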

🔧 Installation

pnpm i
cd apps/api && pnpm exec playwright install

🚀 Usage

pnpm dev
open http://localhost:3000

📦 API Response Structure

{
  "clean_html": "...",
  "LLM_friendly_markdown": "...",
  "clean_text": "...",
  "screenshot_url": "...",
  "llm_extracts_key_info": {
    "what": "...",
    "summary": "...",
    "features": ["...", "..."],
    "faqs": [{"q": "...", "a": "..."}]
  },
  "llm_summarized_detail": "...",
  "embeddings": [...]
}
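
For TypeScript clients, the response shape above could be described with types along these lines. This is a sketch, not a published schema: the field names are copied from the sample response, while the type names (`CrawlResponse`, `KeyInfo`, `FaqEntry`), the helper `isCrawlResponse`, and the assumption that `embeddings` is an array of numbers are illustrative.

```typescript
// Hypothetical type names; field names mirror the sample response.
interface FaqEntry {
  q: string;
  a: string;
}

interface KeyInfo {
  what: string;
  summary: string;
  features: string[];
  faqs: FaqEntry[];
}

interface CrawlResponse {
  clean_html: string;
  LLM_friendly_markdown: string;
  clean_text: string;
  screenshot_url: string;
  llm_extracts_key_info: KeyInfo;
  llm_summarized_detail: string;
  embeddings: number[]; // assumption: a flat array of numbers
}

// Shallow runtime check that a parsed JSON value has the expected top-level fields.
function isCrawlResponse(value: unknown): value is CrawlResponse {
  if (typeof value !== "object" || value === null) {
    return false;
  }
  const v = value as Record<string, unknown>;
  const stringKeys = [
    "clean_html",
    "LLM_friendly_markdown",
    "clean_text",
    "screenshot_url",
    "llm_summarized_detail",
  ];
  if (!stringKeys.every((key) => typeof v[key] === "string")) {
    return false;
  }
  const info = v["llm_extracts_key_info"];
  if (typeof info !== "object" || info === null) {
    return false;
  }
  return Array.isArray(v["embeddings"]);
}
```

The guard only checks top-level fields, so nested values such as `faqs` entries should be validated separately if the client depends on them.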

📚 Documentation

TODO

🤝 Contributing

TODO
