add HTML page Loader #54
Comments
Not sure I understand. What is the use case?
I was hoping there was an HTML text extraction method so that metadata can be added before passing the document into the splitter service. I might have been thinking this was all baked into LangChain. If I'm wrong, by all means close this. The idea was to add supporting data / metadata to the HTML doc that's been beautifully souped, and then pass it on to be split after pre-processing via one of my own custom services.
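To make the idea concrete, here is a rough sketch of "extract text, attach metadata, then hand off to a splitter". It uses only Python's standard `html.parser` rather than any LangChain API, and the names `TextExtractor` and `html_to_document`, the dict shape, and the metadata keys are all made up for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def html_to_document(html, metadata):
    """Hypothetical helper: returns plain text plus caller-supplied metadata."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return {"text": "\n".join(parser.chunks), "metadata": dict(metadata)}


# Illustrative input; the metadata keys are assumptions, not a fixed schema.
doc = html_to_document(
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>Title</h1><p>Hello <b>world</b></p>"
    "<script>var x = 1;</script></body></html>",
    {"source": "crm-backend", "requires_login": True},
)
```

The resulting `doc` could then be passed to whatever splitter or custom pre-processing service comes next; the point is only that metadata is attached before splitting.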
OK, I have a use case for this. There are two types of website. For static HTML, you can add web URLs and loaders, and all is fine. Some dynamic pages, though, use Ajax or scripts that run to generate content, or require login. This second type doesn't render pages immediately, and quite a few websites out there work this way (CRM backends, Salesforce, Ajax, etc.). So the pages show, but there is a delay in rendering the content in that page. The point is this
Just sharing my thoughts with everyone!
In langchain,
This will cover the situation of dynamically rendered pages, rather than assuming all pages are static, which the webLoader does. Many thanks.
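For the delayed-rendering case, one possible approach is to poll until the content actually appears before handing the HTML to a loader. The sketch below is not tied to any library: `fetch_html` is a hypothetical callable that returns the current page markup (for instance from a headless browser session), and `is_ready` is a hypothetical predicate that decides whether the content has rendered.

```python
import time


def wait_for_content(fetch_html, is_ready, timeout=10.0, interval=0.5):
    """Poll fetch_html() until is_ready(html) is true or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        html = fetch_html()
        if is_ready(html):
            return html
        if time.monotonic() >= deadline:
            raise TimeoutError("page content did not render in time")
        time.sleep(interval)


# Simulated page that only fills in its content on the third look.
snapshots = iter([
    "<div id='app'></div>",
    "<div id='app'></div>",
    "<div id='app'><p>Account list</p></div>",
])
rendered = wait_for_content(
    lambda: next(snapshots),
    lambda html: "<p>" in html,
    timeout=2.0,
    interval=0.01,
)
```

Once `rendered` holds the fully populated markup, it can be passed on as plain HTML content instead of a URL, so the loader no longer needs to assume the page is static.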
Not the same as adding a WebLoader, but LangChain has this hook for passing in HTML page content and some other settings. Under certain conditions this could be very useful for customizing the HTML content or pre-processing it before passing it to the loader.