community: add MarkItDown document loader #28960
Draft
Description:
This pull request introduces a new document loader, MarkItDownLoader, to the langchain_community library. The changes include adding the new loader to the appropriate initialization and attribute handling files, implementing the loader class, and creating a test for the loader.

New document loader implementation:
- libs/community/langchain_community/document_loaders/__init__.py: Added import and registration for MarkItDownLoader in the document loaders module. [1] [2] [3]
- libs/community/langchain_community/document_loaders/markitdown.py: Implemented the MarkItDownLoader class, which uses the MarkItDown library to load files as Markdown documents.

Testing:
- libs/community/tests/integration_tests/document_loaders/test_markitdown_loader.py: Added an integration test for the MarkItDownLoader to ensure it correctly loads and processes documents.

Issue: Discussion #28958
Dependencies: markitdown
TODO:
docs/docs/integrations directory.
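
For reviewers who have not used MarkItDown, here is a rough sketch of the general shape such a loader could take. It is not the code in this PR. The class name MarkItDownLoaderSketch, the converter parameter, and the local Document stand-in (used in place of langchain_core.documents.Document so the snippet runs without extra installs) are all assumptions made for illustration. The only MarkItDown calls assumed are MarkItDown().convert(path) and the text_content attribute on its result.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List, Optional


@dataclass
class Document:
    """Stand-in for langchain_core.documents.Document (assumption for this sketch)."""

    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


class MarkItDownLoaderSketch:
    """Hypothetical loader that converts one file to a Markdown Document."""

    def __init__(self, file_path: str, converter: Optional[Any] = None) -> None:
        self.file_path = file_path
        # An injected converter makes the loader easy to test without markitdown.
        self._converter = converter

    def lazy_load(self) -> Iterator[Document]:
        converter = self._converter
        if converter is None:
            # Imported lazily so markitdown stays an optional dependency.
            from markitdown import MarkItDown

            converter = MarkItDown()
        result = converter.convert(self.file_path)
        yield Document(
            page_content=result.text_content,
            metadata={"source": self.file_path},
        )

    def load(self) -> List[Document]:
        return list(self.lazy_load())
```

With markitdown installed, MarkItDownLoaderSketch("report.docx").load() would return a single Document whose page_content holds the converted Markdown; "report.docx" is a placeholder path.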