Commit
fix: elasticsearch (langchain-ai#2402)
- Create a new docker-compose file to start an Elasticsearch instance for integration tests.
- Add new tests to `test_elasticsearch.py` to verify Elasticsearch functionality.
- Include an optional group `test_integration` in the `pyproject.toml` file. This group contains the dependencies for integration tests and can be installed with `poetry install --with test_integration`. Any new dependencies should be added by running `poetry add some_new_deps --group "test_integration"`. A sketch of what this group might look like follows below.

Note: the new tests run in live mode, exercising the OpenAI API end to end. In the future, adding `pytest-vcr` to record and replay all API requests would be a nice addition to the testing process (a sketch appears at the end of this page). More info: https://pytest-vcr.readthedocs.io/en/latest/

Fixes langchain-ai#2386
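The `pyproject.toml` diff itself is not rendered on this page, so as a rough sketch, an optional Poetry group of the kind the commit message describes would look something like this (the group name matches the commit message; the dependency names and versions here are illustrative assumptions, not taken from the commit):

# pyproject.toml -- hypothetical sketch; dependencies shown are illustrative only
[tool.poetry.group.test_integration]
optional = true

[tool.poetry.group.test_integration.dependencies]
# clients needed only by the integration tests
elasticsearch = "^8.7.0"
openai = "^0.27.0"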
Showing 7 changed files with 186 additions and 34 deletions.
@@ -141,3 +141,4 @@ wandb/

# asdf tool versions
.tool-versions
/.ruff_cache/
Some generated files are not rendered by default.
tests/integration_tests/vectorstores/docker-compose/elasticsearch.yml (new file, 30 additions)
version: "3" | ||
|
||
services: | ||
elasticsearch: | ||
image: docker.elastic.co/elasticsearch/elasticsearch:8.7.0 | ||
environment: | ||
- discovery.type=single-node | ||
- xpack.security.enabled=false | ||
- xpack.security.http.ssl.enabled=false | ||
- ELASTIC_PASSWORD=password | ||
ports: | ||
- "9200:9200" | ||
healthcheck: | ||
test: [ "CMD-SHELL", "curl --silent --fail http://localhost:9200/_cluster/health || exit 1" ] | ||
interval: 1s | ||
retries: 360 | ||
|
||
kibana: | ||
image: docker.elastic.co/kibana/kibana:8.7.0 | ||
environment: | ||
- ELASTICSEARCH_URL=http://elasticsearch:9200 | ||
- ELASTICSEARCH_USERNAME=kibana_system | ||
- ELASTICSEARCH_PASSWORD=password | ||
- KIBANA_PASSWORD=password | ||
ports: | ||
- "5601:5601" | ||
healthcheck: | ||
test: [ "CMD-SHELL", "curl --silent --fail http://localhost:5601/login || exit 1" ] | ||
interval: 10s | ||
retries: 60 |
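To run the tests against this stack, start it in the background and wait for Elasticsearch to come up. A sketch of the workflow (the curl check simply mirrors the compose health check above, and the OPENAI_API_KEY value is a placeholder):

cd tests/integration_tests/vectorstores/docker-compose
docker-compose -f elasticsearch.yml up -d

# confirm Elasticsearch is healthy before running the tests
curl --silent --fail http://localhost:9200/_cluster/health

# then, from the repository root
poetry install --with test_integration
OPENAI_API_KEY=<your-key> pytest tests/integration_tests/vectorstores/test_elasticsearch.py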
tests/integration_tests/vectorstores/fixtures/sharks.txt (new file, 7 additions)
Sharks are a group of elasmobranch fish characterized by a cartilaginous skeleton, five to seven gill slits on the sides of the head, and pectoral fins that are not fused to the head. Modern sharks are classified within the clade Selachimorpha (or Selachii) and are the sister group to the Batoidea (rays and kin). Some sources extend the term "shark" as an informal category including extinct members of Chondrichthyes (cartilaginous fish) with a shark-like morphology, such as hybodonts and xenacanths. Shark-like chondrichthyans such as Cladoselache and Doliodus first appeared in the Devonian Period (419-359 Ma), though some fossilized chondrichthyan-like scales are as old as the Late Ordovician (458-444 Ma). The oldest modern sharks (selachians) are known from the Early Jurassic, about 200 Ma.

Sharks range in size from the small dwarf lanternshark (Etmopterus perryi), a deep sea species that is only 17 centimetres (6.7 in) in length, to the whale shark (Rhincodon typus), the largest fish in the world, which reaches approximately 12 metres (40 ft) in length. They are found in all seas and are common to depths up to 2,000 metres (6,600 ft). They generally do not live in freshwater, although there are a few known exceptions, such as the bull shark and the river shark, which can be found in both seawater and freshwater.[3] Sharks have a covering of dermal denticles that protects their skin from damage and parasites in addition to improving their fluid dynamics. They have numerous sets of replaceable teeth.

Several species are apex predators, which are organisms that are at the top of their food chain. Select examples include the tiger shark, blue shark, great white shark, mako shark, thresher shark, and hammerhead shark.

Sharks are caught by humans for shark meat or shark fin soup. Many shark populations are threatened by human activities. Since 1970, shark populations have been reduced by 71%, mostly from overfishing.
tests/integration_tests/vectorstores/test_elasticsearch.py (130 additions, 22 deletions)
"""Test ElasticSearch functionality.""" | ||
import logging | ||
import os | ||
from typing import Generator, List, Union | ||
|
||
import pytest | ||
from elasticsearch import Elasticsearch | ||
|
||
from langchain.docstore.document import Document | ||
from langchain.document_loaders import TextLoader | ||
from langchain.embeddings import OpenAIEmbeddings | ||
from langchain.text_splitter import CharacterTextSplitter | ||
from langchain.vectorstores.elastic_vector_search import ElasticVectorSearch | ||
from tests.integration_tests.vectorstores.fake_embeddings import FakeEmbeddings | ||
|
||
logging.basicConfig(level=logging.DEBUG) | ||
|
||
""" | ||
cd tests/integration_tests/vectorstores/docker-compose | ||
docker-compose -f elasticsearch.yml up | ||
""" | ||
|
||
|
||
class TestElasticsearch: | ||
@pytest.fixture(scope="class", autouse=True) | ||
def elasticsearch_url(self) -> Union[str, Generator[str, None, None]]: | ||
"""Return the elasticsearch url.""" | ||
url = "http://localhost:9200" | ||
yield url | ||
es = Elasticsearch(hosts=url) | ||
|
||
# Clear all indexes | ||
index_names = es.indices.get(index="_all").keys() | ||
for index_name in index_names: | ||
# print(index_name) | ||
es.indices.delete(index=index_name) | ||
|
||
@pytest.fixture(scope="class", autouse=True) | ||
def openai_api_key(self) -> Union[str, Generator[str, None, None]]: | ||
"""Return the OpenAI API key.""" | ||
openai_api_key = os.getenv("OPENAI_API_KEY") | ||
if not openai_api_key: | ||
raise ValueError("OPENAI_API_KEY environment variable is not set") | ||
|
||
yield openai_api_key | ||
|
||
@pytest.fixture(scope="class") | ||
def documents(self) -> Generator[List[Document], None, None]: | ||
"""Return a generator that yields a list of documents.""" | ||
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0) | ||
|
||
documents = TextLoader( | ||
os.path.join(os.path.dirname(__file__), "fixtures", "sharks.txt") | ||
).load() | ||
yield text_splitter.split_documents(documents) | ||
|
||
def test_similarity_search_without_metadata(self, elasticsearch_url: str) -> None: | ||
"""Test end to end construction and search without metadata.""" | ||
texts = ["foo", "bar", "baz"] | ||
docsearch = ElasticVectorSearch.from_texts( | ||
texts, FakeEmbeddings(), elasticsearch_url=elasticsearch_url | ||
) | ||
output = docsearch.similarity_search("foo", k=1) | ||
assert output == [Document(page_content="foo")] | ||
|
||
def test_similarity_search_with_metadata(self, elasticsearch_url: str) -> None: | ||
"""Test end to end construction and search with metadata.""" | ||
texts = ["foo", "bar", "baz"] | ||
metadatas = [{"page": i} for i in range(len(texts))] | ||
docsearch = ElasticVectorSearch.from_texts( | ||
texts, | ||
FakeEmbeddings(), | ||
metadatas=metadatas, | ||
elasticsearch_url=elasticsearch_url, | ||
) | ||
output = docsearch.similarity_search("foo", k=1) | ||
assert output == [Document(page_content="foo", metadata={"page": 0})] | ||
|
||
def test_default_index_from_documents( | ||
self, documents: List[Document], openai_api_key: str, elasticsearch_url: str | ||
) -> None: | ||
"""This test checks the construction of a default | ||
ElasticSearch index using the 'from_documents'.""" | ||
embedding = OpenAIEmbeddings(openai_api_key=openai_api_key) | ||
|
||
elastic_vector_search = ElasticVectorSearch.from_documents( | ||
documents=documents, | ||
embedding=embedding, | ||
elasticsearch_url=elasticsearch_url, | ||
) | ||
|
||
search_result = elastic_vector_search.similarity_search("sharks") | ||
|
||
print(search_result) | ||
assert len(search_result) != 0 | ||
|
||
def test_custom_index_from_documents( | ||
self, documents: List[Document], openai_api_key: str, elasticsearch_url: str | ||
) -> None: | ||
"""This test checks the construction of a custom | ||
ElasticSearch index using the 'from_documents'.""" | ||
embedding = OpenAIEmbeddings(openai_api_key=openai_api_key) | ||
elastic_vector_search = ElasticVectorSearch.from_documents( | ||
documents=documents, | ||
embedding=embedding, | ||
elasticsearch_url=elasticsearch_url, | ||
index_name="custom_index", | ||
) | ||
es = Elasticsearch(hosts=elasticsearch_url) | ||
index_names = es.indices.get(index="_all").keys() | ||
assert "custom_index" in index_names | ||
|
||
search_result = elastic_vector_search.similarity_search("sharks") | ||
print(search_result) | ||
|
||
assert len(search_result) != 0 | ||
|
||
def test_custom_index_add_documents( | ||
self, documents: List[Document], openai_api_key: str, elasticsearch_url: str | ||
) -> None: | ||
"""This test checks the construction of a custom | ||
ElasticSearch index using the 'add_documents'.""" | ||
embedding = OpenAIEmbeddings(openai_api_key=openai_api_key) | ||
elastic_vector_search = ElasticVectorSearch( | ||
embedding=embedding, | ||
elasticsearch_url=elasticsearch_url, | ||
index_name="custom_index", | ||
) | ||
es = Elasticsearch(hosts=elasticsearch_url) | ||
index_names = es.indices.get(index="_all").keys() | ||
assert "custom_index" in index_names | ||
|
||
elastic_vector_search.add_documents(documents) | ||
search_result = elastic_vector_search.similarity_search("sharks") | ||
print(search_result) | ||
|
||
Removed by this commit (the old module-level tests, superseded by the class-based tests above):

def test_elasticsearch() -> None:
    """Test end to end construction and search."""
    texts = ["foo", "bar", "baz"]
    docsearch = ElasticVectorSearch.from_texts(
        texts, FakeEmbeddings(), elasticsearch_url="http://localhost:9200"
    )
    output = docsearch.similarity_search("foo", k=1)
    assert output == [Document(page_content="foo")]


def test_elasticsearch_with_metadatas() -> None:
    """Test end to end construction and search."""
    texts = ["foo", "bar", "baz"]
    metadatas = [{"page": i} for i in range(len(texts))]
    docsearch = ElasticVectorSearch.from_texts(
        texts,
        FakeEmbeddings(),
        metadatas=metadatas,
        elasticsearch_url="http://localhost:9200",
    )
    output = docsearch.similarity_search("foo", k=1)
    assert output == [Document(page_content="foo", metadata={"page": 0})]
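The commit message mentions pytest-vcr as possible future work for recording and replaying the OpenAI requests. A minimal sketch of what that could look like for one of these tests, assuming pytest-vcr were added to the test_integration group (the @pytest.mark.vcr marker and vcr_config fixture are pytest-vcr's documented API; this code is not part of the commit):

# Hypothetical sketch only -- pytest-vcr is not added by this commit.
# It would be installed with: poetry add pytest-vcr --group "test_integration"
import os

import pytest

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores.elastic_vector_search import ElasticVectorSearch


@pytest.fixture(scope="module")
def vcr_config():
    # Strip the OpenAI API key from recorded cassettes before they are saved.
    return {"filter_headers": ["authorization"]}


@pytest.mark.vcr
def test_similarity_search_recorded() -> None:
    """First run records the live OpenAI calls to a cassette;
    later runs replay them without hitting the network."""
    docsearch = ElasticVectorSearch.from_texts(
        ["foo", "bar", "baz"],
        OpenAIEmbeddings(openai_api_key=os.getenv("OPENAI_API_KEY")),
        elasticsearch_url="http://localhost:9200",
    )
    assert len(docsearch.similarity_search("foo", k=1)) == 1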