Add chat summary memory buffer #13155

246 changes: 246 additions & 0 deletions docs/docs/examples/memory/ChatSummaryMemoryBuffer.ipynb
@@ -0,0 +1,246 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "c62c1447-0afb-4fad-8dc6-389949c3496e",
"metadata": {},
"source": [
"# Chat Summary Memory Buffer\n",
"In this demo, we use the new *ChatSummaryMemoryBuffer* to limit the chat history to a certain token length, and iteratively summarize all messages that do not fit in the memory buffer. This can be useful if you want to limit costs and latency (assuming the summarization prompt uses and generates fewer tokens than including the entire history). The original *ChatMemoryBuffer* gives you the option to truncate the history after a certain number of tokens, which is useful to limit costs and latency, but also removes potentially relevant information from the chat history. The new *ChatSummaryMemoryBuffer* aims to makes this a bit more flexible, so the user has more control over which chat_history is retained."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c00a753b-df2c-4164-90c3-76b8a15f74c9",
"metadata": {},
"outputs": [],
"source": [
"%pip install llama-index-llms-openai\n",
"%pip install llama-index"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c1f24186-b86e-4580-b7b4-072e719d424f",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = \"YOUR_API_KEY\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "db66e3bc-9791-497b-9e7d-386765dccf74",
"metadata": {},
"outputs": [],
"source": [
"from llama_index.core.memory import ChatSummaryMemoryBuffer\n",
"from llama_index.core.llms import ChatMessage, MessageRole\n",
"from llama_index.llms.openai import OpenAI as OpenAiLlm\n",
"import tiktoken"
]
},
{
"cell_type": "markdown",
"id": "68e26f76-f819-4e1a-bc47-f2ea855ee189",
"metadata": {},
"source": [
"First, we simulate some chat history that will not fit in the memory buffer in its entirety."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6402621a-4131-465c-b92d-1d9a8e7ee985",
"metadata": {},
"outputs": [],
"source": [
"chat_history = [\n",
" ChatMessage(role=\"user\", content=\"What is LlamaIndex?\"),\n",
" ChatMessage(\n",
" role=\"assistant\",\n",
" content=\"LlamaaIndex is the leading data framework for building LLM applications\",\n",
" ),\n",
" ChatMessage(role=\"user\", content=\"Can you give me some more details?\"),\n",
" ChatMessage(\n",
" role=\"assistant\",\n",
" content=\"\"\"LlamaIndex is a framework for building context-augmented LLM applications. Context augmentation refers to any use case that applies LLMs on top of your private or domain-specific data. Some popular use cases include the following: \n",
" Question-Answering Chatbots (commonly referred to as RAG systems, which stands for \"Retrieval-Augmented Generation\"), Document Understanding and Extraction, Autonomous Agents that can perform research and take actions\n",
" LlamaIndex provides the tools to build any of these above use cases from prototype to production. The tools allow you to both ingest/process this data and implement complex query workflows combining data access with LLM prompting.\"\"\",\n",
" ),\n",
"]"
]
},
{
"cell_type": "markdown",
"id": "f057a791-9d8e-43e5-b40a-6675b28f6fd0",
"metadata": {},
"source": [
"We set up the summarizer_llm, and create a *ChatSummaryMemoryBuffer* instance."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "67dbb654-4a66-43cf-9f6a-daead87a1084",
"metadata": {},
"outputs": [],
"source": [
"model = \"gpt-4-0125-preview\"\n",
"summarizer_llm = OpenAiLlm(model_name=model, max_tokens=256)\n",
"tokenizer_fn = tiktoken.encoding_for_model(model).encode\n",
"memory = ChatSummaryMemoryBuffer.from_defaults(\n",
" chat_history=chat_history,\n",
" summarizer_llm=summarizer_llm,\n",
" token_limit_full_text=2,\n",
" tokenizer_fn=tokenizer_fn,\n",
")\n",
"\n",
"history = memory.get()"
]
},
{
"cell_type": "markdown",
"id": "4e4e333c-6c33-4e01-b1a8-750d21800076",
"metadata": {},
"source": [
"When printing the history, we can observe that older messages have been summarized."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0821eb89-4164-4a06-b66c-ea2632706e11",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[ChatMessage(role=<MessageRole.SYSTEM: 'system'>, content='The user inquired about LlamaIndex, a leading data framework for developing LLM applications. The assistant explained that LlamaIndex is used for building context-augmented LLM applications, giving examples such as Question-Answering Chatbots, Document Understanding and Extraction, and Autonomous Agents. It was mentioned that LlamaIndex provides tools for ingesting and processing data, as well as implementing complex query workflows combining data access with LLM prompting.', additional_kwargs={})]\n"
]
}
],
"source": [
"print(history)"
]
},
{
"cell_type": "markdown",
"id": "fae3efe0-6889-49f7-9f21-6ced186f0609",
"metadata": {},
"source": [
"Let's add some new chat history."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7ddb295c-c5c0-4faf-b0e5-a451d1d26d60",
"metadata": {},
"outputs": [],
"source": [
"new_chat_history = [\n",
" ChatMessage(role=\"user\", content=\"Why context augmentation?\"),\n",
" ChatMessage(\n",
" role=\"assistant\",\n",
" content=\"LLMs offer a natural language interface between humans and data. Widely available models come pre-trained on huge amounts of publicly available data. However, they are not trained on your data, which may be private or specific to the problem you're trying to solve. It's behind APIs, in SQL databases, or trapped in PDFs and slide decks. LlamaIndex provides tooling to enable context augmentation. A popular example is Retrieval-Augmented Generation (RAG) which combines context with LLMs at inference time. Another is finetuning.\",\n",
" ),\n",
" ChatMessage(role=\"user\", content=\"Who is LlamaIndex for?\"),\n",
" ChatMessage(\n",
" role=\"assistant\",\n",
" content=\"LlamaIndex provides tools for beginners, advanced users, and everyone in between. Our high-level API allows beginner users to use LlamaIndex to ingest and query their data in 5 lines of code. For more complex applications, our lower-level APIs allow advanced users to customize and extend any module—data connectors, indices, retrievers, query engines, reranking modules—to fit their needs.\",\n",
" ),\n",
"]\n",
"memory.put(new_chat_history[0])\n",
"memory.put(new_chat_history[1])\n",
"memory.put(new_chat_history[2])\n",
"memory.put(new_chat_history[3])\n",
"history = memory.get()"
]
},
{
"cell_type": "markdown",
"id": "9cedc25a-cf6c-45c2-977d-c3fef14f1ea5",
"metadata": {},
"source": [
"The history will now be updated with a new summary, containing the latest information."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2e03428c-069f-4c17-9882-17f3fd0dcf4c",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[ChatMessage(role=<MessageRole.SYSTEM: 'system'>, content='The user asked about LlamaIndex and why context augmentation is important. The assistant explained that LlamaIndex is for building context-augmented LLM applications, which are necessary because LLMs are not trained on specific private or problem-specific data. LlamaIndex provides tools for beginners and advanced users to ingest and query data easily, as well as customize modules for more complex applications.', additional_kwargs={})]\n"
]
}
],
"source": [
"print(history)"
]
},
{
"cell_type": "markdown",
"id": "f402643f-74ee-4c8f-902f-59296d4d8edf",
"metadata": {},
"source": [
"Using a longer *token_limit_full_text* allows the user to control the balance between retaining the full chat history and summarization."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "91d34b9b-04c3-4453-85ab-8afe59a7e763",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[ChatMessage(role=<MessageRole.SYSTEM: 'system'>, content='The user inquired about LlamaIndex, and the assistant explained that it is a data framework for creating context-augmented LLM applications. These applications can include question-answering chatbots, document understanding and extraction, and autonomous agents for research and actions. LlamaIndex provides tools for building these applications from prototype to production, enabling data processing, complex query workflows, and LLM prompting.', additional_kwargs={}), ChatMessage(role=<MessageRole.USER: 'user'>, content='Why context augmentation?', additional_kwargs={}), ChatMessage(role=<MessageRole.ASSISTANT: 'assistant'>, content=\"LLMs offer a natural language interface between humans and data. Widely available models come pre-trained on huge amounts of publicly available data. However, they are not trained on your data, which may be private or specific to the problem you're trying to solve. It's behind APIs, in SQL databases, or trapped in PDFs and slide decks. LlamaIndex provides tooling to enable context augmentation. A popular example is Retrieval-Augmented Generation (RAG) which combines context with LLMs at inference time. Another is finetuning.\", additional_kwargs={}), ChatMessage(role=<MessageRole.USER: 'user'>, content='Who is LlamaIndex for?', additional_kwargs={}), ChatMessage(role=<MessageRole.ASSISTANT: 'assistant'>, content='LlamaIndex provides tools for beginners, advanced users, and everyone in between. Our high-level API allows beginner users to use LlamaIndex to ingest and query their data in 5 lines of code. For more complex applications, our lower-level APIs allow advanced users to customize and extend any module—data connectors, indices, retrievers, query engines, reranking modules—to fit their needs.', additional_kwargs={})]\n"
]
}
],
"source": [
"memory = ChatSummaryMemoryBuffer.from_defaults(\n",
" chat_history=chat_history + new_chat_history,\n",
" summarizer_llm=summarizer_llm,\n",
" token_limit_full_text=256,\n",
" tokenizer_fn=tokenizer_fn,\n",
")\n",
"print(memory.get())"
]
},
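{
"cell_type": "markdown",
"id": "7f3c9a2e-5d1b-4e6f-8a0c-2b4d6e8f0a1c",
"metadata": {},
"source": [
"Finally, a minimal sketch of plugging this memory into a chat engine, assuming *SimpleChatEngine* accepts a `memory` argument like the other memory buffers do; any engine that takes a memory object should work the same way."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9b5e1d3f-2a7c-4c8e-b6f0-4d2a8c0e6f3b",
"metadata": {},
"outputs": [],
"source": [
"from llama_index.core.chat_engine import SimpleChatEngine\n",
"\n",
"# Assumption: SimpleChatEngine.from_defaults accepts a memory object;\n",
"# the engine then reads and writes the summarizing buffer on every turn\n",
"chat_engine = SimpleChatEngine.from_defaults(\n",
"    llm=summarizer_llm,\n",
"    memory=memory,\n",
")\n",
"print(chat_engine.chat(\"What did we discuss about LlamaIndex?\"))"
]
}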
],
"metadata": {
"kernelspec": {
"display_name": "llama-index-py3.9",
"language": "python",
"name": "llama-index-py3.9"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
3 changes: 2 additions & 1 deletion llama-index-core/llama_index/core/memory/__init__.py
@@ -1,4 +1,5 @@
from llama_index.core.memory.chat_memory_buffer import ChatMemoryBuffer
from llama_index.core.memory.chat_summary_memory_buffer import ChatSummaryMemoryBuffer
from llama_index.core.memory.types import BaseMemory

__all__ = ["BaseMemory", "ChatMemoryBuffer"]
__all__ = ["BaseMemory", "ChatMemoryBuffer", "ChatSummaryMemoryBuffer"]