Add chat summary memory buffer #13155
Conversation
Limitations:
Future improvements:
This is great. Chat memory classes are arguably lacking in LlamaIndex, and implementing various chat memory approaches will add value and flexibility for users, including for testing their applications.
SUMMARIZE_PROMPT = "The following is a conversation between the user and assistant. Write a concise summary about the contents of this conversation."
# TODO: Add option for last N user/assistant history interactions instead of token limit |
Side note, slightly off topic: does N messages make sense? What if a single user or assistant message is huge? Another consideration: for function calls, OpenAI (and others) require pairs of assistant and tool messages to be in memory.
If a single user or assistant message is huge, then indeed, this is not such a great idea. On the other hand, the user has knowledge about the prompts and the expected length of the responses (perhaps something along the lines of "Answer in a single sentence: what is LlamaIndex?") and it could be a use case to consider only the last 3 or 5 "interactions", say.
The reasons I considered it are to ensure that USER/ASSISTANT messages are always considered in pairs, and to give some more flexibility to the end user. But it's more of a nice-to-have for the future, if there is appetite for that flexibility.
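A rough sketch of what a last-N-interactions option could look like (a hypothetical helper, not part of this PR; it assumes the llama_index ChatMessage/MessageRole types):

```python
from typing import List

from llama_index.core.llms import ChatMessage, MessageRole


def last_n_interactions(messages: List[ChatMessage], n: int) -> List[ChatMessage]:
    """Keep only the last n USER turns and everything after each of them,
    so USER/ASSISTANT messages are always retained in pairs."""
    user_positions = [i for i, m in enumerate(messages) if m.role == MessageRole.USER]
    if len(user_positions) <= n:
        return list(messages)
    return list(messages[user_positions[-n]:])
```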
def from_dict(
    cls, data: Dict[str, Any], **kwargs: Any
) -> "ChatSummaryMemoryBuffer":
    raise NotImplementedError("This is not yet supported.")
from_dict() / from_string() could be implemented by just passing in the LLM when calling it?
I'm not sure if I understood you correctly, but I added a basic implementation. If this is not what you are looking for, could you make the code change?
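For context, a minimal sketch of the idea under discussion: since the LLM itself is not serialized, from_dict() can merge whatever the caller passes in at load time. The toy class below is illustrative only, not the PR's implementation.

```python
from typing import Any, Dict


class SummaryBufferSketch:
    """Toy stand-in for ChatSummaryMemoryBuffer; only shows the kwargs merge."""

    def __init__(self, llm: Any = None, token_limit: int = 3000, **_: Any) -> None:
        self.llm = llm
        self.token_limit = token_limit

    @classmethod
    def from_dict(cls, data: Dict[str, Any], **kwargs: Any) -> "SummaryBufferSketch":
        # The serialized dict contains everything except the LLM; the caller
        # supplies it when loading, e.g. from_dict(saved_state, llm=my_llm).
        return cls(**{**data, **kwargs})
```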
<= self.token_limit_full_text
):
# traverse the history in reverse order; when the token limit is about to be
# exceeded, we stop, so remaining messages are summarized
We need to add some special handling for TOOL and ASSISTANT messages -- see the get() method of the ChatMemoryBuffer:
- we can't start the chat history with an assistant message (this breaks several LLM APIs)
- we need to maintain the TOOL + ASSISTANT message pairs, so as not to break OpenAI, Mistral, etc.
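A minimal sketch of the kind of handling described above (not the actual get() logic of ChatMemoryBuffer; the function name and shape are illustrative):

```python
from typing import List

from llama_index.core.llms import ChatMessage, MessageRole


def first_safe_index(messages: List[ChatMessage], start: int) -> int:
    """Advance the cutoff so the retained window never begins with an
    ASSISTANT or TOOL message; this keeps assistant/tool pairs intact and
    avoids sending a history that starts with an assistant turn."""
    while start < len(messages) and messages[start].role in (
        MessageRole.ASSISTANT,
        MessageRole.TOOL,
    ):
        start += 1
    return start
```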
I implemented the changes you suggested. I have a few remarks. It may not make sense to summarize tool/function messages, so it would probably be helpful to think about what should happen with such messages in general (remove them completely, include them as full text, summarize them, or make it configurable for the user?). Perhaps this is sufficient for now and we can expand on it later.
Also, when a summary is generated (the LLM is not None and the messages do not fit in the buffer), the first message will always be a SYSTEM message, followed by potentially an ASSISTANT message. Does that also break the APIs you mentioned? If so, we need this handling when a summary is made, which the current implementation does. However, if SYSTEM followed by ASSISTANT does not break the APIs, the special handling could be limited to the case when a summary is not generated (when the LLM is None or the buffer is not full).
@logan-markewich I don't really understand why my tests are giving the error:
self = MockSummarizerLLM(), name = '_i', value = 0
E ValueError: "MockSummarizerLLM" object has no field "_i"
The classes MockChatLLM and MockStreamChatLLM seem to be using the exact same mechanism, and my tests are passing locally. Any ideas about what is going wrong (I'm not so versed in Pydantic)?
@shadyelgewily-slimstock running
@shadyelgewily-slimstock fixed, it was just an issue with not using the compatibility layer on pydantic.
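For readers hitting the same error, a minimal sketch of the underlying pydantic issue, using plain pydantic (the actual fix went through llama-index's compatibility layer): underscore attributes on a BaseModel must be declared as private attributes before they can be assigned. The class below is a simplified stand-in for the test's MockSummarizerLLM.

```python
from pydantic import BaseModel, PrivateAttr


class MockSummarizerLLM(BaseModel):
    # Without this PrivateAttr declaration, assigning self._i raises
    # ValueError: "MockSummarizerLLM" object has no field "_i"
    _i: int = PrivateAttr(default=0)

    def bump(self) -> int:
        self._i += 1
        return self._i
```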
Thanks for your help @logan-markewich, let me know if there's anything I can do to get this merged. I haven't bumped the core version (maybe I should).
Description
This PR introduces a new ChatSummaryMemoryBuffer to limit the chat history to a certain token length, and iteratively summarize all messages that do not fit in the memory buffer. This can be useful if you want to limit costs and latency (assuming the summarization prompt uses and generates fewer tokens than including the entire history). The original ChatMemoryBuffer gives you the option to truncate the history after a certain number of tokens, which is useful to limit costs and latency but also removes potentially relevant information from the chat history. The new ChatSummaryMemoryBuffer aims to make this a bit more flexible, so the user has more control over which chat_history is retained.
Fixes #12845
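A rough usage sketch, assuming the new class follows the from_defaults() convention of the existing ChatMemoryBuffer (exact parameter names may differ in the merged version):

```python
from llama_index.core.llms import ChatMessage, MessageRole
from llama_index.core.memory import ChatSummaryMemoryBuffer
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo")

# Recent messages are kept verbatim up to token_limit; anything older is
# condensed into a summary by the provided LLM instead of being dropped.
memory = ChatSummaryMemoryBuffer.from_defaults(llm=llm, token_limit=256)

memory.put(ChatMessage(role=MessageRole.USER, content="What is LlamaIndex?"))
memory.put(ChatMessage(role=MessageRole.ASSISTANT, content="A data framework for LLM apps."))

chat_history = memory.get()  # recent turns verbatim, older turns summarized
```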
New Package?
Did I fill in the tool.llamahub section in the pyproject.toml and provide a detailed README.md for my new integration or package?
Version Bump?
Did I bump the version in the pyproject.toml file of the package I am updating? (Except for the llama-index-core package)
Type of Change
Please delete options that are not relevant.
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration.
Suggested Checklist:
make format; make lint to appease the lint gods