
Add chat summary memory buffer #13155

Conversation

@shadyelgewily-slimstock (Contributor) commented Apr 29, 2024

Description

This PR introduces a new ChatSummaryMemoryBuffer that limits the chat history to a certain token length and iteratively summarizes all messages that do not fit in the memory buffer. This can be useful if you want to limit costs and latency (assuming the summarization prompt uses and generates fewer tokens than including the entire history). The original ChatMemoryBuffer gives you the option to truncate the history after a certain number of tokens, which is useful to limit costs and latency, but it also removes potentially relevant information from the chat history. The new ChatSummaryMemoryBuffer aims to make this a bit more flexible, so the user has more control over which chat_history is retained.

Fixes #12845
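
A rough usage sketch of the idea. The constructor and parameter names below (from_defaults, llm, token_limit) are assumptions based on this PR's description and may differ from the merged code, which also references summarizer_llm and token_limit_full_text in the review snippets:

# Hedged sketch: constructor/parameter names (from_defaults, llm, token_limit) are
# assumptions based on this PR's description and may differ in the merged code.
from llama_index.core.llms import ChatMessage, MessageRole, MockLLM
from llama_index.core.memory import ChatSummaryMemoryBuffer

summarizer_llm = MockLLM()  # stand-in; use a real LLM so summaries are meaningful

memory = ChatSummaryMemoryBuffer.from_defaults(
    chat_history=[
        ChatMessage(role=MessageRole.USER, content="What is LlamaIndex?"),
        ChatMessage(role=MessageRole.ASSISTANT, content="A framework for building LLM applications."),
    ],
    llm=summarizer_llm,
    token_limit=256,  # messages that do not fit in this budget get summarized, not dropped
)

# get() returns the running summary (if one was needed) followed by the most recent
# full-text messages that still fit within the token limit.
print(memory.get())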

New Package?

Did I fill in the tool.llamahub section in the pyproject.toml and provide a detailed README.md for my new integration or package?

  • Yes
  • No

Version Bump?

Did I bump the version in the pyproject.toml file of the package I am updating? (Except for the llama-index-core package)

  • Yes
  • No

Type of Change

Please delete options that are not relevant.

  • [x] New feature (non-breaking change which adds functionality)

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration.

  • Added new unit/integration tests
  • Added new notebook (that tests end-to-end)
  • I stared at the code and made sure it makes sense

Suggested Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added Google Colab support for the newly added notebooks.
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I ran make format; make lint to appease the lint gods

dosubot added the size:XL label (This PR changes 500-999 lines, ignoring generated files) on Apr 29, 2024

@shadyelgewily-slimstock (Contributor, Author)

Limitations:

  • from_string() and from_dict() are not implemented, because we need the summarizer_llm
  • For the same reason, I have not tested pickling

Future improvements:

  • It probably makes sense to consider the last N interactions (i.e., user/assistant pairs), instead of a token limit. This can be added as an option.
  • Default summarization prompt may be improved
  • Perhaps we should pass a list of ChatMessage objects to the LLM instead of constructing a single string; I'm not sure whether this makes any difference (see the sketch after this list).
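
For illustration, a minimal sketch of the two call styles mentioned in the last item, assuming any llama_index LLM; MockLLM is used only as a stand-in, and the prompt text is the default SUMMARIZE_PROMPT quoted later in this thread:

# Sketch of the two summarization call styles discussed above; MockLLM is only a stand-in.
from llama_index.core.llms import ChatMessage, MessageRole, MockLLM

SUMMARIZE_PROMPT = (
    "The following is a conversation between the user and assistant. "
    "Write a concise summary about the contents of this conversation."
)
summarizer_llm = MockLLM()
history_text = "USER: What is LlamaIndex?\nASSISTANT: A framework for building LLM applications."

# Current approach in this PR: flatten the history into a single prompt string.
summary_from_string = summarizer_llm.complete(f"{SUMMARIZE_PROMPT}\n\n{history_text}")

# Possible alternative: pass the prompt and history as structured ChatMessage objects.
summary_from_messages = summarizer_llm.chat(
    [
        ChatMessage(role=MessageRole.SYSTEM, content=SUMMARIZE_PROMPT),
        ChatMessage(role=MessageRole.USER, content=history_text),
    ]
)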

@Josephrp
This is great. Chat memory classes are arguably lacking in LlamaIndex, and implementing various chat memory approaches will add value and flexibility for users, also for testing their applications.

SUMMARIZE_PROMPT = "The following is a conversation between the user and assistant. Write a concise summary about the contents of this conversation."


# TODO: Add option for last N user/assistant history interactions instead of token limit
Collaborator:
Side note, slightly off topic: does N messages make sense? What if a single user or assistant message is huge? Another consideration: for function calls, OpenAI (and others) require pairs of assistant and tool messages to be in memory.

Contributor (Author):

If a single user or assistant message is huge, then indeed, this is not such a great idea. On the other hand, the user has knowledge about the prompts and the expected length of the responses (perhaps something along the lines of "Answer in a single sentence: what is LlamaIndex?"), and it could be a use case to consider only the last 3 or 5 "interactions", say.

The reasons I considered it are to ensure that USER/ASSISTANT messages are always handled in pairs, and to give some more flexibility to the end user. But it's more of a nice-to-have for the future, if there is appetite for that flexibility.

def from_dict(
    cls, data: Dict[str, Any], **kwargs: Any
) -> "ChatSummaryMemoryBuffer":
    raise NotImplementedError("This is not yet supported.")
Collaborator:

from_dict() / from_string() could be implemented by just passing in the LLM when calling it?
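
A hypothetical sketch of that suggestion, using a simplified stand-in class rather than the actual implementation: the summarizer LLM is supplied through **kwargs at deserialization time, since it cannot be serialized into the dict.

# Hypothetical stand-in illustrating the suggestion above; not the actual implementation.
import json
from typing import Any, Dict, List, Optional

from llama_index.core.llms import LLM, ChatMessage


class ChatSummaryMemoryBufferSketch:
    def __init__(self, chat_history: List[ChatMessage], summarizer_llm: Optional[LLM] = None) -> None:
        self.chat_history = chat_history
        self.summarizer_llm = summarizer_llm

    @classmethod
    def from_dict(cls, data: Dict[str, Any], **kwargs: Any) -> "ChatSummaryMemoryBufferSketch":
        # The LLM is passed explicitly by the caller instead of being stored in the dict.
        history = [ChatMessage(**m) for m in data.get("chat_history", [])]
        return cls(chat_history=history, summarizer_llm=kwargs.get("summarizer_llm"))

    @classmethod
    def from_string(cls, json_str: str, **kwargs: Any) -> "ChatSummaryMemoryBufferSketch":
        return cls.from_dict(json.loads(json_str), **kwargs)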

Contributor (Author):

I'm not sure if I understood you correctly, but I added a basic implementation. If this is not what you are looking for, could you make the code change?

    <= self.token_limit_full_text
):
    # traverse the history in reverse order; when the token limit is about to be
    # exceeded, we stop, so the remaining messages are summarized
Collaborator:

We need to add some special handling for TOOL and ASSISTANT messages -- see the get() method of the ChatMemoryBuffer:

  • we can't start the chat history with an assistant message (this breaks several LLM APIs)
  • we need to maintain the TOOL + ASSISTANT message pairs, so as not to break OpenAI, Mistral, etc. (a sketch of this handling follows the list)
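
A minimal sketch of that boundary handling, assuming the token-truncated message list has already been computed; it mirrors the idea rather than the merged code:

# Minimal sketch of the boundary handling described above; mirrors the idea, not the merged code.
from typing import List

from llama_index.core.llms import ChatMessage, MessageRole


def fix_truncation_boundary(messages: List[ChatMessage]) -> List[ChatMessage]:
    pruned = list(messages)
    # The history sent to the LLM must not start with an ASSISTANT message, and a TOOL
    # message must not be separated from the ASSISTANT call that produced it, so keep
    # dropping leading messages until the history starts with a USER or SYSTEM message.
    while pruned and pruned[0].role in (MessageRole.ASSISTANT, MessageRole.TOOL):
        pruned.pop(0)
    return pruned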

Contributor (Author):

I implemented the changes you suggested. I have a few remarks. It may not make sense to summarize tool/function messages, so it would probably be helpful to think about what should happen with such messages in general (completely removed, included as full text, summarized, made configurable by the user?). Perhaps this is sufficient for now and we can expand on it later.

Also, when a summary is generated (the LLM is not None and the messages do not fit in the buffer), the first message will always be a SYSTEM message, followed by potentially an ASSISTANT message. Does that also break the APIs you mentioned? If so, we need this handling when a summary is made, which the current implementation does. However, if SYSTEM followed by ASSISTANT does not break the APIs, the special handling could be limited to the case when a summary is not generated (when the LLM is None or the buffer is not full).

@shadyelgewily-slimstock (Contributor, Author)

@logan-markewich I don't really understand why my tests are giving the error:

self = MockSummarizerLLM(), name = '_i', value = 0

@no_type_check
def __setattr__(self, name, value):  # noqa: C901 (ignore complexity)
    if name in self.__private_attributes__ or name in DUNDER_ATTRIBUTES:
        return object_setattr(self, name, value)

    if self.__config__.extra is not Extra.allow and name not in self.__fields__:
      raise ValueError(f'"{self.__class__.__name__}" object has no field "{name}"')

E ValueError: "MockSummarizerLLM" object has no field "_i"

The classes MockChatLLM and MockStreamChatLLM seem to be using the exact same mechanism, and my tests are passing locally. Any ideas about what is going wrong? (I'm not so well versed in Pydantic.)

@logan-markewich (Collaborator)
@shadyelgewily-slimstock Running the tests locally with pytest, I get the same error as CI/CD. I'll take a peek at what's going on here.

@logan-markewich (Collaborator)
@shadyelgewily-slimstock Fixed; it was just an issue with not using the pydantic compatibility layer.
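
For reference, a guess at the kind of fix described: declare the mock's private attribute through llama_index's pydantic bridge rather than assigning it ad hoc (the attribute name _i is taken from the error above):

# A guess at the kind of fix described above, not the actual commit.
from llama_index.core.bridge.pydantic import PrivateAttr
from llama_index.core.llms import MockLLM


class MockSummarizerLLM(MockLLM):
    # "_i" is the field named in the error message; PrivateAttr registers it so that
    # pydantic allows assignment to it on the model instance.
    _i: int = PrivateAttr(default=0)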

@shadyelgewily-slimstock (Contributor, Author)

Thanks for your help @logan-markewich, let me know if there's anything I can do to get this merged. I haven't bumped the core version (maybe I should).

dosubot added the lgtm label (This PR has been approved by a maintainer) on May 2, 2024
logan-markewich merged commit d4336e6 into run-llama:main on May 2, 2024 (8 checks passed)
Labels: lgtm, size:XL

Successfully merging this pull request may close these issues: [Question]: Is it possible to integrate langchain ConversationBufferMemory?