Persistence in the long run #421
Replies: 4 comments
-
I do see some code that may relate here: memory.py, line 182.
-
(Cross-posting from Discord for posterity.)

To restate your question (just to be sure I'm interpreting it correctly; please correct me if I misread you): even though MemGPT archival memory can be set up to use real databases, the basic MemGPT state (current message buffer, etc.) is stored in a pretty basic fashion (pickle/JSON). How can we extend the code so that MemGPT's basic state is backed by a real database? So that if you're serving 1000s of MemGPT bots, you're not relying on reading and writing state to JSON files, but instead natively back MemGPT state with e.g. MongoDB?

One answer is to implement a PersistenceManager class that writes to a database; this is what we do with the Discord bot code. Basically, just override the functions here to sync a MongoDB database with the agent object state: https://github.com/cpacker/MemGPT/blob/main/memgpt/persistence_manager.py#L16-L35
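To make that concrete, here's a minimal sketch of what overriding those functions could look like, assuming pymongo. The database/collection names, constructor arguments, and exact sync semantics are illustrative guesses, not from the MemGPT codebase:

```python
# Hypothetical sketch: mirror the agent's message buffer into MongoDB.
# Assumes pymongo is installed; all names below are illustrative only.
from pymongo import MongoClient

from memgpt.persistence_manager import PersistenceManager


class MongoPersistenceManager(PersistenceManager):
    def __init__(self, mongo_uri, agent_id):
        self.client = MongoClient(mongo_uri)
        # 'memgpt' database and 'agent_state' collection are made-up names
        self.collection = self.client["memgpt"]["agent_state"]
        self.agent_id = agent_id
        self.messages = []

    def _sync(self):
        # Upsert the current message buffer, keyed by agent id
        self.collection.update_one(
            {"agent_id": self.agent_id},
            {"$set": {"messages": self.messages}},
            upsert=True,
        )

    def init(self, agent):
        self.messages = list(agent.messages)
        self._sync()

    def append_to_messages(self, added_messages):
        self.messages = self.messages + added_messages
        self._sync()

    def trim_messages(self, num):
        self.messages = self.messages[num:]  # assumed trim-from-front semantics
        self._sync()

    # prepend_to_messages, swap_system_message, and update_memory would follow
    # the same pattern: mutate the local copy, then _sync().
```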
-
Some basic function signatures that may help:

```python
class DatabaseArchivalMemory(ArchivalMemory):
    def __init__(self, db_handler, agent_id, ...):
        ...

    def insert(self, memory_string):
        """insert into database"""
        ...

    def search(self, query_string, count=None, start=None):
        results, total = self.db_handler.search_archival_memory(...)
        # agent call expects results to have 'timestamp' and 'content'
        # - add 'timestamp' via '_id' field
        # - rename 'memory' to 'content'
        parsed_results = ...
        return parsed_results, total

    def __repr__(self) -> str:
        return "DatabaseArchivalMemoryObject"


class DatabaseRecallMemory(RecallMemory):
    def __init__(self, db_handler, agent_id, ...):
        ...

    def text_search(self, query_string, count=None, start=None):
        results, total = self.db_handler.search_messages_text_query_paged(...)
        ...
        return parsed_results, total

    def date_search(self, start_date, end_date, count=None, start=None):
        results, total = self.db_handler.search_messages_by_date_paged(...)
        ...
        return parsed_results, total

    def __repr__(self) -> str:
        return "DatabaseRecallMemoryObject"


class DatabasePersistenceManager(PersistenceManager):
    """Sync Agent with database on every action/event"""

    def __init__(self, db_handler, ...):
        ...

    def init(self, agent):
        ...

    def trim_messages(self, num):
        ...

    def prepend_to_messages(self, added_messages):
        ...

    def append_to_messages(self, added_messages):
        ...

    def swap_system_message(self, new_system_message):
        ...

    def update_memory(self, new_memory):
        ...
```
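And a hypothetical wiring example; `MyDatabaseHandler`, its constructor arguments, and the `agent_id` values are all made up for illustration:

```python
# Hypothetical usage: MyDatabaseHandler wraps whatever driver you choose
# (pymongo, psycopg2, ...) behind the search_* methods sketched above.
db_handler = MyDatabaseHandler(connection_uri="mongodb://localhost:27017")

persistence_manager = DatabasePersistenceManager(db_handler)
archival_memory = DatabaseArchivalMemory(db_handler, agent_id="agent-1")
recall_memory = DatabaseRecallMemory(db_handler, agent_id="agent-1")

# search() returns (parsed_results, total) so callers can paginate
page, total = archival_memory.search("project notes", count=10, start=0)
```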
-
For posterity: swooders
-
Presently, the persistent information regarding agents and chat/thinking history is kept in a set of files in $HOME/.memgpt. This is not a solution that will scale very well, especially for the chat history, which could easily grow to gigabytes in production. This data takes the form of JSON and pickled Python objects that are JSON-like (dicts & lists), which looks to me like a very good candidate for a document storage approach.
The tool I'm most familiar with for this is MongoDB, but I've also done a lot of work on Postgres, and recent developments at MongoDB HQ are not Open Source friendly. The Postgres jsonb column type appears to have most of the relevant features of MongoDB, such as indexes over unstructured documents. I suggest that it makes sense to use Postgres jsonb as an archive format for all chat and agent identity information presently kept in the config files (we can leave a local option available for now).
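To illustrate: the table and column names below are hypothetical, but jsonb columns, GIN indexing, and upserts are standard Postgres features. Storing the same dicts we currently pickle could look like:

```python
# Sketch of storing agent state as jsonb in Postgres, via psycopg2.
# Table/column names are hypothetical; jsonb + GIN is standard Postgres.
import psycopg2
from psycopg2.extras import Json

conn = psycopg2.connect("dbname=memgpt")
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS agent_state (
            agent_id TEXT PRIMARY KEY,
            state    JSONB NOT NULL
        );
        -- GIN index speeds up containment (@>) queries on arbitrary keys
        CREATE INDEX IF NOT EXISTS agent_state_gin
            ON agent_state USING GIN (state);
    """)
    # Upsert the same dict that would otherwise be pickled / written to JSON
    cur.execute(
        """
        INSERT INTO agent_state (agent_id, state)
        VALUES (%s, %s)
        ON CONFLICT (agent_id) DO UPDATE SET state = EXCLUDED.state
        """,
        ("agent-1", Json({"messages": [], "persona": "sam"})),
    )
    # Query into the document without a fixed schema
    cur.execute(
        "SELECT state->'messages' FROM agent_state WHERE agent_id = %s",
        ("agent-1",),
    )
    messages = cur.fetchone()[0]
```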
I've already started playing with this in my fork; I just wanted to start a discussion so there's no duplicated effort in case this is already being worked on or it's part of the roadmap master plan.
Thoughts?