Persistence in the long run #421
Replies: 4 comments
-
I do see some code that may relate here: memory.py, line 182.
-
(Cross-posting from Discord for posterity.)

To restate your question (just to be sure I'm interpreting it correctly; please correct me if I misread you): even though MemGPT archival memory can be set up to use real databases, the basic MemGPT state (current message buffer, etc.) is stored in a pretty basic fashion (pickle/JSON). How can we extend the code so that MemGPT's basic state is backed by a real database? So that if you're serving 1000s of MemGPT bots, you're not relying on reading and writing state to JSON files, but instead natively back MemGPT state with e.g. MongoDB?

One answer is to implement a PersistenceManager class that writes to a database; this is what we do with the Discord bot code. Basically, just override the functions here to sync a MongoDB database with the agent object state: https://github.com/cpacker/MemGPT/blob/main/memgpt/persistence_manager.py#L16-L35
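To make that concrete, here's a minimal sketch of what overriding those functions could look like, assuming pymongo. The database/collection names, constructor arguments, and exact sync semantics are illustrative guesses, not from the MemGPT codebase:

```python
# Hypothetical sketch: mirror the agent's message buffer into MongoDB.
# Assumes pymongo is installed; all names below are illustrative only.
from pymongo import MongoClient

from memgpt.persistence_manager import PersistenceManager


class MongoPersistenceManager(PersistenceManager):
    def __init__(self, mongo_uri, agent_id):
        self.client = MongoClient(mongo_uri)
        # 'memgpt' database and 'agent_state' collection are made-up names
        self.collection = self.client["memgpt"]["agent_state"]
        self.agent_id = agent_id
        self.messages = []

    def _sync(self):
        # Upsert the current message buffer, keyed by agent id
        self.collection.update_one(
            {"agent_id": self.agent_id},
            {"$set": {"messages": self.messages}},
            upsert=True,
        )

    def init(self, agent):
        self.messages = list(agent.messages)
        self._sync()

    def append_to_messages(self, added_messages):
        self.messages = self.messages + added_messages
        self._sync()

    def trim_messages(self, num):
        self.messages = self.messages[num:]  # assumed trim-from-front semantics
        self._sync()

    # prepend_to_messages, swap_system_message, and update_memory would follow
    # the same pattern: mutate the local copy, then _sync().
```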
-
Some basic function signatures that may help:

```python
class DatabaseArchivalMemory(ArchivalMemory):
    def __init__(self, db_handler, agent_id, ...):
        ...

    def insert(self, memory_string):
        """insert into database"""
        ...

    def search(self, query_string, count=None, start=None):
        results, total = self.db_handler.search_archival_memory(...)
        # agent call expects results to have 'timestamp' and 'content'
        # - add 'timestamp' via '_id' field
        # - rename 'memory' to 'content'
        parsed_results = ...
        return parsed_results, total

    def __repr__(self) -> str:
        return "DatabaseArchivalMemoryObject"


class DatabaseRecallMemory(RecallMemory):
    def __init__(self, db_handler, agent_id, ...):
        ...

    def text_search(self, query_string, count=None, start=None):
        results, total = self.db_handler.search_messages_text_query_paged(...)
        ...
        return parsed_results, total

    def date_search(self, start_date, end_date, count=None, start=None):
        results, total = self.db_handler.search_messages_by_date_paged(...)
        ...
        return parsed_results, total

    def __repr__(self) -> str:
        return "DatabaseRecallMemoryObject"


class DatabasePersistenceManager(PersistenceManager):
    """Sync Agent with database on every action/event"""

    def __init__(self, db_handler, ...):
        ...

    def init(self, agent):
        ...

    def trim_messages(self, num):
        ...

    def prepend_to_messages(self, added_messages):
        ...

    def append_to_messages(self, added_messages):
        ...

    def swap_system_message(self, new_system_message):
        ...

    def update_memory(self, new_memory):
        ...
```
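And a hypothetical wiring example; `MyDatabaseHandler`, its constructor arguments, and the `agent_id` values are all made up for illustration:

```python
# Hypothetical usage: MyDatabaseHandler wraps whatever driver you choose
# (pymongo, psycopg2, ...) behind the search_* methods sketched above.
db_handler = MyDatabaseHandler(connection_uri="mongodb://localhost:27017")

persistence_manager = DatabasePersistenceManager(db_handler)
archival_memory = DatabaseArchivalMemory(db_handler, agent_id="agent-1")
recall_memory = DatabaseRecallMemory(db_handler, agent_id="agent-1")

# search() returns (parsed_results, total) so callers can paginate
page, total = archival_memory.search("project notes", count=10, start=0)
```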
-
For posterity: swooders
-
Presently, the persistent information regarding agents and chat/thinking history is kept in a set of files in $HOME/.memgpt. This is not a solution that will scale very well, especially for the chat history, which could easily grow to gigabytes in production. This data takes the form of JSON and pickled Python objects that are JSON-like (dicts & lists), which looks to me like a very good candidate for a document storage approach.
The tool I'm most familiar with for this is MongoDB, but I've also done a lot of work on Postgres, and recent developments at MongoDB HQ are not Open Source friendly. The Postgres jsonb column type appears to have most of the relevant features of MongoDB, such as indexes over unstructured documents. I suggest that it makes sense to use Postgres jsonb as an archive format for all chat and agent identity information presently kept in the config files (we can leave a local option available for now).
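To illustrate: the table and column names below are hypothetical, but jsonb columns, GIN indexing, and upserts are standard Postgres features. Storing the same dicts we currently pickle could look like:

```python
# Sketch of storing agent state as jsonb in Postgres, via psycopg2.
# Table/column names are hypothetical; jsonb + GIN is standard Postgres.
import psycopg2
from psycopg2.extras import Json

conn = psycopg2.connect("dbname=memgpt")
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS agent_state (
            agent_id TEXT PRIMARY KEY,
            state    JSONB NOT NULL
        );
        -- GIN index speeds up containment (@>) queries on arbitrary keys
        CREATE INDEX IF NOT EXISTS agent_state_gin
            ON agent_state USING GIN (state);
    """)
    # Upsert the same dict that would otherwise be pickled / written to JSON
    cur.execute(
        """
        INSERT INTO agent_state (agent_id, state)
        VALUES (%s, %s)
        ON CONFLICT (agent_id) DO UPDATE SET state = EXCLUDED.state
        """,
        ("agent-1", Json({"messages": [], "persona": "sam"})),
    )
    # Query into the document without a fixed schema
    cur.execute(
        "SELECT state->'messages' FROM agent_state WHERE agent_id = %s",
        ("agent-1",),
    )
    messages = cur.fetchone()[0]
```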
I've already started playing with this in my fork; I just wanted to start a discussion so there's no duplicated effort in case this is already being worked on or it's part of the roadmap master plan.
Thoughts?