What feature would you like to be added?

Memory for AgentChat Agents
It would be useful to have some notion of memory, and the ability to attach memory to an agent.
Right now the AssistantAgent can take on tools:

```python
agent = Agent(model=model, tools=[])
```
Some use cases benefit from being able to retrieve memory just in time and add it to the prompt before responding (RAG, etc.):

```python
agent = Agent(model=model, tools=[], memory=[])
```
Memory Behaviour
A default behaviour might be that:

- in on_messages(task), the task is used as a query to memory.query() if memory is provided to the agent
- the returned results are added to the prompt before the LLM responds
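As an illustrative sketch of that flow (not actual AgentChat internals; it assumes query results expose an `entry.content` field, as in the protocol draft below):

```python
# Illustrative sketch of the default behaviour, not actual AgentChat code.
async def build_prompt_with_memory(memory, task: str) -> str:
    """Query memory with the incoming task and prepend any hits to the prompt."""
    results = await memory.query(task)  # assumed: returns a ranked list of results
    if not results:
        return task
    recalled = "\n".join(f"- {r.entry.content}" for r in results)  # assumed result shape
    return f"Relevant memory:\n{recalled}\n\nTask: {task}"
```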
A rough sketch of a memory protocol.
```python
from typing import Any, List, Protocol, Union, runtime_checkable
# (Image, CancellationToken, BaseMemoryConfig, MemoryEntry and MemoryQueryResult
# would come from the corresponding autogen modules.)


@runtime_checkable
class Memory(Protocol):
    """Protocol defining the interface for memory implementations."""

    @property
    def name(self) -> str | None:
        """The name of this memory implementation."""
        ...

    @property
    def config(self) -> BaseMemoryConfig:
        """The configuration for this memory implementation."""
        ...

    async def query(
        self,
        query: Union[str, Image, List[Union[str, Image]]],
        cancellation_token: CancellationToken | None = None,
        **kwargs: Any,
    ) -> List[MemoryQueryResult]:
        """Query the memory store and return relevant entries.

        Args:
            query: Text, image or multimodal query
            cancellation_token: Optional token to cancel operation
            **kwargs: Additional implementation-specific parameters

        Returns:
            List of memory entries with relevance scores
        """
        ...

    async def add(
        self,
        entry: MemoryEntry,
        cancellation_token: CancellationToken | None = None,
    ) -> None:
        """Add a new entry to memory.

        Args:
            entry: The memory entry to add
            cancellation_token: Optional token to cancel operation
        """
        ...

    async def clear(self) -> None:
        """Clear all entries from memory."""
        ...

    async def cleanup(self) -> None:
        """Clean up any resources used by the memory implementation."""
        ...
```
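Since the protocol is decorated with @runtime_checkable, an agent can cheaply validate whatever object it is handed (keeping in mind that isinstance on a runtime-checkable protocol only checks for the presence of members, not their signatures):

```python
# Given some candidate object `memory`:
if not isinstance(memory, Memory):
    raise TypeError(f"Expected a Memory implementation, got {type(memory).__name__}")
```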
AssistantAgent will try to query memory using the last message in on_messages (if it is a TextMessage or MultiModalMessage); the returned results are appended to the context.
Developers can implement their own custom memory classes by implementing the Memory protocol.
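For example, a minimal in-process implementation might look like the sketch below. MemoryEntry and MemoryQueryResult here are hypothetical stand-ins for whatever the final _base_memory module defines, ranking is naive keyword overlap, and the config property is omitted for brevity:

```python
from dataclasses import dataclass, field
from typing import Any, List


# Hypothetical stand-ins for the entry/result types used by the protocol.
@dataclass
class MemoryEntry:
    content: str
    source: str | None = None
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class MemoryQueryResult:
    entry: MemoryEntry
    score: float


class ListMemory:
    """A naive list-backed memory satisfying the Memory protocol (config omitted)."""

    def __init__(self, name: str | None = None, k: int = 3) -> None:
        self._name = name
        self._k = k
        self._entries: List[MemoryEntry] = []

    @property
    def name(self) -> str | None:
        return self._name

    async def add(self, entry: MemoryEntry, cancellation_token=None) -> None:
        self._entries.append(entry)

    async def query(self, query, cancellation_token=None, **kwargs: Any) -> List[MemoryQueryResult]:
        # Rank stored entries by naive keyword overlap with the query text.
        terms = set(str(query).lower().split())
        scored = [
            MemoryQueryResult(entry=e, score=float(len(terms & set(e.content.lower().split()))))
            for e in self._entries
        ]
        scored.sort(key=lambda r: r.score, reverse=True)
        return [r for r in scored[: self._k] if r.score > 0]

    async def clear(self) -> None:
        self._entries.clear()

    async def cleanup(self) -> None:
        pass  # Nothing to release for an in-process list.
```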
The AssistantAgent implementation above focuses on memory.query and adds the results to the context just in time. It does not concern itself with how entries are added to memory; that can be heavily use-case driven, so the developer is expected to add to memory outside of the agent logic.

Example Implementation

I have a branch that implements this, and an example notebook highlighting the following.
```python
from autogen_agentchat.memory._base_memory import MemoryEntry
from autogen_agentchat.memory._chroma_memory import ChromaMemory, ChromaMemoryConfig

# Initialize memory
chroma_memory = ChromaMemory(
    name="travel_memory",
    config=ChromaMemoryConfig(
        collection_name="travel_facts",
        # Configure number of results to return instead of similarity threshold
        k=1,
    ),
)
```
```python
# Add some travel-related memories
await chroma_memory.add(MemoryEntry(
    content="Paris is known for the Eiffel Tower and amazing cuisine.",
    source="travel_guide",
))

await chroma_memory.add(MemoryEntry(
    content="The most important thing about Tokyo is that it has the world's busiest railway station - Shinjuku Station.",
    source="travel_facts",
))
```
```python
# Imports assumed from the corresponding autogen packages (paths may vary by version).
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_agentchat.conditions import MaxMessageTermination
from autogen_agentchat.ui import Console
from autogen_ext.models.openai import OpenAIChatCompletionClient

# Create agent with memory
agent = AssistantAgent(
    name="travel_agent",
    model_client=OpenAIChatCompletionClient(
        model="gpt-4o",
        # api_key="your_api_key"
    ),
    memory=chroma_memory,
    system_message="You are a travel expert",
)

agent_team = RoundRobinGroupChat([agent], termination_condition=MaxMessageTermination(max_messages=2))
stream = agent_team.run_stream(task="Tell me the most important thing about Tokyo.")
await Console(stream)
```
---------- user ----------
Tell me the most important thing about Tokyo.
---------- travel_agent ----------
One of the most important aspects of Tokyo is that it has the world's busiest railway station, Shinjuku Station. This station serves as a major hub for transportation, with millions of commuters and travelers passing through its complex network of train lines each day. It highlights Tokyo's status as a bustling metropolis with an advanced public transportation system.
[Prompt tokens: 72, Completion tokens: 66]
---------- Summary ----------
Number of messages: 2
Finish reason: Maximum number of messages 2 reached, current message count: 2
Total prompt tokens: 72
Total completion tokens: 66
Duration: 1.47 seconds
How is this related to load/save state?

One way to think about it: loading state concerns what happens before/after an agent is run, while memory is more dynamic, injecting just-in-time context based on the exact input the agent receives during the run.
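As a rough contrast, assuming save_state/load_state APIs on the team (an assumption; the exact surface may differ):

```python
# State: captured and restored around runs (persistence across sessions).
state = await agent_team.save_state()   # after one run
await agent_team.load_state(state)      # before the next run

# Memory: queried per incoming message (just-in-time context).
results = await chroma_memory.query("Tell me the most important thing about Tokyo.")
```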
Why is this needed?
Provides an interface for supporting RAG/memory.
@colombod, @rickyloynd-microsoft, @ekzhu: I have updated the issue above with a sample implementation. I'd love feedback on whether there is appetite for this in AgentChat, or general feedback, before any additional progress is made.
Draft PR here - #4438
@ekzhu Potentially relevant to support might be LangChain memory. There, a basic memory is a list of messages.
https://python.langchain.com/v0.1/docs/modules/memory/
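To illustrate (a hypothetical adapter, not part of any branch): a LangChain-style message-list memory could sit behind the proposed protocol by returning the most recent messages regardless of the query:

```python
# Hypothetical adapter: expose a plain message list through the proposed protocol.
class ChatHistoryMemory:
    def __init__(self, k: int = 5) -> None:
        self._k = k
        self._entries: list[MemoryEntry] = []  # MemoryEntry as sketched above

    @property
    def name(self) -> str | None:
        return "chat_history"

    async def add(self, entry: MemoryEntry, cancellation_token=None) -> None:
        self._entries.append(entry)

    async def query(self, query, cancellation_token=None, **kwargs):
        # A message-list memory ignores the query and returns the k most recent entries.
        return [MemoryQueryResult(entry=e, score=1.0) for e in self._entries[-self._k:]]

    async def clear(self) -> None:
        self._entries.clear()

    async def cleanup(self) -> None:
        pass
```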