
feat: Add token usage to Bedrock Claude + Migrated chain for this model #564

Merged
charles-marion merged 3 commits into aws-samples:main from usage on Sep 16, 2024

Conversation

charles-marion
Collaborator

Issue #, if available:
#502 #495 #230

Description of changes:

To add usage tracking to Bedrock models, I migrated the LangChain chain for the Claude model (ConversationChain is deprecated).

Instead, it uses RunnableWithMessageHistory with ChatBedrockConverse, which relies on the Bedrock Converse API; that API is consistent across models and provides the usage in the response.
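A minimal sketch of that wiring (illustrative only, not the exact code in this PR; the model id, prompt text, and in-memory session store are assumptions, and the project uses a DynamoDB-backed history instead):

# Minimal sketch, assuming the langchain-aws and langchain-core packages.
from langchain_aws import ChatBedrockConverse
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory

llm = ChatBedrockConverse(model="anthropic.claude-3-sonnet-20240229-v1:0")

prompt = ChatPromptTemplate.from_messages([
    ("system", "The following is a friendly conversation between a human and an AI."),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}"),
])

# In-memory store standing in for the project's DynamoDB-backed history.
store = {}

def get_history(session_id: str) -> InMemoryChatMessageHistory:
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]

conversation = RunnableWithMessageHistory(
    prompt | llm,
    get_history,
    input_messages_key="input",
    history_messages_key="chat_history",
)

response = conversation.invoke(
    {"input": "test"},
    config={"configurable": {"session_id": "demo-session"}},
)

# The Converse API reports token usage on the returned AIMessage.
print(response.usage_metadata)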

Changes

  • Migrate chain for Claude Bedrock
  • Add usage to metadata
  • Add usage JSON log (see the logging sketch after this list)
  • Add CloudWatch filter parsing the log and generating a metric (with a CLI config option due to the added cost)
  • Add metric to dashboard.
  • Added npm run vet-all to quickly verify formatting and tests
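
A rough sketch of the JSON usage log idea (the helper and field names below are illustrative, not the PR's exact schema; a CloudWatch metric filter would then match on these JSON fields):

# Hypothetical helper: emits one JSON log line per invocation so a CloudWatch
# metric filter can extract the token counts.
import json
import logging

logger = logging.getLogger(__name__)

def log_usage(response, model_id: str, session_id: str) -> None:
    usage = getattr(response, "usage_metadata", None) or {}
    logger.info(json.dumps({
        "metric_type": "token_usage",
        "model": model_id,
        "session_id": session_id,
        "input_tokens": usage.get("input_tokens", 0),
        "output_tokens": usage.get("output_tokens", 0),
        "total_tokens": usage.get("total_tokens", 0),
    }))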

Testing

  • Verified with and without RAG and Streaming
  • Ran integ tests

Future Changes

  • Migrate all models since ConversationChain is deprecated
  • Add usage tracking to Sagemaker endpoint (if possible)

Note: This change modifies the prompts to match the new LangChain patterns. For example:
Before

The following is a friendly conversation between a human and an AI. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: test
AI: I'm afraid I don't have enough context to answer that question. Could you please provide more details?
Human: test
AI: I apologize,...

After

System: The following is a friendly conversation between a human and an AI. If the AI does not know the answer to a question, it truthfully says it does not know.
Human: test
AI: I'm afraid I don't have enough context to answer your question. Could you please provide more details?
Human: test
AI: I don't have enough information to answer your question. The context provided mentions an Integ Test flower that is yellow, but does not include a direct question.

Image with metadata usage
(screenshot)
Dashboard
(screenshot)

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@gbone-restore

With the older ConversationRetrievalChain, I limited how much history I would pass into a model. In my organization, we are seeing chat history messages grow across a variety of topics and it can cause inaccurate rephrasing of questions.

I subclassed ConversationBufferMemory to give a rolling window of conversation history that is a smaller subset of the entire history.

e.g.:

from langchain.memory import ConversationBufferMemory
from typing import Dict, List, Any
from pydantic import Field

class WindowedConversationBufferMemory(ConversationBufferMemory):
    k: int = Field(default=2, description="Number of recent conversations to keep")

    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    def save_context(self, inputs: Dict[str, Any], outputs: Dict[str, str]) -> None:
        # Save the full context to the underlying storage (DynamoDB)
        super().save_context(inputs, outputs)

    def load_memory_variables(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        # Load the full history from the underlying storage
        result = super().load_memory_variables(inputs)

        # If there's no history, return an empty list or dict
        if self.memory_key not in result or not result[self.memory_key]:
            return {self.memory_key: [] if self.return_messages else ""}

        # Windowing: Only return the last k conversations
        if self.return_messages:
            result[self.memory_key] = result[self.memory_key][-2*self.k:]
        else:
            conversations = result[self.memory_key].split('\n\nHuman: ')
            recent_conversations = conversations[-min(self.k, len(conversations)):]
            result[self.memory_key] = '\n\nHuman: '.join(recent_conversations).strip()

        return result

I want to do something similar with RunnableWithMessageHistory but I'm still getting up to speed on this new API. Do you think that limiting the message history to a smaller slice of data is an important feature?

@charles-marion
Collaborator Author

The memory used by RunnableWithMessageHistory in this change is this class.

To implement it, I would just add a max-messages-returned parameter (you still want to store and return the full history to view the session). This would make it independent of the chain; a rough sketch of the idea is below.
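
Something along these lines (a rough sketch with hypothetical names, shown against the community DynamoDB history as a stand-in for the repo's own history class):

# Rough sketch: the full history is still written to DynamoDB,
# but only the last `max_messages` entries are returned to the chain.
from langchain_community.chat_message_histories import DynamoDBChatMessageHistory
from langchain_core.messages import BaseMessage

class WindowedDynamoDBChatMessageHistory(DynamoDBChatMessageHistory):
    def __init__(self, *args, max_messages: int = 6, **kwargs):
        super().__init__(*args, **kwargs)
        self._max_messages = max_messages

    @property
    def messages(self) -> list[BaseMessage]:
        # Reads the complete history from DynamoDB, exposes only the tail.
        return super().messages[-self._max_messages:]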

Do you think that limiting the message history to a smaller slice of data is an important feature?
I do agree, since it would reduce the number of tokens used, but it would need to be configurable somewhere.

@charles-marion charles-marion merged commit 6242c59 into aws-samples:main Sep 16, 2024
1 check passed
@charles-marion charles-marion deleted the usage branch September 16, 2024 14:22
lloydclowes pushed a commit to lloydclowes/gen-ai-playground that referenced this pull request Oct 5, 2024