
Commit

Merge branch 'main' into lpinheiro/feat/add-vectordb-chroma-store
lspinheiro authored Oct 29, 2024
2 parents fc56024 + 87bd1de commit 0b952d4
Showing 4 changed files with 340 additions and 42 deletions.
@@ -74,6 +74,24 @@
"print(response.content)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"RequestUsage(prompt_tokens=15, completion_tokens=7)\n"
]
}
],
"source": [
"# Print the response token usage\n",
"print(response.usage)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -86,32 +104,34 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Streamed responses:\n",
"In a secluded valley where the sun painted the sky with hues of gold, a solitary dragon named Bremora stood guard. Her emerald scales shimmered with an ancient light as she watched over the village below. Unlike her fiery kin, Bremora had no desire for destruction; her soul was bound by a promise to protect.\n",
"\n",
"Generations ago, a wise elder had befriended Bremora, offering her companionship instead of fear. In gratitude, she vowed to shield the village from calamity. Years passed, and children grew up believing in the legends of a watchful dragon who brought them prosperity and peace.\n",
"In the heart of an ancient forest, beneath the shadow of snow-capped peaks, a dragon named Elara lived secretly for centuries. Elara was unlike any dragon from the old tales; her scales shimmered with a deep emerald hue, each scale engraved with symbols of lost wisdom. The villagers in the nearby valley spoke of mysterious lights dancing across the night sky, but none dared venture close enough to solve the enigma.\n",
"\n",
"One summer, an ominous storm threatened the valley, with ravenous winds and torrents of rain. Bremora rose into the tempest, her mighty wings defying the chaos. She channeled her breath—not of fire, but of warmth and tranquility—calming the storm and saving her cherished valley.\n",
"One cold winter's eve, a young girl named Lira, brimming with curiosity and armed with the innocence of youth, wandered into Elara’s domain. Instead of fire and fury, she found warmth and a gentle gaze. The dragon shared stories of a world long forgotten and in return, Lira gifted her simple stories of human life, rich in laughter and scent of earth.\n",
"\n",
"When dawn broke and the village emerged unscathed, the people looked to the sky. There, Bremora soared gracefully, a guardian spirit woven into their lives, silently promising her eternal vigilance.\n",
"From that night on, the villagers noticed subtle changes—the crops grew taller, and the air seemed sweeter. Elara had infused the valley with ancient magic, a guardian of balance, watching quietly as her new friend thrived under the stars. And so, Lira and Elara’s bond marked the beginning of a timeless friendship that spun tales of hope whispered through the leaves of the ever-verdant forest.\n",
"\n",
"------------\n",
"\n",
"The complete response:\n",
"In a secluded valley where the sun painted the sky with hues of gold, a solitary dragon named Bremora stood guard. Her emerald scales shimmered with an ancient light as she watched over the village below. Unlike her fiery kin, Bremora had no desire for destruction; her soul was bound by a promise to protect.\n",
"In the heart of an ancient forest, beneath the shadow of snow-capped peaks, a dragon named Elara lived secretly for centuries. Elara was unlike any dragon from the old tales; her scales shimmered with a deep emerald hue, each scale engraved with symbols of lost wisdom. The villagers in the nearby valley spoke of mysterious lights dancing across the night sky, but none dared venture close enough to solve the enigma.\n",
"\n",
"One cold winter's eve, a young girl named Lira, brimming with curiosity and armed with the innocence of youth, wandered into Elara’s domain. Instead of fire and fury, she found warmth and a gentle gaze. The dragon shared stories of a world long forgotten and in return, Lira gifted her simple stories of human life, rich in laughter and scent of earth.\n",
"\n",
"From that night on, the villagers noticed subtle changes—the crops grew taller, and the air seemed sweeter. Elara had infused the valley with ancient magic, a guardian of balance, watching quietly as her new friend thrived under the stars. And so, Lira and Elara’s bond marked the beginning of a timeless friendship that spun tales of hope whispered through the leaves of the ever-verdant forest.\n",
"\n",
"Generations ago, a wise elder had befriended Bremora, offering her companionship instead of fear. In gratitude, she vowed to shield the village from calamity. Years passed, and children grew up believing in the legends of a watchful dragon who brought them prosperity and peace.\n",
"\n",
"One summer, an ominous storm threatened the valley, with ravenous winds and torrents of rain. Bremora rose into the tempest, her mighty wings defying the chaos. She channeled her breath—not of fire, but of warmth and tranquility—calming the storm and saving her cherished valley.\n",
"------------\n",
"\n",
"When dawn broke and the village emerged unscathed, the people looked to the sky. There, Bremora soared gracefully, a guardian spirit woven into their lives, silently promising her eternal vigilance.\n"
"The token usage was:\n",
"RequestUsage(prompt_tokens=0, completion_tokens=0)\n"
]
}
],
@@ -133,7 +153,10 @@
" # The last response is a CreateResult object with the complete message.\n",
" print(\"\\n\\n------------\\n\")\n",
" print(\"The complete response:\", flush=True)\n",
" print(response.content, flush=True)"
" print(response.content, flush=True)\n",
" print(\"\\n\\n------------\\n\")\n",
" print(\"The token usage was:\", flush=True)\n",
" print(response.usage, flush=True)"
]
},
{
@@ -143,7 +166,86 @@
"```{note}\n",
"The last response in the streaming response is always the final response\n",
"of the type {py:class}`~autogen_core.components.models.CreateResult`.\n",
"```"
"```\n",
"\n",
"**NB the default usage response is to return zero values**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### A Note on Token usage counts with streaming example\n",
"Comparing usage returns in the above Non Streaming `model_client.create(messages=messages)` vs streaming `model_client.create_stream(messages=messages)` we see differences.\n",
"The non streaming response by default returns valid prompt and completion token usage counts. \n",
"The streamed response by default returns zero values.\n",
"\n",
"as documented in the OPENAI API Reference an additional parameter `stream_options` can be specified to return valid usage counts. see [stream_options](https://platform.openai.com/docs/api-reference/chat/create#chat-create-stream_options)\n",
"\n",
"Only set this when you using streaming ie , using `create_stream` \n",
"\n",
"to enable this in `create_stream` set `extra_create_args={\"stream_options\": {\"include_usage\": True}},`\n",
"\n",
"- **Note whilst other API's like LiteLLM also support this, it is not always guarenteed that it is fully supported or correct**\n",
"\n",
"#### Streaming example with token usage\n"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Streamed responses:\n",
"In a lush, emerald valley hidden by towering peaks, there lived a dragon named Ember. Unlike others of her kind, Ember cherished solitude over treasure, and the songs of the stream over the roar of flames. One misty dawn, a young shepherd stumbled into her sanctuary, lost and frightened. \n",
"\n",
"Instead of fury, he was met with kindness as Ember extended a wing, guiding him back to safety. In gratitude, the shepherd visited yearly, bringing tales of his world beyond the mountains. Over time, a friendship blossomed, binding man and dragon in shared stories and laughter.\n",
"\n",
"As the years passed, the legend of Ember the gentle-hearted spread far and wide, forever changing the way dragons were seen in the hearts of many.\n",
"\n",
"------------\n",
"\n",
"The complete response:\n",
"In a lush, emerald valley hidden by towering peaks, there lived a dragon named Ember. Unlike others of her kind, Ember cherished solitude over treasure, and the songs of the stream over the roar of flames. One misty dawn, a young shepherd stumbled into her sanctuary, lost and frightened. \n",
"\n",
"Instead of fury, he was met with kindness as Ember extended a wing, guiding him back to safety. In gratitude, the shepherd visited yearly, bringing tales of his world beyond the mountains. Over time, a friendship blossomed, binding man and dragon in shared stories and laughter.\n",
"\n",
"As the years passed, the legend of Ember the gentle-hearted spread far and wide, forever changing the way dragons were seen in the hearts of many.\n",
"\n",
"\n",
"------------\n",
"\n",
"The token usage was:\n",
"RequestUsage(prompt_tokens=17, completion_tokens=146)\n"
]
}
],
"source": [
"messages = [\n",
" UserMessage(content=\"Write a very short story about a dragon.\", source=\"user\"),\n",
"]\n",
"\n",
"# Create a stream.\n",
"stream = model_client.create_stream(messages=messages, extra_create_args={\"stream_options\": {\"include_usage\": True}})\n",
"\n",
"# Iterate over the stream and print the responses.\n",
"print(\"Streamed responses:\")\n",
"async for response in stream: # type: ignore\n",
" if isinstance(response, str):\n",
" # A partial response is a string.\n",
" print(response, flush=True, end=\"\")\n",
" else:\n",
" # The last response is a CreateResult object with the complete message.\n",
" print(\"\\n\\n------------\\n\")\n",
" print(\"The complete response:\", flush=True)\n",
" print(response.content, flush=True)\n",
" print(\"\\n\\n------------\\n\")\n",
" print(\"The token usage was:\", flush=True)\n",
" print(response.usage, flush=True)"
]
},
{
@@ -234,7 +336,8 @@
"from autogen_core.application import SingleThreadedAgentRuntime\n",
"from autogen_core.base import MessageContext\n",
"from autogen_core.components import RoutedAgent, message_handler\n",
"from autogen_core.components.models import ChatCompletionClient, OpenAIChatCompletionClient, SystemMessage, UserMessage\n",
"from autogen_core.components.models import ChatCompletionClient, SystemMessage, UserMessage\n",
"from autogen_ext.models import OpenAIChatCompletionClient\n",
"\n",
"\n",
"@dataclass\n",
@@ -39,6 +39,7 @@
completion_create_params,
)
from openai.types.chat.chat_completion import Choice
from openai.types.chat.chat_completion_chunk import Choice as ChunkChoice
from openai.types.shared_params import FunctionDefinition, FunctionParameters
from pydantic import BaseModel
from typing_extensions import Unpack
@@ -555,6 +556,31 @@ async def create_stream(
extra_create_args: Mapping[str, Any] = {},
cancellation_token: Optional[CancellationToken] = None,
) -> AsyncGenerator[Union[str, CreateResult], None]:
"""
Creates an AsyncGenerator that will yield a stream of chat completions based on the provided messages and tools.
Args:
messages (Sequence[LLMMessage]): A sequence of messages to be processed.
tools (Sequence[Tool | ToolSchema], optional): A sequence of tools to be used in the completion. Defaults to `[]`.
json_output (Optional[bool], optional): If True, the output will be in JSON format. Defaults to None.
extra_create_args (Mapping[str, Any], optional): Additional arguments for the creation process. Default to `{}`.
cancellation_token (Optional[CancellationToken], optional): A token to cancel the operation. Defaults to None.
Yields:
AsyncGenerator[Union[str, CreateResult], None]: A generator yielding the completion results as they are produced.
In streaming, the default behaviour is not return token usage counts. See: [OpenAI API reference for possible args](https://platform.openai.com/docs/api-reference/chat/create).
However `extra_create_args={"stream_options": {"include_usage": True}}` will (if supported by the accessed API)
return a final chunk with usage set to a RequestUsage object having prompt and completion token counts,
all preceding chunks will have usage as None. See: [stream_options](https://platform.openai.com/docs/api-reference/chat/create#chat-create-stream_options).
Other examples of OPENAI supported arguments that can be included in `extra_create_args`:
- `temperature` (float): Controls the randomness of the output. Higher values (e.g., 0.8) make the output more random, while lower values (e.g., 0.2) make it more focused and deterministic.
- `max_tokens` (int): The maximum number of tokens to generate in the completion.
- `top_p` (float): An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass.
- `frequency_penalty` (float): A value between -2.0 and 2.0 that penalizes new tokens based on their existing frequency in the text so far, decreasing the likelihood of repeated phrases.
- `presence_penalty` (float): A value between -2.0 and 2.0 that penalizes new tokens based on whether they appear in the text so far, encouraging the model to talk about new topics.
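
Example (a minimal sketch, assuming `client` is an initialized OpenAIChatCompletionClient
and `messages` is a list of LLMMessage objects; the variable names are illustrative):

    # Ask the API to include usage stats in the final streamed chunk.
    stream = client.create_stream(
        messages=messages,
        extra_create_args={"stream_options": {"include_usage": True}},
    )
    async for item in stream:
        if isinstance(item, str):
            print(item, end="")  # partial content
        else:
            print(item.usage)  # final CreateResult with populated usage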
"""
# Make sure all extra_create_args are valid
extra_create_args_keys = set(extra_create_args.keys())
if not create_kwargs.issuperset(extra_create_args_keys):
@@ -601,7 +627,8 @@ async def create_stream(
if cancellation_token is not None:
    cancellation_token.link_future(stream_future)
stream = await stream_future

choice: Union[ParsedChoice[Any], ParsedChoice[BaseModel], ChunkChoice] = cast(ChunkChoice, None)
chunk = None
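# Note (descriptive comment, not in the original source): `choice` and `chunk` are
# tracked across loop iterations so the usage-only final chunk (which has no choices
# on the OpenAI API) can reuse the previous choice, and so the post-loop code can
# read token counts from the last chunk's usage field.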
stop_reason = None
maybe_model = None
content_deltas: List[str] = []
@@ -614,8 +641,23 @@
if cancellation_token is not None:
    cancellation_token.link_future(chunk_future)
chunk = await chunk_future

# To process the usage chunk in streaming situations,
# add stream_options={"include_usage": True} in the initialization of OpenAIChatCompletionClient(...).
# However, the APIs differ: the OpenAI API usage chunk produces no choices, so we need
# to check whether a choice is present, while the LiteLLM API usage chunk does produce choices.
choice = (
    chunk.choices[0]
    if len(chunk.choices) > 0
    else choice
    if chunk.usage is not None and stop_reason is not None
    else cast(ChunkChoice, None)
)

# Keep the previous stop_reason once it is set; only read finish_reason from
# content chunks (where chunk.usage is None) until a stop reason has been seen.
stop_reason = choice.finish_reason if chunk.usage is None and stop_reason is None else stop_reason
maybe_model = chunk.model
# First, try to get the content.
if choice.delta.content is not None:
@@ -657,17 +699,21 @@
model = maybe_model or create_args["model"]
model = model.replace("gpt-35", "gpt-3.5") # hack for Azure API

if chunk and chunk.usage:
    prompt_tokens = chunk.usage.prompt_tokens
else:
    prompt_tokens = 0

if stop_reason is None:
    raise ValueError("No stop reason found")

content: Union[str, List[FunctionCall]]
if len(content_deltas) > 1:
    content = "".join(content_deltas)
    if chunk and chunk.usage:
        completion_tokens = chunk.usage.completion_tokens
    else:
        completion_tokens = 0
else:
    completion_tokens = 0
# TODO: fix assumption that dict values were added in order and actually order by int index