added image(s) processing tool

frdel · Aug 22, 2024
commit eddd293 · 1 parent f5f2a02
Showing 11 changed files with 261 additions and 1 deletion.
diff --git a/example_work_dir/example_conversation.txt b/example_work_dir/example_conversation.txt
@@ -0,0 +1,136 @@
+User message ('e' to leave):
+> please cat todo.txt and follow instructions
+
+Agent 0: Starting a message:
+{
+    "thoughts": [
+        "The user has requested to read the contents of a file named 'todo.txt'.",
+        "I will execute a command to display the contents of this file.",
+        "After retrieving the contents, I will analyze the instructions provided in the file."
+    ],
+    "tool_name": "code_execution_tool",
+    "tool_args": {
+        "runtime": "terminal",
+        "code": "cat todo.txt"
+    }
+}
+
+Agent 0: Using tool 'code_execution_tool':
+Runtime: terminal
+
+Code: cat todo.txt
+
+Agent 0 code execution output:
+please use the image processing tool to analyze the images found at paths:
+
+'image_1.jpg'
+'image_3.webp'
+'image_4.jpeg'
+'image_6.png'
+
+which one would be the best image for a blog article about cool cats?
+(venv) root@18c0f9b934d9:~#
+
+Agent 0: Starting a message:
+{
+    "thoughts": [
+        "I have retrieved the instructions from 'todo.txt'.",
+        "The user wants to analyze specific images to determine which one would be the best for a blog article about cool cats.",
+        "I will use the image_processing_tool to analyze the provided images."
+    ],
+    "tool_name": "image_processing_tool",
+    "tool_args": {
+        "query": "Please analyze the following images and determine which one would be the best for a blog article about cool cats.",
+        "image_paths": [
+            "image_1.jpg",
+            "image_3.webp",
+            "image_4.jpeg",
+            "image_6.png"
+        ]
+    }
+}
+
+Agent 0: Using tool 'image_processing_tool':
+Query: Please analyze the following images and determine which one would be the best for a blog article about cool cats.
+
+Image paths: ['image_1.jpg', 'image_3.webp', 'image_4.jpeg', 'image_6.png']
+
+
+Agent 0: Response from tool 'image_processing_tool':
+Based on the images you provided, none of them appear to feature cats, which would be relevant for a blog article about "cool cats." However, if you need to select the best fit for a blog context:
+
+1. **Image of a boat at sunset:** This image has a calming and visually appealing aesthetic but does not relate to cats.
+2. **Image of a bicycle with flowers:** This might evoke a whimsical or cozy vibe, but again, it doesn't relate to cats.
+3. **Image of snails:** While not cats, snails can relate to a theme of animals or nature, making this the most relevant image if you want to touch on creatures of interest.
+4. **Abstract shapes:** This image does not convey any recognizable themes that would contribute to a blog about cats.
+
+If the focus is specifically on cats, none of the images would serve that purpose well. If you have more images or would like recommendations for cat-related content, feel free to share!
+
+Agent 0: Starting a message:
+{
+    "thoughts": [
+        "The analysis of the images revealed that none of them feature cats, which is essential for the blog article about cool cats.",
+        "The best option among the provided images is the one featuring snails, as it relates to animals and nature, albeit not directly to cats.",
+        "I should inform the user about the findings and suggest that they provide more relevant images if they want to focus on cats."
+    ],
+    "tool_name": "response",
+    "tool_args": {
+        "text": "Based on the images you provided, none of them appear to feature cats, which would be relevant for a blog article about \"cool cats.\" However, if you need to select the best fit for a blog context:\n\n1. **Image of a boat at sunset:** This image has a calming and visually appealing aesthetic but does not relate to cats.\n2. **Image of a bicycle with flowers:** This might evoke a whimsical or cozy vibe, but again, it doesn't relate to cats.\n3. **Image of snails:** While not cats, snails can relate to a theme of animals or nature, making this the most relevant image if you want to touch on creatures of interest.\n4. **Abstract shapes:** This image does not convey any recognizable themes that would contribute to a blog about cats.\n\nIf the focus is specifically on cats, none of the images would serve that purpose well. If you have more images or would like recommendations for cat-related content, feel free to share!"
+    }
+}
+
+Agent 0: response:
+Based on the images you provided, none of them appear to feature cats, which would be relevant for a blog article about "cool cats." However, if you need to select the best fit for a blog context:
+
+1. **Image of a boat at sunset:** This image has a calming and visually appealing aesthetic but does not relate to cats.
+2. **Image of a bicycle with flowers:** This might evoke a whimsical or cozy vibe, but again, it doesn't relate to cats.
+3. **Image of snails:** While not cats, snails can relate to a theme of animals or nature, making this the most relevant image if you want to touch on creatures of interest.
+4. **Abstract shapes:** This image does not convey any recognizable themes that would contribute to a blog about cats.
+
+If the focus is specifically on cats, none of the images would serve that purpose well. If you have more images or would like recommendations for cat-related content, feel free to share!
+
+User message ('e' to leave):
+> try 'image_2.jpg'
+
+Agent 0: Starting a message:
+{
+    "thoughts": [
+        "The user has provided a new image 'image_2.jpg' for analysis.",
+        "I will use the image_processing_tool to analyze this image and determine if it is suitable for a blog article about cool cats."
+    ],
+    "tool_name": "image_processing_tool",
+    "tool_args": {
+        "query": "Please analyze the following image and determine if it would be suitable for a blog article about cool cats.",
+        "image_paths": [
+            "image_2.jpg"
+        ]
+    }
+}
+
+Agent 0: Using tool 'image_processing_tool':
+Query: Please analyze the following image and determine if it would be suitable for a blog article about cool cats.
+
+Image paths: ['image_2.jpg']
+
+
+Agent 0: Response from tool 'image_processing_tool':
+The image of the cat wearing sunglasses and a stylish jacket, set against a vibrant and colorful background, is certainly suitable for a blog article about cool cats. The playful and trendy presentation aligns perfectly with a theme centered around stylish or unique cats, making it visually appealing and engaging for readers interested in such content.
+
+Agent 0: Starting a message:
+{
+    "thoughts": [
+        "The analysis of 'image_2.jpg' indicates that it features a cat wearing sunglasses and a stylish jacket.",
+        "This image is suitable for a blog article about cool cats, as it aligns well with the theme of stylish or unique cats.",
+        "I will inform the user about the positive analysis of this image."
+    ],
+    "tool_name": "response",
+    "tool_args": {
+        "text": "The analysis of 'image_2.jpg' indicates that it features a cat wearing sunglasses and a stylish jacket. This image is suitable for a blog article about cool cats, as it aligns well with the theme of stylish or unique cats. It is visually appealing and engaging for readers interested in such content."
+    }
+}
+
+Agent 0: response:
+The analysis of 'image_2.jpg' indicates that it features a cat wearing sunglasses and a stylish jacket. This image is suitable for a blog article about cool cats, as it aligns well with the theme of stylish or unique cats. It is visually appealing and engaging for readers interested in such content.
+
+User message ('e' to leave):
+>
diff --git a/example_work_dir/image_1.jpg b/example_work_dir/image_1.jpg
diff --git a/example_work_dir/image_2.jpg b/example_work_dir/image_2.jpg
diff --git a/example_work_dir/image_3.webp b/example_work_dir/image_3.webp
diff --git a/example_work_dir/image_4.jpeg b/example_work_dir/image_4.jpeg
diff --git a/example_work_dir/image_6.png b/example_work_dir/image_6.png
diff --git a/example_work_dir/todo.txt b/example_work_dir/todo.txt
@@ -0,0 +1,8 @@
+please use the image processing tool to analyze the images found at paths:
+
+'image_1.jpg'
+'image_3.webp'
+'image_4.jpeg'
+'image_6.png'
+
+which one would be the best image for a blog article about cool cats?
diff --git a/main.py b/main.py
@@ -48,7 +48,7 @@ def initialize():
         # msgs_keep_start = 5,
         # msgs_keep_end = 10,
         # max_tool_response_length = 3000,
-        # response_timeout_seconds = 60,
+        response_timeout_seconds = 6,
         code_exec_docker_enabled = True,
         # code_exec_docker_name = "agent-zero-exe",
         # code_exec_docker_image = "frdel/agent-zero-exe:latest",

diff --git a/prompts/agent.tools.md b/prompts/agent.tools.md
@@ -214,3 +214,47 @@ When writing own code, ALWAYS put print/log statements inside and at the end of
     }
 }
 ~~~
+
+### image_processing_tool:
+
+Process images to get insights from them.
+Provide an "image_paths" argument (a list of strings) with the paths to the image files. Paths can be local or remote (remote paths start with 'http').
+Also provide a "query" argument describing what to analyze or compare.
+The tool returns text that summarizes the image(s) or contains data extracted from them, depending on the query.
+Use this tool for tasks such as OCR (Optical Character Recognition), image analysis, style analysis of web page screenshots, and image comparison.
+
+**Example usage**:
+
+~~~json
+{
+  "thoughts": [
+    "The user provided an image...",
+    "I will use the image_processing_tool to analyze the image..."
+  ],
+  "tool_name": "image_processing_tool",
+  "tool_args": {
+    "query": "Please analyze the following image and provide feedback.",
+    "image_paths": ["path/to/image.jpg"]
+  }
+}
+~~~
+
+**Example usage (comparing images)**:
+
+~~~json
+{
+  "thoughts": [
+    "The user provided an image...",
+    "I will use the image_processing_tool to compare images..."
+  ],
+  "tool_name": "image_processing_tool",
+  "tool_args": {
+    "query": "Please analyze the following images. Which one is ...?",
+    "image_paths": [
+      "path/to/image.jpg",
+      "path/to/different/picture.png",
+      "https://example.com/image.jpg"
+    ]
+  }
+}
+~~~
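
The prompt above describes two arguments, a `query` string and an `image_paths` list. As a minimal sketch (not part of this commit), a caller could check a tool call of that shape before dispatching it. The helper name `validate_image_tool_args` is hypothetical and used only for illustration.

```python
from typing import Any


def validate_image_tool_args(tool_args: dict[str, Any]) -> tuple[str, list[str]]:
    """Return (query, image_paths), or raise ValueError if the arguments are malformed.

    Hypothetical helper: it mirrors the argument shape described in the prompt,
    not any function that exists in the repository.
    """
    query = tool_args.get("query")
    image_paths = tool_args.get("image_paths")

    if not isinstance(query, str) or not query.strip():
        raise ValueError("'query' must be a non-empty string")
    if not isinstance(image_paths, list) or not image_paths:
        raise ValueError("'image_paths' must be a non-empty list")
    if not all(isinstance(path, str) and path for path in image_paths):
        raise ValueError("every entry in 'image_paths' must be a non-empty string")

    return query, image_paths
```

Checking the arguments up front turns a malformed model output into a clear error message instead of a failure deeper in the tool.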
diff --git a/python/helpers/process_image.py b/python/helpers/process_image.py
@@ -0,0 +1,33 @@
+from openai import OpenAI
+import models
+
+def process_image(query: str, image_urls: list, model_name="gpt-4o-mini", api_key=None):
+
+    api_key = api_key or models.get_api_key("openai")
+
+    if not api_key:
+        raise ValueError("The image processing tool requires an OpenAI API key.")
+
+    client = OpenAI(api_key=api_key)
+
+    messages = [
+        {
+            "role": "user",
+            "content": [
+                {"type": "text", "text": query},
+                *[
+                    {"type": "image_url", "image_url": {"url": url}}
+                    for url in image_urls
+                ]
+            ]
+        }
+    ]
+
+    response = client.chat.completions.create(
+        model=model_name,
+        messages=messages,
+    )
+
+    result = response.choices[0].message.content  # only the text is returned
+
+    return result
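
The helper above sends one user message whose content is a text part followed by one `image_url` part per image. The sketch below isolates that payload construction so it can be inspected or unit-tested without an API key. The function name `build_vision_messages` is an assumption for illustration and does not appear in the commit.

```python
def build_vision_messages(query: str, image_urls: list[str]) -> list[dict]:
    """Build a single user message: the text query first, then one part per image URL.

    Hypothetical helper that reproduces the message layout used in process_image().
    """
    content: list[dict] = [{"type": "text", "text": query}]
    content += [
        {"type": "image_url", "image_url": {"url": url}}
        for url in image_urls
    ]
    return [{"role": "user", "content": content}]


if __name__ == "__main__":
    messages = build_vision_messages(
        "Which image shows a cat?",
        ["https://example.com/image.jpg"],
    )
    print(messages)
```

Keeping the payload builder separate from the network call is one possible design; the commit itself builds the list inline.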
diff --git a/python/tools/image_processing_tool.py b/python/tools/image_processing_tool.py
@@ -0,0 +1,39 @@
+from agent import Agent
+from python.helpers.tool import Tool, Response
+from python.helpers import files
+from python.helpers.process_image import process_image
+from python.helpers.print_style import PrintStyle
+import base64
+import os
+
+class ImageProcessingTool(Tool):
+    @staticmethod
+    def encode_image(image_path):
+        with open(image_path, "rb") as image_file:
+            return base64.b64encode(image_file.read()).decode('utf-8')
+
+    def execute(self, query: str, **kwargs):
+
+        image_paths = self.args["image_paths"]
+
+        if not image_paths or not isinstance(image_paths, list):
+            raise ValueError("The 'image_paths' argument must be a non-empty list of strings.")
+
+        processed_image_paths = []
+
+        for image_path in image_paths:
+            if not image_path.startswith("http"):
+                if not os.path.exists(image_path):
+                    raise FileNotFoundError(f"The local file '{image_path}' was not found.")
+                ext = os.path.splitext(image_path)[1].lstrip(".").lower().replace("jpg", "jpeg") or "jpeg"  # MIME subtype from extension
+                base64_image = ImageProcessingTool.encode_image(image_path)
+                image_path = f"data:image/{ext};base64,{base64_image}"
+            processed_image_paths.append(image_path)
+
+        content = process_image(query, processed_image_paths)
+
+        if self.agent.handle_intervention(content): 
+            return Response(message="", break_loop=False)  # wait for intervention and handle it, if paused
+
+        # Return the response
+        return Response(message=content, break_loop=False)
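
The tool embeds local files as base64 data URLs before passing them to the model. As a hedged sketch of that step, the function below infers the MIME type from the filename with the standard-library `mimetypes` module and falls back to a default when the type is unknown. The name `to_data_url` and its `default_mime` parameter are hypothetical and not part of the commit.

```python
import base64
import mimetypes


def to_data_url(filename: str, data: bytes, default_mime: str = "image/jpeg") -> str:
    """Encode raw image bytes as a data URL, guessing the MIME type from the filename.

    Hypothetical helper: falls back to default_mime when the extension is
    missing, unknown, or does not map to an image type.
    """
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None or not mime_type.startswith("image/"):
        mime_type = default_mime
    encoded = base64.b64encode(data).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


if __name__ == "__main__":
    print(to_data_url("image_6.png", b"abc"))  # data:image/png;base64,YWJj
```

Which extensions `mimetypes` recognizes depends on the Python version and the platform's type tables, so the fallback matters for less common formats.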