added image(s) processing tool
vandyand committed Aug 22, 2024
1 parent f5f2a02 commit eddd293
Showing 11 changed files with 261 additions and 1 deletion.
136 changes: 136 additions & 0 deletions example_work_dir/example_conversation.txt
@@ -0,0 +1,136 @@
User message ('e' to leave):
> please cat todo.txt and follow instructions

Agent 0: Starting a message:
{
"thoughts": [
"The user has requested to read the contents of a file named 'todo.txt'.",
"I will execute a command to display the contents of this file.",
"After retrieving the contents, I will analyze the instructions provided in the file."
],
"tool_name": "code_execution_tool",
"tool_args": {
"runtime": "terminal",
"code": "cat todo.txt"
}
}

Agent 0: Using tool 'code_execution_tool':
Runtime: terminal

Code: cat todo.txt

Agent 0 code execution output:
please use the image processing tool to analyze the images found at paths:

'image_1.jpg'
'image_3.webp'
'image_4.jpeg'
'image_6.png'

which one would be the best image for a blog article about cool cats?
(venv) root@18c0f9b934d9:~#

Agent 0: Starting a message:
{
"thoughts": [
"I have retrieved the instructions from 'todo.txt'.",
"The user wants to analyze specific images to determine which one would be the best for a blog article about cool cats.",
"I will use the image_processing_tool to analyze the provided images."
],
"tool_name": "image_processing_tool",
"tool_args": {
"query": "Please analyze the following images and determine which one would be the best for a blog article about cool cats.",
"image_paths": [
"image_1.jpg",
"image_3.webp",
"image_4.jpeg",
"image_6.png"
]
}
}

Agent 0: Using tool 'image_processing_tool':
Query: Please analyze the following images and determine which one would be the best for a blog article about cool cats.

Image paths: ['image_1.jpg', 'image_3.webp', 'image_4.jpeg', 'image_6.png']


Agent 0: Response from tool 'image_processing_tool':
Based on the images you provided, none of them appear to feature cats, which would be relevant for a blog article about "cool cats." However, if you need to select the best fit for a blog context:

1. **Image of a boat at sunset:** This image has a calming and visually appealing aesthetic but does not relate to cats.
2. **Image of a bicycle with flowers:** This might evoke a whimsical or cozy vibe, but again, it doesn't relate to cats.
3. **Image of snails:** While not cats, snails can relate to a theme of animals or nature, making this the most relevant image if you want to touch on creatures of interest.
4. **Abstract shapes:** This image does not convey any recognizable themes that would contribute to a blog about cats.

If the focus is specifically on cats, none of the images would serve that purpose well. If you have more images or would like recommendations for cat-related content, feel free to share!

Agent 0: Starting a message:
{
"thoughts": [
"The analysis of the images revealed that none of them feature cats, which is essential for the blog article about cool cats.",
"The best option among the provided images is the one featuring snails, as it relates to animals and nature, albeit not directly to cats.",
"I should inform the user about the findings and suggest that they provide more relevant images if they want to focus on cats."
],
"tool_name": "response",
"tool_args": {
"text": "Based on the images you provided, none of them appear to feature cats, which would be relevant for a blog article about \"cool cats.\" However, if you need to select the best fit for a blog context:\n\n1. **Image of a boat at sunset:** This image has a calming and visually appealing aesthetic but does not relate to cats.\n2. **Image of a bicycle with flowers:** This might evoke a whimsical or cozy vibe, but again, it doesn't relate to cats.\n3. **Image of snails:** While not cats, snails can relate to a theme of animals or nature, making this the most relevant image if you want to touch on creatures of interest.\n4. **Abstract shapes:** This image does not convey any recognizable themes that would contribute to a blog about cats.\n\nIf the focus is specifically on cats, none of the images would serve that purpose well. If you have more images or would like recommendations for cat-related content, feel free to share!"
}
}

Agent 0: reponse:
Based on the images you provided, none of them appear to feature cats, which would be relevant for a blog article about "cool cats." However, if you need to select the best fit for a blog context:

1. **Image of a boat at sunset:** This image has a calming and visually appealing aesthetic but does not relate to cats.
2. **Image of a bicycle with flowers:** This might evoke a whimsical or cozy vibe, but again, it doesn't relate to cats.
3. **Image of snails:** While not cats, snails can relate to a theme of animals or nature, making this the most relevant image if you want to touch on creatures of interest.
4. **Abstract shapes:** This image does not convey any recognizable themes that would contribute to a blog about cats.

If the focus is specifically on cats, none of the images would serve that purpose well. If you have more images or would like recommendations for cat-related content, feel free to share!

User message ('e' to leave):
> try 'image_2.jpg'

Agent 0: Starting a message:
{
"thoughts": [
"The user has provided a new image 'image_2.jpg' for analysis.",
"I will use the image_processing_tool to analyze this image and determine if it is suitable for a blog article about cool cats."
],
"tool_name": "image_processing_tool",
"tool_args": {
"query": "Please analyze the following image and determine if it would be suitable for a blog article about cool cats.",
"image_paths": [
"image_2.jpg"
]
}
}

Agent 0: Using tool 'image_processing_tool':
Query: Please analyze the following image and determine if it would be suitable for a blog article about cool cats.

Image paths: ['image_2.jpg']


Agent 0: Response from tool 'image_processing_tool':
The image of the cat wearing sunglasses and a stylish jacket, set against a vibrant and colorful background, is certainly suitable for a blog article about cool cats. The playful and trendy presentation aligns perfectly with a theme centered around stylish or unique cats, making it visually appealing and engaging for readers interested in such content.

Agent 0: Starting a message:
{
"thoughts": [
"The analysis of 'image_2.jpg' indicates that it features a cat wearing sunglasses and a stylish jacket.",
"This image is suitable for a blog article about cool cats, as it aligns well with the theme of stylish or unique cats.",
"I will inform the user about the positive analysis of this image."
],
"tool_name": "response",
"tool_args": {
"text": "The analysis of 'image_2.jpg' indicates that it features a cat wearing sunglasses and a stylish jacket. This image is suitable for a blog article about cool cats, as it aligns well with the theme of stylish or unique cats. It is visually appealing and engaging for readers interested in such content."
}
}

Agent 0: reponse:
The analysis of 'image_2.jpg' indicates that it features a cat wearing sunglasses and a stylish jacket. This image is suitable for a blog article about cool cats, as it aligns well with the theme of stylish or unique cats. It is visually appealing and engaging for readers interested in such content.

User message ('e' to leave):
>
Binary file added example_work_dir/image_1.jpg
Binary file added example_work_dir/image_2.jpg
Binary file added example_work_dir/image_3.webp
Binary file not shown.
Binary file added example_work_dir/image_4.jpeg
Binary file added example_work_dir/image_6.png
8 changes: 8 additions & 0 deletions example_work_dir/todo.txt
@@ -0,0 +1,8 @@
please use the image processing tool to analyze the images found at paths:

'image_1.jpg'
'image_3.webp'
'image_4.jpeg'
'image_6.png'

which one would be the best image for a blog article about cool cats?
2 changes: 1 addition & 1 deletion main.py
@@ -48,7 +48,7 @@ def initialize():
# msgs_keep_start = 5,
# msgs_keep_end = 10,
# max_tool_response_length = 3000,
- # response_timeout_seconds = 60,
+ response_timeout_seconds = 6,
code_exec_docker_enabled = True,
# code_exec_docker_name = "agent-zero-exe",
# code_exec_docker_image = "frdel/agent-zero-exe:latest",
44 changes: 44 additions & 0 deletions prompts/agent.tools.md
@@ -214,3 +214,47 @@ When writing own code, ALWAYS put print/log statements inside and at the end of
}
}
~~~

### image_processing_tool:

Process images to get insights from them.
Provide the "image_paths" argument (a list of strings) with the paths to the image files. Paths can be local or remote (remote paths start with 'http').
Also provide a "query" argument describing what to analyze or compare.
The tool returns text summarizing the image(s) or data extracted from them, depending on the query.
Use this tool for tasks such as OCR (Optical Character Recognition), image analysis, web page screenshot analysis, or image comparison.

**Example usage**:

1. Analyze a single image

~~~json
{
"thoughts": [
"The user provided an image...",
"I will use the image_processing_tool to analyze the image..."
],
"tool_name": "image_processing_tool",
"tool_args": {
"query": "Please analyze the following image and provide feedback.",
"image_paths": ["path/to/image.jpg"]
}
}
~~~

2. Compare images

~~~json
{
"thoughts": [
"The user provided an image...",
"I will use the image_processing_tool to compare images..."
],
"tool_name": "image_processing_tool",
"tool_args": {
"query": "Please analyze the following images. Which one is ...?",
"image_paths": [
"path/to/image.jpg",
"path/to/different/picture.png",
"https://example.com/image.jpg"
]
}
}
~~~
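
The tool-call format documented above can be checked before any image is read. The following is a minimal sketch, not part of the commit: `parse_image_tool_call` is a hypothetical helper name, and the only rules it applies are the ones stated in the prompt (a string `query`, a non-empty list of string `image_paths`, remote paths starting with 'http').

```python
import json


def parse_image_tool_call(raw: str) -> tuple[str, list[str], list[str]]:
    """Hypothetical helper: validate an agent message and split its image paths.

    Returns (query, local_paths, remote_paths).
    """
    msg = json.loads(raw)
    if msg.get("tool_name") != "image_processing_tool":
        raise ValueError("not an image_processing_tool call")

    args = msg.get("tool_args") or {}
    query = args.get("query")
    paths = args.get("image_paths")

    if not isinstance(query, str) or not query:
        raise ValueError("'query' must be a non-empty string")
    if (
        not isinstance(paths, list)
        or not paths
        or not all(isinstance(p, str) for p in paths)
    ):
        raise ValueError("'image_paths' must be a non-empty list of strings")

    # Same convention as the prompt text: remote paths start with 'http'.
    remote = [p for p in paths if p.startswith("http")]
    local = [p for p in paths if not p.startswith("http")]
    return query, local, remote
```

Validating up front gives the agent a specific error message to react to, rather than a failure partway through encoding or uploading.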
33 changes: 33 additions & 0 deletions python/helpers/process_image.py
@@ -0,0 +1,33 @@
from openai import OpenAI
import models

def process_image(query: str, image_urls: list, model_name="gpt-4o-mini", api_key=None):
    # Send a text query plus one or more images (http(s) URLs or data URLs) to an OpenAI chat model

    api_key = api_key or models.get_api_key("openai")

    if not api_key:
        raise ValueError("The image processing tool requires an OpenAI API key.")

    client = OpenAI(api_key=api_key)

    # A single user message: the text query followed by one image_url part per image
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": query},
                *[
                    {"type": "image_url", "image_url": {"url": url}}
                    for url in image_urls
                ]
            ]
        }
    ]

    response = client.chat.completions.create(
        model=model_name,
        messages=messages,
    )

    result = response.choices[0].message.content  # only the text is returned

    return result
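
The message payload built in `process_image` can be factored into a pure function so its shape can be checked without an API key or network call. This is a sketch under that assumption; `build_image_messages` is a hypothetical name that does not appear in the commit.

```python
def build_image_messages(query: str, image_urls: list[str]) -> list[dict]:
    """Hypothetical helper: build one user message with a text part and image parts."""
    content = [{"type": "text", "text": query}]
    # One image_url part per image; each URL may be http(s) or a base64 data URL.
    content += [
        {"type": "image_url", "image_url": {"url": url}} for url in image_urls
    ]
    return [{"role": "user", "content": content}]


messages = build_image_messages(
    "Which image is best for a blog article about cool cats?",
    ["https://example.com/image.jpg"],
)
```

The resulting list has the same structure as the `messages` variable passed to `client.chat.completions.create` in the helper above.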
39 changes: 39 additions & 0 deletions python/tools/image_processing_tool.py
@@ -0,0 +1,39 @@
from agent import Agent
from python.helpers.tool import Tool, Response
from python.helpers import files
from python.helpers.process_image import process_image
from python.helpers.print_style import PrintStyle
import base64
import os

class ImageProcessingTool(Tool):

    @staticmethod
    def encode_image(image_path):
        # Read a local file and return its contents as a base64 string
        with open(image_path, "rb") as image_file:
            return base64.b64encode(image_file.read()).decode('utf-8')

    def execute(self, query: str, **kwargs):

        # .get() lets a missing argument reach the validation below instead of raising KeyError
        image_paths = self.args.get("image_paths")

        if not image_paths or not isinstance(image_paths, list):
            raise ValueError("image_paths is missing, empty, or not a list of strings.")

        processed_image_paths = []

        for image_path in image_paths:
            if not image_path.startswith("http"):
                if not os.path.exists(image_path):
                    raise FileNotFoundError(f"The local file '{image_path}' was not found.")
                # Local files are inlined as data URLs.
                # NOTE: the MIME type is hardcoded to image/jpeg regardless of the file's actual format.
                base64_image = self.encode_image(image_path)
                image_path = f"data:image/jpeg;base64,{base64_image}"
            processed_image_paths.append(image_path)

        content = process_image(query, processed_image_paths)

        if self.agent.handle_intervention(content):
            return Response(message="", break_loop=False)  # wait for intervention and handle it, if paused

        # Return the response
        return Response(message=content, break_loop=False)
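
The loop above labels every local file as `image/jpeg`, including the `.png` and `.webp` files in the example directory. One possible refinement, shown here only as a sketch, derives the MIME type from the file extension with the standard library's `mimetypes` module and falls back to `image/jpeg` when the type is unknown. `to_data_url` is a hypothetical name, not a function in this commit.

```python
import base64
import mimetypes
import os


def to_data_url(image_path: str, default_mime: str = "image/jpeg") -> str:
    """Hypothetical helper: pass remote URLs through, inline local files as data URLs."""
    if image_path.startswith("http"):
        return image_path  # remote images are sent to the model by URL

    if not os.path.exists(image_path):
        raise FileNotFoundError(f"The local file '{image_path}' was not found.")

    # Guess the MIME type from the extension; use the default if it is
    # unknown or not an image type.
    mime, _ = mimetypes.guess_type(image_path)
    if mime is None or not mime.startswith("image/"):
        mime = default_mime

    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"
```

Whether the API rejects a mislabeled data URL is not something this commit demonstrates, so treat the change as a correctness cleanup rather than a confirmed bug fix.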
