⚙️ Rim is a Rust-based multi-modal hyper-caption tool that supports GeminiPro/GeminiFlash, GPT4v/GPT4o, etc.

AUTOM77/Rim

Rim


Rim is a Rust-based multi-modal hyper-caption tool that runs captioning jobs in parallel. v3.0 is released!

Features

  • Supports universal captioning tasks over mixed image/video media
  • Supports OpenAI models on the Azure platform: GPT-4o, GPT-4v
  • Supports Gemini models on Google Cloud Platform: Gemini-1.5-flash, Gemini-1.5-pro
  • Supports multiple prompts, each with a separate namespace
  • Supports optional service selection
  • Supports a QPS setting (default: 20 in parallel)
  • Supports a job limit setting (default: the first 100 jobs)
  • Saves captions to separate paths of the form $MODEL/$PROMPT/$File.txt
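
As an illustration of the $MODEL/$PROMPT/$File.txt layout listed above, here is a minimal Rust sketch of how such a path could be composed. The function name `caption_path` and the `out` root directory are hypothetical and are not taken from Rim's source; Rim's actual implementation may differ.

```rust
use std::path::{Path, PathBuf};

/// Build `<root>/<model>/<prompt>/<file stem>.txt` for one media file.
/// Hypothetical helper for illustration only, not Rim's actual code.
fn caption_path(root: &Path, model: &str, prompt: &str, media: &Path) -> Option<PathBuf> {
    // File name without its last extension, e.g. "1" for "assets/images/1.png".
    // Returns None when the path has no file name at all.
    let stem = media.file_stem()?.to_string_lossy();
    Some(root.join(model).join(prompt).join(format!("{}.txt", stem)))
}

fn main() {
    let media = Path::new("assets/images/1.png");
    // One caption file per (model, prompt) pair for the same media file.
    for model in ["gpt-4o", "gemini-1.5-flash-latest"] {
        for prompt in ["simple", "example"] {
            if let Some(p) = caption_path(Path::new("out"), model, prompt, media) {
                println!("{}", p.display());
            }
        }
    }
}
```

Keeping the model and prompt names as directory levels means captions produced by different models or prompts for the same file never overwrite each other.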

Usage

Tip

rim assets/images/1.png -c config.toml --limit 100 --qps 20

For a single key on a single project, we recommend using rim ${path} -c config.toml --limit 360.
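
To make the effect of --limit and --qps concrete, the following Rust sketch shows one possible reading of the two options: keep only the first `limit` jobs and space their start times 1/qps seconds apart. The function `schedule` is hypothetical, and whether Rim paces requests this way internally is an assumption, not something this README states.

```rust
use std::time::Duration;

/// Keep the first `limit` jobs and pair each with the delay after which it
/// would start, spacing consecutive starts 1/qps seconds apart.
/// Hypothetical illustration; Rim's real scheduler may work differently.
fn schedule<T>(jobs: Vec<T>, limit: usize, qps: u32) -> Vec<(Duration, T)> {
    // Gap between consecutive starts: qps = 20 gives 50 ms.
    // `max(1)` avoids a division by zero if qps is given as 0.
    let gap = Duration::from_secs(1) / qps.max(1);
    jobs.into_iter()
        .take(limit)
        .enumerate()
        .map(|(i, job)| (gap * i as u32, job))
        .collect()
}

fn main() {
    let files = vec!["1.png", "2.png", "3.mp4", "4.mp4"];
    // Equivalent in spirit to `--limit 3 --qps 20`.
    for (delay, file) in schedule(files, 3, 20) {
        println!("{:?} -> {}", delay, file);
    }
}
```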

Old Usage
  1. Single Image/Video Captioning:
rim -f ${file_path} -c config.toml

Rim generates a *.txt file containing the caption for a single image or video.

  2. Batch Image/Video Captioning:
rim -d ${dir_path} -c config.toml

For a directory of images or videos, Rim generates a corresponding list of *.txt caption files.

  3. Batch of Batch:
DATA=/data
for i in "$DATA"/*; do [ -d "$i" ] && ./target/release/rim "$i" -c config.toml --limit 1500 --qps 500; done

  • Rim now generates a folder named xxx_cap that contains the *.txt caption files.
  • A sample configuration can be found in config.toml.

Config

Creating a Sample Configuration (Unix):

cat <<EOF | tee config.toml
[[prompt]]
name = "simple"
value = "Caption this video."

[[prompt]]
name = "example"
value = "Provide a brief summary of the video content focusing on key themes and messages."

[azure]
api = [
    ['https://closedAI-1.openai.azure.com', 'sk-00000000000000000000000000000000', 'gpt-4o'],
    ['https://closedAI-2.openai.azure.com', 'sk-00000000000000000000000000000001', 'gpt-4v']
]

[gemini]
api = [
    ['https://generativelanguage.googleapis.com', 'AIza00000000000000000000000000000000000', 'gemini-1.5-flash-latest'],
    ['https://generativelanguage.googleapis.com', 'AIza00000000000000000000000000000000001', 'gemini-1.5-pro-latest'],
]
EOF
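
Each row of an api list in the sample configuration is an endpoint, a key, and a model name. The Rust sketch below models one such row and shows one way several rows could be shared across jobs. The `Api` struct, the `pick` function, and the round-robin policy are all assumptions made for illustration; this README does not describe how Rim actually distributes jobs across entries.

```rust
/// One row of an `api = [...]` list: endpoint, key, model name.
/// Hypothetical type for illustration, not taken from Rim's source.
#[derive(Debug, Clone, PartialEq)]
struct Api {
    endpoint: String,
    key: String,
    model: String,
}

/// Pick the entry for the `job_index`-th job by rotating through the list.
/// Round-robin is an assumed policy, not documented Rim behavior.
fn pick(apis: &[Api], job_index: usize) -> Option<&Api> {
    if apis.is_empty() {
        None
    } else {
        apis.get(job_index % apis.len())
    }
}

fn main() {
    // Placeholder values copied from the sample configuration.
    let apis = vec![
        Api {
            endpoint: "https://closedAI-1.openai.azure.com".into(),
            key: "sk-00000000000000000000000000000000".into(),
            model: "gpt-4o".into(),
        },
        Api {
            endpoint: "https://closedAI-2.openai.azure.com".into(),
            key: "sk-00000000000000000000000000000001".into(),
            model: "gpt-4v".into(),
        },
    ];
    for job in 0..3 {
        if let Some(api) = pick(&apis, job) {
            println!("job {} -> {} at {} (key length {})", job, api.model, api.endpoint, api.key.len());
        }
    }
}
```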

Nightly Build

curl -fsSL https://sh.rustup.rs | sh -s -- -y
. "$HOME/.cargo/env"
rustup update nightly && rustup default nightly

cargo build --release
./target/release/rim "assets/images" -c config.toml

Nightly Build with mirror

curl -fsSL https://sh.rustup.rs | sh -s -- -y
. "$HOME/.cargo/env"
cat <<'EOF' | tee "${CARGO_HOME:-$HOME/.cargo}/config.toml"
[source.crates-io]
replace-with = 'mirror'

[source.mirror]
registry = 'sparse+https://mirrors.tuna.tsinghua.edu.cn/crates.io-index/'
EOF
rustup update nightly && rustup default nightly

cargo build --release
./target/release/rim "assets/images" -c config.toml
