Fovea is a unified command-line interface to computer vision APIs from Google, Microsoft, AWS, Clarifai, Imagga, IBM Watson, and SightHound. Use Fovea if you want to:
- Easily classify images in a shell script.
- Compare alternative computer vision APIs.
The table below characterizes Fovea's current feature coverage. Most vendors offer broadly similar features, but their output formats differ. Where possible, Fovea uses a tabular output mode suitable for interactive shell sessions and scripts. If a particular feature is not supported by this tabular output mode, vendor-specific JSON is available instead.
Feature | Microsoft | Amazon | Clarifai | Watson | Imagga | S. Hound | Tabular | JSON | |
---|---|---|---|---|---|---|---|---|---|
Labels | ✅️️ | ✅ ️️ | ✅️️ | ✅ | ✅ | ✅ | ✅ ️️ | ✅ ️️ | |
Label i18n | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |||
Faces | ✅️️ | ✅️️ | ✅️️ | ✅ | ✅ | ✅️️ | ✅️️ | ✅️️ | |
Landmarks | ✅ | ✅️️ | ✅️ ️ | ||||||
Text (OCR) | ✅ | ✅️️️ | ️️❌ | ✅️️ | |||||
Emotions | ✅️️ | ✅️️ | ❌️ | ✅ | ❌ | ✅️️ | |||
Description | ✅️️ | ❌ | ✅️️ | ||||||
Adult (NSFW) | ✅ | ✅️️ | ✅️️ | ❌ | ✅️️ | ✅️️ | |||
Categories | ✅️️ | ✅️️ | ✅️️ | ✅️️ | |||||
Image Type | ✅️ | ❌ | ✅️ ️ | ||||||
Color | ✅️️ | ✅️️ | ❌ | ❌ | ✅️️ | ||||
Celebrities | ✅ | ✅️ | ✅ | ✅️ | ✅️ | ✅ | |||
Vehicles | ✅ | ✅️ | ✅ |
✅ indicates a working feature, ❌ indicates a missing feature, and empty cells represent features not supported by a particular vendor.
Clone the Fovea repository, install its dependencies, and source its environment script.
[user@host]$ git clone https://github.com/28mm/Fovea.git
[user@host]$ cd Fovea
[user@host]$ pip3 install -r requirements.txt
[user@host]$ source fovea-env.sh
Obtain credentials for the web services you plan to use. These should be supplied to Fovea via environment variables. See fovea-env.sh for a template.
- Google Cloud Vision API: https://cloud.google.com/vision/docs/
- Microsoft Computer Vision API: https://www.microsoft.com/cognitive-services/en-us/computer-vision-api
- Amazon Web Services Rekognition: https://aws.amazon.com/rekognition/
- IBM Watson Image Recognition: https://www.ibm.com/watson/developercloud/visual-recognition.html
- Clarifai: https://developer.clarifai.com/
- Imagga: https://docs.imagga.com
- SightHound: https://www.sighthound.com/products/cloud/
export GOOG_CV_KEY=""
export MSFT_CV_KEY=""
export AWS_CV_KEY_ID=""
export AWS_CV_KEY_SECRET=""
export AWS_CV_REGION=""
export CLARIFAI_CLIENT_ID=""
export CLARIFAI_CLIENT_SECRET=""
export CLARIFAI_ACCESS_TOKEN=""
export WATSON_CV_URL=""
export WATSON_CV_KEY=""
export IMAGGA_ID=""
export IMAGGA_SECRET=""
export SIGHTHOUND_TOKEN=""
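Before calling a provider, it can help to confirm that its credentials are actually exported. The following is a minimal sketch, not part of Fovea: the grouping of variables by provider is an assumption inferred from the variable prefixes above, and `missing_credentials` is a hypothetical helper name. Fovea itself may not require every variable in a group.

```python
import os

# Assumed grouping of the variables above by provider, inferred from their
# prefixes; Fovea itself may not require every variable in a group.
PROVIDER_ENV_VARS = {
    "google": ["GOOG_CV_KEY"],
    "microsoft": ["MSFT_CV_KEY"],
    "amazon": ["AWS_CV_KEY_ID", "AWS_CV_KEY_SECRET", "AWS_CV_REGION"],
    "clarifai": [
        "CLARIFAI_CLIENT_ID",
        "CLARIFAI_CLIENT_SECRET",
        "CLARIFAI_ACCESS_TOKEN",
    ],
    "watson": ["WATSON_CV_URL", "WATSON_CV_KEY"],
    "imagga": ["IMAGGA_ID", "IMAGGA_SECRET"],
    "sighthound": ["SIGHTHOUND_TOKEN"],
}


def missing_credentials(provider, env=None):
    """Return the variables for `provider` that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in PROVIDER_ENV_VARS[provider] if not env.get(name)]


if __name__ == "__main__":
    # Report, per provider, which variables still need a value.
    for provider in PROVIDER_ENV_VARS:
        missing = missing_credentials(provider)
        status = "ready" if not missing else "missing " + ", ".join(missing)
        print(f"{provider}: {status}")
```

Run it after `source fovea-env.sh` to see which providers still need keys.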
usage: fovea [-h]
[--provider {google,microsoft,amazon,opencv,watson,clarifai,imagga,facebook,sighthound}]
[--google | --microsoft | --amazon | --opencv | --watson | --clarifai | --facebook | --imagga | --sighthound]
[--output {tabular,json,yaml}] [--tabular | --json | --yaml]
[--lang LANG] [--ocr-lang OCR_LANG] [--max-labels MAX_LABELS]
[--precision PRECISION] [--labels] [--faces] [--text]
[--emotions] [--description] [--celebrities] [--adult]
[--categories] [--image_type] [--color] [--landmarks]
[--vehicles] [--confidence confidence threshold] [--ontology]
[--model MODELS] [--list-models] [--list-langs]
[--list-ocr-langs]
[files [files ...]]
Microsoft | Amazon | Clarifai | Watson | Imagga | S. Hound | Tabular | JSON | |
---|---|---|---|---|---|---|---|---|
✅️️ | ✅ ️️ | ✅️️ | ✅ | ✅ | ✅ | ✅ ️️ | ✅ ️️ |
If no other flags are set, --labels is set by default, and --provider is set to google.
[user@host]$ fovea inverts/cten/pleur.jpg
0.76 biology
0.72 organism
0.61 invertebrate
0.50 deep sea fish
[user@host]$ fovea --clarifai inverts/cten/pleur.jpg
1.00 invertebrate
0.99 science
0.97 no person
0.97 desktop
0.96 biology
0.96 exploration
0.94 nature
0.93 underwater
0.93 one
0.92 wildlife
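Because the tabular label output is one score and one label per line, comparing two providers from a script is straightforward. The sketch below assumes that each line is a confidence score, whitespace, then the label text, as in the two examples above. `run_fovea`, `parse_labels`, and `shared_labels` are hypothetical helper names, and `run_fovea` assumes a `fovea` executable is on the PATH.

```python
import subprocess


def run_fovea(args):
    """Run fovea with the given arguments and return its standard output."""
    result = subprocess.run(
        ["fovea", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def parse_labels(output):
    """Parse tabular label output into a list of (confidence, label) tuples."""
    labels = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # The label may contain spaces, so split only once.
        score, label = line.split(maxsplit=1)
        labels.append((float(score), label))
    return labels


def shared_labels(a, b):
    """Return the labels present in both parsed result lists, sorted."""
    return sorted({label for _, label in a} & {label for _, label in b})


# The two outputs shown above, pasted in so the example runs offline.
google_output = """0.76 biology
0.72 organism
0.61 invertebrate
0.50 deep sea fish"""

clarifai_output = """1.00 invertebrate
0.99 science
0.97 no person
0.97 desktop
0.96 biology
0.96 exploration
0.94 nature
0.93 underwater
0.93 one
0.92 wildlife"""

print(shared_labels(parse_labels(google_output), parse_labels(clarifai_output)))
# prints ['biology', 'invertebrate']
```

With credentials in place, `parse_labels(run_fovea(["--clarifai", "inverts/cten/pleur.jpg"]))` would feed live output into the same comparison.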
Microsoft | Amazon | Clarifai | Watson | Imagga | S. Hound | Tabular | JSON | |
---|---|---|---|---|---|---|---|---|
✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Several providers offer label translations, and all default to English (en). Learn which languages a given provider supports with the --list-langs flag.
[user@host]$ fovea --microsoft --list-langs
en
zh
From the list of vendor-supported languages, set the desired language with the --lang argument.
[user@host]$ fovea http://omp.gso.uri.edu/ompweb/doee/biota/inverts/cten/pleur.jpg --clarifai --lang ar
0.99 لافقاريات
0.99 العلوم
0.96 لا يجوز لأي شخص
0.96 خلفية حاسوب
0.95 علم الاحياء
0.95 استكشاف
Microsoft | Amazon | Clarifai | Watson | Imagga | S. Hound | Tabular | JSON | |
---|---|---|---|---|---|---|---|---|
✅️️ | ✅️️ | ✅️️ | ✅ | ✅ | ✅️️ | ✅️️ | ✅️️ |
Most vendors support face detection. In addition, OpenCV's pre-trained Haar cascade is available with the --faces and --opencv flags. The bounding box for each detected face is reported in a four-field format, described below.
- Left-X
- Top-Y
- Width
- Height
See examples/face-detection for further information.
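Many imaging libraries expect a box as corner coordinates rather than origin plus size. The sketch below converts the four-field format described above. It assumes the fields are whitespace-separated integers; the sample line is illustrative rather than real Fovea output, and `Box` and `parse_face` are hypothetical names.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """A bounding box in Left-X, Top-Y, Width, Height form."""

    left: int
    top: int
    width: int
    height: int

    @property
    def right(self):
        return self.left + self.width

    @property
    def bottom(self):
        return self.top + self.height

    def corners(self):
        """Return (left, top, right, bottom)."""
        return (self.left, self.top, self.right, self.bottom)


def parse_face(line):
    """Parse one 'Left-X Top-Y Width Height' line into a Box."""
    left, top, width, height = (int(field) for field in line.split()[:4])
    return Box(left, top, width, height)


# Illustrative input, not captured from a real run.
box = parse_face("432 134 148 148")
print(box.corners())
# prints (432, 134, 580, 282)
```

The `(left, top, right, bottom)` tuple is the form that Pillow's `Image.crop` accepts, for example.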
Microsoft | Amazon | Clarifai | Watson | Imagga | S. Hound | Tabular | JSON | |
---|---|---|---|---|---|---|---|---|
✅ | ✅️️ | ✅️ ️ |
At present, only Google supports landmark and location detection.
[user@host]$ fovea --landmarks ../ex/rattlesnake-ledge.jpg
0.35 Rattlesnake Lake 47.436158,-121.77812576293945
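A landmark line can be split into its parts from a script. This sketch assumes the layout seen in the example above: a confidence score, then the landmark name (which may contain spaces), then a final `latitude,longitude` token. `parse_landmark` is a hypothetical helper name.

```python
def parse_landmark(line):
    """Parse 'confidence name lat,lon' into (confidence, name, lat, lon)."""
    # The first token is the score; the last token holds the coordinates.
    score, rest = line.strip().split(maxsplit=1)
    name, coords = rest.rsplit(maxsplit=1)
    lat, lon = (float(value) for value in coords.split(","))
    return float(score), name, lat, lon


confidence, name, lat, lon = parse_landmark(
    "0.35 Rattlesnake Lake 47.436158,-121.77812576293945"
)
print(name, lat, lon)
```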
Microsoft | Amazon | Clarifai | Watson | Imagga | S. Hound | Tabular | JSON | |
---|---|---|---|---|---|---|---|---|
✅ | ✅️️️ | ️️❌ | ✅️️ |
OCR is only supported in the JSON output mode, and its format is vendor-specific.
Microsoft | Amazon | Clarifai | Watson | Imagga | S. Hound | Tabular | JSON | |
---|---|---|---|---|---|---|---|---|
✅️️ | ✅️️ | ❌️ | ✅️️ | ❌ | ✅️️ |
Emotion detection is only supported in the JSON output mode, and its format is vendor-specific.
Microsoft | Amazon | Clarifai | Watson | Imagga | S. Hound | Tabular | JSON | |
---|---|---|---|---|---|---|---|---|
✅️️ | ❌ | ✅️️ |
Scene descriptions are only available in the JSON output mode, and their format is vendor-specific.
Microsoft | Amazon | Clarifai | Watson | Imagga | S. Hound | Tabular | JSON | |
---|---|---|---|---|---|---|---|---|
✅ | ✅️️ | ✅️️ | ❌ | ✅️️ | ✅️️ |
The parameters for NSFW and adult image detection vary somewhat between vendors. Google reports likelihoods (VERY_UNLIKELY, UNLIKELY, POSSIBLE, LIKELY, VERY_LIKELY) rather than scores, so Fovea converts them to numeric values (0.01, 0.25, 0.50, 0.75, 0.99) in order to follow the convention established by other services.
[user@host]$ fovea --adult --google test.jpg
0.25 nsfw
[user@host]$ fovea --adult --clarifai test.jpg
0.99 sfw
0.01 nsfw
[user@host]$ fovea --adult --microsoft test.jpg
0.13 nsfw
0.07 racy
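The likelihood-to-score conversion for Google can be pictured as a lookup table. The sketch below is an illustration, not Fovea's actual code: it assumes that Google Cloud Vision's five likelihood levels, in ascending order, pair one-to-one with the five numeric values listed above. `likelihood_to_score` is a hypothetical name.

```python
# Assumed pairing: Google's five likelihood levels, in ascending order,
# matched to the five numeric values listed in the text.
LIKELIHOOD_TO_SCORE = {
    "VERY_UNLIKELY": 0.01,
    "UNLIKELY": 0.25,
    "POSSIBLE": 0.50,
    "LIKELY": 0.75,
    "VERY_LIKELY": 0.99,
}


def likelihood_to_score(likelihood):
    """Return the score for a likelihood name, or None if unrecognized."""
    return LIKELIHOOD_TO_SCORE.get(likelihood)


print(likelihood_to_score("UNLIKELY"))
# prints 0.25
```

Under this assumed pairing, the `0.25 nsfw` result in the Google example above would correspond to a likelihood of UNLIKELY.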
Microsoft | Amazon | Clarifai | Watson | Imagga | S. Hound | Tabular | JSON | |
---|---|---|---|---|---|---|---|---|
✅️️ | ✅️️ | ✅️️ | ✅️️ |
Categorization is only available in the JSON output mode, and its format is vendor-specific.
Microsoft | Amazon | Clarifai | Watson | Imagga | S. Hound | Tabular | JSON | |
---|---|---|---|---|---|---|---|---|
✅️ | ❌ | ✅️ ️ |
Image type detection is only available in the JSON output mode, and its format is vendor-specific.
Microsoft | Amazon | Clarifai | Watson | Imagga | S. Hound | Tabular | JSON | |
---|---|---|---|---|---|---|---|---|
✅️️ | ✅️️ | ❌ | ❌ | ✅️️ |
Dominant color detection is only available in the JSON output mode, and its format is vendor-specific.
Microsoft | Amazon | Clarifai | Watson | Imagga | S. Hound | Tabular | JSON | |
---|---|---|---|---|---|---|---|---|
✅ | ✅️ | ✅ | ✅️ | ✅️ | ✅ |
Celebrity face matches are reported in a seven-field format (if --ontology is set), or a six-field format (if --ontology is not set). These formats are described below.
- Left-X
- Top-Y
- Width
- Height
- Confidence Score
- Ontology Link or a placeholder (if --ontology is set)
- The Celebrity's Name
[user@host]$ fovea obamas.jpg --microsoft --celebrities
432 134 148 148 0.95 Barack Hussein Obama
279 191 117 117 1.00 Michelle Obama
In contrast to IBM and Microsoft, which return only their highest-confidence results, Clarifai returns a long list of possible matches for each face. Exclude lower-confidence matches with the --confidence parameter, which takes a threshold such as 0.9.
[user@host]$ fovea --celebrities --clarifai --confidence 0.9 --ontology obamas.jpg
427 122 162 162 0.99 ai_5XjK3npz barack obama
266 179 140 140 0.95 ai_z2S44mJX michelle obama
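The six- and seven-field rows can be parsed with a single function that knows whether --ontology was set. This sketch assumes whitespace-separated fields, with the name as the final field (it may contain spaces). `parse_celebrity` and `filter_by_confidence` are hypothetical names, and treating the threshold as inclusive is an assumption about how --confidence behaves.

```python
def parse_celebrity(line, ontology=False):
    """Parse one celebrity row; pass ontology=True for seven-field rows."""
    # Limit the split so that a multi-word name stays in one piece.
    fields = line.split(maxsplit=6 if ontology else 5)
    left, top, width, height = (int(field) for field in fields[:4])
    row = {
        "box": (left, top, width, height),
        "confidence": float(fields[4]),
        "name": fields[-1],
    }
    if ontology:
        row["ontology"] = fields[5]
    return row


def filter_by_confidence(rows, threshold):
    """Keep rows whose confidence is at least `threshold` (assumed inclusive)."""
    return [row for row in rows if row["confidence"] >= threshold]


rows = [
    parse_celebrity("427 122 162 162 0.99 ai_5XjK3npz barack obama", ontology=True),
    parse_celebrity("266 179 140 140 0.95 ai_z2S44mJX michelle obama", ontology=True),
]
print([row["name"] for row in filter_by_confidence(rows, 0.97)])
# prints ['barack obama']
```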
Microsoft | Amazon | Clarifai | Watson | Imagga | S. Hound | Tabular | JSON | |
---|---|---|---|---|---|---|---|---|
✅️ | ✅️ | ✅ |
Vehicle detection and recognition are only available with SightHound. Recognized cars are reported in a ten field format, described below.
- Left-X
- Top-Y
- Width
- Height
- Confidence Score (Make)
- Make (e.g. Cadillac)
- Confidence Score (Model)
- Model (e.g. Ats)
- Confidence Score (Color)
- Color (e.g. Black)
[user@host]$ fovea --sighthound --vehicles batmobile.jpg
15 112 580 238 0.08 Cadillac 0.08 Ats 0.66 black
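The ten-field vehicle row pairs each attribute with its own confidence score, so it maps naturally onto a small dictionary. This sketch assumes whitespace-separated fields and single-word make, model, and color values, as in the example above. A multi-word make or model would need different handling. `parse_vehicle` is a hypothetical name.

```python
def parse_vehicle(line):
    """Parse one ten-field vehicle row into a dictionary."""
    fields = line.split()
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}")
    left, top, width, height = (int(field) for field in fields[:4])
    return {
        "box": (left, top, width, height),
        # Each attribute is stored as (value, confidence).
        "make": (fields[5], float(fields[4])),
        "model": (fields[7], float(fields[6])),
        "color": (fields[9], float(fields[8])),
    }


vehicle = parse_vehicle("15 112 580 238 0.08 Cadillac 0.08 Ats 0.66 black")
print(vehicle["make"], vehicle["color"])
# prints ('Cadillac', 0.08) ('black', 0.66)
```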