A framework for harnessing the power of transformers with YOLO models and other single-shot detectors!
Note: This is an early release. The package is under active development. Please report any issues and I'll try to fix them ASAP.
🔥 NEW 🔥 D-FINE models are now available. Inspired by RT-DETR, they outperform all real-time detectors, including YOLO-series models.
```bash
pip install trolo
```
- 🔥 Transformer-enhanced object detection
- 🎯 Single-shot detection capabilities
- ⚡ High performance inference
- 🛠️ Easy to use CLI interface
- 🚀 Fast video stream inference
- 🔧 Automatic DDP handling
D-FINE
The D-FINE model redefines regression tasks in DETR-based detectors using Fine-grained Distribution Refinement (FDR). Official Paper | Official Repo
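In rough terms (a simplified, self-contained sketch of the idea, not the official implementation), FDR models each box edge as a probability distribution over discrete offsets and refines that distribution across decoder layers; the decoded edge is the distribution's expectation:

```python
import torch

def edge_from_distribution(logits, max_offset=16.0):
    """Decode one box edge from a distribution over discrete offset bins.

    logits: (..., n_bins) unnormalized scores over candidate offsets.
    Returns the expected offset, a differentiable 'soft argmax'.
    """
    n_bins = logits.shape[-1]
    bins = torch.linspace(0.0, max_offset, n_bins)  # candidate offsets
    probs = logits.softmax(dim=-1)                  # distribution over bins
    return (probs * bins).sum(dim=-1)               # expectation = refined edge

# Example: one predicted distribution over 17 bins for a single edge.
logits = torch.randn(17)
print(edge_from_distribution(logits))
```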
(All models are downloaded automatically when you pass the model name for any task.)
| Model | Dataset | AP<sup>val</sup> | #Params | Latency | GFLOPs |
|---|---|---|---|---|---|
| dfine-n | COCO | 42.8 | 4M | 2.12ms | 7 |
| dfine-s | COCO | 48.5 | 10M | 3.49ms | 25 |
| dfine-m | COCO | 52.3 | 19M | 5.62ms | 57 |
| dfine-l | COCO | 54.0 | 31M | 8.07ms | 91 |
| dfine-x | COCO | 55.8 | 62M | 12.89ms | 202 |
RT-DETR v3 (Coming Soon)
RT-DETR v2 (Coming Soon)
Trolo-2024 (WIP)
The CLI command structure is:
```bash
trolo [command] [options]
```
For detailed help:
```bash
trolo --help            # general help
trolo [command] --help  # command-specific help
```
Example inference command:
```bash
trolo predict --model dfine-n  # automatically downloads the model from the trolo model hub
```
Single images, image folders, videos, and webcam input are supported:

```bash
trolo predict --model dfine-n.pth --input img.jpg  # or folder/, video.mp4, or 0 (for webcam)
```
🔥 Smart video stream inference: videos are processed in streaming mode, so you never have to worry about memory issues!
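The idea, illustrated with a generic OpenCV sketch (this is not trolo's internal code, just the pattern): frames are read and processed one at a time, so memory use stays constant regardless of video length.

```python
import cv2  # pip install opencv-python

def stream_frames(source):
    """Yield frames one at a time; memory use stays constant
    no matter how long the video is."""
    cap = cv2.VideoCapture(source)  # file path, or 0 for a webcam
    try:
        while True:
            ok, frame = cap.read()
            if not ok:  # end of stream
                break
            yield frame
    finally:
        cap.release()

# Hypothetical usage with any per-frame detector:
# for frame in stream_frames("video.mp4"):
#     detections = my_detector(frame)  # placeholder, not a trolo call
```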
Python API:
```python
from trolo.inference import DetectionPredictor

predictor = DetectionPredictor(model="dfine-n")
predictions = predictor.predict()  # get raw predictions
plotted_preds = predictor.visualize(show=True, save=True)  # or get visualized outputs
```
Visit the Inference Docs for more details.
Example export command:
```bash
trolo export --model dfine-n --export_format onnx --input_size 640
```
Python API:
```python
from trolo.inference import ModelExporter

model_path = "/path/to/model"
input_size = 640  # inference resolution
export_format = "onnx"

exporter = ModelExporter(model=model_path)
exporter.export(input_size=input_size, export_format=export_format)
```
Visit the Export Docs for more details, and check the deployment docs for inference scripts covering various deployment targets.
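Once exported, the model can be run with any ONNX-compatible runtime. Below is a minimal sketch using onnxruntime; the file name is an assumption, and DETR-style exports often take extra inputs (e.g. original image sizes), so inspect the graph rather than guessing tensor names:

```python
import numpy as np
import onnxruntime as ort  # pip install onnxruntime

# Load the exported model (the path/filename here is an assumption).
session = ort.InferenceSession("dfine-n.onnx")

# Inspect the exported graph instead of guessing tensor names.
for inp in session.get_inputs():
    print(inp.name, inp.shape, inp.type)

# Build dummy feeds for every input, resolving dynamic dims to 1.
feeds = {}
for inp in session.get_inputs():
    shape = [d if isinstance(d, int) else 1 for d in inp.shape]
    dtype = np.int64 if "int64" in inp.type else np.float32
    feeds[inp.name] = np.zeros(shape, dtype=dtype)

outputs = session.run(None, feeds)
print([o.shape for o in outputs])
```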
Example training command:
```bash
trolo train --config dfine_n  # automatically finds the config file
```
🔥 DDP is handled automatically; simply pass the GPU ids to the CLI:
```bash
trolo train --config dfine_n --device 0,1,2,3
```
That's it!
Python API:

```python
from trolo.trainers import DetectionTrainer

trainer = DetectionTrainer(config="dfine_n")  # or pass a custom config path
trainer.train()  # pass device = 0,1,2,3 to automatically handle DDP
```
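As a concrete sketch of the Python-side equivalent of the CLI flag above (whether device takes a comma-separated string or a list is an assumption here; check the Training Docs for the exact signature):

```python
from trolo.trainers import DetectionTrainer

# Assumption: device mirrors the CLI's --device 0,1,2,3 flag,
# and trolo spawns the DDP processes for you.
trainer = DetectionTrainer(config="dfine_n")
trainer.train(device="0,1,2,3")
```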
Visit the Training Docs for more details.
Build and Run Options
```bash
# Standard build
docker build -t trolo .

# Build with a specific tag
docker build -t trolo:v1 .

# Build with build arguments (if needed)
docker build --build-arg SOME_ARG=value -t trolo .
```
```bash
# Run interactively
docker run -it --name container_name trolo

# Run with GPU support
docker run -it --gpus all --name container_name trolo

# Run with a volume mount
docker run -it -v /local/path/on/host:/workspace/app/trolo --name container_name trolo

# Run with GPU support and a volume mount
docker run -it --gpus all -v /local/path/on/host:/workspace/app/trolo --name container_name trolo

# Run with custom entrypoint
docker run -it --entrypoint /bin/bash trolo

# Run with environment variables
docker run -it -e CUSTOM_ENV=value trolo

# Run in detached mode
docker run -d trolo
```
- Replace `/local/path/on/host` with your actual host path
- `--gpus all` requires the NVIDIA Container Toolkit
- Volume mounting allows persistent data and code modifications
TL;DR: This is a non-profit project. Use it, modify it, copy it, do whatever you want with it. And if something doesn't allow you to do that, please open an issue.
More details
- Apache 2.0
- The license text is simply copied from the official Apache repo. Please open an issue if something in it doesn't allow you to use the project.
- This project is built on top of openly licensed projects, as listed below.
- I intend to keep this project free and open source FOREVER. There are no plans for direct or indirect monetization of this project. I only accept sponsorships for compute resources to train models and perform independent research.
This project builds upon several excellent open source projects:
- D-FINE: Original D-FINE model implementation
- RT-DETR: Real-time DETR architecture
- PaddlePaddle: Detection framework
More details
- The original trainer is based on D-FINE, with major modifications for handling pre-trained weights, DDP, and other features.
- The architecture for D-FINE is the same as in the original paper and repo.

Contributions are most welcome! Please feel free to submit a Pull Request.
Note: This is an early work in progress. Many features are still under development.
- Docusaurus documentation