PaliGemma is a multimodal vision-language model that pairs a SigLIP vision encoder with the Gemma language model, achieving state-of-the-art performance on tasks involving both text and images. The changes below optimize and extend this implementation:
- Precision: Set the float32 matmul precision from `highest` to `high`, enabling TF32 tensor cores on supported GPUs for faster training (sketch below).
- Mixed Precision Training: Converted the model parameters to bfloat16 to reduce memory usage and speed up training (sketch below).
- Compiling: Compiled the model with `torch.compile` in `max-autotune` mode with full-graph capture (sketch below).
- Projection Fusing: Fused the query, key, and value projections into a single linear layer, replacing three matrix multiplications with one and improving computational efficiency (sketch below).
- SDPA: Replaced the hand-written self-attention with PyTorch's `scaled_dot_product_attention` (which can dispatch to FlashAttention) in both the language and vision models, significantly speeding up training and inference (sketch below).
- Training: Added a training loop along with simple dataset support (sketch below).
- Implemented the SoViT-400m architecture, the shape-optimized ViT that SigLIP uses as its vision encoder (dimensions sketched below).
- Load a pretrained model and fine-tune it on user data (sketch below).
- The Hugging Face implementation can be found here.
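
The sketches below illustrate each change. They are minimal examples under stated assumptions, not the repository's exact code. First, the matmul-precision switch, which is a single built-in PyTorch call:

```python
import torch

# "highest" (the PyTorch default) keeps full-precision float32 matmuls;
# "high" allows TF32 tensor cores on supported GPUs for a large speedup
# at a small cost in matmul accuracy.
torch.set_float32_matmul_precision("high")
```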
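
The bfloat16 conversion is a one-liner; the `nn.Linear` here is only a stand-in for the PaliGemma module:

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 8)  # stand-in for the actual PaliGemma model

# bfloat16 keeps float32's exponent range, so training is usually stable
# without loss scaling, while parameter memory is halved.
model = model.to(dtype=torch.bfloat16)
```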
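
Compilation is similarly compact; again `model` is a stand-in:

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 8)  # stand-in for the actual PaliGemma model

# mode="max-autotune" spends extra compile time searching kernel
# configurations; fullgraph=True raises an error on graph breaks
# instead of silently falling back to eager execution.
model = torch.compile(model, mode="max-autotune", fullgraph=True)
```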
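
A sketch of the fused QKV projection; the class name and shapes are illustrative, not the repo's actual module:

```python
import torch
import torch.nn as nn

class FusedQKV(nn.Module):
    """Produce Q, K, and V with one linear layer (one GEMM) instead of three."""

    def __init__(self, embed_dim: int, num_heads: int):
        super().__init__()
        assert embed_dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        # One (dim -> 3*dim) projection replaces three (dim -> dim) ones.
        self.qkv = nn.Linear(embed_dim, 3 * embed_dim)

    def forward(self, x: torch.Tensor):
        b, t, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)  # each (b, t, embed_dim)
        shape = (b, t, self.num_heads, self.head_dim)
        # Return (b, heads, t, head_dim), ready for attention.
        return tuple(m.reshape(shape).transpose(1, 2) for m in (q, k, v))

q, k, v = FusedQKV(embed_dim=512, num_heads=8)(torch.randn(2, 16, 512))
```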
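
The SDPA change swaps a manual `softmax(QK^T / sqrt(d)) V` for PyTorch's fused kernel. The language model passes `is_causal=True`; a bidirectional vision encoder would pass `False`:

```python
import torch
import torch.nn.functional as F

# Shapes: (batch, heads, seq_len, head_dim), e.g. the output of FusedQKV above.
q, k, v = (torch.randn(2, 8, 16, 64) for _ in range(3))

# Dispatches to FlashAttention or a memory-efficient kernel when shapes,
# dtypes, and hardware allow, and falls back to a math kernel otherwise.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```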
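
A minimal training step in the spirit of the added loop; the model's forward signature and the label layout (with `-100` masking image and prefix tokens) are assumptions, not the repo's exact interface:

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, batch):
    optimizer.zero_grad(set_to_none=True)
    # Assumed interface: (pixel_values, input_ids) -> per-token logits.
    logits = model(batch["pixel_values"], batch["input_ids"])
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        batch["labels"].reshape(-1),
        ignore_index=-100,  # positions excluded from the loss
    )
    loss.backward()
    optimizer.step()
    return loss.item()
```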
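
For reference, the published SoViT-400m/14 dimensions, as used by SigLIP's so400m checkpoints; the field names here are illustrative:

```python
# Dimensions from the shape-optimization recipe, which jointly tunes
# width, depth, and MLP size for a given compute budget.
SOVIT_400M = dict(
    patch_size=14,
    hidden_size=1152,
    num_layers=27,
    num_heads=16,
    mlp_dim=4304,
)
```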
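
One way to obtain pretrained weights is through the Hugging Face implementation referenced above; this follows the `transformers` PaliGemma API, and the checkpoint is gated, so it may require accepting the license on the Hub first:

```python
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma-3b-pt-224"  # pretrained ("pt") checkpoint
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)
```

Fine-tuning then reuses a loop like `train_step` above on batches built from user data.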