
# Text Recognizer

An implementation of the text recognizer project from the Full Stack Deep Learning (FSDL) course in PyTorch, built to learn best practices for structuring a deep learning project. I have expanded on the original project with additional features and ideas from Claudio Jolowicz's "Hypermodern Python" article series.

## Prerequisites

- pyenv (or similar) with Python 3.9.* installed.
- nox for linting, formatting, and testing.
- poetry for dependency and project management.

## Installation

Install poetry and pyenv, then run:

```sh
pyenv local 3.9.*
make install
```

## Generate Datasets

Download and generate the datasets by running:

```sh
make download
make generate
```

## Train

Use, modify, or create an experiment config under `training/conf/experiment/`. To run an experiment, first enter the virtual environment:

```sh
poetry shell
```

Then train a new model by running:

```sh
python main.py +experiment=conv_transformer_paragraphs
```
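The `+experiment=` flag follows Hydra's config-composition conventions, where each experiment is a YAML file that overrides parts of the base config. A hypothetical minimal experiment file, purely to illustrate the structure (the keys and values shown are not the repository's actual config):

```yaml
# training/conf/experiment/conv_transformer_paragraphs.yaml (illustrative only)
# @package _global_
defaults:
  - override /network: conv_transformer

trainer:
  max_epochs: 600

datamodule:
  batch_size: 8
```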

## Network

*TODO: create a picture of the network and place it here.*

## Graveyard

Ideas of mine that unfortunately did not work:

- EfficientNet turned out to be a poor choice of encoder.
  - A ConvNeXt module, heavily adapted from lucidrains' x-unet, was much better at encoding the images into a useful representation.
- Use a VQ-VAE to pre-train a good latent representation.
  - Tests with various compression rates showed no performance increase compared to training directly end-to-end; if anything, performance decreased.
  - This is very unfortunate, as I really hoped this idea would work :(
  - I still really like this idea, and I might not have given up just yet...
  - I have now given up... :( ConvNeXt ftw.
- Axial Transformer encoder.
  - Added a lot of extra parameters with no gain in performance.
  - Cool idea, but too expensive to train on a single GPU.
- Word pieces.
  - Might have worked better, but I liked the idea of single-character recognition more.
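The single-character alternative mentioned above can be sketched as a minimal tokenizer. This is an illustrative toy, not the project's actual vocabulary or special-token mapping:

```python
class CharTokenizer:
    """Toy character-level tokenizer (illustrative; not the project's mapping)."""

    def __init__(self, corpus):
        # Special tokens first, then the sorted character vocabulary.
        self.itos = ["<pad>", "<sos>", "<eos>"] + sorted(set("".join(corpus)))
        self.stoi = {ch: i for i, ch in enumerate(self.itos)}

    def encode(self, text):
        # Wrap the character ids in start/end-of-sequence markers.
        return [self.stoi["<sos>"]] + [self.stoi[c] for c in text] + [self.stoi["<eos>"]]

    def decode(self, ids):
        # Drop special tokens and join the remaining characters.
        return "".join(self.itos[i] for i in ids if i > 2)
```

The appeal over word pieces is the tiny, fixed vocabulary and the ability to emit any character sequence, at the cost of longer target sequences.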

## Todo

- Remove einops (try)
- Tests
- Evaluation
- wandb artifact fetcher
- Fix linting
- Modularize the decoder
- Add KV cache
- Train with LaProp
- Fix stems
- Residual attention
- Single KV head
- Fix rotary embedding
- Simplify attention with norm
- Tie embeddings
- CNN -> TF encoder -> TF decoder
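One of the Todo items, the KV cache, amounts to storing the keys and values of past decoding steps so each autoregressive step only attends with the newest query instead of recomputing the full sequence. A framework-free sketch of the idea, using a toy single-head attention (all names are illustrative, not the project's code):

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def attend(q, keys, values):
    """One query attending over a list of keys/values (single head, no batch)."""
    scale = 1.0 / math.sqrt(len(q))
    weights = softmax([sum(qi * ki for qi, ki in zip(q, k)) * scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


class KVCache:
    """Append-only cache of past keys/values for incremental decoding."""

    def __init__(self):
        self.keys, self.values = [], []

    def step(self, q, k, v):
        # Append the newest key/value, then attend over everything cached so far.
        self.keys.append(k)
        self.values.append(v)
        return attend(q, self.keys, self.values)
```

Because each step reuses the cached keys and values, the per-step cost grows linearly with sequence length instead of quadratically, which is why KV caching is the standard optimization for transformer decoding.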