
# Text Recognizer

An implementation of the text recognizer project from the Full Stack Deep Learning (FSDL) course in PyTorch, built to learn best practices for structuring a deep learning project. I have expanded on the original project with additional features and ideas from Claudio Jolowicz's "Hypermodern Python" article series.

## Prerequisites

- pyenv (or similar) with Python 3.9.* installed.
- nox for linting, formatting, and testing.
- poetry for dependency and project management.

## Installation

Install poetry and pyenv, then run:

```sh
pyenv local 3.9.*
make install
```

## Generate Datasets

Download and generate the datasets by running:

```sh
make download
make generate
```

## Train

Use, modify, or create an experiment config under `training/conf/experiment/`. To run an experiment, first enter the virtual environment:

```sh
poetry shell
```

Then train a new model by running:

```sh
python main.py +experiment=conv_transformer_paragraphs
```
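The `+experiment=` flag follows Hydra's config-composition conventions, where each experiment is a YAML file that overrides parts of the base config. A hypothetical minimal experiment file, purely to illustrate the structure (the keys and values shown are not the repository's actual config):

```yaml
# training/conf/experiment/conv_transformer_paragraphs.yaml (illustrative only)
# @package _global_
defaults:
  - override /network: conv_transformer

trainer:
  max_epochs: 600

datamodule:
  batch_size: 8
```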

## Network

*TODO: create a picture of the network and place it here.*

## Graveyard

Ideas of mine that unfortunately did not work:

- EfficientNet turned out to be a poor choice of encoder.
  - A ConvNeXt module, heavily adapted from lucidrains' x-unet, was much better at encoding the images into a useful representation.
- Use a VQ-VAE to pre-train a good latent representation.
  - Tests with various compression rates showed no performance increase compared to training directly end-to-end; if anything, performance decreased.
  - This is very unfortunate, as I really hoped this idea would work :(
  - I still really like this idea, and I might not have given up just yet...
  - I have now given up... :( ConvNeXt ftw.
- Axial Transformer encoder.
  - Added a lot of extra parameters with no gain in performance.
  - Cool idea, but too expensive to train on a single GPU.
- Word pieces.
  - Might have worked better, but I liked the idea of single-character recognition more.
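The single-character alternative mentioned above can be sketched as a minimal tokenizer. This is an illustrative toy, not the project's actual vocabulary or special-token mapping:

```python
class CharTokenizer:
    """Toy character-level tokenizer (illustrative; not the project's mapping)."""

    def __init__(self, corpus):
        # Special tokens first, then the sorted character vocabulary.
        self.itos = ["<pad>", "<sos>", "<eos>"] + sorted(set("".join(corpus)))
        self.stoi = {ch: i for i, ch in enumerate(self.itos)}

    def encode(self, text):
        # Wrap the character ids in start/end-of-sequence markers.
        return [self.stoi["<sos>"]] + [self.stoi[c] for c in text] + [self.stoi["<eos>"]]

    def decode(self, ids):
        # Drop special tokens and join the remaining characters.
        return "".join(self.itos[i] for i in ids if i > 2)
```

The appeal over word pieces is the tiny, fixed vocabulary and the ability to emit any character sequence, at the cost of longer target sequences.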

## Todo

- Remove einops (try)
- Tests
- Evaluation
- wandb artifact fetcher
- Fix linting
- Modularize the decoder
- Add KV cache
- Train with LaProp
- Fix stems
- Residual attention
- Single KV head
- Fix rotary embedding
- Simplify attention with norm
- Tie embeddings
- CNN -> TF encoder -> TF decoder
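One of the Todo items, the KV cache, amounts to storing the keys and values of past decoding steps so each autoregressive step only attends with the newest query instead of recomputing the full sequence. A framework-free sketch of the idea, using a toy single-head attention (all names are illustrative, not the project's code):

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def attend(q, keys, values):
    """One query attending over a list of keys/values (single head, no batch)."""
    scale = 1.0 / math.sqrt(len(q))
    weights = softmax([sum(qi * ki for qi, ki in zip(q, k)) * scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


class KVCache:
    """Append-only cache of past keys/values for incremental decoding."""

    def __init__(self):
        self.keys, self.values = [], []

    def step(self, q, k, v):
        # Append the newest key/value, then attend over everything cached so far.
        self.keys.append(k)
        self.values.append(v)
        return attend(q, self.keys, self.values)
```

Because each step reuses the cached keys and values, the per-step cost grows linearly with sequence length instead of quadratically, which is why KV caching is the standard optimization for transformer decoding.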