TongkunGuan/SIGA

Self-Supervised Implicit Glyph Attention for Text Recognition (CVPR2023)

This is the official code for "Self-Supervised Implicit Glyph Attention for Text Recognition". For more details, please refer to our CVPR2023 paper, the poster, or the Chinese-language explainer (中文解读). If you have any questions, please contact me by email ([email protected]).

We have also released our ICCV23 work on scene text recognition:

  • Self-supervised Character-to-Character Distillation for Text Recognition (CCD): Paper and Code

Pipeline

[Figure: SIGA pipeline]

Model architecture

[Figure: SIGA model architecture]

Environments

# V100, Ubuntu 16.04, CUDA 10.1 (cu101 wheels)
conda create -n SIGA python==3.7.0
source activate SIGA
pip install torch==1.5.1+cu101 torchvision==0.6.1+cu101 -f https://download.pytorch.org/whl/torch_stable.html
pip install tensorboard==1.15.0
pip install tensorboardX==2.2
pip install opencv-python
pip install Pillow LMDB nltk six natsort scipy
# 3090, Ubuntu 16.04, CUDA 11.1 (cu111 wheels)
conda create -n SIGA python==3.7.0
source activate SIGA
pip install torch==1.10.1+cu111 torchvision==0.11.2+cu111 torchaudio==0.10.1 -f https://download.pytorch.org/whl/cu111/torch_stable.html
pip install tensorboard==2.11.2
pip install tensorboardX==2.2
pip install opencv-python
pip install Pillow LMDB nltk six natsort scipy
# if you hit a setuptools-related error, pin the version:
# pip uninstall setuptools
# pip install setuptools==58.0.4
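
After installation, a quick check can confirm that the packages listed above are importable. The following is a minimal sketch, not part of the repository: the helper name `missing_modules` is hypothetical, and the list uses the usual import names for those packages (`cv2` for opencv-python, `PIL` for Pillow, `lmdb` for LMDB).

```python
import importlib.util

# Import names for the packages installed above
# (opencv-python -> cv2, Pillow -> PIL, LMDB -> lmdb).
REQUIRED_MODULES = [
    "torch", "torchvision", "tensorboard", "tensorboardX",
    "cv2", "PIL", "lmdb", "nltk", "six", "natsort", "scipy",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        # Only import torch once we know it is installed.
        import torch
        print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
```

Run it inside the activated `SIGA` environment; an empty "missing" list followed by `CUDA available: True` suggests the setup is usable.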

Data

-- root_path
    --training
        --MJ
        --ST
    --validation
    --evaluation
        --SVT
        --IIIK
        --...
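
Before training, it may help to verify that the dataset root matches the tree above. This is a minimal sketch under stated assumptions: the helper `missing_dirs` is hypothetical (not from the repository), and it checks only the folders named explicitly in the tree; the elided `--...` entries are left unchecked.

```python
import os
import tempfile

# Folders named explicitly in the tree above; the "--..." entry is not checked.
EXPECTED_DIRS = [
    os.path.join("training", "MJ"),
    os.path.join("training", "ST"),
    "validation",
    os.path.join("evaluation", "SVT"),
    os.path.join("evaluation", "IIIK"),
]


def missing_dirs(root_path, expected=EXPECTED_DIRS):
    """Return the expected sub-folders that do not exist under root_path."""
    return [d for d in expected if not os.path.isdir(os.path.join(root_path, d))]


if __name__ == "__main__":
    # Demo on a throwaway directory that contains only training/MJ.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "training", "MJ"))
        print("Missing:", missing_dirs(root))
```

Replace the temporary directory with your own `root_path` to check a real dataset.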

Highlights

Mask preparation

  • Optional: K-means clustering results (please refer to CCD)
cd ./mask_create
python generate_mask.py  # generate masks in parallel and write them to LMDB files
python merge.py  # merge the LMDB files into a single file

Training

# --mask_path is optional
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 train.py \
    --model_name TRBA --exp_name SIGA --Aug \
    --batch_size 320 --num_iter 160000 \
    --select_data synth --benchmark_all_eval \
    --train_data /xxx/dataset/data_lmdb/training/label/Synth/ \
    --eval_data /xxx/dataset/data_lmdb/evaluation/ \
    --mask_path /xxx/dataset/data_lmdb/Mask \
    --workers 12
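
If you launch training from a script, the command above can be assembled programmatically. The sketch below is illustrative only: `build_train_args` is a hypothetical helper, it uses only the flags shown in the command above, and it assumes that "(optional)" means the `--mask_path` flag may be omitted entirely.

```python
import shlex


def build_train_args(train_data, eval_data, mask_path=None):
    """Assemble the training command shown above as an argument list.

    --mask_path is appended only when a mask directory is given
    (assumption: the flag can be left out when no masks were prepared).
    """
    args = [
        "python", "-m", "torch.distributed.launch", "--nproc_per_node=2", "train.py",
        "--model_name", "TRBA", "--exp_name", "SIGA", "--Aug",
        "--batch_size", "320", "--num_iter", "160000",
        "--select_data", "synth", "--benchmark_all_eval",
        "--train_data", train_data,
        "--eval_data", eval_data,
        "--workers", "12",
    ]
    if mask_path is not None:
        args += ["--mask_path", mask_path]
    return args


if __name__ == "__main__":
    cmd = build_train_args(
        "/xxx/dataset/data_lmdb/training/label/Synth/",
        "/xxx/dataset/data_lmdb/evaluation/",
    )
    # Print a shell-safe version; set CUDA_VISIBLE_DEVICES in the environment when running it.
    print(" ".join(shlex.quote(a) for a in cmd))
```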

Test and eval

python test.py --eval_data /xxx/xxx --select_data xxx

TODO

  • Refactor and clean code

Citation

If you find our method useful for your research, please cite:

@inproceedings{guan2023self,
  title={Self-Supervised Implicit Glyph Attention for Text Recognition},
  author={Guan, Tongkun and Gu, Chaochen and Tu, Jingzheng and Yang, Xue and Feng, Qi and Zhao, Yudi and Shen, Wei},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={15285--15294},
  year={2023}
}

License

- This code is free for academic research purposes only and is licensed under the 2-clause BSD License; see the LICENSE file for details.