Self-Supervised Implicit Glyph Attention for Text Recognition (CVPR2023)

This is the official code for "Self-Supervised Implicit Glyph Attention for Text Recognition". For more details, please refer to our CVPR2023 paper, the poster, or the Chinese-language explainer (中文解读). If you have any questions, please contact me by email ([email protected]).

We have also released our ICCV23 work on scene text recognition:

Self-supervised Character-to-Character Distillation for Text Recognition (CCD): Paper and Code

Pipeline

Model architecture

Environments

# V100 Ubuntu 16.04 Cuda 10
conda create -n SIGA python==3.7.0
source activate SIGA
pip install torch==1.5.1+cu101 torchvision==0.6.1+cu101 -f https://download.pytorch.org/whl/torch_stable.html
pip install tensorboard==1.15.0
pip install tensorboardX==2.2
pip install opencv-python
pip install Pillow LMDB nltk six natsort scipy
# 3090 Ubuntu 16.04 Cuda 11
conda create -n SIGA python==3.7.0
source activate SIGA
pip install torch==1.10.1+cu111 torchvision==0.11.2+cu111 torchaudio==0.10.1 -f https://download.pytorch.org/whl/cu111/torch_stable.html
pip install tensorboard==2.11.2
pip install tensorboardX==2.2
pip install opencv-python
pip install Pillow LMDB nltk six natsort scipy
# If you run into a setuptools-related error, pin the version:
# pip uninstall setuptools
# pip install setuptools==58.0.4
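After installing, it can help to confirm that the packages are importable before starting a long training run. The following is a minimal sketch, not part of the repository: the helper names `find_missing` and `report` are hypothetical, and the module list simply mirrors the pip commands above (note that `opencv-python` is imported as `cv2`, `Pillow` as `PIL`, and `LMDB` as `lmdb`).

```python
import importlib.util
import sys

# Import names for the packages installed by the commands above.
# The import name can differ from the pip name (opencv-python -> cv2).
REQUIRED_MODULES = [
    "torch", "torchvision", "tensorboard", "tensorboardX",
    "cv2", "PIL", "lmdb", "nltk", "six", "natsort", "scipy",
]


def find_missing(modules):
    """Return the entries of `modules` that cannot be located for import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def report(modules=REQUIRED_MODULES):
    """Print the Python version, missing modules, and CUDA visibility."""
    print("Python", sys.version.split()[0])
    missing = find_missing(modules)
    print("Missing modules:", ", ".join(missing) if missing else "none")
    if "torch" not in missing:
        import torch  # imported lazily so the check still runs without torch

        print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
    return missing


if __name__ == "__main__":
    report()
```

If `report()` lists missing modules, re-run the corresponding `pip install` line inside the activated `SIGA` environment.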

Data

-- root_path
    --training
        --MJ
        --ST
    --validation
    --evaluation
        --SVT
        --IIIK
        --...
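The layout above can be checked before training with a small script. This is a sketch under assumptions: `check_layout` and `EXPECTED_DIRS` are hypothetical names, only the directories explicitly named in the listing are required, and whatever folders exist under `evaluation` are simply reported rather than compared against a fixed list.

```python
from pathlib import Path

# Directories named explicitly in the layout above, relative to root_path.
EXPECTED_DIRS = ["training/MJ", "training/ST", "validation", "evaluation"]


def check_layout(root_path):
    """Return (missing, eval_sets) for the dataset root.

    missing   -- expected directories that do not exist
    eval_sets -- sorted names of the sub-directories found under evaluation/
    """
    root = Path(root_path)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    eval_dir = root / "evaluation"
    if eval_dir.is_dir():
        eval_sets = sorted(p.name for p in eval_dir.iterdir() if p.is_dir())
    else:
        eval_sets = []
    return missing, eval_sets
```

For example, `missing, eval_sets = check_layout("/xxx/dataset/data_lmdb")` returns an empty `missing` list when the tree matches the listing.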

Highlights

Dataset link:
Weight link:
- SIGA_R: trained on the V100 platform.
- SIGA_S: trained on the 3090 platform.
- SIGA_T: trained on the 3090 platform.

Mask preparation

This step is optional. The masks are K-means results (please refer to CCD).

cd ./mask_create
python generate_mask.py  # process masks in parallel and write them to LMDB files
python merge.py  # merge the multiple LMDB files into a single file
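For readers unfamiliar with K-means masks, the idea can be illustrated with a tiny, dependency-free example. This is only a sketch under assumptions, not the logic of `generate_mask.py`: it assumes two clusters over grayscale intensities, the function name `kmeans_binary_mask` is hypothetical, and it does not decide which cluster corresponds to text.

```python
def kmeans_binary_mask(gray, iterations=20):
    """Split pixel intensities into two clusters with 1-D K-means (k=2).

    `gray` is a list of rows of intensities (for example 0-255).
    Returns a mask of the same shape: 1 for the brighter cluster, 0 for the
    darker one. With two centroids in 1-D, assigning each pixel to its
    nearest centroid is the same as thresholding at the centroid midpoint.
    """
    pixels = [v for row in gray for v in row]
    lo, hi = float(min(pixels)), float(max(pixels))  # initial centroids
    for _ in range(iterations):
        threshold = (lo + hi) / 2.0
        dark = [v for v in pixels if v <= threshold]
        bright = [v for v in pixels if v > threshold]
        if not dark or not bright:
            break  # uniform image: only one cluster is populated
        new_lo = sum(dark) / len(dark)
        new_hi = sum(bright) / len(bright)
        if new_lo == lo and new_hi == hi:
            break  # centroids stopped moving
        lo, hi = new_lo, new_hi
    threshold = (lo + hi) / 2.0
    return [[1 if v > threshold else 0 for v in row] for row in gray]
```

Whether the bright or the dark cluster is the glyph depends on the image, so a real pipeline needs an extra rule for choosing the foreground; see CCD for the procedure the authors actually use.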

Training

# The --mask_path argument is optional.
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 train.py --model_name TRBA --exp_name SIGA --Aug --batch_size 320 --num_iter 160000 --select_data synth --benchmark_all_eval --train_data /xxx/dataset/data_lmdb/training/label/Synth/ --eval_data /xxx/dataset/data_lmdb/evaluation/ --mask_path /xxx/dataset/data_lmdb/Mask --workers 12
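When the number of GPUs changes, `CUDA_VISIBLE_DEVICES` and `--nproc_per_node` have to change together. The sketch below builds the same launch command from a list of GPU ids so the two stay consistent. It is a hypothetical convenience wrapper, not part of the repository: `build_train_command` is an assumed name, the flags and values are copied from the command above, and whether `--batch_size` should be adjusted for a different GPU count is not stated in the source, so it is left at 320.

```python
import os


def build_train_command(gpu_ids, train_data, eval_data, mask_path=None):
    """Return (command, env) reproducing the launch command shown above.

    The process count is derived from `gpu_ids`; `mask_path` is optional
    and the flag is omitted when it is None.
    """
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in gpu_ids)
    command = [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={len(gpu_ids)}",
        "train.py",
        "--model_name", "TRBA",
        "--exp_name", "SIGA",
        "--Aug",
        "--batch_size", "320",
        "--num_iter", "160000",
        "--select_data", "synth",
        "--benchmark_all_eval",
        "--train_data", train_data,
        "--eval_data", eval_data,
        "--workers", "12",
    ]
    if mask_path is not None:
        command += ["--mask_path", mask_path]
    return command, env
```

The result can be passed to `subprocess.run(command, env=env, check=True)` from the directory that contains `train.py`.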

Test and eval

python test.py --eval_data /xxx/xxx --select_data xxx

TODO

Refactor and clean up the code

Citation

If you find our method useful for your research, please cite:

@inproceedings{guan2023self,
  title={Self-Supervised Implicit Glyph Attention for Text Recognition},
  author={Guan, Tongkun and Gu, Chaochen and Tu, Jingzheng and Yang, Xue and Feng, Qi and Zhao, Yudi and Shen, Wei},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={15285--15294},
  year={2023}
}

License

- This code is free for academic research purposes only and is licensed under the 2-clause BSD License; see the LICENSE file for details.
