This code is an implementation of the paper *Adaptive Attention Generation for Indonesian Image Captioning*.
- 2020-01-15: Initial commit
- 2024-03-10: Refactored the code, added support for PyTorch 2.x, and expanded the documentation
| Model | Dataset | BLEU-1 | BLEU-2 | BLEU-3 | BLEU-4 | METEOR | ROUGE-L | CIDEr | Link |
|---|---|---|---|---|---|---|---|---|---|
| ResNet101 & LSTM | Flickr30K | 0.695 | 0.539 | 0.403 | 0.299 | 0.256 | 0.544 | 0.895 | Download |
| ResNet101 & LSTM | COCO2014 val | 0.667 | 0.497 | 0.358 | 0.257 | 0.245 | 0.509 | 0.967 | Download |
More models coming soon.
- Python 3.10
- Java 1.8 (required by the evaluation module; if you only want to run inference, you can skip this)
Follow this example.
- Make sure you have CUDA and cuDNN installed
- Install PyTorch 2.2.x with CUDA support; you can follow this link, or simply install the latest version
- Clone this repo or download the zip file

```bash
git clone https://github.com/share424/Adaptive-Attention-Generation-for-Indonesian-Image-Captioning.git
```

- Install the requirements

```bash
cd Adaptive-Attention-Generation-for-Indonesian-Image-Captioning
pip install .
```
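As an optional sanity check, you can verify that the installed PyTorch build actually sees your GPU:

```python
import torch

# Should print a 2.x version and True if CUDA is available
print(torch.__version__)
print(torch.cuda.is_available())
```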
We provide a script to run prediction on a single image; here's how to use it:
```text
Usage:
    python scripts/predict_single.py <config> [--ckpt <checkpoint>] [--image <image>] [--search-strategy <strategy>] [--beam-size <size>] [--visualize] [--device <device>]

positional arguments:
  config                Path to the configuration file

options:
  -h, --help            show this help message and exit
  --ckpt CKPT           Path to the checkpoint file
  --image IMAGE         Path to the image file
  --search-strategy {greedy_search,beam_search}, -s {greedy_search,beam_search}
                        Search strategy to use
  --beam-size BEAM_SIZE
                        Beam size
  --visualize           Visualize the attention
  --device DEVICE       Device to use
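```

The `--search-strategy` flag selects how the caption is decoded. As a rough, generic sketch of the two strategies (not this repo's actual implementation; `model.step` and its signature are hypothetical placeholders), greedy search keeps the single best token at each step, while beam search keeps the top-`beam_size` partial captions:

```python
import torch

# Hypothetical interface: model.step(tokens) returns log-probs over the
# vocabulary for the next token. Generic illustration only.

def greedy_search(model, start_id, end_id, max_len=50):
    tokens = [start_id]
    for _ in range(max_len):
        log_probs = model.step(torch.tensor([tokens]))  # shape (1, vocab)
        next_id = int(log_probs.argmax(dim=-1))
        tokens.append(next_id)
        if next_id == end_id:
            break
    return tokens

def beam_search(model, start_id, end_id, beam_size=5, max_len=50):
    beams = [([start_id], 0.0)]  # (token sequence, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == end_id:  # finished beams carry over unchanged
                candidates.append((seq, score))
                continue
            log_probs = model.step(torch.tensor([seq]))[0]  # shape (vocab,)
            top_lp, top_id = log_probs.topk(beam_size)
            for lp, tok in zip(top_lp.tolist(), top_id.tolist()):
                candidates.append((seq + [tok], score + lp))
        # Keep the best beam_size hypotheses (length normalization omitted)
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:beam_size]
    return beams[0][0]
```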
Here is an example:
- Download and unzip the pretrained models from the Evaluation Results table
- Run the following command
```bash
python scripts/predict_single.py resnet101-lstm.yaml --ckpt BEST_epoch_18_resnet101.pth --image images/sample-2.jpg
```

output:

```text
['kucing duduk di atas meja kayu']
```

("a cat sitting on a wooden table")
If you encounter an error like this:

```text
FileNotFoundError: [Errno 2] No such file or directory: 'wordmap.json'
```

update `resnet101-lstm.yaml` and set the `tokenizer.wordmap` section to the absolute path of the `wordmap.json` file.
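For example (the absolute path below is a placeholder; use wherever `wordmap.json` actually lives on your machine):

```yaml
tokenizer:
  wordmap: /home/user/Adaptive-Attention-Generation-for-Indonesian-Image-Captioning/wordmap.json
  max_length: 50
```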
Before we start, you need to understand our training pipeline (see the sketch after this list):
- Freeze the encoder and train only the decoder for N - k epochs
- Unfreeze the encoder and train both the encoder and decoder for the last k epochs
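A minimal sketch of this two-phase schedule in PyTorch (`encoder`, `decoder`, and `train_one_epoch` are hypothetical placeholders, not the repo's actual training loop):

```python
import torch

def set_encoder_trainable(encoder: torch.nn.Module, trainable: bool) -> None:
    # Toggle gradients for every encoder parameter
    for param in encoder.parameters():
        param.requires_grad = trainable

# Phase 1: encoder frozen, decoder only, for N - k epochs
# Phase 2: encoder unfrozen (fine-tuning), both trained, for the last k epochs
N, k = 20, 5  # matches epochs / finetune_epochs in the sample config below
for epoch in range(N):
    set_encoder_trainable(encoder, trainable=(epoch >= N - k))
    train_one_epoch(encoder, decoder)  # hypothetical helper
```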
- Create a `config.yaml` file; you can use `config/resnet101-lstm.yaml` as a reference
- Download the image dataset; you can use the Flickr30K or COCO2014 dataset
- Download the translated Indonesian dataset here
- Adjust the `config.yaml` file to point to the dataset paths

```yaml
data:
  train:
    - annotation: dataset/coco2014_indo_train.json
      image_dir: dataset/coco2014/train2014
    - annotation: dataset/flickr30k_indo_train.json
      image_dir: dataset/flickr30k_images
  validation:
    - annotation: dataset/flickr30k_indo_val.json
      image_dir: dataset/flickr30k_images
  test:
    - annotation: dataset/flickr30k_indo_test.json
      image_dir: dataset/flickr30k_images
    - annotation: dataset/coco2014_indo_val.json
      image_dir: dataset/coco2014/val2014
```
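Typos in these paths tend to surface only mid-run, so a quick check up front can save time. A minimal sketch, assuming the config layout above and that `pyyaml` is installed:

```python
import os
import yaml

with open("config.yaml") as f:
    config = yaml.safe_load(f)

# Walk every split and confirm the annotation files and image dirs exist
for split, entries in config["data"].items():
    for entry in entries:
        for key in ("annotation", "image_dir"):
            path = entry[key]
            status = "ok" if os.path.exists(path) else "MISSING"
            print(f"{split}: {key} {path} -> {status}")
```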
A wordmap is a dictionary that maps each word to an index. You can create the wordmap by running the following command:

```bash
python scripts/create_wordmap.py dataset/coco2014_indo_train.json dataset/flickr30k_indo_train.json --output wordmap.json
```

Note: build the wordmap from the training splits only, so the model never sees the test set.
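Conceptually, the wordmap is just a word-to-index JSON file. A minimal sketch of building one from COCO-style annotations (the special-token names and annotation field layout here are assumptions, not necessarily what `create_wordmap.py` produces):

```python
import json
from collections import Counter

def build_wordmap(annotation_paths, min_freq=1):
    counter = Counter()
    for path in annotation_paths:
        with open(path) as f:
            data = json.load(f)
        # Assumes COCO-style "annotations" entries with a "caption" field
        for ann in data["annotations"]:
            counter.update(ann["caption"].lower().split())

    words = [w for w, c in counter.items() if c >= min_freq]
    # Reserve special tokens first (names here are illustrative)
    wordmap = {"<pad>": 0, "<start>": 1, "<end>": 2, "<unk>": 3}
    for i, word in enumerate(sorted(words), start=len(wordmap)):
        wordmap[word] = i
    return wordmap

wordmap = build_wordmap(["dataset/coco2014_indo_train.json"])
with open("wordmap.json", "w") as f:
    json.dump(wordmap, f)
```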
You can add more datasets to the `create_wordmap.py` script; here's the usage:

```text
Usage:
    python scripts/create_wordmap.py <annotations>... [--output <output>]

Arguments:
    annotations (List[str]): List of paths to COCO annotations.
    output (str): Path to the output wordmap JSON file.

Example:
    python scripts/create_wordmap.py annotations.json --output wordmap.json
```

Note: don't forget to add the wordmap path to the `config.yaml` file:

```yaml
tokenizer:
  wordmap: wordmap.json
  max_length: 50
```
- Edit the `config.yaml` file to adjust the training configuration

```yaml
training:
  epochs: 20
  finetune_epochs: 5 # the last 5 epochs fine-tune the encoder
  finetune_n_layer: 5 # number of trailing encoder layers to fine-tune
  grad_clip: 10.0 # gradient clipping value to avoid exploding gradients
  encoder_optimizer:
    name: Adam
    config:
      betas: [0.9, 0.999]
      lr: 0.0001
    lr_scheduler:
      name: ReduceLROnPlateau
      config:
        mode: max
        patience: 3
        verbose: True
        factor: 0.1
  decoder_optimizer:
    name: Adam
    config:
      betas: [0.8, 0.999]
      lr: 0.0005
    lr_scheduler:
      name: ReduceLROnPlateau
      config:
        mode: max
        patience: 3
        verbose: True
        factor: 0.1
  loss:
    name: CrossEntropyLoss
    config:
      reduction: mean
  checkpoint_dir: checkpoints
  track_metric: BLEU-1 # used by lr_scheduler and early stopping
  # available: [BLEU-1, BLEU-2, BLEU-3, BLEU-4, METEOR, ROUGE_L, CIDEr, loss, top5_accuracy]
  early_stop_n_epoch: 5
```
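A hedged sketch of how a `name`/`config` block like the one above can be turned into actual optimizer and scheduler objects (the repo's own construction code may differ):

```python
import torch

def build_optimizer(params, opt_cfg):
    # Look up e.g. torch.optim.Adam by name and pass the config as kwargs
    opt_cls = getattr(torch.optim, opt_cfg["name"])
    optimizer = opt_cls(params, **opt_cfg["config"])

    scheduler = None
    if "lr_scheduler" in opt_cfg:
        sched_cfg = opt_cfg["lr_scheduler"]
        sched_cls = getattr(torch.optim.lr_scheduler, sched_cfg["name"])
        scheduler = sched_cls(optimizer, **sched_cfg["config"])
    return optimizer, scheduler

# e.g. with the decoder settings above:
# optimizer, scheduler = build_optimizer(decoder.parameters(),
#                                        cfg["training"]["decoder_optimizer"])
# scheduler.step(bleu1)  # ReduceLROnPlateau in mode=max steps on the tracked metric
```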
We provide a script to train the model; here's how to use it:

```text
Usage:
    python scripts/train.py <config> [--device <device>] [--ckpt <checkpoint>]

Arguments:
    <config>              Path to the configuration file.
    --device <device>     Device to use. Default is 'cuda'.
    --ckpt <checkpoint>   Path to the checkpoint file. Optional.

Example:
    python scripts/train.py config.yaml --device cuda --ckpt checkpoint.pth
```
Here is an example:

```bash
python scripts/train.py config.yaml
```

The checkpoints will be saved in the `checkpoints` directory, as configured by `checkpoint_dir` in the `config.yaml` file.
We provide a script to evaluate the model; here's how to use it:

```text
Usage:
    python scripts/eval.py <config> [--split <split>] [--device <device>] [--ckpt <checkpoint>]

Arguments:
    config (str): Path to the configuration file.

Options:
    --split (str): Split to evaluate. Default is 'test'.
    --device (str): Device to use. Default is 'cuda'.
    --ckpt (str): Path to the checkpoint file.

Example:
    python scripts/eval.py config.yaml --split val --device cuda:0 --ckpt model.ckpt
```
Here is an example:

```bash
python scripts/eval.py config.yaml --split test --device cuda:0 --ckpt model.ckpt
```
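The evaluation module reports BLEU, METEOR, ROUGE-L, and CIDEr (the Java requirement comes from this module). For a quick, independent sanity check on BLEU, a sketch using NLTK (not the repo's scorer; tokenization here is naive whitespace splitting):

```python
from nltk.translate.bleu_score import corpus_bleu

# One list of reference captions per image, one hypothesis per image
references = [[["kucing", "duduk", "di", "atas", "meja", "kayu"]]]
hypotheses = [["kucing", "duduk", "di", "meja", "kayu"]]

# weights select the n-gram orders: BLEU-1 uses unigrams only
bleu1 = corpus_bleu(references, hypotheses, weights=(1.0, 0, 0, 0))
bleu2 = corpus_bleu(references, hypotheses, weights=(0.5, 0.5, 0, 0))
print(f"BLEU-1: {bleu1:.3f}, BLEU-2: {bleu2:.3f}")
```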
```bibtex
@INPROCEEDINGS{Sury2006:Adaptive,
  AUTHOR="Made {Surya Mahadi} and Anditya Arifianto and Kurniawan {Nur Ramadhani}",
  TITLE="Adaptive Attention Generation for Indonesian Image Captioning",
  BOOKTITLE="2020 8th International Conference on Information and Communication Technology (ICoICT) (ICoICT 2020)",
  ADDRESS="Yogyakarta, Indonesia",
  DAYS=23,
  MONTH=jun,
  YEAR=2020
}
```
This project is licensed under the MIT License.