CLVQA

Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task (AAAI2023)

[arXiv | Data & annotation (json/npy)]



Preparation

Installation

conda create -n mmclvqa python=3.8
conda activate mmclvqa

git clone https://github.com/showlab/CLVQA.git
cd CLVQA
cd mmclvqa
pip install --editable .

cd ..
pip install -r extra_requirements.txt
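Optionally, you can sanity-check the install from Python. This is a minimal check assuming the editable install of mmclvqa succeeded (mmf is the package it provides):

# Minimal post-install check: the mmf package from mmclvqa/ should be importable.
import mmf
print(mmf.__file__)  # should point into the CLVQA/mmclvqa checkout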

CLOVE Dataset and Annotation

We release the datasets and annotations in json format (link) and npy format (link). To use our code for training, please download the npy files.

  • Example of a data sample:
    { 
    'answer': 'kiosk',                                         # answer
    'answers': ['kiosk','kiosk',...],                          # answers in VQAv2 format; repeated 10 times if there is only one answer in the annotation
    'feature_path': '440.npy',                                 # feature path to retrieve features
    'gqa_question':                                            # GQA annotations, if applicable
                    { 'annotations': { 'answer': {},
                                        'fullAnswer': {},
                                        'question': {}},
                        'answer': 'kiosk',
                        'entailed': ['06778810', '06778808'],
                        'equivalent': ['06778808', '06778809'],
                        'fullAnswer': 'It is a kiosk.',
                        'groups': {'global': 'place', 'local': '02q-place'},
                        'imageId': '440',
                        'isBalanced': True,
                        'question': 'What place is this?',
                        'semantic': [ { 'argument': 'scene',
                                        'dependencies': [],
                                        'operation': 'select'},
                                    { 'argument': 'place',
                                        'dependencies': [0],
                                        'operation': 'query'}],
                        'semanticStr': 'select: scene->query: place [0]',
                        'types': { 'detailed': 'place',
                                'semantic': 'global',
                                'structural': 'query'}},
    'gt_scene_graph_mask': [1,0,0,0 ..., ],                  # Ground-truth SG mask for question-answer generation, corresponding to `gt_scene_graph_seq`; 1 indicates the SG relation is relevant to the question-answer generation.
    'gt_scene_graph_seq': [                                   # Ground-truth SG annotated for the image in this annotation datum.
        'kiosk [SEP]', 'counter [SEP]', 'lady [SEP]', 'trash can [SEP]', ...
        ],
    'image_id': '440',                                        # image id
    'image_source': 'vg',                                     # image source
    'ocr': [],                                                # ocr info in the image, applicable in textvqa
    'ocr_info': [],                                           # ocr info in the image, applicable in textvqa
    'ocr_tokens': [],                                         # ocr tokens, applicable in textvqa
    'pred_scene_graph_seq': [                                 # predicted SG extracted by an off-the-shelf model
                                'building behind man [SEP]',
                                'building behind woman [SEP]',
                                'man watching man [SEP]',
                                'person watching man [SEP]',
                                'building behind woman [SEP]',
                                ...
                            ],
    'program': [                                              # program executed to generate the question
                {'argument': 'scene', 'dependencies': [], 'operation': 'select'},
                { 'argument': 'place',
                    'dependencies': [0],
                    'operation': 'query'}
                ],
    'question': 'What place is this?',                        # question
    'question_id': 'g06778809',                               # question id
    'raw_question_type': {                                    # raw question type, applicable in original GQA annotation
                            'detailed': 'place',
                            'semantic': 'global',
                            'structural': 'query'
                            },
    'set_name': 'train',                                      # set name: train/val
    'stage': 'object',                                        # stage name for continual learning
    'supporting_fact': []                                     # supporting facts, applicable in stage "knowledge"
    }
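  • To inspect the npy annotations programmatically, something like the snippet below should work. This is a minimal sketch: the path is illustrative, and we assume each npy stores pickled sample dicts (hence allow_pickle=True); depending on the file, the first entry may be a metadata header rather than a sample.

    # Peek at a CLOVE annotation file (illustrative path).
    import numpy as np
    from pprint import pprint

    data = np.load("/your_path_to/fcl_mmf_attribute_train.npy", allow_pickle=True)

    # Keep only entries that look like samples (skip a possible metadata header).
    samples = [d for d in data if isinstance(d, dict) and "question" in d]
    print(len(samples), "samples")
    pprint(samples[0])   # a dict with the fields described above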

Training

Symbolic Replay Model (SRM)

The implementation of the Symbolic Replay Model can be found in SRM/. We provide training scripts for SRM here. Specifically:

cd SRM/
# training SRM under scene-incremental setting, with task order a->b->c->d->e->f, using distilgpt2
CUDA_VISIBLE_DEVICES=0 python train.py \
 --cl_setting scene \
 --task_seq abcdef \
 --model_name distilgpt2 \
 --model_dir_root  /...path_to/exp/clvqa/QAG_seq/not_use_gt/QAG_scene_task_token  \
 --add_task_tokens \
 --n_train_epochs 15

# training SRM under function-incremental setting, with task order o->a->r->l->k->s, using distilgpt2
CUDA_VISIBLE_DEVICES=0 python train.py \
--cl_setting functional \
--task_seq oarlks \
--model_name distilgpt2 \
--model_dir_root  /...path_to/exp/clvqa/QAG_seq/not_use_gt/QAG_functional_task_token  \
--add_task_tokens \
--n_train_epochs 15
  • We release our replayed samples for 6 task orders as reported in the paper.
  • For the 6 task orders, you can inspect these files: scene / function, or refer to our paper.
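For reference, the single-letter abbreviations in --task_seq (and later in training.CL.task_order) map to the functional stage names used by the UniVQA configs and save directories below; this mapping is our reading of the scripts in this repository:

# Functional-setting task abbreviations, as used in --task_seq / training.CL.task_order.
FUNCTIONAL_STAGES = {
    "o": "object",
    "a": "attribute",
    "r": "relation",
    "l": "logical",
    "k": "knowledge",
    "s": "scenetext",
}
# In the scene-incremental setting, the six tasks are simply labeled a-f.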

UniVQA

Refer to the scripts in this folder for one-stop training and testing (generated by generate_run_scripts.py). The example below trains with replayed samples from SRM, with #replayed_samples : #current_task_samples = 1.5 : 1 and task order o → a → r → l → k → s:

ROOT=/Users/stan
DEVICE=0
if [ ! -f "$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_attribute/unicl_final.pth" ] ; then 
 CUDA_VISIBLE_DEVICES=$DEVICE mmf_run config=EXP_CONFIG/functional/cl_attribute_unicl_standalone.yaml \
 model=unicl \
 dataset=clvqa \
 training.CL.use_cl=True \
 training.CL.use_callback=False \
 training.CL.use_replay=True \
 training.CL.replay_method=restore_with_prob \
 training.CL.task_order=oarlks \
 training.CL.restore_rate=1.5 \
 training.CL.restore_dir=$ROOT/exp/clvqa/QAG_seq/not_use_gt/QAG_functional_task_token/distilgpt2_replay/distilgpt2_functional_oarlks \
 training.CL.restore_paths=oarlks_REPLAY[o]_AT[a].npy \
 dataset_config.clvqa.use_mask_img=True \
 dataset_config.clvqa.mask_img_prob=0.15 \
 run_type=train_val \
 checkpoint.resume_file=$ROOT/exp/clvqa/save/stand_alone/functional/unicl_object/unicl_final.pth \
 env.save_dir=$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_attribute \
 training.checkpoint_interval=4000 \
 training.callbacks=[] 
fi 



if [ ! -f "$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_relation/unicl_final.pth" ] ; then 
 CUDA_VISIBLE_DEVICES=$DEVICE mmf_run config=EXP_CONFIG/functional/cl_relation_unicl_standalone.yaml \
 model=unicl \
 dataset=clvqa \
 training.CL.use_cl=True \
 training.CL.use_callback=False \
 training.CL.use_replay=True \
 training.CL.replay_method=restore_with_prob \
 training.CL.task_order=oarlks \
 training.CL.restore_rate=1.5 \
 training.CL.restore_dir=$ROOT/exp/clvqa/QAG_seq/not_use_gt/QAG_functional_task_token/distilgpt2_replay/distilgpt2_functional_oarlks \
 training.CL.restore_paths=oarlks_REPLAY[o]_AT[r].npy,oarlks_REPLAY[a]_AT[r].npy \
 dataset_config.clvqa.use_mask_img=True \
 dataset_config.clvqa.mask_img_prob=0.15 \
 run_type=train_val \
 checkpoint.resume_file=$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_attribute/unicl_final.pth \
 env.save_dir=$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_relation \
 training.checkpoint_interval=4000 \
 training.callbacks=[] 
fi 



if [ ! -f "$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_logical/unicl_final.pth" ] ; then 
 CUDA_VISIBLE_DEVICES=$DEVICE mmf_run config=EXP_CONFIG/functional/cl_logical_unicl_standalone.yaml \
 model=unicl \
 dataset=clvqa \
 training.CL.use_cl=True \
 training.CL.use_callback=False \
 training.CL.use_replay=True \
 training.CL.replay_method=restore_with_prob \
 training.CL.task_order=oarlks \
 training.CL.restore_rate=1.5 \
 training.CL.restore_dir=$ROOT/exp/clvqa/QAG_seq/not_use_gt/QAG_functional_task_token/distilgpt2_replay/distilgpt2_functional_oarlks \
 training.CL.restore_paths=oarlks_REPLAY[o]_AT[l].npy,oarlks_REPLAY[a]_AT[l].npy,oarlks_REPLAY[r]_AT[l].npy \
 dataset_config.clvqa.use_mask_img=True \
 dataset_config.clvqa.mask_img_prob=0.15 \
 run_type=train_val \
 checkpoint.resume_file=$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_relation/unicl_final.pth \
 env.save_dir=$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_logical \
 training.checkpoint_interval=4000 \
 training.callbacks=[] 
fi 



if [ ! -f "$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_knowledge/unicl_final.pth" ] ; then 
 CUDA_VISIBLE_DEVICES=$DEVICE mmf_run config=EXP_CONFIG/functional/cl_knowledge_unicl_standalone.yaml \
 model=unicl \
 dataset=clvqa \
 training.CL.use_cl=True \
 training.CL.use_callback=False \
 training.CL.use_replay=True \
 training.CL.replay_method=restore_with_prob \
 training.CL.task_order=oarlks \
 training.CL.restore_rate=1.5 \
 training.CL.restore_dir=$ROOT/exp/clvqa/QAG_seq/not_use_gt/QAG_functional_task_token/distilgpt2_replay/distilgpt2_functional_oarlks \
 training.CL.restore_paths=oarlks_REPLAY[o]_AT[k].npy,oarlks_REPLAY[a]_AT[k].npy,oarlks_REPLAY[r]_AT[k].npy,oarlks_REPLAY[l]_AT[k].npy \
 dataset_config.clvqa.use_mask_img=True \
 dataset_config.clvqa.mask_img_prob=0.15 \
 run_type=train_val \
 checkpoint.resume_file=$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_logical/unicl_final.pth \
 env.save_dir=$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_knowledge \
 training.checkpoint_interval=4000 \
 training.callbacks=[] 
fi 



if [ ! -f "$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_scenetext/unicl_final.pth" ] ; then 
 CUDA_VISIBLE_DEVICES=$DEVICE mmf_run config=EXP_CONFIG/functional/cl_scenetext_unicl_standalone.yaml \
 model=unicl \
 dataset=clvqa \
 training.CL.use_cl=True \
 training.CL.use_callback=False \
 training.CL.use_replay=True \
 training.CL.replay_method=restore_with_prob \
 training.CL.task_order=oarlks \
 training.CL.restore_rate=1.5 \
 training.CL.restore_dir=$ROOT/exp/clvqa/QAG_seq/not_use_gt/QAG_functional_task_token/distilgpt2_replay/distilgpt2_functional_oarlks \
 training.CL.restore_paths=oarlks_REPLAY[o]_AT[s].npy,oarlks_REPLAY[a]_AT[s].npy,oarlks_REPLAY[r]_AT[s].npy,oarlks_REPLAY[l]_AT[s].npy,oarlks_REPLAY[k]_AT[s].npy \
 dataset_config.clvqa.use_mask_img=True \
 dataset_config.clvqa.mask_img_prob=0.15 \
 run_type=train_val \
 checkpoint.resume_file=$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_knowledge/unicl_final.pth \
 env.save_dir=$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_scenetext \
 training.checkpoint_interval=4000 \
 training.callbacks=[] 
fi 
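The five commands above follow a regular pattern: each stage resumes from the previous stage's unicl_final.pth, and restore_paths accumulates one replay file per already-seen task (oarlks_REPLAY[<seen>]_AT[<current>].npy). The sketch below is an illustrative reconstruction of that pattern, not part of generate_run_scripts.py; the names in it are ours.

# Illustrative reconstruction of the per-stage flags used above (not generate_run_scripts.py).
ORDER = "oarlks"
STAGE_NAMES = {"o": "object", "a": "attribute", "r": "relation",
               "l": "logical", "k": "knowledge", "s": "scenetext"}

def stage_flags(order=ORDER):
    """Yield (stage_name, restore_paths, previous_stage) for every task after the first."""
    for i, cur in enumerate(order):
        if i == 0:
            continue  # the first task is trained stand-alone, without replay
        restore_paths = ",".join(f"{order}_REPLAY[{seen}]_AT[{cur}].npy" for seen in order[:i])
        yield STAGE_NAMES[cur], restore_paths, STAGE_NAMES[order[i - 1]]

for name, restore_paths, prev in stage_flags():
    print(f"stage={name}  resume_from=.../unicl_{prev}/unicl_final.pth")
    print(f"  training.CL.restore_paths={restore_paths}")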

Testing

One can follow generate_run_scripts.py to generate one-stop training-and-testing scripts. For testing only, please refer to eval_os.py. Below is a testing example for the functional setting with 1.5x SRM replayed samples and task order oarlks.

python
>>> from eval_os import *
>>> stage_sweep(cl_setting='functional', setting_idx=1, abbr_seq='oarlks', device=0, model_name='unicl', save_dir='/Users/stan/exp/clvqa', val_exp='distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5', test_stand_alone=False, test_reg=False, print_acc=False)
{'textvqa_accuracy': {'a2a': 0.4177,
                      'a2k': 0.1967,
                      'a2l': 0.0563,
                      'a2o': 0.4037,
                      'a2r': 0.121,
                      'a2s': 0.1453,
                      'k2a': 0.3263,
                      'k2k': 0.6813,
                      'k2l': 0.6807,
                      'k2o': 0.295,
                      'k2r': 0.3167,
                      'k2s': 0.1501,
                      'l2a': 0.272,
                      'l2k': 0.1943,
                      'l2l': 0.7153,
                      'l2o': 0.2653,
                      'l2r': 0.307,
                      'l2s': 0.1408,
                      'o2a': 0.1013,
                      'o2k': 0.1063,
                      'o2l': 0.0197,
                      'o2o': 0.5997,
                      'o2r': 0.0823,
                      'o2s': 0.0962,
                      'r2a': 0.3713,
                      'r2k': 0.2073,
                      'r2l': 0.121,
                      'r2o': 0.4023,
                      'r2r': 0.3943,
                      'r2s': 0.1555,
                      's2a': 0.3083,
                      's2k': 0.6253,
                      's2l': 0.6733,
                      's2o': 0.2963,
                      's2r': 0.3037,
                      's2s': 0.5511}}
{'textvqa_accuracy': [('o2o', 0.5997),
                      ('o2a', 0.1013),
                      ('o2r', 0.0823),
                      ('o2l', 0.0197),
                      ('o2k', 0.1063),
                      ('o2s', 0.0962),
                      ('a2o', 0.4037),
                      ('a2a', 0.4177),
                      ('a2r', 0.121),
                      ('a2l', 0.0563),
                      ('a2k', 0.1967),
                      ('a2s', 0.1453),
                      ('r2o', 0.4023),
                      ('r2a', 0.3713),
                      ('r2r', 0.3943),
                      ('r2l', 0.121),
                      ('r2k', 0.2073),
                      ('r2s', 0.1555),
                      ('l2o', 0.2653),
                      ('l2a', 0.272),
                      ('l2r', 0.307),
                      ('l2l', 0.7153),
                      ('l2k', 0.1943),
                      ('l2s', 0.1408),
                      ('k2o', 0.295),
                      ('k2a', 0.3263),
                      ('k2r', 0.3167),
                      ('k2l', 0.6807),
                      ('k2k', 0.6813),
                      ('k2s', 0.1501),
                      ('s2o', 0.2963),
                      ('s2a', 0.3083),
                      ('s2r', 0.3037),
                      ('s2l', 0.6733),
                      ('s2k', 0.6253),
                      ('s2s', 0.5511)]}
==> textvqa_accuracy | Final acc: [0.2963, 0.3083, 0.3037, 0.6733, 0.6253, 0.5511], weight avg acc: 0.45966666666666667. 
==> textvqa_accuracy | Backward transfer: [-0.3034, -0.1094, -0.09059999999999996, -0.04200000000000004, -0.05600000000000005], weighted bwt: -0.12028000000000004 
==> textvqa_accuracy | Forgetting: [0.41900000000000004, 0.40700000000000003, 0.4116, 0.04200000000000004, 0.09000000000000008], weighted forgetting: 0.27392.
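For reference, the printed final accuracy and backward transfer follow the standard continual-learning definitions over the x2y entries above (accuracy on task y after training up to task x). The sketch below recomputes them from that output; it is illustrative only, and the forgetting number is computed inside eval_os.py rather than reproduced here.

# Recompute final accuracy and backward transfer from the x2y accuracies above.
ORDER = "oarlks"
acc = {"o2o": 0.5997, "a2a": 0.4177, "r2r": 0.3943, "l2l": 0.7153, "k2k": 0.6813,
       "s2o": 0.2963, "s2a": 0.3083, "s2r": 0.3037, "s2l": 0.6733, "s2k": 0.6253,
       "s2s": 0.5511}  # subset of the printed dict needed for these two metrics

last = ORDER[-1]
final_acc = [acc[f"{last}2{t}"] for t in ORDER]                   # acc on each task after the last stage
avg_acc = sum(final_acc) / len(final_acc)                         # ~0.4597, matching the output above
bwt = [acc[f"{last}2{t}"] - acc[f"{t}2{t}"] for t in ORDER[:-1]]  # drop relative to just-after-training acc
avg_bwt = sum(bwt) / len(bwt)                                     # ~-0.1203, matching the output above

print(final_acc, round(avg_acc, 4))
print(bwt, round(avg_bwt, 4))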

Misc

config

  • For each .yaml config file under mmclvqa/EXP_CONFIG, change the path of annotations to where you put your annotation files. E.g.,

    annotations:
        train:
        - /your_path_to/fcl_mmf_attribute_train.npy
        val:
        - /your_path_to/fcl_mmf_attribute_val.npy
        test:
        - /your_path_to/fcl_mmf_attribute_val.npy
  • For each .yaml config file under mmclvqa/EXP_CONFIG, change the vocab_file path to where you put your vocab files (use the copies under files). E.g.,

    text_processor:
        type: bert_tokenizer
        params:
            max_length: 20 # change from 14 to 20
            vocab:
            type: intersected
            embedding_name: glove.6B.300d
            vocab_file: /your_path_to/vocabulary_100k.txt
    ###
    scene_graph_processor:
        type: scene_graph_bert_tokenizer
        params:
            max_length: 480
            vocab:
            type: intersected
            embedding_name: glove.6B.300d
            vocab_file: /your_path_to/vocabulary_100k.txt
    ###
    answer_processor:
        type: m4c_answer
        params:
            vocab_file: /your_path_to/clvqa_answer_6k.txt
  • Modify paths in mmclvqa/mmf/common/CL_constant.py:

    DATA_DIR = dict(        # modify path
        functional = "path to folder of function annotations",
        scene = "path to folder of scene annotations",
    )
    
    # These files are under files/
    GENERATED_SG_PTH = dict(
        functional = "/your_path_to/generated_sg_all_stages_v6.json", # modify path here
        scene = "/your_path_to/stage_sg_scene_setting_50u-50c.json",  # modify path here
    )
  • For each .yaml config file under mmclvqa/EXP_CONFIG, you may change cache_dir, the directory where the program saves automatically downloaded features.

    env:
        cache_dir: /workspace/stan/.cache/torch/mmf
  • Path for SRM replayed samples: when training SRM, you may specify --model_dir_root [model_dir_root]; the replayed samples will then be saved under [model_dir_root]/[model_name]_replay/[model_name]_[setting_name]_[task_order]/, which is automatically used as training.CL.restore_dir for UniVQA CL training (see the example after this list).

  • You may change the training batch size for UniVQA by passing training.batch_size=xxx.
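For example, with the SRM training command shown earlier, the replay files produced by SRM land where the UniVQA commands expect them (the paths below reuse the placeholders from the scripts above):

# SRM trained with:
#   --model_dir_root /...path_to/exp/clvqa/QAG_seq/not_use_gt/QAG_functional_task_token
#   --model_name distilgpt2  --cl_setting functional  --task_seq oarlks
# => replayed samples are written to
#   /...path_to/exp/clvqa/QAG_seq/not_use_gt/QAG_functional_task_token/distilgpt2_replay/distilgpt2_functional_oarlks/
#       oarlks_REPLAY[o]_AT[a].npy
#       oarlks_REPLAY[o]_AT[r].npy
#       oarlks_REPLAY[a]_AT[r].npy
#       ...
# and this directory is what training.CL.restore_dir points to in the UniVQA commands.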


Cite Our Work

If you find our work helpful, please cite our paper.

@article{Lei_symbolic_2023, 
title={Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task}, 
volume={37}, 
url={https://ojs.aaai.org/index.php/AAAI/article/view/25208}, 
DOI={10.1609/aaai.v37i1.25208}, 
number={1}, 
journal={Proceedings of the AAAI Conference on Artificial Intelligence}, 
author={Lei, Stan Weixian and Gao, Difei and Wu, Jay Zhangjie and Wang, Yuxuan and Liu, Wei and Zhang, Mengmi and Shou, Mike Zheng}, 
year={2023}, 
month={Jun.}, 
pages={1250-1259} 
}

Contact

For any questions, feel free to create an issue or email Stan ([email protected]).


Acknowledgement

  • This codebase is based on MMF and LAMOL -- we thank the authors for their amazing works.
  • We'd like to thank Yujun Shi for his valuable discussion.