In this paper, we present a transformer-based end-to-end ZSL method named DUET, which integrates latent semantic knowledge from pre-trained language models (PLMs) via a self-supervised multi-modal learning paradigm. Specifically, we (1) developed a cross-modal semantic grounding network to investigate the model's capability of disentangling semantic attributes from images; (2) applied an attribute-level contrastive learning strategy to further enhance the model's discrimination of fine-grained visual characteristics against attribute co-occurrence and imbalance; (3) proposed a multi-task learning policy for considering multi-modal objectives.
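To give an intuition for strategy (2), below is a minimal PyTorch sketch of an InfoNCE-style contrastive loss computed at the attribute level. The function name, tensor shapes, and temperature are illustrative assumptions, not the repository's actual implementation (see `model/model_proto.py` for that).

```python
# Hypothetical sketch of an attribute-level contrastive (InfoNCE-style) loss.
# NOT the repository's implementation; shapes and names are assumptions.
import torch
import torch.nn.functional as F

def attribute_contrastive_loss(visual_attr: torch.Tensor,
                               text_attr: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """visual_attr, text_attr: [num_attributes, dim] embeddings for one image.

    Each visual attribute embedding is pulled toward its own prompt embedding
    and pushed away from the prompts of all other attributes.
    """
    v = F.normalize(visual_attr, dim=-1)
    t = F.normalize(text_attr, dim=-1)
    logits = v @ t.T / temperature                       # [num_attr, num_attr]
    targets = torch.arange(v.size(0), device=v.device)   # diagonal = positives
    return F.cross_entropy(logits, targets)

# Toy usage: 8 attributes, 256-d embeddings.
loss = attribute_contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
```

Contrasting each attribute against the other attributes' prompts, rather than contrasting whole images, is what lets such an objective target attribute co-occurrence and imbalance.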
- Due to the page and format restrictions set by AAAI publications, we have omitted some details and appendix content. For the complete version of the paper, including the selection of prompts and experiment details, please refer to our arXiv version.
- 2024-02: We released the preprint of our survey Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey [Repo].
- 2023-12: Our paper Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal Structured Representations was accepted by AAAI 2024.
- The cache data for (CUB, AWA2, SUN) are available here (Baidu cloud, 19.89G, Code: s07d).
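Once downloaded, the cache files are standard JSON/pickle artifacts. A minimal inspection sketch is below; the exact key/value structure (image IDs to pixel arrays, attribute indices to prompt strings) is our assumption inferred from the file names, so verify against the actual files.

```python
# Hypothetical sketch for inspecting the cache files; the contents
# (key types, value shapes) are assumptions inferred from the file names.
import json
import pickle

with open("cache/CUB/attributeindex2prompt.json") as f:
    attr2prompt = json.load(f)      # assumed: {attribute index -> prompt text}

with open("cache/CUB/id2imagepixel.pkl", "rb") as f:
    id2pixel = pickle.load(f)       # assumed: {image id -> pixel tensor/array}

print(len(attr2prompt), "attribute prompts;", len(id2pixel), "cached images")
```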
There are four parts in the code.
- model: It contains the main files for the DUET network.
- data: It contains the data splits for different datasets.
- cache: It contains some cache files.
- script: The training scripts for DUET.
DUET
├── cache
│ ├── AWA2
│ │ ├── attributeindex2prompt.json
│ │ └── id2imagepixel.pkl
│ ├── CUB
│ │ ├── attributeindex2prompt.json
│ │ ├── id2imagepixel.pkl
│ │ └── mapping.json
│ └── SUN
│ ├── attributeindex2prompt.json
│ ├── id2imagepixel.pkl
│ └── mapping.json
├── data
│ ├── AWA2
│ │ ├── APN.mat
│ │ ├── TransE_65000.mat
│ │ ├── att_splits.mat
│ │ ├── attri_groups_9.json
│ │ ├── kge_CH_AH_CA_60000.mat
│ │ └── res101.mat
│ ├── CUB
│ │ ├── APN.mat
│ │ ├── att_splits.mat
│ │ ├── attri_groups_8.json
│ │ ├── attri_groups_8_layer.json
│ │ └── res101.mat
│ └── SUN
│ ├── APN.mat
│ ├── att_splits.mat
│ ├── attri_groups_4.json
│ └── res101.mat
├── log
│ ├── AWA2
│ ├── CUB
│ └── SUN
├── model
│ ├── log.py
│ ├── main.py
│ ├── main_utils.py
│ ├── model_proto.py
│ ├── modeling_lxmert.py
│ ├── opt.py
│ ├── swin_modeling_bert.py
│ ├── util.py
│ └── visual_utils.py
├── out
│ ├── AWA2
│ ├── CUB
│ └── SUN
└── script
├── AWA2
│ └── AWA2_GZSL.sh
├── CUB
│ └── CUB_GZSL.sh
└── SUN
└── SUN_GZSL.sh
Python 3
PyTorch >= 1.8.0
Transformers >= 4.11.3
NumPy
- All experiments are performed on a single RTX 3090 Ti GPU.
- Dataset: please download the datasets, i.e., CUB, AWA2, SUN, and change `opt.image_root` to the dataset root path on your machine.
  - ❗NOTE: for other required feature files like `APN.mat` and `id2imagepixel.pkl`, please refer to here.
- Data split: please download the data folder and place it in `./data/`. `attributeindex2prompt.json` should be generated and placed in `./cache/dataset/`; a minimal generation sketch is shown after this list.
- Download a pretrained vision Transformer as the vision encoder:
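As referenced above, here is a minimal sketch of producing `attributeindex2prompt.json`. The attribute names and prompt template are illustrative assumptions inferred from the file name; the actual prompt selection is described in the arXiv version of the paper.

```python
# Hypothetical sketch of generating attributeindex2prompt.json for AWA2.
# The attribute names and prompt template below are placeholders;
# see the arXiv version of the paper for the actual prompt selection.
import json
import os

attributes = ["black", "white", "stripes"]   # toy attribute names
attr2prompt = {str(i): f"the animal has the attribute {name}"
               for i, name in enumerate(attributes)}

os.makedirs("cache/AWA2", exist_ok=True)
with open("cache/AWA2/attributeindex2prompt.json", "w") as f:
    json.dump(attr2prompt, f, indent=2)
```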
The training script for AWA2_GZSL:
bash script/AWA2/AWA2_GZSL.sh
[--dataset {AWA2, SUN, CUB}] [--calibrated_stacking CALIBRATED_STACKING] [--nepoch NEPOCH] [--batch_size BATCH_SIZE] [--manualSeed MANUAL_SEED]
[--classifier_lr LEARNING-RATE] [--xe XE] [--attri ATTRI] [--gzsl] [--patient PATIENT] [--model_name MODEL_NAME] [--mask_pro MASK-PRO]
[--mask_loss_xishu MASK_LOSS_XISHU] [--xlayer_num XLAYER_NUM] [--construct_loss_weight CONSTRUCT_LOSS_WEIGHT] [--sc_loss SC_LOSS] [--mask_way MASK_WAY]
[--attribute_miss ATTRIBUTE_MISS]
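As a usage example (the direct entry-point call and the parameter values are illustrative assumptions; the provided `.sh` scripts contain the actual tuned settings):

```bash
# Run the provided GZSL script for CUB (actual hyperparameters live in the .sh file).
bash script/CUB/CUB_GZSL.sh

# Or, assuming model/main.py is the entry point, call it directly with a few
# of the flags listed above (values here are placeholders, not tuned settings):
python model/main.py --dataset CUB --gzsl --nepoch 30 --batch_size 64
```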
📌 Note:
- You can open the `.sh` file to modify parameters.
- If you have any questions, feel free to let us know by opening an Issue.
Please consider citing this paper if you use the code or data from our work. Thanks a lot :)
@inproceedings{DBLP:conf/aaai/ChenHCGZFPC23,
author = {Zhuo Chen and
Yufeng Huang and
Jiaoyan Chen and
Yuxia Geng and
Wen Zhang and
Yin Fang and
Jeff Z. Pan and
Huajun Chen},
title = {{DUET:} Cross-Modal Semantic Grounding for Contrastive Zero-Shot Learning},
booktitle = {{AAAI}},
pages = {405--413},
publisher = {{AAAI} Press},
year = {2023}
}