This paper introduces Structure-CLIP, an end-to-end framework that integrates Scene Graph Knowledge to enhance multi-modal structured representations.
- 2024-02: We preprint our survey "Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey" [Repo].
- 2023-12: Our paper "Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal Structured Representations" was accepted by AAAI 2024.
- 2022-12: We release the [Repo] for our AAAI 2023 paper "DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning".
Training datasets are available here.
There are four parts in the code (a layout sketch follows this list):
- model: It contains the main files for Structure-CLIP network.
- data: It contains the pre-training data splits and downstream dataset.
- checkpoints: It saves checkpoints for reloading.
- script: It contains the training scripts for Structure-CLIP.
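A rough sketch of the expected layout; the directory names come from the list above, while the individual file names shown are only illustrative:

```
Structure-CLIP/
├── model/        # main files for the Structure-CLIP network
├── data/         # pre-training data splits and downstream dataset
├── checkpoints/  # saved checkpoints for reloading
└── script/
    └── run.sh    # training script (see below)
```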
Python 3
PyTorch >= 1.8.0
Transformers >= 4.11.3
NumPy
- All experiments are performed with one A100 GPU.
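A minimal environment setup under these requirements might look like the following sketch; the package names are the standard PyPI ones, the versions are the lower bounds listed above, and you should pick the PyTorch build that matches your CUDA version:

```bash
# Sketch only: install the dependencies listed above (versions are the lower bounds from this README).
# Choose the torch build matching your CUDA setup; see pytorch.org for the exact command.
pip install "torch>=1.8.0" "transformers>=4.11.3" numpy
```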
The training script:
bash script/run.sh
    [--train_path TRAIN_PATH] [--test_path TEST_PATH] [--nepoch NEPOCH]
    [--batch_size BATCH_SIZE] [--manualSeed MANUAL_SEED] [--lr LEARNING_RATE]
    [--weight_decay WEIGHT_DECAY] [--knowledge_weight KNOWLEDGE_WEIGHT]
    [--transformer_layer_num NUMBER] [--model_name MODEL_NAME]
    [--neg_loss_weight NEG_LOSS_WEIGHT]
Note:
- You can open the .sh file for parameter modification.
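For illustration, a run with explicit values might look like the sketch below. Every path and hyperparameter value here is a placeholder rather than the setting used in the paper, and depending on how run.sh is written you may need to set these values inside the script instead of passing them on the command line:

```bash
# Illustrative placeholders only; the actual hyperparameters used in the paper may differ.
# data/train.json and data/test.json are hypothetical file names under data/.
bash script/run.sh \
    --train_path data/train.json \
    --test_path data/test.json \
    --nepoch 20 \
    --batch_size 64 \
    --manualSeed 42 \
    --lr 1e-5 \
    --weight_decay 0.01 \
    --knowledge_weight 0.2 \
    --transformer_layer_num 4 \
    --model_name structure_clip \
    --neg_loss_weight 0.5
```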
Please consider citing this paper if you use the code or data from our work. Thanks a lot :)
@inproceedings{DBLP:conf/aaai/StructureCLIP,
author = {Yufeng Huang and
Jiji Tang and
Zhuo Chen and
Rongsheng Zhang and
Xinfeng Zhang and
Weijie Chen and
Zeng Zhao and
Zhou Zhao and
Tangjie Lv and
Zhipeng Hu and
Wen Zhang},
title = {Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal Structured Representations},
booktitle = {{AAAI}},
publisher = {{AAAI} Press},
year = {2024}
}