
Nemesis: Normalizing the Soft-prompt Vectors of Vision-Language Models

Official implementation of the ICLR 2024 Spotlight paper. Paper link: Nemesis: Normalizing the Soft-prompt Vectors of Vision-Language Models

Highlights

To answer an unexplored research question, "Do we need to normalize the soft prompts in VLMs?", we first uncover a phenomenon called the Low-Norm Effect by performing extensive corruption experiments: reducing the norms of certain learned prompts occasionally enhances the performance of VLMs, while increasing them often degrades it. To harness this effect, we propose a novel method named Normalizing the soft-prompt vectors of vision-language models (Nemesis). To the best of our knowledge, our work is the first to systematically investigate the role of the norms of soft-prompt vectors in VLMs, offering valuable insights for future research on soft-prompt tuning.
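
To make the idea concrete, the sketch below shows one way a norm penalty could be attached to a CoOp-style prompt-tuning loss. This is a minimal illustration under our own assumptions: prompt_norm_penalty and the weight lam are hypothetical names, and the actual Nemesis normalization losses are the ones defined in the paper and this repository.

import torch

def prompt_norm_penalty(ctx: torch.Tensor) -> torch.Tensor:
    # Mean L2 norm of the soft-prompt vectors; ctx has shape
    # (n_ctx, ctx_dim), e.g., 16 context vectors tuned by CoOp.
    return ctx.norm(dim=-1).mean()

# Hypothetical training step: the penalty discourages prompt norms
# from growing unchecked during tuning.
# loss = task_loss + lam * prompt_norm_penalty(ctx)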

Besides, we also conduct preliminary experiments to verify the generalizability and effectiveness of Nemesis on other parameter-efficient fine-tuning (PEFT) methods, including visual prompt tuning and prefix-tuning. Detailed results can be found in the following tables.

The Low-Norm Effect

Figure: a schematic diagram of the Low-Norm Effect. Reducing the norms at specific positions within learned soft prompts enhances performance, whereas increasing the norms typically degrades it. Top: corrupted soft prompts with increased norms, leading to decreased performance; Middle: soft prompts learned by CoOp; Bottom: corrupted soft prompts with reduced norms, resulting in enhanced performance.


Figure: the frequency of the Low-Norm Effect across 11 datasets.

The 11 datasets exhibit varying frequencies of the Low-Norm Effect. This inconsistent manifestation indicates that tackling the Low-Norm Effect is a challenging task.


Figure: norm and test-accuracy curves of CoOp (left) and CoOp+Nemesis (right).

From the left figure, it is apparent that the norms of soft prompts in CoOp first increase and then level off, while test accuracy degrades as the norms flatten out. By performing corruption operations that decrease the norms of prompt vectors, the last green circle may be pushed away from the degradation area and brought closer to the small green circles that demonstrate superior performance.

From the right figure, we observe a norm variation pattern in CoOp+Nemesis (ours) that is distinct from CoOp's: an initial increase in norms, followed by a decrease, and eventually a stable state. Furthermore, the test accuracy exhibits a consistent upward trend before reaching a plateau, whereas CoOp shows a declining trend.

This implies that our method can delay the point at which soft prompts plateau during learning, thereby reducing the probability of learning degradation.
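
To observe these norm dynamics in practice, one can log the norms of the learned context vectors over training and plot them against test accuracy. A minimal sketch, using a stand-in tensor in place of the model's real context vectors:

import torch

# Stand-in for the learnable context vectors (shape: n_ctx x ctx_dim);
# in a real run these would come from the prompt learner inside the model.
ctx = torch.randn(16, 512)

# Per-vector L2 norms and their mean, suitable for logging once per epoch.
per_vector_norms = ctx.norm(dim=-1)  # shape: (16,)
print(f"mean prompt norm: {per_vector_norms.mean().item():.4f}")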

Main Results

Few-shot classification

Figure: results of the few-shot classification task.

Domain generalization

Figure: results of the domain generalization task.

Base-to-new generalization

Figure: results of the base-to-new generalization task.

How to Run

First, you should follow the instructions in DATASETS.md to download the datasets.

Next, you should follow the instructions in Dassl.pytorch to install the Dassl environment.

Finally, we provide the running scripts in ./scripts, which allow you to reproduce the results.

Make sure you change the dataset path (/path/to/dataset) in each bash file, and run the commands under the corresponding script directories, including coop, coop_crt (CoOp + corruption), coop_nemesis, plot, and plot_nemesis.

The running commands for the few-shot learning, domain generalization, and base-to-new tasks can be found in COOP.md.

Here, we provide examples of how to conduct corruption experiments based on CoOp (./scripts/coop_crt/eval_loop.sh):

Corruption Experiments

# original
CUDA_VISIBLE_DEVICES=0 bash scripts/coop_crt/eval.sh rn50_ep50 end 16 1 False 50 original 666 666
CUDA_VISIBLE_DEVICES=0 bash scripts/coop_crt/eval.sh rn50_ep50 end 16 2 False 50 original 666 666
CUDA_VISIBLE_DEVICES=0 bash scripts/coop_crt/eval.sh rn50_ep100 end 16 4 False 100 original 666 666
CUDA_VISIBLE_DEVICES=0 bash scripts/coop_crt/eval.sh rn50_ep100 end 16 8 False 100 original 666 666
CUDA_VISIBLE_DEVICES=0 bash scripts/coop_crt/eval.sh rn50 end 16 16 False 200 original 666 666
# replace
CUDA_VISIBLE_DEVICES=0 bash scripts/coop_crt/eval.sh rn50_ep50 end 16 1 False 100 replace 0 0.
CUDA_VISIBLE_DEVICES=0 bash scripts/coop_crt/eval.sh rn50_ep50 end 16 1 False 100 replace 1 0.
CUDA_VISIBLE_DEVICES=0 bash scripts/coop_crt/eval.sh rn50_ep50 end 16 1 False 100 replace 2 0.
# rescale
CUDA_VISIBLE_DEVICES=0 bash scripts/coop_crt/eval.sh rn50_ep50 end 16 1 False 100 scale 0 0.001
CUDA_VISIBLE_DEVICES=0 bash scripts/coop_crt/eval.sh rn50_ep50 end 16 1 False 100 scale 1 0.001
CUDA_VISIBLE_DEVICES=0 bash scripts/coop_crt/eval.sh rn50_ep50 end 16 1 False 100 scale 2 0.001

P.S. The last two parameters represent the corruption position and the corruption weight, respectively. Hence, they can be set to any number (e.g., 666) in the original evaluation, since they are not used in that setting.
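
For reference, the sketch below illustrates what the two corruption operations might look like at the tensor level. It is an assumption-laden illustration rather than the repository's implementation: we assume replace substitutes the prompt vector at the given position with a zero vector (consistent with the weight of 0. above), while scale multiplies it by a small factor such as 0.001.

import torch

def corrupt_prompt(ctx: torch.Tensor, mode: str, pos: int, weight: float) -> torch.Tensor:
    # Corrupt one soft-prompt vector; ctx has shape (n_ctx, ctx_dim).
    out = ctx.clone()
    if mode == "replace":
        # Assumed semantics: zero out the vector at `pos`,
        # driving its norm to zero.
        out[pos] = torch.zeros_like(out[pos])
    elif mode == "scale":
        # Assumed semantics: shrink the norm of the vector at `pos`
        # by multiplying with a small weight (e.g., 0.001).
        out[pos] = weight * out[pos]
    return out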

Citation

If you use this code in your research, please kindly cite the following paper:

@inproceedings{nemesis,
    title={Nemesis: Normalizing the Soft-prompt Vectors of Vision-Language Models},
    author={Shuai Fu and Xiequn Wang and Qiushi Huang and Yu Zhang},
    booktitle={The International Conference on Learning Representations (ICLR)},
    year={2024}
}

Acknowledgements

Our code is based on CoOp. We thank the authors for releasing their code. If you use our model and code, please consider citing their work as well.
