[NDSS'24] Inaudible Adversarial Perturbation: Manipulating the Recognition of User Speech in Real Time

LetterLiGo/Inaudible-Adversarial-Perturbation-Vrifle

Inaudible Adversarial Perturbation: Manipulating the Recognition of User Speech in Real Time

This is the core implementation of VRifle, from the paper "Inaudible Adversarial Perturbation: Manipulating the Recognition of User Speech in Real Time", published in the Proceedings of the Network and Distributed System Security Symposium 2024 (NDSS 2024).

We would like to thank the author of deepspeech2-pytorch-adversarial-attack for providing an excellent foundation for our code, which targets the DeepSpeech2 model.

We also extend our gratitude to the contributors of deepspeech.pytorch for developing an easy-to-use DeepSpeech framework.

Citation

If you find this repo helpful, please consider citing it in the following format.

@inproceedings{li2024vrifle,
  title={Inaudible Adversarial Perturbation: Manipulating the Recognition of User Speech in Real Time},
  author={Li, Xinfeng and Yan, Chen and Lu, Xuancun and Zeng, Zihan and Ji, Xiaoyu and Xu, Wenyuan},
  booktitle={In the 31st Annual Network and Distributed System Security Symposium (NDSS)},
  year={2024}
}

Getting Started

Several dependencies must be installed first. Please follow the instructions in DeepSpeech 2 PyTorch to set up the environment.
We recommend arranging this repo and the DeepSpeech 2 PyTorch folder in the following structure.

ROOT_FOLDER/
├── this_repo/
│   ├── main_vrifle.py
│   └── ...
└── deepspeech.pytorch/
    ├── models/
    │   └── librispeech/
    │       └── librispeech_pretrained_v2.pth
    └── ...

Then download the pretrained DeepSpeech model (librispeech_pretrained_v2.pth) from the link provided by DeepSpeech 2 PyTorch and place it under deepspeech.pytorch/models/librispeech/.
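Before running anything, it can help to confirm that the layout matches the tree above. The following is a minimal sketch, not part of this repo: the helper name missing_paths is hypothetical, and this_repo is the placeholder folder name from the tree, so substitute the name of your own clone.

```python
from pathlib import Path

# Files taken from the folder tree above, relative to ROOT_FOLDER.
# "this_repo" is the placeholder name used in the tree (an assumption here).
EXPECTED = [
    Path("this_repo") / "main_vrifle.py",
    Path("deepspeech.pytorch") / "models" / "librispeech" / "librispeech_pretrained_v2.pth",
]


def missing_paths(root):
    """Return the expected files that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).is_file()]


# Check the current directory as if it were ROOT_FOLDER.
for rel in missing_paths("."):
    print(f"missing: {rel}")
```

An empty result means both main_vrifle.py and the pretrained checkpoint are where the tree expects them.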

Introduction

Deep Speech 2[1] is a state-of-the-art Automatic Speech Recognition (ASR) system, notable for its end-to-end training capability where spectrograms are directly utilized to generate predicted sentences.

In this work, we implement the first attempt at completely inaudible (ultrasonic) adversarial perturbation attacks against this ASR system. In this setting, the classical PGD (Projected Gradient Descent) algorithm is sufficient to optimize the perturbation efficiently.

[1] Amodei, D., Ananthanarayanan, S., Anubhai, R., Bai, J., Battenberg, E., Case, C., ... & Zhu, Z. (2016). Deep Speech 2: End-to-end speech recognition in English and Mandarin. In International Conference on Machine Learning (pp. 173-182).

Preparation

  1. Download the Fluent Speech Commands dataset.
  2. To speed up the optimization on an NVIDIA 3090 GPU, see the section Support DeepSpeech on 3090 GPUs (NVIDIA).

Usage

With main_vrifle.py, it is easy to perturb an original raw wave file so that it is transcribed as a desired sentence.

python main_vrifle.py --attack_type Mute_robust --device 0

python main_vrifle.py --attack_type Universal_robust --device 0
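To run both attack types back to back, the two commands above can be assembled from a script. This is a small sketch under stated assumptions: build_command is a hypothetical helper, and it only knows the two --attack_type values shown in this README (main_vrifle.py may accept others). It prints the commands rather than launching them.

```python
import sys

# The two attack types shown in the commands above.
ATTACK_TYPES = ("Mute_robust", "Universal_robust")


def build_command(attack_type, device=0):
    """Build the argument list for one main_vrifle.py run."""
    if attack_type not in ATTACK_TYPES:
        raise ValueError(f"unknown attack type: {attack_type}")
    return [sys.executable, "main_vrifle.py",
            "--attack_type", attack_type,
            "--device", str(device)]


for attack_type in ATTACK_TYPES:
    # Dry run: only show what would be executed.
    print(" ".join(build_command(attack_type)))
    # To launch for real: import subprocess; subprocess.run(build_command(attack_type), check=True)
```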

Several parameters are available to improve your adversarial attack. You may tune hyperparameters such as epsilon, alpha, and PGD_iter for better results. For details, please refer to main_vrifle.py and vrifle_attack.py.
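To make the role of epsilon, alpha, and PGD_iter concrete, here is a generic sketch of PGD on a toy quadratic loss. It is not the code in vrifle_attack.py. The function name pgd_linf, the L-infinity bound, the toy loss, and the plain Python lists are all assumptions made for illustration. In a real attack the gradient would come from backpropagating the ASR loss.

```python
def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def pgd_linf(x, target, epsilon, alpha, pgd_iter):
    """Toy targeted PGD: find delta with |delta_i| <= epsilon that reduces
    the loss sum((x_i + delta_i - target_i) ** 2)."""
    delta = [0.0] * len(x)
    for _ in range(pgd_iter):
        # Gradient of the toy loss with respect to delta.
        grad = [2.0 * (xi + di - ti) for xi, di, ti in zip(x, delta, target)]
        # Signed gradient step of size alpha (descent, since the loss is minimized).
        stepped = [di - alpha * sign(g) for di, g in zip(delta, grad)]
        # Projection back onto the L-infinity ball of radius epsilon.
        delta = [max(-epsilon, min(epsilon, s)) for s in stepped]
    return delta


delta = pgd_linf(x=[0.0, 1.0], target=[1.0, 1.0], epsilon=0.1, alpha=0.02, pgd_iter=20)
print(delta)
```

In this sketch, epsilon caps the perturbation magnitude, alpha sets the step size per iteration, and pgd_iter sets the number of iterations. With a larger epsilon the optimizer has more room, but the perturbation is stronger.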

Support DeepSpeech on 3090 GPUs (NVIDIA)

Through our numerous attempts and extensive research, we have established the following setup details :)

Install Deepspeech.pytorch

  1. Download deepspeech.pytorch
  2. cd into the folder and then pip install -r requirements.txt
  3. pip install -e . # Dev install
  4. pip install adversarial-robustness-toolbox[pytorch]
  5. pip install torchaudio
  6. git clone https://github.com/SeanNaren/warp-ctc.git
  7. In warp-ctc's binding.cpp, replace the #include <THC/THC.h> and extern THCState* state; lines as described under "Modifying binding.cpp" below (reference: https://blog.csdn.net/weixin_41868417/article/details/123819183).

Install Warp-CTC

  • Edit CMakeLists.txt so that it targets the 3090's compute capability (sm_86)
# Before replacement
set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -gencode arch=compute_30,code=sm_30 -O2")
set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -gencode arch=compute_35,code=sm_35")

set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -gencode arch=compute_50,code=sm_50")
set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -gencode arch=compute_52,code=sm_52")

# After
set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -gencode arch=compute_86,code=sm_86")
  • Compilation
cd warp-ctc
mkdir build
cd build
cmake ..
make
cd ../pytorch_binding
  • Modifying binding.cpp
## replace

#include <THC/THC.h>
extern THCState* state; 
void* gpu_workspace = THCudaMalloc(state, gpu_size_bytes);

## into
void* gpu_workspace = c10::cuda::CUDACachingAllocator::raw_alloc(gpu_size_bytes);


## replace
THCudaFree(state, (void *) gpu_workspace);
## into
c10::cuda::CUDACachingAllocator::raw_delete((void *) gpu_workspace);
  • Install the PyTorch binding (the last step)
python setup.py install
  • Note that --recursive is required for a working ctcdecode dependency, because it also fetches the submodules
git clone --recursive git@github.com:parlance/ctcdecode.git
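The CMakeLists.txt edit described above can also be scripted. The sketch below is an illustration only: retarget_gencode is a hypothetical helper, and it assumes the flag lines look exactly like the four "Before replacement" lines shown earlier. It replaces them with the single sm_86 line and leaves everything else untouched. Try it on a copy of the file first.

```python
import re

# Matches one `set(CUDA_NVCC_FLAGS "... -gencode arch=compute_XX,code=sm_XX ...")` line.
GENCODE_LINE = re.compile(
    r'^\s*set\(CUDA_NVCC_FLAGS\s+"\$\{CUDA_NVCC_FLAGS\}\s+'
    r'-gencode\s+arch=compute_(\d+),code=sm_\d+[^"]*"\)\s*$'
)

# The four architectures listed under "Before replacement".
OLD_ARCHS = {"30", "35", "50", "52"}


def retarget_gencode(cmake_text, new_arch="86"):
    """Replace the old -gencode lines with one line for new_arch."""
    new_line = ('set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} '
                f'-gencode arch=compute_{new_arch},code=sm_{new_arch}")')
    out, inserted = [], False
    for line in cmake_text.splitlines():
        match = GENCODE_LINE.match(line)
        if match and match.group(1) in OLD_ARCHS:
            if not inserted:
                out.append(new_line)  # put the new flag where the first old one was
                inserted = True
            continue  # drop the old flag line
        out.append(line)
    return "\n".join(out) + ("\n" if cmake_text.endswith("\n") else "")
```

A typical use would read warp-ctc/CMakeLists.txt, pass its text through retarget_gencode, and write the result back before running cmake.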
