This repository contains the source code for two main research objectives:
- Combining various attention mechanisms to obtain a model for two-speaker overlapping-speech speaker diarization that outperforms current state-of-the-art approaches.
The following combined attention mechanisms are employed in this work. Combined as well as single attention mechanisms can be obtained by commenting out the respective lines of code in pytorch_backend/models.py (a minimal sketch of one such combination appears after the objectives below).
- Self Attention + Local Dense Synthesizer Attention (HA-EEND)
- External Attention + Local Dense Synthesizer Attention
- Relative Attention + Local Dense Synthesizer Attention
- Investigating the language dependency of EEND-based speaker diarization, with testing on combined datasets in both English and Sinhala.
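To make the first combination concrete (Self Attention + Local Dense Synthesizer Attention), below is a minimal, self-contained PyTorch sketch. The class names, the single-head LDSA, the window size, and the residual-sum combination are illustrative assumptions only; the actual model variants are defined in pytorch_backend/models.py.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalDenseSynthesizerAttention(nn.Module):
    """Single-head LDSA sketch: each frame synthesizes softmax weights over
    a local window of `context` frames directly from its own features,
    instead of computing query-key dot products."""
    def __init__(self, d_model, context=15):
        super().__init__()
        self.context = context
        self.w1 = nn.Linear(d_model, d_model)
        self.w2 = nn.Linear(d_model, context)   # one weight per window slot
        self.value = nn.Linear(d_model, d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):                       # x: (batch, time, d_model)
        weights = torch.softmax(self.w2(torch.relu(self.w1(x))), dim=-1)
        v = self.value(x).transpose(1, 2)       # (batch, d_model, time)
        pad = self.context // 2
        v = F.pad(v, (pad, pad))                # pad the time axis
        v = v.unfold(2, self.context, 1)        # (batch, d, time, context) windows
        ctx = torch.einsum('btc,bdtc->btd', weights, v)  # weighted local sum
        return self.out(ctx)

class HybridAttentionBlock(nn.Module):
    """Runs global multi-head self-attention and local LDSA on the same
    input and merges them through a residual sum -- one plausible way of
    combining the two mechanisms, assumed here for illustration."""
    def __init__(self, d_model=256, n_heads=4, context=15):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.mha = nn.MultiheadAttention(d_model, n_heads)
        self.ldsa = LocalDenseSynthesizerAttention(d_model, context)

    def forward(self, x):                       # x: (batch, time, d_model)
        h = self.norm(x)
        t = h.transpose(0, 1)                   # (time, batch, d) layout for torch 1.6 MHA
        g, _ = self.mha(t, t, t)                # global self-attention branch
        return x + g.transpose(0, 1) + self.ldsa(h)  # residual sum of both branches

x = torch.randn(2, 500, 256)                    # 2 utterances, 500 frames each
print(HybridAttentionBlock()(x).shape)          # torch.Size([2, 500, 256])
```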
The repository largely references code from the following sources:
- EEND by the Research & Development Group, Hitachi, Ltd., which holds the copyright
- EEND_PyTorch licensed under MIT License
- External-Attention-pytorch licensed under MIT License
- multihead-LDSA
- attentions licensed under MIT License
- ASR Recipes licensed under the Apache License, Version 2.0
The repository is structured as follows:
├── egs : middle tier files
│   ├── asr-sinhala/v1 : Modelling on Sinhala ASR and CALLSINHALA
│   │   ├── conf : configuration files
│   │   ├── local : locally used scripts and other files
│   │   ├── cmd.sh : specifies the job scheduling system
│   │   ├── path.sh : path file
│   │   ├── run.sh : train/infer/score the model
│   │   └── run_prepare_shared.sh : prepare the data
│   ├── callhome/v1 : CALLHOME test set
│   ├── combined/v1 : Combined modelling on Sinhala ASR/LibriSpeech, tested on CALLHOME
│   └── librispeech/v1 : Modelling on LibriSpeech and CALLHOME
├── eend : backend files
│   └── pytorch_backend/models.py : specifies the different models to be trained
└── tools : Kaldi setup
The research was conducted in the following environment:
- OS : Ubuntu 18.04 LTS
- Compute and memory:
- For single multi-head layered encoder blocks: 8 CPUs, 32 GB RAM
- For double multi-head layered encoder blocks: 16 CPUs, 64 GB RAM
- Storage : 150-200 GB
The following requirements need to be installed:
- Anaconda
- CUDA Toolkit
- SoX tool
Follow these steps to install all the requirements and get the project running.
# System packages
sudo apt-get update
sudo apt-get install bzip2 libxml2-dev -y

# Anaconda (substitute the latest installer version if preferred)
wget https://repo.anaconda.com/archive/Anaconda3-2020.11-Linux-x86_64.sh
bash Anaconda3-2020.11-Linux-x86_64.sh
rm Anaconda3-2020.11-Linux-x86_64.sh
source ~/.bashrc

# CUDA toolkit and Kaldi build dependencies
sudo apt install nvidia-cuda-toolkit -y
sudo apt-get install unzip gfortran python2.7 -y
sudo apt-get install automake autoconf sox libtool subversion -y
sudo apt-get update -y
sudo apt-get install -y flac
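Optionally, verify that the audio tools were installed before continuing:

```sh
sox --version
flac --version
```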
git clone https://github.com/Sachini-Dissanayaka/HA-EEND.git
cd HA-EEND/tools/
make
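The make step is expected to build Kaldi and create the miniconda environment named eend that the paths in the following commands refer to. A quick way to confirm the environment was created (path taken from the commands below):

```sh
ls ~/HA-EEND/tools/miniconda3/envs/eend/bin/
```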
~/HA-EEND/tools/miniconda3/envs/eend/bin/pip install torch==1.6.0+cu101 torchvision==0.7.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html
export PYTHONPATH="${PYTHONPATH}:$HOME/HA-EEND/"   # use $HOME rather than ~, which is not expanded inside quotes
export PATH=~/HA-EEND/tools/miniconda3/envs/eend/bin/:$PATH
export PATH=~/HA-EEND/eend/bin:~/HA-EEND/utils:$PATH
export KALDI_ROOT=~/HA-EEND/tools/kaldi
export PATH=~/HA-EEND/utils/:$KALDI_ROOT/tools/openfst/bin:$KALDI_ROOT/tools/sph2pipe_v2.5:$KALDI_ROOT/tools/sctk/bin:~/HA-EEND:$PATH
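With the paths exported, the following sanity check (using only the tools set up above) should report the pinned PyTorch build, 1.6.0+cu101, and True if a GPU is visible:

```sh
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```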
Modify egs/librispeech/v1/cmd.sh according to your job scheduler.
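For example, to run every job on the local machine rather than through a cluster, a Kaldi-style cmd.sh typically points its command variables at run.pl. The variable names here are assumptions based on common Kaldi/EEND conventions; check the file for the exact names it defines:

```sh
export train_cmd="run.pl"
export infer_cmd="run.pl"
```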
The following datasets were used in the experiments:
- Training
  - LibriSpeech corpus (English)
  - Sinhala ASR corpus (Sinhala)
- Testing
  - CALLHOME portion of the 2000 NIST Speaker Recognition Evaluation Corpus (English)
  - CALLSINHALA dataset, collected by the authors (Sinhala)
For tests with English data, move the datasets (LibriSpeech and CALLHOME) into egs/librispeech/v1/data/local, then run the following commands:
cd egs/librispeech/v1
./run_prepare_shared.sh
./run.sh
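The Sinhala and combined experiments can be run the same way from their recipe directories, assuming they follow the same script layout as librispeech/v1 (as the directory tree above indicates), e.g.:

```sh
cd egs/combined/v1
./run_prepare_shared.sh
./run.sh
```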
Contact the authors:
- Yoshani Ranaweera : [email protected]
- Sachini Dissanayaka : [email protected]
- Anjalee Sudasinghe : [email protected]