For technical details, please see the Report (`Report.pdf`).

- `bag_of_frames.py`: Code for N-fold CV of the bag-of-frames method (a minimal sketch appears after this list). Requires MFCC features to be calculated and stored in `./Extracted_Feats/` in the specified format.
- `DTW.py`: Code for N-fold CV of the DTW method (see the DTW sketch after this list). Requires MFCC features to be calculated and stored in `./Extracted_Feats/` in the specified format.
- `live_endpointing.py`: Code for real-time endpointing, used for the interactive demo.
- `Live_Test.ipynb`: IPython notebook to test the code in an interactive setting. Requires a VQ codebook to be generated from each speaker's data using k-means. This was tested with 8 clusters; accuracy may improve if the number of clusters is increased.
- `mfcc_feat.py`: Code to get the MFCC feature vector for each input wav file (obtained after endpointing) in `./Processed_Data`. Stores the MFCCs in the `./Extracted_Feats/` directory (see the MFCC sketch after this list).
- `Report.pdf`: Report made for the project, describing things in more detail; it also includes observations.
- `Speech_Endpointing.ipynb`: Used to endpoint the speech signals. Warning: different thresholds may be required for different speakers (a minimal energy-thresholding sketch appears after this list). Splits the input waveform into smaller, noise-free waveforms in the `./Processed_Data` directory.
- `VQ.py`: Used to generate the files in the `./VQ_codebooks/` directory (see the VQ sketch after this list). Requires MFCC features to be calculated and stored in `./Extracted_Feats/` in the specified format.
- `VQ_check.py`: Code for N-fold CV of the VQ method (also covered by the VQ sketch below). Requires `./VQ_codebooks` to be populated appropriately.
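
As a rough illustration of the bag-of-frames idea behind `bag_of_frames.py`: an utterance is treated as an unordered collection of MFCC frames and scored against a pool of training frames per digit. The nearest-frame scoring rule below is an assumption for illustration, not necessarily the exact rule used in the script.

```python
import numpy as np

def bag_of_frames_score(test_mfcc, class_frames):
    # Mean Euclidean distance from each test frame to its nearest
    # frame in a class's pooled training frames (lower = closer).
    d = np.linalg.norm(test_mfcc[:, None, :] - class_frames[None, :, :], axis=-1)
    return d.min(axis=1).mean()

def classify_digit(test_mfcc, frame_pools):
    # frame_pools: dict mapping digit label -> (N, D) array of pooled training frames
    return min(frame_pools, key=lambda d: bag_of_frames_score(test_mfcc, frame_pools[d]))
```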
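
For `DTW.py`, a minimal dynamic time warping distance between two MFCC sequences could look like the following (standard three-step recursion with a Euclidean frame cost; the actual script may use different local costs or path constraints):

```python
import numpy as np

def dtw_distance(a, b):
    # a: (T1, D) and b: (T2, D) MFCC sequences.
    # D[i, j] = cost of the best alignment of a[:i] with b[:j].
    T1, T2 = len(a), len(b)
    D = np.full((T1 + 1, T2 + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, T1 + 1):
        for j in range(1, T2 + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[T1, T2]
```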
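
MFCC extraction as performed by `mfcc_feat.py` could be done with a library such as `librosa` (an assumption here; the repo may use a different feature library or parameters):

```python
import librosa

def extract_mfcc(wav_path, n_mfcc=13):
    # Load an endpointed wav (keeping its original sample rate)
    # and return a (T, n_mfcc) matrix of MFCC frames.
    y, sr = librosa.load(wav_path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # shape (n_mfcc, T)
    return mfcc.T
```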
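
The endpointing in `Speech_Endpointing.ipynb` is threshold-based; a minimal short-time-energy version, with the per-speaker `threshold` that the warning above refers to, might look like this (all parameter values are illustrative):

```python
import numpy as np

def energy_endpoints(x, sr, frame_ms=25, hop_ms=10, threshold=0.02):
    # Return (start, end) sample indices of the speech region, found by
    # thresholding short-time energy relative to the peak-energy frame.
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    energies = np.array([np.sum(x[i:i + frame] ** 2)
                         for i in range(0, len(x) - frame, hop)])
    if energies.size == 0:
        return 0, len(x)
    active = np.where(energies > threshold * energies.max())[0]
    if active.size == 0:
        return 0, len(x)
    return active[0] * hop, active[-1] * hop + frame
```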
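
For `VQ.py` and `VQ_check.py`, codebook generation (k-means with 8 clusters, as mentioned for `Live_Test.ipynb`) and distortion-based matching could be sketched as follows, here using scikit-learn's `KMeans` (an assumption; the scripts may implement k-means differently):

```python
import numpy as np
from sklearn.cluster import KMeans

def train_codebook(frames, n_clusters=8):
    # Fit a k-means codebook on the pooled MFCC frames of one speaker.
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(frames).cluster_centers_

def distortion(test_mfcc, codebook):
    # Average distance from each test frame to its nearest codeword.
    d = np.linalg.norm(test_mfcc[:, None, :] - codebook[None, :, :], axis=-1)
    return d.min(axis=1).mean()

def classify(test_mfcc, codebooks):
    # codebooks: dict mapping label -> (n_clusters, D) codebook array.
    return min(codebooks, key=lambda k: distortion(test_mfcc, codebooks[k]))
```
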
- `Extracted_Feats`: Stores the extracted MFCC feature vectors for each of the 64 utterances of each digit. See the directory structure for details on how the features are stored.
- `Output_Logs`: Has 3 text files storing the output logs of N-fold CV for the Bag of Frames, VQ, and DTW methods.
- `Processed_Data`: Has 64 wav files corresponding to every digit, obtained after appropriate end-pointing.
- `Raw_Data`: Has the zip files for the given data of the 16 speakers.
- `VQ_codebooks`: Has the VQ codebooks generated for each speaker by `VQ.py`.