A study on Speaker Recognition System
Date
2015-05
Publisher
The British University in Dubai (BUiD)
Abstract
"The rapid development of information technology has opened the door to an increasing number of security gaps in everyday systems such as email accounts. Security system developers and manufacturers are working hard to cope with the growing number of security breaches. The need to overcome this challenge has led many researchers and manufacturers to add extra levels of security to protect information and resources; these extra levels mainly revolve around using human biometrics to verify the real identity of the user. Speaker recognition methods are considered a leading approach to biometric security systems.
In this thesis we aimed to develop a unique speaker recognition system with a user-friendly interface. The proposed system was developed mainly in Python (Python.org, 2015) and was used to implement and study several methods and techniques in the speaker recognition domain.
Another main goal of this research is to carry out a scientific comparison of tools and methods related to the speaker recognition domain. The following techniques were studied: 1) an energy-based tool and Long-Term Spectral Divergence (LTSD) in the preprocessing module of the system, 2) Mel Frequency Cepstral Coefficients (MFCC) and Linear Predictive Cepstral Coefficients (LPCC) in the feature extraction module, and 3) the scikit-learn Gaussian Mixture Model (GMM), Universal Background Model (UBM), Continuous Restricted Boltzmann Machine (CRBM) and Joint Factor Analysis (JFA) in the recognition module. Finally, this thesis proposes a new GMM, which was compared with the widely used scikit-learn GMM.
All of these tools and methods were tested experimentally in this thesis. The experiments showed that: 1) LTSD is faster and more practical for voice activity detection than the energy-based tools; 2) MFCC is computationally more expensive than LPCC, yet MFCC is faster and more accurate, and LPCC needs an utterance twice as long to achieve the accuracy that MFCC produces; and 3) the new GMM is five times faster than the scikit-learn GMM and outperforms all the other techniques studied in this thesis. Therefore, to build a user-friendly speaker recognition system, it is better to use LTSD for preprocessing, MFCC for feature extraction, and our enhanced GMM for speaker testing and recognition."
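The abstract does not describe how the proposed GMM works internally, so the sketch below only illustrates the standard decision rule behind GMM-based closed-set speaker identification. Each enrolled speaker is modelled by a diagonal-covariance GMM. A test utterance's feature frames (for example MFCC vectors) are scored by their average log-likelihood under each model, and the highest-scoring speaker is chosen. All function names, the model layout, and the toy 2-D parameters are hypothetical. Training the models (e.g. with EM) is out of scope here.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical model layout: a diagonal-covariance GMM is a list of
# (weight, means, variances) tuples, one tuple per mixture component.
Component = Tuple[float, List[float], List[float]]
GMM = List[Component]


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of x under a Gaussian with diagonal covariance."""
    log_det = sum(math.log(v) for v in var)
    mahal = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))
    return -0.5 * (len(x) * math.log(2 * math.pi) + log_det + mahal)


def log_likelihood(frame: Sequence[float], gmm: GMM) -> float:
    """log p(x | GMM) = logsumexp_k [log w_k + log N(x; mu_k, var_k)], computed stably."""
    terms = [math.log(w) + log_gaussian_diag(frame, mu, var) for w, mu, var in gmm]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


def average_log_likelihood(frames: Sequence[Sequence[float]], gmm: GMM) -> float:
    """Mean per-frame log-likelihood, so utterances of different lengths compare fairly."""
    return sum(log_likelihood(f, gmm) for f in frames) / len(frames)


def identify_speaker(frames: Sequence[Sequence[float]], models: Dict[str, GMM]) -> str:
    """Closed-set identification: return the speaker whose GMM scores the frames highest."""
    return max(models, key=lambda name: average_log_likelihood(frames, models[name]))


if __name__ == "__main__":
    # Toy 2-D "feature" models for two hypothetical speakers (not real MFCC statistics).
    models = {
        "speaker_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [2.0, 2.0], [1.0, 1.0])],
        "speaker_b": [(1.0, [6.0, -3.0], [1.5, 0.5])],
    }
    test_frames = [[0.2, -0.1], [1.8, 2.1], [0.1, 0.3]]
    print(identify_speaker(test_frames, models))  # speaker_a
```

Averaging per frame, rather than summing, keeps scores comparable across utterances of different lengths. For a fitted scikit-learn `GaussianMixture`, the `score(X)` method returns this same per-sample average log-likelihood.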
Keywords
information technology, speaker recognition system, security systems, human biometrics