A Novel Robust Mel-Energy Based Voice Activity Detector for Nonstationary Noise and Its Application for Speech Waveform Compression
Master of Science in Electrical Engineering (MSEE)
Electrical and Computer Engineering
The voice activity detection (VAD) is crucial in all kinds of speech applications. However, almost all existing VAD algorithms suffer from the nonstationarity of both speech and noise. To combat this difficulty, we propose a new voice activity detector, which is based on the Mel-energy features and an adaptive threshold related to the signal-to-noise ratio (SNR) estimates. In this thesis, we first justify the robustness of the Bayes classifier using the Mel-energy features over that using the Fourier spectral features in various noise environments. Then, we design an algorithm using the dynamic Mel-energy estimator and the adaptive threshold which depends on the SNR estimates. In addition, a realignment scheme is incorporated to correct the sparse-and-spurious noise estimates. Numerous simulations are carried out to evaluate the performance of our proposed VAD method and the comparisons are made with a couple existing representative schemes, namely the VAD using the likelihood ratio test with Fourier spectral energy features and that based on the enhanced time-frequency parameters. Three types of noise, namely white noise (stationary), babble noise (nonstationary) and vehicular noise (nonstationary) were artificially added by the computer for our experiments. As a result, our proposed VAD algorithm significantly outperforms other existing methods as illustrated by the corresponding receiver operating curves (ROCs). Finally, we demonstrate one of the major applications, namely speech waveform compression, associated with our new robust VAD scheme and quantify the effectiveness in terms of compression efficiency.
Document Availability at the Time of Submission
Secure the entire work for patent and/or proprietary purposes for a period of one year. Student has submitted appropriate documentation which states: During this period the copyright owner also agrees not to exercise her/his ownership rights, including public use in works, without prior authorization from LSU. At the end of the one year period, either we or LSU may request an automatic extension for one additional year. At the end of the one year secure period (or its extension, if such is requested), the work will be released for access worldwide.
Waheeduddin, Syed, Q., "A Novel Robust Mel-Energy Based Voice Activity Detector for Nonstationary Noise and Its Application for Speech Waveform Compression" (2006). LSU Master's Theses. 1855.