Improved noise-robustness in distributed speech recognition via perceptually-weighted vector quantisation of filterbank energies
Abstract
In this paper, we examine a coding scheme for quantising feature vectors in a distributed speech recognition environment that is more robust to noise. It consists of a vector quantiser that operates on the logarithmic filterbank energies (LFBEs). Through the use of a perceptually-weighted Euclidean distance measure, which emphasises the LFBEs that represent the spectral peaks, the vector quantiser codebook provides a priori knowledge of the spectral characteristics of clean speech and is used to quantise features from noise-corrupted speech. Our comparative results from the ETSI Aurora-2 recognition task show that the perceptually-weighted vector quantisation of LFBEs achieves higher recognition accuracies for noisy speech than unweighted vector quantisation, memoryless and multi-frame GMM-based block quantisation and scalar quantisation of Mel frequency-warped cepstral coefficients.
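The weighted codebook search described in the abstract can be illustrated with a minimal sketch. The abstract does not give the weighting function, so `peak_weights` below is a hypothetical choice (weights proportional to the exponentiated log-energies, so high-energy bands dominate), and the codebook and input vector are made-up toy values rather than data from the paper.

```python
import math


def peak_weights(lfbe):
    """Hypothetical weighting (an assumption, not the paper's formula):
    bands with larger log-energy, i.e. spectral peaks, get larger weights."""
    top = max(lfbe)  # shift by the maximum for numerical stability
    raw = [math.exp(x - top) for x in lfbe]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_sq_distance(x, codeword, weights):
    """Weighted squared Euclidean distance between two LFBE vectors."""
    return sum(w * (a - b) ** 2 for a, b, w in zip(x, codeword, weights))


def quantise(lfbe, codebook, weighted=True):
    """Return the index of the nearest codeword under the chosen distance."""
    weights = peak_weights(lfbe) if weighted else [1.0] * len(lfbe)
    dists = [weighted_sq_distance(lfbe, c, weights) for c in codebook]
    return min(range(len(codebook)), key=dists.__getitem__)


# Toy codebook "trained on clean speech": one peaky and one flat spectrum.
codebook = [[10.0, 2.0, 2.0, 2.0], [6.0, 6.0, 6.0, 6.0]]
# Toy noisy frame: the peak survives, but noise has raised the valleys.
noisy = [10.0, 5.0, 5.0, 5.0]

idx_weighted = quantise(noisy, codebook, weighted=True)
idx_unweighted = quantise(noisy, codebook, weighted=False)
```

In this toy case the weighted search selects the peaky codeword (index 0) because the mismatch in the noise-filled valleys is down-weighted, while the unweighted search selects the flat codeword (index 1). This only illustrates the general idea; the paper's actual weighting and codebook design may differ.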
Conference Title
9th European Conference on Speech Communication and Technology