Scalable distributed speech recognition using Gaussian mixture model-based block quantization

Title Scalable distributed speech recognition using Gaussian mixture model-based block quantization
Author So, Stephen; Paliwal, Kuldip Kumar
Journal Name Speech Communication
Editor M.G.J. Swerts
Year Published 2006
Place of publication Netherlands
Publisher Elsevier BV
Abstract In this paper, we investigate the use of block quantisers based on Gaussian mixture models (GMMs) for the coding of Mel frequency-warped cepstral coefficient (MFCC) features in distributed speech recognition (DSR) applications. Specifically, we consider the multi-frame scheme, where temporal correlation across MFCC frames is exploited by the Karhunen-Loeve transform of the block quantiser. Compared with vector quantisers, the GMM-based block quantiser has relatively low computational and memory requirements which are independent of bitrate. More importantly, it is bitrate scalable, which means that the bitrate can be adjusted without the need for re-training. Static parameters such as the GMM and transform matrices are stored at the encoder and decoder and bit allocations are calculated "on-the-fly" without intensive processing. We have evaluated the quantisation scheme on the Aurora-2 database in a DSR framework. We show that jointly quantising more frames and using more mixture components in the GMM leads to higher recognition performance. The multi-frame GMM-based block quantiser achieves a word error rate (WER) of 2.5% at 800 bps, which is less than 1% degradation from the baseline (unquantised) word recognition accuracy, and graceful degradation down to a WER of 7% at 300 bps.
Peer Reviewed Yes
Published Yes
Publisher URI http://www.elsevier.com/wps/find/journaldescription.cws_home/505597/description#description
Alternative URI http://dx.doi.org/10.1016/j.specom.2005.10.002
Copyright Statement Copyright 2006 Elsevier. Please refer to the journal's website for access to the definitive, published version.
Volume 48
Issue Number 6
Page from 746
Page to 758
ISSN 0167-6393
Date Accessioned 2007-02-13
Date Available 2009-09-21T05:50:51Z
Language en_AU
Research Centre Institute for Integrated and Intelligent Systems
Faculty Faculty of Science, Environment, Engineering and Technology
Subject PRE2009-Signal Processing; PRE2009-Speech Recognition
URI http://hdl.handle.net/10072/14445
Publication Type Journal Articles (Refereed Article)
Publication Type Code c1
