Scalable distributed speech recognition using multi-frame GMM-based block quantization
| Title | Scalable distributed speech recognition using multi-frame GMM-based block quantization |
|---|---|
| Author | Paliwal, Kuldip Kumar; So, Stephen |
| Publication Title | Interspeech 2004 (ICSLP) |
| Editor | Soon Hyob Kim and Dae Hee Youn |
| Year Published | 2004 |
| Place of publication | Korea |
| Publisher | Sunjin Printing Co. |
| Abstract | In this paper, we propose the use of the multi-frame Gaussian mixture model-based block quantizer for the coding of Mel frequency-warped cepstral coefficient (MFCC) features in distributed speech recognition (DSR) applications. This coding scheme exploits intraframe correlation via the Karhunen-Loeve transform (KLT) and interframe correlation via the joint processing of adjacent frames, while retaining the computational simplicity of scalar quantization. The proposed coder is bitrate scalable, which means that the bitrate can be adjusted without re-training the quantizers. Static parameters such as the probability density function (PDF) model and the KLT orthogonal matrices are stored at the encoder and decoder, and bit allocations are calculated on the fly without intensive processing. The scheme is evaluated on the Aurora-2 database in a DSR framework. It achieves high recognition performance at low bitrates, with a word error rate (WER) of 2.5% at 800 bps, which is less than 1% degradation from the baseline word recognition accuracy, and degrades gracefully down to a WER of 7% at 300 bps. |
| Peer Reviewed | Yes |
| Published | Yes |
| Publisher URI | http://www.isca-speech.org/index.php |
| Alternative URI | http://www.isca-speech.org/archive/interspeech_2004/ |
| ISSN | 1225-441X |
| Conference name | 8th International Conference on Spoken Language Processing (ICSLP-2004) |
| Location | Jeju, Korea |
| Date From | 2004-10-04 |
| Date To | 2004-10-08 |
| URI | http://hdl.handle.net/10072/2117 |
| Date Accessioned | 2005-03-31 |
| Date Available | 2009-09-22T05:48:56Z |
| Language | en_AU |
| Research Centre | Institute for Integrated and Intelligent Systems |
| Faculty | Faculty of Engineering and Information Technology |
| Subject | PRE2009-Signal Processing; PRE2009-Speech Recognition |
| Publication Type | Conference Publications (Full Written Paper - Refereed) |
| Publication Type Code | e1 |
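
The bitrate scalability described in the abstract can be illustrated with a minimal sketch. It assumes a single Gaussian component rather than the paper's multi-frame mixture model, and it assumes the classical high-resolution bit-allocation rule, which the record does not confirm as the authors' exact method. The function name `klt_bit_allocation` and the covariance values are hypothetical. The point is only that the stored model (covariance and KLT) stays fixed while the allocation is recomputed cheaply for each bit budget.

```python
import numpy as np

def klt_bit_allocation(cov, total_bits):
    """Hypothetical helper: KLT of a Gaussian plus real-valued bit allocation."""
    # Eigen-decomposition of the symmetric covariance gives the KLT basis.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]          # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Classical high-resolution rule (an assumption here):
    # b_i = B/n + 0.5 * log2(lambda_i / geometric_mean(lambda))
    log2_lam = np.log2(eigvals)
    bits = total_bits / len(eigvals) + 0.5 * (log2_lam - log2_lam.mean())
    return eigvecs, eigvals, bits

# Illustrative 2-D covariance (made-up values, not from the paper).
cov = np.array([[4.0, 1.2], [1.2, 1.0]])

# The stored model is reused; only the bit budget changes.
for budget in (4, 8):
    klt, variances, bits = klt_bit_allocation(cov, budget)
    print(budget, np.round(bits, 2))
```

The allocation returned here is real-valued and can go negative at small budgets; a practical coder would clamp and round it to non-negative integers before assigning scalar quantizer levels.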
Please use this identifier to cite this record: http://hdl.handle.net/10072/2117