Robust Speech Recognition in Noisy Environments Based on Subband Spectral Centroid Histograms

| File | Size | Format |
|---|---|---|
| 41666_1.pdf | 605 KB | Adobe PDF |

| Title | Robust Speech Recognition in Noisy Environments Based on Subband Spectral Centroid Histograms |
|---|---|
| Author | Gajic, Bojana; Paliwal, Kuldip Kumar |
| Journal Name | IEEE Transactions on Audio, Speech and Language Processing |
| Year Published | 2006 |
| Place of publication | USA |
| Publisher | IEEE |
| Abstract | We investigate how dominant-frequency information can be used in speech feature extraction to increase the robustness of automatic speech recognition against additive background noise. First, we review several earlier proposed auditory-based feature extraction methods and argue that the use of dominant-frequency information might be one of the major reasons for their improved noise robustness. Furthermore, we propose a new feature extraction method, which combines subband power information with dominant subband frequency information in a simple and computationally efficient way. The proposed features are shown to be considerably more robust against additive background noise than standard mel-frequency cepstrum coefficients on two different recognition tasks. The performance improvement increased as we moved from a small-vocabulary isolated-word task to a medium-vocabulary continuous-speech task, where the proposed features also outperformed a computationally expensive auditory-based method. The greatest improvement was obtained for noise types characterized by a relatively flat spectral density. |
| Peer Reviewed | Yes |
| Published | Yes |
| Publisher URI | http://ieeexplore.ieee.org/servlet/opac?punumber=10376 |
| Alternative URI | http://dx.doi.org/10.1109/TSA.2005.855834 |
| Copyright Statement | Copyright 2006 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. |
| Volume | 14 |
| Issue Number | 2 |
| Page from | 600 |
| Page to | 608 |
| ISSN | 1558-7916 |
| Date Accessioned | 2007-03-18 |
| Date Available | 2009-09-21T05:50:12Z |
| Language | en_AU |
| Research Centre | Institute for Integrated and Intelligent Systems |
| Faculty | Faculty of Science, Environment, Engineering and Technology |
| Subject | PRE2009-Speech Recognition |
| URI | http://hdl.handle.net/10072/14345 |
| Publication Type | Journal Articles (Refereed Article) |
| Publication Type Code | c1 |
Please use this identifier to cite this record: http://hdl.handle.net/10072/14345