Feature extraction from higher-lag autocorrelation coefficients for robust speech recognition
| Title | Feature extraction from higher-lag autocorrelation coefficients for robust speech recognition |
|---|---|
| Author | Shannon, Ben James; Paliwal, Kuldip Kumar |
| Journal Name | Speech Communication |
| Year Published | 2006 |
| Place of publication | Netherlands |
| Publisher | Elsevier BV |
| Abstract | In this paper, a feature extraction method that is robust to additive background noise is proposed for automatic speech recognition. Since the background noise corrupts the autocorrelation coefficients of the speech signal mostly at the lower-time lags, while the higher-lag autocorrelation coefficients are least affected, this method discards the lower-lag autocorrelation coefficients and uses only the higher-lag autocorrelation coefficients for spectral estimation. The magnitude spectrum of the windowed higher-lag autocorrelation sequence is used here as an estimate of the power spectrum of the speech signal. This power spectral estimate is processed further (like the well-known Mel frequency cepstral coefficient (MFCC) procedure) by the Mel filter bank, log operation and the discrete cosine transform to get the cepstral coefficients. These cepstral coefficients are referred to as the autocorrelation Mel frequency cepstral coefficients (AMFCCs). We evaluate the speech recognition performance of the AMFCC features on the Aurora and the resource management databases and show that they perform as well as the MFCC features for clean speech and their recognition performance is better than the MFCC features for noisy speech. Finally, we show that the AMFCC features perform better than the features derived from the robust linear prediction-based methods for noisy speech. |
| Peer Reviewed | Yes |
| Published | Yes |
| Publisher URI | http://www.elsevier.com/wps/find/journaldescription.cws_home/505597/description#description |
| Alternative URI | http://dx.doi.org/10.1016/j.specom.2006.08.003 |
| Volume | 48 |
| Page from | 1458 |
| Page to | 1485 |
| ISSN | 0167-6393 |
| Date Accessioned | 2007-03-18 |
| Date Available | 2009-09-21T05:50:11Z |
| Language | en_AU |
| Research Centre | Institute for Integrated and Intelligent Systems |
| Faculty | Faculty of Science, Environment, Engineering and Technology |
| Subject | PRE2009-Speech Recognition |
| URI | http://hdl.handle.net/10072/14344 |
| Publication Type | Journal Articles (Refereed Article) |
| Publication Type Code | c1 |
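
The AMFCC pipeline summarised in the abstract (autocorrelation, discarding the lower lags, windowing, magnitude spectrum, Mel filter bank, log, discrete cosine transform) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function names, the Hamming window, the 16-sample lag cut-off, the FFT size, the filter count and the number of cepstral coefficients are all assumptions chosen for the example, since the abstract does not specify them.

```python
import cmath
import math


def autocorrelation(frame, max_lag):
    """Biased autocorrelation estimate r[k] for k = 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) / n
            for k in range(max_lag + 1)]


def higher_lag_spectrum(frame, min_lag, n_fft):
    """Magnitude spectrum of the windowed higher-lag autocorrelation sequence."""
    r = autocorrelation(frame, len(frame) - 1)
    high = r[min_lag:]  # discard lags 0 .. min_lag - 1
    m = len(high)
    # Assumption: Hamming window; the abstract does not name the window.
    win = [0.54 - 0.46 * math.cos(2 * math.pi * i / (m - 1)) for i in range(m)]
    x = [h * w for h, w in zip(high, win)]
    # Direct DFT for bins 0 .. n_fft/2, kept simple for clarity.
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n_fft)
                    for t in range(m)))
            for k in range(n_fft // 2 + 1)]


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced uniformly on the Mel scale."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sample_rate / 2.0)
    # Filter edges expressed as fractional DFT bin positions.
    edges = [mel_to_hz(top * i / (n_filters + 1)) * n_fft / sample_rate
             for i in range(n_filters + 2)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for b in range(n_bins):
            if lo < b <= mid:
                row.append((b - lo) / (mid - lo))
            elif mid < b < hi:
                row.append((hi - b) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def dct2(x, n_out):
    """Unnormalised DCT-II, returning the first n_out coefficients."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n))
            for k in range(n_out)]


def amfcc(frame, sample_rate, min_lag, n_fft=256, n_filters=20, n_ceps=13):
    """Cepstral coefficients from the higher-lag autocorrelation spectrum."""
    spec = higher_lag_spectrum(frame, min_lag, n_fft)
    bank = mel_filterbank(n_filters, n_fft, sample_rate)
    energies = [sum(w * s for w, s in zip(row, spec)) for row in bank]
    log_e = [math.log(max(e, 1e-10)) for e in energies]  # floor avoids log(0)
    return dct2(log_e, n_ceps)


# Hypothetical input: a 25 ms frame of a 440 Hz tone sampled at 8 kHz.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 440 * i / sample_rate) for i in range(200)]
coeffs = amfcc(frame, sample_rate, min_lag=16)  # drop lags below 2 ms (assumed)
```

A conventional MFCC front end would take the spectrum of the windowed frame itself; the only change sketched here is that the spectrum comes from the autocorrelation sequence with its lower lags removed, which is where the abstract says additive noise does most of its damage.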
Please use this identifier to cite this record: http://hdl.handle.net/10072/14344