Speech-Signal-Based Frequency Warping

File 57723_1.pdf
Size 279Kb
Format Adobe PDF
Title Speech-Signal-Based Frequency Warping
Author Paliwal, Kuldip Kumar; Shannon, Ben James; Lyons, James; Wojcicki, Kamil
Journal Name IEEE Signal Processing Letters
Year Published 2009
Place of publication United States
Publisher IEEE
Abstract The speech signal is used for transmission of linguistic information. High energy portions of the speech spectrum have higher signal-to-noise ratios than the low energy portions. As a result, these regions are more robust to noise. Since the speech signal is known to be very robust to noise, it is expected that the high energy regions of the speech spectrum carry the majority of the linguistic information. This letter tries to derive a frequency warping function directly from the speech signal by sampling the frequency axis nonuniformly with the high energy regions sampled more densely than the low energy regions. To achieve this, an ensemble average short-time power spectrum is computed from a large speech corpus. The speech-signal-based frequency warping is obtained by considering equal area portions of the log spectrum. The proposed frequency warping is shown to be similar to the frequency scales obtained through psycho-acoustic experiments, namely the mel and bark scales. The warping is then used in filterbank design for automatic speech recognition experiments. The results of these experiments show that cepstral features based on the proposed warping achieve performance under clean conditions comparable to that of mel-frequency cepstral coefficients, while outperforming them under noisy conditions.
Peer Reviewed Yes
Published Yes
Publisher URI http://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=97
Alternative URI http://dx.doi.org/10.1109/LSP.2009.2014096
Copyright Statement Copyright 2009 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
Volume 16
Issue Number 4
Page from 319
Page to 322
ISSN 1070-9908
Date Accessioned 2009-09-23
Language en_AU
Research Centre Institute for Integrated and Intelligent Systems
Faculty Faculty of Science, Environment, Engineering and Technology
Subject PRE2009-Signal Processing; PRE2009-Speech Recognition
URI http://hdl.handle.net/10072/25946
Publication Type Journal Articles (Refereed Article)
Publication Type Code c1
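
Illustrative sketch (not part of the original record): the abstract describes averaging short-time power spectra over a corpus and then dividing the log spectrum into equal-area portions to obtain the warping. The Python below is a minimal reading of those two steps, not the authors' implementation. The function names, the frame length and hop, the Hamming window, and the shift that makes the log spectrum non-negative before measuring area are all assumptions made here for illustration, and a synthetic low-pass noise signal stands in for the speech corpus.

```python
import numpy as np

def average_power_spectrum(signal, frame_len=256, hop=128):
    """Average the short-time power spectrum over all frames of `signal`.

    frame_len, hop and the Hamming window are illustrative choices.
    """
    window = np.hamming(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    acc = np.zeros(frame_len // 2 + 1)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len] * window
        acc += np.abs(np.fft.rfft(frame)) ** 2
    return acc / n_frames

def equal_area_edges(avg_power, sample_rate, n_bands):
    """Return n_bands + 1 band edges (Hz) enclosing equal areas of the log spectrum."""
    log_spec = np.log(avg_power + 1e-12)
    # Assumption: shift the log spectrum so it is non-negative; the abstract
    # does not say how the area under the log spectrum is normalised.
    density = log_spec - log_spec.min()
    freqs = np.linspace(0.0, sample_rate / 2.0, len(avg_power))
    # Cumulative area under the density curve (trapezoid rule).
    steps = 0.5 * (density[1:] + density[:-1]) * np.diff(freqs)
    area = np.concatenate(([0.0], np.cumsum(steps)))
    # Invert the cumulative area at equally spaced area targets.
    targets = np.linspace(0.0, area[-1], n_bands + 1)
    return np.interp(targets, area, freqs)

# Synthetic stand-in for a speech corpus: white noise through a crude
# low-pass filter, so that low frequencies carry more energy.
rng = np.random.default_rng(0)
noise = rng.standard_normal(16000)
impulse_response = 0.9 ** np.arange(64)
signal = np.convolve(noise, impulse_response)[: len(noise)]

avg_power = average_power_spectrum(signal)
edges = equal_area_edges(avg_power, sample_rate=8000, n_bands=8)
print(np.round(edges, 1))
```

With this kind of input the band edges crowd together where the averaged log spectrum is high and spread out where it is low, which is the nonuniform sampling the abstract describes. Turning the edges into a filterbank for recognition experiments is outside the scope of this sketch.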
