An Automatic Lipreading System for Spoken Digits With Limited Training Data

| File | Size | Format |
|---|---|---|
| 53752_1.pdf | 210 KB | Adobe PDF |

| Title | An Automatic Lipreading System for Spoken Digits With Limited Training Data |
|---|---|
| Author | Wang, S. L.; Liew, Alan Wee-Chung; Lau, W. H.; Leung, S. H. |
| Journal Name | IEEE Transactions on Circuits and Systems for Video Technology |
| Editor | Keshab K Parhi |
| Year Published | 2008 |
| Place of publication | United States |
| Publisher | IEEE |
| Abstract | It is well known that visual cues of lip movement contain important speech-relevant information. This paper presents an automatic lipreading system for small vocabulary speech recognition tasks. Using the lip segmentation and modeling techniques we developed earlier, we obtain a visual feature vector composed of outer and inner mouth features from the lip image sequence for recognition. A spline representation is employed to transform the discrete-time sampled features from the video frames into the continuous domain. The spline coefficients in the same word class are constrained to have similar expression and are estimated from the training data by the EM algorithm. For the multiple-speaker/speaker-independent recognition task, an adaptive multimodel approach is proposed to handle the variations caused by various talking styles. After building the appropriate word models from the spline coefficients, a maximum likelihood classification approach is taken for the recognition. Lip image sequences of English digits from 0 to 9 have been collected for the recognition test. Two widely used classification methods, HMM and RDA, have been adopted for comparison, and the results demonstrate that the proposed algorithm delivers the best performance among these methods. |
| Peer Reviewed | Yes |
| Published | Yes |
| Alternative URI | http://dx.doi.org/10.1109/TCSVT.2008.2004924 |
| Copyright Statement | Copyright 2008 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. |
| Volume | 18 |
| Issue Number | 12 |
| Page from | 1760 |
| Page to | 1765 |
| ISSN | 1051-8215 |
| Date Accessioned | 2009-02-28 |
| Date Available | 2011-10-18T07:26:36Z |
| Language | en_AU |
| Research Centre | Institute for Integrated and Intelligent Systems |
| Faculty | Faculty of Science, Environment, Engineering and Technology |
| Subject | Computer Vision; Pattern Recognition and Data Mining |
| URI | http://hdl.handle.net/10072/23604 |
| Publication Type | Journal Articles (Refereed Article) |
| Publication Type Code | c1 |
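
The abstract describes two steps that are easy to misread in the abstract alone: per-frame lip features of varying length are mapped into a continuous spline representation, and words are then recognized by maximum likelihood over the spline coefficients. The following is a minimal pure-Python sketch of that general idea only, under stated assumptions: a single scalar feature track per utterance (for example, mouth opening per frame), an ordinary least-squares fit onto a clamped cubic B-spline basis, and one diagonal Gaussian per word. It does not reproduce the paper's EM-based constrained estimation of coefficients or its adaptive multimodel scheme, and every function name and the synthetic `bump`/`ramp` tracks are hypothetical.

```python
import math

# Hypothetical helper names throughout; a generic sketch, not the paper's code.

def clamped_knots(n_basis, degree):
    """Clamped uniform knot vector on [0, 1] that yields n_basis B-splines."""
    m = n_basis - degree  # number of knot intervals (requires n_basis > degree)
    interior = [j / m for j in range(1, m)]
    return [0.0] * (degree + 1) + interior + [1.0] * (degree + 1)


def bspline_basis(i, k, t, knots):
    """Cox-de Boor recursion: value of the i-th B-spline of degree k at time t."""
    if k == 0:
        if knots[i] <= t < knots[i + 1]:
            return 1.0
        # Close the last non-empty interval so that t = 1.0 is covered.
        if t == knots[-1] and knots[i] < knots[i + 1] == knots[-1]:
            return 1.0
        return 0.0
    value = 0.0
    if knots[i + k] > knots[i]:
        value += (t - knots[i]) / (knots[i + k] - knots[i]) * bspline_basis(i, k - 1, t, knots)
    if knots[i + k + 1] > knots[i + 1]:
        value += ((knots[i + k + 1] - t) / (knots[i + k + 1] - knots[i + 1])
                  * bspline_basis(i + 1, k - 1, t, knots))
    return value


def solve(a, b):
    """Solve the small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[r]] for r, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def spline_coefficients(track, n_basis=6, degree=3, ridge=1e-6):
    """Least-squares cubic B-spline fit of a per-frame feature track (>= 2 frames)
    over normalized time [0, 1]; returns a fixed-length coefficient vector."""
    knots = clamped_knots(n_basis, degree)
    n_frames = len(track)
    times = [j / (n_frames - 1) for j in range(n_frames)]
    basis = [[bspline_basis(i, degree, t, knots) for i in range(n_basis)] for t in times]
    # Normal equations (B^T B + ridge * I) c = B^T y; the tiny ridge keeps them solvable.
    btb = [[sum(row[i] * row[j] for row in basis) + (ridge if i == j else 0.0)
            for j in range(n_basis)] for i in range(n_basis)]
    bty = [sum(row[i] * y for row, y in zip(basis, track)) for i in range(n_basis)]
    return solve(btb, bty)


def train(examples, var_floor=1e-3):
    """Fit one diagonal Gaussian over spline coefficients per word class."""
    models = {}
    for word, tracks in examples.items():
        coeffs = [spline_coefficients(tr) for tr in tracks]
        dims = range(len(coeffs[0]))
        mean = [sum(c[d] for c in coeffs) / len(coeffs) for d in dims]
        var = [max(sum((c[d] - mean[d]) ** 2 for c in coeffs) / len(coeffs), var_floor)
               for d in dims]
        models[word] = (mean, var)
    return models


def log_likelihood(x, mean, var):
    """Log density of x under a diagonal Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - mu) ** 2 / v)
               for xi, mu, v in zip(x, mean, var))


def classify(track, models):
    """Maximum-likelihood decision: the word whose model best explains the coefficients."""
    x = spline_coefficients(track)
    return max(models, key=lambda w: log_likelihood(x, *models[w]))


# Synthetic single-feature tracks of different lengths (not real lip data).
def bump(n):
    return [math.sin(math.pi * j / (n - 1)) for j in range(n)]


def ramp(n):
    return [j / (n - 1) for j in range(n)]


models = train({"bump": [bump(12), bump(15), bump(20)],
                "ramp": [ramp(10), ramp(14), ramp(18)]})
print(classify(bump(17), models), classify(ramp(11), models))
```

Because every utterance, whatever its frame count, is reduced to the same number of spline coefficients, tracks of 12 and 20 frames become directly comparable vectors; that fixed-length representation is what lets a simple per-class density model serve as the maximum likelihood classifier in this sketch.
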

Please use this identifier to cite this record: http://hdl.handle.net/10072/23604

Griffith University copyright notice

Copyright in individual works within the repository belongs to their authors or publishers. You may make a print or digital copy of a work for your personal non-commercial use. All other rights are reserved, except for fair dealings or other user rights granted by the copyright laws of your country.