Microarray Missing Data Imputation based on a Set Theoretic Framework and Biological Knowledge

Author Gan, Xiangchao; Liew, Alan Wee-Chung; Yan, Hong
Journal Name Nucleic Acids Research
Year Published 2006
Place of publication UK
Publisher Oxford University Press
Abstract Gene expression data measured with microarrays usually suffer from missing values. However, many data analysis methods require a complete data matrix. Although existing missing value imputation algorithms have shown good performance in dealing with missing values, they also have limitations. For example, some algorithms perform well only when strong local correlation exists in the data, while others provide the best estimates when the data are dominated by global structure. In addition, these algorithms do not take any biological constraint into account in their imputation. In this paper, we propose a set theoretic framework based on projection onto convex sets (POCS) for missing data imputation. POCS allows us to incorporate different types of a priori knowledge about missing values into the estimation process. The main idea of POCS is to formulate every piece of prior knowledge as a corresponding convex set and then use a convergence-guaranteed iterative procedure to obtain a solution in the intersection of all these sets. In this work, we design several convex sets that take into consideration the biological characteristics of the data: the first set mainly exploits the local correlation structure among genes in microarray data, while the second set captures the global correlation structure among arrays. The third set (actually a series of sets) exploits the biological phenomenon of synchronization loss in microarray experiments. Synchronization loss is common in cyclic systems, and we construct a series of sets based on it for our POCS imputation algorithm. Experiments show that our algorithm can achieve a significant reduction in error compared with the KNNimpute, SVDimpute and LSimpute methods.
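The POCS idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the two convex sets below are hypothetical stand-ins, namely a box of half-width `delta` around a k-nearest-neighbour estimate (a crude proxy for local gene-gene correlation) and a Euclidean ball of radius `eps` around per-array means (a crude proxy for global structure). The synchronization-loss sets are not modelled. All names and parameters (`knn_estimate`, `column_mean`, `pocs_impute`, `k`, `delta`, `eps`) are hypothetical. Both sets are convex and have closed-form projections (coordinate-wise clipping for the box, radial shrinking for the ball), so alternating between them follows the POCS scheme. The sketch also keeps observed entries fixed, so every iterate agrees with the measured data.

```python
import math
from typing import Dict, List, Optional, Tuple

Row = List[Optional[float]]  # None marks a missing measurement
Cell = Tuple[int, int]       # (gene index, array index)


def knn_estimate(data: List[Row], i: int, j: int, k: int = 2) -> float:
    """Hypothetical local estimate: mean of array j over the k genes nearest
    to gene i, using root-mean-square distance over co-observed arrays."""
    candidates = []
    for r, row in enumerate(data):
        if r == i or row[j] is None:
            continue
        shared = [(a, b) for a, b in zip(data[i], row)
                  if a is not None and b is not None]
        if shared:
            dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
            candidates.append((dist, row[j]))
    if not candidates:
        raise ValueError(f"no neighbour observed in array {j} for gene {i}")
    nearest = sorted(candidates)[:k]
    return sum(v for _, v in nearest) / len(nearest)


def column_mean(data: List[Row], j: int) -> float:
    """Hypothetical global estimate: mean of the observed values in array j."""
    vals = [row[j] for row in data if row[j] is not None]
    return sum(vals) / len(vals)


def pocs_impute(data: List[Row], delta: float = 0.5, eps: float = 1.0,
                iters: int = 500, tol: float = 1e-10) -> List[List[float]]:
    """Alternate projections onto two illustrative convex sets of missing values."""
    missing: List[Cell] = [(i, j) for i, row in enumerate(data)
                           for j, v in enumerate(row) if v is None]
    if not missing:
        return [list(row) for row in data]
    local = {c: knn_estimate(data, *c) for c in missing}
    glob = {c: column_mean(data, c[1]) for c in missing}
    x: Dict[Cell, float] = dict(local)  # start from the local estimate
    for _ in range(iters):
        prev = dict(x)
        # Projection onto the box {x : |x_c - local_c| <= delta}: clip each entry.
        for c in missing:
            x[c] = min(max(x[c], local[c] - delta), local[c] + delta)
        # Projection onto the ball {x : ||x - glob||_2 <= eps}: shrink radially.
        norm = math.sqrt(sum((x[c] - glob[c]) ** 2 for c in missing))
        if norm > eps:
            for c in missing:
                x[c] = glob[c] + (eps / norm) * (x[c] - glob[c])
        # Stop once a full sweep no longer moves any imputed value.
        if max(abs(x[c] - prev[c]) for c in missing) < tol:
            break
    # Observed entries are copied unchanged; only missing cells take POCS values.
    return [[x[(i, j)] if v is None else v for j, v in enumerate(row)]
            for i, row in enumerate(data)]
```

If `delta` and `eps` are chosen so that the box and the ball do not intersect, the iteration has no common point to reach; the alternating projections then settle near a pair of closest points between the two sets, and the returned values lie in the ball because it is projected onto last.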
Peer Reviewed Yes
Published Yes
Publisher URI http://nar.oxfordjournals.org/
Alternative URI http://dx.doi.org/10.1093/nar/gkl047
Copyright Statement Copyright 2006 Gan et al. This article has been published under an open access model.
Volume 34
Issue Number 5
Page from 1608
Page to 1619
ISSN 0305-1048
Date Accessioned 2007-04-27
Date Available 2009-10-16T05:20:25Z
Language en_AU
Research Centre Institute for Integrated and Intelligent Systems
Faculty Faculty of Engineering and Information Technology
Subject PRE2009-Gene Expression; PRE2009-Signal Processing
URI http://hdl.handle.net/10072/15357
Publication Type Journal Articles (Refereed Article)
Publication Type Code c1x
