TY - JOUR

T1 - A pairwise-gaussian-merging approach

T2 - Towards genome segmentation for copy number analysis

AU - Chen, Chih Hao

AU - Lee, Hsing Chung

AU - Ling, Qingdong

AU - Chen, Hsiao Jung

AU - Wang, Sun Chong

AU - Wu, Li Ching

AU - Lee, H. C.

PY - 2011/3

Y1 - 2011/3

N2 - Segmentation, filtering out of measurement errors and identification of breakpoints are integral parts of any analysis of microarray data for the detection of copy number variation (CNV). Existing algorithms designed for these tasks have had some successes in the past, but they tend to be O(N^2) in either computation time or memory requirement, or both, and the rapid advance of microarray resolution has practically rendered such algorithms useless. Here we propose an algorithm, SAD, that is much faster and much less thirsty for memory, being O(N) in both computation time and memory requirement, and offers higher accuracy. The two key ingredients of SAD are the fundamental assumption in statistics that measurement errors are normally distributed and the mathematical relation that the product of two Gaussians is another Gaussian (function). We have produced a computer program for analyzing CNV based on SAD. In addition to being fast and small, it offers two important features: quantitative statistics for predictions and, with only two user-decided parameters, ease of use. Its speed shows little dependence on genomic profile. Running on an average modern computer, it completes CNV analyses for a 262 thousand-probe array in ~1 second and a 1.8 million-probe array in 9 seconds.

AB - Segmentation, filtering out of measurement errors and identification of breakpoints are integral parts of any analysis of microarray data for the detection of copy number variation (CNV). Existing algorithms designed for these tasks have had some successes in the past, but they tend to be O(N^2) in either computation time or memory requirement, or both, and the rapid advance of microarray resolution has practically rendered such algorithms useless. Here we propose an algorithm, SAD, that is much faster and much less thirsty for memory, being O(N) in both computation time and memory requirement, and offers higher accuracy. The two key ingredients of SAD are the fundamental assumption in statistics that measurement errors are normally distributed and the mathematical relation that the product of two Gaussians is another Gaussian (function). We have produced a computer program for analyzing CNV based on SAD. In addition to being fast and small, it offers two important features: quantitative statistics for predictions and, with only two user-decided parameters, ease of use. Its speed shows little dependence on genomic profile. Running on an average modern computer, it completes CNV analyses for a 262 thousand-probe array in ~1 second and a 1.8 million-probe array in 9 seconds.

KW - Cancer

KW - Chromosomal aberration

KW - Copy number variation

KW - Pathogenesis

KW - Segmentation analysis

UR - http://www.scopus.com/inward/record.url?scp=79953663711&partnerID=8YFLogxK

M3 - Journal article

AN - SCOPUS:79953663711

SN - 2010-376X

VL - 75

SP - 58

EP - 66

JO - World Academy of Science, Engineering and Technology

JF - World Academy of Science, Engineering and Technology

ER -