TY - JOUR
T1 - A pairwise-Gaussian-merging approach
T2 - Towards genome segmentation for copy number analysis
AU - Chen, Chih Hao
AU - Lee, Hsing Chung
AU - Ling, Qingdong
AU - Chen, Hsiao Jung
AU - Wang, Sun Chong
AU - Wu, Li Ching
AU - Lee, H. C.
PY - 2011/3
Y1 - 2011/3
AB - Segmentation, filtering of measurement errors and identification of breakpoints are integral parts of any analysis of microarray data for the detection of copy number variation (CNV). Existing algorithms designed for these tasks have had some success, but they tend to be O(N^2) in computation time, memory requirement, or both, and the rapid increase in microarray resolution has rendered such algorithms practically useless. Here we propose an algorithm, SAD, that is much faster and requires much less memory (O(N) in both computation time and memory requirement) while offering higher accuracy. The two key ingredients of SAD are the fundamental statistical assumption that measurement errors are normally distributed and the mathematical relation that the product of two Gaussian functions is another Gaussian function. We have implemented a computer program for CNV analysis based on SAD. In addition to being fast and compact, it offers two important features: quantitative statistics for its predictions, and ease of use, since it has only two user-defined parameters. Its speed shows little dependence on the genomic profile. On an average modern computer, it completes CNV analysis of a 262,000-probe array in about 1 second and of a 1.8-million-probe array in 9 seconds.
KW - Cancer
KW - Chromosomal aberration
KW - Copy number variation
KW - Pathogenesis
KW - Segmentation analysis
UR - http://www.scopus.com/inward/record.url?scp=79953663711&partnerID=8YFLogxK
M3 - Journal article
AN - SCOPUS:79953663711
SN - 2010-376X
VL - 75
SP - 58
EP - 66
JO - World Academy of Science, Engineering and Technology
JF - World Academy of Science, Engineering and Technology
ER -
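The abstract names the Gaussian product relation as one of SAD's two key ingredients. The following Python sketch illustrates only that relation, not the authors' SAD program: for normal densities, N(x; mu1, var1) * N(x; mu2, var2) = N(mu1; mu2, var1 + var2) * N(x; mu, var), with mu = (mu1*var2 + mu2*var1) / (var1 + var2) and var = var1*var2 / (var1 + var2). The function names `gaussian_pdf` and `gaussian_product` are hypothetical and chosen for illustration.

```python
import math

# Illustrative sketch only; this is NOT the SAD implementation from the cited paper.


def gaussian_pdf(x, mu, var):
    """Normal density N(x; mu, var) with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def gaussian_product(mu1, var1, mu2, var2):
    """Return (mu, var, scale) such that, for every x,
    N(x; mu1, var1) * N(x; mu2, var2) == scale * N(x; mu, var)."""
    # The combined variance is smaller than either input variance.
    var = var1 * var2 / (var1 + var2)
    # The combined mean is a precision-weighted average of the two means.
    mu = (mu1 * var2 + mu2 * var1) / (var1 + var2)
    # The scale factor is itself a Gaussian in the difference of the two means.
    scale = gaussian_pdf(mu1, mu2, var1 + var2)
    return mu, var, scale


if __name__ == "__main__":
    mu, var, scale = gaussian_product(0.0, 1.0, 1.0, 1.0)
    print(mu, var)  # 0.5 0.5
    # Numerically check the identity at a few points.
    for x in (-1.0, 0.3, 2.0):
        lhs = gaussian_pdf(x, 0.0, 1.0) * gaussian_pdf(x, 1.0, 1.0)
        rhs = scale * gaussian_pdf(x, mu, var)
        assert math.isclose(lhs, rhs)
```

Because the merged mean is a precision-weighted average and the merged variance shrinks, repeatedly combining estimates in this way is one plausible reading of how pairwise merging can suppress normally distributed measurement noise; how SAD applies the relation in detail is not stated in this record.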