An efficient distributed hierarchical-clustering algorithm for large scale data

Cheng Hsien Tang, An Ching Huang, Meng Feng Tsai, Wei Jen Wang

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

3 Citations (Scopus)


Data classification in today's cloud computing environments can involve a huge amount of data. Processing it can take a long time and consume substantial computation and storage resources. This study addresses the problem of running the traditional hierarchical agglomerative clustering algorithm in a distributed environment, since hierarchical agglomerative clustering has high applicability and efficiency. It proposes a parallel hierarchical agglomerative clustering algorithm that divides the whole computation into several small tasks, distributes the tasks to message-passing processes, and merges the results to form a hierarchical cluster. A threshold is used to reduce the storage requirement during the computation. To evaluate the performance and limitations of the algorithm, the study conducted several experiments using real astronomical data, the main asteroid belt catalog. The experimental results confirm that the proposed parallel algorithm is efficient.
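
The divide, distribute, and merge steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes single-linkage merging and Euclidean distance, neither of which the abstract specifies, and it uses the built-in `map` as a local stand-in for message-passing processes. All function names (`make_tasks`, `block_pairs`, `cluster`) and the `block_size` parameter are hypothetical.

```python
from functools import partial
from itertools import combinations_with_replacement
from math import dist


def make_tasks(n, block_size):
    """Split the n x n pairwise-distance computation into block-pair tasks."""
    blocks = [range(s, min(s + block_size, n)) for s in range(0, n, block_size)]
    return list(combinations_with_replacement(blocks, 2))


def block_pairs(task, points, threshold):
    """One small task: distances between two index blocks.

    Only pairs within the threshold are kept, which bounds the storage
    needed for intermediate results.
    """
    rows, cols = task
    same_block = rows == cols
    kept = []
    for i in rows:
        for j in cols:
            if same_block and j <= i:
                continue  # skip self-pairs and mirrored pairs
            d = dist(points[i], points[j])
            if d <= threshold:
                kept.append((d, i, j))
    return kept


def cluster(points, threshold, block_size=2, run_map=map):
    """Merge per-task results into a single-linkage hierarchy (assumed linkage).

    run_map is a stand-in for the distribution step; a process pool's map
    could be passed instead of the built-in map.
    """
    n = len(points)
    worker = partial(block_pairs, points=points, threshold=threshold)
    partials = run_map(worker, make_tasks(n, block_size))

    # Merge step: process surviving pairs in order of increasing distance.
    edges = sorted(e for part in partials for e in part)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    merges = []  # each entry is one level of the hierarchy
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri
            merges.append((d, i, j))
    labels = [find(i) for i in range(n)]
    return merges, labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
    merges, labels = cluster(pts, threshold=2.0)
    print(merges)
    print(labels)
```

In this sketch, pairs farther apart than the threshold are never stored, so clusters separated by more than the threshold are left unmerged at the top of the hierarchy. Whether the paper's threshold behaves the same way is not stated in the abstract.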

Title of host publication: ICS 2010 - International Computer Symposium
Publication status: Published - 2010
Event: 2010 International Computer Symposium, ICS 2010 - Tainan, Taiwan
Duration: 16 Dec 2010 to 18 Dec 2010


Name: ICS 2010 - International Computer Symposium



