%0 Journal Article
%D 2006
%T iVIBRATE: Interactive Visualization Based Framework for Clustering Large Datasets
%A Keke Chen
%A Ling Liu
%X With continued advances in communication network technology and sensing technology, there is an astounding growth in the amount of data produced and made available through cyberspace. Efficient and high-quality clustering of large datasets continues to be one of the most important problems in large-scale data analysis. A commonly used methodology for cluster analysis on large datasets is the three-phase framework of 'sampling/summarization - iterative cluster analysis - disk-labeling'. There are three known problems with this framework, which demand effective solutions. The first problem is how to effectively define and validate irregularly shaped clusters, especially in large datasets. Automated algorithms and statistical methods are typically not effective in handling such clusters. The second problem is how to effectively label the entire dataset on disk (disk-labeling) without introducing additional errors, including solutions for dealing with outliers, irregular clusters, and cluster boundary extension. The third problem is the lack of research on how to effectively integrate the three phases. In this paper, we describe iVIBRATE, an interactive-visualization based three-phase framework for clustering large datasets. The two main components of iVIBRATE are its VISTA visual cluster rendering subsystem, which brings the human into the large-scale iterative clustering process through interactive visualization, and its Adaptive ClusterMap Labeling subsystem, which offers visualization-guided disk-labeling solutions that are effective in dealing with outliers, irregular clusters, and cluster boundary extension.
Another important contribution of the iVIBRATE development is the identification of the special issues that arise in integrating the two components and the sampling approach into a coherent framework, together with solutions that improve the reliability of the framework and minimize the number of errors generated throughout the cluster analysis process. We study the effectiveness of the iVIBRATE framework through a walkthrough example on a dataset of a million records and experimentally evaluate the iVIBRATE approach using both real-life and synthetic datasets. Our results show that iVIBRATE can efficiently involve the user in the clustering process and generate high-quality clustering results for large datasets.
%G eng