008    2008 eng d
245 00 $a Best K: the Critical Clustering Structures in Categorical Data
210 0# $a Best K the Critical Clustering Structures in Categorical Data
520 3# $a The demand for cluster analysis of categorical data has grown steadily over the last decade. A well-known problem in categorical clustering is determining the best number of clusters, K. Although several categorical clustering algorithms have been developed, surprisingly, none has satisfactorily addressed the Best K problem for categorical clustering. Because categorical data lack an inherent distance function to serve as a similarity measure, traditional cluster validation techniques based on geometric shapes and density distributions are not appropriate for them. In this paper, we study the entropy properties of clustering results obtained on categorical data with different numbers of clusters K, and propose the BKPlot method to address three important cluster validation problems: 1) How can we determine whether a categorical dataset has significant clustering structure? 2) If it does, what is the set of candidate 'best Ks'? 3) If the dataset is large, how can we determine the best Ks efficiently and reliably?
100 1# $a Liu, Ling
700 1# $a Chen, Keke
856 ## $u http://knoesis.org/node/1478
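The abstract does not describe how BKPlot itself works. As a rough illustration only, the Python sketch below assumes a commonly used expected-entropy criterion for a categorical clustering (the size-weighted average, over clusters, of the summed per-attribute entropies) and flags candidate Ks where the second-order difference of that entropy across consecutive K peaks, i.e. an "elbow" in the entropy curve. The function names `expected_entropy` and `second_differences`, the toy records, and the hand-assigned labelings are hypothetical; this is not the authors' BKPlot implementation.

```python
from collections import Counter
from math import log2

# ASSUMED criterion (not stated in the record): expected entropy of a
# clustering = sum over clusters of (cluster size / n) * sum over attributes
# of that attribute's value entropy (in bits) inside the cluster.
def expected_entropy(records, labels):
    n = len(records)
    clusters = {}
    for rec, lab in zip(records, labels):
        clusters.setdefault(lab, []).append(rec)
    total = 0.0
    for members in clusters.values():
        m = len(members)
        h = 0.0
        for j in range(len(members[0])):
            counts = Counter(r[j] for r in members)
            h -= sum((c / m) * log2(c / m) for c in counts.values())
        total += (m / n) * h
    return total

# entropies[i] is the expected entropy for K = i + 1.
# Second-order difference at K: E(K-1) - 2*E(K) + E(K+1). A large positive
# value means a big entropy drop up to K and little drop after it (an elbow).
def second_differences(entropies):
    return {
        k: entropies[k - 2] - 2 * entropies[k - 1] + entropies[k]
        for k in range(2, len(entropies))
    }

# Hypothetical toy data: two obvious groups over two categorical attributes.
records = [("a", "x")] * 3 + [("b", "y")] * 3
# Hand-assigned labelings for K = 1..4; a real study would instead run a
# categorical clustering algorithm once per K.
labelings = {
    1: [0, 0, 0, 0, 0, 0],
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 1, 2, 2, 2],
    4: [0, 1, 2, 3, 3, 3],
}
entropies = [expected_entropy(records, labelings[k]) for k in sorted(labelings)]
diffs = second_differences(entropies)
best = max(diffs, key=diffs.get)
print(entropies)  # [2.0, 0.0, 0.0, 0.0]
print(diffs)      # {2: 2.0, 3: 0.0}
print(best)       # 2
```

On this toy data the only peak is at K = 2, matching the two groups built into the records. On real data one would read off the highest peaks as candidate best Ks; whether BKPlot uses exactly this entropy criterion is not stated in the record.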