A Link-Based Cluster Ensemble Approach For Improved Gene Expression Data Analysis
P.Balaji, Dr. A.P.Siva Kumar
Index Terms: Clustering, Categorical data, Gene data, DNA, Ensemble Approach.
Abstract: Selecting the most suitable clustering algorithm for a given set of gene expression data is difficult, because a large number of algorithms and a large number of gene expression datasets are available. Many researchers currently prefer hierarchical clustering in its various forms, but this choice is not always optimal. Cluster ensemble research addresses this problem by automatically merging multiple data partitions, obtained from a wide range of different clusterings of any dimensions, to improve both the quality and the robustness of the clustering result. However, many existing ensemble approaches use an association matrix to condense sample-cluster and co-occurrence statistics, so relations within the ensemble are captured only at a coarse level, while the relations that exist among clusters are ignored entirely. Recovering these missing associations can greatly extend the capability of such ensemble methods for microarray data clustering. We propose a general K-means cluster ensemble approach for clustering general categorical data into the required number of partitions.
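The association matrix mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common pairwise form, in which each entry is the fraction of base partitions that place two samples in the same cluster. The function name `co_association` and the toy label lists are hypothetical and chosen only for illustration; this shows the kind of matrix that existing ensemble approaches rely on, not the method proposed here.

```python
import numpy as np

def co_association(partitions):
    """Return an n x n matrix whose (i, j) entry is the fraction of
    base partitions in which samples i and j share a cluster label."""
    labels = np.asarray(partitions)  # shape: (n_partitions, n_samples)
    # Compare every pair of samples within each partition via broadcasting.
    same = labels[:, :, None] == labels[:, None, :]
    return same.mean(axis=0)

# Three hypothetical base clusterings of four samples (toy data).
# Label values are arbitrary: partitions 1 and 2 describe the same grouping.
partitions = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
C = co_association(partitions)
print(np.round(C, 2))
```

Such a matrix records only how often pairs of samples co-occur. It carries no information about how similar two clusters from different partitions are to each other, which is the gap the abstract points to.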