International Journal of Scientific & Technology Research



IJSTR >> Volume 8 - Issue 8, August 2019 Edition


Website: http://www.ijstr.org

ISSN 2277-8616

Journey Of CFBA Variants With Advancement In Text-Mining And Subspace-Clustering




Preeti Mulay, Rahul Raghvendra Joshi



Incremental-clustering, closeness, correlation, incremental-learning, distributed algorithms, sub-space clustering, CFBA



Many data-clustering algorithms, both historical and in current use, depend on a variety of user-supplied inputs, and a wrong input from the user can degrade the quality of the resulting clusters. With the advent of the Internet of Things (IoT) in particular and information technology in general, huge amounts of data are being produced continuously in real time. Handling such data and producing quality clusters iteratively called for a parameter-free incremental-clustering algorithm. Against this background, the first Closeness-Factor-Based-Algorithm (CFBA) was introduced in 2013 and has evolved steadily since. This paper consolidates all variants of CFBA, traces their progress and their relevance in the real world, and proposes a few new variants of CFBA for text-mining and sub-space clustering. Distributed versions of CFBA have been implemented successfully on platforms such as Azure, AWS and MapReduce, to name a few.


