International Journal of Scientific & Technology Research



Volume 6, Issue 6, June 2017 Edition


Website: http://www.ijstr.org

ISSN 2277-8616

A Framework To Support Management Of HIV/AIDS Using K-Means And Random Forest Algorithm




Gladys Iseu, Waweru Mwangi, Dr. Michael Kimwele



Clustering, Classification, K-Means, Random Forest, Data Mining, Big Data



The healthcare industry generates large amounts of complex data about patients, hospital resources, disease management, electronic patient records, and medical devices, among others. The availability of this data creates a need for powerful mining tools to support health care professionals in the diagnosis, treatment and management of HIV/AIDS. Several data mining techniques have been applied to different data sets; they are commonly categorized into regression, segmentation, association, sequence analysis and classification algorithms. In the medical field, no specific study has combined two or more data mining algorithms, which limits the level of decision making available to medical practitioners. This study identified the extent to which the K-means algorithm clusters patient characteristics, evaluated the extent to which the random forest algorithm can classify the data for informed decision making, and designed a framework to support medical decision making in the treatment of HIV/AIDS-related diseases in Kenya. The paper further used the random forest classification algorithm to compute proximities between pairs of cases, which can be used for clustering, for locating outliers or (by scaling) for giving interesting views of the data.
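The combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the use of scikit-learn, the synthetic two-feature data, the choice of two clusters, and the simplified outlier score are all assumptions made for illustration. K-means first groups the records, a random forest is then trained to classify records into those groups, and the proximity between two cases is taken as the fraction of trees in which both cases fall in the same leaf.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for patient records (hypothetical features, not study data):
# column 0 ~ a lab measurement, column 1 ~ age.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=[200.0, 30.0], scale=[40.0, 5.0], size=(50, 2))
group_b = rng.normal(loc=[600.0, 45.0], scale=[40.0, 5.0], size=(50, 2))
X = np.vstack([group_a, group_b])

# Step 1: K-means groups the records into k clusters (k=2 is an assumption).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)

# Step 2: a random forest learns to classify records into those clusters.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, cluster_labels)

# Step 3: proximity of two cases = fraction of trees in which they share a leaf.
leaves = forest.apply(X)  # shape (n_samples, n_trees): leaf index per tree
proximity = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)

# Simplified outlier score (an assumption, not Breiman's class-wise measure):
# cases with low average proximity to all other cases are outlier candidates.
mean_proximity = (proximity.sum(axis=1) - 1.0) / (len(X) - 1)
outlier_candidates = np.argsort(mean_proximity)[:5]
print(proximity.shape, outlier_candidates)
```

The proximity matrix is symmetric with ones on the diagonal, so `1 - proximity` can serve as a dissimilarity for further clustering or for a scaled low-dimensional view of the cases.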


