International Journal of Scientific & Technology Research

Scopus coverage:
Nov 2018 to May 2020


IJSTR >> Volume 9 - Issue 1, January 2020 Edition


Website: http://www.ijstr.org

ISSN 2277-8616

Analytics For Healthcare Using Hadoop Mapreduce, Apache Spark And In Cloud Services




Dr. K. Sharmila, Dr. T. Kamalakannan



AWS, Big data, Cloud computing, Diabetes Mellitus, Hadoop MapReduce, K-means, SVM algorithm, Spark.



Decision making and knowledge discovery from voluminous big data are challenging problems. Extracting useful information from enormous amounts of data is complex, difficult, and time consuming. Standard data mining algorithms are therefore essential for analyzing big data on different platforms. This investigation focuses on benchmarking parallel processing platforms and a cloud computing environment. Cloud computing has emerged as a service-oriented computing model that delivers infrastructure, platforms, and applications as services from providers to consumers. This study used the services provided by Amazon Web Services as an effective means of managing large-scale data processing with elastically scalable computing and storage. The paper also discusses a MapReduce framework integrated with the K-means and SVM machine learning algorithms, run in a standalone environment and on Spark, to predict diabetes-related diseases from a real-time data set collected in various districts of Tamil Nadu. The study found that parallelization using Apache Hadoop with Spark performs better than a standalone model on a single machine. With the expansion of information and communication technology, the health care industry is also producing very large volumes of data every day. In developing countries like India, the volume of accumulated data is large and brings various problems. This type of big data analysis will hopefully help diabetes patients and physicians to predict the disease and to treat it at an early stage.
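
The abstract does not show how K-means is expressed as a MapReduce job, so the following is only a minimal sketch of one common formulation, not the authors' implementation. It assumes a map step that assigns each record to its nearest centroid and a reduce step that averages the records in each group. The function names (`map_assign`, `reduce_mean`, `kmeans_iteration`) and the toy two-dimensional points are hypothetical and chosen for illustration; they are not taken from the paper or its data set. The sketch runs in a single Python process, whereas on Hadoop or Spark the same two steps would be distributed across workers.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, available in Python 3.8+


def map_assign(point, centroids):
    """Map step (assumed form): emit (nearest centroid index, (point, 1))."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, (point, 1)


def reduce_mean(values):
    """Reduce step (assumed form): average the points that share one key."""
    count = sum(n for _, n in values)
    dims = len(values[0][0])
    return tuple(sum(p[d] for p, _ in values) / count for d in range(dims))


def kmeans_iteration(points, centroids):
    """Run one map/shuffle/reduce pass and return the updated centroids."""
    groups = defaultdict(list)  # stands in for the shuffle phase
    for point in points:
        key, value = map_assign(point, centroids)
        groups[key].append(value)
    # A centroid with no assigned points is left where it was.
    return [
        reduce_mean(groups[i]) if i in groups else tuple(centroids[i])
        for i in range(len(centroids))
    ]


if __name__ == "__main__":
    # Hypothetical toy data, not drawn from the study.
    toy_points = [(1.0, 1.0), (1.0, 3.0), (9.0, 9.0), (11.0, 9.0)]
    toy_centroids = [(0.0, 0.0), (10.0, 10.0)]
    print(kmeans_iteration(toy_points, toy_centroids))
```

In practice the iteration would be repeated until the centroids stop moving. Because each map call depends only on one record and the current centroids, the assignment step parallelizes naturally, which is the property a Hadoop or Spark deployment exploits.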


