IJSTR

International Journal of Scientific & Technology Research


IJSTR >> Volume 9 - Issue 8, August 2020 Edition




Website: http://www.ijstr.org

ISSN 2277-8616



OPTIMIZATION FOR PREDICTING MISSING DATA IN DATABASE TRANSFER PROCESSING


 

AUTHOR(S)

SUMITRA NUANMEESRI

 

KEYWORDS

CROSS-VALIDATION, MISSING DATA, OPTIMIZATION, RANDOM FOREST, RESAMPLE, SMOTE

 

ABSTRACT

THE OBJECTIVE OF THIS ARTICLE IS TO OPTIMIZE DATA FOR PREDICTING AND FILLING IN MISSING DATA WHEN DATA ARE TRANSFERRED FROM SEVERAL DATABASES TO A CENTRAL DATABASE OR A NEW DATABASE SYSTEM. THE RESULTS SHOW THAT THE RESAMPLE TECHNIQUE INCREASES THE DATASET FROM 3,190 TO 29,800 RECORDS, WHILE THE SYNTHETIC MINORITY OVERSAMPLING TECHNIQUE (SMOTE) INCREASES THE DATASET TO AS MANY AS 16,563 RECORDS, WHICH WAS GENERATED AT 1000% OF THE ORIGINAL DATASET. WHEN A MODEL FOR PREDICTING THE MISSING DATA IN THE DATABASE TRANSFER PROCESS WAS BUILT WITH THE RANDOM FOREST TECHNIQUE AND EVALUATED WITH 10-FOLD CROSS-VALIDATION, THE ACCURACY OBTAINED WITH SMOTE WAS HIGHER THAN THAT OBTAINED WITH THE RESAMPLE METHOD IN EVERY DATA RANGE. THE MODEL CAN CLASSIFY DATA TO STAND IN FOR THE VALUES MISSING DURING THE DATABASE TRANSFER PROCESS WITH MORE THAN 96% EFFICIENCY.
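
The abstract names three building blocks: resampling, SMOTE, and 10-fold cross-validation. The following is a minimal pure-Python sketch of the general idea behind each, not the author's implementation. All function names (`resample_with_replacement`, `smote`, `k_fold_indices`), the parameter defaults, and the toy data are hypothetical and chosen only for illustration; the Random Forest classifier itself is omitted for brevity.

```python
import math
import random


def resample_with_replacement(rows, n_out, seed=0):
    """Random oversampling: draw n_out rows from the originals, with replacement."""
    rng = random.Random(seed)
    return [rng.choice(rows) for _ in range(n_out)]


def smote(minority, n_new, k=5, seed=0):
    """SMOTE-style oversampling (assumes numeric features, at least 2 rows).

    Each synthetic row lies on the line segment between a randomly chosen
    minority row and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Candidate neighbours: every other minority row, nearest first.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def k_fold_indices(n_rows, k=10):
    """Split row indices 0..n_rows-1 into k disjoint test folds (round-robin)."""
    return [list(range(start, n_rows, k)) for start in range(k)]


# Toy usage with made-up data: four minority rows with two numeric features.
minority_rows = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
extra_rows = smote(minority_rows, n_new=10, k=2)
copies = resample_with_replacement(minority_rows, n_out=12)
folds = k_fold_indices(len(minority_rows) + len(extra_rows), k=10)
```

The difference between the two oversampling approaches is visible in the output: `resample_with_replacement` only repeats existing rows, while `smote` produces new rows that fall between existing minority rows. In a 10-fold evaluation, each list returned by `k_fold_indices` would serve once as the test set while the remaining nine folds are used for training.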

 
