International Journal of Scientific & Technology Research



IJSTR >> Volume 8 - Issue 7, July 2019 Edition


Website: http://www.ijstr.org

ISSN 2277-8616

What Affects K Value Selection In K-Nearest Neighbor




Iman Paryudi



Data Mining, Classification, k-Nearest Neighbor, k value selection, 2-class data sets, n-class data sets



k-Nearest Neighbor is a popular classifier that has been applied in many fields. One problem with this classifier is the choice of the k value. Different k values can have a large impact on the predictive accuracy of the algorithm, and a good value is generally hard to guess by inspecting the data set. Because of this difficulty, it has been suggested to use variable k values instead of a single static k value, so that the number of nearest neighbors (k value) selected for each category adapts to that category's sample size in the training set. There are many ways of choosing the k value; a simple one is to run the algorithm many times with different k values and choose the one with the best performance. However, this method takes a long time when many k values have to be tried. To address this problem, this paper presents the results of our experiment on which data properties affect the choice of k value. The experiment yields two interesting results. First, in 2-class data sets, large data sets with more than 8000 instances (Mush, MGT, SS, Adu, and BM) are associated with small best k values. Second, in n-class data sets, data sets with numerical attributes are associated with small best k values.
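
The simple selection procedure mentioned in the abstract (run the classifier with several k values and keep the best one) can be sketched as follows. This is only an illustrative sketch, not the experimental setup of the paper: the Euclidean distance, the leave-one-out evaluation, the tie-breaking rule, the toy data, and the function names `knn_predict`, `loo_accuracy`, and `best_k` are all assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Predict a label for `query` by majority vote among its k nearest neighbors.

    `train` is a list of (features, label) pairs. Euclidean distance is an
    assumption of this sketch; the paper's distance measure is not stated here.
    """
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    # On a tied vote, the label seen first (i.e. from the closest neighbor) wins.
    return votes.most_common(1)[0][0]


def loo_accuracy(data, k):
    """Leave-one-out accuracy: classify each instance using all the others."""
    correct = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if knn_predict(rest, features, k) == label:
            correct += 1
    return correct / len(data)


def best_k(data, candidates):
    """Evaluate every candidate k and return (best k, accuracy per k).

    Ties in accuracy are broken in favor of the smaller k (an assumption).
    """
    scores = {k: loo_accuracy(data, k) for k in candidates}
    chosen = min(scores, key=lambda k: (-scores[k], k))
    return chosen, scores


# Hypothetical toy data: two well-separated classes with three instances each.
data = [
    ((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
    ((1.0, 1.0), "b"), ((0.9, 1.1), "b"), ((1.1, 0.9), "b"),
]

chosen_k, scores = best_k(data, [1, 3, 5])
print(chosen_k, scores)
```

Each candidate k needs a full evaluation pass over the data, so the cost grows linearly with the number of candidates; this is the running-time concern that motivates looking for data properties that predict a good k.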


