International Journal of Scientific & Technology Research



Volume 9 - Issue 12, December 2020 Edition


Website: http://www.ijstr.org

ISSN 2277-8616

A New Sentiment Analysis System Of Tweets Based On Machine Learning Approach




Yousef El Mourabit, Youssef El Habouz, Mustapha Lydiri, Hicham Zougagh



Neural network, Machine learning, Sentiment analysis system, Twitter



A huge amount of data is generated every second on microblogs, content-sharing sites, and social networks. Twitter is a popular microblogging platform where people voice their opinions on daily issues. Analyzing these opinions is the main concern of sentiment analysis (or opinion mining), and efficiently capturing, gathering, and analyzing sentiments has been challenging for researchers. To address these challenges, this paper proposes a highly accurate model for sentiment analysis of tweets. Using the CrowdFlower dataset, we first preprocessed the data (replacing missing values, denoising, tokenization, stemming, etc.). We then applied a semantic model with Term Frequency-Inverse Document Frequency (TF-IDF) weighting to represent the data. In the measurement and evaluation step, we applied four machine learning algorithms: Naive Bayes, K-Nearest Neighbors, a Long Short-Term Memory (LSTM) neural network, and Support Vector Machine. Based on the results, we built a highly efficient prediction model in Python, trained and evaluated it using the most relevant evaluation metrics in this field, and then tested it on a set of unclassified tweets to predict the sentiment class of each tweet. Experimental results demonstrate that our model achieves high accuracy compared with the other models.
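The preprocessing and TF-IDF representation steps summarized above can be illustrated with a short, standard-library-only Python sketch. It is not the authors' implementation, and several pieces are assumptions made for this example: the regular expressions used for denoising, the crude suffix stripper `naive_stem` (a hypothetical stand-in for a real stemmer such as Porter's), the TF-IDF variant (term count divided by tweet length, times `log(N / df)`), and the three invented tweets.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Optional

# Denoising patterns chosen for this sketch (assumptions, not the paper's rules).
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")

def naive_stem(token: str) -> str:
    """Hypothetical suffix stripper standing in for a real stemmer (e.g. Porter)."""
    for suffix in ("ing", "ed", "ly", "s"):
        # Only strip when at least four characters of stem remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 4:
            return token[: -len(suffix)]
    return token

def preprocess(tweet: Optional[str]) -> List[str]:
    """Replace a missing value, denoise, tokenize and stem one tweet."""
    text = (tweet or "").lower()          # missing tweet -> empty string
    text = URL_RE.sub(" ", text)          # drop links
    text = MENTION_RE.sub(" ", text)      # drop @user mentions
    text = NON_ALPHA_RE.sub(" ", text)    # drop digits, punctuation and '#'
    return [naive_stem(tok) for tok in text.split()]

def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Weight each term by (count / tweet length) * log(N / document frequency)."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1            # avoid dividing by zero on empty tweets
        vectors.append({term: (c / length) * math.log(n_docs / df[term])
                        for term, c in counts.items()})
    return vectors

# Three made-up tweets (illustrative only, not from the CrowdFlower dataset).
tweets = [
    "Loving the new update! http://t.co/abc @support",
    "The updates crashed my phone, worst update ever",
    None,
]
docs = [preprocess(t) for t in tweets]
vectors = tf_idf(docs)
for doc, vec in zip(docs, vectors):
    print(doc, {t: round(w, 3) for t, w in vec.items()})
```

With `idf = log(N / df)`, a term that occurs in every tweet gets weight zero. In practice a library vectorizer such as scikit-learn's `TfidfVectorizer` would more likely be used; its default smoothed idf differs from the plain formula shown here.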

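Of the four classifiers named in the abstract, Naive Bayes is the simplest to show end to end. The following is a minimal sketch, assuming a multinomial Naive Bayes with add-one (Laplace) smoothing over raw token counts, together with accuracy and per-class precision, recall, and F1. The function names `train_nb`, `predict_nb`, and `evaluate`, the `positive`/`negative` labels, and the toy tweets are all hypothetical; the paper's actual features, model settings, and metric definitions may differ.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model on tokenized documents."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)        # class -> token -> count
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = {tok for tokens in docs for tok in tokens}
    return class_counts, word_counts, vocab

def predict_nb(model, tokens):
    """Return the class with the highest log posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        total = sum(word_counts[label].values())
        score = math.log(count / n_docs)      # log prior
        for tok in tokens:
            if tok in vocab:                  # ignore words never seen in training
                score += math.log((word_counts[label][tok] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def evaluate(y_true, y_pred, positive):
    """Accuracy plus precision, recall and F1 for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy, made-up training data (hypothetical; not from the CrowdFlower dataset).
train_docs = [
    "love this phone great battery".split(),
    "great service very happy".split(),
    "terrible update phone crashed".split(),
    "worst service very angry".split(),
]
train_labels = ["positive", "positive", "negative", "negative"]
model = train_nb(train_docs, train_labels)

test_docs = ["happy with great battery".split(), "angry phone crashed again".split()]
test_labels = ["positive", "negative"]
predictions = [predict_nb(model, d) for d in test_docs]
metrics = evaluate(test_labels, predictions, positive="positive")
print(predictions, metrics)
```

The `evaluate` helper only looks at true and predicted labels, so the same function could score predictions from K-Nearest Neighbors, an SVM, or an LSTM in a comparison like the one described in the abstract.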