International Journal of Scientific & Technology Research

Volume 9, Issue 2, February 2020 Edition

Website: http://www.ijstr.org

ISSN 2277-8616

Documents Classification Based On Deep Learning

Aalaa Abdulwahab, Hussein Attya, Yossra Hussain Ali



Keywords: Topic modeling, LDA, CNN, TF-IDF, Deep learning.



Every day a large amount of digital text is generated, so searching, exploring, and managing text data effectively has become a major task. Text classification has applications in sentiment analysis, subjectivity/objectivity analysis, and opinion polarity. Convolutional neural networks (CNNs) have gained special attention in this area because of their good performance and accuracy. Latent Dirichlet Allocation (LDA) is a classic topic model that can extract latent topics from high-dimensional, large-scale, multi-class textual data (a large corpus). In this paper, we compare a CNN, traditional LDA, and a modified LDA combined with the TF-IDF algorithm for classifying a large pool of documents, using the 20 Newsgroups data set. Experimental results show that the CNN achieves higher accuracy (94%) than the modified LDA approach (74.4%) and traditional LDA (60%). However, classifying the data set took 4.04 min with traditional LDA and 3.02 min with modified LDA, both less than the 11.52 min required by the CNN model.
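The abstract does not say exactly how TF-IDF is combined with LDA in the modified approach. As an illustration only, the sketch below computes classic TF-IDF weights (tf = term count divided by document length, idf = log(N/df)) in plain Python and feeds them into a hypothetical `prune_vocabulary` step that keeps each document's highest-weighted terms before topic modeling. The weighting variant, the function names, the pruning step, and the toy corpus are all assumptions, not the authors' method.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant (the paper does not specify one):
      tf(t, d) = count of t in d / number of tokens in d
      idf(t)   = log(N / df(t)), with df(t) = number of documents containing t
    """
    n_docs = len(docs)
    # Document frequency: count each term at most once per document.
    df = Counter()
    for doc in docs:
        df.update(set(doc))

    weights = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)
        weights.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def prune_vocabulary(docs, weights, keep=2):
    """Hypothetical pre-processing step: keep only each document's `keep`
    highest-weighted terms (ties broken alphabetically), preserving token order."""
    pruned = []
    for doc, w in zip(docs, weights):
        top = set(sorted(w, key=lambda t: (-w[t], t))[:keep])
        pruned.append([tok for tok in doc if tok in top])
    return pruned


# Toy corpus standing in for 20 Newsgroups posts (illustrative only).
docs = [
    "the gpu renders the frame".split(),
    "the team won the game".split(),
    "the gpu driver crashed".split(),
]
weights = tf_idf(docs)
pruned = prune_vocabulary(docs, weights, keep=2)
print(pruned)  # [['renders', 'frame'], ['team', 'game'], ['driver', 'crashed']]
```

In this variant a word such as "the" that occurs in every document gets idf = log(1) = 0 and is dropped first, which is one common reason for pairing TF-IDF weighting with a bag-of-words topic model like LDA.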