Documents Classification Based On Deep Learning

Aalaa Abdulwahab, Hussein Attya, Yossra Hussain Ali



Topic modeling, LDA, CNN, TF-IDF, Deep learning.



A large volume of digital text is generated every day, so searching, exploring, and managing text data effectively has become a major task. Text classification has applications in sentiment analysis, subjectivity/objectivity analysis, and opinion polarity. Convolutional Neural Networks (CNNs) have gained special attention for this task because of their good performance and accuracy. Latent Dirichlet Allocation (LDA) is a classic topic model that is able to extract latent topics from high-dimensional, large-scale, multi-class textual data (large data corpora). In this paper, we present a comparison among a CNN, traditional LDA, and a modified LDA with the TF-IDF algorithm for classifying a large pool of documents, using the 20 Newsgroups data set. Experimental results show that the accuracy of the CNN (94%) is better than that of the modified LDA approach (74.4%) and traditional LDA (60%). However, the time needed to classify the data set with traditional LDA (4.04m) and modified LDA (3.02m) was less than that of the CNN model (11.52m).
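As a rough illustration of the TF-IDF weighting mentioned above, the following minimal sketch computes term weights for a toy corpus. It assumes the common definition tf(t, d) * log(N / df(t)); the exact variant and the way it is combined with LDA in the modified approach are not specified here, and the function name `tf_idf` and the example sentences are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumes weight = (count / doc length) * log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Toy corpus (hypothetical, not taken from the 20 Newsgroups data).
docs = [
    "the game was won by the home team".split(),
    "the new graphics card renders the game".split(),
    "the senate passed the bill".split(),
]
weights = tf_idf(docs)
```

Under this definition, a term that occurs in every document (such as "the" in the toy corpus) receives a weight of zero, while terms concentrated in a single document are weighted most heavily, which is why such a weighting can help a topic model focus on discriminative words.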



