International Journal of Scientific & Technology Research



IJSTR >> Volume 10 - Issue 5, May 2021 Edition


Website: http://www.ijstr.org

ISSN 2277-8616

Modeling Of Arabic Language For Authorship Identification




Heba M. Khalil, Ahmed Taha, Tarek El-shishtawy



Forensic authorship authentication, Stylometric features, Ensemble methods.



With the vast volume of data processed in digital form today, both the need for and the capability of analysing this data for forensic authorship authentication have increased. Research has concentrated on English, Spanish, and German. The Arabic language has received less attention from the academic community because of the difficulty and length of Arabic sentences. This article provides a set of stylometric features derived from studying the parts of speech of many articles, including adjective ratio, sentence length, conjunctions, and others. These features are classified into two categories: statistical features and linguistic features. This research proposes the AdaBoost and Bagging ensemble approaches to maximise predictive performance on Arabic articles by combining multiple learners. The results indicate that the Bagging model achieves an average accuracy of 91.5%, while the AdaBoost model achieves the highest accuracy of 93.6%.
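
As an illustration of the statistical feature category, the following minimal Python sketch computes three simple measures for one article: average sentence length, the ratio of conjunctions to words, and average word length. It is not the authors' implementation. The function name `stylometric_features`, the conjunction list, and the punctuation handling are all assumptions made for this example, and only standalone conjunction tokens are counted (a conjunction written as a prefix attached to the next word is missed). Linguistic features such as the adjective ratio would need a part-of-speech tagger and are not covered here.

```python
import re

# Hypothetical, deliberately small list of standalone Arabic conjunctions.
# Assumption: the abstract does not give the list used in the paper.
CONJUNCTIONS = {"و", "أو", "ثم", "لكن", "بل", "أم"}

# Sentence boundaries: Latin full stop, exclamation mark, and both
# Latin and Arabic question marks.
SENTENCE_END = re.compile(r"[.!?؟]+")

# Punctuation stripped from the edges of each token (Arabic comma and
# semicolon, plus common Latin marks).
EDGE_PUNCT = "،؛:,;\"'()"


def stylometric_features(text):
    """Return a few statistical stylometric features for one article."""
    # Split into sentences and drop empty fragments.
    sentences = [s for s in SENTENCE_END.split(text) if s.strip()]

    # Whitespace tokenisation; tokens that are pure punctuation vanish.
    words = []
    for sentence in sentences:
        for token in sentence.split():
            token = token.strip(EDGE_PUNCT)
            if token:
                words.append(token)

    n_sentences = len(sentences)
    n_words = len(words)
    n_conjunctions = sum(1 for w in words if w in CONJUNCTIONS)

    return {
        "avg_sentence_length": n_words / n_sentences if n_sentences else 0.0,
        "conjunction_ratio": n_conjunctions / n_words if n_words else 0.0,
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
    }


if __name__ == "__main__":
    sample = "ذهب الولد إلى المدرسة ثم عاد. قرأ كتابا أو مجلة؟"
    print(stylometric_features(sample))
```

One such dictionary per article, turned into a numeric vector, is the kind of input an ensemble classifier such as Bagging or AdaBoost could then be trained on.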


