International Journal of Scientific & Technology Research



IJSTR >> Volume 9 - Issue 3, March 2020 Edition


Website: http://www.ijstr.org

ISSN 2277-8616

Mathematical Symbol Extraction From Document Images - A Comprehensive Review







Document Images, Document Image Analysis, Optical Character Recognition (OCR), Mathematical Symbol Extraction.



High-speed internet access has had global effects: hundreds of millions of people browse for information and multimedia, look up map directions, and interact through email, social networks, and video chat. Document images now play a vital role in digitized organizations and digital libraries. Digitization means converting paper documents into image format using digitizing equipment. Optical Character Recognition (OCR) is a document image analysis technique that converts a document image into editable text. Mathematical document identification is a distinct challenge in document image analysis: it involves identifying the mathematical symbols in a document and then classifying its regions as math or non-math based on the density of those symbols. Formulas appear in mathematical documents either as isolated formulas or embedded directly in a text line, and they have a number of features that distinguish them from conventional text. This paper presents the basic concepts of mathematical symbol recognition and its essential characteristics.
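
The density-based classification mentioned above can be illustrated with a small sketch. This is not the method of any surveyed work: the symbol set MATH_SYMBOLS, the function names, and the 0.2 threshold are assumptions chosen for illustration, and the input is assumed to be text already produced by OCR.

```python
# Illustrative only: the symbol set and threshold are assumptions,
# not values taken from the paper.
MATH_SYMBOLS = set("+-=<>×÷∑∫√±≤≥≠∞∂∏^/*")


def math_density(text: str) -> float:
    """Return the fraction of non-whitespace characters that are math symbols."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c in MATH_SYMBOLS for c in chars) / len(chars)


def classify_region(text: str, threshold: float = 0.2) -> str:
    """Label a region 'math' when its symbol density reaches the threshold."""
    return "math" if math_density(text) >= threshold else "non-math"
```

For example, classify_region("x = a + b") labels the region as math, because two of its five non-whitespace characters are symbols, while a line of ordinary prose falls below the threshold. A real system would measure density over segmented image regions rather than over character strings.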


