Mathematical Symbol Extraction From Document Images - A Comprehensive Review

High-speed internet access has had global effects, as hundreds of millions of people browse for information and multimedia, look up map directions, and interact through email, social networks, and video chat. Document images now play a vital role in digitized organizations and digitized libraries. Digitization means that paper documents are converted into image format using digitizing equipment. Optical Character Recognition (OCR) is a document image analysis technique used to convert a document image into editable text. Mathematical document identification is a unique challenge in document image analysis: it involves identifying the mathematical symbols in a document and then partitioning the document into math and non-math regions based on the density of those symbols. Formulas appear in mathematical documents either as isolated formulas or embedded directly in a text line. They have a number of features that distinguish them from conventional text. This paper reviews the basic concepts of mathematical symbol recognition and its essential characteristics.



