Improving Degraded Document Images Using Binarization Technique
Sayali Shukla, Ashwini Sonawane, Vrushali Topale, Pooja Tiwari
Keywords: Binarization, Adaptive Image Contrast, Local Image Contrast, Local Image Gradient, Detection of Text Stroke Edges, Pixel Classification, Thresholding.
Abstract: Image segmentation partitions an image into a set of segments that collectively cover the entire image, or extracts a set of contours from it. In improving degraded document images, segmentation is one of the most difficult tasks because of variation in both the background and the foreground. This paper presents a new approach for the enhancement of degraded documents. It consists of an adaptive image contrast based document image binarization technique that is tolerant to different types of document degradation, such as uneven illumination, document smear involving smudging of text, seeping of ink to the other side of the page, and degradation of the paper and ink due to aging. Scanned copies of the degraded documents are provided as input to the system and processed so that the contents of the improved document are clearly visible and readable. First, a contrast image is constructed from the local image gradient and the local image contrast. An edge estimation algorithm is then used to identify the text stroke edge pixels. Finally, the text within the document is segmented by a thresholding technique that is based on the height and width of the letters present in the degraded document image. The approach works for different formats of degraded document images. The method has been tested in experiments on the Document Image Binarization Contest (DIBCO) datasets and on the Bickley diary dataset, which consists of several challenging degraded document images.
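The three stages named in the abstract (contrast image construction, stroke edge detection, local thresholding) can be illustrated with a minimal, dependency-free sketch. The abstract does not give the formulas, so several details here are assumptions rather than the authors' implementation: the weight `alpha` derived from the global intensity standard deviation, the use of Otsu's threshold on the contrast map to pick stroke edge pixels, and the rule that a pixel is text when it is no brighter than the mean plus half the standard deviation of nearby edge pixels. The names `neighbours`, `half`, `gamma` and `n_min` are hypothetical, and `half` merely stands in for a neighbourhood size that would be derived from the letter size.

```python
from statistics import pstdev


def neighbours(img, r, c, half):
    """Coordinates of the (2*half+1)^2 window around (r, c), clipped at borders."""
    rows, cols = len(img), len(img[0])
    return [(i, j)
            for i in range(max(0, r - half), min(rows, r + half + 1))
            for j in range(max(0, c - half), min(cols, c + half + 1))]


def adaptive_contrast(img, half=1, gamma=1.0, eps=1e-6):
    """Blend local contrast and local gradient (assumed weighting)."""
    flat = [v for row in img for v in row]
    alpha = min(1.0, (pstdev(flat) / 128.0) ** gamma)
    out = []
    for r in range(len(img)):
        out_row = []
        for c in range(len(img[0])):
            vals = [img[i][j] for i, j in neighbours(img, r, c, half)]
            hi, lo = max(vals), min(vals)
            contrast = (hi - lo) / (hi + lo + eps)   # local image contrast
            gradient = (hi - lo) / 255.0             # local image gradient, scaled to [0, 1]
            out_row.append(alpha * contrast + (1 - alpha) * gradient)
        out.append(out_row)
    return out


def otsu_threshold(values, bins=256):
    """Histogram-based Otsu threshold for a list of floats."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return hi
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(bins - 1, int((v - lo) / width))] += 1
    total = len(values)
    total_sum = sum(k * h for k, h in enumerate(hist))
    best_k, best_var, w0, sum0 = 0, -1.0, 0, 0
    for k, h in enumerate(hist):
        w0 += h
        sum0 += k * h
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        var = w0 * w1 * (sum0 / w0 - (total_sum - sum0) / w1) ** 2
        if var > best_var:
            best_k, best_var = k, var
    return lo + (best_k + 1) * width   # upper edge of the last "low" bin


def binarize(img, half=4, n_min=None):
    """Return a boolean mask (True = text) for a grayscale image given as nested lists."""
    n_min = 2 * half + 1 if n_min is None else n_min
    contrast = adaptive_contrast(img)
    t = otsu_threshold([v for row in contrast for v in row])
    result = []
    for r in range(len(img)):
        row_out = []
        for c in range(len(img[0])):
            # Intensities of detected stroke edge pixels near (r, c).
            edge_vals = [img[i][j] for i, j in neighbours(img, r, c, half)
                         if contrast[i][j] > t]
            is_text = False
            if len(edge_vals) >= max(1, n_min):
                e_mean = sum(edge_vals) / len(edge_vals)
                is_text = img[r][c] <= e_mean + pstdev(edge_vals) / 2
            row_out.append(is_text)
        result.append(row_out)
    return result
```

Calling `binarize(img)` on a grayscale image stored as a list of rows (values 0 to 255) returns a mask of the same shape. The nested loops keep the sketch readable but are slow on full-page scans, where an array library would be the practical choice.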