Methodology for the Evaluation of the Algorithms for Text Line Segmentation Based on Extended Binary Classification

D. Brodic
Technical Faculty Bor, University of Belgrade, V. J. 12, 19210, Bor, Serbia


Text line segmentation is a key element of the optical character recognition process, so testing text line segmentation algorithms is of substantial relevance. Previously proposed testing methods rely mainly on a text database as a template, which is used both for testing and for evaluating the text segmentation algorithm. This manuscript proposes a methodology for evaluating text segmentation algorithms based on extended binary classification. It is built on various multiline text samples subjected to text segmentation, whose results are distributed according to binary classification. The final result is obtained by comparative analysis of the cross-linked data. The main advantage of the methodology is its suitability for different types of scripts.
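As a rough illustration of what distributing segmentation results according to binary classification can look like, the sketch below computes the standard binary classification measures from four outcome counts. The abstract does not specify how segmentation outcomes map to the four classes or how the classification is extended, so the mapping in the comments, the `Counts` type, and the `binary_metrics` function are all assumptions made for this example, not the author's method.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Outcome counts for one test sample (hypothetical mapping).

    tp: text lines the algorithm segmented correctly
    fp: segments reported that do not match a real text line
    fn: real text lines the algorithm failed to separate
    tn: non-line regions correctly left unsegmented
    """
    tp: int
    fp: int
    fn: int
    tn: int


def _ratio(numerator: float, denominator: float) -> float:
    # Return 0.0 instead of raising when a class is empty.
    return numerator / denominator if denominator else 0.0


def binary_metrics(c: Counts) -> dict:
    """Compute standard binary classification measures from counts."""
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)
    f_measure = _ratio(2 * precision * recall, precision + recall)
    accuracy = _ratio(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn)
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


if __name__ == "__main__":
    # Illustrative counts only; these are not results from the paper.
    sample = Counts(tp=8, fp=2, fn=2, tn=0)
    for name, value in binary_metrics(sample).items():
        print(f"{name}: {value:.3f}")
```

Computing the same measures for each test sample and each algorithm would give a table that can be compared across samples, which is one plausible reading of the comparative analysis of cross-linked data mentioned above.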


