Segmentation of Arabic Handwritten Documents into Text Lines using Watershed Transform

Authors

  • Abdelghani Souhar, Université Ibn-Tofail
  • Youssef Boulid, Research and Development Unit, Maxware Technology, Kenitra (Morocco)
  • El Bachir Ameur, Université Ibn-Tofail
  • Mly Moustafa Ouagague, Research and Development Unit, Maxware Technology, Kenitra (Morocco)

DOI:

https://doi.org/10.9781/ijimai.2017.08.002

Keywords:

Arabic Documents, Handwritten Character Recognition, Text Mining, Text Line Segmentation, Connected Component Analysis, Projection Profile, Watershed Transform

Abstract

A crucial task in character recognition systems is the segmentation of the document into text lines, especially when the document is handwritten. With non-Latin documents such as Arabic ones, the challenge is greater: in addition to the variability of writing, the presence of diacritical points and the high number of ascender and descender characters further complicate the segmentation process. To address this complexity, and even to turn it into an advantage given that Arabic is semi-cursive in nature, a method based on the Watershed Transform is proposed. Tested on the «Handwritten Arabic Proximity Datasets», the method achieves a segmentation rate of 93% for a matching score of 95%.
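To make the idea of watershed-based line segmentation concrete, here is a minimal pure-Python sketch of a marker-controlled watershed computed by priority flooding, applied to a toy binary page. It is not the authors' pipeline: the seeding rule (one marker per dense band of the horizontal projection profile), the binary elevation map (ink at elevation 0, background at 1), the threshold `min_ink`, and the helper names `seed_rows`, `marker_watershed` and `segment_lines` are all hypothetical choices made for illustration only.

```python
import heapq
from typing import List

Grid = List[List[int]]


def seed_rows(ink: Grid, min_ink: int) -> List[int]:
    """Hypothetical seeding: the densest row of each band of rows whose
    horizontal projection (ink count per row) exceeds min_ink."""
    profile = [sum(row) for row in ink]
    seeds: List[int] = []
    best = None
    for y, count in enumerate(profile):
        if count > min_ink:
            if best is None or count > profile[best]:
                best = y
        elif best is not None:
            seeds.append(best)
            best = None
    if best is not None:
        seeds.append(best)
    return seeds


def marker_watershed(elevation: Grid, markers: Grid) -> Grid:
    """Marker-controlled watershed by priority flooding (no watershed lines).

    Pixels are flooded in increasing elevation from the labelled markers;
    each unlabelled pixel takes the label of the first basin that reaches it,
    with ties at equal elevation broken in FIFO (insertion) order."""
    h, w = len(elevation), len(elevation[0])
    labels = [row[:] for row in markers]
    heap = []
    counter = 0  # insertion counter: FIFO tie-breaker for equal elevations
    for y in range(h):
        for x in range(w):
            if labels[y][x]:
                heapq.heappush(heap, (elevation[y][x], counter, y, x))
                counter += 1
    while heap:
        _, _, y, x = heapq.heappop(heap)
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):  # 4-connectivity
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and labels[ny][nx] == 0:
                labels[ny][nx] = labels[y][x]
                heapq.heappush(heap, (elevation[ny][nx], counter, ny, nx))
                counter += 1
    return labels


def segment_lines(page: List[str], min_ink: int) -> Grid:
    """Label every ink pixel ('#') of a binary page with a text-line index."""
    ink = [[1 if c == "#" else 0 for c in row] for row in page]
    # Ink is a basin floor (elevation 0); background is a ridge (elevation 1).
    elevation = [[1 - v for v in row] for row in ink]
    markers = [[0] * len(row) for row in ink]
    for label, y in enumerate(seed_rows(ink, min_ink), start=1):
        markers[y] = [label if v else 0 for v in ink[y]]
    labels = marker_watershed(elevation, markers)
    # Report labels on ink pixels only; background becomes 0.
    return [[lab if v else 0 for lab, v in zip(lrow, irow)]
            for lrow, irow in zip(labels, ink)]


page = [
    "#####.##",  # text line 1
    "########",
    "........",
    "..#.....",  # isolated diacritic-like dot, closer to line 1
    "........",
    "........",
    "##.#####",  # text line 2
    "########",
]
for row in segment_lines(page, min_ink=4):
    print("".join(str(v) if v else "." for v in row))
```

On this toy page the seeds fall on rows 1 and 7, all ink in rows 0 and 1 receives label 1, all ink in rows 6 and 7 receives label 2, and the isolated dot in row 3 is flooded from line 1 first, so it is attached to line 1. Because ties across the background are broken only by queue order, this roughly approximates proximity; a practical system would likely use a richer elevation map and seeding strategy.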

References

Likforman-Sulem, L., Zahour, A., & Taconet, B. (2007). Text line segmentation of historical documents: a survey. International Journal on Document Analysis and Recognition, 9(2), 123-138.

Boulid, Y., Souhar, A., & Elkettani, M. Y. (2016). Segmentation approach of Arabic manuscripts text lines based on multi agent systems. International Journal of Computer Information Systems and Industrial Management, 8, 173–183.

Khayyat, M., Lam, L., Suen, C. Y., Yin, F., & Liu, C. L. (2012, March). Arabic handwritten text line extraction by applying an adaptive mask to morphological dilation. In Document Analysis Systems (DAS), 2012 10th IAPR International Workshop on (pp. 100-104). IEEE.

Nikolaou, N., Makridis, M., Gatos, B., Stamatopoulos, N., & Papamarkos, N. (2010). Segmentation of historical machine-printed documents using Adaptive Run Length Smoothing and skeleton segmentation paths. Image and Vision Computing, 28(4), 590-604.

Shi, Z., & Govindaraju, V. (2004). Line separation for complex document images using fuzzy runlength. In Document Image Analysis for Libraries, 2004. Proceedings. First International Workshop on (pp. 306-312). IEEE.

Stamatopoulos, N., Gatos, B., Louloudis, G., Pal, U., & Alaei, A. (2013, August). ICDAR 2013 handwriting segmentation contest. In Document Analysis and Recognition (ICDAR), 2013 12th International Conference on (pp. 1402-1406). IEEE.

Saabni, R., Asi, A., & El-Sana, J. (2014). Text line extraction for historical document images. Pattern Recognition Letters, 35, 23-33.

Arvanitopoulos, N., & Süsstrunk, S. (2014, September). Seam carving for text line extraction on color and grayscale historical manuscripts. In Frontiers in Handwriting Recognition (ICFHR), 2014 14th International Conference on (pp. 726-731). IEEE.

Shi, Z., Setlur, S., & Govindaraju, V. (2009, July). A steerable directional local profile technique for extraction of handwritten Arabic text lines. In Document Analysis and Recognition, 2009. ICDAR’09. 10th International Conference on (pp. 176-180). IEEE.

Pastor-Pellicer, J., Afzal, M. Z., Liwicki, M., & Castro-Bleda, M. J. (2016, April). Complete System for Text Line Extraction Using Convolutional Neural Networks and Watershed Transform. In Document Analysis Systems (DAS), 2016 12th IAPR Workshop on (pp. 30-35). IEEE.

Kumar, J., Abd-Almageed, W., Kang, L., & Doermann, D. (2010, June). Handwritten Arabic text line segmentation using affinity propagation. In Proceedings of the 9th IAPR International Workshop on Document Analysis Systems (pp. 135-142). ACM.

Kumar, J., Kang, L., Doermann, D., & Abd-Almageed, W. (2011, September). Segmentation of handwritten textlines in presence of touching components. In Document Analysis and Recognition (ICDAR), 2011 International Conference on (pp. 109-113). IEEE.

Boulid, Y., Souhar, A., & El Kettani, M. E. Y. (2016). Detection of Text Lines of Handwritten Arabic Manuscripts using Markov Decision Processes. International Journal of Interactive Multimedia and Artificial Intelligence, 4(1), 31-36.

Razak, Z., Zulkiflee, K., Idris, M. Y. I., Tamil, E. M., Noor, M. N. M., Salleh, R., ... & Yaacob, M. (2008). Off-line handwriting text line segmentation: A review. International Journal of Computer Science and Network Security, 8(7), 12-20.

Oh, K., Kim, S., Na, I., & Kim, G. (2014). Text Line Segmentation using AHTC and Watershed Algorithm for Handwritten Document Images. International Journal of Contents, 10(3), 35-40.

Meyer, F. (1994). Topographic distance and watershed lines. Signal Processing, 38(1), 113-125.

Bennasri, A., Zahour, A., & Taconet, B. (1999). Extraction des lignes d’un texte manuscrit arabe. In Vision Interface (Vol. 99, pp. 42-48).

Nicolaou, A., & Gatos, B. (2009, July). Handwritten text line segmentation by shredding text into its lines. In Document Analysis and Recognition, 2009. ICDAR’09. 10th International Conference on (pp. 626-630). IEEE.

Marti, U. V., & Bunke, H. (2001). On the influence of vocabulary size and language models in unconstrained handwritten text recognition. In Document Analysis and Recognition, 2001. Proceedings. Sixth International Conference on (pp. 260-265). IEEE.

Zahour, A., Likforman-Sulem, L., Boussellaa, W., & Taconet, B. (2007, September). Text line segmentation of historical Arabic documents. In Document Analysis and Recognition, 2007. ICDAR 2007. Ninth International Conference on (Vol. 1, pp. 138-142). IEEE.

Handwritten Arabic Proximity Datasets. Language and Media Processing Laboratory. http://lampsrv02.umiacs.umd.edu/projdb/project.php?id=65.

Boulid, Y., Souhar, A., Ameur, E. B., & Ouagague, M. M. (2017). Watershed transform for text lines extraction on binary Arabic handwritten documents. In Proceedings of BDCA’17, Tetouan, Morocco, March 29-30, 2017, 6 pages. https://doi.org/10.1145/3090354.3090444

Boulid, Y., Souhar, A., & Elkettani, M. E. (2017, in press). Multi-agent systems for Arabic handwriting recognition. International Journal of Interactive Multimedia and Artificial Intelligence. http://dx.doi.org/10.9781/ijimai.2017.03.012

Boulid, Y., Souhar, A., & Elkettani, M. E. (2017). Handwritten Character Recognition Based on the Specificity and the Singularity of the Arabic Language. International Journal of Interactive Multimedia and Artificial Intelligence, 4(4), 45-53.

Published

2017-12-01

How to Cite

Souhar, A., Boulid, Y., Ameur, E. B., and Ouagague, M. M. (2017). Segmentation of Arabic Handwritten Documents into Text Lines using Watershed Transform. International Journal of Interactive Multimedia and Artificial Intelligence, 4(6), 96–102. https://doi.org/10.9781/ijimai.2017.08.002