VIETNAMESE TEXT EXTRACTION FROM BOOK COVERS
DOI:
https://doi.org/10.37569/DalatUniversity.7.2.234(2017)
Keywords:
Book cover, OCR (Optical Character Recognition), Text information extraction, Vietnamese text detection.
Abstract
Automatic information extraction from images reduces cost and human intervention and speeds up processing. Converting printed book covers to machine-readable text for later automated processing would be useful to a wide range of users, such as librarians, bookshop keepers, and individuals. In this paper, we present a novel method for extracting Vietnamese text from images of scanned book covers. The proposed system accepts a snapshot of a book cover, filters the input image to enhance its quality, locates the regions containing text, and then uses an optical character recognizer (OCR) to extract the text. The last step filters the extracted text against a dictionary to produce the final text result. Experiments with the proposed system on our dataset delivered encouraging results.
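The abstract does not specify how the final dictionary filter works, so the following is only a minimal sketch of one plausible approach: each OCR token is mapped to its closest dictionary word by string similarity, and tokens with no close match are discarded as noise. The function name `filter_with_dictionary`, the `cutoff` value, and the sample word list are assumptions made for illustration and are not taken from the paper.

```python
import difflib
import unicodedata


def _nfc(text):
    # Vietnamese diacritics can be encoded in composed or decomposed form;
    # normalizing to NFC makes visually identical strings compare equal.
    return unicodedata.normalize("NFC", text)


def filter_with_dictionary(tokens, dictionary, cutoff=0.6):
    """Replace each OCR token with its closest dictionary word.

    Tokens with no dictionary word at or above the similarity cutoff are
    dropped. This is a hypothetical filter, not the authors' method.
    """
    # Map lowercase normalized form -> normalized dictionary word.
    vocab = {_nfc(word).lower(): _nfc(word) for word in dictionary}
    keys = list(vocab)
    cleaned = []
    for token in tokens:
        key = _nfc(token).lower()
        if key in vocab:
            # Exact hit: keep the dictionary spelling.
            cleaned.append(vocab[key])
            continue
        # Fuzzy lookup: best single candidate above the cutoff, if any.
        match = difflib.get_close_matches(key, keys, n=1, cutoff=cutoff)
        if match:
            cleaned.append(vocab[match[0]])
    return cleaned


if __name__ == "__main__":
    # Hypothetical OCR output: "giá0" has a digit misread for "o",
    # and "#@" is noise from the cover artwork.
    ocr_tokens = ["sách", "giá0", "khoa", "#@"]
    word_list = ["sách", "giáo", "khoa", "toán"]
    print(filter_with_dictionary(ocr_tokens, word_list))
```

A similarity-based lookup tolerates single-character OCR confusions, but the cutoff trades recall against false corrections and would need tuning on real book-cover data.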
Copyright & License
Copyright (c) 2017 Phan Thị Thanh Nga, Nguyễn Thị Huyền Trang, Nguyễn Văn Phúc, Thái Duy Quý, Võ Phương Bình
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.