VIETNAMESE TEXT EXTRACTION FROM BOOK COVERS

Authors

  • Phan Thị Thanh Nga, Faculty of Information Technology, Dalat University
  • Nguyễn Thị Huyền Trang, Faculty of Information Technology, Dalat University
  • Nguyễn Văn Phúc, Devsoft Company
  • Thái Duy Quý, The Research Management and International Cooperation Department, Dalat University
  • Võ Phương Bình, Faculty of Information Technology, Dalat University

DOI:

https://doi.org/10.37569/DalatUniversity.7.2.234(2017)

Keywords:

Book cover, OCR (Optical Character Recognition), Text information extraction, Vietnamese text detection.

Abstract

Automatic information extraction from images reduces cost, human intervention, and processing time. Converting printed book covers to machine-readable text for later automated processing would be useful to a wide range of users, such as librarians, bookshop keepers, and individuals. In this paper, we present a novel method for extracting Vietnamese text from images of scanned book covers. The proposed system accepts a snapshot of a book cover, filters the input image to enhance its quality, locates the regions containing text, and then uses an optical character recognizer (OCR) to extract the text. The last step filters the extracted text against a dictionary to produce the final text result. Experiments with the proposed system on our dataset delivered encouraging results.
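
The abstract does not specify how the dictionary step works, so the following is only a minimal sketch of one plausible approach, not the authors' implementation. It assumes that OCR output is split on whitespace, that each token is compared against a word list by string similarity, and that tokens with no sufficiently similar entry are discarded as noise. The word list LEXICON, the function names, and the 0.6 similarity cutoff are all hypothetical choices made for illustration.

```python
import difflib
import unicodedata


def normalize(token: str) -> str:
    """Lowercase and compose diacritics (NFC) so visually equal strings compare equal."""
    return unicodedata.normalize("NFC", token).lower()


# Hypothetical mini-lexicon for illustration only; a real system would
# load a full Vietnamese word list from a file.
LEXICON = [normalize(w) for w in (
    "lịch", "sử", "việt", "nam", "toán", "học",
    "văn", "nhà", "xuất", "bản", "giáo", "dục",
)]


def correct_token(token, lexicon=LEXICON, cutoff=0.6):
    """Return the token if it is in the lexicon, else the closest entry,
    or None when nothing reaches the similarity cutoff (assumed value)."""
    word = normalize(token)
    if word in lexicon:
        return word
    matches = difflib.get_close_matches(word, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def filter_ocr_text(raw, lexicon=LEXICON, cutoff=0.6):
    """Keep only tokens that match or can be corrected to a lexicon entry."""
    kept = []
    for token in raw.split():
        fixed = correct_token(token, lexicon, cutoff)
        if fixed is not None:
            kept.append(fixed)
    return " ".join(kept)


if __name__ == "__main__":
    # "Viêt" lacks the dot-below tone mark; "###" stands in for OCR noise.
    print(filter_ocr_text("Lịch sử Viêt Nam ###"))
```

In this sketch a misrecognized tone mark such as "Viêt" is mapped back to the lexicon entry "việt", while a token with no similar entry is dropped. Normalizing to NFC first matters for Vietnamese because the same accented letter can be encoded either as one code point or as a base letter plus combining marks.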

Published

28-06-2017

Section

Natural Sciences and Technology

How to Cite

Nga, P. T. T., Trang, N. T. H., Phúc, N. V., Quý, T. D., & Bình, V. P. (2017). VIETNAMESE TEXT EXTRACTION FROM BOOK COVERS. Dalat University Journal of Science, 7(2), 142-152. https://doi.org/10.37569/DalatUniversity.7.2.234(2017)