Computing Reviews
Research on text location and recognition in natural images with deep learning
Zhang P., Shi Z., Gao H. ICAAI 2018 (Proceedings of the 2nd International Conference on Advances in Artificial Intelligence, Barcelona, Spain, Oct 6-8, 2018), 1-6, 2018. Type: Proceedings
Date Reviewed: Jan 14 2021

A technical account, this paper reports on experiments carried out with combinations of deep learning techniques. The purpose of the research is to explore and evaluate methods for text location and recognition in natural images.

Two approaches are attempted: 1) convolutional neural networks (CNN), recurrent neural networks (RNN), and connectionist temporal classification (CTC); and 2) CNN, RNN, and attention mechanisms. The text location method uses faster region-based convolutional neural networks (Faster R-CNN) and Mask R-CNN. Several text recognition methods are presented: 1) DictNet and CharNet, both CNN-based methods; 2) a CNN+RNN+CTC method with two RNN layers; and 3) a CNN+RNN+attention-based method that includes two CNN models drawn from an Inception-v3 network. Different endpoints in that network correspond to different depths and structures of the CNN model.
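For readers unfamiliar with the CTC stage of a CNN+RNN+CTC pipeline, the following is a minimal sketch of greedy CTC decoding. It is not taken from the paper: the function name `ctc_greedy_decode`, the toy alphabet, the per-frame scores, and the choice of index 0 for the blank symbol are all assumptions made for illustration. It shows only the collapse rule that turns per-frame predictions into a string: take the best class per frame, merge consecutive repeats, then drop blanks.

```python
from itertools import groupby

BLANK = 0  # assumption: index 0 is reserved for the CTC blank symbol


def ctc_greedy_decode(frame_scores, alphabet, blank=BLANK):
    """Greedy (best-path) CTC decoding of per-frame class scores.

    frame_scores: list of per-time-step score lists, one score per class.
    alphabet: list mapping class index to character; alphabet[blank] is unused.
    """
    # Pick the highest-scoring class index at every time step.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # Merge consecutive repeats first, then remove blanks.
    collapsed = [index for index, _ in groupby(best_path)]
    return "".join(alphabet[index] for index in collapsed if index != blank)


# Hypothetical toy data: "-" is a placeholder for the blank class.
ALPHABET = ["-", "a", "c", "t"]
SCORES = [
    [0.10, 0.10, 0.70, 0.10],  # c
    [0.10, 0.10, 0.70, 0.10],  # c again (merged with the previous frame)
    [0.80, 0.10, 0.05, 0.05],  # blank
    [0.10, 0.80, 0.05, 0.05],  # a
    [0.10, 0.10, 0.10, 0.70],  # t
    [0.10, 0.10, 0.10, 0.70],  # t again (merged)
]

print(ctc_greedy_decode(SCORES, ALPHABET))  # prints "cat"
```

In a real system the scores would come from the RNN output layer, and a beam search or lexicon-constrained decoder might replace the greedy step; the review does not say which decoding strategy the authors used.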

Separate experiments for text location and for text recognition are thoroughly conducted and described. Other approaches are discussed, but the proposed text location method is not directly comparable with other state-of-the-art approaches. The proposed text recognition methods are compared with each other, evaluated, and analyzed.

A well-written paper with interesting research results and discussions of related approaches, it is a good read for scholars, students, and professionals interested in deep learning approaches and/or automated text location and recognition in images.

Reviewer: Mariana Damova
Review #: CR147160 (2105-0125)
General (I.0)
Feature Measurement (I.4.7)
Learning (I.2.6)
 
