Computing Reviews
Offline scripting-free author identification based on speeded-up robust features
Sharma M., Dhaka V. International Journal on Document Analysis and Recognition 18(4): 303-316, 2015. Type: Article
Date Reviewed: Feb 19 2016

Text-independent writer identification (writer ID) has been studied for many years, and a growing number of techniques have been proposed to handle various scripts and writing conditions. In general, the features used fall into two types: histogram-based statistical features, which capture local slope information along the contours of handwritten blobs, and grapheme-based codebook features, which capture local shape similarity between handwritten primitives.
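As a rough illustration of the first family, the sketch below computes a normalized histogram of local contour directions for one handwritten blob. It is not taken from the paper under review. The function name, the 8-bin quantization, and the assumption that the contour is already available as an ordered list of (x, y) boundary points are all hypothetical choices made for illustration.

```python
import math

def contour_direction_histogram(contour, n_bins=8):
    """Normalized histogram of local directions along one blob contour.

    `contour` is assumed (hypothetically) to be an ordered list of (x, y)
    boundary points. Directions are folded into [0, pi) so that a segment
    and its reverse fall into the same bin.
    """
    hist = [0] * n_bins
    for (x0, y0), (x1, y1) in zip(contour, contour[1:]):
        if (x0, y0) == (x1, y1):
            continue  # repeated point: no direction to record
        angle = math.atan2(y1 - y0, x1 - x0) % math.pi  # fold into [0, pi)
        hist[min(int(angle / math.pi * n_bins), n_bins - 1)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else [0.0] * n_bins

# A unit square traced counter-clockwise: two horizontal and two vertical steps
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
print(contour_direction_histogram(square))
```

For the square, half of the mass lands in the horizontal bin and half in the vertical bin; real contour-direction features are usually computed over longer contour fragments, which this sketch does not attempt.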

This paper proposes using speeded-up robust features (SURF) to describe scale and local orientation within each word region. After a series of pre-processing steps (text line segmentation, binarization, word segmentation, and splitting of overlapping words), the authors use the expectation-maximization (EM) algorithm to cluster SURF descriptors into N categories, which then serve as a codebook for subsequent feature extraction. The feature extraction consists of two parts: SURF descriptor signatures and histogram features. A SURF signature is essentially a normalized vector whose elements are the Euclidean distances between the descriptor and each codebook component. Histogram features count the SURF descriptors detected at each scale, where the SURF scale space spans X octaves with Y sublevels per octave, giving X × Y scales in total. However, the authors do not specify the dimensionality of the histogram features.
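The two feature types described above can be sketched roughly as follows. This is not the authors' implementation, and several details are assumptions. The function names are hypothetical. The EM step fits a diagonal-covariance Gaussian mixture and uses its means as the codebook. Per-descriptor distance vectors are averaged and L1-normalized into one signature. Each keypoint is assumed to carry its SURF octave and sublevel indices. The X × Y bin count of the scale histogram is also our assumption, since the paper does not state the dimensionality.

```python
import numpy as np

def fit_codebook_em(X, n_components, n_iter=50, seed=0, eps=1e-6):
    """Fit a diagonal-covariance Gaussian mixture by EM; return its means.

    The means act as the codebook. Diagonal covariances, random
    initialization and a fixed iteration count are simplifying assumptions.
    """
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    means = X[rng.choice(n, n_components, replace=False)].copy()
    var = np.tile(X.var(axis=0) + eps, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: log responsibilities, stabilized by subtracting the row max
        diff = X[:, None, :] - means[None, :, :]
        log_p = -0.5 * (np.sum(diff ** 2 / var[None], axis=2)
                        + np.sum(np.log(2 * np.pi * var), axis=1)[None])
        log_p += np.log(weights)[None]
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixture weights, means and variances
        nk = resp.sum(axis=0) + eps
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        diff = X[:, None, :] - means[None, :, :]
        var = np.einsum("nk,nkd->kd", resp, diff ** 2) / nk[:, None] + eps
    return means

def surf_signature(descriptors, codebook):
    """Euclidean distances to each codebook entry, averaged and L1-normalized."""
    dist = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    sig = dist.mean(axis=0)  # assumption: pool over all descriptors by averaging
    total = sig.sum()
    return sig / total if total > 0 else sig

def scale_histogram(octaves, sublevels, n_octaves, n_sublevels):
    """Count descriptors per (octave, sublevel) scale: n_octaves * n_sublevels bins."""
    idx = np.asarray(octaves) * n_sublevels + np.asarray(sublevels)
    return np.bincount(idx, minlength=n_octaves * n_sublevels)

# Usage on hypothetical 64-dimensional descriptors (standard SURF length)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (40, 64)), rng.normal(1.0, 0.1, (40, 64))])
codebook = fit_codebook_em(X, n_components=2)
print(surf_signature(X[:5], codebook))
print(scale_histogram([0, 0, 1, 2], [0, 3, 1, 3], n_octaves=3, n_sublevels=4))
```

Averaging followed by L1 normalization is only one plausible way to pool the per-descriptor distances. The review does not say how the paper aggregates them.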

The experimental evaluation is extensive and impressive. The authors evaluated their approach on eight public datasets: five English ones, one Chinese, and two hybrid ones (the ICDAR 2011 and ICFHR 2012 competition datasets). Using both hard and soft evaluation criteria, the authors compared their approach with others in the literature and reported impressive absolute gains. Although these gains seem substantial, statistical significance tests on all of the claimed gains would make the comparison more rigorous. Nevertheless, I think researchers working on writer ID will be interested in the proposed feature extraction approach.
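One way to run the significance tests suggested above is McNemar's exact test on paired per-query outcomes. For each test document, record whether each of two methods identified the writer correctly (for example, under a TOP-1 criterion). Then test whether the discordant pairs are balanced. The choice of test, the function name, and the sample outcomes are ours, not the paper's. Other paired tests could serve equally well.

```python
from math import comb

def mcnemar_exact(correct_a, correct_b):
    """Two-sided exact McNemar p-value for paired per-query outcomes.

    correct_a[i] / correct_b[i] indicate whether method A / B identified
    the writer of test document i correctly.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("outcome lists must be paired")
    only_a = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)
    only_b = sum(1 for x, y in zip(correct_a, correct_b) if y and not x)
    n = only_a + only_b  # discordant pairs; concordant pairs carry no signal
    if n == 0:
        return 1.0
    k = min(only_a, only_b)
    # Under H0 each discordant pair favours A or B with probability 1/2
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical outcomes: A alone is right on 10 documents, B alone on none
a = [True] * 10 + [True] * 80 + [False] * 10
b = [False] * 10 + [True] * 80 + [False] * 10
print(mcnemar_exact(a, b))
```

Only documents on which the two methods disagree affect the result. This is why a large absolute gain on a small test set can still fall short of significance.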

From a completely different perspective, I am somewhat concerned about the sequence of pre-processing steps, each of which modifies the original image without preserving the information needed for image recovery or integrity checks. For example, binarization is bound to alter handwritten strokes, whose fine details might be critical for extracting local features for writer ID. It would be interesting to see how errors in word segmentation propagate to, and affect, the subsequent feature extraction.
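To make the binarization concern concrete, the sketch below applies Otsu's global threshold to a hypothetical 4 × 4 grayscale patch. Otsu's method is chosen here only for illustration, since the review does not say which binarization method the paper uses. In the patch, a stroke has three gray levels, such as varying pen pressure might produce. After thresholding, all stroke pixels share a single value, so that gray-level variation is no longer available to later feature extraction.

```python
import numpy as np

def otsu_threshold(gray):
    """Return Otsu's threshold t for an 8-bit image (pixels <= t form class 0)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    levels = np.arange(256)
    best_t, best_score = 0, -1.0
    for t in range(256):
        w0, w1 = hist[: t + 1].sum(), hist[t + 1 :].sum()
        if w0 == 0 or w1 == 0:
            continue  # one class is empty; not a valid split
        m0 = (hist[: t + 1] * levels[: t + 1]).sum() / w0
        m1 = (hist[t + 1 :] * levels[t + 1 :]).sum() / w1
        score = w0 * w1 * (m0 - m1) ** 2  # proportional to between-class variance
        if score > best_score:
            best_t, best_score = t, score
    return best_t

# Hypothetical toy patch: light background (220), dark stroke of varying density
gray = np.array([[220, 220, 220, 220],
                 [220,  30,  60, 220],
                 [220,  90,  30, 220],
                 [220, 220, 220, 220]], dtype=np.uint8)

t = otsu_threshold(gray)
ink = gray <= t                      # binary mask: True marks ink pixels
print(t, np.unique(gray[ink]))       # distinct stroke gray levels before binarization
print(np.unique(ink[ink]))           # after binarization: a single value
```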

Reviewer: Jin Chen Review #: CR144177 (1605-0355)
Document Analysis (I.7.5 ... )
Feature Evaluation And Selection (I.5.2 ... )
Other reviews under "Document Analysis":
Generating indicative-informative summaries with sumUM
Saggion H., Lapalme G. Computational Linguistics 28(4): 497-526, 2002. Type: Article
Jun 20 2003
Parameter-Free Geometric Document Layout Analysis
Lee S., Ryu D. IEEE Transactions on Pattern Analysis and Machine Intelligence 23(11): 1240-1256, 2001. Type: Article
Jul 26 2002
A hierarchical neural network document classifier with linguistic feature selection
Chen C., Lee H., Hwang C. Applied Intelligence 23(3): 277-294, 2005. Type: Article
Aug 2 2006

Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®