IMAGE Team - The Pantheon project

Tutorial: Detecting Artificial Text in Still Images



I. Objective

The goal is to localize artificial text in images extracted from videos. Artificial text is designed to be read easily by the viewer, in contrast to other text that may appear in a scene without being specifically designed to be read (e.g., text on T-shirts, billboards, traffic signs). Artificial text therefore forms long, horizontal, unoccluded, and highly contrasted regions.

The plan described here focuses on detection, so it makes no real attempt to eliminate false detections. Results may be refined afterwards using application-specific knowledge, such as text height or height/width ratio, to discard spurious candidate regions.

The input image (Wolf, 2003). The result image.

II. Method

The method is based on the assumption that artificial text forms regions characterized by a high density of vertical edges. It is composed of 6 steps:

  1. Compute the image of vertical gradients.
  2. Binarize the gradient image.
  3. Detect regions highly dense in vertical edges.
  4. Make the regions more compact.
  5. Remove spurious regions.
  6. Localize detected regions inside bounding boxes.

II.1 Computing Vertical Gradient

The image is first converted to grayscale, since color is not an intrinsic feature of text. It is then smoothed with a Gaussian filter to attenuate noise, and the vertical gradient is computed with morphological operators as the difference between the horizontal dilation and the horizontal erosion of the image:

g(x,y) = δV(f(x,y)) - εV(f(x,y))

pany2pan input.png input.pan
prgb2gray 0.299 0.587 0.114 input.pan tmp1.pan

pgaussianfiltering 0.5 tmp1.pan tmp2.pan

plineardilatation 0 0 1 tmp2.pan tmp3.pan
plinearerosion 0 0 1 tmp2.pan tmp4.pan
psub tmp3.pan tmp4.pan text1.pan
text1.pan: After vertical gradient computation.
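
To make the gradient formula concrete, here is a minimal pure-Python sketch of "horizontal dilation minus horizontal erosion" on a tiny grayscale image. It is not the Pandore implementation: the function name horizontal_gradient is hypothetical, and clipping the window at the image border is an assumption.

```python
def horizontal_gradient(image, half_size=1):
    """Horizontal dilation minus horizontal erosion of a 2-D list of gray levels."""
    result = []
    for row in image:
        out_row = []
        for x in range(len(row)):
            # Window of up to 2*half_size+1 pixels, clipped at the border (assumption).
            window = row[max(0, x - half_size): x + half_size + 1]
            # Dilation is the local maximum, erosion the local minimum.
            out_row.append(max(window) - min(window))
        result.append(out_row)
    return result


# A dark vertical stroke (10) on a bright background (200).
vertical_stroke = [
    [200, 200, 10, 200, 200],
    [200, 200, 10, 200, 200],
]
# A horizontal edge: bright row above a dark row.
horizontal_edge = [
    [200, 200, 200],
    [10, 10, 10],
]

stroke_gradient = horizontal_gradient(vertical_stroke)
edge_gradient = horizontal_gradient(horizontal_edge)
print(stroke_gradient[0])
print(edge_gradient)
```

The vertical stroke produces a strong response around its column, while the horizontal edge produces none, which is why this gradient selects vertical structures.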

II.2 Selecting Highly Contrasted Vertical Structures

Binarization keeps only the highly contrasted vertical structures. The threshold is derived from the image entropy, a criterion that is well suited to detecting small objects on a homogeneous background.

In practice, the threshold is the gray level that maximizes the total amount of information provided by the background and the objects taken separately, where the amount of information is measured by entropy.

pentropybinarization text1.pan text2.pan
text2.pan: After the binarization by entropy.
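
The criterion described above can be sketched as follows. This is a Kapur-style maximum-entropy threshold written from the description in the text; whether pentropybinarization uses exactly this formulation is an assumption, and the function name entropy_threshold is hypothetical.

```python
import math


def entropy_threshold(histogram):
    """Gray level t maximizing H(levels <= t) + H(levels > t)."""
    total = sum(histogram)
    probs = [count / total for count in histogram]
    best_t, best_score = 0, float("-inf")
    for t in range(len(probs) - 1):
        background, objects = probs[: t + 1], probs[t + 1:]
        p_bg, p_obj = sum(background), sum(objects)
        if p_bg <= 0.0 or p_obj <= 0.0:
            continue  # one class is empty, so this split carries no information
        # Entropy of each class, computed on its normalized distribution.
        h_bg = -sum(p / p_bg * math.log(p / p_bg) for p in background if p > 0)
        h_obj = -sum(p / p_obj * math.log(p / p_obj) for p in objects if p > 0)
        if h_bg + h_obj > best_score:
            best_t, best_score = t, h_bg + h_obj
    return best_t


# Toy 8-level gradient histogram: many low values, a few strong edges.
histogram = [40, 40, 0, 0, 0, 0, 10, 10]
threshold = entropy_threshold(histogram)

# Pixels strictly above the threshold are kept as edge pixels.
gradient_row = [0, 1, 7, 6, 1, 0]
binary_row = [int(value > threshold) for value in gradient_row]
print(threshold, binary_row)
```

On this toy histogram the selected threshold falls in the empty valley between the two groups of gray levels, so only the strong gradient values survive binarization.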

II.3 Detecting Regions Dense in Vertical Edges

Before detection, a vertical closing reconnects the vertical edges, since the vertical gradient produces broken segments for edges that are not strictly straight.

Then, a horizontal closing merges neighboring edges into a single region. The closing uses a horizontal structuring element of half-size 4, which merges edges that are at most 9 pixels apart.
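
The merging distance can be checked on a single image row. The sketch below is a pure-Python 1-D closing (dilation followed by erosion) with a line of half-size 4; the helper names are hypothetical, and the clipped-window border handling is an assumption that does not matter here because the edges sit away from the borders.

```python
def dilate(row, half_size):
    """1-D binary dilation by a line of length 2*half_size+1."""
    return [int(any(row[max(0, x - half_size): x + half_size + 1]))
            for x in range(len(row))]


def erode(row, half_size):
    """1-D binary erosion by a line of length 2*half_size+1 (window clipped at borders)."""
    return [int(all(row[max(0, x - half_size): x + half_size + 1]))
            for x in range(len(row))]


def closing(row, half_size):
    """Closing = dilation followed by erosion with the same structuring element."""
    return erode(dilate(row, half_size), half_size)


# Two edge pixels 9 positions apart (8 background pixels in between).
near = [0] * 6 + [1] + [0] * 8 + [1] + [0] * 6
# Two edge pixels 10 positions apart (9 background pixels in between).
far = [0] * 6 + [1] + [0] * 9 + [1] + [0] * 6

closed_near = closing(near, 4)
closed_far = closing(far, 4)
print(closed_near)
print(closed_far)
```

In the first row the two edges are fused into one run of 10 pixels; in the second row the gap is one pixel too wide and the row is left unchanged.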

plineardilatation 90 0 2 text2.pan tmp5.pan
plinearerosion 90 0 2 tmp5.pan text3-1.pan

plineardilatation 0 0 4 text3-1.pan tmp7.pan
plinearerosion 0 0 4 tmp7.pan text3-3.pan
text3-1.pan: After the vertical closing.
text3-3.pan: After the horizontal closing.

II.4 Making the Regions More Compact

A vertical closing of half-size 2 pixels is used to make the detected regions more compact.

plineardilatation 90 0 2 text3-3.pan tmp8.pan
plinearerosion 90 0 2 tmp8.pan text4.pan
text4.pan: After the vertical closing.

II.5 Removing Too Small Regions

The next step removes regions that are too small: a horizontal opening removes regions with a half-width below 2 pixels, and an area opening removes regions with a surface area below 100 pixels.

plinearerosion 0 0 2 text4.pan tmp9.pan
plineardilatation 0 0 2 tmp9.pan tmp10.pan

pareaopening 8 100 tmp10.pan text5.pan
text5.pan: After too small regions removal.
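
The area criterion can be illustrated with a small connected-component filter. This is a sketch of the idea behind an area opening on a binary image, not the pareaopening implementation; the function name area_opening is hypothetical, and the toy example uses a minimum area of 4 instead of 100.

```python
def area_opening(mask, min_area):
    """Keep only 8-connected foreground regions with at least min_area pixels."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    result = [[0] * width for _ in range(height)]
    for y0 in range(height):
        for x0 in range(width):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Flood fill to collect the pixels of one connected region.
            region, stack = [], [(y0, x0)]
            seen[y0][x0] = True
            while stack:
                y, x = stack.pop()
                region.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            # Copy the region to the output only if it is large enough.
            if len(region) >= min_area:
                for y, x in region:
                    result[y][x] = 1
    return result


mask = [
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]
filtered = area_opening(mask, min_area=4)
for row in filtered:
    print(row)
```

The 3-pixel region in the top-left corner is discarded, while the 6-pixel region is kept unchanged.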

II.6 Removing Non-Rectangular Regions

The convex hull of each region is computed. Regions whose convex hull is not rectangular (i.e., rectangularity factor < 70%) are then eliminated.

plabeling 8 text5.pan tmp11.pan
pconvexhull tmp11.pan tmp12.pan
prectangularityselection 1 .70 tmp12.pan text6.pan
text6.pan: After non-rectangular region removal.
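
As an illustration of the 70% test, the sketch below measures rectangularity as the ratio between a region's area and the area of its axis-aligned bounding box. This definition is an assumption made for the example; the exact measure used by prectangularityselection may differ (for instance, it could rely on an oriented rectangle), and the function name is hypothetical.

```python
def rectangularity(pixels):
    """Region area divided by the area of its axis-aligned bounding box."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    box_area = (max(ys) - min(ys) + 1) * (max(xs) - min(xs) + 1)
    return len(pixels) / box_area


# A filled 3 x 10 rectangle, typical of a text line.
rectangle = [(y, x) for y in range(3) for x in range(10)]
# A filled staircase triangle with rows of width 1, 2, 3 and 4.
triangle = [(y, x) for y in range(4) for x in range(y + 1)]

rect_score = rectangularity(rectangle)
tri_score = rectangularity(triangle)
kept = [name for name, score in (("rectangle", rect_score), ("triangle", tri_score))
        if score >= 0.70]
print(rect_score, tri_score, kept)
```

Under this definition the rectangle fills its bounding box completely and is kept, whereas the triangle covers only 10 of the 16 pixels of its box and falls below the 70% limit.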

II.7 Removing Too Thin Convex Hulls

Convex hulls less than 21 pixels wide are eliminated, since artificial text forms fairly wide regions.

plinearerosion 0 0 10 text6.pan tmp13.pan
plineardilatation 0 0 10 tmp13.pan text7.pan
text7.pan: After too thin convex hull removal.
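
The 21-pixel limit follows from the half-size of 10 used in the opening (2 x 10 + 1 = 21). On a single row, a binary opening by a horizontal line keeps exactly the runs that are at least as long as the line, which the following sketch exploits. The function name is hypothetical, and pixels outside the row are assumed to be background.

```python
def horizontal_opening(row, half_size):
    """1-D binary opening by a line of length 2*half_size+1.

    Equivalent to erosion followed by dilation: only runs of foreground
    pixels at least as long as the line survive, and they survive unchanged.
    """
    length = 2 * half_size + 1
    result = [0] * len(row)
    x = 0
    while x < len(row):
        if row[x]:
            start = x
            while x < len(row) and row[x]:
                x += 1
            if x - start >= length:
                result[start:x] = [1] * (x - start)
        else:
            x += 1
    return result


# One run of 20 pixels followed by one run of 21 pixels.
row = [0] * 3 + [1] * 20 + [0] * 3 + [1] * 21 + [0] * 3
opened = horizontal_opening(row, half_size=10)
print(sum(row), sum(opened))
```

Only the 21-pixel run remains after the opening; the 20-pixel run is one pixel too narrow to contain the structuring element and disappears.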

II.8 Localizing the Text Regions

Each text region is localized by its bounding box, which is then dilated so that all the character pixels fall inside the box.

pboundingbox text7.pan tmp14.pan
pdilatation 1 2 tmp14.pan text8.pan
text8.pan: The resulting bounding boxes.
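
The effect of this step on one region can be sketched as a bounding box grown by a fixed margin. The margin of 2 pixels mirrors the last numeric argument of the pdilatation call, on the assumption that it is the half-size of a square structuring element; the function name and the coordinate convention (top, left, bottom, right) are hypothetical.

```python
def padded_bounding_box(pixels, margin, height, width):
    """Axis-aligned box (top, left, bottom, right) grown by margin, clipped to the image."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    top = max(0, min(ys) - margin)
    left = max(0, min(xs) - margin)
    bottom = min(height - 1, max(ys) + margin)
    right = min(width - 1, max(xs) + margin)
    return top, left, bottom, right


# A few (row, column) pixels of a detected region in a 12 x 100 image.
region = [(5, 10), (5, 30), (9, 12)]
box = padded_bounding_box(region, margin=2, height=12, width=100)
print(box)
```

The box extends 2 pixels beyond the extreme pixels of the region on every side, which leaves room for character strokes that the earlier openings may have trimmed.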

II.9 Superimposing the Bounding Boxes on the Original Image

This operation is added only for visualization.

pboundary 8 text8.pan tmp15.pan
psuperimposition 1 input.pan tmp15.pan output.pan
ppan2png output.pan output.png
output.pan: The result image.

III. The Complete Pandore Script

The Pantheon project
Image Team GREYC Laboratory
UMR CNRS 6072 - ENSICAEN - University of Caen, France
This page was last modified on 31 October 2013