The goal is to localize artificial text in images extracted from videos. Artificial text is designed to be read easily by the viewer, in contrast to other text that may appear in a scene and is not specifically designed to be read (e.g., text on T-shirts, billboards, traffic signs). Artificial text regions are therefore long, horizontal, unoccluded, and highly contrasted.
The method described here focuses on detection only, so it does not really eliminate false detections. The results can be refined afterwards using application-specific knowledge, such as the expected text height or height/width ratio, to discard spurious candidate regions.
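As an illustration of such a refinement, the following Python sketch rejects candidate boxes by height and by height/width ratio. The box layout `(x0, y0, x1, y1)`, the function name, and all threshold values are hypothetical; they would have to be tuned for a given application.

```python
def plausible_text_box(box, min_height=8, max_height=60, max_ratio=0.5):
    """Return True if the box looks like a line of text.

    box is (x0, y0, x1, y1) with inclusive pixel coordinates.
    All thresholds are illustrative assumptions, not values from the method.
    """
    x0, y0, x1, y1 = box
    width = x1 - x0 + 1
    height = y1 - y0 + 1
    if not (min_height <= height <= max_height):
        return False                        # too small or too tall for a text line
    return height / width <= max_ratio      # text lines are much wider than tall


candidates = [(0, 0, 99, 19), (0, 0, 9, 39), (0, 0, 49, 3)]
kept = [box for box in candidates if plausible_text_box(box)]
print(kept)  # [(0, 0, 99, 19)]
```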
Figure: the input image (Wolf, 2003) and the result image.
The method is based on the assumption that artificial text regions are characterized by a high density of vertical edges. It is composed of 5 steps:
The image is first converted to grayscale, since color is not an intrinsic feature of text. It is then smoothed with a Gaussian filter to attenuate noise, and the vertical gradient is computed with morphological operators as the difference between the horizontal dilation and the horizontal erosion of the image:
g(x,y) = δV(f(x,y)) - εV(f(x,y))
pany2pan input.png input.pan
prgb2gray 0.299 0.587 0.114 input.pan tmp1.pan
pgaussianfiltering 0.5 tmp1.pan tmp2.pan
plineardilatation 0 0 1 tmp2.pan tmp3.pan
plinearerosion 0 0 1 tmp2.pan tmp4.pan
psub tmp3.pan tmp4.pan text1.pan
text1.pan: After vertical gradient computation.
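To make the gradient step concrete, here is a minimal Python sketch of the grayscale conversion and of the dilation-minus-erosion gradient along rows. It works on nested lists, skips the Gaussian smoothing, and truncates the window at the image border; the function names and the border handling are assumptions of this sketch, not a description of the Pandore operators.

```python
def to_gray(rgb_image):
    """Weighted sum of R, G, B, using the same weights as the prgb2gray call."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]


def horizontal_gradient(gray, half_size=1):
    """Dilation minus erosion with a horizontal segment of 2*half_size+1 pixels."""
    width = len(gray[0])
    result = []
    for row in gray:
        out_row = []
        for x in range(width):
            lo = max(0, x - half_size)
            hi = min(width, x + half_size + 1)
            window = row[lo:hi]                        # neighbours on the same row only
            out_row.append(max(window) - min(window))  # dilation - erosion
        result.append(out_row)
    return result


# A dark-to-bright vertical step: the response is high only around the edge.
step = [[10, 10, 10, 200, 200, 200]] * 3
gradient = horizontal_gradient(step)
print(gradient[0])  # [0, 0, 190, 190, 0, 0]
```

Because the structuring element is horizontal, intensity changes along a row (that is, vertical edges) give a strong response, while purely horizontal edges give none.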
Binarization keeps only the highly contrasted vertical structures. The threshold is derived from the entropy of the image, a criterion well suited to detecting small objects on a homogeneous background.
In practice, the threshold is the gray level that maximizes the total amount of information provided by the background and by the objects taken separately. The amount of information is measured by entropy.
pentropybinarization text1.pan text2.pan
text2.pan: After the binarization by entropy.
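The criterion described above can be sketched in Python as a maximum-entropy threshold computed from a histogram. This follows one common formulation (often attributed to Kapur et al.); whether it matches the operator's exact computation is an assumption, and the function name is hypothetical.

```python
import math


def entropy_threshold(histogram):
    """Return the gray level t that maximizes H(background) + H(objects).

    Pixels with value <= t form one class, pixels with value > t the other.
    """
    total = sum(histogram)
    probs = [count / total for count in histogram]
    best_t, best_entropy = 0, float("-inf")
    for t in range(len(probs) - 1):
        low, high = probs[: t + 1], probs[t + 1:]
        p_low, p_high = sum(low), sum(high)
        if p_low <= 0.0 or p_high <= 0.0:
            continue  # one class is empty: not a valid split
        # Entropy of each class, using probabilities renormalized within the class.
        h_low = -sum(p / p_low * math.log(p / p_low) for p in low if p > 0)
        h_high = -sum(p / p_high * math.log(p / p_high) for p in high if p > 0)
        if h_low + h_high > best_entropy:
            best_t, best_entropy = t, h_low + h_high
    return best_t


# Toy 8-level histogram: a large dark mode and a small bright mode.
histogram = [40, 40, 0, 0, 0, 0, 10, 10]
t = entropy_threshold(histogram)
print(t)  # 1: the lowest level that separates the two modes
```

A binary image is then obtained by keeping the pixels whose value is greater than `t`.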
Before detecting the regions, a vertical closing reconnects the vertical lines, since the vertical gradient yields broken lines wherever an edge is not strictly straight.
Then, a horizontal closing merges neighboring edges into a single region. This closing uses a horizontal structuring element of half-size 4, which merges edges that are at most 9 pixels apart.
plineardilatation 90 0 2 text2.pan tmp5.pan
plinearerosion 90 0 2 tmp5.pan text3-1.pan
plineardilatation 0 0 4 text3-1.pan tmp7.pan
plinearerosion 0 0 4 tmp7.pan text3-3.pan
text3-1.pan: After the vertical closing.
text3-3.pan: After the horizontal closing.
A vertical closing of half-size 2 pixels is then applied to make the detected regions more compact.
plineardilatation 90 0 2 text3-3.pan tmp8.pan
plinearerosion 90 0 2 tmp8.pan text4.pan
text4.pan: After the vertical closing.
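A closing, as used in the last two steps, is a dilation followed by an erosion with the same structuring element. The Python sketch below applies it along rows and, through a transposition, along columns. It works on binary nested lists and truncates the window at the border; both the function names and the border handling are assumptions of this sketch.

```python
def dilate_row(row, half_size):
    """Binary dilation of one row by a segment of 2*half_size+1 pixels."""
    return [max(row[max(0, x - half_size): x + half_size + 1])
            for x in range(len(row))]


def erode_row(row, half_size):
    """Binary erosion of one row by a segment of 2*half_size+1 pixels."""
    return [min(row[max(0, x - half_size): x + half_size + 1])
            for x in range(len(row))]


def close_rows(image, half_size):
    """Horizontal closing: dilation then erosion along each row."""
    return [erode_row(dilate_row(row, half_size), half_size) for row in image]


def close_columns(image, half_size):
    """Vertical closing, obtained by transposing, closing rows, transposing back."""
    transposed = [list(col) for col in zip(*image)]
    closed = close_rows(transposed, half_size)
    return [list(col) for col in zip(*closed)]


near = [[1] + [0] * 8 + [1]]   # two edges 9 pixels apart
far = [[1] + [0] * 9 + [1]]    # two edges 10 pixels apart
print(close_rows(near, 4)[0])  # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
print(close_rows(far, 4)[0])   # [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
```

With half-size 4, the two edges that are 9 pixels apart are merged, while the pair that is 10 pixels apart is left unchanged.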
The next step removes regions that are too small, namely regions of half-width < 2 pixels and regions of surface area < 100 pixels.
plinearerosion 0 0 2 text4.pan tmp9.pan
plineardilatation 0 0 2 tmp9.pan tmp10.pan
pareaopening 8 100 tmp10.pan text5.pan
text5.pan: After too small regions removal.
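The area criterion can be sketched in Python as a connected-component filter: each 8-connected region is collected by flood fill and kept only if it contains enough pixels. The function name is hypothetical, and the toy example uses a much smaller area limit than the 100 pixels used above.

```python
from collections import deque


def remove_small_regions(image, min_area):
    """Keep only 8-connected regions of 1-pixels with at least min_area pixels."""
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    result = [[0] * width for _ in range(height)]
    for y0 in range(height):
        for x0 in range(width):
            if image[y0][x0] == 0 or seen[y0][x0]:
                continue
            # Breadth-first flood fill collects one connected region.
            region, queue = [], deque([(y0, x0)])
            seen[y0][x0] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(region) >= min_area:
                for y, x in region:
                    result[y][x] = 1
    return result


image = [
    [1, 1, 1, 0, 0, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
]
for row in remove_small_regions(image, min_area=4):
    print(row)
# [1, 1, 1, 0, 0, 0]
# [1, 1, 1, 0, 0, 0]
# [0, 0, 0, 0, 0, 0]
```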
The convex hull of each region is computed. Regions whose convex hull is not rectangular enough are then eliminated (i.e., rectangularity factor < 70%).
plabeling 8 text5.pan tmp11.pan
pconvexhull tmp11.pan tmp12.pan
prectangularityselection 1 .70 tmp12.pan text6.pan
text6.pan: After non rectangle regions removal.
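As a rough illustration of a rectangularity test, the Python sketch below measures how much of its axis-aligned bounding box a filled region covers. This definition (region area divided by bounding-box area) is an assumption of the sketch; the exact factor computed by the operator may differ, for instance by using an oriented bounding rectangle.

```python
def rectangularity(pixels):
    """Ratio of the region area to the area of its axis-aligned bounding box.

    pixels is an iterable of (y, x) coordinates belonging to one filled region.
    """
    pixels = set(pixels)
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    box_area = (max(ys) - min(ys) + 1) * (max(xs) - min(xs) + 1)
    return len(pixels) / box_area


rectangle = [(y, x) for y in range(3) for x in range(8)]      # full 3x8 block
triangle = [(y, x) for y in range(4) for x in range(y + 1)]   # staircase triangle

print(rectangularity(rectangle))  # 1.0
print(rectangularity(triangle))   # 0.625

# Keep only the regions that reach the 70% limit used in this step.
kept = [name for name, region in (("rectangle", rectangle), ("triangle", triangle))
        if rectangularity(region) >= 0.70]
print(kept)  # ['rectangle']
```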
Convex hulls less than 21 pixels wide are eliminated, since artificial text forms fairly wide regions.
plinearerosion 0 0 10 text6.pan tmp13.pan
plineardilatation 0 0 10 tmp13.pan text7.pan
text7.pan: After too thin convex hull removal.
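The link between the half-size 10 and the 21-pixel limit can be checked on a single row: an opening (erosion then dilation) with a segment of 2*10+1 = 21 pixels removes every run shorter than 21 pixels and leaves longer runs intact. In this Python sketch, pixels outside the row are treated as background, which is an assumption about border handling.

```python
def open_row(row, half_size):
    """Binary opening of one row by a segment of 2*half_size+1 pixels."""
    width = len(row)
    offsets = range(-half_size, half_size + 1)

    def value(values, x):
        # Pixels outside the row count as background (assumption of this sketch).
        return values[x] if 0 <= x < width else 0

    eroded = [min(value(row, x + d) for d in offsets) for x in range(width)]
    return [max(value(eroded, x + d) for d in offsets) for x in range(width)]


# One run of 20 pixels and one run of 21 pixels.
row = [0] * 3 + [1] * 20 + [0] * 5 + [1] * 21 + [0] * 3
opened = open_row(row, half_size=10)
print(sum(row), sum(opened))  # 41 21
```

Only the 21-pixel run survives: the 20-pixel run is erased by the erosion and cannot be restored by the dilation.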
Localization is done with the bounding boxes of the remaining regions, which are then dilated so that all the character pixels fall inside the bounding boxes.
pboundingbox text7.pan tmp14.pan
pdilatation 1 2 tmp14.pan text8.pan
text8.pan: The resulting bounding boxes.
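The localization step can be sketched in Python as computing the bounding box of a region and enlarging it by a fixed margin, clipped to the image. The coordinate layout `(x0, y0, x1, y1)`, the function names, and the interpretation of the dilation as a 2-pixel margin on every side are assumptions of this sketch.

```python
def bounding_box(pixels):
    """Axis-aligned bounding box (x0, y0, x1, y1) of a set of (y, x) pixels."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    return (min(xs), min(ys), max(xs), max(ys))


def expand_box(box, margin, width, height):
    """Grow the box by margin pixels on every side, clipped to the image."""
    x0, y0, x1, y1 = box
    return (max(0, x0 - margin), max(0, y0 - margin),
            min(width - 1, x1 + margin), min(height - 1, y1 + margin))


region = [(5, 10), (5, 30), (9, 12)]          # (y, x) pixels of one region
box = bounding_box(region)
print(box)                                    # (10, 5, 30, 9)
print(expand_box(box, 2, width=32, height=20))  # (8, 3, 31, 11)
```

The right edge is clipped to column 31 because the toy image is only 32 pixels wide.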
This last operation is added only for visualization: the boundaries of the bounding boxes are superimposed on the input image.
pboundary 8 text8.pan tmp15.pan
psuperimposition 1 input.pan tmp15.pan output.pan
ppan2png output.pan output.png
output.pan: The result image.