Robust and fast text‐line extraction using local linearity of the text‐line

Hideaki Goto, Hirotomo Aso

Text region extraction is a necessary step before character recognition can be applied to document images. This paper describes a new algorithm, Linear Segment Linking (LSL), for extracting text lines from document images. The algorithm groups together the piecewise linear elements in a document image, which are assumed to be parts of text lines, and then extracts them from the image. It requires less prior knowledge of document structure and is robust against image distortion. Primitive rectangles are introduced as an intermediate representation of the image; they are easier and faster to create than the usual circumscribing rectangles. A method of splitting the bridges between neighboring text lines is also proposed. By combining this bridge splitting process with text-line extraction, text lines that touch each other locally are extracted as separate lines.
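The abstract does not specify how primitive rectangles are built or how segments are linked, so the Python sketch below is only an illustration of the general idea under stated assumptions. It is not the authors' LSL algorithm. It assumes a binary image given as a list of rows (1 = ink). It builds hypothetical "primitive rectangles" by cutting the image into vertical strips of width `strip_w` and taking each maximal run of inked rows within a strip. It then links each rectangle to the rectangle to its right (within `max_gap` columns) that has the largest vertical overlap. The function names, parameters, and the overlap rule are all assumptions, and neither a local-linearity test nor bridge splitting is implemented.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical rectangle type: (x0, y0, x1, y1), covering x in [x0, x1), y in [y0, y1).
Rect = Tuple[int, int, int, int]


def primitive_rectangles(img: List[List[int]], strip_w: int) -> List[Rect]:
    """Assumed construction: split the binary image into vertical strips of width
    strip_w; within each strip, every maximal run of rows containing ink becomes
    one rectangle. No connected-component labeling is needed."""
    h = len(img)
    w = len(img[0]) if h else 0
    rects: List[Rect] = []
    for x0 in range(0, w, strip_w):
        x1 = min(x0 + strip_w, w)
        start: Optional[int] = None
        for y in range(h + 1):  # the extra iteration closes a run touching the bottom
            ink = y < h and any(img[y][x0:x1])
            if ink and start is None:
                start = y
            elif not ink and start is not None:
                rects.append((x0, start, x1, y))
                start = None
    return rects


def link_segments(rects: List[Rect], max_gap: int) -> List[List[Rect]]:
    """Assumed linking rule: each rectangle links to the one rectangle to its right
    (horizontal gap 0..max_gap) with the largest positive vertical overlap.
    Linked chains, merged with union-find, are the text-line candidates."""
    parent = list(range(len(rects)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, (ax0, ay0, ax1, ay1) in enumerate(rects):
        best, best_ov = None, 0
        for j, (bx0, by0, bx1, by1) in enumerate(rects):
            gap = bx0 - ax1
            if 0 <= gap <= max_gap:
                ov = min(ay1, by1) - max(ay0, by0)  # vertical overlap in rows
                if ov > best_ov:
                    best, best_ov = j, ov
        if best is not None:
            parent[find(i)] = find(best)

    groups: Dict[int, List[Rect]] = {}
    for i, r in enumerate(rects):
        groups.setdefault(find(i), []).append(r)
    # Each line is sorted left to right; lines are ordered top to bottom.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0][1])


# Usage: two synthetic "text lines" (rows 1-2 and rows 5-6) with a word gap at columns 4-5.
img = [[0] * 12 for _ in range(8)]
for x in range(12):
    if x not in (4, 5):
        img[1][x] = img[2][x] = 1
        img[5][x] = img[6][x] = 1

rects = primitive_rectangles(img, strip_w=2)
lines = link_segments(rects, max_gap=2)
for line in lines:
    print(line)
```

Because each rectangle links to at most one right-hand neighbor, the two lines in the example stay separate even though they share strips. The pairwise neighbor search is O(n²) and is written for clarity rather than speed.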

Original language: English
Pages (from-to): 21-31
Number of pages: 11
Journal: Systems and Computers in Japan
Issue number: 13
Publication status: Published - 1995


  • Linear segment linking
  • bridge splitting
  • document image analysis
  • primitive rectangle
  • text line extraction


