Tesseract Page Segmentation Modes Explained

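Tesseract, one of the most popular open-source OCR engines, offers a range of Page Segmentation Modes (PSMs) that tell it what kind of layout to expect in an image: a full page, a single column, a line, a word, and so on. Choosing the mode that matches the image can significantly improve recognition accuracy, because a wrong layout assumption makes Tesseract split or merge text incorrectly. In this guide, we'll go through each PSM, explain what it does, and show how to use it. Running tesseract --help-psm prints the list of modes supported by your installed version.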

Here is what each mode does:

--psm 0

Orientation and Script Detection (OSD) only. This mode is exclusively for determining the text's orientation and script type, without performing any OCR. It's particularly useful for pre-processing images in multilingual OCR systems or when the text direction is unknown.
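To act on PSM 0's result, a script has to pull the detected angle out of the report. The shell function below is a minimal sketch: osd_rotation is a hypothetical name, and it assumes the "Key: value" lines (such as Rotate: 90) that recent Tesseract versions print for this mode.

```shell
# Hypothetical helper: print the "Rotate:" angle from a Tesseract OSD report
# read on stdin. Assumes the "Key: value" line format of recent versions.
osd_rotation() {
  awk -F': *' '/^Rotate:/ { print $2; exit }'
}

# Usage (requires tesseract and the osd.traineddata model):
#   tesseract scan.png stdout --psm 0 | osd_rotation
```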

--psm 1

Automatic page segmentation with OSD. This mode first detects the text's orientation and script, then segments the page automatically and performs OCR. It suits scanned pages that may be rotated, such as documents fed into a scanner sideways or upside down.
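Because PSMs 0, 1, and 12 depend on the separate osd.traineddata model, it can help to check for it before choosing one of those modes. This sketch assumes that tesseract --list-langs prints one installed model per line; has_osd_model is a hypothetical helper, not part of Tesseract.

```shell
# Hypothetical helper: succeed if "osd" appears as its own line in a language
# listing read on stdin (for example, the output of `tesseract --list-langs`).
has_osd_model() {
  grep -qx 'osd'
}

# Usage: some Tesseract versions print the list on stderr, hence 2>&1.
#   if tesseract --list-langs 2>&1 | has_osd_model; then
#     tesseract scan.png output --psm 1
#   else
#     echo "osd.traineddata not found; falling back to --psm 3" >&2
#     tesseract scan.png output --psm 3
#   fi
```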

--psm 2

Automatic page segmentation, but no OSD or OCR. The command-line tool lists this mode as not implemented. Its intended use would be analyzing the layout of documents whose orientation and script are already known, without recognizing the text.

--psm 3

Fully automatic page segmentation, but no OSD. This is Tesseract's default mode, used whenever --psm is omitted, and it suits a broad range of documents. It analyzes the page to find columns, blocks, and lines of text on its own, but it does not detect whether the page is rotated; for pages that may be sideways or upside down, use PSM 1 instead.

--psm 4

Single column of text of variable sizes. Tailored for documents where text is arranged in a single column, but with varying font sizes and styles. It's particularly useful for brochures or web pages where text flow is linear but visually diverse.

--psm 5

Single uniform block of vertically aligned text. Treats the image as one block of text written in vertical lines. This mode is rarely needed for Latin text; it is mainly useful for scripts that are often written vertically, such as Chinese or Japanese, usually together with a vertical-text model (for example, jpn_vert).

--psm 6

Single uniform block of text. Assumes the entire image is a single block of text with a uniform layout. It works well on cropped regions such as a paragraph or a text box, or on pages with no columns, images, or headings breaking up the text.

--psm 7

Single text line. Treats the entire image as one line of text. This mode suits images cropped to a single line, such as a header, a footer, or a narrow strip like a label.
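A common use of PSM 7 is reading a cropped line of digits, such as a serial number. The sketch below combines --psm 7 with Tesseract's tessedit_char_whitelist variable; read_digit_line is a hypothetical helper, and the extra tr filter is there because the whitelist is not honored by every Tesseract version and engine combination.

```shell
# Hypothetical helper: OCR a cropped image of one line (PSM 7) and keep only
# the digits. tessedit_char_whitelist asks Tesseract to consider digits only;
# the tr filter strips anything else (spaces, newline, page-break character).
read_digit_line() {
  tesseract "$1" stdout --psm 7 -c tessedit_char_whitelist=0123456789 2>/dev/null |
    tr -cd '0-9'
}

# Usage:
#   read_digit_line serial_number.png
```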

--psm 8

Single word. Treats the entire image as a single word. This mode is effective for short text elements, such as names or signs, once the word has been cropped out of the larger image.
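PSM 8 pairs naturally with a cropping step that has already cut each word out of a larger image. As a sketch, assuming the crops are PNG files in one directory, this hypothetical ocr_word_crops function reads each crop as a single word and prints the file name and the result separated by a tab.

```shell
# Hypothetical batch job: read every PNG word crop in a directory with PSM 8
# and print "file name<TAB>recognized word" for each one.
ocr_word_crops() {
  for img in "$1"/*.png; do
    [ -e "$img" ] || continue   # the glob matched no files
    word=$(tesseract "$img" stdout --psm 8 2>/dev/null | tr -d '\n\f')
    printf '%s\t%s\n' "$(basename "$img")" "$word"
  done
}

# Usage:
#   ocr_word_crops crops > words.tsv
```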

--psm 9

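Single word in a circle. Tesseract describes this mode only as treating the image as a single word in a circle, and it is rarely used. It does not straighten text that follows a curved path, as in many logos, emblems, or stamps; such text usually needs to be unwarped before OCR.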

--psm 10

Single character. Treats the image as a single character. This mode suits images that have already been segmented into individual letters or digits, such as characters cut from a form field or a simple CAPTCHA.

--psm 11

Sparse text. Finds as much text as possible, in no particular order, anywhere in the image. This mode is useful for images where text appears irregularly or in small chunks, such as posters, forms, or screenshots with labels in many places.
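Because PSM 11 returns text in no particular order, it is often more useful together with the position of each word. Tesseract can emit that as tab-separated values through its built-in tsv output config; the hypothetical confident_words filter below keeps word rows (level 5) at or above a confidence threshold and prints their left/top coordinates. The column order is assumed from the TSV header Tesseract writes.

```shell
# Hypothetical filter: read Tesseract TSV output on stdin and print
# "left,top<TAB>text" for each word whose confidence is at least $1 (default 60).
# Assumed column order, taken from the TSV header Tesseract writes:
# level page_num block_num par_num line_num word_num left top width height conf text
confident_words() {
  awk -F '\t' -v min="${1:-60}" '
    NR > 1 && $1 == 5 && $12 != "" && $11 + 0 >= min + 0 {
      print $7 "," $8 "\t" $12
    }'
}

# Usage ("tsv" at the end selects Tesseract's TSV output config):
#   tesseract flyer.png stdout --psm 11 tsv | confident_words 70
```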

--psm 12

Sparse text with OSD. Combines sparse text detection with orientation and script detection. It is useful when text is scattered and the image's orientation or script is not known in advance.

--psm 13

Raw line. Treats the image as a single text line, like PSM 7, but bypasses some Tesseract-specific processing that is normally applied. It is worth trying on cleanly cropped lines when PSM 7 gives poor results.
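To summarize the modes above, here is a small lookup in shell that turns a rough description of the image into a PSM number. The keywords and the function name suggest_psm are invented for illustration, and the mapping simply mirrors the descriptions in this guide; treat it as a starting point, not a rule.

```shell
# Hypothetical helper: map a rough description of the image to a PSM number.
# Keywords are made up for illustration; adjust them to your own pipeline.
suggest_psm() {
  case "$1" in
    orientation-only) echo 0 ;;
    page-rotated)     echo 1 ;;
    page)             echo 3 ;;
    column)           echo 4 ;;
    vertical-block)   echo 5 ;;
    block)            echo 6 ;;
    line)             echo 7 ;;
    word)             echo 8 ;;
    char)             echo 10 ;;
    sparse)           echo 11 ;;
    sparse-rotated)   echo 12 ;;
    raw-line)         echo 13 ;;
    *)                echo 3 ;;   # fall back to Tesseract's default mode
  esac
}

# Usage:
#   tesseract label.png stdout --psm "$(suggest_psm line)"
```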

How to Use

Selecting the appropriate Page Segmentation Mode is one of the simplest ways to improve recognition accuracy in Tesseract, especially for images that are not ordinary full pages, such as single lines, single words, or scattered labels.

To specify a Page Segmentation Mode, pass the --psm option followed by the mode number (Tesseract 3.x used a single dash, -psm). Here’s the basic syntax:

tesseract [input_file] [output_base] --psm [mode_number]

  • [input_file]: The path to the image file you want to process.

  • [output_base]: The base name of the output file. Tesseract appends the extension itself (.txt for plain text), so output becomes output.txt. Use stdout instead to print the text to the terminal.

  • [mode_number]: The number corresponding to the desired Page Segmentation Mode.
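If you call Tesseract from scripts, a thin wrapper can catch an invalid mode number before Tesseract runs. This is a sketch with a hypothetical function name, ocr_with_psm; it accepts the modes listed in this guide and rejects PSM 2, which the command-line tool does not implement.

```shell
# Hypothetical wrapper around the basic syntax: validate the PSM and the input
# file before calling tesseract. Returns 2 for a bad mode, 1 for a missing file.
ocr_with_psm() {
  input=$1 outbase=$2 psm=$3
  case "$psm" in
    0|1|3|4|5|6|7|8|9|10|11|12|13) ;;
    2) echo "PSM 2 is not implemented" >&2; return 2 ;;
    *) echo "invalid PSM: $psm" >&2; return 2 ;;
  esac
  [ -f "$input" ] || { echo "no such file: $input" >&2; return 1; }
  tesseract "$input" "$outbase" --psm "$psm"
}

# Usage:
#   ocr_with_psm document.jpg output 4
```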

Suppose you have a JPEG image named document.jpg and you want to apply PSM 4 (single column of variable-sized text). The command would be:

tesseract document.jpg output --psm 4

This command tells Tesseract to process document.jpg using PSM 4, and save the extracted text to output.txt (Tesseract adds the .txt extension by default).
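When it isn't obvious which mode fits an image, the quickest check is to run several and compare the results. The hypothetical compare_psms function below repeats the command above once per mode and writes output_psmN.txt for each, so the files can be compared with diff.

```shell
# Hypothetical helper: run one image through several PSMs, writing one
# output file per mode (output_psm4.txt, output_psm6.txt, ...).
compare_psms() {
  img=$1; shift
  for psm in "$@"; do
    tesseract "$img" "output_psm$psm" --psm "$psm" || echo "PSM $psm failed" >&2
  done
}

# Usage:
#   compare_psms document.jpg 3 4 6
#   diff output_psm4.txt output_psm6.txt
```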