OCR

The OCR (Optical Character Recognition) module is used for finding individual characters or words in the image.

The OCR is trainable, so it can support special fonts (embossed, dotted, etc.) and rotated text, perform well on more variable backgrounds, and handle other cases that general pre-trained OCR models usually can't.

 

Figure: Example of OCR on a car tire (3D scanned)

Training

Annotations

For training, you draw rectangles in the image. You can rotate a rectangle using the square handle connected to its top side. When a rectangle is selected, you can type the text it contains into the text field on the right.

Pekat also shows a small warning if any annotation has empty text.

 

After you’ve filled in the texts, click the Split button to split each annotation into individual characters (including spaces, if the text contains them). The split divides the annotation into equally sized rectangles, one per character.

 

If your text is in a monospace font (each character has the same width), you probably won't need to adjust the character annotations. For other fonts, you should adjust the character rectangles so that each one fits its character properly.
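The equal-width split described above can be illustrated with a minimal sketch. This is not Pekat's implementation: the function name `split_annotation` and the `(x, y, width, height)` box layout are assumptions made for the example, and rotation is ignored for simplicity.

```python
from typing import List, Tuple

# Hypothetical box layout for this sketch: (x, y, width, height)
Box = Tuple[float, float, float, float]


def split_annotation(box: Box, text: str) -> List[Tuple[str, Box]]:
    """Split one text rectangle into equal-width boxes, one per character.

    Spaces count as characters, so they get their own box too.
    """
    x, y, width, height = box
    if not text:
        return []  # nothing to split when the annotation text is empty
    char_width = width / len(text)
    return [
        (ch, (x + i * char_width, y, char_width, height))
        for i, ch in enumerate(text)
    ]


# A 120 px wide annotation with 5 characters gives 24 px wide boxes.
boxes = split_annotation((10.0, 20.0, 120.0, 30.0), "AB 12")
for ch, char_box in boxes:
    print(repr(ch), char_box)
```

Because every box gets the same width, the result only lines up with the real characters when the font is monospace. With a proportional font, a narrow "1" and a wide "W" receive the same box, which is why the rectangles then need manual adjustment.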

 

Training settings

In the training settings, you set the number of training epochs for detection and for classification. If your model doesn't find the text position well, increase the number of detection epochs. If the position is right but individual characters are misclassified, increase the number of classification epochs.

You can also extend an already trained model.

After inference, Pekat shows the text it has found in the image, along with a confidence percentage for each detection. The model can recognize whole groups of words together (see image) as well as individual characters.

 

Evaluation

Evaluation is based on regular expressions (regex): each word or letter that was found is tested against the regex. If at least one word or letter matches, the image is marked as True.
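The evaluation rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `evaluate_image` is hypothetical, and the sketch uses `re.search` (a match anywhere inside the text), which may differ from how Pekat anchors its patterns.

```python
import re
from typing import Iterable


def evaluate_image(found_texts: Iterable[str], pattern: str) -> bool:
    """Return True if at least one found word or letter matches the regex."""
    regex = re.compile(pattern)
    # Assumption: a match anywhere in the text counts (re.search).
    return any(regex.search(text) is not None for text in found_texts)


# One of the found words contains four digits, so the image evaluates to True.
result_ok = evaluate_image(["DOT", "4521", "XL"], r"\d{4}")

# No found word contains four digits, so the image evaluates to False.
result_fail = evaluate_image(["DOT", "XL"], r"\d{4}")

print(result_ok, result_fail)
```

If the whole word must match rather than just a part of it, `re.fullmatch` or explicit `^` and `$` anchors in the pattern would express that.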

You can create several rules and groups, each looking for a different text. To learn more, see