OCR

The OCR (Optical Character Recognition) module is used to find individual characters or words in an image.

The OCR is trainable, so it can handle special fonts (embossed, dotted, etc.), rotated text, and more variable backgrounds, which general pre-trained OCR models usually cannot offer.

Example of OCR on a car tire (3D scanned)

Training

Annotations

For training, you draw rectangles in the image; you can rotate a rectangle using the square connected to its top side. When a rectangle is selected, you can type the text it contains into the text field on the right.

PEKAT also shows a small warning if some annotations have empty text.

After you have filled in the texts, click the Split button to split the annotations into individual characters (including spaces, if your text contains them). The split divides the text into equally sized rectangles.
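
The Split operation itself is a simple geometric division. As a minimal illustrative sketch (not PEKAT's actual implementation), splitting a word annotation into equally sized character rectangles could look like this in Python:

def split_annotation(x, y, width, height, text):
    # Return one (x, y, w, h, char) box per character, all equally wide.
    boxes = []
    char_width = width / len(text)
    for i, char in enumerate(text):
        boxes.append((x + i * char_width, y, char_width, height, char))
    return boxes

# A 5-character word "PEKAT" annotated as a 100 px wide, 20 px tall rectangle
print(split_annotation(x=10, y=10, width=100, height=20, text="PEKAT"))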

If your text is in a monospace font (each character has the same width), you probably won't need to adjust the character annotations, but for other fonts you should adjust the character rectangles so they fit each character properly.

Training settings

In the training settings, you set the number of training epochs for detection and classification. If your model doesn't find the text position well, increase the number of detection epochs. If the position is right but individual characters are misclassified, increase the number of classification epochs. Adding more annotations helps with both.

You can adjust the Training Data Split to specify how many images are used for training and how many for testing. The images are split pseudo-randomly using a random number generator initialized with the seed, so the same seed always produces the same split.

The recommended value for the split is 80% for training and 20% for testing.
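
As an illustration of how a seeded pseudo-random 80/20 split behaves (the same seed always yields the same split), here is a minimal sketch in Python; the image names and seed value are only examples, not PEKAT internals:

import random

images = [f"image_{i:03d}.png" for i in range(100)]   # hypothetical file names

rng = random.Random(42)       # the seed from the training settings
shuffled = images[:]
rng.shuffle(shuffled)

split_index = int(len(shuffled) * 0.8)   # 80% training, 20% testing
train_images = shuffled[:split_index]
test_images = shuffled[split_index:]

print(len(train_images), "training images,", len(test_images), "testing images")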

You can also extend an already trained model.

After inference, PEKAT shows you the text it has found in the image together with a percentage confidence for each detection. The confidence equals the lowest confidence of any character in the detected string (for example, if the text is 'of' and 'o' is detected with 90% confidence and 'f' with 70%, the whole string has a confidence of 70%).
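
The confidence aggregation can be expressed as taking the minimum over the per-character confidences; a small illustrative sketch (not part of any PEKAT API):

def string_confidence(char_confidences):
    # The detected string gets the lowest confidence of its characters.
    return min(char_confidences)

# 'o' detected with 90%, 'f' with 70%  ->  the string 'of' gets 70%
print(string_confidence([0.90, 0.70]))   # 0.7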

The model can recognize whole sets of words together (see image) as well as individual characters on their own.

Model Validation

After training, you can check the Model Statistics: you will see a table where each row represents a string from the annotations of a testing image. The table has two additional columns: the OK column shows how many annotations were detected correctly, and the NG column how many were detected incorrectly. You can click the numbers in those columns to filter the corresponding images.

Evaluation

Evaluation is based on regular expressions (regex): each detected word/letter is tested against the regex. If the regex matches at least one word/letter, the image is marked as True.

It is possible to create several rules and groups, each matching a different text. To learn more, see Evaluation.
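
As an illustrative sketch of this logic (the detected strings and the pattern below are hypothetical, and this is not PEKAT's evaluation code), a regex rule could be checked like this in Python:

import re

detected_strings = ["LOT 2024", "EXP 12/26", "A1"]   # hypothetical OCR output
pattern = re.compile(r"EXP \d{2}/\d{2}")             # rule: an expiry date must be present

# The image is marked as True if at least one detected string matches the regex.
image_is_true = any(pattern.search(text) for text in detected_strings)
print(image_is_true)   # True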

 
