OCR

The OCR (optical character recognition) module finds individual characters or words in an image.

The OCR is trainable, so it can handle cases that general pre-trained OCR models usually cannot: special fonts (embossed, dotted, etc.), rotated text, and text on highly variable backgrounds.

Example of OCR on a 3D-scanned car tire

Training

Annotations

To create training annotations, draw rectangles around the text in the image. You can rotate a rectangle using the square handle attached to its top side. When a rectangle is selected, type the text it contains into the text field on the right.

After you have filled in the text, click the Split button to split each annotation into individual characters (including spaces, if the text contains them). The split divides the annotation into equally sized rectangles, one per character.

If your text is in a monospace font (each character has the same width), you probably won't need to change the character annotations. For other fonts, adjust the character rectangles so that each one fits its character properly.
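The equal-width split can be illustrated with a minimal sketch. The `RotatedRect` class, the `split_into_characters` function, and the rotation convention (degrees, measured along the text direction) are assumptions made for illustration only; they are not the module's actual data model.

```python
import math
from dataclasses import dataclass


@dataclass
class RotatedRect:
    """Hypothetical annotation shape: center, size, and rotation in degrees."""
    cx: float
    cy: float
    width: float
    height: float
    angle_deg: float = 0.0


def split_into_characters(rect: RotatedRect, text: str) -> list[tuple[str, RotatedRect]]:
    """Divide rect into len(text) equal-width cells along its text direction."""
    if not text:
        return []
    cell_width = rect.width / len(text)
    theta = math.radians(rect.angle_deg)
    cells = []
    for i, char in enumerate(text):
        # Distance from the word center to this cell's center, along the text direction.
        offset = (i + 0.5) * cell_width - rect.width / 2
        cells.append((char, RotatedRect(
            cx=rect.cx + offset * math.cos(theta),
            cy=rect.cy + offset * math.sin(theta),
            width=cell_width,
            height=rect.height,
            angle_deg=rect.angle_deg,
        )))
    return cells


# Spaces get their own cell, just like any other character.
word = RotatedRect(cx=100.0, cy=50.0, width=80.0, height=20.0)
for char, cell in split_into_characters(word, "AB 1"):
    print(repr(char), cell.cx, cell.width)
```

Because every cell gets the same width, the result fits monospace text directly, while proportional fonts (a narrow "i" next to a wide "W") need the manual adjustment described above.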

Training settings

In the training settings, you set the number of training epochs separately for detection and for classification. If the model does not locate the text well, increase the number of detection epochs. If the position is right but individual characters are recognized as the wrong characters, increase the number of classification epochs.
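The rule for choosing which epoch count to raise can be written down as a small sketch. The `TrainingSettings` class, the `adjust_epochs` function, and the step size of 10 are hypothetical; in the application these values are set in the training settings dialog, and suitable numbers depend on your data.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TrainingSettings:
    """Hypothetical container for the two epoch counts."""
    detection_epochs: int
    classification_epochs: int


def adjust_epochs(settings: TrainingSettings, position_ok: bool,
                  characters_ok: bool, step: int = 10) -> TrainingSettings:
    """Return new settings with the relevant epoch count raised by step."""
    if not position_ok:
        # Text is not located well: train detection longer.
        return replace(settings, detection_epochs=settings.detection_epochs + step)
    if not characters_ok:
        # Position is right but characters are misclassified: train classification longer.
        return replace(settings, classification_epochs=settings.classification_epochs + step)
    return settings


current = TrainingSettings(detection_epochs=50, classification_epochs=50)
print(adjust_epochs(current, position_ok=True, characters_ok=False))
```

Detection is checked first because character classification can only be judged once the text position is correct.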

You can also extend an already trained model.

Evaluation

Evaluation is based on regular expressions (regex): each detected word or character is tested against the regex. If at least one word or character matches, the image is marked as True.
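The evaluation rule can be sketched with Python's standard `re` module. The function name `evaluate_image` and the sample texts are hypothetical, and the sketch assumes that "matches" means the pattern occurs anywhere in the text (`re.search`). The text above does not say whether the module requires the whole word to match, so anchor the pattern with `^` and `$` if you need that.

```python
import re


def evaluate_image(found_texts: list[str], pattern: str) -> bool:
    """Return True if at least one recognized word or character matches pattern."""
    regex = re.compile(pattern)
    # Assumption: a match anywhere inside the text counts (re.search).
    return any(regex.search(text) for text in found_texts)


# Hypothetical OCR output for two images, checked for a four-digit code.
print(evaluate_image(["DOT", "4821", "XL"], r"^\d{4}$"))  # True
print(evaluate_image(["DOT", "XL"], r"^\d{4}$"))          # False
```

With the unanchored pattern `\d{4}`, a longer text such as "A48215" would also mark the image as True under this assumption, which is why the example anchors the pattern.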