Smartbox.ai OCR Datasheet

Posted almost 3 years ago by Mounia Kechtam

Overview

Many Smartbox.ai users have documents that were scanned or handwritten. These include medical forms, legal and commercial documents, handwritten letters and email footers.

To a computer, these files are images that contain no readable text, which makes it impossible to search them for sensitive information. Yet these files often do contain important sensitive information that needs to be identified and redacted.

Smartbox.ai will automatically apply Optical Character Recognition (OCR) to these files to ‘read’ the writing on them and analyse them just like any other document.

Accuracy

The OCR uses cutting-edge AI to recognise the letters on the page. Some handwriting will always be challenging, for example cursive in which the writer, in a hurry, skips over parts of certain letters. In general, however, the accuracy of Smartbox.ai's OCR is very high compared with other OCR tools on the market.

Smartbox.ai OCR handles cursive writing, text written at an angle and text in the margins. It uses an advanced AI-powered OCR engine to find and read all text in the files.

How it works

Simply drag and drop your data into Smartbox.ai and it will take care of the rest: the AI automatically OCRs every file of a supported type that does not already contain regular text.

No setup, configuration, or any other work is required.

Technical specification

1. Supported file types

PNG
JPEG
TIFF
PDF

TIFF and PDF files are OCRed only if they do not already have a text layer.
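The selection rule above can be sketched as a small decision function. This is a hypothetical illustration, not Smartbox.ai's code: the name `should_ocr`, the extension-to-format mapping (including `.jpg` and `.tif` as aliases) and the caller-supplied `has_text_layer` flag are assumptions, and actually detecting a text layer is outside the sketch.

```python
from pathlib import Path

# Hypothetical extension mapping; the datasheet lists formats, not extensions.
IMAGE_ONLY = {".png", ".jpg", ".jpeg"}      # image formats: always OCRed here
MAY_HAVE_TEXT = {".tif", ".tiff", ".pdf"}   # OCRed only without a text layer


def should_ocr(filename: str, has_text_layer: bool) -> bool:
    """Return True if a file of this type would be sent to OCR (hypothetical helper).

    `has_text_layer` is supplied by the caller; this sketch does not
    inspect the file contents.
    """
    suffix = Path(filename).suffix.lower()
    if suffix in IMAGE_ONLY:
        return True
    if suffix in MAY_HAVE_TEXT:
        return not has_text_layer
    # Any other type is outside the OCR's supported list.
    return False


print(should_ocr("letter.png", has_text_layer=False))   # True
print(should_ocr("contract.pdf", has_text_layer=True))  # False
print(should_ocr("notes.docx", has_text_layer=False))   # False
```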

2. Quality

There is no minimum scan quality, but the better the scan, the better the OCR output. Ideally, scan at 150 DPI or higher.
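To check a scan against the 150 DPI recommendation, you can estimate its effective resolution from the image's pixel dimensions and the physical size of the page it shows. The following is a minimal sketch, assuming you know the page size in inches; `effective_dpi` is a hypothetical helper, not part of Smartbox.ai.

```python
RECOMMENDED_DPI = 150  # recommended minimum scan resolution from the datasheet


def effective_dpi(width_px: int, height_px: int, width_in: float, height_in: float) -> float:
    """Estimate the resolution of a scanned page in dots per inch.

    Uses the lower of the horizontal and vertical values, so a page
    scanned at uneven resolution is judged by its weaker axis.
    """
    return min(width_px / width_in, height_px / height_in)


# A US Letter page (8.5 x 11 in) scanned at 1275 x 1650 pixels:
dpi = effective_dpi(1275, 1650, 8.5, 11.0)
print(dpi)                     # 150.0
print(dpi >= RECOMMENDED_DPI)  # True
```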

3. Language Support

OCR works on all languages written in European alphabets, but it performs best on:

English
Spanish
Italian
Portuguese
French
German

(Note: language support for detecting sensitive information is documented separately; please see Smartbox.ai analysis language support.)

4. Limits

Unlimited number of pages.
Page height and width can each be at most 40 inches (2880 points).
Password-protected PDFs are not supported.
Vertical text alignment is not supported.
The minimum height for text to be detected is 15 pixels. At 150 DPI, this is roughly equivalent to 8-point type.
Entity highlighting in Smartview is not currently supported for OCRed documents.
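The page-size and text-height limits are easier to compare in a single unit. A point is 1/72 inch, so 40 inches is 40 × 72 = 2880 points. Likewise, a 15-pixel line of text scanned at 150 DPI is 15 / 150 = 0.1 inch, or 7.2 points; because a font's point size covers its whole em box, which is typically somewhat taller than the visible letters, this is in the range of 8-point type. The same 15-pixel minimum corresponds to smaller physical text at higher resolutions (3.6 points at 300 DPI). The sketch below is a hypothetical pre-flight check covering only these two numeric limits; the helper names are illustrative.

```python
POINTS_PER_INCH = 72                                  # 1 point = 1/72 inch
MAX_PAGE_INCHES = 40                                  # per-side page limit
MAX_PAGE_POINTS = MAX_PAGE_INCHES * POINTS_PER_INCH   # 2880
MIN_TEXT_PIXELS = 15                                  # minimum detectable text height


def page_within_limits(width_pt: float, height_pt: float) -> bool:
    """True if both page dimensions are at most 40 in (2880 pt)."""
    return width_pt <= MAX_PAGE_POINTS and height_pt <= MAX_PAGE_POINTS


def pixels_to_points(pixels: float, dpi: float) -> float:
    """Convert a height in scanned pixels to points at the given resolution."""
    return pixels * POINTS_PER_INCH / dpi


def text_detectable(text_height_pt: float, dpi: float) -> bool:
    """True if text of this physical height spans at least 15 pixels at `dpi`."""
    return text_height_pt * dpi / POINTS_PER_INCH >= MIN_TEXT_PIXELS


print(MAX_PAGE_POINTS)                                   # 2880
print(page_within_limits(612, 792))                      # True (US Letter)
print(page_within_limits(3000, 792))                     # False
print(pixels_to_points(15, 150))                         # 7.2
print(pixels_to_points(15, 300))                         # 3.6
print(text_detectable(6, 150), text_detectable(6, 300))  # False True
```

Working in points keeps the check independent of scan resolution for page size, while the text-height check has to take the DPI into account because the 15-pixel limit applies to the scanned image.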

Screenshots

Smartbox.ai uses the OCR output in two ways:

1. Viewing the text: the plain-text output of the OCR is shown side by side with the original document. This helps you understand what the analysis is basing its findings on.

2. Redaction: text is analysed by the AI to find sensitive information and redacted accordingly.