Smartbox.ai OCR Datasheet

Posted over 2 years ago by Mounia Kechtam

Overview

 

Many Smartbox.ai users have documents that were scanned or handwritten, ranging from medical forms and legal or commercial documents to handwritten letters and email footers.

 

To a computer, these files are images that contain no machine-readable text, which makes it impossible to search them for sensitive information. Yet these files often do contain important sensitive information that needs to be identified and redacted.

 

Smartbox.ai automatically applies Optical Character Recognition (OCR) to these files to ‘read’ the writing on them, and then analyses them just like any other document.


Accuracy


The OCR uses cutting-edge AI to recognise the letters on the page. Some handwriting will always be challenging, for example cursive in which the writer, in a hurry, skips some letters or does not fully form them. In general, however, Smartbox.ai’s OCR is highly accurate compared with other OCR products on the market.

 

Smartbox.ai OCR handles cursive writing, text written at an angle, and text in the margins. It uses a state-of-the-art, AI-powered OCR engine to find and read all of the text in each file.

 

How it works

 

Simply drag and drop your data into Smartbox.ai and it takes care of the rest. The AI automatically OCRs every file of a supported type that does not already contain regular text.


No setup, configuration, or any other work is required. 

 

Technical specification


1. Supported file types

  • PNG
  • JPEG
  • TIFF
  • PDF

 

TIFF and PDF files are OCRed only if they do not already contain a text layer.
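
The file-type and text-layer rules above can be summarised as a small decision function. This is a hypothetical Python sketch of the documented behaviour, not Smartbox.ai's implementation: the function name `needs_ocr`, the `has_text_layer` flag, and the accepted extension spellings are assumptions, and detecting whether a text layer exists is left to the caller.

```python
from pathlib import Path

# Supported types from the datasheet (extension spellings are assumptions).
IMAGE_TYPES = {".png", ".jpg", ".jpeg"}
LAYERED_TYPES = {".tif", ".tiff", ".pdf"}  # may already carry a text layer


def needs_ocr(filename: str, has_text_layer: bool = False) -> bool:
    """Hypothetical sketch: would this file be OCRed under the documented rules?"""
    ext = Path(filename).suffix.lower()
    if ext in IMAGE_TYPES:
        # Assumption: PNG and JPEG are plain images, so they are always OCRed.
        return True
    if ext in LAYERED_TYPES:
        # TIFF and PDF are OCRed only when no text layer is present.
        return not has_text_layer
    # File types outside the supported list are not OCRed.
    return False


print(needs_ocr("scan.PNG"))                           # True
print(needs_ocr("contract.pdf", has_text_layer=True))  # False
print(needs_ocr("fax.tiff"))                           # True
print(needs_ocr("notes.docx"))                         # False
```

Keeping the text-layer check outside the function reflects that the datasheet states the rule but not how the text layer is detected.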

 

2. Quality


There is no lower limit on scan quality, but the better the scan, the better the OCR output. Ideally, scan at 150 DPI or higher.
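
To check whether an existing scan meets the recommended 150 DPI, you can divide its pixel dimensions by the physical size of the page. The Python sketch below assumes you already know the paper size; the helper name `effective_dpi` is hypothetical, and US Letter (8.5 by 11 inches) is used only as an example.

```python
RECOMMENDED_DPI = 150  # recommended minimum from the datasheet


def effective_dpi(width_px: int, height_px: int,
                  width_in: float, height_in: float) -> float:
    """Hypothetical helper: the lower of the horizontal and vertical resolutions."""
    return min(width_px / width_in, height_px / height_in)


# Example: a US Letter page scanned to 1275 x 1650 pixels.
dpi = effective_dpi(1275, 1650, 8.5, 11)
print(f"{dpi:.0f} DPI, meets recommendation: {dpi >= RECOMMENDED_DPI}")
# 150 DPI, meets recommendation: True

# The same page scanned to 850 x 1100 pixels is only 100 DPI.
print(effective_dpi(850, 1100, 8.5, 11) >= RECOMMENDED_DPI)  # False
```

Taking the minimum of the two axes is a conservative choice: a scan that is sharp in one direction but blurry in the other is judged by its weaker axis.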

 

3. Language support


OCR works on all languages written in a European alphabet, but it performs best on:

  • English
  • Spanish
  • Italian 
  • Portuguese
  • French
  • German

 

(Note: language support for detecting sensitive information is documented separately; please see Smartbox.ai analysis language support.)

 

4. Limits

  • Unlimited number of pages.
  • The maximum page height and width is 40 inches (2,880 points).
  • Password-protected PDFs are not supported.
  • Vertically aligned text is not supported.
  • Text must be at least 15 pixels high to be detected. At 150 DPI, this is roughly equivalent to an 8-point font.
  • Entity highlighting in Smartview is not currently supported.
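
The unit conversions behind these limits are simple: there are 72 points in an inch, so 40 inches is 40 × 72 = 2,880 points, and at 150 DPI a 15-pixel line of text is 15 / 150 = 0.1 inch, or 7.2 points, which is why it is described as roughly 8-point text. The Python sketch below applies the page-size and text-height limits as a hypothetical pre-flight check; all function names are assumptions, and Smartbox.ai does not require you to run any such check.

```python
POINTS_PER_INCH = 72
MAX_PAGE_INCHES = 40     # maximum page height and width
MIN_TEXT_HEIGHT_PX = 15  # minimum detectable text height


def inches_to_points(inches: float) -> float:
    """Convert a length in inches to typographic points."""
    return inches * POINTS_PER_INCH


def text_height_points(height_px: float, dpi: float) -> float:
    """Convert a text height in pixels to points at a given scan resolution."""
    return height_px / dpi * POINTS_PER_INCH


def page_within_limits(width_in: float, height_in: float) -> bool:
    """Hypothetical check: both page dimensions must be at most 40 inches."""
    return width_in <= MAX_PAGE_INCHES and height_in <= MAX_PAGE_INCHES


def text_detectable(height_px: float) -> bool:
    """Hypothetical check: text shorter than 15 pixels will not be detected."""
    return height_px >= MIN_TEXT_HEIGHT_PX


print(inches_to_points(MAX_PAGE_INCHES))                      # 2880
print(round(text_height_points(MIN_TEXT_HEIGHT_PX, 150), 2))  # 7.2
print(page_within_limits(8.5, 11))                            # True
print(page_within_limits(36, 48))                             # False
print(text_detectable(12))                                    # False
```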

 

Screenshots

 

Smartbox.ai uses OCR in several ways:

 

1. Plain-text view: the raw text output of the OCR is shown side by side with the original document. This helps you understand what the analysis is basing its findings on.

 

[Screenshots: original documents shown side by side with their plain-text OCR output]

 

2. Redaction: the AI analyses the OCR text to find sensitive information, which is then redacted.

 

[Screenshot: a document redacted after OCR]
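
Smartbox.ai's own detection is AI-based, and its internals are not described in this datasheet. Purely to illustrate the general idea of redacting sensitive spans in OCR text, the Python sketch below swaps in a simple regular expression for email addresses. The pattern, the `redact` function, and the `[REDACTED]` token are all assumptions, and a regex like this is far less capable than the AI analysis described in the redaction item.

```python
import re

# Stand-in detector: a loose email-address pattern (an assumption,
# not Smartbox.ai's AI-based detection).
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact(ocr_text: str, token: str = "[REDACTED]") -> str:
    """Replace every detected email address in the OCR text with a token."""
    return EMAIL_PATTERN.sub(token, ocr_text)


sample = "Please reply to jane.doe@example.com by Friday."
print(redact(sample))  # Please reply to [REDACTED] by Friday.
```

The point of the sketch is only the shape of the pipeline: OCR produces text, a detector finds sensitive spans in that text, and those spans are replaced.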

