Overview
Many Smartbox.ai users have documents that have been scanned or handwritten. These range from medical forms and legal or commercial documents to handwritten letters and email footers.
In technical terms, a computer would consider these to be images which don’t contain readable text – making it impossible to find any sensitive information in them. However, often these files do contain important sensitive information which needs to be identified and redacted.
Smartbox.ai will automatically apply Optical Character Recognition (OCR) to these files to ‘read’ the writing on them and analyse them just like any other document.
Accuracy
The OCR uses cutting-edge AI to recognise letters on the page. Some handwriting will always be challenging, for example hurried cursive in which letters are only partly formed or skipped altogether, but the accuracy of Smartbox.ai’s OCR is generally extremely high compared to the market in general.
Smartbox.ai OCR handles cursive writing, text written at an angle, and text in the margins. It uses the world’s most advanced AI-powered OCR engine to find and read all text in the files.
How it works
Simply drag and drop your data into Smartbox.ai and it will take care of the rest. The AI will automatically OCR all supported file types that don’t already contain machine-readable text.
No setup, configuration, or any other work is required.
Technical specification
1. Supported file types
- PNG
- JPEG
- TIFF
- PDF
TIFF and PDF files are only OCRed if they do not already have a text layer present.
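As a rough illustration of the rule above, the following Python sketch decides whether a file would be sent for OCR. It is not Smartbox.ai's actual implementation: the function name would_be_ocred, the has_text_layer flag, and the mapping from formats to file extensions are all assumptions made for this example.

```python
from pathlib import Path

# Assumed extension mapping; the article names formats, not extensions.
ALWAYS_OCR = {".png", ".jpg", ".jpeg"}
OCR_IF_NO_TEXT_LAYER = {".tif", ".tiff", ".pdf"}


def would_be_ocred(filename: str, has_text_layer: bool = False) -> bool:
    """Return True if, under the rule described above, the file is OCRed.

    has_text_layer is a hypothetical flag supplied by the caller; how a
    text layer is actually detected is outside the scope of this sketch.
    """
    ext = Path(filename).suffix.lower()
    if ext in ALWAYS_OCR:
        return True
    if ext in OCR_IF_NO_TEXT_LAYER:
        # TIFF and PDF files are skipped when text is already present.
        return not has_text_layer
    # Any other file type is not covered by OCR in this sketch.
    return False
```

For example, would_be_ocred("scan.pdf") is True, while would_be_ocred("report.pdf", has_text_layer=True) is False.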
2. Quality
There is no lower limit on scan quality, but the better the scan, the better the OCR output will be. Ideally, scan at 150 DPI or higher.
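If you only know the pixel dimensions of a scan and the physical size of the page, the effective DPI can be estimated as pixels divided by inches. The Python sketch below shows this calculation; the function names and the US Letter example are illustrative assumptions and are not part of Smartbox.ai.

```python
RECOMMENDED_DPI = 150  # the "ideally at least" figure from this article


def scan_dpi(width_px: int, height_px: int,
             width_in: float, height_in: float) -> float:
    """Estimate the effective DPI of a scan from its size in pixels and inches.

    The lower of the horizontal and vertical values is returned, since
    that is the limiting resolution.
    """
    if width_in <= 0 or height_in <= 0:
        raise ValueError("page dimensions must be positive")
    return min(width_px / width_in, height_px / height_in)


def meets_recommended_dpi(width_px: int, height_px: int,
                          width_in: float, height_in: float) -> bool:
    """Return True if the scan reaches the recommended 150 DPI."""
    return scan_dpi(width_px, height_px, width_in, height_in) >= RECOMMENDED_DPI
```

A US Letter page (8.5 x 11 inches) scanned to 1275 x 1650 pixels works out to 150 DPI, so it meets the recommendation. The same page at 850 x 1100 pixels is 100 DPI and can still be processed, but the output is likely to be less accurate.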
3. Language Support
Although OCR will work on all European alphabet languages, it performs best on:
- English
- Spanish
- Italian
- Portuguese
- French
- German
(Note: language support for the detection of sensitive information is covered separately; please see Smartbox.ai analysis language support.)
4. Limits
- Unlimited number of pages.
- The maximum page height and width is 40 inches (2880 points).
- PDFs cannot be password protected.
- Vertical text alignment is not supported.
- The minimum height for text to be detected is 15 pixels. At 150 DPI, this is roughly equivalent to 8-point font.
- Entity highlighting in Smartview is not currently supported.
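The size limits above mix inches, points, and pixels. The Python sketch below shows how they relate, using the standard definition of 72 points per inch. The constant and function names are assumptions for illustration only and do not correspond to any Smartbox.ai API.

```python
POINTS_PER_INCH = 72        # standard typographic definition
MAX_PAGE_SIDE_IN = 40       # maximum page height and width from the list above
MIN_TEXT_HEIGHT_PX = 15     # minimum detectable text height from the list above


def inches_to_points(inches: float) -> float:
    """Convert a length in inches to points."""
    return inches * POINTS_PER_INCH


def pixels_to_points(pixels: float, dpi: float) -> float:
    """Convert a height in pixels to points at a given scan resolution."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return pixels * POINTS_PER_INCH / dpi


def page_within_limits(width_in: float, height_in: float) -> bool:
    """Return True if both page sides are within the 40-inch maximum."""
    return 0 < width_in <= MAX_PAGE_SIDE_IN and 0 < height_in <= MAX_PAGE_SIDE_IN


def min_text_height_points(dpi: float) -> float:
    """Smallest detectable text height, in points, at the given DPI."""
    return pixels_to_points(MIN_TEXT_HEIGHT_PX, dpi)
```

Here inches_to_points(40) gives 2880, matching the page limit, and min_text_height_points(150) gives about 7.2 points, which is close to the 8-point font size quoted above. At 300 DPI the same 15 pixels correspond to about 3.6 points, so higher-resolution scans allow smaller text to be detected.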
Screenshots
In Smartbox.ai we use OCR in several ways:
1. Text view: see the pure-text output of the OCR, shown here side-by-side with the original document. This helps you understand what the analysis is basing its findings on.
2. Redaction: the text is analysed by the AI to find sensitive information, which is then redacted accordingly.