FAQs

Posted 2 months ago by Harrison Gowers

 

Q: Why is my highlighting/redaction off by a few pixels? 

 

Explanation: This is usually caused by OCR that a scanner has already run on the document, which has placed the text at slightly incorrect positions. Smartbox tries to compensate for slight misalignments by adding a few pixels of ‘padding’ to its bulk redactions, but sometimes the text is offset by more than the padding can cover.
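
To illustrate why padding helps with small offsets but not large ones, here is a minimal sketch in Python. It is not Smartbox's actual implementation: the `Box` type, the `pad_box` and `covers` helpers, the 3-pixel padding and the page size are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in page pixels (hypothetical type)."""
    left: float
    top: float
    right: float
    bottom: float


def pad_box(box: Box, padding: float, page_width: float, page_height: float) -> Box:
    """Grow a box by `padding` on every side, clamped to the page edges."""
    return Box(
        left=max(0.0, box.left - padding),
        top=max(0.0, box.top - padding),
        right=min(page_width, box.right + padding),
        bottom=min(page_height, box.bottom + padding),
    )


def covers(redaction: Box, text: Box) -> bool:
    """True if the redaction rectangle fully contains the text rectangle."""
    return (
        redaction.left <= text.left
        and redaction.top <= text.top
        and redaction.right >= text.right
        and redaction.bottom >= text.bottom
    )


# Where the scanner's OCR layer claims the word is.
ocr_word = Box(100, 200, 160, 212)

# Assumed padding of 3 pixels on an assumed 612 x 792 page.
padded = pad_box(ocr_word, padding=3, page_width=612, page_height=792)

# The visible word is 2 px right and 1 px down: inside the padded box.
small_shift_covered = covers(padded, Box(102, 201, 162, 213))

# The visible word is 10 px right: it sticks out past the padding.
large_shift_covered = covers(padded, Box(110, 200, 170, 212))
```

In this sketch the slightly shifted word is fully covered, while the word shifted by 10 pixels is not, which is the situation where the redaction looks visibly off.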

 

Solution: Upload the non-OCR version of the document. Smartbox will run its own OCR on the file, which will position the text accurately.

 

Q: Why can’t Smartbox just fix the files as they’re uploaded?

 

Explanation: Smartbox does in fact compensate for many deficiencies that are commonly caused by scanners, and corrects them automatically. However, PDFs can have very complex structures that Smartbox can’t resolve, and some OCR misalignments are too subtle for Smartbox to detect automatically.

 

Solution: In these cases you will need to resolve the issue with the file manually, for example by uploading the non-OCR version as described above.

 

Q: Can’t you just OCR all PDF files? 

 

Explanation: OCR is a slow process and may take several seconds per file. Smartbox processes millions of files per day, and performing OCR on all of them would slow down your document analysis considerably. Our goal is to get you a fully analysed dataset as quickly as possible.
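
As a rough back-of-the-envelope illustration of the cost, the sketch below estimates how much extra processing time blanket OCR would add. The figures of 2,000,000 files per day and 3 seconds per file are assumed purely for illustration and are not actual Smartbox measurements; `ocr_worker_days` is a hypothetical helper.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def ocr_worker_days(files_per_day: int, seconds_per_file: float) -> float:
    """Days of single-worker OCR time needed for one day's worth of files."""
    return files_per_day * seconds_per_file / SECONDS_PER_DAY


# Assumed, illustrative inputs (not real Smartbox figures).
extra_days = ocr_worker_days(files_per_day=2_000_000, seconds_per_file=3.0)
print(f"{extra_days:.1f} worker-days of OCR per day of uploads")
```

Under these assumptions, a single day of uploads would need roughly 69 days of OCR time on one worker, so keeping up would take about 70 workers doing nothing but OCR.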

 

Solution: Smartbox lists documents that show warning signs (such as broken fonts or misaligned boxes) in the error report for you to examine. However, we must stress the importance of reviewing files before disclosure to ensure the alignments are correct.

 

Q: Why is there highlighting/redaction of empty space on the file?

 

Explanation: Sometimes the file has hidden content, usually because someone has drawn a white box over a section of the page. Smartbox still identifies this content and highlights it. Alternatively, the highlighted area may be hidden text from an OCR process that is dramatically misaligned with the scanned image.

 

Solution: For hidden content that does not need redacting, simply select the redactions and remove them in Smartview. If the OCR text is dramatically misaligned, upload the non-OCR version of the file and Smartbox will run its own OCR and position the words correctly.



