PDF warnings

Posted 2 months ago by Harrison Gowers




PDF files occasionally have structural issues that cause unexpected behaviour. When they do, the most common result is misaligned highlights and/or redactions in Smartbox. During conversion the system attempts to correct any discrepancies in the original file, but this is not always possible.

If a PDF file has structural issues that could affect its functionality, the system flags it, and the Warning Report lists the affected files.


PDF structure


The PDF file format is designed to accommodate physical printing, allowing for perforations around the edges and similar features. To support this, a PDF contains a number of ‘boxes’ and, critically, the content of the document is positioned relative to those boxes.
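To illustrate why the choice of box matters, here is a minimal Python sketch. It is not Smartbox's implementation: the `Box` class, the `position_in_box` helper and all of the numbers are hypothetical, chosen only to show how the same word ends up at a different offset depending on which box is used as the reference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A page box as left, bottom, right, top, measured in points (1/72 inch)."""
    left: float
    bottom: float
    right: float
    top: float


def position_in_box(x: float, y: float, box: Box) -> tuple[float, float]:
    """Return the offset of a point from the lower-left corner of a box."""
    return (x - box.left, y - box.bottom)


# Hypothetical page: the Media Box includes a 20 pt margin for print marks,
# and the Crop Box is the smaller region that a viewer actually displays.
media_box = Box(0, 0, 635, 832)
crop_box = Box(20, 20, 615, 812)

# Where a word is drawn, in page coordinates (assumed value).
word_origin = (92.0, 740.0)

offset_in_media = position_in_box(*word_origin, media_box)
offset_in_crop = position_in_box(*word_origin, crop_box)

print(offset_in_media)  # offset measured from the Media Box corner
print(offset_in_crop)   # offset measured from the Crop Box corner
```

In this made-up example the two offsets differ by the 20 pt margin in each direction, so a highlight placed using the wrong reference box would be shifted by that amount.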


The structure of a PDF is as follows:


However, scanners and PDF converters often set the box positions and sizes so that they fall outside the limits of the PDF itself. For example, we often see a wide table that doesn’t fit and so appears ‘cut off’; in these cases the text content position actually falls outside the Media Box and can’t be displayed.
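The ‘cut off’ table case can be sketched as a simple bounds check. This is an illustrative example only, not the check Smartbox performs: the `Rect` alias, the `is_inside` and `out_of_bounds` helpers, and the table coordinates are all assumptions made for the sketch.

```python
# A rectangle as (left, bottom, right, top), measured in points.
Rect = tuple[float, float, float, float]


def is_inside(rect: Rect, box: Rect) -> bool:
    """Return True if rect lies entirely within box."""
    left, bottom, right, top = rect
    box_left, box_bottom, box_right, box_top = box
    return (
        left >= box_left
        and bottom >= box_bottom
        and right <= box_right
        and top <= box_top
    )


def out_of_bounds(text_rects: dict[str, Rect], media_box: Rect) -> list[str]:
    """Return the labels of text rectangles that extend past the Media Box."""
    return [
        label
        for label, rect in text_rects.items()
        if not is_inside(rect, media_box)
    ]


# Hypothetical A4 portrait page and a table that is too wide for it.
media_box = (0, 0, 595, 842)
table_cells = {
    "column 1": (50, 700, 250, 715),
    "column 2": (260, 700, 560, 715),
    "column 3": (570, 700, 780, 715),  # extends past the right edge at 595
}

flagged = out_of_bounds(table_cells, media_box)
print(flagged)
```

Here only the third column is reported, because its right edge lies beyond the right edge of the Media Box.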


In most cases Smartbox still makes a best effort to highlight and redact the document, but the results may appear misaligned because the system has no way of knowing that the PDF structure is incorrect or what the correct positioning should be.


OCR (optical character recognition) is frequently run natively in scanners or by plugins in other programs, which embeds the detected text behind the scanned image. If the embedded OCR text does not align with the text in the image, Smartbox cannot highlight or redact accurately.


Most misalignments of highlighting/redaction are simply due to the OCR text being inaccurately positioned, either relative to the image content or to the Media Box. 
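As a rough illustration of the effect, the Python sketch below measures how much of a visible word would be covered by a redaction drawn over the embedded OCR text box. It assumes the true position of the word in the image is known, which in practice it usually is not; the `coverage` function and both rectangles are hypothetical.

```python
# A rectangle as (left, bottom, right, top), measured in points.
Rect = tuple[float, float, float, float]


def area(rect: Rect) -> float:
    """Return the area of a rectangle, or 0 if it is empty or inverted."""
    left, bottom, right, top = rect
    return max(0, right - left) * max(0, top - bottom)


def coverage(ocr_box: Rect, image_box: Rect) -> float:
    """Return the fraction of image_box that is overlapped by ocr_box."""
    overlap = (
        max(ocr_box[0], image_box[0]),
        max(ocr_box[1], image_box[1]),
        min(ocr_box[2], image_box[2]),
        min(ocr_box[3], image_box[3]),
    )
    return area(overlap) / area(image_box)


# Assumed values: the OCR text sits 6 pt below the word as it appears
# in the scanned image, and both boxes are 60 pt wide and 12 pt tall.
ocr_word = (100, 700, 160, 712)
image_word = (100, 706, 160, 718)

covered = coverage(ocr_word, image_word)
print(f"{covered:.0%} of the visible word is covered")
```

With these made-up numbers a 6 pt vertical offset leaves half of the visible word uncovered, which is the kind of misalignment described above.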


Warning explanations >>>

