Warning explanations

Posted 2 months ago by Harrison Gowers

Warnings & explanations

 

Warning message / Explanation
Text falls outside the dimensions of the PDF.
The left, right, upper or lower edge of the text falls outside the Media Box.

BBox / ArtBox / BleedBox / CropBox / TrimBox dimensions are not within the page dimensions.


The left, right, upper or lower edge of the box falls outside the dimensions of the Media Box. These boxes are not required in a PDF; however, if they are present they may be used to position elements on the document, so incorrect dimensions may have an impact.



PDF has encrypted content.


The PDF contains encrypted content that could not be rendered for display, so it will be missing from the document Smartview and from the redacted export.


PDF does not have a Document Catalogue.


The internal PDF Catalogue used to structure the content of the PDF is missing. The Catalogue is not necessarily essential for rendering the file contents, but if it’s missing this may indicate that the program used to create the PDF has not correctly structured other parts of the file.


PDF has Optional Content Groups.


The document may contain hidden content that is not visible to the user. This can result in highlighted sections which do not appear to correspond to any visible content, when in fact the content is in the file but hidden. Keep this in mind when disclosing unredacted copies of documents: they may appear safe at a glance, but could contain more sensitive information embedded in the file.


PDF has bookmarks.
The file contains bookmark links - these can be references to external files which may not be included in the dataset. Any bookmarks in redacted documents will be removed in the rasterisation process of the document, but they may be included in unredacted files.

PDF has interactive forms.


The file contains content like buttons, links, videos, and other media. These will not be included in any rasterised copy of a redacted PDF but it’s important to keep this in mind when disclosing unredacted files. 


PDF has embedded files.


The file contains embedded files. These will not be carried over into rasterised redacted copies of the file, but may be included in non-redacted copies.


PDF has Article Threads.


The file contains information which links text boxes in the PDF in a specific order to aid readers in going through articles or stories. This information is hidden from the user and will not be part of any redacted document, but may be included in non-redacted copies of documents.

 

PDF has interactive elements.
The file contains content like buttons, links, videos, and other media. These will not be included in any rasterised copy of a redacted PDF but it’s important to keep this in mind when disclosing unredacted files.

Page [X] does not have a MediaBox defined.


The Media Box is missing from the PDF file on the indicated page (where [X] will be a page number). This can be a fatal issue which prevents any media being displayed at all and otherwise may cause automated redactions or highlighting to be incorrectly positioned on the file.


PDF contains non-embedded font.


The font that the PDF wants to use is not included in the PDF itself. This typically happens when a PDF has been converted from another file format in which an unusual font was being used and the converting program has not included that font in the PDF. This may cause the PDF not to display correctly in our Smartview, and may also cause errors when opening the files in viewers like Adobe after exporting.


PDF contains damaged font.


The font that the PDF wants to use is present but usually incomplete. This typically happens when a PDF has been converted from another file format in which an unusual font was being used and the converting program has not included all the necessary characters of the font, or the characters may simply not exist. This may cause the PDF not to display correctly in our Smartview, and may also cause errors when opening the files in viewers like Adobe after exporting.


Text does not align with the image.


The file contains text which falls outside the image area of the PDF (usually the Media Box). Commonly this warning will show up together with the message "Text falls outside the dimensions of the PDF", but in some cases scanners produce incorrect file structures which cause the text to be within the Media Box but outside the content structure. This is an indicator that the structure of the PDF is incorrect and that the OCR text may be misaligned with the actual text.
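As a rough illustration of the box and text-position warnings above, the underlying check amounts to testing whether one rectangle lies inside another. The sketch below is not Smartbox's implementation: the function names and the `tolerance` parameter are assumptions made for illustration, and boxes are assumed to be given PDF-style as `[x0, y0, x1, y1]` in points.

```python
from typing import List, Sequence, Tuple


def normalise(box: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (left, bottom, right, top) regardless of corner order."""
    x0, y0, x1, y1 = box
    return (min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1))


def edges_outside(inner: Sequence[float],
                  media_box: Sequence[float],
                  tolerance: float = 0.0) -> List[str]:
    """List which edges of `inner` fall outside `media_box`.

    `tolerance` (hypothetical) allows for small rounding differences.
    """
    left, bottom, right, top = normalise(inner)
    m_left, m_bottom, m_right, m_top = normalise(media_box)
    outside = []
    if left < m_left - tolerance:
        outside.append("left")
    if right > m_right + tolerance:
        outside.append("right")
    if top > m_top + tolerance:
        outside.append("upper")
    if bottom < m_bottom - tolerance:
        outside.append("lower")
    return outside


# Illustrative values: an A4-sized Media Box with a US Letter-sized CropBox.
media_box = [0, 0, 595, 842]
crop_box = [0, 0, 612, 792]
print(edges_outside(crop_box, media_box))  # ['right']
```

An empty list means the box (or text bounding box) sits entirely within the Media Box, so no warning would be expected under these assumptions.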


 


NB: it is possible for OCR text to be (sometimes dramatically, but usually quite subtly) misaligned from the image text without any indicators in the above that Smartbox can check for. This will result in misaligned redactions/highlighting which Smartbox cannot account for. In these cases, the best option is to simply upload the non-OCR copy of the same file - Smartbox will detect that the document has no text and will perform its own OCR, which will then accurately position the hidden text layer.
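Several of the warnings in the table correspond to optional entries in the PDF's document catalogue as defined by the PDF specification: `/OCProperties` (Optional Content Groups), `/Outlines` (bookmarks), `/AcroForm` (interactive forms), `/Threads` (Article Threads) and the `/EmbeddedFiles` entry of `/Names` (embedded files). The sketch below shows how such entries could be mapped to the warning messages. It is an illustration only and not how Smartbox performs its checks: the catalogue is represented as a plain Python dict (a real check would read it with a PDF library), and the function and constant names are hypothetical.

```python
from typing import Any, Dict, List, Optional

# Catalogue keys from the PDF specification, mapped to the warning text
# used in this article. The mapping itself is an assumption.
CATALOGUE_KEY_WARNINGS = {
    "/OCProperties": "PDF has Optional Content Groups.",
    "/Outlines": "PDF has bookmarks.",
    "/AcroForm": "PDF has interactive forms.",
    "/Threads": "PDF has Article Threads.",
}


def catalogue_warnings(catalogue: Optional[Dict[str, Any]]) -> List[str]:
    """Return the warnings suggested by the entries present in a catalogue."""
    if catalogue is None:
        return ["PDF does not have a Document Catalogue."]
    warnings = [
        message
        for key, message in CATALOGUE_KEY_WARNINGS.items()
        if key in catalogue
    ]
    # Embedded files are listed one level down, in the name dictionary.
    names = catalogue.get("/Names") or {}
    if "/EmbeddedFiles" in names:
        warnings.append("PDF has embedded files.")
    return warnings


# Hypothetical catalogue for a file with bookmarks and an attachment.
example = {
    "/Type": "/Catalog",
    "/Outlines": {},
    "/Names": {"/EmbeddedFiles": {}},
}
for warning in catalogue_warnings(example):
    print(warning)
```

Because these entries are hidden from a casual reader, a check of this kind is mainly useful before disclosing unredacted copies, where the entries would still be present in the file.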




