What is the difference between "All" and "Contextual" bulk redaction?

Posted 2 months ago by Harrison Gowers

These two bulk redaction options differ in how they treat instances of a term that the AI analysis has not classified as personally identifiable information (PII).

Terms that have been captured as PII will be highlighted in Smartview.


Two instances of the same word in the same dataset can be classified differently, one as PII and the other as not PII, depending on context. For example, the word "hope":

  • "Mrs Hope Bloggs" (PII)
  • "I hope you are well" (not PII)


It is in these situations that the bulk redaction options operate differently.


| OPTION | EXPLANATION | EXAMPLE |
|---|---|---|
| Redact (All) | If a term is selected for bulk redaction, all instances of it will be redacted across the dataset, regardless of whether it has been classified by the AI. | "Mrs Hope Bloggs" = redacted; "I hope you are well" = redacted |
| Redact (Contextual) | If a term is selected for bulk redaction, only the specific instances where it has been identified as personally identifiable information will be redacted in the dataset. | "Mrs Hope Bloggs" = redacted; "I hope you are well" = not redacted |
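The difference between the two options can be sketched in a few lines of Python. This is only an illustration of the rule described above, not the product's actual implementation: the `Token` class, its `is_pii` flag, the `bulk_redact` function, the mode names, the case-insensitive matching and the `[REDACTED]` placeholder are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    # Hypothetical flag: True where the AI analysis classified this
    # particular instance of the word as PII (i.e. it is highlighted).
    is_pii: bool = False


def bulk_redact(tokens, term, mode):
    """Redact `term` in a list of tokens using one of two modes.

    mode="all":        redact every instance of the term.
    mode="contextual": redact only instances flagged as PII.
    Matching is case-insensitive here, which is an assumption.
    """
    if mode not in ("all", "contextual"):
        raise ValueError(f"unknown mode: {mode!r}")

    words = []
    for token in tokens:
        is_match = token.text.lower() == term.lower()
        if is_match and (mode == "all" or token.is_pii):
            words.append("[REDACTED]")
        else:
            words.append(token.text)
    return " ".join(words)


# "Hope" is a name here, so this instance is flagged as PII.
name = [Token("Mrs"), Token("Hope", is_pii=True), Token("Bloggs")]
# "hope" is an ordinary verb here, so it is not flagged.
greeting = [Token("I"), Token("hope"), Token("you"), Token("are"), Token("well")]

for mode in ("all", "contextual"):
    print(mode, "->", bulk_redact(name, "hope", mode))
    print(mode, "->", bulk_redact(greeting, "hope", mode))
```

In this sketch, both modes redact the name, while only the "all" mode touches the greeting, which mirrors the EXAMPLE column above.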





Example:


In a box containing the two documents below, the word "Bond" in the first document has been highlighted as a name because it appears as the surname of "James Bond". However, the word "Bond" in the second document, where it means a type of security, isn't highlighted because it has not been classified as personally identifiable.



Although switching the redaction option doesn't affect the redactions on the first document, it does determine whether anything is redacted on the second. If we select "Bond" for bulk redaction and apply each option in turn, we see the following results:


Redact (All)


Redact (Contextual)


As we can see, when redacting "All", every instance of "Bond" is redacted, including in the second document, where it means a security.


Conversely, when redacting "Contextual" only the instances where the term has been highlighted - and therefore is deemed PII - are redacted.


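Both methods have their advantages. "All" redacts every instance of a term whether or not the AI has classified it, while "Contextual" leaves instances that were not classified as PII readable. Each may be more suitable in certain situations, so it is worth experimenting to find what works best.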



