Pseudonymisation and anonymisation

If you work with personal data and intend to publish or disclose it, there are a number of things you need to bear in mind. A common approach is to pseudonymise or anonymise the data before further processing or publication.

This page explains the differences between the two methods and what needs to be considered when handling pseudonymous or anonymous data. It also explains what metadata is and how it can compromise pseudonymisation or anonymisation. At the bottom of the page, you will find some guidelines on how to technically prepare data for publication or sharing.

Secure data processing is extremely important. A negative example that has been widely reported recently is the publication of the so-called Epstein files by the US Department of Justice. In some cases, documents were published that had not been sufficiently redacted, thereby revealing the identities of victims.

In some cases, no technical tools are needed to make improperly redacted text legible again. You can test this yourself in the next paragraph: simply highlight the black bar and the text becomes readable. You may also find further text on this page that has not been adequately concealed.

This text is still legible!

Even white text on a white background does not provide adequate protection!

  • Pseudonymisation

    In pseudonymisation, the characteristics that can be used to identify a person are replaced by an identifier. A separate list records which identifier belongs to which individual.

    • Only the list makes it possible to know which person is ‘hidden’ behind which identifier.
    • Without the list, the identifier alone allows no conclusions to be drawn about the data subject.
    • If the pseudonymisation list is completely and irrevocably deleted, the data is generally anonymised.

    Examples: code numbers or altered names (=pseudonyms) in data collection

    Pseudonymised data is considered personal data! It falls within the scope of the General Data Protection Regulation (GDPR). Anyone processing pseudonymised data must therefore comply with data protection regulations!

    Pseudonymisation is defined in Article 4(5) of the GDPR as:

    "the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person"

  • Anonymisation

    To anonymise data, all characteristics that enable the identification of individual persons (e.g. names) must be deleted or rendered sufficiently unrecognisable. Identification must be reliably ruled out, even as a theoretical possibility.

    However, people often possess additional knowledge that nevertheless makes it possible to identify specific individuals, for example because they know the individuals very well personally or because the individuals are public figures.

    When anonymising data, you must therefore check whether identification of data subjects on the basis of additional knowledge can be reliably ruled out, even if it seems unrealistic that anyone would attempt it.

    Correct and irreversible anonymisation is of paramount importance, because:

    • Anonymised data is no longer considered personal data! It therefore falls outside the scope of the GDPR.
    • Consequently, no data protection regulations need to be observed when processing anonymous data.
    • Images, videos, voice recordings and handwritten notes relating to individuals are generally not considered anonymous. Extensive measures must be taken to anonymise such personal data. Minor measures, such as a black bar over the eyes in a photograph of a person, are not sufficient to prevent identification.
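
The pseudonymisation scheme described above (identifiers in the data, a separate list linking them to individuals) can be sketched in Python roughly as follows. This is a minimal illustration, not a recommended implementation: the function name `pseudonymise`, the field name `name`, the `P-` prefix and the sample records are all assumptions made for the example.

```python
import secrets


def pseudonymise(records, id_field="name"):
    """Replace the identifying field with a random identifier.

    Returns the pseudonymised records and the key list that maps each
    original value to its identifier. All names here are hypothetical.
    """
    key_list = {}   # original value -> identifier
    used = set()    # identifiers already handed out
    pseudonymised = []
    for record in records:
        original = record[id_field]
        if original not in key_list:
            identifier = "P-" + secrets.token_hex(4)
            while identifier in used:  # avoid the rare collision
                identifier = "P-" + secrets.token_hex(4)
            used.add(identifier)
            key_list[original] = identifier
        copy = dict(record)  # leave the input records untouched
        copy[id_field] = key_list[original]
        pseudonymised.append(copy)
    return pseudonymised, key_list


# Hypothetical sample data.
records = [
    {"name": "Alice Example", "answer": 3},
    {"name": "Bob Example", "answer": 5},
    {"name": "Alice Example", "answer": 4},
]
data, key_list = pseudonymise(records)
# `data` can be processed further; `key_list` must be stored separately
# and protected, because it re-links identifiers to individuals.
```

Random identifiers are used here instead of values derived from the name, so the identifier itself reveals nothing about the person. Other fields in a record may still identify someone, so replacing a single field does not by itself amount to anonymisation.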

Metadata

What is metadata?

Put simply, metadata is data about other data. A book provides a clear example: the content is the actual data; details such as the title, author, edition, year of publication, publisher and ISBN are the book’s metadata. Metadata can be used, amongst other things, for searching, organising or managing data. However, under certain circumstances, it may also allow conclusions to be drawn about personal data.

Metadata is also contained in many digital files such as images (e.g. time taken, GPS coordinates) or documents (e.g. author, date edited).
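
To make this concrete, the following Python sketch lists the metadata-carrying chunks in a PNG image using only the standard library. PNG is chosen purely as an illustration; other formats such as JPEG store metadata differently and need other tools. The function name `list_png_metadata_chunks` and the selection of chunk types are assumptions for this example, not a complete inventory.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Chunk types that commonly carry metadata: text fields, embedded Exif
# data and the last-modification time. Assumed selection, not exhaustive.
METADATA_CHUNKS = {b"tEXt", b"zTXt", b"iTXt", b"eXIf", b"tIME"}


def list_png_metadata_chunks(data):
    """Return the types of metadata chunks found in raw PNG bytes."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    found = []
    pos = len(PNG_SIGNATURE)
    # Each chunk is: 4-byte length, 4-byte type, payload, 4-byte CRC.
    while pos + 8 <= len(data):
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        if chunk_type in METADATA_CHUNKS:
            found.append(chunk_type.decode("ascii"))
        if chunk_type == b"IEND":
            break
        pos += 12 + length
    return found
```

Called as `list_png_metadata_chunks(open("image.png", "rb").read())` (with a hypothetical file name), a non-empty result indicates that the file contains more than just pixel data and should be looked at before publication.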

Why must attention also be paid to metadata during pseudonymisation and anonymisation?

Metadata contained in datasets or files can, particularly when combined with other data, result in anonymised or pseudonymised data being re-attributed to individual persons.

The decision as to which metadata is unobjectionable under data protection law and which should be removed prior to further processing or publication must be made on a case-by-case basis. The decisive factor here is whether there are scenarios in which the metadata could be used to identify individuals.

Once the decision to remove metadata has been made, the deletion must be carried out technically correctly and the data must not be recoverable. In the final section of this webpage, you will find some instructions on how to remove metadata from specific files. For other file types, you should check for yourself how metadata can be securely removed from them.

Instructions

These instructions provide only a selection of possible tools that can be used to securely redact data. It is also possible to use other tools or techniques.

After completing any process, you must check whether the result has been sufficiently and irreversibly redacted, and whether the metadata you intended to remove is actually gone.

  • Redacting PDF documents

    Adobe Acrobat Pro, which is available free of charge to university staff, includes a function for redacting PDF documents. However, this must be carried out correctly to ensure that the redaction is technically irreversible. Please follow the manufacturer’s instructions for redacting with Adobe Acrobat Pro.

  • Redacting paper documents

    If sections of paper documents are crossed out with a black pen, the text is often still legible, for example by holding the sheet of paper up to a light or by scanning the text and applying digital filters to the document. This applies to text as well as to printed photos or screenshots.

    The State Data Protection Commissioner of Lower Saxony therefore recommends cutting out or covering up the sections to be redacted and then copying or scanning the document. This can also be used as an alternative to redacting with Adobe Acrobat Pro (print out -> cut out or cover up the text -> scan).

    The State Data Protection Commissioner of Lower Saxony has published guidance on this.

  • Removing metadata from images and documents

    Whether and what metadata is contained in a file depends on the file format. It is therefore not possible to cover all possibilities here.

    For images, you can follow these instructions from heise.de.

    For Word documents, information is available from Microsoft Support.

    For PDFs edited with Adobe Acrobat Pro, Adobe provides its own guidance.
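
Independently of the vendor guidance above, the result of a clean-up can be spot-checked programmatically. As an illustration for Word documents: a .docx file is a ZIP archive whose core properties (author, last editor, timestamps) are stored in docProps/core.xml. The Python sketch below reads whatever is left there. The function name `read_docx_core_properties` and the in-memory sample are assumptions for this example. An empty result covers only this one part of the file; comments, tracked changes and other parts of the document need to be checked separately.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CORE_PROPS_PATH = "docProps/core.xml"


def read_docx_core_properties(docx_file):
    """Return {property name: value} for non-empty core properties.

    `docx_file` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as package:
        if CORE_PROPS_PATH not in package.namelist():
            return {}
        root = ET.fromstring(package.read(CORE_PROPS_PATH))
    properties = {}
    for element in root:
        local_name = element.tag.split("}", 1)[-1]  # drop the XML namespace
        if element.text and element.text.strip():
            properties[local_name] = element.text.strip()
    return properties


# In-memory stand-in for a .docx file with hypothetical content.
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as package:
    package.writestr(
        CORE_PROPS_PATH,
        '<cp:coreProperties'
        ' xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"'
        ' xmlns:dc="http://purl.org/dc/elements/1.1/">'
        '<dc:creator>Jane Example</dc:creator>'
        '<cp:lastModifiedBy></cp:lastModifiedBy>'
        '</cp:coreProperties>',
    )

remaining = read_docx_core_properties(sample)
print(remaining)  # any entry listed here is still stored in the file
```

If the author's name still appears in the output after the document has been cleaned, the removal did not work and should be repeated before the file is shared.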