To become more familiar with the term metadata, consider a simple example: you send a letter through the mail. The document inside the envelope is the actual, primary data. This data is private and protected by law against access by third parties: the secrecy of correspondence applies.
The envelope carries the metadata of the letter. This is additional data that accompanies the primary data:
- Address and sender
- Stamp and postmark
- Where required, additional identifiers like bar codes
As you can see, this is the data that makes sending the letter possible in the first place. The metadata of the letter is visible to anyone. It is therefore not specifically protected by the secrecy of correspondence, although postal secrecy does apply.
So what danger does metadata pose? A single piece of readable metadata is not a problem. If, for example, a third party learns that one particular envelope exists, that is usually no cause for concern. This changes when large amounts of data are involved, as is the case with mass data storage and analysis. At that scale, patterns emerge that reveal a lot about a person’s behavior: who communicated with whom, and when? Networks and chains of communication can be identified.
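To make the point about scale concrete, here is a minimal Python sketch of how envelope metadata alone can expose communication patterns. The names, dates, and the `reachable` helper are invented for illustration; no letter contents are used anywhere.

```python
from collections import Counter, defaultdict

# Hypothetical envelope metadata: (sender, recipient, postmark date).
# The contents of the letters are not part of this data set.
envelopes = [
    ("Alice", "Bob", "2024-03-01"),
    ("Alice", "Bob", "2024-03-08"),
    ("Alice", "Clinic", "2024-03-09"),
    ("Bob", "Carol", "2024-03-10"),
    ("Alice", "Bob", "2024-03-15"),
]

# How often does each sender write to each recipient?
pair_counts = Counter((sender, recipient) for sender, recipient, _ in envelopes)

# Who is directly connected to whom (the communication network)?
contacts = defaultdict(set)
for sender, recipient, _ in envelopes:
    contacts[sender].add(recipient)
    contacts[recipient].add(sender)


def reachable(start):
    """Return everyone linked to `start` through a chain of letters."""
    seen, stack = {start}, [start]
    while stack:
        for neighbor in contacts[stack.pop()]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen - {start}


print(pair_counts.most_common(1))  # the most frequent sender/recipient pair
print(sorted(reachable("Carol")))  # everyone linked to Carol, directly or not
```

A single tuple in `envelopes` says very little. Only the aggregate reveals that one pair corresponds regularly and that Carol is linked to Alice through Bob, although the two never exchanged a letter directly.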
The distinction between data and metadata is fluid. The classification depends on the context and on perspective. Here’s another example. A book contains primary data, such as its title and content. In addition, publishing the book produces a set of metadata:
- Author
- Publisher
- Time and place the book was published
- Edition
- ISBN
Let’s imagine that the metadata of many publications is collected in a database. From the perspective of this database, the publication information is primary data. The database would in turn hold a new set of metadata for each publication. For example, it could store when each entry was added and by which user.
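The shift in perspective can be sketched as a small data model in Python. This is only an illustration under assumed names: `Publication`, `CatalogEntry`, and `add_entry` are hypothetical, and the book values are placeholders rather than a real publication.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Publication:
    """Metadata about a book, but primary data from the database's view."""
    author: str
    publisher: str
    published: str  # time and place, kept as free text in this sketch
    edition: str
    isbn: str


@dataclass(frozen=True)
class CatalogEntry:
    """One database row: the record plus metadata about the row itself."""
    record: Publication  # primary data of the database
    added_at: datetime   # metadata: when the entry was added
    added_by: str        # metadata: which user added it


def add_entry(catalog, record, user):
    """Wrap a publication record with entry metadata and store it."""
    entry = CatalogEntry(record, datetime.now(timezone.utc), user)
    catalog.append(entry)
    return entry


catalog = []
book = Publication(
    author="A. Author",
    publisher="Example Press",
    published="2020, Berlin",
    edition="1st",
    isbn="000-0-00-000000-0",  # placeholder, not a real ISBN
)
entry = add_entry(catalog, book, user="librarian1")
print(entry.added_by, entry.added_at.isoformat())
```

The same fields are metadata when seen from the book and primary data when seen from `CatalogEntry`, which is the context dependence described above.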