We need to talk about soft delete.

There are things we just don't talk about, and soft delete is one of them.

Across the globe, data is being 'deleted' when a user asks for it to be removed, but what is really happening? Soft delete. This is when the data is no longer accessible to the user and appears to have been deleted but, in truth, it's still there, in its entirety. Deep in the database, there is a column named 'Deleted', holding a 1 (true) or a 0 (false).
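To make that concrete, here is a minimal sketch of the pattern using Python's built-in sqlite3 module. The table name, column names and helper functions are all made up for illustration; real systems will differ, but the shape is usually the same.

```python
import sqlite3

# Hypothetical schema: 'users', 'email' and 'deleted' are illustrative names.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "id INTEGER PRIMARY KEY, email TEXT, deleted INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO users (email) VALUES ('alice@example.com')")


def soft_delete(conn, user_id):
    """Flag the row as deleted; nothing is actually removed."""
    conn.execute("UPDATE users SET deleted = 1 WHERE id = ?", (user_id,))


def hard_delete(conn, user_id):
    """Actually remove the row from the table."""
    conn.execute("DELETE FROM users WHERE id = ?", (user_id,))


def visible_users(conn):
    """What the application shows: only rows not flagged as deleted."""
    return conn.execute(
        "SELECT id, email FROM users WHERE deleted = 0"
    ).fetchall()


def all_rows(conn):
    """What is really in the table, flag and all."""
    return conn.execute("SELECT id, email, deleted FROM users").fetchall()


soft_delete(conn, 1)
print(visible_users(conn))  # what the user sees
print(all_rows(conn))       # what the database still holds
```

After soft_delete, the user-facing query comes back empty while the row, email address included, is still sitting in the table. Only hard_delete makes it go away.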

Some people are so scared of losing valuable, precious data that deleting it in its entirety seems... impossible.

Why so scared?

In the information age we live in, data is insight, insight is advantage and advantage is money. OK, that's a little reductive, but you get my gist.

Data === Money.

Therefore, it logically follows that deleting data is removing access to an asset. This is where people struggle.

What is the law (Personal data)?

The hottest topic is personal data: your name, email address, physical address, telephone number, etc.

In the EU, GDPR reigns supreme, and it includes the 'right to be forgotten' (what a great turn of phrase, by the way) in Article 17 (https://gdpr-info.eu/art-17-gdpr/):

The data subject shall have the right to obtain from the controller the erasure of personal data concerning him or her without undue delay and the controller shall have the obligation to erase personal data without undue delay where one of the following grounds applies:

There follows a list of grounds on which you can have your personal data deleted, one of which is that you 'withdraw consent', i.e. you don't want a company to have it any more.

What are the exceptions?

One of the exceptions is below. It's the one I think pretty much everyone is clinging to as an excuse for not deleting data, and we need to talk about it more openly.

(e) for the establishment, exercise or defence of legal claims.

You can imagine the conversations: "Yeah, we simply can't delete the user data, because, you know, if they sue us, we'll need to have something to rely on to come back at them with".

Many companies are using this clause, incorrectly, to essentially never delete any personal data, ever. This data is then included in every report, data analysis and insight, and simply put, it shouldn't be.

Of course, not everyone is doing this, but I'm guessing that if you got this far into the article, it's because deep inside you know you've worked somewhere that does this, and you weren't comfortable with it then, or now. (This is your soft delete soul talking.)

What about non-personal data?

There was a recent high-profile example of where soft delete would have been useful: 'We lost 54 thousand GitHub stars'. In short, a full delete of data happened in a situation where someone could clearly have activated the process in error. This is the perfect example of where soft delete would be great; consider it a cooling-off period. Soft delete is great for the old "whoopsie, didn't mean to do that".
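A cooling-off period like that could be sketched roughly as follows: instead of a permanent flag, record when the delete was requested, allow an undo, and have a scheduled job remove anything older than a fixed window for real. The 30-day window, the 'items' table and the function names are assumptions for illustration, not a description of how any particular service does it.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

GRACE_PERIOD = timedelta(days=30)  # assumption: pick whatever window suits you


def make_db():
    """Create a throwaway in-memory table; 'deleted_at' is NULL for live rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, deleted_at REAL)"
    )
    return conn


def request_delete(conn, item_id, now):
    """Soft delete: record when the delete was requested (Unix seconds)."""
    conn.execute(
        "UPDATE items SET deleted_at = ? WHERE id = ?", (now.timestamp(), item_id)
    )


def undo_delete(conn, item_id):
    """The 'whoopsie' button: clear the marker so the row is live again."""
    conn.execute("UPDATE items SET deleted_at = NULL WHERE id = ?", (item_id,))


def purge_expired(conn, now):
    """Hard delete rows whose grace period has run out; returns rows removed."""
    cutoff = (now - GRACE_PERIOD).timestamp()
    cur = conn.execute(
        "DELETE FROM items WHERE deleted_at IS NOT NULL AND deleted_at <= ?",
        (cutoff,),
    )
    return cur.rowcount
```

The important part is purge_expired: the soft delete has an expiry date, so "just in case" cannot quietly turn into "forever".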

However, if you start doing this, then do you allow companies to indefinitely store your personal data, just in case?

Instagram were found to have 'accidentally' kept hold of images for over a year after users deleted them, citing a 'bug'.

What about backups?

I can keep the data in backups, right... right? Nope. If someone has asked for their data to be deleted, it is reasonable for them to expect that, if your database has some catastrophic issue and you have to restore from backups, their data doesn't do the old 'Lazarus' and come back from the dead.
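One possible way to stop the Lazarus act, sketched under some big assumptions: keep a small log of erased record IDs somewhere outside the backup, and replay it straight after any restore. The restore function here is a stand-in for a real restore process, and the table, IDs and log format are hypothetical.

```python
import sqlite3


def restore_from_backup(backup_rows):
    """Stand-in for a real restore: rebuild the table from backed-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany("INSERT INTO users (id, email) VALUES (?, ?)", backup_rows)
    return conn


def replay_erasures(conn, erased_ids):
    """Re-apply every erasure recorded since the backup was taken."""
    conn.executemany(
        "DELETE FROM users WHERE id = ?", [(user_id,) for user_id in erased_ids]
    )


# The backup predates user 2's erasure request.
backup = [(1, "alice@example.com"), (2, "bob@example.com")]
erasure_log = [2]  # hypothetical log: opaque IDs only, no personal details

conn = restore_from_backup(backup)
replay_erasures(conn, erasure_log)
remaining = conn.execute("SELECT id, email FROM users").fetchall()
print(remaining)
```

In this sketch the log holds only an ID, so honouring a deletion does not mean keeping another copy of the very data you were asked to delete.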

What should I do?

Think about the end user and their rights over what happens to their data, and about what you actually need in order to protect yourself from legal claims (ironically, you could be undermining your own defence against legal claims if you are not deleting personal data when requested).

Also think about when undoing an action would be, you know, actually super useful.

Yeah, it's complicated.