Soft and Hard Delete: everything you need to know

For years, I have developed many web applications, and unfortunately, users make mistakes. Among them, deleting records from the database that have a lot of value.

When I started managing these issues, I discovered that there was a way to logically delete records from the database without having to do so physically, allowing me to recover deleted records without much effort.

What are hard delete and soft delete?

Hard delete, as its name suggests, involves a physical removal of the record. Once this action is executed, the record completely disappears from the system, and the only way to recover it is through a backup, which can result in a complicated and potentially error-prone process.

Soft delete, instead of permanently removing the record, simply marks it as “deleted” by updating a specific attribute. As a result, the record remains present in the database, but at a logical level, it is deleted.

How to apply soft delete

There are many ways to implement soft delete. The most common are:

  • Adding the is_deleted column ⇒ if the record is deleted, is_deleted is true.
  • Adding the deleted_at or deleted_date column ⇒ if the record is deleted, the column will contain the timestamp of the moment of deletion.

Benefits of soft delete

Data recovery

The ability to mark records as deleted instead of physically removing them allows for data recovery in case of errors or restoration needs. Simply unmark the record as deleted and voila, we can recover a deleted item without needing to resort to a backup.

Some processes may need the information of a deleted item (graceful deletion)

Sometimes, a process needs to consult records that have been deleted. If we apply a hard delete, this is complicated, as we would need to resort to an auxiliary table from which to consult these data. However, with soft delete, it is much simpler, all we have to do is ignore the attribute that determines whether a record is deleted or not.

Maintaining referential integrity and avoiding cascade deletion

By keeping records in the database, soft delete can help maintain referential integrity in systems with relationships between tables. Moreover, even if foreign keys are not used, deleting a record whose references may be scattered throughout the code can introduce errors.

Problems with soft delete

Difficulty in recovering data

But… hadn’t we said that one of the advantages was to recover the deleted data?

In my experience, this is not always easy. Recovering a deleted record can leave some parts of the system inconsistent ( for example, projections built from events).

Before recovering a record, it is necessary to determine what logical actions must be implemented to support the recovery of deleted items.

Undesired access to deleted data

Data marked as deleted can still be accessible through direct database queries, so it is crucial to manage these queries adequately.

In my case, I always opt to include by default the condition to not retrieve deleted records. This strategy means that, to recover deleted records, I must do so explicitly. This helps reduce the errors that developers (including myself) can make if they forget that soft delete is being used in a certain database schema.

Impact on performance

Although logically the records may be marked as deleted, physically they still exist in the database. This implies that, while our application can distinguish between both types of records, our database engine will not. As a result, all configured indexes will also apply to the records marked as deleted, which can generate performance issues if our table is very massive.

Regulation

I am not an expert in this area, but I do know that regulations like the GDPR require you to delete all records of a user if they request it. To comply with this requirement, it is necessary to implement a hard delete process that ensures the physical deletion of the user’s data from the database.

Problems with constraints

When implementing soft delete, it is important to remember what we have said several times: it is a logical deletion and not a physical one. Our database engine continues to store the record. The problem is found when creating/updating a record where, due to restrictions imposed by constraints, it conflicts with a deleted element.

For example, given a unique key on a user’s email, the following can occur:

  1. A user with email X is created ⇒ the record is added to the database.
  2. The user with email X is deleted ⇒ the record is marked as deleted.
  3. Another user with email X is created ⇒ there is already a user with the same email in the database, an error occurs.

To solve this problem, you must use partial unique indexes. Each database engine defines them in a different way, so depending on your infrastructure, you may have some limitations to apply them, especially if you use an ORM. We will talk more in detail about partial unique indexes in another post.

When to use soft delete

In my case , I pose two key questions to determine whether to implement a soft delete or not:

  • Do we need a graceful deletion?
  • Are we going to want to recover the deleted data?

The second question is very important, and I suggest discussing it with the product team. Although the answer is probably affirmative, it is essential to consider if the loss of a deleted record could cause serious problems.

In general, it is recommended to apply a soft delete in critical parts of the application where data loss could be problematic for users. For example, on GitHub, deleting a repository by mistake could result in a significant loss of value, so applying a soft delete might be interesting.

If any of these questions receive a positive answer or if there are referential integrity restrictions, then it is appropriate to implement soft delete. Otherwise, a hard delete may be more convenient and faster.

But then… What do we do if we need to have the deleted data for an audit and/or tracking? This topic will be addressed in another post.