Taking the 'D' Out of CRUD with Soft Deletes
If you are like me, you cringe every time you see
.destroy in someone's ORM code. It is called destroy for a reason. For most businesses, your data is one of your most important assets. In the healthcare environment, we actually have a legal obligation to NOT destroy data, and we must keep tidy audit logs, as well. So, why would you ever want to destroy your data? Okay, so maybe you want to clean out old logs or other cruft, but you should never delete data that is core to your business.
Compelling reasons to use soft deletes
- Preserve history
- Know when the entity was deleted
- Know who deleted the entity
- Undo deletions without resorting to backups
Assuming you are convinced that destroying your data is bad, let us move on to implementation.
The common approaches
The idea in both of these approaches is to flag the row as deleted. Next, all queries are filtered by the column indicating deletion. This works pretty well at a basic level.
1. Add a boolean
is_removed column, defaults to
Pretty simple solution, just update the row and set
true. This solution is nice because the column can be
NOT NULL, however, you will want some indication of when this happened, so I would recommend adding a timestamp column as well.
An example schema might look like this:
CREATE TABLE customer ( customer_id serial PRIMARY KEY, name varchar(256) NOT NULL UNIQUE, version_on timestamptz NOT NULL DEFAULT current_timestamp, created_on timestamptz NOT NULL DEFAULT current_timestamp, is_removed boolean NOT NULL DEFAULT(false) );
2. Add a nullable
removed_on timestamp column
This solution uses the nullability of the timestamp column to represent deletion. This solution is nice since it indicates when the row was deleted, yet is only a single column. This is the method most ORM plugins use.
As far as ORMs go, there is the Paranoia gem for ActiveRecord and if you happen to use DataMapper, there is also Paranoia. Really though, that is a terrible name use.
par·a·noi·a: A mental condition characterized by delusions of persecution, unwarranted jealousy, or exaggerated self-importance, typically elaborated into an organized system.
One problem that both of these solutions have is handling unique constraints. Let's look at an example in postgres.
CREATE TABLE customer ( customer_id serial PRIMARY KEY, name text NOT NULL UNIQUE, removed_on timestamptz NULL );
Now, let's add some customers.
INSERT INTO customer (name) VALUES ('dunder mifflin'), ('vance refrigeration');
Next, we delete one.
UPDATE customer SET removed_on = current_timestamp WHERE name = 'vance refrigeration';
Now, if we try to add
vance refrigeration back, we get a duplicate key violation error:
ERROR: duplicate key value violates unique constraint "customer_name_key" Detail: Key (name)=(vance refrigeration) already exists.
To fix this in Postgres, we can use a partial index that only applies the unique constraint to the rows where
removed_on IS NULL.
So, we update our schema as follows, removing the inline
UNIQUE and adding the partial index:
CREATE TABLE customer ( customer_id serial PRIMARY KEY, name text NOT NULL, removed_on timestamptz NULL ); CREATE UNIQUE INDEX customer_name_index ON customer(name) WHERE removed_on IS NULL;
Now, we can soft delete and re-add customers without violating our unique constraint.
This is also possible in SQL Server by using filtered indexes in version 2008 or later.
So, we are using soft deletes. We now need to make sure that we never return deleted rows unless that is explicitly what we intend. With the paranoia gems, they try to handle this for you.
In SQL, we just need to add a
WHERE clause to our queries or an
AND clause to our joins.
SELECT * FROM customer WHERE removed_on IS NULL;
SELECT * FROM product p INNER JOIN customer c ON p.customer_id = c.customer_id AND c.removed_on IS NULL;
Looking forward to postgres 9.5, it should be possible to use row-level security policies to restrict access to these deleted rows entirely. This will greatly simplify the way we select and join these rows.
If you need to undo a deletion, now it is a simple matter of updating the
removed_on column back to
UPDATE customer SET removed_on = NULL WHERE name = 'vance refrigeration';
While it takes some extra work, it can be worth it depending on your business needs. For those of us in healthcare, this is definitely worth it.
Check out the next post in this series, where we go one step further by extending this concept to soft updates.