UUID for better data integrity

Let's talk a about UUID today. UUID stays for Universally Unique Identifier, it's just 128 bits that are used to encode ID of some entity. In relation databases like PostgreSQL or MySQL it's usually more common to use sequentially incremented integer for IDs, rather than UUID.

I often observe, teams turn to UUID only when they run into some scenarios, where IDs comes from from outside (a UI client or another service) and it can not be generated within DB to be sequential. Another use case for UUID is to hide real number of database entries in REST URLs.

However, they is another strong use case for UUID.

Problem

Let's say there is a simple DB schema of 3 tables: resources, admins, and users. Table resources has a foreign key admin_id that refers to admins.id.

 +-------------------+      +----------------+     +----------------+
 | resources         |      | admins         |     | users          |
 +-------------------+      +----------------+     +----------------+
 | id INT (PK)       |  +-->| id INT (PK)    |     | id INT (PK)    |
 +-------------------+  |   +----------------+     +----------------+
 | name  VARCHAR     |  |   | name VARCHAR   |     | name VARCHAR   |
 +-------------------+  |   +----------------+     +----------------+
 | admin_id INT (FK) |--+
 +-------------------+

Let's say we have 2 admins: Donald and Mickey.

id name
1 Donald
2 Mickey

And 3 users: Billy, Willy, Dilly.

id name
1 Billy
2 Willy
3 Dilly

We want to add a new resource Pancake but our application has a bug: by mistake we're adding user's id as admin_id, instead of admin's id:

Resource.create!(name: 'Pancake', admin_id: willy.id)

Because willy.id is 2, the database's FK constraint will check presence of record with ID=2 in admins table, it will find the Mickey record, so the Pancake resource will be successfully created.

But it's obviously not want we want. We want the DB to yell at us loudly, pointing to the error in our application logic.

UUID to rescue

This kind of scenarios could be eliminated if we use UUID type to represent IDs, instead of sequential integers. The key difference here is that each single instance of UUID is going to be unique globally over all tables, not only within a scope of one single table.

With UUID for primary and foreign keys, the content of our tables may look like this:

admins:

id name
ef16123e-b1ab-11eb-8529-0242ac130003 Donald
d273a69d-5904-47f2-bdc6-86e34120e0a2 Mickey

users:

id name
68295a9c-6c4e-4a34-820f-757aad0efac2 Billy
dc619aa3-44e0-40aa-a63c-6c7a1122683a Willy
1e1fcc91-8b6b-492a-9bc1-0e969a296681 Dilly

Now if our application tries to create a resource with incorrect reference to admins table:

Resource.create!(name: 'Pancake', admin_id: willy.id)

Our DB will spit out an error, telling us that FK constraint is not satisfied, because there is no record with id = dc619aa3-44e0-40aa-a63c-6c7a1122683a in table admins.

Afterword

UUID takes a twice more disk space (128 bits for UUID VS 64 bits for BIG INT), but I think it's a very good bargain, considering the benefits it brings. However, you can still strengthen data integrity even using sequential integers.

One technique is to allocate a big enough range of possible values for each single table. For example:

Ideally ID values of different tables would never overlap. The very first admin record would have id = 1_000_000_001 instead of 1.

Another technique, that can be applied only in the languages with advanced type system, is to use wrapper types for each single id. For example in Rust it would look like:

struct UserId(i64);
struct AdminId(i64);
struct ResourceId(i64);

In this case the compiler would never allow us to use UserId, where AdminId is expected.

Thanks you for reading.