Explain Codes LogoExplain Codes Logo

Use email address as a primary key?

sql
database-design
primary-key
database-performance
Nikita BarsukovbyNikita Barsukov·Dec 24, 2024
TLDR

Email addresses as primary keys are a no-go. Email addresses are mutable and represent private data. Choose surrogate keys, such as an auto-incrementing integer or a UUID, which remain immutable and uphold user privacy. Apply the unique constraint on the email field to uphold uniqueness without burdening it with primary key roles.

CREATE TABLE users ( id INT AUTO_INCREMENT PRIMARY KEY, -- numero uno, the real MVP email VARCHAR(255) UNIQUE -- "I'm unique too!" - email probably );

With this strategy, you uphold data integrity and follow database design best practices.

Is there a good time for an email as a primary key?

In rare, specific scenarios, an email address as a primary key could be justified. These are the odd times where application simplicity trumps other considerations, where joins and complex queries are minimal, and absolute uniqueness is a must.

Weighing up surrogate keys versus natural keys

Surrogates and naturals - what's the difference?

A head-to-head between surrogate keys (like an auto-incrementing integer) and natural keys (such email) opens up a debate on trade-offs. While surrogate keys remain oblivious to data attributes, natural keys bear mechanical implications which can render them volatile.

Complexity, meet surrogate key

As database complexity ramps up, replete with nuanced relationships, surrogate keys provide performant joining operations. The determinate, smaller size makes for quicker compares against the varying length of string types, like email addresses.

Change << cascade

An email, being a natural key, could require an update, triggering a cascade operation to related tables. It's an expensive operation that burdens the database with maintenance overheads, especially in high-volume ecosystems.

Performance on a string - it's complicated

Databases favor indexes on smaller, uniform keys, improving efficiency, though modern databases have loosened the performance gap between strings and integers. Even then, in most scenarios, integer keys should outpace their string counterparts, especially when properly indexed.

Pitfalls of email as primary key

Loose lips sink ships

Alluding to emails in a primary key context veers towards security and privacy hazards. Such sensitive data exposure not only precipitates privacy violations but can also unmask the system to security breaches, especially if identifiers are paraded in URLs.

Email changeover - chaos ensues

Users often change their emails. In turn, this triggers a need to update all corresponding records. It's a laborious process, adding a referential integrity layer riddled with pitfalls.

Stick to good old principles

Designing a database schema demands a keen eye on flexibility and maintainability. Upholding the principle of separation of concerns certainly pays here, ensuring that key identifiers are kept isolated from mutable data attributes like email, lending a scalable, resilient database design.

Not all is lost - alternatives to email as primary key

UUIDs - when integers aren't enough

When integers seem too mainstream, consider a UUID. They may be larger, but offer a unique identifier that can be generated client-side, eradicating the need for multiple roundtrips to the database.

Composite keys - two is better than one?

In circumstances where an email isn't alone, a composite key of both the email and partner attribute (like a domain or tenant ID) can offer uniqueness without pressing the email into primary key service.

Indexed fields - the best of both worlds

Indexes on an email field offer efficient retrieval and vouch for uniqueness, promoting emails to a secondary key role. It combines the benefits of efficient lookup with the cushion of a surrogate primary key.