Explain Codes LogoExplain Codes Logo

Postgresql: Case insensitive string comparison

sql
database-indexing
postgresql-extensions
case-insensitive-comparison
Nikita BarsukovbyNikita Barsukov·Dec 14, 2024
TLDR

For straightforward case-insensitive comparison in PostgreSQL, use either the ILIKE operator or the LOWER() function:

ILIKE example:

-- John, JOhn, jOhn, whatever... JOHN always finds a way! SELECT * FROM table WHERE name ILIKE 'john';

LOWER example:

-- Lower...are we limbo dancing or database coding? SELECT * FROM table WHERE LOWER(name) = 'john';

Both options normalize case providing seamless, efficient case-insensitive comparisons.

Diving deeper: Citext, indexing, and escaping

The Citext superhero

Harness CITEXT, PostgreSQL's native case-insensitive data type. It does the hard work so you don't have to:

-- Run CITEXT extension CREATE EXTENSION IF NOT EXISTS citext; -- Now let's create a table with a CITEXT column CREATE TABLE users ( email CITEXT PRIMARY KEY, name VARCHAR(100) );

With CITEXT, case-insensitive comparisons are as easy as pie:

-- Let's sneak in John's detail INSERT INTO users (email, name) VALUES ('[email protected]', 'John Doe'); -- This fetches John whether he is [email protected] or [email protected] SELECT * FROM users WHERE email = '[email protected]';

Indexing: Speed is the need

Why wait when you can race ahead? Index the LOWER() expression or tap the power of trigram index:

-- Lower indexing running on coffee CREATE INDEX idx_lower_name ON users (LOWER(name)); -- A wild trigram approach appears CREATE EXTENSION pg_trgm; CREATE INDEX idx_gin_name ON users USING gin (name gin_trgm_ops);

Need for speed met. And what's that? Wildcards!

Meeting wildcards in your search parameters? Use replace() to escape text:

-- When PostgreSQL decided to play hide and seek SELECT * FROM users WHERE replace(name, '!', '!!') ILIKE '%john%';

For keeping your collation game strong, use the COLLATE keyword:

-- Altering tables, not your ambitions ALTER TABLE users ALTER COLUMN email TYPE CITEXT COLLATE "en-US-x-icu";

Untangling complexity: Types, constraints, and collation

Dancing with data types

Implicit type casting with CITEXT can make your life much easier. You don't have to explicitly cast your comparisons:

-- CITEXT doing the waltz with data types SELECT * FROM users WHERE email = '[email protected]';

Playing with unique constraints

Looking for faster, more compact indexes? Resort to trigram indices. They also support unique constraints like a pro:

-- Your query performance on a diet CREATE UNIQUE INDEX idx_unique_email_trgm ON users USING gin (email gin_trgm_ops);

Mastering collation

Use ICU collations for accurate comparisons and sorting:

-- Deterministic? Yep, we've got that too! CREATE COLLATION ci_collation (provider = 'icu', locale = 'en-u-ks-primary', deterministic = false); ALTER TABLE users ALTER COLUMN name TYPE VARCHAR COLLATE ci_collation; -- Check this out! Through the collation lens SELECT * FROM users ORDER BY name COLLATE "en-US-x-icu";