Explain Codes LogoExplain Codes Logo

Split column into multiple rows in Postgres

sql
prompt-engineering
join
performance
Anton ShumikhinbyAnton Shumikhin·Dec 31, 2024
TLDR

Want to shred a column into multiple rows in Postgres? Use unnest(). When armed with a products table where the features column houses arrays, to split features into rows, do this:

SELECT p.id, unnest(p.features) AS feature FROM products p;

In this case, unnest() morphs each element of the features array into a separate row. Behold, the magic of converting array-based data into a multi-row format.

From delimiter hell to data heaven

Facing the hellfire of a delimited string? Bathe yourself in the cleansing waters of string_to_array() and unnest():

/* this is where commas come to die */ SELECT p.id, unnest(string_to_array(p.features, ',')) AS feature FROM products p;

This performs an exorcism on your string, splitting it into an array by a delimiter (e.g., a comma), then unnest() does its usual trick of transfiguring each array element into a row.

The power of lateral thinking

Whoever said SQL can't do gymnastics? Enter the world of lateral joins. Say you have a sales table with dates behaving badly and storing comma-separated dates, split these ruffians into well-behaved rows like so:

/* just some casual SQL gymnastics */ SELECT s.id, d.date FROM sales s CROSS JOIN LATERAL string_to_array(s.dates, ',') AS d(date);

CROSS JOIN LATERAL is SQL's Cirque du Soleil act, bending and twisting each dates value into multiple rows associated with each sales record.

Sentry duty - preserve original rows

Some rows are worth holding onto, even if there aren't any elements after splitting. In steps, the LEFT JOIN ... ON true knight in shining armour:

/* because who likes throwing out perfectly good rows? */ SELECT s.id, d.date FROM sales s LEFT JOIN LATERAL string_to_array(s.dates, ',') AS d(date) ON true WHERE d.date IS NOT NULL;

Now all your sales records will stand tall, with the date column displaying NULL for rows without a corresponding split value.

Beware the regex

Oy vey, need to slay a more complex splitting dragon? Arm yourself with regular expressions using regexp_split_to_table(). But be warned, young warrior, this is more computationally hefty and could slow down your query's steed.

/* splitting comments with the cunning art of RegEx, or RegEx-su if you will */ SELECT c.id, regexp_split_to_table(c.comments, E'\\s+') AS word FROM comments c;

Just like a ninja performing a split beneath a sword swipe, we use regular expressions to match those hard-to-reach patterns that simple delimiters can't.

Gracefully handling the empty

When unnest() encounters the void (an empty array or a NULL), it might panic and not return any rows at all. Keep its nerves in check with a reassuring COALESCE() function:

/* it's OK unkempt(), NULL doesn't bite, well, not today */ SELECT p.id, unnest(COALESCE(p.features, '{}')) AS feature FROM products p;

The COALESCE() function acts as unnest()'s life coach, reassuring it to see NULL values as an empty string, allowing unnest() to manifest rows and preserving your row structure.

Additional tips for a trouble-free journey

Filter out the riff-raff

After splitting into rows, you might want to root out certain results. Meet your new best friends, the WHERE and ORDER BY clauses.

/* Filter? We barely know her! */ SELECT p.id, unnest(p.features) AS feature FROM products p WHERE unnest(p.features) != 'UnwantedFeature' ORDER BY p.id, feature;

This will order your results and keep only the features you’re interested in.

NULL and void

Postgres has a distinctive way of handling NULL values. You need to be aware of this if you have NULL entries after the split.

Performance matters

And last but not least, to avoid witnessing your query crawl slower than a snail on a treadmill, consider creating indexes to help filter or sort the data faster. This can significantly boost your query's performance.