Explain Codes LogoExplain Codes Logo

Split comma separated column data into additional columns

sql
best-practices
performance
functions
Nikita BarsukovbyNikita Barsukov·Dec 1, 2024
TLDR

To instantly transform a comma-separated string into distinct columns, you can use SQL's STRING_SPLIT for SQL Server 2016+:

SELECT value FROM STRING_SPLIT((SELECT YourColumn FROM YourTable), ',');

For older versions or different databases, a recursive CTE can achieve similar results:

WITH RecursiveCTE AS ( SELECT LEFT(YourColumn, CHARINDEX(',', YourColumn + ',') - 1) AS Piece, STUFF(YourColumn, 1, CHARINDEX(',', YourColumn + ','), '') AS RemainingParts FROM YourTable UNION ALL SELECT LEFT(RemainingParts, CHARINDEX(',', RemainingParts + ',') - 1), STUFF(RemainingParts, 1, CHARINDEX(',', RemainingParts + ','), '') FROM RecursiveCTE WHERE RemainingParts > '' ) SELECT Piece FROM RecursiveCTE;

Just replace YourColumn and YourTable with your actual table and column names and voila! As they say, when STRING_SPLIT isn't on your menu, recursive CTE is the main course!

Exploring other flavors

Time to explore some more exotic dishes in our SQL kitchen!

Meet split_part() from PostgreSQL

PostgreSQL offers the split_part() function, which slices apart a string based on a delimiter and returns the nth substring:

SELECT split_part(YourColumn, ',', 1) AS col1, -- Fishing the first fish out of my comma sea split_part(YourColumn, ',', 2) AS col2, -- Surfing to the second island in the comma ocean ... FROM YourTable;

But beware, prying for an nth substring in an empty string returns an empty string. Be cautious of the submarine traps in our comma sea!

array-fying columns with string_to_array()

PostgreSQL provides a nifty function, string_to_array, which saves you from hard-coding positions:

SELECT string_to_array(YourColumn, ',')[1] AS col1, -- Getting the first sip of comma soup string_to_array(YourColumn, ',')[2] AS col2, -- Diving for the second pearl in the comma oyster ... FROM YourTable;

‘CSV in a column?’, the SQL puritans gasp in horror. Yes, it’s not the best practice. Consider a schema redesign to normalize this data, or else face the wrath of maintainability demons.

DIY functions

Wrap up all of your logic into a custom function for later use:

CREATE FUNCTION split_csv (input_string text, delimiter text, part_number integer) RETURNS text AS $$ BEGIN RETURN split_part(input_string, delimiter, part_number); -- My precious function does my dirty work! END; $$ LANGUAGE plpgsql IMMUTABLE; -- Usage SELECT split_csv(YourColumn, ',', 1) FROM YourTable;

DIY functions – a snazzy way to keep your code clean and handsome!

Awareness of performance

Performance is a beast that waits around the corner. Be aware. Heavy usage of split methods could lead to performance downfall. Always test these methods against large data sets and may all the odds be in your favor!

Possible pitfalls

Happy encounters with empty strings

When splitting strings, empty strings might tag along, especially with irregular CSV columns. Be ready for this cheeky encounter!

Array manipulation tools to the rescue

regexp_split_to_array is useful when delimiters are tricky:

SELECT (regexp_split_to_array(YourColumn, ', *'))[1] AS col1, -- Sweeping for the first comma dust (regexp_split_to_array(YourColumn, ', *'))[2] AS col2 -- Shaking the second comma tree FROM YourTable;

Remember, a comma followed by spaces can be a wee bit tricky. Stay vigilant!

Performance side-effects

While functions like regexp_split_to_array are power tools, they might come at the expense of performance. As they say, with great power, comes great responsibility!