Explain Codes LogoExplain Codes Logo

Splitting string into multiple rows in Oracle

sql
string-splitting
oracle-sql
data-refinement
Alex KataevbyAlex Kataev·Mar 7, 2025
TLDR

In Oracle, we can split a string into rows using the CONNECT BY LEVEL technique. The following example will break down a comma-separated list:

SELECT SUBSTR(txt, INSTR(txt, ',', 1, LEVEL) + 1, INSTR(txt, ',', 1, LEVEL + 1) - INSTR(txt, ',', 1, LEVEL) - 1) AS value FROM (SELECT 'apple,orange,banana' AS txt FROM dual) CONNECT BY LEVEL <= LENGTH(txt) - LENGTH(REPLACE(txt, ',')) + 1 AND PRIOR SYS_GUID() IS NOT NULL AND PRIOR txt IS NOT NULL;

LEVEL is your loyal friend that tracks the element count, SUBSTR & INSTR teamed up to carve out the slices of words, and CONNECT BY is the ultimate ringmaster that conducts the show until all elements are elegantly split.

Techniques for string splitting aficionados

Efficient word chopping with XMLTable

When your string chopping gets complex or you're in pursuit of performance, XMLTable might be the knife you are looking for. Acting as a robust XML parser, it can bestow some impressive SQL superpowers:

SELECT trim(column_value) as value FROM XMLTable(('"' || REPLACE(:csv, ',', '","') || '"'));

The REPLACE function wraps each word with quotes, and then XMLTable function treats the resulting string as XML, chopping it into neatly separated words.

Handy PL/SQL function for regular string bashing

If string splitting has become your daily bread, binding the above splitting operation in a PL/SQL function could be a time-saver and a nice addition to your toolbox:

CREATE OR REPLACE FUNCTION split_string ( p_list IN VARCHAR2, p_delim IN VARCHAR2 := ',' ) RETURN sys.OdciVarchar2List PIPELINED IS BEGIN FOR i IN ( SELECT REGEXP_SUBSTR(p_list, '[^' || p_delim || ']+', 1, LEVEL) as element FROM dual CONNECT BY REGEXP_SUBSTR(p_list, '[^' || p_delim || ']+', 1, LEVEL) IS NOT NULL ) LOOP PIPE ROW(i.element); END LOOP; RETURN; END split_string;

Now, call this function whenever you need to do the string splitting, and voilà, you got a reusable and efficient string splitter at your disposal.

Splitting with the MODEL clause because reasons

The MODEL clause, a mighty SQL feature that simulates spreadsheet-like operations, can be utilised for manipulating data sets and splitting strings, because who doesn't like to emulate Excel in SQL:

WITH dataset AS ( SELECT 'Wilson,Field,Gates' AS csv FROM dual ) SELECT value FROM dataset MODEL DIMENSION BY (0 as id) MEASURES (csv as value, 0 as start_pos, 0 as end_pos) RULES ITERATE (1000) /* Because we like to push the limits */ ( start_pos[ITERATION_NUMBER] = CASE ITERATION_NUMBER WHEN 0 THEN 1 ELSE end_pos[ITERATION_NUMBER-1]+2 END, end_pos[ITERATION_NUMBER] = INSTR(csv[0], ',', start_pos[ITERATION_NUMBER]), value[ITERATION_NUMBER] = CASE WHEN end_pos[ITERATION_NUMBER] > 0 THEN SUBSTR(csv[0], start_pos[ITERATION_NUMBER], end_pos[ITERATION_NUMBER] - start_pos[ITERATION_NUMBER]) ELSE SUBSTR(csv[0], start_pos[ITERATION_NUMBER]) END ) WHERE value IS NOT NULL ORDER BY id;

The MODEL clause is like an army Swiss knife that allows you to slice and dice your data into multiple dimensions. You can use it to allocate positions of each comma in your string, simulate an iterative process, and extract data segments.

Oracle guru insights ft. Tom Kyte's str2tbl function

Reflecting on the str2tbl function by Oracle's dim star Tom Kyte, a paraphrased version is provided below for learning purposes:

CREATE OR REPLACE TYPE stringlist AS TABLE OF VARCHAR2(4000); CREATE OR REPLACE FUNCTION str2tbl(p_str IN VARCHAR2) RETURN stringlist PIPELINED IS l_string LONG := p_str || ','; l_comma_index PLS_INTEGER; l_index PLS_INTEGER := 1; BEGIN LOOP l_comma_index := INSTR(l_string, ',', l_index); EXIT WHEN l_comma_index = 0; PIPE ROW(SUBSTR(l_string, l_index, l_comma_index - l_index)); l_index := l_comma_index + 1; END LOOP; RETURN; END str2tbl;

Masterfully crafted, this function uses the PIPELINED functionality of PL/SQL to deliver string splitting that's as smoothe as a hot knife through butter.

Real-world data refinement techniques

To retain the correlation of the original data when dealing with multi-column scenarios, Cross Joining the split resultset with the original data provides contextual richness:

WITH data AS ( SELECT 1 AS id, 'Error1,Error2,Error3' AS errors FROM dual ) SELECT d.id, t.column_value AS error FROM data d CROSS JOIN TABLE(str2tbl(d.errors)) t;

Maintain quality control on data by trimming spaces with TRIM and using DISTINCT to remove duplicates post-splitting:

SELECT DISTINCT TRIM(REGEXP_SUBSTR(str, '[^,]+', 1, LEVEL)) as Word FROM (SELECT 'Error1, Error2, Error1, Error3' AS str FROM dual) CONNECT BY REGEXP_SUBSTR(str, '[^,]+', 1, LEVEL) IS NOT NULL;

Properly handling spaces and duplicates is like throwing a fancy party - it's all in the details. You wouldn't want unwanted guests (duplicates) or messy playrooms (whitespaces), would you?