Tokenize Strings in PL/SQL
Several ways of pulling strings apart

With around 20 years on the job, Matt is one of the most experienced software developers at Pretius. He likes meeting new people, traveling to conferences, and working on different projects.
He’s also a big sports fan (regularly watches Leeds United, Formula 1, and boxing), and not just as a spectator – he often starts his days on a mountain bike, to tune his mind.
When pulling delimited strings apart in PL/SQL (or SQL statements in these examples) you'd usually use something like this:
WITH test_data AS (SELECT 'Diego Javier Llorente Ríos' as full_name FROM dual)
SELECT SUBSTR(full_name, 1, INSTR(full_name, ' ') - 1) as word_1,
SUBSTR(full_name, INSTR(full_name, ' ')+1, INSTR(full_name, ' ', 1, 2) - INSTR(full_name, ' ') - 1) as word_2,
SUBSTR(full_name, INSTR(full_name, ' ', 1, 2)+1, INSTR(full_name, ' ', 1, 3) - INSTR(full_name, ' ', 1, 2) - 1) as word_3,
SUBSTR(full_name, INSTR(full_name, ' ', 1, 3)+1) as word_4
FROM test_data;
This would produce:

It's standard practice to use something like the above.
The SUBSTR function is used to extract a specified number of characters from a string, starting from a given position.
The INSTR function, on the other hand, returns the position of a specified character within a string.
By using these two functions in combination, it is possible to extract specific information from a string, such as a first name or last name.
However, I always despair when I see how much code it produced. Just look at it! you can see, the number of lines of code required to extract all 4 words has increased and it becomes harder to read and understand.
Alternatively, the REGEXP_SUBSTR function can be used to extract specific information from a string using a regular expression pattern.
For example, to extract all 4 words from the string 'Diego Javier Llorente Ríos' and store them in separate columns, the following SQL statement can be used:
WITH test_data AS (SELECT 'Diego Javier Llorente Ríos' as full_name FROM dual)
SELECT REGEXP_SUBSTR(full_name, '[^ ]+', 1, 1) AS word_1,
REGEXP_SUBSTR(full_name, '[^ ]+', 1, 2) AS word_2,
REGEXP_SUBSTR(full_name, '[^ ]+', 1, 3) AS word_3,
REGEXP_SUBSTR(full_name, '[^ ]+', 1, 4) AS word_4
FROM test_data;

As you can see, the regular expression method is much more concise and easy to understand. It only requires one line of code to extract all 4 words.
It would be great if Oracle could provide a built-in function that makes this task even easier and more efficient.
In the meantime, I have built a handy PLSQL function to do just this. Which can also be used in SQL statements.
FUNCTION f_split (
p_str IN VARCHAR2,
p_sep IN VARCHAR2 DEFAULT apex_application.LF,
p_occurance IN VARCHAR2 DEFAULT 1,
p_limit IN PLS_INTEGER DEFAULT NULL )
RETURN VARCHAR2 DETERMINISTIC
IS
-- Variables
l_tb apex_t_varchar2;
l_return VARCHAR2(32767) DEFAULT NULL;
BEGIN
BEGIN
l_tb := apex_string.split (
p_str => p_str,
p_sep => p_sep,
p_limit => p_limit );
EXCEPTION
WHEN OTHERS
THEN
-- Log something if you wish
NULL;
END;
IF l_tb.exists(p_occurance)
THEN
l_return := l_tb(p_occurance);
END IF;
RETURN l_return;
END f_split;
To be honest, apex_string.split can be used directly in PL/SQL. However, because it uses a PLSQL table object type it cannot be used in SQL without a wrapper such as the above.
Let's give it a go:
WITH test_data AS (SELECT 'Diego Javier Llorente Ríos' as full_name FROM dual)
SELECT f_split(full_name, ' ', 1) AS word_1,
f_split(full_name, ' ', 2) AS word_2,
f_split(full_name, ' ', 3) AS word_3,
f_split(full_name, ' ', 4) AS word_4
FROM test_data;

Enjoy






