If you have worked with SQL and PL/SQL for a while, you have probably been in a situation where you have a string that is delimited with a character, and you want the chunks of the string by themselves… In this tutorial we are going to look at a few ways to do this.
Introduction
In my previous post ( Quick Tip: Slicing A Long PLSQL String Into Smaller Pieces ), I kind of touched something similar, but there was no logic in this code that handled any delimiters, etc. We just brutally cut the string into pieces.
So, in this tutorial we are actually going to show how to pick out the pieces more carefully. Throughout the examples we are going to use a collection, SQL, the REGEXPR_SUBSTR function, SUBSTR, INSTR, and a PL/SQL Collection.
Ok, let’s get down to some business…
Example 1: Using SQL and REGEXP_SUBSTR
In this example we are only going to look at how to tokenize ONE single string, since that is the purpose of this tutorial. If you would like to see an example using REGEXPR_SUBSTR handling multiple records, please take a look at Alex Nuijten’s post found here.
For this example, here is the code:
SQL> set serveroutput on
SQL>
SQL> DECLARE
2 CURSOR cur IS
3 WITH qry AS (
4 SELECT 'Paris#London#Rome#Oslo#Amsterdam#New York' city_string
5 FROM dual
6 )
7 SELECT regexp_substr (city_string, '[^#]+', 1, ROWNUM) city
8 FROM qry
9 CONNECT BY LEVEL <= LENGTH(regexp_replace (city_string, '[^#]+')) + 1
10 ;
11
12 BEGIN
13 FOR rec IN cur LOOP
14 dbms_output.put_line('City:' || rec.city);
15 END LOOP;
16 END;
17 /
City:Paris
City:London
City:Rome
City:Oslo
City:Amsterdam
City:New York
PL/SQL procedure successfully completed
Ok, so then we’re done with the SQL part. Let’s jump over to PL/SQL code to do some cool stuff.
Example 2: Using a PL/SQL Collection
In this example we are going to tokenize our string and put it into a PL/SQL collection for later use.
SQL> DECLARE
2
3 TYPE typTokenTab IS TABLE OF VARCHAR(20) INDEX BY BINARY_INTEGER;
4 vglCityString VARCHAR2(200);
5
6 tglTokenTab typTokenTab;
7
8 -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
9 -- Function: creTokenList
10 --
11 -- This function takes a string with "tokens" delimited by pDelimiter
12 -- and put each "token" into a separate record in a PL/SQL collection. The
13 -- PL/SQL collection is returned back to the caller of the function.
14 -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
15 FUNCTION creTokenList(pLine IN VARCHAR2, pDelimiter IN VARCHAR2) RETURN typTokenTab IS
16 sLine VARCHAR2(2000);
17 nPos INTEGER;
18 nPosOld INTEGER;
19 nIndex INTEGER;
20 nLength INTEGER;
21 nCnt INTEGER;
22 sToken VARCHAR2(200);
23 tTokenTab typTokenTab;
24 BEGIN
25 sLine := pLine;
26 IF (SUBSTR(sLine, LENGTH(sLine), 1) <> '|') THEN
27 sLine := sLine || '|';
28 END IF;
29
30 nPos := 0;
31 sToken := '';
32 nLength := LENGTH(sLine);
33 nCnt := 0;
34
35 FOR nIndex IN 1..nLength LOOP
36 IF ((SUBSTR(sLine, nIndex, 1) = pDelimiter) OR (nIndex = nLength)) THEN
37 nPosOld := nPos;
38 nPos := nIndex;
39 nCnt := nCnt + 1;
40 sToken := SUBSTR(sLine, nPosOld + 1, nPos - nPosOld - 1);
41
42 tTokenTab(nCnt) := sToken;
43 END IF;
44
45 END LOOP;
46
47 RETURN tTokenTab;
48 END creTokenList;
49
50 BEGIN
51 vglCityString := 'Paris#London#Rome#Oslo#Amsterdam#New York';
52
53 tglTokenTab := creTokenList( vglCityString, '#');
54
55 FOR indx IN tglTokenTab.FIRST..tglTokenTab.LAST LOOP
56 dbms_output.put_line('City:' || tglTokenTab(indx));
57 END LOOP;
58
59 tglTokenTab.DELETE;
60 END;
61 /
City:Paris
City:London
City:Rome
City:Oslo
City:Amsterdam
City:New York
PL/SQL procedure successfully completed
Remember that when we are working with PL/SQL Collections, we need to clean up our mess. What I mean with this is that when we are playing with memory it is like cluttering up a child’s room with toys. We need to put the toys back into the box and into the closet again so that we can use the floor space for something else later. Hmm…to be technically correct we would need to put the toys in the box and give the box to the Salvation Army…but I don’t think we would have very happy children then. Think you get the point.
Ok, let’s move forward.
Example # 3: Using INSTR and SUBSTR to get a token at a certain position
So, let us say that you have a customer notes table with a field that always follows the same structure for how the customer service reps enter the notes. Let’s say it has the following format:
“<DATE>;<ORDER_ID>;<NOTE>#”
That would i.e. give us something similar to this in our table:
“2/5/2012;23412;Customer called reg. invoice#2/6/2012;23412;Contacted Lisa in the collection department.#2/6/2012;23412;Compliance department have decided that customer have sold products earlier on eBay. Customer’s status set to ON-HOLD. Waiting for further instruction”
If we break down the notes field we will see that there are actually three different parts in the note:
- 2/5/2012;23412;Customer called reg. invoice
- 2/6/2012;23412;Contacted Lisa in the collection department.
- 2/6/2012;23412;Compliance department have decided that customer have sold products earlier on eBay. Customer’s status set to ON-HOLD. Waiting for further instruction
Our task is to create a function that can get any of the three parts, AND get the individual elements of a single note.
Here is a sample of how this can be done:
SQL> set serveroutput on
SQL>
SQL> DECLARE
2 vglNotesString VARCHAR2(2000);
3
4 FUNCTION getToken(
5 pString IN VARCHAR2
6 ,pPosition IN INTEGER
7 ,pDelimiter IN VARCHAR2 DEFAULT ';'
8 ,pNullVal IN VARCHAR2 DEFAULT 'n/a'
9 ) RETURN VARCHAR2 IS
10 sLine VARCHAR2(2000);
11 iStartPos INTEGER;
12 iEndPos INTEGER;
13 iLength INTEGER;
14 vToken VARCHAR2(200);
15 bContinue BOOLEAN := TRUE;
16 BEGIN
17 -- Get start position
18 IF ( pPosition = 1) THEN
19 iStartPos := 1;
20 ELSIF ( pPosition < 0) THEN
21 iStartPos := INSTR(pString, pDelimiter, -1, ABS(pPosition)) + 1;
22 ELSE
23 iStartPos := INSTR(pString, pDelimiter, 1, pPosition - 1) + 1;
24 IF (iStartPos = 1) THEN
25 bContinue := FALSE;
26 END IF;
27 END IF;
28
29 IF (bContinue) THEN
30 -- Get the end position, and then the length of the token
31 IF (pPosition > 0) THEN
32 iEndPos := INSTR(pString, pDelimiter, 1, pPosition );
33 IF (iEndPos > 0) THEN
34 iLength := iEndPos - iStartPos;
35 ELSIF (iEndPos = 0) THEN
36 iLength := LENGTH(pString) - iStartPos + 1;
37 END IF;
38 ELSE
39 IF (pPosition = -1) THEN
40 iEndPos := LENGTH(pString);
41 ELSE
42 iEndPos := INSTR(pString, pDelimiter, -1, ABS(pPosition) - 1) - 1;
43 END IF;
44 iLength := iEndPos - iStartPos + 1;
45 END IF;
46
47 vToken := NVL(SUBSTR(pString, iStartPos, iLength), pNullVal);
48 ELSE
49 vToken := pNullVal;
50 END IF;
51
52 RETURN TO_CHAR(vToken);
53 END getToken;
54
55 BEGIN
56 vglNotesString := '2/5/2012;23412;Customer called reg. invoice#2/6/2012;23412;Contacted Lisa in the collection ';
57 vglNotesString := vglNotesString || 'department.#2/6/2012;23412;Compliance department have decided that customer ';
58 vglNotesString := vglNotesString || 'have sold products earlier on eBay. Customer''s status set to ON-HOLD. ';
59 vglNotesString := vglNotesString || 'Waiting for further instruction';
60
61
62 dbms_output.put_line('Positive tokenizing:');
63 dbms_output.put_line('Note 1:[' || getToken(vglNotesString, 1, '#', 'n/a') || ']');
64 dbms_output.put_line('Note 2:[' || getToken(vglNotesString, 2, '#', 'n/a') || ']');
65 dbms_output.put_line('Note 3:[' || getToken(vglNotesString, 3, '#', 'n/a') || ']');
66 dbms_output.put_line('Note 4:[' || getToken(vglNotesString, 4, '#', 'n/a') || ']');
67 dbms_output.put_line(' -- ');
68 dbms_output.put_line('Negative tokenizing:');
69 dbms_output.put_line('Note -1:[' || getToken(vglNotesString, -1, '#', 'n/a') || ']');
70 dbms_output.put_line('Note -2:[' || getToken(vglNotesString, -2, '#', 'n/a') || ']');
71 dbms_output.put_line('Note -3:[' || getToken(vglNotesString, -3, '#', 'n/a') || ']');
72 dbms_output.put_line('Note -4:[' || getToken(vglNotesString, -4, '#', 'n/a') || ']');
73 dbms_output.put_line(' -- ');
74 dbms_output.put_line('Positive tokenizing of a tokenized string:');
75 dbms_output.put_line('Note 1:[' || getToken(getToken(vglNotesString, 1, '#', 'n/a'), 1, ';', 'n/a') || ']');
76 dbms_output.put_line('Note 2:[' || getToken(getToken(vglNotesString, 1, '#', 'n/a'), 2, ';', 'n/a') || ']');
77 dbms_output.put_line('Note 3:[' || getToken(getToken(vglNotesString, 1, '#', 'n/a'), 3, ';', 'n/a') || ']');
78 END;
79 /
Positive tokenizing:
Note 1:[2/5/2012;23412;Customer called reg. invoice]
Note 2:[2/6/2012;23412;Contacted Lisa in the collection department.]
Note 3:[2/6/2012;23412;Compliance department have decided that customer have sold products earlier on eBay. Customer's status set to ON-HOLD. Waiting for further instruction]
Note 4:[n/a]
--
Negative tokenizing:
Note -1:[2/6/2012;23412;Compliance department have decided that customer have sold products earlier on eBay. Customer's status set to ON-HOLD. Waiting for further instruction]
Note -2:[2/6/2012;23412;Contacted Lisa in the collection department.]
Note -3:[2/5/2012;23412;Customer called reg. invoice]
Note -4:[n/a]
--
Positive tokenizing of a tokenized string:
Note 1:[2/5/2012]
Note 2:[23412]
Note 3:[Customer called reg. invoice]
PL/SQL procedure successfully completed
As you can see, I left the third parameter as a VARCHAR2, rather than using a CHAR. The reason for this is that sometimes our delimiter can actually consist of more than one character, like i.e. “<STRING_PART1>##<STRING_PART2>”
In addition I have made the function so that it can handle both positive and negative string positioning. This will be valuable if you for some reason wanted to tokenize / parse a string where the token positions are numbered from the end of the string, going towards the beginning.
Both the INSTR and the SUBSTR functions can handle both positive and negative positioning.
In Conclusion
As you can see, there are many ways to tokenize or parse a string in PL/SQL. In this little tutorial we have touched a few of them. No matter what method you choose, an advice would be to choose your delimiter wisely. Usually I use a character that most likely would not show up in a normal string, like: #, @, $, ^, etc.
One thing to consider as well is what you want your function to do if there is nothing returned. Would you i.e. just return the text “n/a“, or something?
If you have any comments or questions to this tutorial, please use the commenting field below, Or, you can also send a question through our contact page available from the top menu.
That’s all, folks. Have fun! Personally I love working with string manipulation.
PS: If you liked this tutorial, please share using buttons below.
~ TJ
OracleTuts Oracle Training, Tutorials, and Videos from Beginner to Advanced
