Get the latest!
You are here: Home / Tutorials / How To Tokenize Or Parse A String In PLSQL
How To Tokenize Or Parse A String In PLSQL

How To Tokenize Or Parse A String In PLSQL

If you have worked with SQL and PL/SQL for a while, you have probably been in a situation where you have a string that is delimited with a character, and you want the chunks of the string by themselves… In this tutorial we are going to look at a few ways to do this.


Introduction

In my previous post ( Quick Tip: Slicing A Long PLSQL String Into Smaller Pieces ), I kind of touched something similar, but there was no logic in this code that handled any delimiters, etc. We just brutally cut the string into pieces.

So, in this tutorial we are actually going to show how to pick out the pieces more carefully. Throughout the examples we are going to use a collection, SQL, the REGEXPR_SUBSTR function, SUBSTR, INSTR, and a PL/SQL Collection.

Ok, let’s get down to some business…

Example 1: Using SQL and REGEXP_SUBSTR

In this example we are only going to look at how to tokenize ONE single string, since that is the purpose of this tutorial. If you would like to see an example using REGEXPR_SUBSTR handling multiple records, please take a look at Alex Nuijten’s post found here.

For this example, here is the code:

SQL> set serveroutput on
SQL>
SQL> DECLARE
  2  	CURSOR cur IS
  3          WITH qry AS (
  4                          SELECT	'Paris#London#Rome#Oslo#Amsterdam#New York' city_string
  5                          FROM	dual
  6          )
  7          SELECT	regexp_substr (city_string, '[^#]+', 1, ROWNUM) city
  8          FROM	qry
  9          CONNECT BY LEVEL <= LENGTH(regexp_replace (city_string, '[^#]+')) + 1
 10          ;
 11
 12  BEGIN
 13  	FOR rec IN cur LOOP
 14          dbms_output.put_line('City:' || rec.city);
 15      END LOOP;
 16  END;
 17  /

City:Paris
City:London
City:Rome
City:Oslo
City:Amsterdam
City:New York

PL/SQL procedure successfully completed

Ok, so then we’re done with the SQL part. Let’s jump over to PL/SQL code to do some cool stuff.

Example 2: Using a PL/SQL Collection

In this example we are going to tokenize our string and put it into a PL/SQL collection for later use.

SQL> DECLARE
  2
  3  	TYPE typTokenTab IS TABLE OF VARCHAR(20) INDEX BY BINARY_INTEGER;
  4  	vglCityString	VARCHAR2(200);
  5
  6  	tglTokenTab		typTokenTab;
  7
  8  	--	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  9  	--	Function: creTokenList
 10      --
 11      --	This function takes a string with "tokens" delimited by pDelimiter
 12      --	and put each "token" into a separate record in a PL/SQL collection. The
 13      --	PL/SQL collection is returned back to the caller of the function.
 14  	--	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 15  	FUNCTION creTokenList(pLine IN VARCHAR2, pDelimiter IN VARCHAR2) RETURN typTokenTab IS
 16  		sLine		VARCHAR2(2000);
 17      	nPos		INTEGER;
 18      	nPosOld		INTEGER;
 19          nIndex		INTEGER;
 20          nLength		INTEGER;
 21          nCnt		INTEGER;
 22          sToken		VARCHAR2(200);
 23          tTokenTab	typTokenTab;
 24      BEGIN
 25      	sLine := pLine;
 26  		IF (SUBSTR(sLine, LENGTH(sLine), 1) <> '|') THEN
 27  			sLine := sLine || '|';
 28  		END IF;
 29
 30      	nPos := 0;
 31          sToken := '';
 32          nLength := LENGTH(sLine);
 33          nCnt := 0;
 34
 35          FOR nIndex IN 1..nLength LOOP
 36          	IF ((SUBSTR(sLine, nIndex, 1) = pDelimiter) OR (nIndex = nLength)) THEN
 37  	        	nPosOld := nPos;
 38              	nPos := nIndex;
 39  	        	nCnt := nCnt + 1;
 40  	            sToken := SUBSTR(sLine, nPosOld + 1, nPos - nPosOld - 1);
 41
 42  				tTokenTab(nCnt) := sToken;
 43              END IF;
 44
 45          END LOOP;
 46
 47          RETURN tTokenTab;
 48      END creTokenList;
 49
 50  BEGIN
 51  	vglCityString := 'Paris#London#Rome#Oslo#Amsterdam#New York';
 52
 53  	tglTokenTab := creTokenList( vglCityString, '#');
 54
 55      FOR indx IN tglTokenTab.FIRST..tglTokenTab.LAST LOOP
 56          dbms_output.put_line('City:' || tglTokenTab(indx));
 57      END LOOP;
 58
 59  	tglTokenTab.DELETE;
 60  END;
 61  /

City:Paris
City:London
City:Rome
City:Oslo
City:Amsterdam
City:New York

PL/SQL procedure successfully completed

Remember that  when we are working with PL/SQL Collections, we need to clean up our mess. What I mean with this is that when we are playing with memory it is like cluttering up a child’s room with toys. We need to put the toys back into the box and into the closet again so that we can use the floor space for something else later. Hmm…to be technically correct we would need to put the toys in the box and give the box to the Salvation Army…but I don’t think we would have very happy children then. Think you get the point. :-)

Ok, let’s move forward.

Example # 3: Using INSTR and SUBSTR to get a token at a certain position

So, let us say that you have a customer notes table with a field that always follows the same structure for how the customer service reps enter the notes. Let’s say it has the following format:

<DATE>;<ORDER_ID>;<NOTE>#

That would i.e. give us something similar to this in our table:

2/5/2012;23412;Customer called reg. invoice#2/6/2012;23412;Contacted Lisa in the collection department.#2/6/2012;23412;Compliance department have decided that customer have sold products earlier on eBay. Customer’s status set to ON-HOLD. Waiting for further instruction

If we break down the notes field we will see that there are actually three different parts in the note:

  • 2/5/2012;23412;Customer called reg. invoice
  • 2/6/2012;23412;Contacted Lisa in the collection department.
  • 2/6/2012;23412;Compliance department have decided that customer have sold products earlier on eBay. Customer’s status set to ON-HOLD. Waiting for further instruction

Our task is to create a function that can get any of the three parts, AND get the individual elements of a single note.

Here is a sample of how this can be done:

SQL> set serveroutput on
SQL>
SQL> DECLARE
  2  	vglNotesString	VARCHAR2(2000);
  3
  4  	FUNCTION getToken(
  5          				pString IN VARCHAR2
  6                          ,pPosition IN INTEGER
  7                          ,pDelimiter IN VARCHAR2 DEFAULT ';'
  8                          ,pNullVal IN VARCHAR2 DEFAULT 'n/a'
  9      ) RETURN VARCHAR2 IS
 10  		sLine		VARCHAR2(2000);
 11      	iStartPos	INTEGER;
 12      	iEndPos		INTEGER;
 13          iLength		INTEGER;
 14          vToken		VARCHAR2(200);
 15  		bContinue	BOOLEAN := TRUE;
 16      BEGIN
 17          -- Get start position
 18          IF ( pPosition = 1) THEN
 19              iStartPos := 1;
 20          ELSIF ( pPosition < 0) THEN
 21              iStartPos := INSTR(pString, pDelimiter, -1, ABS(pPosition)) + 1;
 22  		ELSE
 23              iStartPos := INSTR(pString, pDelimiter, 1, pPosition - 1) + 1;
 24  			IF (iStartPos = 1) THEN
 25                  bContinue := FALSE;
 26              END IF;
 27          END IF;
 28
 29  		IF (bContinue) THEN
 30  	        -- Get the end position, and then the length of the token
 31          	IF (pPosition > 0) THEN
 32                  iEndPos := INSTR(pString, pDelimiter, 1, pPosition );
 33                  IF (iEndPos > 0) THEN
 34                      iLength := iEndPos - iStartPos;
 35                  ELSIF (iEndPos = 0) THEN
 36                      iLength := LENGTH(pString) - iStartPos + 1;
 37                  END IF;
 38  			ELSE
 39                  IF (pPosition = -1) THEN
 40  	                iEndPos := LENGTH(pString);
 41                  ELSE
 42  	                iEndPos := INSTR(pString, pDelimiter, -1, ABS(pPosition) - 1) - 1;
 43  				END IF;
 44  				iLength := iEndPos - iStartPos + 1;
 45              END IF;
 46
 47              vToken := NVL(SUBSTR(pString, iStartPos, iLength), pNullVal);
 48  		ELSE
 49              vToken := pNullVal;
 50          END IF;
 51
 52          RETURN TO_CHAR(vToken);
 53      END getToken;
 54
 55  BEGIN
 56  	vglNotesString := '2/5/2012;23412;Customer called reg. invoice#2/6/2012;23412;Contacted Lisa in the collection ';
 57      vglNotesString := vglNotesString || 'department.#2/6/2012;23412;Compliance department have decided that customer ';
 58      vglNotesString := vglNotesString || 'have sold products earlier on eBay. Customer''s status set to ON-HOLD. ';
 59      vglNotesString := vglNotesString || 'Waiting for further instruction';
 60
 61
 62  	dbms_output.put_line('Positive tokenizing:');
 63  	dbms_output.put_line('Note 1:[' || getToken(vglNotesString, 1, '#', 'n/a') || ']');
 64  	dbms_output.put_line('Note 2:[' || getToken(vglNotesString, 2, '#', 'n/a') || ']');
 65  	dbms_output.put_line('Note 3:[' || getToken(vglNotesString, 3, '#', 'n/a') || ']');
 66  	dbms_output.put_line('Note 4:[' || getToken(vglNotesString, 4, '#', 'n/a') || ']');
 67  	dbms_output.put_line(' -- ');
 68  	dbms_output.put_line('Negative tokenizing:');
 69  	dbms_output.put_line('Note -1:[' || getToken(vglNotesString, -1, '#', 'n/a') || ']');
 70  	dbms_output.put_line('Note -2:[' || getToken(vglNotesString, -2, '#', 'n/a') || ']');
 71  	dbms_output.put_line('Note -3:[' || getToken(vglNotesString, -3, '#', 'n/a') || ']');
 72  	dbms_output.put_line('Note -4:[' || getToken(vglNotesString, -4, '#', 'n/a') || ']');
 73  	dbms_output.put_line(' -- ');
 74  	dbms_output.put_line('Positive tokenizing of a tokenized string:');
 75  	dbms_output.put_line('Note 1:[' || getToken(getToken(vglNotesString, 1, '#', 'n/a'), 1, ';', 'n/a') || ']');
 76  	dbms_output.put_line('Note 2:[' || getToken(getToken(vglNotesString, 1, '#', 'n/a'), 2, ';', 'n/a') || ']');
 77  	dbms_output.put_line('Note 3:[' || getToken(getToken(vglNotesString, 1, '#', 'n/a'), 3, ';', 'n/a') || ']');
 78  END;
 79  /

Positive tokenizing:
Note 1:[2/5/2012;23412;Customer called reg. invoice]
Note 2:[2/6/2012;23412;Contacted Lisa in the collection department.]
Note 3:[2/6/2012;23412;Compliance department have decided that customer have sold products earlier on eBay. Customer's status set to ON-HOLD. Waiting for further instruction]
Note 4:[n/a]
 --
Negative tokenizing:
Note -1:[2/6/2012;23412;Compliance department have decided that customer have sold products earlier on eBay. Customer's status set to ON-HOLD. Waiting for further instruction]
Note -2:[2/6/2012;23412;Contacted Lisa in the collection department.]
Note -3:[2/5/2012;23412;Customer called reg. invoice]
Note -4:[n/a]
 --
Positive tokenizing of a tokenized string:
Note 1:[2/5/2012]
Note 2:[23412]
Note 3:[Customer called reg. invoice]

PL/SQL procedure successfully completed

 

As you can see, I left the third parameter as a VARCHAR2, rather than using a CHAR. The reason for this is that sometimes our delimiter can actually consist of more than one character, like i.e. “<STRING_PART1>##<STRING_PART2>

In addition I have made the function so that it can handle both positive and negative string positioning. This will be valuable if you for some reason wanted to tokenize / parse a string where the token positions are numbered from the end of the string, going towards the beginning.

Both the INSTR and the SUBSTR functions can handle both positive and negative positioning.

In Conclusion

As you can see, there are many ways to tokenize or parse a string in PL/SQL. In this little tutorial we have touched a few of them. No matter what method you choose, an advice would be to choose your delimiter wisely. Usually I use a character that most likely would not show up in a normal string, like: #, @, $, ^, etc.

One thing to consider as well is what you want your function to do if there is nothing returned. Would you i.e. just return the text “n/a“, or something?

If you have any comments or questions to this tutorial, please use the commenting field below, Or, you can also send a question through our contact page available from the top menu.

That’s all, folks. Have fun! Personally I love working with string manipulation.

PS: If you liked this tutorial, please share using buttons below. :-)

~ TJ

About TJ Abrahamsen

TJ Abrahamsen is the founder and author of OracleTuts. He has worked with Oracle since 1997. His expertise is with Oracle SQL and PL/SQL. He loves teaching others, and like to share his thoughts, ideas, and experience on his blog.
McAfee Total Protetion 50% Off
Scroll To Top