Systems Engineering and RDBMS

More on Regular Expression enhancements in Oracle 11g

Posted by decipherinfosys on March 12, 2008

In our previous blog posts, we had covered REGEXP_REPLACE and REGEXP_COUNT

Today we will talk about enhancements made to REGEXP_INSTR and REGEXP_SUBSTR functions which are available since Oracle 10g. A new parameter sub-expression is introduced in REGEXP_INSTR and REGEXP_SUBSTR functions. This parameter takes in an integer value 0 to 9. The integer value indicates which sub-expression should be searched for in the source string. So in simple terms, it allows us to search for the 1st, 2nd, 3rd…up to 9th subexpression. For the complete syntax of both the functions, please refer to the Oracle SQL Reference Manual. The basic difference is that REGEXP_INSTR returns the position of the subexpression while REGEXP_SUBSTR returns the actual subexpression.

What is the subexpression we are talking about? Subexpression is nothing but the portion of the string or a pattern surrounded with parenthesis. They are numbered in the order of the occurrence of their left parenthesis. Let us understand with couple of simple examples. Matching numbers below the pattern indicates the start and end of the subexpression.

‘( ( Dec ) ipher )’ contains two subexpressions
1 2 2 1
• Decipher – This is covered with parenthesis. This is first subexpression.
• Dec – This is nested subexpression which starts with 2nd left parenthesis. So this is 2nd subexpression.

Let us make this a little more complex.

‘( ( (Dec ) ipher ) ( ( In ) ( form )ation ) ) Systems’ contains 6 subexpressions.
1 23 3 2 4 5 5 6 6 4 1

With this basic understanding of subexpression , now let us start with the example of functions it self.

REGEXP_INSTR:

First we will check REGEXP_INSTR function. In first example, we would like to know the position of the 4th subexpression in the pattern. Connect to SQL*Plus and run following sql.

SQL> SELECT REGEXP_INSTR(‘Decipher Information Systems’,
2 ‘(((Dec)ipher) ((In)(form)ation)) Systems’,
3 1, 1, 0, ‘i’, 4) instr_position
4 FROM dual;

INSTR_POSITION
————–
10

Above SQL returned the position of the first character of the subexpression which is 10. This output is governed by the value of the 5th parameter returning_option. Following is the brief explanation of the parameters of the function REGEXP_INSTR in the order.

• Source string:
• Pattern: Is any string, pattern or text which will be found in the source string. Data type of pattern will be converted to data type of source string, if it is different than the source string.
• Position: It is a non zero positive value from which search should begin.
• Occurrence: Indicates which occurrence of the pattern
• Return_option: As mentioned earlier, it governs the outcome of the function. If value is 0, it returns the position of the first character of the occurrence as shown in example above. This is default behavior. If value is set to 1, it returns the position of the character after the occurrence of the subexpression.
• Matching parameter: which controls the behavior of the search. Values are same as value of the corresponding parameter of REGEXP_COUNT function.
• Subexpression: Any integer value from 0 to 9 which indicates subexpression occurrence.

Here is the another example of the REGEXP_INSTR function where return_option parameter value is set to 1.

SQL> SELECT REGEXP_INSTR(‘Decipher Information Systems’,
2 ‘(((Dec)ipher) ((In)(form)ation)) Systems’,
3 1, 1, 1, ‘i’, 4) instr_position
4 FROM dual;

INSTR_POSITION
————–
21

REGEXP_SUBSTR:

This function actually returns the subexpression. Here is the example.

SQL> SELECT REGEXP_SUBSTR(‘Decipher Information Systems’,
2 ‘(((Dec)ipher) ((In)(form)ation)) Systems’,
3 1, 1,’i’, 4) Text
4 FROM dual;

TEXT
———–
Information

Above SQL gives us the value of the 4th subexpression in the pattern. Let us check another example to find out 2nd occurrence of the 6th subexpression. Since there is no 2nd occurrence of the matching pattern, it returns null value. Here is the SQL.

SQL> SELECT REGEXP_SUBSTR(‘Decipher Information Systems’,
2 ‘(((Dec)ipher) ((In)(form)ation)) Systems’,
3 1, 2,’i’, 6) Text
4 FROM dual;

T

Now instead of 2nd, we will try to find 1st occurrence of the same subexpression.

SQL> SELECT REGEXP_SUBSTR(‘Decipher Information Systems’,
2 ‘(((Dec)ipher) ((In)(form)ation)) Systems’,
3 1, 1,’i’, 6) Text
4 FROM dual;

TEXT
———–
form

REGEXP_SUBSTR also has the same parameters as REGEXP_INSTR except return_optn parameter.

Sorry, the comment form is closed at this time.

 
%d bloggers like this: