Adsense Ad

Saturday, 13 May 2017

Oracle PLSQL: REGEXP_INSTR Function

REGEXP_INSTR Function

This Oracle tutorial explains how to use the Oracle/PLSQL REGEXP_INSTR function with syntax and examples.

Description

The Oracle/PLSQL REGEXP_INSTR function is an extension of the INSTR function. It returns the location of a regular expression pattern in a string. This function, introduced in Oracle 10g, will allow you to find a substring in a string using regular expression pattern matching.

Syntax

The syntax for the REGEXP_INSTR function in Oracle is:
REGEXP_INSTR( string, pattern [, start_position [, nth_appearance [, return_option [, match_parameter [, sub_expression ] ] ] ] ] )

Parameters or Arguments

string
The string to search. string can be CHAR, VARCHAR2, NCHAR, NVARCHAR2, CLOB, or NCLOB.
pattern
The regular expression matching information. It can be a combination of the following:
ValueDescription
^Matches the beginning of a string. If used with a match_parameter of 'm', it matches the start of a line anywhere within expression.
$Matches the end of a string. If used with a match_parameter of 'm', it matches the end of a line anywhere within expression.
*Matches zero or more occurrences.
+Matches one or more occurrences.
?Matches zero or one occurrence.
.Matches any character except NULL.
|Used like an "OR" to specify more than one alternative.
[ ]Used to specify a matching list where you are trying to match any one of the characters in the list.
[^ ]Used to specify a nonmatching list where you are trying to match any character except for the ones in the list.
( )Used to group expressions as a subexpression.
{m}Matches m times.
{m,}Matches at least m times.
{m,n}Matches at least m times, but no more than n times.
\nn is a number between 1 and 9. Matches the nth subexpression found within ( ) before encountering \n.
[..]Matches one collation element that can be more than one character.
[::]Matches character classes.
[==]Matches equivalence classes.
\dMatches a digit character.
\DMatches a nondigit character.
\wMatches a word character.
\WMatches a nonword character.
\sMatches a whitespace character.
\Smatches a non-whitespace character.
\AMatches the beginning of a string or matches at the end of a string before a newline character.
\ZMatches at the end of a string.
*?Matches the preceding pattern zero or more occurrences.
+?Matches the preceding pattern one or more occurrences.
??Matches the preceding pattern zero or one occurrence.
{n}?Matches the preceding pattern n times.
{n,}?Matches the preceding pattern at least n times.
{n,m}?Matches the preceding pattern at least n times, but not more than m times.
start_position
Optional. It is the position in string where the search will start. If omitted, it defaults to 1 which is the first position in the string.
nth_appearance
Optional. It is the nth appearance of pattern in string. If omitted, it defaults to 1 which is the first appearance of pattern in string.
return_option
Optional. If a return_option of 0 is provided, the position of the first character of the occurrence of pattern is returned. If a return_option of 1 is provided, the position of the character after the occurrence of pattern is returned. If omitted, it defaults to 0.
match_parameter
Optional. It allows you to modify the matching behavior for the REGEXP_INSTR function. It can be a combination of the following:
ValueDescription
'c'Perform case-sensitive matching.
'i'Perform case-insensitive matching.
'n'Allows the period character (.) to match the newline character. By default, the period is a wildcard.
'm'expression is assumed to have multiple lines, where ^ is the start of a line and $ is the end of a line, regardless of the position of those characters in expression. By default, expression is assumed to be a single line.
'x'Whitespace characters are ignored. By default, whitespace characters are matched like any other character.
subexpression
Optional. This is used when pattern has subexpressions and you wish to indicate which subexpression in pattern is the target. It is an integervalue from 0 to 9 indicating the subexpression to match on in pattern.

Note

  • If there are conflicting values provided for match_parameter, the REGEXP_INSTR function will use the last value.
  • If you omit the match_behavior parameter, the REGEXP_INSTR function will use the NLS_SORT parameter to determine if it should use a case-sensitive search, it will assume that string is a single line, and assume the period character to match any character (not the newline character).
  • If the REGEXP_INSTR function does not find any occurrence of pattern, it will return 0.

Applies To

The REGEXP_INSTR function can be used in the following versions of Oracle/PLSQL:
  • Oracle 12c, Oracle 11g, Oracle 10g

Example - Match on Single Character

Let's start by looking at the simplest case. Let's find the position of the first 't' character in a string.
For example:
SELECT REGEXP_INSTR ('TechOnTheNet is a great resource', 't')
FROM dual;

Result: 12
This example will return 12 because it is performing a case-sensitive search of 't'. Therefore, it skips the 'T' characters and finds the first 't' in the 12th position.
If we wanted to include both 't' and 'T' in our results and perform a case-insensitive search, we could modify our query as follows:
SELECT REGEXP_INSTR ('TechOnTheNet is a great resource', 't', 1, 1, 0, 'i')
FROM dual;

Result: 1
Now because we have provide a start_position of 1, an nth_appearance of 1, a return_option of 0, and a match_parameter of 'i', the query will return 1 as the result. This time, the function will search for both 't' and 'T' values and return the first occurrence.
If we wanted to find the first occurrence of the character 't' in a column, we could try something like this (case-insensitive search):
SELECT REGEXP_INSTR (last_name, 't', 1, 1, 0, 'i') AS First_Occurrence
FROM contacts;
This would return the first occurrence of 't' or 'T' values in the last_name field from the contacts table.

Example - Match on Multiple Characters

Let's look next at how we would use the REGEXP_INSTR function to match on a multi-character pattern.
For example:
SELECT REGEXP_INSTR ('The example shows how to use the REGEXP_INSTR function', 'ow', 1, 1, 0, 'i')
FROM dual;

Result: 15
This example will return the first occurrence of 'ow' in the string. It will match on the 'ow' in the word 'shows'.
We could change the starting position of the search so that we perform the search starting from the middle of the string.
For example:
SELECT REGEXP_INSTR ('The example shows how to use the REGEXP_INSTR function', 'ow', 16, 1, 0, 'i')
FROM dual;

Result: 20
This example will start the search for the pattern of 'ow' at position 16 in the string. In this case, it will skip over the first 15 characters in the string before searching for the pattern.
Now, let's look how we would use the REGEXP_INSTR function with a table column and search for multiple characters.
For example:
SELECT REGEXP_INSTR (other_comments, 'the', 1, 1, 0, 'i')
FROM contacts;
In this example, we are going to search for the pattern in the other_comments field in the contacts table.

Example - Match on more than one alternative

The next example that we will look at involves using the | pattern. The | pattern is used like an "OR" to specify more than one alternative.
For example:
SELECT REGEXP_INSTR ('Anderson', 'a|e|i|o|u')
FROM dual;

Result: 4
This example will return 4 because it is searching for the first vowel (a, e, i, o, or u) in the string. Since we did not specify a match_parameter value, the REGEXP_INSTR function will perform a case-sensitive search which means that the 'A' in 'Anderson' will not be matched.
We could modify our query as follows to perform a case-insensitive search as follows:
SELECT REGEXP_INSTR ('Anderson', 'a|e|i|o|u', 1, 1, 0, 'i')
FROM dual;

Result: 1
Now because we have provide a match_parameter of 'i', the query will return 1 as the result. This time, the 'A' in 'Anderson' will be found as a match.
Now, let's quickly show how you would use this function with a column.
So let's say we have a contact table with the following data:
contact_idlast_name
1000Anderson
2000Smith
3000Johnson
Now, let's run the following query:
SELECT contact_id, last_name, REGEXP_INSTR (last_name, 'a|e|i|o|u', 1, 1, 0, 'i') AS first_occurrence
FROM contacts;
These are the results that would be returned by the query:
contact_idlast_namefirst_occurrence
1000Anderson1
2000Smith3
3000Johnson2

Example - Match on nth_occurrence

The next example that we will look at involves the nth_occurrence parameter. The nth_occurrence parameter allows you to select which occurrence of the pattern you wish to return the position of.

First Occurrence

Let's look at how to find the first occurrence of a pattern in a string.
For example:
SELECT REGEXP_INSTR ('TechOnTheNet', 'a|e|i|o|u', 1, 1, 0, 'i')
FROM dual;

Result: 2
This example will return 2 because it is searching for the first occurrence of a vowel (a, e, i, o, or u) in the string.

Second Occurrence

Next, we will search for the second occurrence of a pattern in a string.
For example:
SELECT REGEXP_INSTR ('TechOnTheNet', 'a|e|i|o|u', 1, 2, 0, 'i')
FROM dual;

Result: 5
This example will return 5 because it is searching for the second occurrence of a vowel (a, e, i, o, or u) in the string.

Third Occurrence

For example:
SELECT REGEXP_INSTR ('TechOnTheNet', 'a|e|i|o|u', 1, 3, 0, 'i')
FROM dual;

Result: 9
This example will return 9 because it is searching for the third occurrence of a vowel (a, e, i, o, or u) in the string.

Example - return_option parameter

Finally, let's look at how the return_option parameter affects our results.
For example:
SELECT REGEXP_INSTR ('TechOnTheNet', 'The', 1, 1, 0, 'i')
FROM dual;

Result: 7
In this basic example, we are searching for a pattern in a string and the search is case-insensitive. We have specified the return_option parameter as 0 which means the position of the first character of the pattern will be returned.
Now, let's change the return_option parameter to 1 and see what happens.
For example:
SELECT REGEXP_INSTR ('TechOnTheNet', 'The', 1, 1, 1, 'i')
FROM dual;

Result: 10
return_option parameter of 1 tells the REGEXP_INSTR function to return the position of the character following the matched pattern. In this example, the function will return 10.

No comments: