Recursion is mostly available in Perl-compatible flavors, such as:
The construct (?R)
is equivalent to (?0)
(or \g<0>
) - it lets you recurse the whole pattern:
<(?>[^<>]+|(?R))+>
This will match properly balanced angle brackets with any text in-between the brackets, like <a<b>c<d>e>
.
You can recurse into a subpattern using the following constructs (depending on the flavor), assuming n
is a capturing group number, and name
the name of a capturing group.
(?n)
\g<n>
\g'0'
(?&name)
\g<name>
\g'name'
(?P>name)
The following pattern:
\[(?<angle><(?&angle)*+>)*\]
Will match text such as: [<<><>><>]
- well balanced angle brackets within square brackets. Recursion is often used for balanced constructs matching.
The (?(DEFINE)
...)
construct lets you define subpatterns you may reference later through recursion. When encountered in the pattern it will not be matched against.
This group should contain named subpattern definitions, which will be accessible only through recursion. You can define grammars this way:
(?x) # ignore pattern whitespace
(?(DEFINE)
(?<string> ".*?" )
(?<number> \d+ )
(?<value>
\s* (?:
(?&string)
| (?&number)
| (?&list)
) \s*
)
(?<list> \[ (?&value) (?: , (?&value) )* \] )
)
^(?&value)$
This pattern will validate text like the following:
[42, "abc", ["foo", "bar"], 10]
Note how a list can contain one or more values, and a value can itself be a list.
Subpatterns can be referenced with their relative group number:
(?-1)
will recurse into the previous group(?+1)
will recurse into the next groupAlso usable with the \g<N>
syntax.
In PCRE, matched groups used for backreferences before a recursion are kept in the recursion. But after the recursion they all reset to what they were before entering it. In other words, matched groups in the recursion are all forgotten.
For example:
(?J)(?(DEFINE)(\g{a}(?<a>b)\g{a}))(?<a>a)\g{a}(?1)\g{a}
matches
aaabba
In PCRE, it doesn't trackback after the first match for a recursion is found. So
(?(DEFINE)(aaa|aa|a))(?1)ab
doesn't match
aab
because after it matched aa
in the recursion, it never try again to match only a
.