Terminology
The Caret (^) character is also referred to by the following terms:
Usage
It has two uses in regular expressions:
As an anchor, it marks the start of the string (or, in multi-line mode, the start of each line).
When used immediately after an opening square bracket ([^), it negates the set of allowed characters: [123] means the character 1, 2, or 3 is allowed, whilst [^123] means any character other than 1, 2, or 3 is allowed.
Character Escaping
To express a caret without special meaning, escape it by preceding it with a backslash, i.e. \^.
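The negation and escaping rules above can be checked with a minimal sketch, assuming Python's built-in re module; the sample strings and variable names are made up for illustration.

```python
import re

# [123] allows only the characters 1, 2, or 3.
allowed = re.findall(r"[123]", "a1b2c4")

# [^123] allows any single character except 1, 2, or 3.
negated = re.findall(r"[^123]", "a1b2c4")

# \^ matches a literal caret instead of acting as an anchor.
literal = re.findall(r"\^", "2^10 = 1024")

print(allowed)  # ['1', '2']
print(negated)  # ['a', 'b', 'c', '4']
print(literal)  # ['^']
```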
When the multi-line (?m) modifier is turned off, ^ matches only the input string's beginning.
For the regex
^He
The following input strings match:
Hedgehog\nFirst line\nLast line
Help me, please
He
And the following input strings do not match:
First line\nHedgehog\nLast line
IHedgehog
" Hedgehog" (due to the leading white-space)
When the multi-line (?m) modifier is turned on, ^ matches every line's beginning:
^He
The above would match any input string that contains a line beginning with He.
Considering \n as the new line character, the following lines match:
Hello
First line\nHedgehog\nLast line (second line only)
My\nText\nIs\nHere (last line only)
And the following input strings do not match:
Camden Hells Brewery
" Helmet" (due to the leading white-space)
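The two behaviours can be compared side by side. This is a minimal sketch, assuming Python's built-in re module, where re.MULTILINE plays the role of the (?m) modifier; the inputs are taken from the lists above.

```python
import re

pattern = r"^He"

# "\n" is the newline character, as in the lists above.
inputs = [
    "Hedgehog\nFirst line\nLast line",
    "Help me, please",
    "First line\nHedgehog\nLast line",
    "IHedgehog",
    " Hedgehog",
    "My\nText\nIs\nHere",
]

# Multi-line off: ^ matches only at the very start of the input.
single = [bool(re.search(pattern, s)) for s in inputs]

# Multi-line on: ^ also matches right after each "\n".
multi = [bool(re.search(pattern, s, re.MULTILINE)) for s in inputs]

print(single)  # [True, True, False, False, False, False]
print(multi)   # [True, True, True, False, False, True]
```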
Another typical use case for caret is matching empty lines (or an empty string if the multi-line modifier is turned off).
To match an empty line (multi-line on), place a caret next to a $, another anchor character that represents the position at the end of the line (see Anchor Characters: Dollar ($)). Therefore, the following regular expression will match an empty line:
^$
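As a rough illustration, the sketch below locates empty lines with ^$, assuming Python's built-in re module; the sample text is invented for the example.

```python
import re

text = "first\n\nthird\n\n\nsixth"

# With re.MULTILINE, ^$ matches wherever a line start is immediately
# followed by a line end, i.e. on every empty line.
empty_line_offsets = [m.start() for m in re.finditer(r"^$", text, re.MULTILINE)]
print(empty_line_offsets)  # [6, 13, 14]

# With multi-line off, ^$ only matches an empty string.
matches_empty_string = re.search(r"^$", "") is not None
matches_text = re.search(r"^$", text) is not None
print(matches_empty_string, matches_text)  # True False
```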
If you need to use the ^ character in a character class (see Character classes), either put it somewhere other than the beginning of the class:
[12^3]
Or escape the ^ using a backslash \:
[\^123]
If you want to match the caret character itself outside a character class, you need to escape it:
\^
This prevents the ^ from being interpreted as the anchor character representing the beginning of the string/line.
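The character-class variants can be tried out as follows. This is a small sketch, assuming Python's built-in re module; the sample string is hypothetical.

```python
import re

sample = "1^2 a3"

# A caret placed away from the start of the class is an ordinary member.
moved = re.findall(r"[12^3]", sample)

# An escaped caret at the start of the class is also literal.
escaped = re.findall(r"[\^123]", sample)

# An unescaped caret at the start negates the class instead.
negated = re.findall(r"[^123]", sample)

print(moved)    # ['1', '^', '2', '3']
print(escaped)  # ['1', '^', '2', '3']
print(negated)  # ['^', ' ', 'a']
```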
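While many people think that ^ always means the start of a string, in multi-line mode (and in some flavors, such as Ruby, regardless of mode) it means the start of a line. For an anchor that only ever matches at the start of the string, use \A.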
The string hello\nworld
(or more clearly)
hello
world
would, with ^ acting as a start-of-line anchor, be matched by the regular expressions ^h, ^w and \Ah, but not by \Aw.
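The comparison can be reproduced with a minimal sketch, assuming Python's built-in re module. In Python, ^ only acts as a start-of-line anchor when re.MULTILINE is set, so the flag is passed explicitly here.

```python
import re

text = "hello\nworld"

# re.MULTILINE makes ^ a start-of-line anchor in Python;
# \A stays a start-of-string anchor regardless of the flag.
results = {
    p: bool(re.search(p, text, re.MULTILINE))
    for p in (r"^h", r"^w", r"\Ah", r"\Aw")
}

for pattern, matched in results.items():
    print(pattern, matched)
```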
By default, the caret ^ metacharacter matches the position before the first character in the string.
Given the string "charsequence" applied against the following patterns: /^char/ & /^sequence/, the engine will try to match as follows (┊ marks the current position):
/^char/
┊charsequence
c┊harsequence
ch┊arsequence
cha┊rsequence
char┊sequence
Match Found
/^sequence/
┊charsequence
┊charsequence (the s of the pattern fails against c)
Match not Found
The same behaviour applies even if the string contains line terminators, such as \r?\n. Only the position at the start of the string will be matched.
For example:
/^/g
┊char\r\n
\r\n
sequence
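The default behaviour can be confirmed with a short sketch, assuming Python's built-in re module (Python has no /g flag; re.finditer is used here to collect every match).

```python
import re

text = "char\r\n\r\nsequence"

# Without re.MULTILINE, ^ matches only at offset 0, whatever the
# line terminators in the string.
default_offsets = [m.start() for m in re.finditer(r"^", text)]
print(default_offsets)  # [0]

# The two patterns traced earlier against "charsequence".
char_found = bool(re.search(r"^char", "charsequence"))
sequence_found = bool(re.search(r"^sequence", "charsequence"))
print(char_found, sequence_found)  # True False
```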
However, if you need to match after every line terminator, you will have to set the multiline mode (//m, (?m)) within your pattern. By doing so, the caret ^ will match "the beginning of each line", which corresponds to the position at the beginning of the string and the positions immediately after [1] the line terminators.
[1] In some flavors (Java, PCRE, ...), ^ will not match after the line terminator if the line terminator is the last in the string.
For example:
/^/gm
┊char\r\n
┊\r\n
┊sequence
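The marked positions can be listed with a minimal sketch, assuming Python's built-in re module. Note that Python's re treats only "\n" as a line boundary for ^, so the offsets below are the ones directly after each "\n".

```python
import re

text = "char\r\n\r\nsequence"

# With re.MULTILINE, ^ matches at the start of the string and
# immediately after every "\n".
multiline_offsets = [m.start() for m in re.finditer(r"^", text, re.MULTILINE)]
print(multiline_offsets)  # [0, 6, 8]
```

Offset 0 is the start of the string, 6 is the start of the empty second line, and 8 is the start of "sequence".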
Some of the regular expression engines that support the multiline modifier:
Java
Pattern pattern = Pattern.compile("(?m)^abc");                  // inline modifier
Pattern pattern = Pattern.compile("^abc", Pattern.MULTILINE);   // compile flag
.NET (C#)
var abcRegex = new Regex("(?m)^abc");                     // inline modifier
var abcRegex = new Regex("^abc", RegexOptions.Multiline); // option flag
/(?m)^abc/
/^abc/m
Python 2 & 3 (built-in re module)
abc_regex = re.compile(r"(?m)^abc")          # inline modifier
abc_regex = re.compile(r"^abc", re.MULTILINE)  # module-level flag
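To show that the two Python spellings behave the same, here is a small sketch; the sample text and variable names are assumptions made for the example.

```python
import re

text = "xyz\nabc\nabcdef"

inline = re.compile(r"(?m)^abc")             # inline modifier
flagged = re.compile(r"^abc", re.MULTILINE)  # module-level flag
plain = re.compile(r"^abc")                  # multiline mode off

inline_offsets = [m.start() for m in inline.finditer(text)]
flagged_offsets = [m.start() for m in flagged.finditer(text)]
plain_offsets = [m.start() for m in plain.finditer(text)]

print(inline_offsets)   # [4, 8]
print(flagged_offsets)  # [4, 8]
print(plain_offsets)    # []
```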