Terminology
The Caret (^) character is also referred to by the following terms:
Usage
It has two uses in regular expressions:
To express the start of a line (or of the whole string)
At the start of a character class (i.e. [^) it acts to negate the set of allowed characters (i.e. [123] means the character 1, 2, or 3 is allowed, whilst [^123] means any character other than 1, 2, or 3 is allowed).
Character Escaping
To express a caret without special meaning, it should be escaped by preceding it with a backslash; i.e. \^.
When the multi-line (?m) modifier is turned off, ^ matches only the input string's beginning:
For the regex
^He
The following input strings match:
Hedgehog\nFirst line\nLast line
Help me, please
He
And the following input strings do not match:
First line\nHedgehog\nLast line
IHedgehog
 Hedgehog (due to the leading white-space)
When the multi-line (?m) modifier is turned on, ^ matches every line's beginning:
^He
The above would match any input string that contains a line beginning with He.
Considering \n as the new line character, the following lines match:
Hello
First line\nHedgehog\nLast line (second line only)
My\nText\nIs\nHere (last line only)
And the following input strings do not match:
Camden Hells Brewery
 Helmet (due to the leading white-space)
Another typical use case for the caret is matching empty lines (or an empty string if the multi-line modifier is turned off).
To match an empty line (multi-line on), the caret is placed next to a $, another anchor character, which represents the position at the end of a line (see Anchor Characters: Dollar ($)). Therefore, the following regular expression will match an empty line:
^$
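The behaviour described above can be checked with a short sketch using Python's built-in re module. The helper names starts_with_he and count_empty_lines are hypothetical and exist only for this illustration; the inputs are taken from the examples above.

```python
import re

def starts_with_he(text, multiline=False):
    """Return True if the pattern ^He finds a match in text."""
    flags = re.MULTILINE if multiline else 0
    return re.search(r"^He", text, flags) is not None

def count_empty_lines(text):
    """Count empty lines by matching ^$ with the multi-line modifier on."""
    return len(re.findall(r"^$", text, re.MULTILINE))

# Multi-line off: only the very start of the input is considered.
print(starts_with_he("Hedgehog\nFirst line\nLast line"))
print(starts_with_he("First line\nHedgehog\nLast line"))

# Multi-line on: the start of every line is considered.
print(starts_with_he("First line\nHedgehog\nLast line", multiline=True))
print(starts_with_he(" Hedgehog", multiline=True))

# One empty line sits between "a" and "b".
print(count_empty_lines("a\n\nb"))
```

Passing the flag as an argument and embedding (?m) in the pattern should be interchangeable here; the flag form is used only to make the on/off switch explicit.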
If you need to use the ^ character in a character class (see Character classes), either put it somewhere other than the beginning of the class:
[12^3]
Or escape the ^ using a backslash \:
[\^123]
If you want to match the caret character itself outside a character class, you need to escape it:
\^
This prevents the ^ being interpreted as the anchor character representing the beginning of the string/line.
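As a minimal sketch of the escaping rules above, the following Python snippet compiles each form and checks what it accepts. The variable and function names are hypothetical; only the patterns themselves come from the text.

```python
import re

# Caret first inside the class: negation (anything except 1, 2, or 3).
negated = re.compile(r"[^123]")
# Caret placed away from the start of the class: a literal member.
literal_inside = re.compile(r"[12^3]")
# Caret escaped at the start of the class: also a literal member.
escaped_inside = re.compile(r"[\^123]")
# Escaped caret outside a class: matches the ^ character itself.
literal_caret = re.compile(r"\^")

def accepts(pattern, text):
    """Return True if the whole of text is matched by pattern."""
    return pattern.fullmatch(text) is not None

print(accepts(negated, "4"), accepts(negated, "1"))
print(accepts(literal_inside, "^"), accepts(escaped_inside, "^"))
print(literal_caret.search("2^3").start())
```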
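While many people think that ^ always means the start of a string, in multi-line mode (and in flavors such as Ruby, where ^ is always line-based) it means the start of a line. For an anchor that only ever matches the start of the string, use \A.
The string hello\nworld (or more clearly)
hello
world
would, with multi-line matching in effect, be matched by the regular expressions ^h, ^w and \Ah, but not by \Aw.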
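The difference between ^ and \A can be sketched in Python, where ^ is line-based only when re.MULTILINE is set, while \A is unaffected by that flag. The helper name found is hypothetical.

```python
import re

TEXT = "hello\nworld"

def found(pattern, flags=0):
    """Return True if pattern matches anywhere in TEXT."""
    return re.search(pattern, TEXT, flags) is not None

# With the multi-line modifier, ^ matches at the start of each line.
print(found(r"^h", re.MULTILINE), found(r"^w", re.MULTILINE))

# \A matches only at the start of the string, whatever the modifier.
print(found(r"\Ah", re.MULTILINE), found(r"\Aw", re.MULTILINE))

# Without the modifier, ^ behaves like \A in this flavor.
print(found(r"^w"))
```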
By default, the caret ^ metacharacter matches the position before the first
character in the string.
Given the string "charsequence" applied
against the following patterns: /^char/ & /^sequence/, the engine will try to match as follows:
/^char/
charsequence
Match found (^ matches at the start of the string, then c, h, a and r match in turn)
/^sequence/
charsequence
Match not found (^ matches at the start of the string, but s does not match c)
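The same two attempts can be reproduced with a small Python sketch. The function name anchored_match is hypothetical, and Python's re module stands in for whichever engine the /.../ notation above refers to.

```python
import re

TEXT = "charsequence"

def anchored_match(pattern):
    """Return the matched text, or None if the pattern does not match."""
    m = re.search(pattern, TEXT)
    return m.group(0) if m else None

# ^char succeeds: "char" sits at the very start of the string.
print(anchored_match(r"^char"))

# ^sequence fails: "sequence" occurs, but not at the start.
print(anchored_match(r"^sequence"))

# Without the anchor, the same text is found further along.
print(anchored_match(r"sequence"))
```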
The same behaviour will be applied even if the string contains line terminators, such as \r?\n. Only the position at the start of the string will be matched.
For example:
/^/g
┊char\r\n
\r\n
sequence
However, if you need to match after every line terminator, you will have to set the multiline mode (//m, (?m)) within your pattern. By doing so, the caret ^ will match "the beginning of each line", which corresponds to the position at the beginning of the string and the positions immediately after¹ the line terminators.
¹ In some flavors (Java, PCRE, ...), ^ will not match after the line terminator if the line terminator is the last in the string.
For example:
/^/gm
┊char\r\n
┊\r\n
┊sequence
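The marked positions can be listed programmatically. The sketch below uses Python's re module and a hypothetical helper caret_positions; note, as an assumption specific to this flavor, that Python treats only \n as the line terminator for ^, so each match lands after the \n of a \r\n pair.

```python
import re

TEXT = "char\r\n\r\nsequence"

def caret_positions(text, multiline):
    """Return the string offsets at which ^ matches."""
    flags = re.MULTILINE if multiline else 0
    return [m.start() for m in re.finditer(r"^", text, flags)]

# Default mode: only the start of the string.
print(caret_positions(TEXT, multiline=False))

# Multiline mode: the start of the string and the start of each line.
print(caret_positions(TEXT, multiline=True))
```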
Some of the regular expression engines that support the multiline modifier:
Java
Pattern pattern = Pattern.compile("(?m)^abc");
Pattern pattern = Pattern.compile("^abc", Pattern.MULTILINE);
.NET
var abcRegex = new Regex("(?m)^abc");
var abcRegex = new Regex("^abc", RegexOptions.Multiline);
Perl / PCRE
/(?m)^abc/
/^abc/m
Python 2 & 3 (built-in re module)
import re
abc_regex = re.compile("(?m)^abc")
abc_regex = re.compile("^abc", re.MULTILINE)