RE2 regular expression syntax reference
---------------------------------------

Single characters:
.	any character, possibly including newline (s=true)
[xyz]	character class
[^xyz]	negated character class
\d	Perl character class
\D	negated Perl character class
[[:alpha:]]	ASCII character class
[[:^alpha:]]	negated ASCII character class
\pN	Unicode character class (one-letter name)
\p{Greek}	Unicode character class
\PN	negated Unicode character class (one-letter name)
\P{Greek}	negated Unicode character class
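
You can try the entries above without building RE2. Go's standard regexp package documents its syntax as the one RE2 accepts, except for «\C» and small version differences. The following is a minimal sketch that assumes Go's default flags; singleChar is a hypothetical helper. It contrasts «.» with a negated class on a newline and tests the same Greek letter against «\p{Greek}» and «\P{Greek}».

```go
package main

import (
	"fmt"
	"regexp"
)

// singleChar reports whether pattern matches anywhere in text,
// using Go's RE2-syntax regexp package with its default flags.
func singleChar(pattern, text string) bool {
	return regexp.MustCompile(pattern).MatchString(text)
}

func main() {
	fmt.Println(singleChar(`.`, "\n"))             // false: «.» skips newline unless s=true
	fmt.Println(singleChar(`[^xyz]`, "\n"))        // true: a negated class may match newline
	fmt.Println(singleChar(`\p{Greek}`, "\u03bb")) // true: λ is in \p{Greek}
	fmt.Println(singleChar(`\P{Greek}`, "\u03bb")) // false
}
```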

Composites:
xy	«x» followed by «y»
x|y	«x» or «y» (prefer «x»)
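
«x|y» prefers «x» even when «y» would give a longer match at the same position. The Go sketch below shows this leftmost-first rule; the helper names are hypothetical. Go's Longest method switches a compiled pattern to leftmost-longest matching. It is a Go API option, not part of the syntax described here.

```go
package main

import (
	"fmt"
	"regexp"
)

// leftmostFirst returns the leftmost match under the default
// "prefer «x»" rule for «x|y».
func leftmostFirst(pattern, text string) string {
	return regexp.MustCompile(pattern).FindString(text)
}

// leftmostLongest returns the leftmost-longest match instead;
// Longest is a Go API switch, not pattern syntax.
func leftmostLongest(pattern, text string) string {
	re := regexp.MustCompile(pattern)
	re.Longest()
	return re.FindString(text)
}

func main() {
	fmt.Println(leftmostFirst(`a|ab`, "ab"))   // a: left branch wins
	fmt.Println(leftmostFirst(`ab|a`, "ab"))   // ab
	fmt.Println(leftmostLongest(`a|ab`, "ab")) // ab
	fmt.Println(leftmostFirst(`xy`, "axyb"))   // xy: concatenation
}
```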

Repetitions:
x*	zero or more «x», prefer more
x+	one or more «x», prefer more
x?	zero or one «x», prefer one
x{n,m}	«n» or «n»+1 or ... or «m» «x», prefer more
x{n,}	«n» or more «x», prefer more
x{n}	exactly «n» «x»
x*?	zero or more «x», prefer fewer
x+?	one or more «x», prefer fewer
x??	zero or one «x», prefer zero
x{n,m}?	«n» or «n»+1 or ... or «m» «x», prefer fewer
x{n,}?	«n» or more «x», prefer fewer
x{n}?	exactly «n» «x»
x{}	(== x*) NOT SUPPORTED vim
x{-}	(== x*?) NOT SUPPORTED vim
x{-n}	(== x{n}?) NOT SUPPORTED vim
x=	(== x?) NOT SUPPORTED vim

Implementation restriction: The counting forms «x{n,m}», «x{n,}», and «x{n}»
reject forms that create a minimum or maximum repetition count above 1000.
Unlimited repetitions are not subject to this restriction.
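
The following Go sketch shows a greedy/non-greedy pair and the 1000-count restriction. Go's parser rejects counting forms whose minimum or maximum is above 1000. The helpers firstMatch and compiles are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstMatch returns the leftmost match of pattern in text.
func firstMatch(pattern, text string) string {
	return regexp.MustCompile(pattern).FindString(text)
}

// compiles reports whether the parser accepts pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	fmt.Println(firstMatch(`<.+>`, "<a><b>"))  // <a><b>: prefer more
	fmt.Println(firstMatch(`<.+?>`, "<a><b>")) // <a>: prefer fewer
	fmt.Println(compiles(`a{1000}`))           // true: at the limit
	fmt.Println(compiles(`a{1001}`))           // false: count above 1000
	fmt.Println(compiles(`a{2,1001}`))         // false: maximum above 1000
	fmt.Println(compiles(`a*`))                // true: unlimited form, no limit
}
```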

Possessive repetitions:
x*+	zero or more «x», possessive NOT SUPPORTED
x++	one or more «x», possessive NOT SUPPORTED
x?+	zero or one «x», possessive NOT SUPPORTED
x{n,m}+	«n» or ... or «m» «x», possessive NOT SUPPORTED
x{n,}+	«n» or more «x», possessive NOT SUPPORTED
x{n}+	exactly «n» «x», possessive NOT SUPPORTED
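
Possessive forms are rejected at compile time rather than quietly reinterpreted. A pattern ported from an engine that supports them therefore fails loudly. The Go sketch below checks each form in the table; possessiveRejected is a hypothetical helper, and any error text is Go's own wording. The non-greedy forms such as «x*?» are still accepted.

```go
package main

import (
	"fmt"
	"regexp"
)

// possessiveRejected reports whether the parser refuses pattern.
func possessiveRejected(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err != nil
}

func main() {
	// Every possessive form from the table should be refused.
	for _, p := range []string{`a*+`, `a++`, `a?+`, `a{2,3}+`, `a{2,}+`, `a{2}+`} {
		fmt.Println(p, possessiveRejected(p)) // each prints true
	}
	// «?» after a repetition is the non-greedy form, which is accepted.
	fmt.Println(`a*?`, possessiveRejected(`a*?`)) // false
}
```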

Grouping:
(re)	numbered capturing group (submatch)
(?P<name>re)	named & numbered capturing group (submatch)
(?<name>re)	named & numbered capturing group (submatch) NOT SUPPORTED
(?'name're)	named & numbered capturing group (submatch) NOT SUPPORTED
(?:re)	non-capturing group
(?flags)	set flags within current group; non-capturing
(?flags:re)	set flags during re; non-capturing
(?#text)	comment NOT SUPPORTED
(?|x|y|z)	branch numbering reset NOT SUPPORTED
(?>re)	possessive match of «re» NOT SUPPORTED
re@>	possessive match of «re» NOT SUPPORTED vim
%(re)	non-capturing group NOT SUPPORTED vim
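
The following Go sketch uses «(?P<name>re)». The group is both named and numbered, so its text can be read by name through SubexpNames or by index. It also shows that «(?:re)» and «(?flags:re)» add no capturing groups. The pattern and the names dateRE, namedParts and groupCount are illustrative only.

```go
package main

import (
	"fmt"
	"regexp"
)

// dateRE has two named groups; they are also groups 1 and 2.
var dateRE = regexp.MustCompile(`(?P<year>\d{4})-(?P<month>\d{2})`)

// namedParts maps each group name to its submatch, or returns nil on no match.
func namedParts(text string) map[string]string {
	m := dateRE.FindStringSubmatch(text)
	if m == nil {
		return nil
	}
	out := make(map[string]string)
	for i, name := range dateRE.SubexpNames() {
		if name != "" { // index 0 (whole match) and unnamed groups have no name
			out[name] = m[i]
		}
	}
	return out
}

// groupCount returns the number of capturing groups in pattern.
func groupCount(pattern string) int {
	return regexp.MustCompile(pattern).NumSubexp()
}

func main() {
	fmt.Println(namedParts("released 2020-07")) // map[month:07 year:2020]
	fmt.Println(groupCount(`(?:ab)+(c)`))       // 1: (?:re) does not capture
	fmt.Println(groupCount(`(?i:ab)(c)`))       // 1: (?flags:re) does not capture
}
```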

Flags:
i	case-insensitive (default false)
m	multi-line mode: «^» and «$» match begin/end line in addition to begin/end text (default false)
s	let «.» match «\n» (default false)
U	ungreedy: swap meaning of «x*» and «x*?», «x+» and «x+?», etc. (default false)

Flag syntax is «xyz» (set) or «-xyz» (clear) or «xy-z» (set «xy», clear «z»).
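
The following Go sketch puts the four flags to work. The last line uses the «-» form to clear «i» inside a group. The helpers matches and first are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

func matches(pattern, text string) bool {
	return regexp.MustCompile(pattern).MatchString(text)
}

func first(pattern, text string) string {
	return regexp.MustCompile(pattern).FindString(text)
}

func main() {
	fmt.Println(matches(`(?i)hello`, "HeLLo"))   // true: case-insensitive
	fmt.Println(matches(`^b$`, "a\nb\nc"))       // false: text anchors only
	fmt.Println(matches(`(?m)^b$`, "a\nb\nc"))   // true: m adds line anchors
	fmt.Println(matches(`a.b`, "a\nb"))          // false
	fmt.Println(matches(`(?s)a.b`, "a\nb"))      // true: s lets «.» match \n
	fmt.Println(first(`(?U)a+`, "aaa"))          // a: U makes «a+» prefer fewer
	fmt.Println(first(`(?U)a+?`, "aaa"))         // aaa: and «a+?» prefer more
	fmt.Println(matches(`(?i)a(?-i:b)`, "AB"))   // false: i cleared for «b»
}
```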

Empty strings:
^	at beginning of text or line («m»=true)
$	at end of text (like «\z» not «\Z») or line («m»=true)
\A	at beginning of text
\b	at ASCII word boundary («\w» on one side and «\W», «\A», or «\z» on the other)
\B	not at ASCII word boundary
\G	at beginning of subtext being searched NOT SUPPORTED pcre
\G	at end of last match NOT SUPPORTED perl
\Z	at end of text, or before newline at end of text NOT SUPPORTED
\z	at end of text
(?=re)	before text matching «re» NOT SUPPORTED
(?!re)	before text not matching «re» NOT SUPPORTED
(?<=re)	after text matching «re» NOT SUPPORTED
(?<!re)	after text not matching «re» NOT SUPPORTED
re&	before text matching «re» NOT SUPPORTED vim
re@=	before text matching «re» NOT SUPPORTED vim
re@!	before text not matching «re» NOT SUPPORTED vim
re@<=	after text matching «re» NOT SUPPORTED vim
re@<!	after text not matching «re» NOT SUPPORTED vim
\zs	sets start of match (= \K) NOT SUPPORTED vim
\ze	sets end of match NOT SUPPORTED vim
\%^	beginning of file NOT SUPPORTED vim
\%$	end of file NOT SUPPORTED vim
\%V	on screen NOT SUPPORTED vim
\%#	cursor position NOT SUPPORTED vim
\%'m	mark «m» position NOT SUPPORTED vim
\%23l	in line 23 NOT SUPPORTED vim
\%23c	in column 23 NOT SUPPORTED vim
\%23v	in virtual column 23 NOT SUPPORTED vim
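
The following Go sketch exercises the supported empty-string assertions. It also checks that unsupported ones, here «\Z» and lookahead, are rejected at compile time rather than ignored. The helper names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

func matches(pattern, text string) bool {
	return regexp.MustCompile(pattern).MatchString(text)
}

func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

// firstIndex returns the [start, end) byte offsets of the leftmost match.
func firstIndex(pattern, text string) []int {
	return regexp.MustCompile(pattern).FindStringIndex(text)
}

func main() {
	// "cat" inside "concat" has a word character before it; the second "cat" does not.
	fmt.Println(firstIndex(`\bcat\b`, "concat cat"))  // [7 10]
	fmt.Println(matches(`cat$`, "cat\n"))             // false: «$» is like «\z»
	fmt.Println(matches(`\Acat`, "dog\ncat"))         // false
	fmt.Println(matches(`(?m)^cat`, "dog\ncat"))      // true
	fmt.Println(compiles(`\Z`), compiles(`a(?=b)`), compiles(`a(?!b)`)) // false false false
}
```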

Escape sequences:
\a	bell (== \007)
\f	form feed (== \014)
\t	horizontal tab (== \011)
\n	newline (== \012)
\r	carriage return (== \015)
\v	vertical tab character (== \013)
\*	literal «*», for any punctuation character «*»
\123	octal character code (up to three digits)
\x7F	hex character code (exactly two digits)
\x{10FFFF}	hex character code
\C	match a single byte even in UTF-8 mode
\Q...\E	literal text «...» even if «...» has punctuation
\1	backreference NOT SUPPORTED
\b	backspace NOT SUPPORTED (use «\010»)
\cK	control char ^K NOT SUPPORTED (use «\013» etc.)
\e	escape NOT SUPPORTED (use «\033»)
\g1	backreference NOT SUPPORTED
\g{1}	backreference NOT SUPPORTED
\g{+1}	backreference NOT SUPPORTED
\g{-1}	backreference NOT SUPPORTED
\g{name}	named backreference NOT SUPPORTED
\g<name>	subroutine call NOT SUPPORTED
\g'name'	subroutine call NOT SUPPORTED
\k<name>	named backreference NOT SUPPORTED
\k'name'	named backreference NOT SUPPORTED
\lX	lowercase «X» NOT SUPPORTED
\ux	uppercase «x» NOT SUPPORTED
\L...\E	lowercase text «...» NOT SUPPORTED
\K	reset beginning of «$0» NOT SUPPORTED
\N{name}	named Unicode character NOT SUPPORTED
\R	line break NOT SUPPORTED
\U...\E	uppercase text «...» NOT SUPPORTED
\X	extended Unicode sequence NOT SUPPORTED
\%d123	decimal character 123 NOT SUPPORTED vim
\%xFF	hex character FF NOT SUPPORTED vim
\%o123	octal character 123 NOT SUPPORTED vim
\%u1234	Unicode character 0x1234 NOT SUPPORTED vim
\%U12345678	Unicode character 0x12345678 NOT SUPPORTED vim
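
The following Go sketch exercises several supported escapes and confirms that a backreference, «\e» and «\cK» are rejected, as the table says. The helper names are hypothetical. The Go string literal "\b" below is Go's own backspace escape, not the regexp «\b».

```go
package main

import (
	"fmt"
	"regexp"
)

func matches(pattern, text string) bool {
	return regexp.MustCompile(pattern).MatchString(text)
}

func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	fmt.Println(matches(`^\Qa.b*\E$`, "a.b*"))             // true: quoted text is literal
	fmt.Println(matches(`^\Qa.b*\E$`, "axbb"))             // false
	fmt.Println(matches(`^\x41\101\x{263A}$`, "AA\u263a")) // true: hex, octal, braced hex
	fmt.Println(matches(`^\010$`, "\b"))                   // true: octal 010 is backspace
	fmt.Println(compiles(`(a)\1`), compiles(`\e`), compiles(`\cK`)) // false false false
}
```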

Character class elements:
x	single character
A-Z	character range (inclusive)
\d	Perl character class
[:foo:]	ASCII character class «foo»
\p{Foo}	Unicode character class «Foo»
\pF	Unicode character class «F» (one-letter name)

Named character classes as character class elements:
[\d]	digits (== \d)
[^\d]	not digits (== \D)
[\D]	not digits (== \D)
[^\D]	not not digits (== \d)
[[:name:]]	named ASCII class inside character class (== [:name:])
[^[:name:]]	named ASCII class inside negated character class (== [:^name:])
[\p{Name}]	named Unicode property inside character class (== \p{Name})
[^\p{Name}]	named Unicode property inside negated character class (== \P{Name})
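
Class elements can be mixed freely inside one bracket expression. The following Go sketch combines ranges, Perl classes, ASCII classes and Unicode classes as listed above; matches is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

func matches(pattern, text string) bool {
	return regexp.MustCompile(pattern).MatchString(text)
}

func main() {
	fmt.Println(matches(`^[\d]$`, "7"), matches(`^[^\d]$`, "7")) // true false
	fmt.Println(matches(`^[^\D]$`, "7"))                         // true: "not not digits"
	fmt.Println(matches(`^[A-F[:digit:]]+$`, "C0FFEE"))          // true: range plus ASCII class
	fmt.Println(matches(`^[^[:alpha:]]+$`, "123"))               // true: negated ASCII class
	fmt.Println(matches(`^[\p{Greek}x]+$`, "\u03b1x\u03b2"))     // true: Unicode class plus literal
}
```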

Perl character classes (all ASCII-only):
\d	digits (== [0-9])
\D	not digits (== [^0-9])
\s	whitespace (== [\t\n\f\r ])
\S	not whitespace (== [^\t\n\f\r ])
\w	word characters (== [0-9A-Za-z_])
\W	not word characters (== [^0-9A-Za-z_])
\h	horizontal space NOT SUPPORTED
\H	not horizontal space NOT SUPPORTED
\v	vertical space NOT SUPPORTED
\V	not vertical space NOT SUPPORTED
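
Because the Perl classes are ASCII-only, they miss non-ASCII digits, letters and spaces; the Unicode classes cover those. The following Go sketch writes non-ASCII input as \u escapes: U+0663 is ARABIC-INDIC DIGIT THREE, U+00E9 is «é» and U+00A0 is NO-BREAK SPACE. matches is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

func matches(pattern, text string) bool {
	return regexp.MustCompile(pattern).MatchString(text)
}

func main() {
	fmt.Println(matches(`\d`, "\u0663"), matches(`\pN`, "\u0663"))    // false true
	fmt.Println(matches(`\w`, "\u00e9"), matches(`\pL`, "\u00e9"))    // false true
	fmt.Println(matches(`\s`, "\u00a0"), matches(`\p{Zs}`, "\u00a0")) // false true
	fmt.Println(matches(`\s`, "\v"))                                  // false: \v is not in \s
}
```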

ASCII character classes:
[[:alnum:]]	alphanumeric (== [0-9A-Za-z])
[[:alpha:]]	alphabetic (== [A-Za-z])
[[:ascii:]]	ASCII (== [\x00-\x7F])
[[:blank:]]	blank (== [\t ])
[[:cntrl:]]	control (== [\x00-\x1F\x7F])
[[:digit:]]	digits (== [0-9])
[[:graph:]]	graphical (== [!-~] == [A-Za-z0-9!"#$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~])
[[:lower:]]	lowercase (== [a-z])
[[:print:]]	printable (== [ -~] == [ [:graph:]])
[[:punct:]]	punctuation (== [!-/:-@[-`{-~])
[[:space:]]	whitespace (== [\t\n\v\f\r ])
[[:upper:]]	uppercase (== [A-Z])
[[:word:]]	word characters (== [0-9A-Za-z_])
[[:xdigit:]]	hex digit (== [0-9A-Fa-f])
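
The following Go sketch tries a few of the ASCII classes. «[[:space:]]» includes vertical tab while «\s» does not. matches, first and joinAll are hypothetical helpers.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

func matches(pattern, text string) bool {
	return regexp.MustCompile(pattern).MatchString(text)
}

func first(pattern, text string) string {
	return regexp.MustCompile(pattern).FindString(text)
}

// joinAll returns every non-overlapping match, joined by single spaces.
func joinAll(pattern, text string) string {
	return strings.Join(regexp.MustCompile(pattern).FindAllString(text, -1), " ")
}

func main() {
	fmt.Println(matches(`[[:space:]]`, "\v"), matches(`\s`, "\v")) // true false
	fmt.Println(joinAll(`[[:punct:]]`, "a,b.c!"))                   // , . !
	fmt.Println(first(`[[:xdigit:]]+`, "cafe babe"))                // cafe
	fmt.Println(first(`[[:^alpha:]]+`, "abc123def"))                // 123
	fmt.Println(matches(`^[[:word:]]+$`, "snake_case_2"))           // true
}
```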

Unicode character class names--general category:
C	other
Cc	control
Cf	format
Cn	unassigned code points NOT SUPPORTED
Co	private use
Cs	surrogate
L	letter
LC	cased letter NOT SUPPORTED
L&	cased letter NOT SUPPORTED
Ll	lowercase letter
Lm	modifier letter
Lo	other letter
Lt	titlecase letter
Lu	uppercase letter
M	mark
Mc	spacing mark
Me	enclosing mark
Mn	non-spacing mark
N	number
Nd	decimal number
Nl	letter number
No	other number
P	punctuation
Pc	connector punctuation
Pd	dash punctuation
Pe	close punctuation
Pf	final punctuation
Pi	initial punctuation
Po	other punctuation
Ps	open punctuation
S	symbol
Sc	currency symbol
Sk	modifier symbol
Sm	math symbol
So	other symbol
Z	separator
Zl	line separator
Zp	paragraph separator
Zs	space separator
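
Two-letter names select a single category, and one-letter names select the whole group. The following Go sketch uses joinAll, a hypothetical helper. Go's Unicode tables may follow a different Unicode version than RE2's; that matters only for recently assigned characters.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// joinAll returns every non-overlapping match, joined by single spaces.
func joinAll(pattern, text string) string {
	return strings.Join(regexp.MustCompile(pattern).FindAllString(text, -1), " ")
}

func main() {
	fmt.Println(joinAll(`\p{Lu}+`, "McDonald \u00c9COLE")) // M D ÉCOLE
	fmt.Println(joinAll(`\p{Sc}`, "\u20ac5 or $6"))        // € $
	fmt.Println(joinAll(`\pL+`, "x1 y2"))                  // x y
}
```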

Unicode character class names--scripts:
Adlam
Ahom
Anatolian_Hieroglyphs
Arabic
Armenian
Avestan
Balinese
Bamum
Bassa_Vah
Batak
Bengali
Bhaiksuki
Bopomofo
Brahmi
Braille
Buginese
Buhid
Canadian_Aboriginal
Carian
Caucasian_Albanian
Chakma
Cham
Cherokee
Chorasmian
Common
Coptic
Cuneiform
Cypriot
Cyrillic
Deseret
Devanagari
Dives_Akuru
Dogra
Duployan
Egyptian_Hieroglyphs
Elbasan
Elymaic
Ethiopic
Georgian
Glagolitic
Gothic
Grantha
Greek
Gujarati
Gunjala_Gondi
Gurmukhi
Han
Hangul
Hanifi_Rohingya
Hanunoo
Hatran
Hebrew
Hiragana
Imperial_Aramaic
Inherited
Inscriptional_Pahlavi
Inscriptional_Parthian
Javanese
Kaithi
Kannada
Katakana
Kayah_Li
Kharoshthi
Khitan_Small_Script
Khmer
Khojki
Khudawadi
Lao
Latin
Lepcha
Limbu
Linear_A
Linear_B
Lisu
Lycian
Lydian
Mahajani
Makasar
Malayalam
Mandaic
Manichaean
Marchen
Masaram_Gondi
Medefaidrin
Meetei_Mayek
Mende_Kikakui
Meroitic_Cursive
Meroitic_Hieroglyphs
Miao
Modi
Mongolian
Mro
Multani
Myanmar
Nabataean
Nandinagari
New_Tai_Lue
Newa
Nko
Nushu
Nyiakeng_Puachue_Hmong
Ogham
Ol_Chiki
Old_Hungarian
Old_Italic
Old_North_Arabian
Old_Permic
Old_Persian
Old_Sogdian
Old_South_Arabian
Old_Turkic
Oriya
Osage
Osmanya
Pahawh_Hmong
Palmyrene
Pau_Cin_Hau
Phags_Pa
Phoenician
Psalter_Pahlavi
Rejang
Runic
Samaritan
Saurashtra
Sharada
Shavian
Siddham
SignWriting
Sinhala
Sogdian
Sora_Sompeng
Soyombo
Sundanese
Syloti_Nagri
Syriac
Tagalog
Tagbanwa
Tai_Le
Tai_Tham
Tai_Viet
Takri
Tamil
Tangut
Telugu
Thaana
Thai
Tibetan
Tifinagh
Tirhuta
Ugaritic
Vai
Wancho
Warang_Citi
Yezidi
Yi
Zanabazar_Square
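
Script names are used with «\p{...}» and «\P{...}». The following Go sketch writes non-ASCII input as \u escapes: the Greek word λογος and the Han characters 支持. first and matches are hypothetical helpers. Script availability depends on the Unicode version an engine was built with, so the newest names in the list may be missing from older Go or RE2 builds.

```go
package main

import (
	"fmt"
	"regexp"
)

func matches(pattern, text string) bool {
	return regexp.MustCompile(pattern).MatchString(text)
}

func first(pattern, text string) string {
	return regexp.MustCompile(pattern).FindString(text)
}

func main() {
	fmt.Println(first(`\p{Greek}+`, "logos = \u03bb\u03bf\u03b3\u03bf\u03c2"))   // λογος
	fmt.Println(first(`\p{Han}+`, "RE2 \u652f\u6301 Unicode"))                  // 支持
	fmt.Println(matches(`^\p{Latin}+$`, "abc"), matches(`\p{Latin}`, "\u03bb")) // true false
	fmt.Println(matches(`^\p{Common}+$`, "123 !?")) // true: digits, space, punctuation are Common
}
```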

Vim character classes:
\i	identifier character NOT SUPPORTED vim
\I	«\i» except digits NOT SUPPORTED vim
\k	keyword character NOT SUPPORTED vim
\K	«\k» except digits NOT SUPPORTED vim
\f	file name character NOT SUPPORTED vim
\F	«\f» except digits NOT SUPPORTED vim
\p	printable character NOT SUPPORTED vim
\P	«\p» except digits NOT SUPPORTED vim
\s	whitespace character (== [ \t]) NOT SUPPORTED vim
\S	non-whitespace character (== [^ \t]) NOT SUPPORTED vim
\d	digits (== [0-9]) vim
\D	not «\d» vim
\x	hex digits (== [0-9A-Fa-f]) NOT SUPPORTED vim
\X	not «\x» NOT SUPPORTED vim
\o	octal digits (== [0-7]) NOT SUPPORTED vim
\O	not «\o» NOT SUPPORTED vim
\w	word character vim
\W	not «\w» vim
\h	head of word character NOT SUPPORTED vim
\H	not «\h» NOT SUPPORTED vim
\a	alphabetic NOT SUPPORTED vim
\A	not «\a» NOT SUPPORTED vim
\l	lowercase NOT SUPPORTED vim
\L	not lowercase NOT SUPPORTED vim
\u	uppercase NOT SUPPORTED vim
\U	not uppercase NOT SUPPORTED vim
\_x	«\x» plus newline, for any «x» NOT SUPPORTED vim

Vim flags:
\c	ignore case NOT SUPPORTED vim
\C	match case NOT SUPPORTED vim
\m	magic NOT SUPPORTED vim
\M	nomagic NOT SUPPORTED vim
\v	verymagic NOT SUPPORTED vim
\V	verynomagic NOT SUPPORTED vim
\Z	ignore differences in Unicode combining characters NOT SUPPORTED vim

Magic:
(?{code})	arbitrary Perl code NOT SUPPORTED perl
(??{code})	postponed arbitrary Perl code NOT SUPPORTED perl
(?n)	recursive call to regexp capturing group «n» NOT SUPPORTED
(?+n)	recursive call to relative group «+n» NOT SUPPORTED
(?-n)	recursive call to relative group «-n» NOT SUPPORTED
(?C)	PCRE callout NOT SUPPORTED pcre
(?R)	recursive call to entire regexp (== (?0)) NOT SUPPORTED
(?&name)	recursive call to named group NOT SUPPORTED
(?P=name)	named backreference NOT SUPPORTED
(?P>name)	recursive call to named group NOT SUPPORTED
(?(cond)true|false)	conditional branch NOT SUPPORTED
(?(cond)true)	conditional branch NOT SUPPORTED
(*ACCEPT)	make regexps more like Prolog NOT SUPPORTED
(*COMMIT)	NOT SUPPORTED
(*F)	NOT SUPPORTED
(*FAIL)	NOT SUPPORTED
(*MARK)	NOT SUPPORTED
(*PRUNE)	NOT SUPPORTED
(*SKIP)	NOT SUPPORTED
(*THEN)	NOT SUPPORTED
(*ANY)	set newline convention NOT SUPPORTED
(*ANYCRLF)	NOT SUPPORTED
(*CR)	NOT SUPPORTED
(*CRLF)	NOT SUPPORTED
(*LF)	NOT SUPPORTED
(*BSR_ANYCRLF)	set \R convention NOT SUPPORTED pcre
(*BSR_UNICODE)	NOT SUPPORTED pcre