Regular expressions come up frequently in text processing: searching, parsing, validating input, or checking the integrity of an XML document. Java provides the java.util.regex package to make working with regular expressions easier. Below I have summarized the parts of Java regex that I use most often.
A typical use is case-insensitive URL matching with Java regex.
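As a minimal sketch of case-insensitive URL matching, the snippet below compiles a pattern with `Pattern.CASE_INSENSITIVE`. The class name `UrlMatchExample`, the helper `isUrl`, and the URL pattern itself are assumptions made for illustration. The pattern is deliberately simple and is not a complete URL validator.

```java
import java.util.regex.Pattern;

class UrlMatchExample {
    // Hypothetical, simplified URL pattern: scheme, host, optional port, optional path.
    // CASE_INSENSITIVE lets "HTTP" and "Example.COM" match the lowercase classes.
    private static final Pattern URL_PATTERN = Pattern.compile(
            "(https?|ftp)://[a-z0-9.-]+(:\\d+)?(/\\S*)?",
            Pattern.CASE_INSENSITIVE);

    // matches() requires the whole input to match, not just a part of it.
    static boolean isUrl(String input) {
        return URL_PATTERN.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isUrl("HTTP://Example.COM/index.html")); // true
        System.out.println(isUrl("not a url"));                      // false
    }
}
```

Compiling the pattern once and reusing it avoids recompiling the expression on every call.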
Common matching symbols:
| Regular Expression | Description |
|---|---|
| . | Matches any single character (by default, except line terminators) |
| ^regex | regex must match at the beginning of the line |
| regex$ | regex must match at the end of the line |
| [abc] | Set definition, matches the letter a or b or c |
| [abc][vz] | Set definition, matches a or b or c followed by either v or z |
| [^abc] | When "^" appears as the first character inside [], it negates the set. Matches any character except a or b or c |
| [a-d1-7] | Ranges: matches a single letter between a and d or a single digit from 1 to 7; does not match the two-character sequence d1 |
| X\|Z | Finds X or Z |
| XZ | Finds X directly followed by Z |
| $ | Checks if a line end follows |
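The symbols in the table above can be tried out with a short sketch like the following. The class `MatchingSymbolsExample` and the helper `found` are hypothetical names. The helper uses `Matcher.find()`, so it reports whether the pattern matches anywhere in the input.

```java
import java.util.regex.Pattern;

class MatchingSymbolsExample {
    // True if the regex matches somewhere in the input (find semantics).
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("^Java", "Java regex"));  // true: input starts with "Java"
        System.out.println(found("regex$", "Java regex")); // true: input ends with "regex"
        System.out.println(found("c.t", "cat"));            // true: '.' matches 'a'
        System.out.println(found("[abc]", "xyz"));          // false: no a, b or c present
        System.out.println(found("[^abc]", "abc"));         // false: every character is excluded
        System.out.println(found("[a-d1-7]", "z9d"));       // true: 'd' falls in the range a-d
        System.out.println(found("X|Z", "Z"));              // true: second alternative matches
    }
}
```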
Metacharacters:
| Regular Expression | Description |
|---|---|
| \d | Any digit, short for [0-9] |
| \D | A non-digit, short for [^0-9] |
| \s | A whitespace character, short for [ \t\n\x0b\r\f] |
| \S | A non-whitespace character, short for [^\s] |
| \w | A word character, short for [a-zA-Z_0-9] |
| \W | A non-word character, short for [^\w] |
| \S+ | One or more non-whitespace characters |
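One detail worth illustrating: inside a Java string literal the backslash must itself be escaped, so `\d` is written `"\\d"`. The following sketch shows a few metacharacters through the standard `String` methods. The class and method names are made up for this example.

```java
import java.util.Arrays;

class MetacharacterExample {
    // Removes every non-digit character (\D) from the input.
    static String digitsOnly(String input) {
        return input.replaceAll("\\D", "");
    }

    // Splits on runs of whitespace (\s+) after trimming the ends.
    static String[] words(String input) {
        return input.trim().split("\\s+");
    }

    // True if the whole input consists of word characters (\w+).
    static boolean isWordOnly(String input) {
        return input.matches("\\w+");
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("Tel: +1 (555) 010-9999"));          // 15550109999
        System.out.println(Arrays.toString(words("  one\ttwo \n three "))); // [one, two, three]
        System.out.println(isWordOnly("user_42"));                          // true
        System.out.println(isWordOnly("user-42"));                          // false: '-' is not a word character
    }
}
```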
Characters:
| Characters | Description |
|---|---|
| x | The character x |
| \\ | The backslash character |
| \0n | The character with octal value 0n (0<=n<=7) |
| \0nn | The character with octal value 0nn (0<=n<=7) |
| \0mnn | The character with octal value 0mnn (0<=m<=3, 0<=n<=7) |
| \xhh | The character with hexadecimal value 0xhh |
| \uhhhh | The character with hexadecimal value 0xhhhh |
| \t | The tab character ('\u0009') |
| \n | The newline (line feed) character ('\u000A') |
| \r | The carriage-return character ('\u000D') |
| \f | The form-feed character ('\u000C') |
| \a | The alert (bell) character ('\u0007') |
| \e | The escape character ('\u001B') |
| \cx | The control character corresponding to x |
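The character escapes above can be checked with a small sketch such as this one. The class `CharacterEscapeExample` and the helper `matchesWhole` are hypothetical. Each regex escape is written with a doubled backslash so that the regex engine, not the Java compiler, interprets it.

```java
class CharacterEscapeExample {
    // True if the regex matches the entire input.
    static boolean matchesWhole(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("\\x41", "A"));     // true: hexadecimal 0x41 is 'A'
        System.out.println(matchesWhole("\\u0041", "A"));   // true: Unicode 0x0041 is 'A'
        System.out.println(matchesWhole("\\0101", "A"));    // true: octal 0101 is 65, i.e. 'A'
        System.out.println(matchesWhole("a\\tb", "a\tb"));  // true: \t matches a tab
        System.out.println(matchesWhole("\\\\", "\\"));     // true: regex \\ matches one backslash
    }
}
```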
Quantifier:
| Regular Expression | Description | Examples |
|---|---|---|
| * | Occurs zero or more times, short for {0,} | X* - finds zero or more X, .* - any character sequence |
| + | Occurs one or more times, short for {1,} | X+ - finds one or more X |
| ? | Occurs zero or one time, short for {0,1} | X? - finds zero or exactly one X |
| {X} | Occurs exactly X times; {} specifies the number of repetitions of the preceding element | \d{3} - three digits, .{10} - any character sequence of length 10 |
| {X,Y} | Occurs between X and Y times | \d{1,4} - \d must occur at least once and at most four times |
| *? | ? after a quantifier makes it a "reluctant quantifier": it tries to find the smallest match | .*? - the shortest possible character sequence |
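The difference between greedy and reluctant quantifiers is easiest to see side by side. The sketch below is only an illustration. `QuantifierExample` and `firstMatch` are assumed names, and the helper returns the first substring the pattern finds, or null if there is none.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierExample {
    // Returns the first substring matched by the regex, or null if nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("<.+>", "<b>bold</b>"));    // <b>bold</b> (greedy: longest match)
        System.out.println(firstMatch("<.+?>", "<b>bold</b>"));   // <b> (reluctant: shortest match)
        System.out.println(firstMatch("X+", "aXXXb"));            // XXX
        System.out.println(firstMatch("\\d{3}", "call 12345"));   // 123
        System.out.println(firstMatch("\\d{1,4}", "call 12345")); // 1234
    }
}
```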