A quantifier can be appended to a character, a character class, or a [...] set. It specifies how many times the preceding item must occur.
quantifier | meaning |
---|---|
`*` | 0 or more; same as `{0,}` |
`+` | 1 or more; same as `{1,}` |
`?` | 0 or 1; same as `{0,1}` |
`{2}` | exactly 2 times |
`{2,8}` | between 2 and 8 times, inclusive |
`{2,}` | 2 or more times |
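One way to check the table is to run each form through Python's `re` module. Python is only our choice for illustration; the quantifier syntax itself is shared by most regex engines. This sketch uses the letter `a` as a stand-in for "the preceding item" and `re.fullmatch`, which succeeds only when the whole string matches the pattern.

```python
import re

# Each entry: (pattern, strings it must fully match, strings it must reject).
# The letter "a" stands in for any single item a quantifier can follow.
cases = [
    (r"a*",     ["", "a", "aaa"],   ["b"]),
    (r"a+",     ["a", "aaa"],       [""]),
    (r"a?",     ["", "a"],          ["aa"]),
    (r"a{2}",   ["aa"],             ["a", "aaa"]),
    (r"a{2,8}", ["aa", "a" * 8],    ["a", "a" * 9]),
    (r"a{2,}",  ["aa", "a" * 20],   ["a"]),
]

for pattern, accepted, rejected in cases:
    for s in accepted:
        assert re.fullmatch(pattern, s), (pattern, s)
    for s in rejected:
        assert re.fullmatch(pattern, s) is None, (pattern, s)

# The short forms behave exactly like their brace equivalents.
for short, braces in [(r"a*", r"a{0,}"), (r"a+", r"a{1,}"), (r"a?", r"a{0,1}")]:
    for s in ["", "a", "aa", "baaab"]:
        assert re.findall(short, s) == re.findall(braces, s)

print("all quantifier checks passed")
```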
`o*d` matches zero or more `o` characters followed by a `d`. Notice that the lone `d` in words such as would, could and and is also matched, because it is preceded by zero `o`s. The pattern `o{0,}d` is equivalent.
How much wood would a woodchuck chuck if a woodchuck could chuck wood? He would chuck, he would, as much as he could, and chuck as much wood As a woodchuck would if a woodchuck could chuck wood
match | position |
---|---|
ood | 10 |
d | 18 |
ood | 23 |
ood | 44 |
d | 57 |
ood | 66 |
d | 78 |
d | 94 |
d | 115 |
d | 120 |
ood | 137 |
ood | 147 |
d | 160 |
ood | 168 |
d | 181 |
ood | 190 |
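The table above can be reproduced with a minimal Python sketch built on `re.finditer`; `m.start()` returns the 0-based index shown in the position column. The helper name `match_table` is hypothetical and exists only for this illustration.

```python
import re

TEXT = ("How much wood would a woodchuck chuck if a woodchuck could chuck wood? "
        "He would chuck, he would, as much as he could, and chuck as much wood "
        "As a woodchuck would if a woodchuck could chuck wood")

def match_table(pattern, text):
    """Hypothetical helper: (matched text, 0-based start) for every match."""
    return [(m.group(), m.start()) for m in re.finditer(pattern, text)]

for match, pos in match_table(r"o*d", TEXT):
    print(f"{match} | {pos} |")

# o{0,}d is the brace form of o*d and yields identical matches.
assert match_table(r"o{0,}d", TEXT) == match_table(r"o*d", TEXT)
```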
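`o+d` matches one or more `o` characters followed by a `d`, so a `d` with no `o` directly before it (as in would and could) is no longer matched. The pattern `o{1,}d` is equivalent.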
How much wood would a woodchuck chuck if a woodchuck could chuck wood? He would chuck, he would, as much as he could, and chuck as much wood As a woodchuck would if a woodchuck could chuck wood
match | position |
---|---|
ood | 10 |
ood | 23 |
ood | 44 |
ood | 66 |
ood | 137 |
ood | 147 |
ood | 168 |
ood | 190 |
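As a quick Python check (a sketch, with the sample text copied from above), every `o+d` match must begin with at least one `o`, and the brace form `o{1,}d` must produce the same list.

```python
import re

TEXT = ("How much wood would a woodchuck chuck if a woodchuck could chuck wood? "
        "He would chuck, he would, as much as he could, and chuck as much wood "
        "As a woodchuck would if a woodchuck could chuck wood")

plus = [(m.group(), m.start()) for m in re.finditer(r"o+d", TEXT)]
braces = [(m.group(), m.start()) for m in re.finditer(r"o{1,}d", TEXT)]

# + requires at least one "o", so no match can be a bare "d".
assert all(match.startswith("o") for match, _ in plus)
assert plus == braces
print(plus)
```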
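`o?d` matches zero or one `o` followed by a `d`. Notice that the first `o` of the word wood is never matched: a match attempt starting at that `o` fails because it is followed by another `o` rather than a `d`, so the match begins one character later. The pattern `o{0,1}d` is equivalent.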
How much wood would a woodchuck chuck if a woodchuck could chuck wood? He would chuck, he would, as much as he could, and chuck as much wood As a woodchuck would if a woodchuck could chuck wood
match | position |
---|---|
od | 11 |
d | 18 |
od | 24 |
od | 45 |
d | 57 |
od | 67 |
d | 78 |
d | 94 |
d | 115 |
d | 120 |
od | 138 |
od | 148 |
d | 160 |
od | 169 |
d | 181 |
od | 191 |
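The "first `o` is never matched" behavior can be seen on the single word wood in this Python sketch. The attempt at index 1 fails (`o?` takes the `o`, but the next character is another `o`; giving it back does not help either), so the first successful match starts at index 2.

```python
import re

# "wood": w=0, o=1, o=2, d=3. The attempt at index 1 fails, index 2 succeeds.
m = re.search(r"o?d", "wood")
print(m.group(), m.start())  # od 2

TEXT = ("How much wood would a woodchuck chuck if a woodchuck could chuck wood? "
        "He would chuck, he would, as much as he could, and chuck as much wood "
        "As a woodchuck would if a woodchuck could chuck wood")

# Same table as above: each "ood" shows up as "od", one position later.
table = [(hit.group(), hit.start()) for hit in re.finditer(r"o?d", TEXT)]
assert table == [(hit.group(), hit.start()) for hit in re.finditer(r"o{0,1}d", TEXT)]
```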
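`\b\d{3}\b` matches integers of exactly 3 digits. The word boundaries `\b` on either side stop it from matching three digits inside a longer number such as 15334.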
2 6 12 34 15334 75 102 800 9 1200 35325 450
match | position |
---|---|
102 | 19 |
800 | 23 |
450 | 40 |
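To see what the `\b` anchors contribute, this Python sketch runs the pattern with and without them on the same numbers. Without the boundaries, `\d{3}` also picks up three-digit slices of 15334, 1200 and 35325.

```python
import re

NUMBERS = "2 6 12 34 15334 75 102 800 9 1200 35325 450"

exact = [(m.group(), m.start()) for m in re.finditer(r"\b\d{3}\b", NUMBERS)]
print(exact)  # [('102', 19), ('800', 23), ('450', 40)]

# Without \b, \d{3} also matches pieces of longer numbers.
loose = re.findall(r"\d{3}", NUMBERS)
print(loose)  # ['153', '102', '800', '120', '353', '450']
```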
`\b\d{1,3}\b` matches integers of up to 3 digits.
Note: `\b\d{1, 3}\b` will not work, because there must be no space after the comma inside the braces.
2 6 12 34 15334 75 102 800 9 1200 35325 450
match | position |
---|---|
2 | 0 |
6 | 2 |
12 | 4 |
34 | 7 |
75 | 16 |
102 | 19 |
800 | 23 |
9 | 27 |
450 | 40 |
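A simple cross-check in Python: because the sample numbers are separated by single spaces, the regex result should equal the list of space-separated tokens that are at most 3 characters long.

```python
import re

NUMBERS = "2 6 12 34 15334 75 102 800 9 1200 35325 450"

found = re.findall(r"\b\d{1,3}\b", NUMBERS)
# Non-regex cross-check: every token here is an integer, so its length is its digit count.
expected = [token for token in NUMBERS.split() if len(token) <= 3]
print(found)
assert found == expected
```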
`\b\d{3,}\b` matches integers with 3 or more digits.
2 6 12 34 15334 75 102 800 9 1200 35325 450
match | position |
---|---|
15334 | 10 |
102 | 19 |
800 | 23 |
1200 | 29 |
35325 | 34 |
450 | 40 |
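The three number patterns differ only in their brace quantifier, so they can be generated from one function. In this Python sketch the helper `whole_numbers` is hypothetical; leaving `max_digits` as `None` produces the open-ended `{n,}` form. Note the pattern is built without a space after the comma.

```python
import re

NUMBERS = "2 6 12 34 15334 75 102 800 9 1200 35325 450"

def whole_numbers(text, min_digits, max_digits=None):
    """Hypothetical helper: whole integers with min_digits..max_digits digits.

    max_digits=None gives an open-ended quantifier such as {3,}.
    """
    upper = "" if max_digits is None else str(max_digits)
    # Doubled braces in the f-string produce literal "{" and "}", e.g. \b\d{3,}\b
    pattern = rf"\b\d{{{min_digits},{upper}}}\b"
    return [(m.group(), m.start()) for m in re.finditer(pattern, text)]

print(whole_numbers(NUMBERS, 3))     # same as \b\d{3,}\b
print(whole_numbers(NUMBERS, 1, 3))  # same as \b\d{1,3}\b
```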
By default, the regexp engine repeats quantifiers such as `*` and `+` as many times as possible. This behavior is called greedy because the quantifier tries to match as much of the text as it can. The opposite behavior is called lazy mode.
Let's understand this behavior with an example. Using quantifiers, let's extract the contents inside the parentheses `()` from the given text.
With the default greedy behavior, the regexp engine matches from the first `(` all the way to the last `)` it can reach, swallowing every pair in between. This is not what we want.
Draw a line graph from (1,6) to (2,4) to (3,2) and finally end at (10,12). Note: The xy coordinates are represented as (x,y). This means (starting point, ending point).
match | position |
---|---|
(1,6) to (2,4) to (3,2) and finally end at (10,12) | 23 |
(x,y). This means (starting point, ending point) | 119 |
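The page does not show the pattern used here, so this Python sketch assumes a plausible greedy one, `\(.*\)`. It also assumes that the sample text is really two lines split after `(10,12).`: the positions in the table fit that layout, and since `.` does not match a newline by default, each greedy match then stops at the end of its own line. On a single line, the same pattern swallows everything from the first `(` to the very last `)`.

```python
import re

# Assumption: the sample is two lines, with a line break after "(10,12)."
TEXT = ("Draw a line graph from (1,6) to (2,4) to (3,2) and finally end at (10,12).\n"
        "Note: The xy coordinates are represented as (x,y). "
        "This means (starting point, ending point).")

GREEDY = r"\(.*\)"  # assumed pattern: "(", as much as possible, then ")"

two_lines = [(m.group(), m.start()) for m in re.finditer(GREEDY, TEXT)]
for match, pos in two_lines:
    print(f"{match} | {pos} |")

# Joined into one line, greedy .* runs to the last ")" in the whole text.
one_line = [(m.group(), m.start()) for m in re.finditer(GREEDY, TEXT.replace("\n", " "))]
print(one_line)
```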
Add `?` to the quantifier to make it non-greedy. This way we get the desired result.

Note: `?` is also a quantifier that matches its preceding item zero or one time. But when it is appended to another quantifier, or even to itself, it takes on a different meaning: it switches that quantifier from the default greedy mode to lazy mode. `*`, `+` and `?` operate in greedy mode; `*?`, `+?` and `??` operate in lazy (non-greedy) mode.
Draw a line graph from (1,6) to (2,4) to (3,2) and finally end at (10,12). Note: The xy coordinates are represented as (x,y). This means (starting point, ending point).
match | position |
---|---|
(1,6) | 23 |
(2,4) | 32 |
(3,2) | 41 |
(10,12) | 66 |
(x,y) | 119 |
(starting point, ending point) | 137 |
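A Python sketch of the lazy version, assuming the pattern `\(.*?\)` since the page does not show it. Because a lazy match ends at the first `)` that completes it, the line layout of the sample no longer matters. The second part compares each greedy quantifier with its lazy form at the start of the string `aaa`.

```python
import re

TEXT = ("Draw a line graph from (1,6) to (2,4) to (3,2) and finally end at (10,12). "
        "Note: The xy coordinates are represented as (x,y). "
        "This means (starting point, ending point).")

# *? is the lazy form of *: each match stops at the first ")" that completes it.
lazy_matches = [(m.group(), m.start()) for m in re.finditer(r"\(.*?\)", TEXT)]
for match, pos in lazy_matches:
    print(f"{match} | {pos} |")

# Greedy vs. lazy form of each quantifier, matched at the start of "aaa".
comparison = {q: re.match(q, "aaa").group() for q in ["a*", "a*?", "a+", "a+?", "a?", "a??"]}
print(comparison)
# {'a*': 'aaa', 'a*?': '', 'a+': 'aaa', 'a+?': 'a', 'a?': 'a', 'a??': ''}
```

Lazy quantifiers still match as much as they must: `a+?` takes one `a` because `+` requires at least one, while `a*?` and `a??` are satisfied by zero.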