
Why does using parentheses in a POSIX regular expression change the result of a match?

April 26, 2017 (tags: expression, parenthesis, POSIX, regular)


1 Answer


A. For POSIX (extended and basic) regular expressions, but not for Perl regular expressions, parentheses do more than mark sub-expressions; they also help determine which match is best. When an expression is compiled as a POSIX basic or extended regex, Boost.Regex follows the POSIX leftmost-longest rule to decide what matched. If more than one match is still possible after considering the whole expression, it looks next at the first sub-expression, then the second, and so on, preferring for each one the match that is furthest to the left and then the longest.

So “(0*)([0-9]*)” against “00123” produces $1 = “00” and $2 = “123”, whereas “0*([0-9]*)” against “00123” produces $1 = “00123”. In the second case, had $1 matched only “123”, that would be “less good” than “00123”, which is both further to the left and longer.

If you want $1 to match only the “123” part, use something like “0*([1-9][0-9]*)” as the expression.
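The tie-breaking order described in this answer can be illustrated with a small toy model. This is only a sketch under stated assumptions: it is not Boost.Regex code and says nothing about how the library is implemented. It handles only patterns that are a sequence of starred character classes matched from the start of the input, and the names `Piece`, `enumerate_splits`, `better`, and `posix_like_captures` are hypothetical, introduced purely for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One starred piece of a toy pattern, e.g. 0* or [0-9]*.
// 'captured' is true when the piece is wrapped in parentheses.
struct Piece {
    bool (*accepts)(char);
    bool captured;
};

// Number of characters consumed by each piece, in pattern order.
using Candidate = std::vector<std::size_t>;

bool is_zero(char c)  { return c == '0'; }
bool is_digit(char c) { return c >= '0' && c <= '9'; }

// Collect every way the pieces could consume a prefix of 'text'.
void enumerate_splits(const std::vector<Piece>& pieces, const std::string& text,
                      std::size_t piece, std::size_t pos,
                      Candidate& current, std::vector<Candidate>& out) {
    if (piece == pieces.size()) {
        out.push_back(current);
        return;
    }
    // A starred piece may consume 0, 1, 2, ... acceptable characters.
    for (std::size_t n = 0;; ++n) {
        current.push_back(n);
        enumerate_splits(pieces, text, piece + 1, pos + n, current, out);
        current.pop_back();
        if (pos + n >= text.size() || !pieces[piece].accepts(text[pos + n])) break;
    }
}

// True when candidate 'a' beats 'b': longest overall match first, then each
// captured piece in turn, preferring further left, then longer.
bool better(const std::vector<Piece>& pieces, const Candidate& a, const Candidate& b) {
    std::size_t total_a = 0, total_b = 0;
    for (std::size_t n : a) total_a += n;
    for (std::size_t n : b) total_b += n;
    if (total_a != total_b) return total_a > total_b;

    std::size_t start_a = 0, start_b = 0;
    for (std::size_t i = 0; i < pieces.size(); ++i) {
        if (pieces[i].captured) {
            if (start_a != start_b) return start_a < start_b;  // further left
            if (a[i] != b[i]) return a[i] > b[i];              // longer
        }
        start_a += a[i];
        start_b += b[i];
    }
    return false;  // equally good
}

// Text captured by each parenthesised piece under the toy rule above.
std::vector<std::string> posix_like_captures(const std::vector<Piece>& pieces,
                                             const std::string& text) {
    std::vector<Candidate> all;
    Candidate current;
    enumerate_splits(pieces, text, 0, 0, current, all);

    Candidate best = all.front();  // the all-empty split always exists
    for (const Candidate& c : all)
        if (better(pieces, c, best)) best = c;

    std::vector<std::string> groups;
    std::size_t pos = 0;
    for (std::size_t i = 0; i < pieces.size(); ++i) {
        if (pieces[i].captured) groups.push_back(text.substr(pos, best[i]));
        pos += best[i];
    }
    return groups;
}
```

Modelling “(0*)([0-9]*)” as `{{is_zero, true}, {is_digit, true}}` and calling `posix_like_captures` on “00123” yields “00” and “123”. Modelling “0*([0-9]*)” as `{{is_zero, false}, {is_digit, true}}` yields the single capture “00123”, because the only captured piece is compared first and its leftmost candidate starts at offset 0.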