How do I match XML, HTML, or other nasty, ugly things with a regex?
(contributed by brian d foy)

If you just want to get work done, use a module and forget about the regular expressions. The XML::Parser and HTML::Parser modules are good starts, although each namespace has other parsing modules specialized for certain tasks and different ways of doing it. Start at CPAN Search ( http://search.cpan.org ) and wonder at all the work people have done for you already! 🙂

The problem with things such as XML is that they have balanced text containing multiple levels of balanced text, but sometimes it isn’t balanced text, as in an empty tag (<br/>, for instance). Even then, things can occur out-of-order. Just when you think you’ve got a pattern that matches your input, someone throws you a curveball.

If you’d like to do it the hard way, scratching and clawing your way toward a right answer but constantly being disappointed, besieged by bug reports, and weary from the inordinate amount of time you have to spend reinventing a triangular wheel, then there are sever