How can I strip all the HTML tags from a document with a Perl substitution?
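Here is a simple regular expression that will strip HTML tags:

    $line =~ s/<[^>]*>//g;

Or you can “escape” the angle brackets of each HTML tag so that the tag itself is displayed instead of being interpreted:

    $line =~ s/<([^>]*)>/&lt;$1&gt;/g;

A negated character class such as [^>] also matches newlines, so these patterns handle tags that span several lines, but only if the whole document is in $line at once (for example, read with $/ set to undef) rather than one line at a time. Neither pattern is a real HTML parser: a ">" inside a quoted attribute value or a comment ends the match early. For more information, see Tom’s striphtml program, which is also included in his tour of perl5 regexps.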
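A minimal sketch of both substitutions wrapped as subroutines, assuming Perl 5 with strict and warnings. The names strip_tags and escape_tags are hypothetical helpers for illustration, not taken from Tom’s striphtml. The script slurps the whole document before substituting, so tags that span lines are matched. The last check in the test shows the naive pattern's known weakness with ">" inside an attribute value.

```perl
use strict;
use warnings;

# Hypothetical helper: delete everything that looks like a tag.
# Naive: a '>' inside an attribute value or comment ends the match early.
sub strip_tags {
    my ($html) = @_;
    (my $text = $html) =~ s/<[^>]*>//g;
    return $text;
}

# Hypothetical helper: replace each tag's angle brackets with entities
# so a browser displays the tag text instead of interpreting it.
sub escape_tags {
    my ($html) = @_;
    (my $text = $html) =~ s/<([^>]*)>/&lt;$1&gt;/g;
    return $text;
}

# Only when file names are given: read the whole input at once
# (undefined $/ means "slurp"), then strip the tags.
if (@ARGV) {
    my $doc = do { local $/; <> };
    print strip_tags($doc);
}
```

Run it as, for example, `perl strip.pl page.html` (the script name is hypothetical). Reading line by line instead would leave a tag like `<a` on one line and `href="x">` on the next untouched.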