Tuesday, September 7, 2010

Python re multiline match (inline)

Compilation Flags -- inline

Python regex flags affect matching. For example: re.MULTILINE, re.IGNORECASE, re.DOTALL. Unfortunately, passing these flags is awkward if you want to store regex patterns in a config file or database, or otherwise let the user enter regex patterns. You don't want to make the user pass in flags separately. Luckily, you can put the flags in the pattern itself. This feature is easy to overlook in the Python regular expression documentation. At the start of the pattern, add flags like this:
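(?i) for re.IGNORECASE (Makes matching case-insensitive.)
(?L) for re.LOCALE (Makes \w, \W, \b, \B, \s and \S dependent on the current locale. In Python 3 this flag is only allowed with bytes patterns.)
(?m) for re.MULTILINE (Makes ^ and $ match at the start and end of each line, not just at the start and end of the whole string.)
(?s) for re.DOTALL (Makes . match newlines.)
(?u) for re.UNICODE (Makes \w, \W, \b, \B, \d, \D, \s and \S dependent on the Unicode character properties database. In Python 3 this is already the default for str patterns.)
(?x) for re.VERBOSE (Lets you add whitespace and # comments inside the pattern for readability.)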
For example, the following pattern matches case-insensitively:
re.search("(?i)password", string)
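As a quick check, the inline flag should behave the same as passing re.IGNORECASE as a separate argument. The sample text below is made up for illustration:

```python
import re

text = "Enter your PASSWORD here"  # made-up sample input

# Inline flag: case-insensitivity is part of the pattern string itself.
inline_match = re.search("(?i)password", text)

# Equivalent call with the flag passed as a separate argument.
flag_match = re.search("password", text, re.IGNORECASE)

# Without either flag, the uppercase text does not match.
plain_match = re.search("password", text)
```

Both inline_match and flag_match find "PASSWORD", while plain_match is None.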
The flags can be combined. The following ignores case and lets . match newlines (DOTALL):
re.search("(?is)username.*?password", string)
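Here is a minimal sketch of the config-file use case that motivates all this. The PATTERNS dict, the find_all helper and the sample log text are hypothetical stand-ins, not part of any real config format. The point is only that each pattern is a plain string and no flags need to be passed alongside it:

```python
import re

# Hypothetical stand-in for patterns read from a config file or database.
# Each entry is just a string; the flags travel inside the pattern.
PATTERNS = {
    "error_lines": r"(?m)^ERROR: (.*)$",
    "login_block": r"(?is)username.*?password",
    "version": r"""(?x)
        (\d+) \.   # major
        (\d+)      # minor
    """,
}

def find_all(name, text):
    """Compile the named pattern as-is and return all matches in text."""
    return re.compile(PATTERNS[name]).findall(text)

# Made-up sample input.
log = "INFO: start\nERROR: disk full\nERROR: timeout\nversion 2.7"

errors = find_all("error_lines", log)    # (?m) lets ^ and $ work per line
versions = find_all("version", log)      # (?x) ignores the layout and comments
login = re.search(PATTERNS["login_block"], "UserName: bob\nPassWord: x")
```

Note that each pattern puts its flags at the very beginning. Recent Python versions reject global inline flags that appear anywhere else in the pattern.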