For very simple cases, the easiest approach may be to replace the named capture groups with format fields.
Here is a basic validator/formatter:
```python
import re
from functools import partial

# Drop the backslash from every escaped character: r'\(' -> '('
unescape = partial(re.compile(r'\\(.)').sub, r'\1')
# Replace each named group (?P<name>...) with a format field {name}
# (the non-greedy .*? stops at the first ')', so nested groups break it)
namedgroup = partial(re.compile(r'\(\?P<(\w+)>.*?\)').sub, r'{\1}')


class Mould:
    def __init__(self, pattern):
        self.pattern = re.compile(pattern)
        self.template = unescape(namedgroup(pattern))

    def format(self, **values):
        try:
            return self.template.format(**values)
        except KeyError as e:
            raise TypeError(f'Missing argument: {e}') from None

    def search(self, text):
        match = self.pattern.search(text)
        if match is None:
            raise ValueError(text)
        return match.groupdict()
```
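To see what the two substitutions in `__init__` actually do, here is a small sketch that applies them one at a time to the phone-number pattern from the example that follows (the two helpers are repeated so the snippet runs on its own):

```python
import re
from functools import partial

# Same helpers as in the Mould above
unescape = partial(re.compile(r'\\(.)').sub, r'\1')
namedgroup = partial(re.compile(r'\(\?P<(\w+)>.*?\)').sub, r'{\1}')

pattern = r'\((?P<area>\d{3})\)\ (?P<prefix>\d{3})\-(?P<line>\d{4})'

# Step 1: each named group collapses to a format field
step1 = namedgroup(pattern)
print(step1)     # \({area}\)\ {prefix}\-{line}

# Step 2: remove the backslash from each escaped character
template = unescape(step1)
print(template)  # ({area}) {prefix}-{line}
```

The result is an ordinary `str.format` template, which is all `.format()` needs.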
So, for example, to instantiate a validator/formatter for phone numbers of the form (XXX) YYY-ZZZZ:
```python
template = r'\((?P<area>\d{3})\)\ (?P<prefix>\d{3})\-(?P<line>\d{4})'
phonenum = Mould(template)
```
Then:
```python
>>> phonenum.search('(333) 444-5678')
{'area': '333', 'prefix': '444', 'line': '5678'}
>>> phonenum.format(area=111, prefix=555, line=444)
'(111) 555-444'
```
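Note that `.format()` does no validation: the `line=444` call above happily produces a three-digit line number. If that matters, one option is to run the formatted string back through the pattern with `re.fullmatch`. This is sketched below as a hypothetical helper, `checked_format`, which is not part of the original answer:

```python
import re


# Hypothetical helper: format the template, then reject the result
# unless the *whole* string matches the pattern
def checked_format(pattern, template, **values):
    result = template.format(**values)
    if re.fullmatch(pattern, result) is None:
        raise ValueError(f'{result!r} does not match the pattern')
    return result


pattern = r'\((?P<area>\d{3})\)\ (?P<prefix>\d{3})\-(?P<line>\d{4})'
template = '({area}) {prefix}-{line}'

print(checked_format(pattern, template, area=111, prefix=555, line=4444))
# (111) 555-4444
try:
    checked_format(pattern, template, area=111, prefix=555, line=444)
except ValueError as e:
    print(e)  # '(111) 555-444' does not match the pattern
```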
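But this is a very bare-bones skeleton: it ignores many regex features (lookarounds or non-capturing groups, for instance). If you need those, things get messy very quickly. In that case, go the other way round: generating the pattern from the template is more verbose, but probably more flexible and less error-prone.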
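A quick illustration of how it breaks, using a hypothetical pattern that adds an optional, non-capturing "+1 " prefix. The regex itself is valid, but the regex syntax leaks straight into the derived template:

```python
import re
from functools import partial

# The same two helpers as the first Mould
unescape = partial(re.compile(r'\\(.)').sub, r'\1')
namedgroup = partial(re.compile(r'\(\?P<(\w+)>.*?\)').sub, r'{\1}')

# Hypothetical pattern: optional non-capturing "+1 " before the area code
pattern = r'(?:\+1\ )?\((?P<area>\d{3})\)'

template = unescape(namedgroup(pattern))
print(template)                   # (?:+1 )?({area})
print(template.format(area=111))  # (?:+1 )?(111)
```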
Here is the basic validator/formatter built that way (.search() and .format() are the same as before):
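```python
import re
import string

FMT = string.Formatter()


class Mould:
    def __init__(self, template, **kwargs):
        self.template = template
        self.pattern = self.make_pattern(template, **kwargs)

    @staticmethod
    def make_pattern(template, **kwargs):
        pattern = ''
        # for each field in the template, add to the pattern
        for text, field, *_ in FMT.parse(template):
            # the escaped preceding text
            pattern += re.escape(text)
            if field:
                # a named regex capture group
                pattern += f'(?P<{field}>{kwargs[field]})'
            # XXX: if there's text after the last field,
            # the parser will iterate one more time,
            # hence the 'if field'
        return re.compile(pattern)

    # .format() and .search() behave exactly as in the first version
    def format(self, **values):
        try:
            return self.template.format(**values)
        except KeyError as e:
            raise TypeError(f'Missing argument: {e}') from None

    def search(self, text):
        match = self.pattern.search(text)
        if match is None:
            raise ValueError(text)
        return match.groupdict()
```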
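To see what `make_pattern` works with, here is a sketch that prints the tuples `string.Formatter.parse` yields for the phone template and then runs the same loop, keeping the pattern as a plain string for inspection. On current Python versions the result is exactly the hand-written pattern from the first approach:

```python
import re
import string

template = '({area}) {prefix}-{line}'
content = dict(area=r'\d{3}', prefix=r'\d{3}', line=r'\d{4}')

# Formatter.parse yields (literal_text, field_name, format_spec, conversion)
parsed = list(string.Formatter().parse(template))
print(parsed)
# [('(', 'area', '', None), (') ', 'prefix', '', None), ('-', 'line', '', None)]

# Same loop as make_pattern, without the final re.compile
pattern = ''
for text, field, *_ in parsed:
    pattern += re.escape(text)
    if field:
        pattern += f'(?P<{field}>{content[field]})'
print(pattern)
# \((?P<area>\d{3})\)\ (?P<prefix>\d{3})\-(?P<line>\d{4})
```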
Instantiating it:
```python
template = '({area}) {prefix}-{line}'
content = dict(area=r'\d{3}', prefix=r'\d{3}', line=r'\d{4}')
phonenum = Mould(template, **content)
```
Running it:
```python
>>> phonenum.search('(333) 444-5678')
{'area': '333', 'prefix': '444', 'line': '5678'}
>>> phonenum.format(area=111, prefix=555, line=444)
'(111) 555-444'
```
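Because the regex snippets are now passed in separately, they can use any regex syntax without touching the template. As an example, here is a sketch with a negative lookahead so that a fifth digit after the line number is rejected instead of being silently ignored. The standalone `make_pattern` function simply mirrors the static method above:

```python
import re
import string


def make_pattern(template, **kwargs):
    # Same logic as Mould.make_pattern above
    pattern = ''
    for text, field, *_ in string.Formatter().parse(template):
        pattern += re.escape(text)
        if field:
            pattern += f'(?P<{field}>{kwargs[field]})'
    return re.compile(pattern)


template = '({area}) {prefix}-{line}'
# (?!\d): the four line digits must not be followed by another digit
pattern = make_pattern(template, area=r'\d{3}', prefix=r'\d{3}',
                       line=r'\d{4}(?!\d)')

print(pattern.search('(333) 444-5678').groupdict())
# {'area': '333', 'prefix': '444', 'line': '5678'}
print(pattern.search('(333) 444-56789'))
# None
print(template.format(area=111, prefix=555, line=4444))
# (111) 555-4444
```

The template stays a plain format string, so `.format()` is unaffected by whatever goes into the regex pieces.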