注意考虑到这个问题的收视率,我针对不同类型的测试用例取消删除并重写了它
我从答案中考虑了四个相互竞争的实现
>>> def sub_noregex(hay):
"""
The Join and replace routine which outpeforms the regex implementation. This
version uses generator expression
"""
return 'steak'.join(e.replace('steak','ghost') for e in hay.split('ghost'))
>>> def sub_regex(hay):
"""
This is a straight forward regex implementation as suggested by @mgilson
Note, so that the overheads doesn't add to the cummulative sum, I have placed
the regex creation routine outside the function
"""
return re.sub(regex,lambda m:sub_dict[m.group()],hay)
>>> def sub_temp(hay, _uuid = str(uuid4())):
"""
Similar to Mark Tolonen's implementation but rather used uuid for the temporary string
value to reduce collission
"""
hay = hay.replace("steak",_uuid).replace("ghost","steak").replace(_uuid,"steak")
return hay
>>> def sub_noregex_LC(hay):
"""
The Join and replace routine which outpeforms the regex implementation. This
version uses List Comprehension
"""
return 'steak'.join([e.replace('steak','ghost') for e in hay.split('ghost')])
一个广义的timeit函数
>>> def compare(n, hay):
foo = {"sub_regex": "re",
"sub_noregex":"",
"sub_noregex_LC":"",
"sub_temp":"",
}
stmt = "{}(hay)"
setup = "from __main__ import hay,"
for k, v in foo.items():
t = Timer(stmt = stmt.format(k), setup = setup+ ','.join([k, v] if v else [k]))
yield t.timeit(n)
以及通用测试例程
>>> def test(*args, **kwargs):
n = kwargs['repeat']
print "{:50}{:^15}{:^15}{:^15}{:^15}".format("Test Case", "sub_temp",
"sub_noregex ", "sub_regex",
"sub_noregex_LC ")
for hay in args:
hay, hay_str = hay
print "{:50}{:15.10}{:15.10}{:15.10}{:15.10}".format(hay_str, *compare(n, hay))
测试结果如下
>>> test((' '.join(['steak', 'ghost']*1000), "Multiple repeatation of search key"),
('garbage '*998 + 'steak ghost', "Single repeatation of search key at the end"),
('steak ' + 'garbage '*998 + 'ghost', "Single repeatation of at either end"),
("The scary ghost ordered an expensive steak", "Single repeatation for smaller string"),
repeat = 100000)
Test Case sub_temp sub_noregex sub_regex sub_noregex_LC
Multiple repeatation of search key 0.2022748797 0.3517142003 0.4518992298 0.1812594258
Single repeatation of search key at the end 0.2026047957 0.3508259952 0.4399926194 0.1915298898
Single repeatation of at either end 0.1877455356 0.3561734007 0.4228843986 0.2164233388
Single repeatation for smaller string 0.2061019057 0.3145984487 0.4252060592 0.1989413449
>>>
根据测试结果
Non Regex LC 和 temp 变量替换具有更好的性能,但 temp 变量的使用性能并不一致
LC 版本的性能优于生成器(已确认)
正则表达式的速度要慢两倍以上(因此,如果这段代码是瓶颈,则可以重新考虑实现更改)
Regex 和非 regex 版本同样健壮并且可以扩展