【Posted】: 2015-11-10 15:54:56
【Question】:
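I have been working on a project that handles a large number of words and validates every word in the list against a large number of tests. Oddly, every time I use a supposedly "faster" tool such as the itertools module, things seem to get slower.
I finally decided to ask, because I may be doing something wrong. The following code compares the performance of the any() function with that of a plain loop.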
#!/usr/bin/python3
#
import time
from unicodedata import normalize
file_path='./tests'
start=time.time()
with open(file_path, encoding='utf-8', mode='rt') as f:
    tests_list=f.read()
print('File reading done in {} seconds'.format(time.time() - start))
start=time.time()
tests_list=[line.strip() for line in normalize('NFC',tests_list).splitlines()]
print('String formalization, and list strip done in {} seconds'.format(time.time()-start))
print('{} strings'.format(len(tests_list)))
unallowed_combinations=['ab','ac','ad','ae','af','ag','ah','ai','af','ax',
                        'ae','rt','rz','bt','du','iz','ip','uy','io','ik',
                        'il','iw','ww','wp']
def combination_is_valid(string):
    if any(combination in string for combination in unallowed_combinations):
        return False
    return True

def combination_is_valid2(string):
    for combination in unallowed_combinations:
        if combination in string:
            return False
    return True
print('Testing the performance of any()')
start=time.time()
for string in tests_list:
    combination_is_valid(string)
print('combination_is_valid ended in {} seconds'.format(time.time()-start))
start=time.time()
for string in tests_list:
    combination_is_valid2(string)
print('combination_is_valid2 ended in {} seconds'.format(time.time()-start))
The code above is representative of the tests I ran. Here are the results of two runs:
File reading done in 0.22988605499267578 seconds
String formalization, and list strip done in 6.803032875061035 seconds
38709922 strings
Testing the performance of any()
combination_is_valid ended in 80.74802565574646 seconds
combination_is_valid2 ended in 41.69514226913452 seconds

File reading done in 0.24268722534179688 seconds
String formalization, and list strip done in 6.720442771911621 seconds
38709922 strings
Testing the performance of any()
combination_is_valid ended in 79.05265760421753 seconds
combination_is_valid2 ended in 42.24800777435303 seconds
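I was a bit surprised to find that the loop takes only about half as long as any() (roughly 42 s versus 80 s). Is there an explanation for this? Am I doing something wrong?
(I'm using Python 3.4 on GNU/Linux.)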
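A sketch for reproducing the comparison on a smaller scale with the standard-library `timeit` module, instead of one `time.time()` measurement over 38 million strings. Besides the two functions from the question (the first rewritten as the equivalent `return not any(...)`), it times two variants that are not in the original code: `any(map(string.__contains__, ...))`, which keeps `any()` but avoids resuming a generator for every combination, and a single precompiled regular expression. The word list `sample` is made-up data, and absolute timings depend on the Python version and machine, so treat the output as indicative only.

```python
import re
import timeit

# Blacklist copied from the question (including the duplicated 'af' and 'ae').
unallowed_combinations = ['ab', 'ac', 'ad', 'ae', 'af', 'ag', 'ah', 'ai', 'af', 'ax',
                          'ae', 'rt', 'rz', 'bt', 'du', 'iz', 'ip', 'uy', 'io', 'ik',
                          'il', 'iw', 'ww', 'wp']


def combination_is_valid(string):
    # any() pulls each item from a generator expression, i.e. the
    # generator's frame is resumed once per combination.
    return not any(combination in string for combination in unallowed_combinations)


def combination_is_valid2(string):
    # Plain loop: the `in` test runs directly in this function's frame.
    for combination in unallowed_combinations:
        if combination in string:
            return False
    return True


def combination_is_valid_map(string):
    # Variant not in the question: map() calls the bound C method
    # str.__contains__, so no Python bytecode runs per combination.
    return not any(map(string.__contains__, unallowed_combinations))


# Variant not in the question: one precompiled alternation of all pairs.
# re.escape() is a no-op for plain letters but keeps the pattern safe.
_unallowed_re = re.compile('|'.join(map(re.escape, unallowed_combinations)))


def combination_is_valid_re(string):
    return _unallowed_re.search(string) is None


# Hypothetical sample words (not the author's ./tests file): a mix of
# strings that pass and strings that hit an early or a late combination.
sample = ['hello', 'world', 'python', 'zzzz', 'cab', 'wwpp'] * 1000

VARIANTS = (combination_is_valid, combination_is_valid2,
            combination_is_valid_map, combination_is_valid_re)


def bench(func, repeat=5, number=10):
    # Best of several runs, to reduce noise from other processes.
    return min(timeit.repeat(lambda: [func(s) for s in sample],
                             repeat=repeat, number=number))


if __name__ == '__main__':
    for func in VARIANTS:
        print('{:<26} {:.4f} s'.format(func.__name__, bench(func)))
```

If `combination_is_valid_map` lands close to the plain loop on your machine, that would support the idea that the slowdown comes from the per-item cost of the generator expression rather than from `any()` itself.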
【Comments】:
-
Does your test vector contain any strings that would return True?
-
This may be because the generator expression adds a layer of indirection compared with the loop, which slows things down.
-
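Regarding your point about the loop exiting early: any also exits early (it only iterates until it finds a truthy value), so that makes no difference.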
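The point in the last comment can be checked directly: `any()` stops pulling items as soon as one is truthy, so with the question's list both versions perform the same number of `in` tests for every word. The helpers `checks_with_any` and `checks_with_loop` below are hypothetical instrumentation written only for this check; they count how many combinations each approach examines.

```python
# Blacklist copied from the question.
unallowed_combinations = ['ab', 'ac', 'ad', 'ae', 'af', 'ag', 'ah', 'ai', 'af', 'ax',
                          'ae', 'rt', 'rz', 'bt', 'du', 'iz', 'ip', 'uy', 'io', 'ik',
                          'il', 'iw', 'ww', 'wp']


def checks_with_any(string, combos):
    # Record every combination that any() actually pulls from the generator.
    examined = []

    def test(combination):
        examined.append(combination)
        return combination in string

    any(test(c) for c in combos)
    return len(examined)


def checks_with_loop(string, combos):
    # Same count for the explicit loop that stops at the first match.
    examined = 0
    for combination in combos:
        examined += 1
        if combination in string:
            break
    return examined


for word in ('cab', 'wwpp', 'hello'):
    print(word, checks_with_any(word, unallowed_combinations),
          checks_with_loop(word, unallowed_combinations))
# cab 1 1
# wwpp 23 23
# hello 24 24
```

Because the number of `in` tests is identical, the gap measured in the question points to the cost of each step (resuming the generator and handing each value to `any()`), not to `any()` doing extra work.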
Tags: python performance python-3.x