【Posted】:2021-04-10 06:10:31
【Question】:
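My program is close to doing what I want, but I have one problem: many of the keywords I am trying to find may have symbols in the middle or may be misspelled. I therefore want misspelled words to count as keyword matches, as if they were spelled correctly. For example, suppose my text is: "settlement settl#7*nt se##tl#ment ann&&ity annuity."
I want to count how many times the keywords "settlement" and "annuity" appear in a .txt file, counting any word that begins with "sett" and ends with "nt" as "settlement", and any word that begins with "ann" and ends with "y" as "annuity".
I can already count exact words, which is very close to what I want. Now I want to do approximate matching, and I am not even sure that is possible. Thanks.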
import glob
import os
import sys

out1 = open("seen.txt", "w")     # results for keywords found at least once
out2 = open("missing.txt", "w")  # results for keywords not found

def count_words_in_dir(dirpath, words, action=None):
    # Count exact occurrences of each keyword in every .txt file in dirpath.
    # (The original hard-coded "/Settlement" here and ignored dirpath.)
    for filepath in glob.iglob(os.path.join(dirpath, '*.txt')):
        with open(filepath) as f:
            data = f.read()
        for key in words:
            # str.count only finds exact substring matches
            words[key] = data.count(key)
        if action:
            action(filepath, words)

def print_summary(filepath, words):
    # Write each keyword's count for this file to seen.txt or missing.txt.
    for key, val in sorted(words.items()):
        whichout = out1 if val > 0 else out2
        print(filepath, file=whichout)
        print('{0}: {1}'.format(key, val), file=whichout)

filepath = sys.argv[1]  # directory to scan, given on the command line
keys = ["annuity", "settlement"]
words = dict.fromkeys(keys, 0)
count_words_in_dir(filepath, words, action=print_summary)
out1.close()
out2.close()
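The prefix/suffix rule described in the question can be sketched with regular expressions. This is only a minimal illustration, not part of the original program: `PATTERNS` and `count_approx` are hypothetical names, and the sketch assumes a "word" is any run of non-whitespace characters.

```python
import re

# Hypothetical patterns for the rule in the question: a word counts as a
# keyword if it starts with the given prefix and ends with the given suffix.
# \S* allows any non-whitespace characters (letters, digits, symbols) between.
PATTERNS = {
    "settlement": re.compile(r"\bsett\S*nt\b", re.IGNORECASE),
    "annuity": re.compile(r"\bann\S*y\b", re.IGNORECASE),
}

def count_approx(text, patterns=PATTERNS):
    """Return {keyword: number of words in text matching its pattern}."""
    return {key: len(rx.findall(text)) for key, rx in patterns.items()}

sample = "settlement settl#7*nt se##tl#ment ann&&ity annuity."
print(count_approx(sample))  # {'settlement': 2, 'annuity': 2}
```

Note that "se##tl#ment" is not counted, because it does not begin with "sett"; catching it would need a looser rule than the one stated in the question. To use this in the program above, `data.count(key)` could be replaced with `len(PATTERNS[key].findall(data))`.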
【Comments】: