[Question title]: QueryParser.parse, restrict the distance between words
[Posted]: 2019-04-25 02:15:55
[Question description]:

I am using the whoosh package to do full-text fuzzy matching.

My current code is as follows:

import os  # needed for os.path.exists / os.mkdir below

from whoosh.index import create_in
from whoosh.fields import Schema, TEXT
from whoosh.query import FuzzyTerm


class MyFuzzyTerm(FuzzyTerm):
    def __init__(self, fieldname, text, boost=1.0, maxdist=2, prefixlength=1, constantscore=True):
        super(MyFuzzyTerm, self).__init__(fieldname, text, boost, maxdist, prefixlength, constantscore)


if not os.path.exists("indexdir"):
    os.mkdir("indexdir")

path = u"MMM2.txt"
# read the whole file and close it afterwards
with open(path, 'r') as f:
    content = f.read()

schema = Schema(name=TEXT(stored=True), content=TEXT)
ix = create_in("indexdir", schema)
writer = ix.writer()
writer.add_document(name=path, content= content)
writer.commit()

from whoosh.qparser import QueryParser, FuzzyTermPlugin, PhrasePlugin, SequencePlugin

with ix.searcher() as searcher:
    parser = QueryParser(u"content", ix.schema,termclass = MyFuzzyTerm)
    parser.add_plugin(FuzzyTermPlugin())
    parser.remove_plugin_class(PhrasePlugin)
    parser.add_plugin(SequencePlugin())
    str = u"Tennessee Riverkeeper Inc"
    query = parser.parse(str)
    # query = parser.parse(u"\"Tennessee Riverkeeper Inc\"~")
    results = searcher.search(query)
    print("nb of results =", len(results), results, type(results))
    for r in results:
        print(r)

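The document MMM2.txt contains the following text: "Tennessee aa Riverkeeper aa aa Inc". Ideally, I want the program to return 0 results, because I want to restrict the distance between the words of the search term to at most 1. However, it still returns: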

nb of results = 1 <Top 1 Results for And([MyFuzzyTerm('content', 'tennessee', boost=1.000000, maxdist=2, prefixlength=1), MyFuzzyTerm('content', 'riverkeeper', boost=1.000000, maxdist=2, prefixlength=1), MyFuzzyTerm('content', 'inc', boost=1.000000, maxdist=2, prefixlength=1)]) runtime=0.009658594451408662> <class 'whoosh.searching.Results'>
<Hit {'name': 'MMM2.txt'}>
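
The `And([...])` query in the output above only requires each term to occur somewhere in the document; word positions are not compared, which is why the hit is returned. To make the intended restriction concrete, here is a plain-Python sketch of the distance check on the sample text. It is an illustration only, not how Whoosh implements it: the function name `within_distance` and the `max_gap` parameter are hypothetical, and the sketch matches words exactly, with no fuzzy matching.

```python
def within_distance(doc_text, phrase, max_gap=1):
    """Return True if the phrase words occur in order in doc_text,
    each at most max_gap token positions after the previous one."""
    tokens = doc_text.lower().split()
    words = phrase.lower().split()
    if not words:
        return False

    def match_from(pos, idx):
        # pos: token position of the previously matched word
        # idx: index of the next phrase word to match
        if idx == len(words):
            return True
        last = min(pos + max_gap, len(tokens) - 1)
        for nxt in range(pos + 1, last + 1):
            if tokens[nxt] == words[idx] and match_from(nxt, idx + 1):
                return True
        return False

    # try every occurrence of the first word as a starting point
    return any(
        tokens[i] == words[0] and match_from(i, 1)
        for i in range(len(tokens))
    )


doc = "Tennessee aa Riverkeeper aa aa Inc"
phrase = "Tennessee Riverkeeper Inc"
adjacent_only = within_distance(doc, phrase, max_gap=1)  # words must be adjacent
looser = within_distance(doc, phrase, max_gap=3)         # up to two words in between
print(adjacent_only, looser)
```

With `max_gap=1` the sample text does not match, because "Riverkeeper" is two positions after "Tennessee" and "Inc" is three positions after "Riverkeeper".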

However, if I replace:

query = parser.parse(str)

with:

query = parser.parse(u"\"Tennessee Riverkeeper Inc\"~")

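it works the way I want and returns no match. I guess this has to do with the quotes and the "~". But I cannot add them when the string is held in a variable. I have too many strings to match to type them in one by one, so I can only store each one in a variable inside a loop. Is there a way to solve this?

Thank you very much in advance for your help!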

[Comments]:

    Tags: python python-3.x whoosh fuzzy


    [Solution 1]:

    I figured out how to do it:

    Just change the parse call to:

    query = parser.parse('"%s"~' % str)
    

    Hope it helps someone!
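
Since the question mentions many strings processed in a loop, the same formatting can be wrapped in a small helper. This is a minimal sketch: the helper name `as_phrase_query` and the second sample name are hypothetical, and replacing embedded double quotes is an added assumption (a quote inside the text would otherwise end the quoted phrase early). The parser is assumed to be configured as in the question.

```python
def as_phrase_query(text):
    """Wrap text in double quotes followed by '~', as in the answer above."""
    # Assumption: embedded double quotes are replaced with spaces so they
    # cannot terminate the quoted phrase prematurely.
    cleaned = " ".join(text.replace('"', " ").split())
    return '"%s"~' % cleaned


# Hypothetical list of names to search for; the second entry is made up.
names = ["Tennessee Riverkeeper Inc", 'Example "Quoted" Org']
query_strings = [as_phrase_query(name) for name in names]

for q in query_strings:
    print(q)
    # With the parser from the question, each string would then be parsed:
    #     query = parser.parse(q)
    #     results = searcher.search(query)
```

Naming the variable something other than `str` also avoids shadowing Python's built-in `str` type.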
