【发布时间】:2016-01-19 01:53:05
【问题描述】:
我用highlighting functionality 配置了solr 4.10(也是5.3)。它适用于大多数单词,但是我发现一些“不允许”允许突出显示的单词,即 solr 返回所需的文档,但不突出显示其中的一些。
什么会导致这种影响?
solrconfig.xml
<requestHandler name="/select" class="solr.SearchHandler">
<lst name="defaults">
<str name="wt">json</str>
<str name="indent">true</str>
<str name="defType">edismax</str>
<str name="bf">product(concount)</str>
<str name="df">text bio text_syn text_syn_other</str>
<str name="qf">
text^25 bio^16 text_syn^8 text_syn_other^3
</str>
<str name="hl">on</str>
<str name="hl.fl">text bio text_syn text_syn_other</str>
<str name="hl.preserveMulti">true</str>
<str name="hl.encoder">html</str>
<str name="f.text.hl.fragsize">100</str>
<str name="hl.snippets">20</str>
<arr name="components">
<str>highlight</str>
</arr>
</lst>
schema.xml
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\n,/\\]" />
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishPossessiveFilterFactory"/>
<filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
<filter class="solr.PorterStemFilterFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms_abbr.txt" ignoreCase="true" expand="false"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\n,/\\]" />
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishPossessiveFilterFactory"/>
<filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
<filter class="solr.PorterStemFilterFactory"/>
</analyzer>
</fieldType>
<fieldType name="text_en_syn" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\n,/\\]" />
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishPossessiveFilterFactory"/>
<filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
<filter class="solr.PorterStemFilterFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\n,/\\]" />
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishPossessiveFilterFactory"/>
<filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
<filter class="solr.PorterStemFilterFactory"/>
</analyzer>
</fieldType>
<fieldType name="text_en_syn_other" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\n,/\\]" />
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishPossessiveFilterFactory"/>
<filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
<filter class="solr.PorterStemFilterFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms_other.txt" ignoreCase="true" expand="false"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\n,/\\]" />
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishPossessiveFilterFactory"/>
<filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
<filter class="solr.PorterStemFilterFactory"/>
</analyzer>
</fieldType>
<field name="text" type="text_en" indexed="true" stored="true" multiValued="false" />
<field name="text_syn" type="text_en_syn" indexed="true" stored="false" multiValued="true" />
<field name="text_syn_other" type="text_en_syn_other" indexed="true" stored="false" multiValued="true" />
<field name="text_exact" type="string" indexed="true" stored="false" multiValued="false" />
<field name="bio" type="text_en" indexed="true" stored="true" multiValued="false" />
<field name="bio_exact" type="string" indexed="true" stored="false" multiValued="false" />
<field name="concount" type="long" indexed="true" stored="true" multiValued="false" />
<field name="concount_exact" type="long" indexed="true" stored="false" multiValued="false" />
<copyField source="text" dest="text_syn"/>
<copyField source="bio" dest="text_syn"/>
<copyField source="text" dest="text_syn_other"/>
<copyField source="bio" dest="text_syn_other"/>
对于查询 http://localhost:8983/solr/select?q=senior,我得到了包含单词 senior 的文档,但在 solr 响应的突出显示部分中,该单词未突出显示。
更新 1:
我发现我的synonyms_abbr.txt 文件中有senior 这个词,senior,lead 行。当我评论那行或替换单词的位置时,lead,senior,令人惊讶的是,senior 开始突出显示。有什么想法吗?
更新 2:
来自synonyms.txt 和synonyms_other.txt 的单词会正常突出显示,但来自synonyms_abbr.txt 的单词会出现如下奇怪的行为。例如,我在synonyms_abbr.txt 中有行lead,head,senior 然后
- 查询
http://localhost:8983/solr/select?q=senior和http://localhost:8983/solr/select?q=head没有突出显示任何单词, - 查询
http://localhost:8983/solr/select?q=lead不仅突出显示单词lead,还突出显示head和senior。
【问题讨论】:
-
请使用 Solr 后端功能来分析单词的转换。我不确定这个词的转换方式。这可能是一个阻塞问题。否则,请使用不同的字段,关闭仅保留标记器的转换,然后尝试从该字段突出显示。
-
@Mher 是没有突出显示停用词的词吗?还是只是随机的?
-
我没有任何停用词配置。整个
stopwords.txt文件都被注释了。 -
你试过
expand=true吗?