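[Posted]: 2020-10-30 04:17:30
[Question]:
I have a field that stores an array of strings. Different documents contain different sets of strings.
e.g. "ftypes": ["PDF", "TXT", "XML"]
I currently use this aggregation query to analyze how often each file type is used.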
{
  "aggs": {
    "list": {
      "terms": {
        "field": "ftypes",
        "min_doc_count": 0,
        "size": 100000
      }
    }
  }
}
result ==>
{
  "took": 11,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 137265,
    "max_score": 0.0,
    "hits": []
  },
  "aggregations": {
    "list": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "PDF",
          "doc_count": 134475
        },
        {
          "key": "TXT",
          "doc_count": 21312
        },
        {
          "key": "XML",
          "doc_count": 6597
        },
        {
          "key": "JPG",
          "doc_count": 1233
        }
      ]
    }
  }
}
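The result was correct, as expected. Recently, however, I dropped support for XML files and updated this field accordingly, so no document has XML as a file type anymore. I can confirm this with the following query.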
{
  "query": {
    "terms": {
      "ftypes": ["XML"]
    }
  }
}
result ===>
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
The total hit count is zero. Strangely, when I run the aggregation query above again, XML still shows up as a term, with a document count of zero.
{
  "took": 11,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 137265,
    "max_score": 0.0,
    "hits": []
  },
  "aggregations": {
    "list": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "PDF",
          "doc_count": 134475
        },
        {
          "key": "TXT",
          "doc_count": 21312
        },
        {
          "key": "JPG",
          "doc_count": 1233
        },
        {
          "key": "XML",
          "doc_count": 0
        }
      ]
    }
  }
}
If XML no longer exists in any document, where is this term coming from now? Is there a cache that needs to be cleared?
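One possible explanation, offered as a sketch rather than a confirmed diagnosis: the Elasticsearch documentation for the terms aggregation notes that with `min_doc_count` set to 0, some returned terms with a zero document count may belong only to deleted documents. If that is what is happening here, either raising `min_doc_count` to 1 or dropping empty buckets on the client side should hide the stale XML term. The helper names `build_terms_agg` and `non_empty_buckets` below are hypothetical, and the sample response is trimmed from the output above.

```python
from typing import Any, Dict, List


def build_terms_agg(field: str, size: int = 100000, min_doc_count: int = 1) -> Dict[str, Any]:
    """Build the terms aggregation from the question with a configurable min_doc_count.

    Assumption: min_doc_count=1 (instead of 0) keeps terms that match no
    live document out of the response.
    """
    return {
        "aggs": {
            "list": {
                "terms": {
                    "field": field,
                    "min_doc_count": min_doc_count,
                    "size": size,
                }
            }
        }
    }


def non_empty_buckets(response: Dict[str, Any], agg_name: str = "list") -> List[Dict[str, Any]]:
    """Client-side alternative: keep only buckets that matched at least one document."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [bucket for bucket in buckets if bucket["doc_count"] > 0]


# Aggregation part of the second response shown in the question.
sample_response = {
    "aggregations": {
        "list": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "PDF", "doc_count": 134475},
                {"key": "TXT", "doc_count": 21312},
                {"key": "JPG", "doc_count": 1233},
                {"key": "XML", "doc_count": 0},
            ],
        }
    }
}

query_body = build_terms_agg("ftypes")
live_keys = [bucket["key"] for bucket in non_empty_buckets(sample_response)]
print(live_keys)  # ['PDF', 'TXT', 'JPG']
```

The request body from `build_terms_agg("ftypes")` is the same query as above with only `min_doc_count` changed. The filter is useful when the query itself has to keep `min_doc_count` at 0 for other reasons.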
[Comments]:
-
Can you add the output of each query?
-
@Gibbs The results are now included.
-
Please take a look at this link. Maybe it will help.
Tags: elasticsearch