【Question title】: Elasticsearch: How to get the count of nested child objects along with parent fields?
【Posted】: 2019-01-21 09:52:16
【Question description】:

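I have a scenario where I need to retrieve millions of records from Elasticsearch.

I'm a beginner with Elasticsearch and can't yet use it very effectively.

I index an Author model in Elasticsearch as shown below, and I use the NEST client to work with Elasticsearch from a .NET application.

My model is described below.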

Author
--------------------------------
AuthorKey           string
List<Study>         Nested


Study
---------------------------------
PMID              int
PublicationDate   date
PublicationType   string
MeshTerms         string
Content           string

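We have nearly 10 million authors, and each author has completed at least 3 studies.

So there are roughly 30 million records in the Elasticsearch index.

Now I want to get the author data together with each author's total number of studies.

Here is some sample JSON data: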

{
  "Authors": [
    {
      "AuthorKey": "Author1",
      "AuthorName": "karan",
      "AuthorLastName": "shah",
      "Study": [
        {
          "PMId": 1000,
          "PublicationDate": "2019-01-17T06:35:52.178Z",
          "content": "this is dummy content.how can i solve this",
          "MeshTerms": "karan,dharan,nilesh,manan,mehul sir,manoj",
          "PublicationType": [
            "ClinicalTrial",
            "Medical"
          ]
        },
        {
          "PMId": 1001,
          "PublicationDate": "2019-01-16T05:55:14.947Z",
          "content": "this is dummy content.how can i solve this",
          "MeshTerms": "karan1,dharan1,nilesh1,manan1,mehul1 sir,manoj1",
          "PublicationType": [
            "ClinicalTrial",
            "Medical"
          ]
        },
        {
          "PMId": 1002,
          "PublicationDate": "2019-01-15T05:55:14.947Z",
          "content": "this is dummy content for record2.how can i solve this",
          "MeshTerms": "karan2,dharan2,nilesh2,manan2,mehul2 sir,manoj2",
          "PublicationType": [
            "ClinicalTrial1",
            "Medical2"
          ]
        },
        {
          "PMId": 1003,
          "PublicationDate": "2011-01-15T05:55:14.947Z",
          "content": "this is dummy content for record3.how can i solve this",
          "MeshTerms": "karan3,dharan3,nilesh3,manan3,mehul3 sir,manoj3",
          "PublicationType": [
            "ClinicalTrial1",
            "Medical3"
          ]
        }
      ]
    },
    {
      "AuthorKey": "Author2",
      "AuthorName": "dharan",
      "AuthorLastName": "shah",
      "Study": [

        {
          "PMId": 2001,
          "PublicationDate": "2011-01-16T05:55:14.947Z",
          "content": "this is dummy content for author 2.how can i solve this",
          "MeshTerms": "karan1,dharan1,nilesh1,manan1,mehul1 sir,manoj1",
          "PublicationType": [
            "ClinicalTrial",
            "Medical"
          ]
        },
        {
          "PMId": 2002,
          "PublicationDate": "2019-01-15T05:55:14.947Z",
          "content": "this is dummy content for author 2.how can i solve this",
          "MeshTerms": "karan2,dharan2,nilesh2,manan2,mehul2 sir,manoj2",
          "PublicationType": [
            "ClinicalTrial1",
            "Medical2"
          ]
        },
        {
          "PMId": 2003,
          "PublicationDate": "2015-01-15T05:55:14.947Z",
          "content": "this is dummy content for record2.how can i solve this",
          "MeshTerms": "karan3,dharan3,nilesh3,manan3,mehul3 sir,manoj3",
          "PublicationType": [
            "ClinicalTrial1",
            "Medical3"
          ]
        }
      ]
    },
    {
      "AuthorKey": "Author3",
      "AuthorName": "Nilesh",
      "AuthorLastName": "Mistrey",
      "Study": [
        {
          "PMId": 3000,
          "PublicationDate": "2012-01-16T05:55:14.947Z",
          "content": "this is dummy content for author 2 .how can i solve this",
          "MeshTerms": "karan2,dharan2,nilesh2,manan2,mehul sir2,manoj2",
          "PublicationType": [
            "ClinicalTrial",
            "Medical"
          ]
        }
      ]
    }
  ]
}

How can I retrieve all authors with their total study count, sorted in descending order of that count?

Expected output:

{
  "Authors": [
    {
      "AuthorKey": "Author1",
      "AuthorName": "karan",
      "AuthorLastName": "shah",
      "StudyCount": 4
    },
    {
      "AuthorKey": "Author2",
      "AuthorName": "dharan",
      "AuthorLastName": "shah",
      "StudyCount": 3
    },

    {
      "AuthorKey": "Author3",
      "AuthorName": "Nilesh",
      "AuthorLastName": "Mistrey",
      "StudyCount": 1
    }
  ]
}

Here is the mapping of the index:

{
  "authorindex": {
    "mappings": {
      "_doc": {
        "properties": {
          "AuthorKey": {
            "type": "keyword"
          },
          "AuthorLastName": {
            "type": "keyword"
          },
          "AuthorName": {
            "type": "keyword"
          },
          "Study": {
            "type": "nested",
            "properties": {
              "MeshTerms": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              },
              "PMId": {
                "type": "long"
              },
              "PublicationDate": {
                "type": "date"
              },
              "PublicationType": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              },
              "content": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

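【Question discussion】:

  • Could you provide the mapping you are using? Have you already tried to solve the problem? How?
  • @NikolayVasiliev I have tried, but I couldn't figure out how to write a query that meets this requirement.
  • Please don't add things like this in comments; use the edit link instead.

Tags: elasticsearch querydsl elasticsearch-query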


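【Solution 1】:

There are several options for solving this.

  1. Use a script, as suggested in this answer to a similar question;

  2. Precompute the number of studies for each author, store it in the index as a simple integer field, and sort the results on it.

Either option may work for you, depending on your situation.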

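If you need to experiment with the data and run ad-hoc queries, option 1) is fine. It is not fast, but it should work with your existing data and mapping.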
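A minimal sketch of option 1, written in Python only to build the request body (the same JSON can be sent from NEST or any other client). It assumes a 6.x-style cluster where Painless `script_fields` and `_script` sort scripts can read `params._source`. Nested `Study` objects are stored as separate hidden documents, so `doc['Study.PMId']` is not visible from the parent author, and the count has to be read from `_source` for every matching author; that is what makes this option slow on ~10 million authors. The function name `build_script_sort_body` is hypothetical.

```python
import json


def build_script_sort_body(size=10):
    """Search body that sorts authors by the length of their Study array,
    computed at query time (option 1).

    Assumption: params._source is readable from Painless sort scripts and
    script_fields in your Elasticsearch version.
    """
    # Painless: number of Study entries in the stored _source, 0 if absent.
    count_script = (
        "def s = params._source.Study; "
        "return s == null ? 0 : s.size();"
    )
    return {
        "size": size,
        # Return only the author fields, not the (large) Study array.
        "_source": ["AuthorKey", "AuthorName", "AuthorLastName"],
        # Return the computed count alongside each hit as "StudyCount".
        "script_fields": {
            "StudyCount": {
                "script": {"lang": "painless", "source": count_script}
            }
        },
        # Sort hits by the same computed count, largest first.
        "sort": [
            {
                "_script": {
                    "type": "number",
                    "order": "desc",
                    "script": {"lang": "painless", "source": count_script},
                }
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_script_sort_body(), indent=2))
```

With the elasticsearch-py client, such a body could be passed as `es.search(index="authorindex", body=body)`; each hit should then carry the count as a one-element list in `hit["fields"]["StudyCount"]`. Check the response shape against your own version before relying on it.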

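Option 2) requires a full reindex and adds an extra (but simple) step before the data is sent to Elasticsearch. On the plus side, sorting on a plain integer field gives the best performance.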
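A minimal sketch of option 2, also in Python: compute `StudyCount` from the `Study` list before each author is indexed, map it as an `integer`, and sort on it. The field name `StudyCount` is borrowed from the expected output in the question; the helper names are hypothetical, and the actual bulk-indexing call is left out.

```python
# Hypothetical extra mapping property for the precomputed count, to be added
# next to the existing AuthorKey/AuthorName/... properties (on a 6.x cluster,
# assumed from the "_doc" type, e.g. via PUT authorindex/_mapping/_doc).
STUDY_COUNT_MAPPING = {"properties": {"StudyCount": {"type": "integer"}}}


def add_study_count(author):
    """Return a copy of an author document with StudyCount added,
    computed from its Study list before the document is indexed."""
    doc = dict(author)
    doc["StudyCount"] = len(author.get("Study") or [])
    return doc


def build_sort_by_count_body(size=10):
    """Search body that sorts on the precomputed integer field."""
    return {
        "size": size,
        "_source": ["AuthorKey", "AuthorName", "AuthorLastName", "StudyCount"],
        "sort": [{"StudyCount": {"order": "desc"}}],
    }


if __name__ == "__main__":
    authors = [
        {"AuthorKey": "Author1", "Study": [{"PMId": 1000}, {"PMId": 1001},
                                           {"PMId": 1002}, {"PMId": 1003}]},
        {"AuthorKey": "Author3", "Study": [{"PMId": 3000}]},
        {"AuthorKey": "Author2", "Study": [{"PMId": 2001}, {"PMId": 2002},
                                           {"PMId": 2003}]},
    ]
    docs = [add_study_count(a) for a in authors]
    # Locally mimic the descending sort Elasticsearch would apply.
    for d in sorted(docs, key=lambda d: d["StudyCount"], reverse=True):
        print(d["AuthorKey"], d["StudyCount"])
```

Running it prints `Author1 4`, `Author2 3`, `Author3 1`, matching the expected output. Adding a new field to an existing mapping is allowed, but documents that are already indexed only get a value once they are reindexed, which is why this option needs a full reindex.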

You can read about other ways of handling relationships in Elasticsearch in the Handling relationships chapter of the Definitive Guide.

Hope that helps!

【Discussion】:
