【Question title】: Vega visualizations for Kibana - aggregations and accessing the document fields
【Posted】: 2018-04-04 12:58:55
【Question】:

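I'm new to both Vega and Kibana. I'm trying to create a scatter plot that shows hashtags and their average polarity, but I've run into two problems: first, aggregating the average polarity, and second, accessing the hashtag text field in the documents.

This is the code I tried in order to get the average polarity (for now it is just plotted against time):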

{
  $schema: https://vega.github.io/schema/vega-lite/v2.json
  data: {
    # URL object is a context-aware query to Elasticsearch
    url: {
      # The %-enclosed keys are handled by Kibana to modify the query
      # before it gets sent to Elasticsearch. Context is the search
      # filter as shown above the dashboard. Timefield uses the value 
      # of the time picker from the upper right corner.
      %context%: true
      %timefield%: timestamp
      index: tw
      body: {
        size: 10000
        _source: ["timestamp", "user_lang", "country", "polarity", "lang", "sentiment"]
      }
    }
    # We only need the content of hits.hits array
    format: {property: "hits.hits"}
  }
  # Parse timestamp into a javascript date value
  transform: [
    {calculate: "toDate(datum._source['timestamp'])", as: "time"}
  ]
  # Draw a line, with x being the time field and y the mean polarity
  mark: line
  encoding: {
    x: {field: "time", type: "temporal"}
    y: {aggregate: "mean", field: "_source.polarity", type: "quantitative"}
  }
}

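This gives me the error "Cannot read property 'polarity' of undefined". As soon as I remove the aggregation it works, but I want to show the average rather than every data point.

I also can't work out how to access the nested hashtag text field. I tried _source.hashtags.text, but that did not work.

Sample document: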

{
        "_index": "tw",
        "_type": "tweet",
        "_id": "_HHWSGIBbYt8wc5TlB8B",
        "_score": 1,
        "_source": {
          "lang": "en",
          "favorited": false,
          "sentiment": "positive",
          "user_lang": "en",
          "user_screenname": "BrideWiltshire",
          "timestamp": "2018-03-21T13:54:04.928556",
          "user_follow_count": 147,
          "hashtags": [
            {
              "indices": [
                8,
                12
              ],
              "text": "WIN"
            }
          ],
          "user_stat_count": 3377,
          "user_fav_count": 11,
          "coordinates": null,
          "source": """<a href="https://panel.socialpilot.co/" rel="nofollow">SocialPilot.co</a>""",
          "subjectivity": 0.3333333333333333,
          "user_friends_count": 62,
          "polarity": 0.5333333333333333,
          "text": "Want to #WIN ‘His and Hers’ luggage labels from @DavidHampton, worth more than £100? Enter our competition now",
          "message": "Want to #WIN ‘His and Hers’ luggage labels from @DavidHampton, worth more than £100? Enter our competition now",
          "country": null,
          "user_name": "Wiltshire Bride",
          "favorite_count": 0
        }
      },

Mapping:

{
  "tw": {
    "mappings": {
      "tweet": {
        "properties": {
          "coordinates": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "country": {
            "type": "keyword"
          },
          "favorite_count": {
            "type": "long"
          },
          "favorited": {
            "type": "boolean"
          },
          "hashtags": {
            "properties": {
              "indices": {
                "type": "long"
              },
              "text": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              }
            }
          },
          "lang": {
            "type": "text"
          },
          "location": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "message": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "polarity": {
            "type": "float"
          },
          "sentiment": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "source": {
            "type": "text"
          },
          "subjectivity": {
            "type": "float"
          },
          "text": {
            "type": "text"
          },
          "time_zone": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "timestamp": {
            "type": "date"
          },
          "user": {
            "properties": {
              "favourites_count": {
                "type": "long"
              },
              "followers_count": {
                "type": "long"
              },
              "friends_count": {
                "type": "long"
              },
              "lang": {
                "type": "text"
              },
              "name": {
                "type": "text"
              },
              "screen_name": {
                "type": "text"
              },
              "statuses_count": {
                "type": "long"
              }
            }
          },
          "user_fav_count": {
            "type": "long"
          },
          "user_follow_count": {
            "type": "long"
          },
          "user_friends_count": {
            "type": "long"
          },
          "user_lang": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "user_name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "user_screenname": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "user_stat_count": {
            "type": "long"
          }
        }
      }
    }
  }
}
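
A note on the second problem: in the sample document, `hashtags` is an array of objects, so a dotted path such as `_source.hashtags.text` does not point at a single value. The sketch below is only an illustration in Python (the helper names and the sample hits are made up, and each hit is assumed to have the shape of the sample document). It shows one way to flatten hits into one `(hashtag, polarity)` row per hashtag and average them on the client side.

```python
from collections import defaultdict


def hashtag_polarity_pairs(hits):
    """Yield one (hashtag text, polarity) pair per hashtag of every hit."""
    for hit in hits:
        source = hit.get("_source", {})
        polarity = source.get("polarity")
        if polarity is None:
            continue  # skip documents without a polarity value
        # "hashtags" is a list of objects; it may be missing or empty.
        for tag in source.get("hashtags") or []:
            yield tag["text"], polarity


def mean_polarity_by_hashtag(hits):
    """Return {hashtag: mean polarity} computed over the given hits."""
    totals = defaultdict(lambda: [0.0, 0])  # hashtag -> [sum, count]
    for text, polarity in hashtag_polarity_pairs(hits):
        totals[text][0] += polarity
        totals[text][1] += 1
    return {text: total / count for text, (total, count) in totals.items()}


# Hypothetical hits, trimmed to the fields used here.
sample_hits = [
    {"_source": {"polarity": 0.5, "hashtags": [{"indices": [8, 12], "text": "WIN"}]}},
    {"_source": {"polarity": 0.0, "hashtags": [{"indices": [0, 4], "text": "WIN"}]}},
    {"_source": {"polarity": 1.0, "hashtags": []}},
]

print(mean_polarity_by_hashtag(sample_hits))  # {'WIN': 0.25}
```

With `size: 10000` the query returns only a slice of a large index, so a client-side average like this would cover only that slice. Aggregating inside Elasticsearch is probably the more practical route.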

【Comments】:

  • I tried to reproduce this, and your example seems to behave fine for me. What does VEGA_DEBUG.view.data('source_0') return?
  • I've tried using the debug tool, @StevenEnsslen, but I found it confusing: Kibana shows the error, yet the tool doesn't seem to help with it. This is what I get when running the original code: `VEGA_DEBUG.view.data('source_0')` returns `(2491) [{…}, {…}, {…}, {…}, {…}, …]`, followed by `.js?v=16588:58 XHR finished loading: POST "x.x.x.x:5601/elasticsearch/tw/_search"`.
  • That is standard behavior for large arrays in the developer console, @Angelika, although I've given up trying to find a reference for it. If you expand the individual ranges you should see your objects. They should be formatted like 0: Object { timestamp:"2018-03-21T13:54:04.928556", user_lang:"en", polarity: 0.5333333333333333, lang:"en", sentiment:"positive"}. The usual problem is that format is incomplete or that the columns have different names; in my experience both are usually typos.
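
The check suggested in the last comment can also be sketched outside the browser. The following Python snippet is only an illustration (the function names and the sample response are made up). It resolves a dotted `format.property` path such as `hits.hits` against a response and reports which rows lack an expected field.

```python
def resolve_path(obj, dotted_path):
    """Follow a dotted path like 'hits.hits'; return None if any key is missing."""
    current = obj
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def rows_missing_field(rows, dotted_field):
    """Return the indexes of rows where the dotted field is missing or null."""
    return [i for i, row in enumerate(rows) if resolve_path(row, dotted_field) is None]


# Hypothetical, heavily trimmed search response.
response = {
    "hits": {
        "hits": [
            {"_source": {"timestamp": "2018-03-21T13:54:04.928556", "polarity": 0.53}},
            {"_source": {"timestamp": "2018-03-21T13:55:10.000000"}},  # no polarity
        ]
    }
}

rows = resolve_path(response, "hits.hits")
print(len(rows))                                     # number of rows extracted
print(rows_missing_field(rows, "_source.polarity"))  # indexes of rows without polarity
```

A misspelled property path shows up as `None` from `resolve_path`, and a misspelled field name shows up as every row being reported missing.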

Tags: javascript data-visualization kibana vega vega-lite


【Solution 1】:

If your hashtags field is of nested type and hashtags.text is a keyword field (or has a hashtags.text.keyword subfield), then you can use the following scatter plot:

{
  $schema: https://vega.github.io/schema/vega-lite/v2.json
  title: hashtags vs avg_polarity
  data: {
    url: {
      index: tw
      body: {
        size: 0
        query: {
          match_all: {}
        }
        aggs: {
          HashTags: {
            nested: {path: "hashtags"}
            aggs: {
              HashTags_Text: {
                terms: {field: "hashtags.text"}
                aggs: {
                  Tweet_Polarity: {
                    reverse_nested: {}
                    aggs: {
                      Tweet_Polarity_avg: {
                        avg: {field: "polarity"}
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
    format: {property: "aggregations.HashTags.HashTags_Text.buckets"}
  }
  mark: {type: "line"}
  encoding: {
    x: {
      field: key
      type: nominal
      axis: {title: "HashTags"}
    }
    y: {
      field: Tweet_Polarity.Tweet_Polarity_avg.value
      type: quantitative
      axis: {title: "polarity"}
    }
  }
}
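
To make the field paths in the spec easier to follow, here is a small Python sketch of how the aggregation response is unpacked. The response below is hand-written for illustration (the bucket values are invented); only the key names come from the query above.

```python
def bucket_rows(response):
    """Flatten the terms buckets into (hashtag, average polarity) tuples.

    Mirrors format.property "aggregations.HashTags.HashTags_Text.buckets"
    and the y field "Tweet_Polarity.Tweet_Polarity_avg.value" from the spec.
    """
    buckets = response["aggregations"]["HashTags"]["HashTags_Text"]["buckets"]
    return [
        (bucket["key"], bucket["Tweet_Polarity"]["Tweet_Polarity_avg"]["value"])
        for bucket in buckets
    ]


# Hand-written example response; the numbers are invented.
example_response = {
    "aggregations": {
        "HashTags": {
            "doc_count": 3,
            "HashTags_Text": {
                "buckets": [
                    {
                        "key": "WIN",
                        "doc_count": 2,
                        "Tweet_Polarity": {
                            "doc_count": 2,
                            "Tweet_Polarity_avg": {"value": 0.5},
                        },
                    },
                    {
                        "key": "competition",
                        "doc_count": 1,
                        "Tweet_Polarity": {
                            "doc_count": 1,
                            "Tweet_Polarity_avg": {"value": 0.25},
                        },
                    },
                ]
            },
        }
    }
}

for hashtag, avg_polarity in bucket_rows(example_response):
    print(hashtag, avg_polarity)
```

Each bucket becomes one point in the chart: `key` feeds the x axis and the nested average value feeds the y axis.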

Edit:

Before you start adding documents, you must specify the index mapping as follows:

PUT /tw
{
"mappings": {
            "tweet": {
                "properties": {
                    "favorite_count": {
                        "type": "long"
                    },
                    "favorited": {
                        "type": "boolean"
                    },
                    "hashtags": {
                        "type": "nested",
                        "properties": {
                            "indices": {
                                "type": "long"
                            },
                            "text": {
                                "type": "keyword"
                            }
                        }
                    },
                    "lang": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    },
                    "message": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    },
                    "polarity": {
                        "type": "float"
                    },
                    "sentiment": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    },
                    "source": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    },
                    "subjectivity": {
                        "type": "float"
                    },
                    "text": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    },
                    "timestamp": {
                        "type": "date"
                    },
                    "user_fav_count": {
                        "type": "long"
                    },
                    "user_follow_count": {
                        "type": "long"
                    },
                    "user_friends_count": {
                        "type": "long"
                    },
                    "user_lang": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    },
                    "user_name": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    },
                    "user_screenname": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    },
                    "user_stat_count": {
                        "type": "long"
                    }
                }
            }
        }
}
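
Whether an existing index already satisfies this requirement can be checked from the output of `GET /tw/_mapping`. The Python sketch below is illustrative only: the helper name is made up, and the two mapping fragments are trimmed copies of the mappings shown in this thread. It assumes the typed mapping layout (index, then mappings, then type, then properties) seen above.

```python
def is_nested_field(mapping, index, doc_type, field):
    """Return True if `field` is mapped with "type": "nested"."""
    properties = mapping[index]["mappings"][doc_type]["properties"]
    return properties.get(field, {}).get("type") == "nested"


# Trimmed from the asker's mapping: no explicit type on "hashtags",
# so it is handled as a plain object field rather than a nested one.
asker_mapping = {
    "tw": {"mappings": {"tweet": {"properties": {
        "hashtags": {"properties": {
            "indices": {"type": "long"},
            "text": {"type": "text"},
        }},
    }}}}
}

# Trimmed from the mapping in this answer.
answer_mapping = {
    "tw": {"mappings": {"tweet": {"properties": {
        "hashtags": {"type": "nested", "properties": {
            "indices": {"type": "long"},
            "text": {"type": "keyword"},
        }},
    }}}}
}

print(is_nested_field(asker_mapping, "tw", "tweet", "hashtags"))   # False
print(is_nested_field(answer_mapping, "tw", "tweet", "hashtags"))  # True
```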

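【Comments】:

  • Change the mark type from line to circle to get an actual scatter plot.
  • Thanks for your reply, @sramalingam24. I've tried the code above and I get the error `[aggregation_execution_exception] [nested] nested path [hashtags] is not nested`. Could the cause be that some documents have no hashtags?
  • It looks like the hashtags field in your tweet type is not mapped as a nested type. You have to specify that explicitly when you create the tw index and set the mapping for the tweet type.
  • Oh, so I can't do that, because I already have about 10 million documents in that index. I believe I would have to start a new index, since as far as I know it is not possible to update an index mapping?
  • @Angelika You can use the _reindex API to copy your documents into a new index that has the mapping above, or you can add a copy of the hashtags field (called hashtags_copy) and apply the nested mapping to the copied field using the _update_by_query API. As for the query not working: you will need the nested mapping for the reverse_nested aggregation to work.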
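
The `_reindex` route mentioned in the last comment could look roughly like this. The sketch only builds the request body in Python and prints it. The destination index name `tw_nested` is a placeholder, and the new index is assumed to have been created beforehand with the nested mapping.

```python
import json


def build_reindex_body(source_index, dest_index):
    """Build the JSON body for a POST _reindex request."""
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
    }


# "tw_nested" is a placeholder name for the index created with the nested mapping.
body = build_reindex_body("tw", "tw_nested")
print("POST _reindex")
print(json.dumps(body, indent=2))
```

The printed body can be pasted into the Kibana Dev Tools console under the `POST _reindex` line.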