Spark writes timestamp as long in Elasticsearch

【Question title】: Spark writes timestamp as long in Elasticsearch
【Posted】: 2020-05-11 15:11:50
【Question】:

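I read data from a JDBC source and write it directly to an Elasticsearch index. When I query the data in ES, I see that all the timestamp fields in my DataFrame have been converted to long.

See my code below: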

 import org.apache.spark.{SparkConf, SparkContext}
 import org.apache.spark.sql.SQLContext
 import org.elasticsearch.spark.sql._ // brings saveToEs into scope for DataFrames

 object ExractToolEngine {
   val appName = "ExractToolEngine"
   val master = "local[2]"
   val conf = new SparkConf().setAppName(appName).setMaster(master)
   conf.set("es.write.operation", "index")
   conf.set("es.mapping.id", "user_id")
   conf.set("index.mapper.dynamic", "true")
   conf.set("es.mapping.rich.date", "true")

   def main(args: Array[String]): Unit = {
     val sc = new SparkContext(conf)
     val sqlContext = new SQLContext(sc)
     import sqlContext.implicits._

     // Read the source table over JDBC, split into 50 partitions on user_id
     val srcData = sqlContext.read.format("jdbc")
       .options(Map(
         "driver" -> "com.jdbc.Driver",
         "url" -> "jdbc...",
         "dbtable" -> "tbl",
         "partitionColumn" -> "user_id",
         "lowerBound" -> "1",
         "upperBound" -> "1000000",
         "numPartitions" -> "50"
       ))
       .load()

     // Write the filtered rows to the test_users index, type sm_1
     srcData.filter("user_id>=1 and user_id<=1000000").saveToEs("test_users/sm_1")
   }
 }

When I run srcData.printSchema()

I get:

|-- dwh_insert_ts: timestamp (nullable = true)
|-- dwh_update_ts: timestamp (nullable = true)

When I query the index mapping at http://localhost:9200/test_users/_mapping/sm_1

I get:

"properties": {
  "dwh_insert_ts": {
    "type": "long"
  },
  "dwh_update_ts": {
    "type": "long"
  },

Is there a way to force Elasticsearch to keep these fields as timestamps instead of converting them to long?

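【Comments】:

But a timestamp is a long, isn't it?
A Spark timestamp represents a datetime object, so I can apply datetime functions to it without converting it from long (as in other RDBMSs such as MySQL and Postgres).

Tags: elasticsearch apache-spark spark-dataframe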

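【Solution 1】:

You can use one of the multiple date formats: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-date-format.html

【Discussion】:

Do I need to predefine all my timestamp fields in advance? My schema can change, and I don't want to predefine all my fields. Is there a way around that?
You don't need to define all the fields, only the date fields. If your date fields follow a specific naming pattern, such as "created_date" or "updated_date", you can also use dynamic mapping.
According to this, I can't dynamically map timestamp fields.
That is a different kind of "dynamic" mapping; take a look at elastic.co/guide/en/elasticsearch/reference/current/…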
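
The name-pattern approach mentioned in this discussion could look roughly like the Scala sketch below. It only builds the JSON for one `dynamic_templates` entry that maps every new field whose name matches a pattern (here `*_ts`, taken from the column names in the question) to `date`. The object name, function name, and template name are hypothetical, the pattern is inserted without JSON escaping, and where this fragment sits inside the index-creation request depends on the Elasticsearch version, so treat it as an assumption to check against the documentation for your version.

```scala
// Hypothetical helper: builds one Elasticsearch dynamic-template entry as JSON.
object DateTemplate {

  /** Maps new fields whose names match `namePattern` (e.g. "*_ts") to the `date` type. */
  def dynamicTemplates(namePattern: String): String =
    s"""{
       |  "dynamic_templates": [
       |    {
       |      "timestamps_as_dates": {
       |        "match": "$namePattern",
       |        "mapping": { "type": "date" }
       |      }
       |    }
       |  ]
       |}""".stripMargin

  def main(args: Array[String]): Unit =
    println(dynamicTemplates("*_ts"))
}
```

A template like this only affects fields added after it is in place, so the index would need to be created with it before Spark writes any documents. The default `date` format also accepts epoch milliseconds.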

【Solution 2】:

You can take a look at the following ES doc page.

It looks to me like this setting is wrong and has no effect:

conf.set("es.mapping.rich.date", "true")

The correct name is defined here:

es.mapping.date.rich

Since it defaults to true, you probably don't need it.

【Discussion】:

【Solution 3】:

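I ran into the same problem and solved it. Convert the timestamp to a string with a UTC offset, for example "2020-05-11T14:44:24.000+08:00", which is Asia/Shanghai time. Then write it to ES, and ES will map it as a date type.

【Discussion】: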
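
A minimal sketch of the conversion this answer describes, using only `java.time`. The object name `IsoTimestamp` and the function `toIso` are hypothetical. The pattern produces a string with a millisecond part and a UTC offset, in the same shape as the example above.

```scala
import java.sql.Timestamp
import java.time.ZoneId
import java.time.format.DateTimeFormatter

// Hypothetical helper: renders a SQL timestamp as a string with a UTC offset.
object IsoTimestamp {
  // XXX prints the offset as "+08:00" (or "Z" for UTC)
  private val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSXXX")

  /** Formats `ts` as local date-time in `zone`, followed by that zone's offset. */
  def toIso(ts: Timestamp, zone: ZoneId): String =
    ts.toInstant.atZone(zone).format(fmt)

  def main(args: Array[String]): Unit = {
    // 1589179464000 ms since the epoch is 2020-05-11T06:44:24Z
    val ts = new Timestamp(1589179464000L)
    println(toIso(ts, ZoneId.of("Asia/Shanghai"))) // 2020-05-11T14:44:24.000+08:00
  }
}
```

In a DataFrame, Spark's built-in `date_format` function with the same pattern is one way to apply this to a timestamp column before calling `saveToEs`; the offset it emits should follow the Spark session time zone.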