【Question Title】: Scala: Read Array value in Elasticsearch with Spark
【Posted】: 2018-07-04 18:34:54
【Question】:

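I am trying to read data from Elasticsearch, but the documents I want to read contain a nested array, and that array is what I need.

I included the option "es.read.field.as.array.include" as follows: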

val dataframe = reader
            .option("es.read.field.as.array.include","arrayField")
            .option("es.query", "someQuery")
            .load("Index/Document")

But it fails with this error:

java.lang.ClassCastException: scala.collection.convert.Wrappers$JListWrapper cannot be cast to java.lang.Float

How should I map my array so that it can be read?

Sample data from ES:

{
    "_index": "Index",
    "_type": "Document",
    "_id": "ID",
    "_score": 1,
    "_source": {
        "currentTime": 1516211640000,
        "someField": someValue,
        "arrayField": [
        {
            "id": "000",
            "field1": 14,
            "field2": 20.23871387052084,
            "innerArray": [[ 55.2754,25.1909],[ 55.2754,25.190929],[ 55.27,25.190]]
        }, ...
        ],
        "meanError": 0.3082
    }
}

【Comments】:

    Tags: scala apache-spark elasticsearch


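    【Solution 1】:

    In your sample data, innerArray is an array of arrays, so it has to be declared as a two-level array. Elasticsearch mappings do not record whether a field holds an array or how deeply it is nested, so the connector has to be told: the ":2" suffix in "es.read.field.as.array.include" marks arrayField.innerArray as two-dimensional.

    You can try this sample: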

    val es = spark.read.format("org.elasticsearch.spark.sql")
      .option("es.read.field.as.array.include","arrayField,arrayField.innerArray:2")
      .option("es.query", "someQuery")
      .load("Index/Document")

    The resulting schema and first row then look like this:
    
     |-- arrayField: array (nullable = true)
     |    |-- element: struct (containsNull = true)
     |    |    |-- field1: long (nullable = true)
     |    |    |-- field2: float (nullable = true)
     |    |    |-- id: string (nullable = true)
     |    |    |-- innerArray: array (nullable = true)
     |    |    |    |-- element: array (containsNull = true)
     |    |    |    |    |-- element: float (containsNull = true)
     |-- currentTime: long (nullable = true)
     |-- meanError: float (nullable = true)
     |-- someField: string (nullable = true)
    
    
     +--------------------+-------------+---------+---------+
     |          arrayField|  currentTime|meanError|someField|
     +--------------------+-------------+---------+---------+
     |[[14,20.238714,00...|1516211640000|   0.3082|someValue|
     +--------------------+-------------+---------+---------+
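
Once arrayField loads with the schema above, the nested values can be flattened with Spark's explode. The following is only a sketch under stated assumptions: instead of connecting to Elasticsearch it builds a local DataFrame with the same shape from the sample document, and the names Item, Doc, ExplodeArrayField, sample, innerPairs, first and second are hypothetical and do not come from the original post.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

// Hypothetical case classes mirroring the schema printed above.
case class Item(id: String, field1: Long, field2: Float, innerArray: Seq[Seq[Float]])
case class Doc(currentTime: Long, meanError: Float, someField: String, arrayField: Seq[Item])

object ExplodeArrayField {

  // Local stand-in for the DataFrame returned by the Elasticsearch read (assumption).
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      Doc(1516211640000L, 0.3082f, "someValue",
        Seq(Item("000", 14L, 20.238714f,
          Seq(Seq(55.2754f, 25.1909f), Seq(55.2754f, 25.190929f), Seq(55.27f, 25.190f)))))
    ).toDF()
  }

  // Produces one row per inner pair of every arrayField element.
  def innerPairs(df: DataFrame): DataFrame = {
    // First explode: one row per element of arrayField.
    val items = df.select(explode(col("arrayField")).as("item"))
    // Second explode: one row per inner array, then split it into two columns.
    items
      .select(col("item.id").as("id"), explode(col("item.innerArray")).as("pair"))
      .select(col("id"), col("pair").getItem(0).as("first"), col("pair").getItem(1).as("second"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("explode-sketch").getOrCreate()
    innerPairs(sample(spark)).show(false)
    spark.stop()
  }
}
```

With a real cluster, the DataFrame returned by the read above could be passed to innerPairs in place of sample(spark), provided its schema matches the one shown.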
    

    【Discussion】:
