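【Question title】: Spark - Can't create schema for array of structs
【Posted】: 2020-04-20 04:58:54
【Question】:

I'm trying to create a fairly simple schema for a DataFrame that contains an array of structs, but I can't get it to work. I've read several similar questions on SO and gone through several iterations, still without success. Here is my current attempt: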

val theSchema = StructType (
    StructField("dateTime",StringType,true),  
    StructField("sys",StringType,true),
    StructField("attribs",ArrayType(StructType(StructField("attribName",StringType,true), StructField("attribValue",StringType,true)),true),true)
)

It fails with this error:

<console>:29: error: overloaded method value apply with alternatives:
  (fields: Array[org.apache.spark.sql.types.StructField])org.apache.spark.sql.types.StructType <and>
  (fields: java.util.List[org.apache.spark.sql.types.StructField])org.apache.spark.sql.types.StructType <and>
  (fields: Seq[org.apache.spark.sql.types.StructField])org.apache.spark.sql.types.StructType
 cannot be applied to (org.apache.spark.sql.types.StructField, org.apache.spark.sql.types.StructField)
           StructField("attribs",ArrayType(StructType(StructField("attribName",StringType,true), StructField("attribValue",StringType,true)),true),true)
                                           ^

What am I doing wrong?

【Comments】:

    Tags: scala apache-spark


    【Solution 1】:

    If you look at the signature of StructType:

    StructType(fields: Array[StructField]) extends DataType with Seq[StructField] with Product with Serializable
    

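    It takes a collection of StructFields; as stated in the API docs, a StructType object can be constructed with StructType(fields: Seq[StructField]). Your code passes the fields as separate arguments, which matches none of the overloads listed in the error, so wrap them in a Seq: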

    import org.apache.spark.sql.types._
    
    val theSchema = StructType(Seq(
      StructField("dateTime", StringType, true),  
      StructField("sys", StringType, true),
      StructField("attribs", ArrayType(StructType(Seq(
        StructField("attribName", StringType, true),
        StructField("attribValue", StringType, true)
      )), true), true)
    ))
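
As an alternative sketch (not part of the original answer), the same schema can probably be built without any Seq wrapping by chaining StructType's add method, which appends one field per call. The names attribType and builtSchema are illustrative only, and the snippet assumes it is pasted into spark-shell, as in the question:

```scala
import org.apache.spark.sql.types._

// Element type of the array: a struct with two nullable string fields.
// (attribType is a hypothetical name chosen for this sketch.)
val attribType = new StructType()
  .add("attribName", StringType, true)
  .add("attribValue", StringType, true)

// Top-level schema. The second argument to ArrayType is containsNull,
// i.e. whether the array may hold null elements.
val builtSchema = new StructType()
  .add("dateTime", StringType, true)
  .add("sys", StringType, true)
  .add("attribs", ArrayType(attribType, true), true)

// Print the schema as a tree to check the nesting.
builtSchema.printTreeString()
```

Defining the inner struct as its own value keeps the nesting readable, and the resulting schema should compare equal to the one built with Seq above.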
    

    【Discussion】:
