Compare the schema of a DataFrame with the schema of another DataFrame

【Question title】: Compare schema of dataframe with schema of other dataframe
【Posted】: 2021-09-21 09:57:32
【Question description】:

I have the schemas of two datasets read from HDFS paths, defined as follows:

val df = spark.read.parquet("/path")

df.printSchema()

root
 |-- name: string (nullable = true)
 |-- id: integer (nullable = true)
 |-- dept: integer (nullable = true)

【Comments】:

Tags: java scala dataframe apache-spark apache-spark-sql

【Solution 1】:

Since your schema file looks like a CSV:

// Read the schema file and convert it into a Map of column name -> type name
val csvSchemaDf = spark.read.csv("/testschemafile")
val schemaMap = csvSchemaDf.rdd.map(x => (x(0).toString.trim, x(1).toString.trim)).collectAsMap

var isSchemaMatching = true

// Iterate through the schema fields of your df and compare each one with the file.
// Note: field.dataType.toString yields names such as "StringType" or "IntegerType",
// so the file is assumed to store type names in that form.
for (field <- df.schema.toList) {
  if (!(schemaMap.contains(field.name) &&
        field.dataType.toString.equals(schemaMap(field.name)))) {
    // Mismatch
    isSchemaMatching = false
  }
}

Use isSchemaMatching for further logic.
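
If both schemas are already available as DataFrames, the schema file can be skipped and the two StructType values compared directly. The following is only a minimal sketch: `schemaMismatches` is a hypothetical helper, not a Spark API, and it assumes that only column names and data types matter, so nullability and column order are ignored.

```scala
import org.apache.spark.sql.types._

// Hypothetical helper (not part of Spark): lists the differences between two schemas.
// Only column names and data types are compared; nullability and order are ignored.
def schemaMismatches(left: StructType, right: StructType): Seq[String] = {
  val rightByName = right.fields.map(f => f.name -> f.dataType).toMap
  val leftNames = left.fieldNames.toSet

  // Columns of `left` that are missing from `right` or have a different type
  val fromLeft = left.fields.toSeq.flatMap { f =>
    rightByName.get(f.name) match {
      case None => Some(s"${f.name}: missing in right")
      case Some(t) if t != f.dataType =>
        Some(s"${f.name}: ${f.dataType.simpleString} vs ${t.simpleString}")
      case _ => None
    }
  }

  // Columns that exist only in `right`
  val fromRight = right.fieldNames.toSeq
    .filterNot(leftNames.contains)
    .map(n => s"$n: missing in left")

  fromLeft ++ fromRight
}
```

With a second DataFrame (called `otherDf` here, an assumed name), `schemaMismatches(df.schema, otherDf.schema).isEmpty` can play the role of isSchemaMatching, and the returned strings say which columns differ.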

【Discussion】:

【Solution 2】:

You can create an instance of StructType as follows:


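    import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

    // In Scala the Spark types are singleton objects, so no parentheses are used
    val schema = StructType(
        Seq(
            StructField("name", StringType, true),
            StructField("id", IntegerType, true)
        ))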

Simply read the file and build the schema from the data in the file.

Spark schema examples
Scaladoc of spark types
Spark type doc
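
Building the schema from the file's contents, as suggested above, might look roughly like the sketch below. The file format is not shown in the question, so this assumes each row holds a column name and a type name written as in the printSchema output ("string", "integer"). `typeByName` and `schemaFromPairs` are hypothetical names, and the type mapping is deliberately incomplete.

```scala
import org.apache.spark.sql.types._

// Assumed mapping from type names in the file to Spark types; extend as needed.
val typeByName: Map[String, DataType] = Map(
  "string"  -> StringType,
  "integer" -> IntegerType,
  "long"    -> LongType,
  "double"  -> DoubleType,
  "boolean" -> BooleanType
)

// Hypothetical helper: turns (columnName, typeName) pairs into a StructType.
// Unknown type names fail fast with an IllegalArgumentException.
def schemaFromPairs(pairs: Seq[(String, String)]): StructType =
  StructType(pairs.map { case (name, typeName) =>
    val dataType = typeByName.getOrElse(
      typeName.trim.toLowerCase,
      throw new IllegalArgumentException(s"Unsupported type: $typeName"))
    StructField(name.trim, dataType, nullable = true)
  })

// Example with the columns from the question
val expected = schemaFromPairs(Seq("name" -> "string", "id" -> "integer", "dept" -> "integer"))
```

The pairs could be collected from the rows of the schema file. Comparing with `df.schema == expected` is then a strict check that also takes nullability and column order into account.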

【Discussion】: