【Question title】: Compare schema of dataframe with schema of other dataframe
【Posted】: 2021-09-21 09:57:32
【Question description】:
I have the schemas of two datasets read from HDFS paths, defined as follows:
val df = spark.read.parquet("/path")
df.printSchema()
root
|-- name: string (nullable = true)
|-- id: integer (nullable = true)
|-- dept: integer (nullable = true)
【Discussion】:
Tags:
java
scala
dataframe
apache-spark
apache-spark-sql
【Solution 1】:
Since your schema file looks like a CSV:
// Read the schema file and convert it into a Map of field name -> type
val csvSchemaDf = spark.read.csv("/testschemafile")
val schemaMap = csvSchemaDf.rdd.map(x => (x(0).toString.trim, x(1).toString.trim)).collectAsMap
var isSchemaMatching = true
// Iterate through the schema fields of your df and compare
for (field <- df.schema.toList) {
  if (!(schemaMap.contains(field.name) &&
      field.dataType.toString.equals(schemaMap(field.name)))) {
    // Mismatch: the field is missing from the schema file or its type differs
    isSchemaMatching = false
  }
}
Use isSchemaMatching for further logic.
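
Note that the loop above only walks the fields of df, so a field that is listed in the schema file but absent from df goes unnoticed, and a single boolean does not say which field failed. A minimal sketch of a two-way comparison that reports each difference might look like the following. The object name SchemaDiff and the method mismatches are hypothetical, not part of Spark or of the original answer, and the sketch assumes the schema file stores types in the same form as field.dataType.toString. It works on plain Scala collections, so the Spark side reduces to passing in df.schema.map(f => (f.name, f.dataType.toString)).

```scala
// Hypothetical helper for illustration only; names are assumptions, not a Spark API.
object SchemaDiff {
  // expected: field name -> type string (e.g. built from the schema file)
  // actual:   (field name, type string) pairs in DataFrame order
  //           (e.g. df.schema.map(f => (f.name, f.dataType.toString)))
  def mismatches(expected: Map[String, String],
                 actual: Seq[(String, String)]): Seq[String] = {
    val actualNames = actual.map(_._1).toSet

    // Fields present in the DataFrame that are unknown or have a different type
    val unexpectedOrWrong = actual.collect {
      case (name, _) if !expected.contains(name) =>
        s"unexpected field: $name"
      case (name, tpe) if expected(name) != tpe =>
        s"type mismatch for $name: expected ${expected(name)}, found $tpe"
    }

    // Fields listed in the expected schema but absent from the DataFrame
    val missing = expected.keys.toSeq.sorted
      .filterNot(actualNames.contains)
      .map(name => s"missing field: $name")

    unexpectedOrWrong ++ missing
  }
}
```

With this sketch, isSchemaMatching would correspond to SchemaDiff.mismatches(schemaMap.toMap, actualFields).isEmpty, and the returned messages can be logged when the check fails.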