【Question title】:Can I create a dataframe from another dataframe's rows
【Posted】:2021-01-22 15:43:42
【Question】:
Using PySpark, can I create a dataframe whose columns are the rows of the dataframe below?
+------------+
|         col|
+------------+
|created_meta|
|  updated_at|
|updated_meta|
|        meta|
|        Year|
|  First Name|
|      County|
|         Sex|
|       Count|
+------------+
Tags:
apache-spark
pyspark
apache-spark-sql
【Solution 1】:
There are two ways.
- Using pivot:
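from pyspark.sql import functions as F
# Pivot the values of 'col' into column names; limit(0) keeps only the schema
df1 = df.groupBy().pivot('col').agg(F.lit(None)).limit(0)
df1.show()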
+-----+------+---------+---+----+------------+----+----------+------------+
|Count|County|FirstName|Sex|Year|created_meta|meta|updated_at|updated_meta|
+-----+------+---------+---+----+------------+----+----------+------------+
+-----+------+---------+---+----+------------+----+----------+------------+
- Creating it from scratch:
# Each collected value becomes a literal column named after itself; limit(0) drops the rows
df2 = df.select([F.lit(r[0]) for r in df.collect()]).limit(0)
df2.show()
+------------+----------+------------+----+----+---------+------+---+-----+
|created_meta|updated_at|updated_meta|meta|Year|FirstName|County|Sex|Count|
+------------+----------+------------+----+----+---------+------+---+-----+
+------------+----------+------------+----+----+---------+------+---+-----+
【Solution 2】:
// Sorry, this is in Scala + Spark
import spark.implicits._
import org.apache.spark.sql.functions._
val lst = List("created_meta",
  "updated_at",
  "updated_meta",
  "meta",
  "Year",
  "First Name",
  "County",
  "Sex",
  "Count")
val source = lst.toDF("col")
source.show(false)
// +------------+
// |col |
// +------------+
// |created_meta|
// |updated_at |
// |updated_meta|
// |meta |
// |Year |
// |First Name |
// |County |
// |Sex |
// |Count |
// +------------+
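// Collect the column names to the driver
val l = source.select($"col").as[String].collect().toList
// Add one empty-string column per name, starting from the source dataframe
val df1 = l.foldLeft(source)((acc, col) => {
  acc.withColumn(col, lit(""))
})
// Drop the original column so that only the new columns remain
val df2 = df1.drop("col")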
df2.printSchema()
// root
// |-- created_meta: string (nullable = false)
// |-- updated_at: string (nullable = false)
// |-- updated_meta: string (nullable = false)
// |-- meta: string (nullable = false)
// |-- Year: string (nullable = false)
// |-- First Name: string (nullable = false)
// |-- County: string (nullable = false)
// |-- Sex: string (nullable = false)
// |-- Count: string (nullable = false)
df2.show(1, false)
// +------------+----------+------------+----+----+----------+------+---+-----+
// |created_meta|updated_at|updated_meta|meta|Year|First Name|County|Sex|Count|
// +------------+----------+------------+----+----+----------+------+---+-----+
// | | | | | | | | | |
// +------------+----------+------------+----+----+----------+------+---+-----+