How to cast all columns of a DataFrame to string: answers

【Question title】: How to cast all columns of a DataFrame to string
【Posted】: 2026-01-18 01:25:01
【Question description】:

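I have a DataFrame with mixed column types. I am reading it from a Hive table using the spark.sql('select a,b,c from table') command.

Some columns are int, bigint, or double, and others are string. There are 32 columns in total. Is there any way in pyspark to cast all columns of the DataFrame to string type?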

【Question comments】:

Tags: apache-spark pyspark apache-spark-sql

【Solution 1】:

Simply:

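from pyspark.sql.functions import col

# Load the table by name; spark.sql() expects a full SQL statement instead
table = spark.table("table")

# Cast every column to string; select returns a new DataFrame
table.select([col(c).cast("string") for c in table.columns])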
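The same cast can also be written as SQL expression strings and passed to DataFrame.selectExpr, which may be convenient when column names contain spaces or dots. The sketch below is a hedged illustration, not part of the original answer: the helper name string_cast_exprs is hypothetical, and it only builds the expression strings, so it runs without a Spark session.

```python
def string_cast_exprs(columns):
    """Build one Spark SQL expression per column that casts it to STRING.

    Each name is wrapped in backticks so that names with spaces or dots
    still parse; a literal backtick inside a name is escaped by doubling it.
    The trailing alias keeps the original column name.
    """
    exprs = []
    for name in columns:
        quoted = "`" + name.replace("`", "``") + "`"
        exprs.append(f"CAST({quoted} AS STRING) AS {quoted}")
    return exprs


# Hypothetical column names, used only to show the generated expressions.
for expr in string_cast_exprs(["a", "b c"]):
    print(expr)

# Assumed usage with the DataFrame named `table` from the answer above:
#     table = table.selectExpr(*string_cast_exprs(table.columns))
```

Like the select call above, this produces a single projection over all columns rather than one step per column.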

【Comments】:

When dealing with a thousand columns on version 2.1.0, this method performs better than withColumn.

【Solution 2】:

Here is a one-line solution in Scala:

df.select(df.columns.map(c => col(c).cast(StringType)) : _*)

Let's look at an example:

import org.apache.spark.sql._
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._
val data = Seq(
   Row(1, "a"),
   Row(5, "z")
)

val schema = StructType(
  List(
    StructField("num", IntegerType, true),
    StructField("letter", StringType, true)
 )
)

val df = spark.createDataFrame(
  spark.sparkContext.parallelize(data),
  schema
)

df.printSchema
//root
//|-- num: integer (nullable = true)
//|-- letter: string (nullable = true)

val newDf = df.select(df.columns.map(c => col(c).cast(StringType)) : _*)

newDf.printSchema
//root
//|-- num: string (nullable = true)
//|-- letter: string (nullable = true)

Hope this helps.

【Comments】:

【Solution 3】:

For Scala, Spark version > 2.0

case class Row(id: Int, value: Double)

import spark.implicits._

import org.apache.spark.sql.functions._

val r1 = Seq(Row(1, 1.0), Row(2, 2.0), Row(3, 3.0)).toDF()

r1.show
+---+-----+
| id|value|
+---+-----+
|  1|  1.0|
|  2|  2.0|
|  3|  3.0|
+---+-----+

val castedDF = r1.columns.foldLeft(r1)((current, c) => current.withColumn(c, col(c).cast("String")))

castedDF.printSchema
root
 |-- id: string (nullable = false)
 |-- value: string (nullable = false)

【Comments】:

【Solution 4】:

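from pyspark.sql.types import StringType

# Replace each column in turn with its string-cast version
for name in df_data.columns:
    df_data = df_data.withColumn(name, df_data[name].cast(StringType()))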

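【Comments】:

Please don't post only code as an answer; also explain what your code does and how it solves the problem in the question. Answers with an explanation are usually more helpful and of better quality, and are more likely to attract upvotes.

【Solution 5】:

You can cast a single column like this: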

import pyspark.sql.functions as F
import pyspark.sql.types as T
df = df.withColumn("id", F.col("new_id").cast(T.StringType()))

To cast all columns, just apply the same call to each one.
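One way to apply this to every column is a loop over the column names. Since the question mentions that some columns are already strings, the sketch below narrows the loop to the columns that actually need a cast. It is an illustration under assumptions: the helper name non_string_columns and the sample schema are hypothetical, and the input is a list of (name, type name) pairs in the shape returned by PySpark's DataFrame.dtypes.

```python
def non_string_columns(dtypes):
    """Return the names of columns whose type is not already 'string'.

    `dtypes` is a list of (column name, type name) pairs, the shape that
    PySpark's DataFrame.dtypes property returns.
    """
    return [name for name, type_name in dtypes if type_name != "string"]


# Hypothetical schema resembling the mixed types described in the question.
sample_dtypes = [("a", "int"), ("b", "bigint"), ("c", "double"), ("d", "string")]
print(non_string_columns(sample_dtypes))  # ['a', 'b', 'c']

# Assumed usage against a real DataFrame `df`, following the withColumn
# pattern shown above (F and T are the aliases imported there):
#     for name in non_string_columns(df.dtypes):
#         df = df.withColumn(name, F.col(name).cast(T.StringType()))
```

Skipping columns that are already strings leaves the result unchanged and avoids adding a projection for them.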

【Comments】: