【Problem Title】: Replacing special character "." with "-" in dataframe column name in scala
【Posted】: 2020-12-15 07:33:19
【Problem Description】:

I want to rename the column "BMU 1 Cell 1 Temp. (C)" to "BMU_1_Cell_1_Temp_C" and cast the column type to double.

I tried the following options:

  1. Wrapping the column name in backticks
val df= df1.withColumn("`BMU 1 Cell 1 Temp. (C)`",col("`BMU 1 Cell 1 Temp. (C)`").cast("Double")).withColumnRenamed("`BMU 1 Cell 1 Temp. (C)`","BMU_1_Cell_1_Temp_C")
val df= df1.withColumn("BMU 1 Cell 1 Temp. (C)",col("BMU 1 Cell 1 Temp. (C)").cast("Double")).withColumnRenamed("BMU 1 Cell 1 Temp. (C)","BMU_1_Cell_1_Temp_C").replaceAll("\\.","_"))

I get the following error:

org.apache.spark.sql.AnalysisException: cannot resolve '`BMU 1 PCB Temp. (C)`'

Can you help me resolve this?

【Question Discussion】:

Tags: scala dataframe apache-spark databricks


    【Solution 1】:

    Tested on my Spark 3.0.0:

    val df = spark.createDataFrame(Seq(("1", "123.456"))).toDF("id", "BMU 1 Cell 1 Temp. (C)")
    
    df.withColumnRenamed("BMU 1 Cell 1 Temp. (C)", "BMU_1_Cell_1_Temp_C")
      .withColumn("BMU_1_Cell_1_Temp_C", $"BMU_1_Cell_1_Temp_C".cast("double")).show
    
    +---+-------------------+
    | id|BMU_1_Cell_1_Temp_C|
    +---+-------------------+
    |  1|            123.456|
    +---+-------------------+
    

    This also works, and handles every column at once:

    val df = spark.createDataFrame(Seq(("1", "123.456"))).toDF("id", "BMU 1 Cell 1 Temp. (C)")
    
    val cols = df.columns.map(c => c.replaceAll("([.] )|[ ]", "_").replaceAll("[()]", ""))
    
    df.toDF(cols: _*).withColumn("BMU_1_Cell_1_Temp_C", $"BMU_1_Cell_1_Temp_C".cast("double")).show()
    
    +---+-------------------+
    | id|BMU_1_Cell_1_Temp_C|
    +---+-------------------+
    |  1|            123.456|
    +---+-------------------+
    
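    If other columns contain different special characters, a more general sanitizer can be sketched in plain Scala (an assumption on my part, not from the answer above: collapse every run of non-alphanumeric characters into a single underscore, then trim leading/trailing underscores):

    ```scala
    // Hypothetical helper: normalize a column name by collapsing runs of
    // non-alphanumeric characters (spaces, dots, parentheses, ...) into "_"
    // and stripping any underscores left at the ends.
    def sanitize(name: String): String =
      name.replaceAll("[^A-Za-z0-9]+", "_").replaceAll("^_+|_+$", "")

    sanitize("BMU 1 Cell 1 Temp. (C)")  // "BMU_1_Cell_1_Temp_C"
    ```

    It could then be applied to every column before casting, e.g. `df.toDF(df.columns.map(sanitize): _*)`, so no column name needs backtick quoting afterwards.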

    【Discussion】:
