[Posted]: 2020-12-29 01:16:25
[Problem description]:
My input Spark DataFrame, named df:
+---------------+----------------+-----------------------+
|Main_CustomerID|126+ Concentrate|2.5 Ethylhexyl_Acrylate|
+---------------+----------------+-----------------------+
| 725153| 3.0| 2.0|
| 873008| 4.0| 1.0|
| 625109| 1.0| 0.0|
+---------------+----------------+-----------------------+
I need to remove the special characters from the column names of df as follows:

- remove the plus sign
- replace spaces with underscores
- replace dots with underscores

So my df should look like this:
+---------------+---------------+-----------------------+
|Main_CustomerID|126_Concentrate|2_5_Ethylhexyl_Acrylate|
+---------------+---------------+-----------------------+
| 725153| 3.0| 2.0|
| 873008| 4.0| 1.0|
| 625109| 1.0| 0.0|
+---------------+---------------+-----------------------+
Using Scala, I achieved this with:
var tableWithColumnsRenamed = df
for (field <- tableWithColumnsRenamed.columns) {
  tableWithColumnsRenamed = tableWithColumnsRenamed
    .withColumnRenamed(field, field.replaceAll("\\.", "_"))
}
for (field <- tableWithColumnsRenamed.columns) {
  tableWithColumnsRenamed = tableWithColumnsRenamed
    .withColumnRenamed(field, field.replaceAll("\\+", ""))
}
for (field <- tableWithColumnsRenamed.columns) {
  tableWithColumnsRenamed = tableWithColumnsRenamed
    .withColumnRenamed(field, field.replaceAll(" ", "_"))
}
df = tableWithColumnsRenamed
But when I use:
for (field <- tableWithColumnsRenamed.columns) {
  tableWithColumnsRenamed = tableWithColumnsRenamed
    .withColumnRenamed(field, field.replaceAll("\\.", "_"))
    .withColumnRenamed(field, field.replaceAll("\\+", ""))
    .withColumnRenamed(field, field.replaceAll(" ", "_"))
}
I get 126 Concentrate as the first column name instead of 126_Concentrate. This happens because every chained withColumnRenamed call looks up the original field name: once the second call has renamed 126+ Concentrate to 126 Concentrate, the third call targets a column that no longer exists and is silently ignored, so the space is never replaced.
But I don't like using 3 for loops just for these replacements. Is there a better solution?
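One way to collapse the three loops, as a sketch: apply all three replacements to each name in a single helper, then rename every column at once with toDF. The helper name cleanName and the value renamed are my own labels for illustration; df is the DataFrame from the question.

```scala
// Normalize one column name in a single pass over the string:
// dot -> underscore, drop the plus sign, space -> underscore.
def cleanName(name: String): String =
  name
    .replaceAll("\\.", "_") // dot -> underscore
    .replaceAll("\\+", "")  // remove plus
    .replaceAll(" ", "_")   // space -> underscore

// On the DataFrame from the question, rename all columns at once:
// val renamed = df.toDF(df.columns.map(cleanName): _*)
```

Because toDF rebuilds the schema from the full list of new names, there is no risk of a later rename referring to a column name that has already changed, which is the pitfall of the chained withColumnRenamed version.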
[Discussion]:
Tags: scala replace apache-spark-sql