【Posted】: 2020-07-29 04:36:19
【Question】:
I have a use case with the following data:
1. "apple+case"
2. "apple+case+10+cover"
3. "apple+case+10++cover"
4. "+apple"
5. "iphone8+"
Currently, I replace + with a space like this:
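import org.apache.spark.sql.functions.udf

// Assumed definition; BLANK_SPACE is not shown in the original snippet.
val BLANK_SPACE = " "

// Replaces every "+" in the value with a space; passes null through unchanged.
def normalizer(value: String): String = {
  if (value == null) {
    null
  } else {
    value.replaceAll("\\+", BLANK_SPACE)
  }
}

val testUDF = udf(normalizer(_: String): String)
df.withColumn("newCol", testUDF($"value"))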
But this replaces every "+". How can I replace only the "+" that sits between strings, while also handling this case: "apple+case+10++cover" => "apple case 10+ cover"?
The expected output is:
1. "apple case"
2. "apple case 10 cover"
3. "apple case 10+ cover"
4. "apple"
5. "iphone8+"
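One possible approach, sketched here in plain Scala, is a lookahead: replace a "+" only when the next character exists and is not another "+", then trim the space left behind by a leading "+". The names `PlusNormalizer` and `normalizePlus` are hypothetical, and treating a leading "+" as something to drop (rather than turn into a space) is an assumption inferred from case 4 above.

```scala
object PlusNormalizer {
  // Matches a "+" that is immediately followed by some character other than "+".
  // A trailing "+" (nothing follows) and the first "+" of "++" are left alone.
  private val SeparatorPlus = "\\+(?=[^+])"

  // Null-safe so it can be wrapped in a Spark UDF like the original normalizer.
  def normalizePlus(value: String): String =
    if (value == null) null
    else value.replaceAll(SeparatorPlus, " ").trim // trim drops the space left by a leading "+"

  def main(args: Array[String]): Unit = {
    val samples = Seq(
      "apple+case",
      "apple+case+10+cover",
      "apple+case+10++cover",
      "+apple",
      "iphone8+"
    )
    // Print each input next to its normalized form.
    samples.foreach(s => println(s"$s => ${normalizePlus(s)}"))
  }
}
```

The function can be registered the same way as in the question, for example `udf(PlusNormalizer.normalizePlus(_: String): String)`. Since Spark's built-in `regexp_replace` and `trim` column functions use the same Java regex syntax, the same pattern should also work without a UDF. Note that `trim` removes any other leading or trailing whitespace in the value as well.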
【Discussion】:
Tags: regex scala apache-spark regex-lookarounds regexp-replace