How to use regex replace to replace a special character? Answers

[Question title]: How to use regex replace to replace a special character?
[Posted]: 2020-03-24 06:45:50
[Question]:

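I am trying to use regex replace to replace "\" (double quote, backslash, double quote) with a plain \, but I am not getting the correct result. I want to remove the double quotes that appear around the backslash. Can you help me with how to do that?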

Example:

"\""warfarin was discontinued 3 days ago and xarelto was started when the INR was 2.7, and now the INR is 5.8, should Xarelto be continued or stopped?"

Expected result:

\"warfarin was discontinued 3 days ago and xarelto was started when the INR was 2.7, and now the INR is 5.8, should Xarelto be continued or stopped?"

[Comments]:

Has your problem been solved?

Tags: regex dataframe pyspark regular-language

[Solution 1]:

Does this solve your problem?

re.sub(r'"\\"', r'\\', text)
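As a minimal sketch, the call above can be tried on a shortened copy of the sample text with Python's built-in re module. The variable names text and cleaned are illustrative, and the sample sentence is truncated for brevity.

```python
import re

# Shortened sample from the question: the value starts with
# quote, backslash, quote, quote.
text = '"\\""warfarin was discontinued 3 days ago"'

# Pattern r'"\\"' matches a double quote, a literal backslash, a double quote.
# Replacement r'\\' is an escaped backslash, so one backslash is written.
cleaned = re.sub(r'"\\"', r'\\', text)

print(cleaned)  # \"warfarin was discontinued 3 days ago"
```

Raw strings keep the backslashes exactly as typed, so only the regex-level escaping has to be considered.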

[Comments]:

Hi Alex, thanks for your reply. I am still facing the same issue. I am using this - df = df.withColumn('QSTN', regexp_replace(col('QSTN'), '"\\"', '\\'))
To save the DataFrame I am using - df.repartition(1).write.format('com.databricks.spark.csv').mode('overwrite').save(output_path, escape= '\"', sep='|',header='True',nullValue=None)
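
One possible reason the attempt in this comment still fails is that the non-raw literal '"\\"' loses one level of escaping before the regex engine sees it. The sketch below illustrates this with Python's re module as a stand-in for the Java regex engine behind Spark's regexp_replace. That substitution is an assumption: the two engines are expected to treat these particular escapes alike, but this was not run against Spark. The names attempted and intended are illustrative.

```python
import re

# Shortened sample: quote, backslash, quote, quote, then the text.
text = '"\\""warfarin was discontinued"'

# The Python literal '"\\"' holds only three characters: quote, backslash, quote.
attempted = '"\\"'
print(len(attempted))  # 3

# The regex engine reads backslash-quote as an escaped quote, so this pattern
# means "two consecutive double quotes", not quote-backslash-quote.
m = re.search(attempted, text)
print(m.group(0), m.span())  # "" (2, 4)

# Doubling the backslash at the regex level gives the intended pattern.
intended = '"\\\\"'  # regex sees: quote, escaped backslash, quote
m2 = re.search(intended, text)
print(m2.group(0), m2.span())  # "\" (0, 3)
```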

[Solution 2]:

Try the following solution:

import pandas as pd
from pyspark.sql.functions import regexp_replace, col

# Show long text values in full when the result is displayed
pd.set_option('max_colwidth', 200)

# Assumes an active SparkSession named `spark`
df = spark.createDataFrame([
    (1, '"\\""warfarin was discontinued 3 days ago and xarelto was started when the INR was 2.7, and now the INR is 5.8, should Xarelto be continued or stopped?"')
], ("ID","textVal"))

# The regex \"\\" matches quote, backslash, quote; the replacement \\ writes one backslash
df2 = df.withColumn('textVal', regexp_replace(col('textVal'), '\\"\\\\\"', '\\\\'))
df2.toPandas()


ID  textVal
0   1   \"warfarin was discontinued 3 days ago and xarelto was started when the INR was 2.7, and now the INR is 5.8, should Xarelto be continued or stopped?"
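
The heavily escaped literals in this solution are easier to read once the Python string layer is peeled off. The sketch below prints what the regex engine actually receives and compares it with raw-string spellings. It uses Python's re module rather than Spark, on the assumption that both engines handle these escapes the same way, and the names sample and result are illustrative.

```python
import re

# Literals from the solution, as Python decodes them.
pattern = '\\"\\\\\"'
replacement = '\\\\'

print(pattern)      # \"\\"
print(replacement)  # \\

# Raw-string spellings of the same characters.
print(pattern == r'\"\\"')    # True
print(replacement == r'\\')   # True

# Shortened sample: quote, backslash, quote, quote, then the text.
sample = '"\\""warfarin"'
result = re.sub(pattern, replacement, sample)
print(result)  # \"warfarin"
```

Writing the arguments as raw strings removes one of the two escaping layers, which makes a miscounted backslash less likely.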

Hope this helps!

[Comments]: