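【Question title】: How to use regex replace to replace a special character?
【Posted】: 2020-03-24 06:45:50
【Question description】:

I am trying to use regex replace to replace "\" with \, but I have not found a working solution. I want to remove the double quotes that appear around the backslash. Could you help me with how to do this?

Example: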

"\""warfarin was discontinued 3 days ago and xarelto was started when the INR was 2.7, and now the INR is 5.8, should Xarelto be continued or stopped?"

Result:

\"warfarin was discontinued 3 days ago and xarelto was started when the INR was 2.7, and now the INR is 5.8, should Xarelto be continued or stopped?"

【Question discussion】:

  • Has your problem been solved?

Tags: regex dataframe pyspark regular-language


【Solution 1】:

Does this solve your problem?

re.sub(r'"\\"', r'\\', text)
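As a quick check in plain Python (outside Spark), the call above can be run on the example string from the question. This is only a sketch: the variable names are illustrative, and it uses Python's re module rather than Spark's regexp_replace.

```python
import re

# The example value from the question: "\""warfarin ...?"
text = ('"\\""warfarin was discontinued 3 days ago and xarelto was started '
        'when the INR was 2.7, and now the INR is 5.8, '
        'should Xarelto be continued or stopped?"')

# The raw-string pattern matches the three characters: quote, backslash, quote.
# The raw-string replacement \\ is read by re.sub as one literal backslash.
cleaned = re.sub(r'"\\"', r'\\', text)

print(cleaned)
```

The printed value begins with \"warfarin, which is the result the question asks for.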

【Discussion】:

  • Hi Alex, thanks for your reply. I am still facing the same problem. I used this: df = df.withColumn('QSTN', regexp_replace(col('QSTN'), '"\\"', '\\'))
  • To save the DataFrame I am using: df.repartition(1).write.format('com.databricks.spark.csv').mode('overwrite').save(output_path, escape= '\"', sep='|',header='True',nullValue=None)
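One possible reason the attempt in the comment above does not behave like Solution 1 is that it uses ordinary (non-raw) string literals, so Python consumes one level of backslashes before the regex engine sees the pattern. The sketch below uses Python's re module as a stand-in. Spark's regexp_replace uses Java regular expressions, which as far as I know treat these particular escapes the same way, but that equivalence is an assumption here. The variable names are illustrative.

```python
import re

# The literals from the comment, written without the r prefix.
attempted_pattern = '"\\"'      # evaluates to 3 characters: quote, backslash, quote
attempted_replacement = '\\'    # evaluates to 1 character: a lone backslash

sample = '"\\""warfarin'        # the characters "\""warfarin

# As a regex, quote-backslash-quote means "a quote followed by an escaped quote",
# so it matches two consecutive quotes instead of the quote-backslash-quote run.
matches = re.findall(attempted_pattern, sample)
print(matches)  # ['""']

# A lone backslash is an incomplete escape in a replacement string,
# so Python's re module rejects it.
try:
    re.sub(attempted_pattern, attempted_replacement, sample)
    error_message = None
except re.error as exc:
    error_message = str(exc)
print(error_message)
```

Doubling each backslash again, or switching to raw strings, avoids both problems.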
【Solution 2】:

Try the following solution:

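import pandas as pd
from pyspark.sql.functions import regexp_replace, col

# Show the full text column when the result is displayed through pandas
pd.set_option('max_colwidth', 200)

# Assumes an active SparkSession named `spark`
df = spark.createDataFrame([
    (1, '"\\""warfarin was discontinued 3 days ago and xarelto was started when the INR was 2.7, and now the INR is 5.8, should Xarelto be continued or stopped?"')
], ("ID","textVal"))

# The pattern literal evaluates to the regex \"\\" (quote, backslash, quote);
# the replacement literal evaluates to \\, which yields a single backslash
df2 = df.withColumn('textVal', regexp_replace(col('textVal'), '\\"\\\\\"', '\\\\'))
df2.toPandas()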


ID  textVal
0   1   \"warfarin was discontinued 3 days ago and xarelto was started when the INR was 2.7, and now the INR is 5.8, should Xarelto be continued or stopped?"

Hope this helps!
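The pattern and replacement in this solution pass through two layers of escaping: first the Python string literal, then the regex engine. The sketch below decodes both layers using Python's re module as a stand-in for Spark's Java regex engine. For these particular escapes the two should agree, but treat that as an assumption. The shortened sample text and variable names are illustrative.

```python
import re

pattern = '\\"\\\\\"'   # Python literal; evaluates to the 5 characters  \"\\"
replacement = '\\\\'    # Python literal; evaluates to the 2 characters  \\

# The same values written as raw strings, which are easier to read.
assert pattern == r'\"\\"'
assert replacement == r'\\'

# Regex layer: \" matches a quote, \\ matches a backslash, " matches a quote.
# The replacement \\ is emitted as one literal backslash.
text = '"\\""warfarin was discontinued 3 days ago"'
result = re.sub(pattern, replacement, text)

print(pattern)      # \"\\"
print(replacement)  # \\
print(result)       # \"warfarin was discontinued 3 days ago"
```

Writing the arguments as raw strings keeps only the regex layer of escaping in view, which makes this kind of pattern easier to check by eye.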

【Discussion】: