【Question title】: How do I clean data using pandas?
【Posted】: 2021-08-14 07:28:51
【Question】:

I need to replace ' \\n, *, ' with '\n *'. I tried df['Course_content']=df['Course_content'].replace(' \\n, *, ','\n *',regex=True), but it does not work for me:

>>> df['Course_content'][0]
'The syllabus for this course will cover the following:, \\n, *,  The nature and purpose of cost and management accounting, \\n, *,  Source documents and coding, \\n, *,  Cost classification and measuring, \\n, *,  Recording costs, \\n, *,  Spreadsheets'
>>> df['Course_content']=df['Course_content'].replace(' \\n, *,  ','\n *',regex=True)
>>> df['Course_content'][0]
'The syllabus for this course will cover the following:, \\n, *,  The nature and purpose of cost and management accounting, \\n, *,  Source documents and coding, \\n, *,  Cost classification and measuring, \\n, *,  Recording costs, \\n, *,  Spreadsheets'
>>>

I also tried the following code, but it does not work either:

d = {
'Not Mentioned':'',
"\r\n": "\n",
"\\r": "\n",
'\u00a0':' ',
' \\n, *,':  "\n * ",
' \\n,':'\n',
}
df=df.replace(d.keys(),d.values(),regex=True)

【Comments】:

    Tags: python regex pandas dataframe data-cleaning


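    【Solution 1】:

    You can write both arguments as raw strings (r-strings) and add a \ before the * in the first argument. This is necessary because * is a special metacharacter in regular expressions, so you have to escape it with an extra \ to match it literally. The same applies to the backslash in your data: \\ in the pattern matches one literal backslash, whereas an unescaped \n is read by the regex engine as a newline character, which your text does not contain.

    You can use: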

    df['Course_content'] = df['Course_content'].replace(r' \\n, \*,  ', r'\n *', regex=True) 
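To see why the escaping matters, the patterns can be tried directly with Python's `re` module, outside pandas. This is only a minimal sketch: the shortened `text` sample and the variable names are made up for illustration, and the two patterns are the ones from the question and from this answer.

```python
import re

# Shortened sample cell. In a normal (non-raw) literal, "\\n" is a backslash followed by "n".
text = "following:, \\n, *,  The nature, \\n, *,  Spreadsheets"

# Pattern from the question: the regex engine reads "\n" as a newline character
# and ", *," as "comma, zero or more spaces, comma", so it never matches this text.
unescaped = ' \\n, *,  '

# Pattern from the answer: "\\" matches one literal backslash, "\*" a literal asterisk.
escaped = r' \\n, \*,  '

print(re.search(unescaped, text))            # None
cleaned = re.sub(escaped, r'\n *', text)     # "\n" in the replacement becomes a real newline
print(repr(cleaned))

# Alternative: let re.escape build a pattern that matches the search text literally.
literal = re.escape(' \\n, *,  ')
cleaned_literal = re.sub(literal, '\n *', text)
print(cleaned_literal == cleaned)
```

If the text to find is always a fixed string, building the pattern with `re.escape` avoids having to remember which characters are special.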
    

    Demo:

    import pandas as pd

    data = {'Course_content': ['The syllabus for this course will cover the following:, \\n, *,  The nature and purpose of cost and management accounting, \\n, *,  Source documents and coding, \\n, *,  Cost classification and measuring, \\n, *,  Recording costs, \\n, *,  Spreadsheets']}
    df = pd.DataFrame(data)
    
    df['Course_content'] = df['Course_content'].replace(r' \\n, \*,  ', r'\n *', regex=True) 
    

    Result:

    print(repr(df['Course_content'][0]))
    
    
    'The syllabus for this course will cover the following:,\n *The nature and purpose of cost and management accounting,\n *Source documents and coding,\n *Cost classification and measuring,\n *Recording costs,\n *Spreadsheets'
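The dictionary-based attempt in the question runs into the same escaping problem, so the same fix should carry over. Below is a rough sketch of an ordered rule list applied with `re.sub`. The names `RULES` and `clean_text` are hypothetical, the rules are adapted from the question's dictionary (the ambiguous "\\r" entry is left out), and the `\s*` after the asterisk is an added assumption to absorb the extra spaces.

```python
import re

# Hypothetical ordered rules (pattern, replacement). The more specific
# " \n, *," pattern must run before the shorter " \n," pattern.
RULES = [
    (r"Not Mentioned", ""),
    ("\r\n", "\n"),                 # real CRLF line endings
    ("\u00a0", " "),                # non-breaking space
    (r" \\n, \*,\s*", "\n * "),     # literal backslash + n, then a literal asterisk
    (r" \\n,", "\n"),               # remaining literal backslash + n
]

def clean_text(text):
    """Apply every rule in order and return the cleaned string."""
    for pattern, replacement in RULES:
        # The replacements contain no backslashes, so re.sub inserts them as-is.
        text = re.sub(pattern, replacement, text)
    return text

sample = "Topics:, \\n, *,  Cost\u00a0classification, \\n, *,  Spreadsheets"
print(repr(clean_text(sample)))
```

On a DataFrame, a function like this could be applied per cell with `df['Course_content'].map(clean_text)`, assuming the column contains only strings.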
    
