【Question title】: Pandas remove characters between brackets [duplicate]
【Posted】: 2019-11-14 04:07:48
【Question】:

I want to remove the characters between [], and currently I am doing

df['Text'] = df['Text'].str.replace(r"\[.*\]","")

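But the output is not what I want. Before, it was [image] This document; after, it is ******* This document, where * is a space.

How do I get rid of this whitespace?

Edit 1

The Text column of df looks like this: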

ID    Text
0     REAL ESTATE LEASE THIS INDUSTRIAL REAL ESTAT...
5     Lease AureementMade and signed on the \ of Aug...
6     FIRST AMENDMENT OF LEASEDATE: August 31, 2001L...
8     [image: image0.jpg] Jack[image: image1.jb2] ...
9     [image: image0.jpg] ABC SALES Meeting 97...
14    FIRST AMENDMENT OF LEASETHIS FIRST AMENDMENT O...
17    [image: image0.tif] Deep ML LEASE SERVI...
22    [image: image0.jpg] F 15 083 EX [image: image1...
26    LEASE AGREEMENT—GROSS LEASEBASIC LEASE PROVISI...
28    [image: image0.jpg] 17. Medical VERIFICATION...
31    [image: image0.jpg]  [image: image1.jb2] PLL 3...
32    SUBLEASETHIS SUBLEASE this “Sublease” made as ...
34    [image: image0.tif] Lease Agreement May 10, 20...
35    13057968.3  1 Initials:  _____  _____  SECOND ...
42    [image: image0.jpg] Jack Dowson Buy Real MI...
46     Deep – Machine Learning LEASE   B...

I would like to see

ID    Text
0     REAL ESTATE LEASE THIS INDUSTRIAL REAL ESTAT...
5     Lease AureementMade and signed on the \ of Aug...
6     FIRST AMENDMENT OF LEASEDATE: August 31, 2001L...
8     Jack ...
9     ABC SALES Meeting 97...
14    FIRST AMENDMENT OF LEASETHIS FIRST AMENDMENT O...
17    Deep ML LEASE SERVI...
22    F 15 083 EX ...
26    LEASE AGREEMENT—GROSS LEASEBASIC LEASE PROVISI...
28    17. Medical VERIFICATION...
31    PLL 3...
32    SUBLEASETHIS SUBLEASE this “Sublease” made as ...
34    Lease Agreement May 10, 20...
35    13057968.3  1 Initials:  _____  _____  SECOND ...
42    Jack Dowson Buy Real MI...
46    Deep – Machine Learning LEASE   B...

【Comments】:

Tags: python regex pandas


【Solution 1】:

It looks like you need .str.strip()

For example:

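import pandas as pd
df = pd.DataFrame({"ID": [1,2,3], "Text": ["[image: 123.jpg] This document", "[image: image.jpg] Readers of the article", "The agreement between [image: image.jpg] two parties"]})
# regex=True is needed on pandas 2.0+, where str.replace treats the pattern as a literal string by default
df["Text"] = df["Text"].str.replace(r"(\s*\[.*?\]\s*)", " ", regex=True).str.strip()
print(df["Text"])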

Output:

0                        This document
1               Readers of the article
2    The agreement between two parties
Name: Text, dtype: object

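【Comments】:

  • Note that there are two spaces between the words "between" and "two", so this proposal does not work. str.strip() removes leading and trailing whitespace from the whole text, not before/after each match.
  • @Valdi_Bo. Thanks, I did not see that.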
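
The point raised in the comments can be checked with a small sketch. It assumes a one-row Series built from the third sample string in the answer; the variable names tag_only and with_spaces are made up for illustration. Removing only the bracketed tag leaves the spaces on both sides of it, and .str.strip() does not touch them because they are in the middle of the string.

```python
import pandas as pd

s = pd.Series(["The agreement between [image: image.jpg] two parties"])

# Remove only the bracketed tag: the space before it and the space
# after it both survive, and strip() only trims the ends of the string.
tag_only = s.str.replace(r"\[.*?\]", "", regex=True).str.strip()

# Consume the whitespace around the tag in the pattern itself and put
# back a single space; strip() then handles a tag at the very start or end.
with_spaces = s.str.replace(r"\s*\[.*?\]\s*", " ", regex=True).str.strip()

print(repr(tag_only[0]))     # two spaces between "between" and "two"
print(repr(with_spaces[0]))  # a single space
```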
【Solution 2】:

Add an optional space ( ?) to your regex, so the whole regex (the matching part) should be:

r'\[.*\] ?'

Another tip: your regex is enclosed in parentheses (a capturing group). They are not needed, so remove them.
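
A minimal sketch of this suggestion follows, with one caveat. The two sample rows are hypothetical, loosely modelled on the question's data, and the names greedy and lazy are illustrative only. The pattern \[.*\] is greedy, so on a row with more than one bracketed tag it matches from the first [ to the last ] and removes the text in between as well. The non-greedy variant \[.*?\] stops at the first closing bracket.

```python
import pandas as pd

# Hypothetical sample rows; the second one contains two bracketed tags.
s = pd.Series([
    "[image] This document",
    "[image: image0.jpg] Jack [image: image1.jb2] Dowson",
])

# Pattern from the answer: greedy match plus an optional trailing space.
greedy = s.str.replace(r"\[.*\] ?", "", regex=True)

# Same idea with a non-greedy quantifier, so each tag is matched separately.
lazy = s.str.replace(r"\[.*?\] ?", "", regex=True)

print(greedy.tolist())
print(lazy.tolist())
```

With a single tag per row both patterns give the same result; they differ only on rows like the second one, where the greedy version also drops "Jack".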

【Comments】:
