Clean Data - how to remove the slash (/) between two words and the brackets () [duplicate] Answers

【Question title】: Clean Data - how to remove slash(/) between two words and the Bracket () [duplicate]
【Posted】: 2020-03-06 16:04:22
【Question description】:

I'm still quite new to programming and Python. I have a list of strings:

['Iraqi', 'Freedom/Operation', 'New', 'Dawn', 'and', 'Operation', 'Enduring',
 'Freedom', '(Afghanistan),', 'have', '(other', 'than', 'call', 'publications)']

How can I remove every slash between two words, as well as the parentheses around any word? The clean data should be:

['Iraqi', 'Freedom', 'Operation', 'New', 'Dawn', 'and', 'Operation', 'Enduring',
 'Freedom', 'Afghanistan,', 'have', 'other', 'than', 'call', 'publications']

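【Question discussion】:

Welcome to Stack Overflow! Based on your own research, it would help to know what you have tried so far: re.sub, str.replace, etc.? Please note that on Stack Overflow we ask for a minimal reproducible example.

Tags: python regex python-3.x string data-cleaning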

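【Solution 1】:

You can try this.

\w+ matches one or more word characters. For ASCII text a word character is [a-zA-Z0-9_]; in Python 3 str patterns it also includes Unicode letters and digits. Since '/', '(', ')' and ',' are not word characters, they act as separators and are dropped.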

import re

lst = ['Iraqi', 'Freedom/Operation', 'New', 'Dawn', 'and', 'Operation', 'Enduring',
       'Freedom', '(Afghanistan),', 'have', '(other', 'than', 'call', 'publications)']

new = re.findall(r'\w+', ' '.join(lst))

Output:

['Iraqi', 'Freedom', 'Operation', 'New', 'Dawn', 'and', 'Operation', 'Enduring',
 'Freedom', 'Afghanistan', 'have', 'other', 'than', 'call', 'publications']
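Because \w+ discards all punctuation, the comma after Afghanistan is lost, so the result differs slightly from the expected output in the question. A minimal sketch that keeps the comma, assuming that only parentheses should be deleted and that only whitespace and '/' separate words: delete the parentheses with re.sub first, then split with re.split.

```python
import re

lst = ['Iraqi', 'Freedom/Operation', 'New', 'Dawn', 'and', 'Operation', 'Enduring',
       'Freedom', '(Afghanistan),', 'have', '(other', 'than', 'call', 'publications)']

# Join everything into one string, then delete every '(' and ')'
text = re.sub(r'[()]', '', ' '.join(lst))

# Split on runs of whitespace or '/', so 'Freedom/Operation' becomes two words
# while other punctuation (such as the comma) stays attached to its word
new = re.split(r'[\s/]+', text)

print(new)
```

If an element could start or end with a slash, re.split would produce empty strings at the edges; filtering them with [w for w in new if w] handles that case.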

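Without re, you can use str.split() and str.strip(). The comma must be in the strip set too, because in '(Afghanistan),' it sits after the closing parenthesis (so the comma is removed as well):

[i.strip('(),') for s in lst for i in s.split('/')]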
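str.strip(chars) only removes characters from the two ends of a string and stops at the first character that is not in chars, whereas str.replace removes matches anywhere. A small illustration on the one problematic element:

```python
word = '(Afghanistan),'

# ',' is the last character and is not in '()', so stripping stops there:
# only the leading '(' is removed -> 'Afghanistan),'
only_parens = word.strip('()')

# With ',' in the set, both trailing characters are removed -> 'Afghanistan'
parens_and_comma = word.strip('(),')

# str.replace deletes parentheses anywhere and keeps the comma -> 'Afghanistan,'
replaced = word.replace('(', '').replace(')', '')

print(only_parens, parens_and_comma, replaced)
```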

【Discussion】:

【Solution 2】:

Let me give your list a name:

a = ['Iraqi', 'Freedom/Operation', 'New', 'Dawn', 'and', 'Operation', 'Enduring',
 'Freedom', '(Afghanistan),', 'have', '(other', 'than', 'call', 'publications)']

First, to split every element on the slash, you can do

c = [j for elem in a for j in elem.split("/")]

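Alternatively, both steps (deleting the parentheses, described below, and splitting on the slash) can be done in one line:

import re

c = [j for elem in a for j in re.sub(r'[()]', "", elem).split("/")]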

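Second, suppose you want to remove a set of characters, e.g. ['(', ')'], from every element of the list.

For that, you can build a regular expression with a character class:

d = [re.sub(r'[()]', "", elem) for elem in c]

The result is: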

['Iraqi', 'Freedom', 'Operation', 'New', 'Dawn', 'and', 'Operation', 
'Enduring', 'Freedom', 'Afghanistan,', 'have', 'other', 'than', 'call', 'publications']
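If you would rather not use a regular expression for the character-removal step, str.translate with a table from str.maketrans deletes every character given as the third argument. A sketch that reproduces the two-step result above, assuming the same list a:

```python
a = ['Iraqi', 'Freedom/Operation', 'New', 'Dawn', 'and', 'Operation', 'Enduring',
     'Freedom', '(Afghanistan),', 'have', '(other', 'than', 'call', 'publications)']

# Step 1: split every element on '/' and flatten the pieces into one list
c = [j for elem in a for j in elem.split('/')]

# Step 2: a translation table that maps '(' and ')' to None, i.e. deletes them
remove_parens = str.maketrans('', '', '()')
d = [elem.translate(remove_parens) for elem in c]

print(d)
```

Adding more characters to the third argument of str.maketrans (for example '(),') extends the set to delete without touching the rest of the code.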

【Discussion】:

【Solution 3】:

Please take a look at this.

data_list = ['Iraqi', 'Freedom/Operation', 'New', 'Dawn', 'and', 'Operation', 'Enduring',
 'Freedom', '(Afghanistan),', 'have', '(other', 'than', 'call', 'publications)']

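out_put_list = []
for data in data_list:
    # Remove the parentheses in both branches, so an element such as
    # '(Iraqi/Freedom)' is also cleaned before it is split
    cleaned = data.replace('(', '').replace(')', '')
    if '/' in cleaned:
        out_put_list.extend(cleaned.split("/"))
    else:
        out_put_list.append(cleaned)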

print(out_put_list)

【Discussion】:

【Solution 4】:

Using list comprehensions:

a = ['Iraqi', 'Freedom/Operation', 'New', 'Dawn', 'and', 'Operation', 'Enduring',
 'Freedom', '(Afghanistan),', 'have', '(other', 'than', 'call', 'publications)']


b = [ i.split('/') for i in a]
b = [ i for row in b for i in row]
b = [ i.strip().strip(',').strip('(').strip(')') for i in b]

print(b)
['Iraqi', 'Freedom', 'Operation', 'New', 'Dawn',
 'and', 'Operation', 'Enduring', 'Freedom',
 'Afghanistan', 'have', 'other', 'than',
 'call', 'publications']
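The first two comprehensions above split each element and then flatten the resulting list of lists. The standard library's itertools.chain.from_iterable can do the flattening directly. In this sketch the chained strip calls are also replaced by a single strip(' ,()'), which removes any mix of spaces, commas and parentheses from both ends (unlike the chained calls, it does not depend on the order in which those characters appear):

```python
from itertools import chain

a = ['Iraqi', 'Freedom/Operation', 'New', 'Dawn', 'and', 'Operation', 'Enduring',
     'Freedom', '(Afghanistan),', 'have', '(other', 'than', 'call', 'publications)']

# Lazily chain together the pieces produced by splitting each element on '/'
pieces = chain.from_iterable(elem.split('/') for elem in a)

# Strip any combination of spaces, commas and parentheses from both ends
b = [p.strip(' ,()') for p in pieces]

print(b)
```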

【Discussion】: