【Question title】:Clean Data - how to remove slash(/) between two words and the Bracket () [duplicate]
【Posted】:2020-03-06 16:04:22
【Question】:

I am still very new to programming and Python. I have a list of strings:

['Iraqi', 'Freedom/Operation', 'New', 'Dawn', 'and', 'Operation', 'Enduring',
 'Freedom', '(Afghanistan),', 'have', '(other', 'than', 'call', 'publications)']

How can I split on every slash between two words and remove the parentheses attached to any word? The cleaned data should be:

['Iraqi', 'Freedom', 'Operation', 'New', 'Dawn', 'and', 'Operation', 'Enduring',
 'Freedom', 'Afghanistan,', 'have', 'other', 'than', 'call', 'publications']

【Comments】:

  • Welcome to Stack Overflow! It would help to know what you have tried so far based on your own research: re.sub, str.replace, etc.? Note that on Stack Overflow we ask for a minimal reproducible example.

Tags: python regex python-3.x string data-cleaning


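【Solution 1】:

You can try this.

\w+ matches one or more word characters (for ASCII text, equivalent to [a-zA-Z0-9_]+).

import re

lst=['Iraqi', 'Freedom/Operation', 'New', 'Dawn', 'and', 'Operation', 'Enduring',
 'Freedom', '(Afghanistan),', 'have', '(other', 'than', 'call', 'publications)']

# Join the tokens into one string, then pull out every run of word characters.
new = re.findall(r'\w+', ' '.join(lst))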

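Output (note that \w+ also drops the comma after Afghanistan, unlike the output requested in the question):

['Iraqi', 'Freedom', 'Operation', 'New', 'Dawn', 'and', 'Operation', 'Enduring',
 'Freedom', 'Afghanistan', 'have', 'other', 'than', 'call', 'publications']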
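
If the comma should survive, as in the output the question asks for, one possible variation is to delete the parentheses first and then split on whitespace and slashes. This is only a sketch under the assumption that no element starts or ends with a slash; the names `joined`, `no_parens`, and `kept_punct` are illustrative and not part of the original answer.

```python
import re

lst = ['Iraqi', 'Freedom/Operation', 'New', 'Dawn', 'and', 'Operation', 'Enduring',
       'Freedom', '(Afghanistan),', 'have', '(other', 'than', 'call', 'publications)']

# Join the tokens into a single string.
joined = ' '.join(lst)

# Delete every parenthesis, wherever it appears.
no_parens = re.sub(r'[()]', '', joined)

# Split on runs of whitespace or slashes; other punctuation (the comma) is kept.
kept_punct = re.split(r'[\s/]+', no_parens)

print(kept_punct)
```

Compared with `re.findall(r'\w+', ...)`, this keeps `'Afghanistan,'` intact because only parentheses are removed and only whitespace and slashes act as separators.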

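Without re, you can use str.strip() and str.split(). Note that str.strip() only removes characters from the two ends of a string, so '(Afghanistan),' comes out as 'Afghanistan),' because the trailing comma shields the closing parenthesis:

[i.strip('()') for s in lst for i in s.split('/')]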
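
A regex-free alternative that removes parentheses anywhere in a token, not just at its ends, is `str.translate` with a deletion table. The following is a minimal sketch rather than the answerer's method; `drop_parens` and `cleaned` are names chosen here for illustration.

```python
lst = ['Iraqi', 'Freedom/Operation', 'New', 'Dawn', 'and', 'Operation', 'Enduring',
       'Freedom', '(Afghanistan),', 'have', '(other', 'than', 'call', 'publications)']

# Translation table that maps no characters but deletes '(' and ')'.
drop_parens = str.maketrans('', '', '()')

# Remove parentheses from each element, then split it on '/'.
cleaned = [part for s in lst for part in s.translate(drop_parens).split('/')]

print(cleaned)
```

Because the parentheses are deleted rather than stripped, the comma in `'(Afghanistan),'` no longer gets in the way and the element becomes `'Afghanistan,'`.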

【Comments】:

    【Solution 2】:

    Let me give your list a name:

    import re  # needed for the re.sub calls below

    a = ['Iraqi', 'Freedom/Operation', 'New', 'Dawn', 'and', 'Operation', 'Enduring',
     'Freedom', '(Afghanistan),', 'have', '(other', 'than', 'call', 'publications)']
    

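    First, split every element on the slash. You can do this with

    c = [j for elem in a for j in elem.split("/")]
    
    Both steps (splitting and the parenthesis removal described next) can also be done in one pass:
    

    c = [j for elem in a for j in re.sub(r'[()]', "", elem).split("/")]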

    
    

    Second, suppose you want to remove a set of characters, such as ['(', ')'], from every element of the list.

    For that you can build a regular expression:

    d = [re.sub(r'[()]', "", elem) for elem in c]
    

    The result is

    ['Iraqi', 'Freedom', 'Operation', 'New', 'Dawn', 'and', 'Operation', 
    'Enduring', 'Freedom', 'Afghanistan,', 'have', 'other', 'than', 'call', 'publications']
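
The two steps above can be wrapped in a small reusable function. This is a sketch only: `clean_tokens` and its parameters `split_on` and `remove` are hypothetical names introduced here, and it assumes `remove` is a non-empty string of single characters.

```python
import re


def clean_tokens(tokens, split_on='/', remove='()'):
    """Delete every character in `remove`, then split each token on `split_on`."""
    # re.escape keeps characters such as '(' or ']' literal inside the class.
    remove_pattern = re.compile('[' + re.escape(remove) + ']')
    return [
        part
        for token in tokens
        for part in remove_pattern.sub('', token).split(split_on)
        if part  # skip empty pieces, e.g. the one after a trailing slash
    ]


a = ['Iraqi', 'Freedom/Operation', 'New', 'Dawn', 'and', 'Operation', 'Enduring',
     'Freedom', '(Afghanistan),', 'have', '(other', 'than', 'call', 'publications)']

result = clean_tokens(a)
print(result)
```

Filtering out empty pieces is a design choice made for this sketch; drop the `if part` clause if empty strings should be preserved.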
    

    【Comments】:

      【Solution 3】:

      Please take a look at this.

      data_list = ['Iraqi', 'Freedom/Operation', 'New', 'Dawn', 'and', 'Operation', 'Enduring',
       'Freedom', '(Afghanistan),', 'have', '(other', 'than', 'call', 'publications)']
      
      out_put_list = []
      for data in data_list:
          # Drop parentheses first, then split on '/' (a no-op when there is no slash),
          # so tokens such as '(a/b)' are handled too.
          cleaned = data.replace('(', '').replace(')', '')
          out_put_list.extend(cleaned.split("/"))
      
      print(out_put_list)
      

      【Comments】:

        【Solution 4】:

        Using list comprehensions:

        a = ['Iraqi', 'Freedom/Operation', 'New', 'Dawn', 'and', 'Operation', 'Enduring',
         'Freedom', '(Afghanistan),', 'have', '(other', 'than', 'call', 'publications)']
        
        
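        b = [ i.split('/') for i in a]      # split each element on '/'
        b = [ i for row in b for i in row]  # flatten the nested lists
        # Trim whitespace, commas and parentheses from both ends of each token.
        b = [ i.strip().strip(',').strip('(').strip(')') for i in b]
        
        print(b)
        # Output (the trailing comma after Afghanistan is removed as well):
        # ['Iraqi', 'Freedom', 'Operation', 'New', 'Dawn',
        #  'and', 'Operation', 'Enduring', 'Freedom',
        #  'Afghanistan', 'have', 'other', 'than',
        #  'call', 'publications']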
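
One caveat with chained `strip` calls is that each call only looks at the current ends of the string, so the result depends on the order of the calls. A single `strip` with all the characters in one set avoids this. The sketch below uses a made-up token, `'(Kabul,)'`, that does not appear in the question's data.

```python
token = '(Kabul,)'  # hypothetical token, not from the original list

# Chained calls: the comma is hidden behind ')' when strip(',') runs,
# so it is still there after the parentheses have been stripped.
chained = token.strip().strip(',').strip('(').strip(')')

# One call with a combined character set keeps stripping until it
# reaches a character outside the set. Note that ' ' covers spaces
# only, whereas a bare strip() removes all whitespace.
combined = token.strip(' ,()')

print(chained)   # Kabul,
print(combined)  # Kabul
```

For the list in the question both forms give the same result; the difference only shows up when the unwanted characters are interleaved in another order.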
        

        【Comments】:
