【Question title】: Better way to remove multiple words from a string?
【Posted】: 2015-09-25 05:46:22
【Question description】:
bannedWord = ["Good", "Bad", "Ugly"]
    
def RemoveBannedWords(toPrint, database):
    statement = toPrint
    # Iterate over the list that was passed in rather than the global bannedWord
    for word in database:
        if word in statement:
            statement = statement.replace(word + " ", "")
    return statement
        
toPrint = "Hello Ugly Guy, Good To See You."
    
print(RemoveBannedWords(toPrint, bannedWord))

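The output is Hello Guy, To See You. Knowing Python, I feel there is a better way to remove several words from a string. I searched for similar solutions that use dictionaries, but they did not seem to fit this case.

【Comments】:

    Tags: python regex string python-3.x replace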


    【Solution 1】:

    I use:

    bannedWord = ['Good','Bad','Ugly']
    toPrint = 'Hello Ugly Guy, Good To See You.'
    print(' '.join(i for i in toPrint.split() if i not in bannedWord))
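
One thing to watch with the split-based filter above: a banned word followed directly by punctuation (for example `Ugly,`) is compared as a whole token and slips through. Below is a minimal sketch of one possible workaround, assuming it is acceptable to drop the punctuation attached to a removed word. The helper name `remove_banned_tokens` is hypothetical and not part of the answer.

```python
import string

banned_words = {"Good", "Bad", "Ugly"}  # a set gives fast membership tests

def remove_banned_tokens(text, banned):
    """Drop whitespace-separated tokens whose punctuation-stripped core is banned."""
    kept = []
    for token in text.split():
        # Strip leading/trailing ASCII punctuation before comparing, e.g. "Ugly," -> "Ugly"
        core = token.strip(string.punctuation)
        if core not in banned:
            kept.append(token)
    return " ".join(kept)

print(remove_banned_tokens("Hello Ugly, Good To See You.", banned_words))
```

Like the original one-liner, this collapses any run of whitespace into a single space, which may or may not be what you want.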
    

    【Discussion】:

      【Solution 2】:

      Here is a regex solution:

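      import re

      bannedWord = ["Good", "Bad", "Ugly"]

      def RemoveBannedWords(toPrint, database):
          # Build the alternation from the list that was passed in;
          # re.escape keeps any regex metacharacters in the words literal
          pattern = re.compile(r"\b(" + "|".join(map(re.escape, database)) + r")\W", re.I)
          return pattern.sub("", toPrint)

      toPrint = "Hello Ugly Guy, Good To See You."

      print(RemoveBannedWords(toPrint, bannedWord))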
      

      【Discussion】:

        【Solution 3】:

        A slight variation on Ajay's code, for the case where one string in the banned-word list is a substring of another:

        bannedWord = ['good', 'bad', 'good guy', 'ugly']


        With toPrint = 'good winter good guy', the result is

        RemoveBannedWords(toPrint, database=bannedWord) = 'winter guy'


        because good is removed first, which leaves nothing for 'good guy' to match. The list needs to be sorted by element length, longest first.

        import re

        def RemoveBannedWords(toPrint, database):
            # Longest entries first, so 'good guy' is tried before 'good'
            database_1 = sorted(database, key=len, reverse=True)
            pattern = re.compile(r"\b(" + "|".join(database_1) + r")\W", re.I)
            return pattern.sub("", toPrint + ' ')[:-1]  # added because it skipped last word

        toPrint = 'good winter good guy.'

        print(RemoveBannedWords(toPrint, bannedWord))
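
To see why the ordering matters: Python's `re` tries the alternatives of `a|b` from left to right and takes the first one that matches, so a short word listed before a longer phrase that starts with it wins. The sketch below illustrates this. The helper `build_pattern`, the closing `\b`, and the use of `re.escape` are additions for illustration and are not part of the answer above.

```python
import re

def build_pattern(words, longest_first):
    """Compile an alternation of the given words, optionally sorted longest first."""
    if longest_first:
        words = sorted(words, key=len, reverse=True)
    # re.escape keeps any regex metacharacters in the words literal
    return re.compile(r"\b(" + "|".join(map(re.escape, words)) + r")\b", re.I)

words = ["good", "bad", "good guy", "ugly"]
text = "good winter good guy"

# "good" is listed before "good guy", so only "good" is removed and "guy" survives
print(repr(build_pattern(words, longest_first=False).sub("", text)))
# "good guy" is tried first and removed as a unit
print(repr(build_pattern(words, longest_first=True).sub("", text)))
```

The leftover spaces are kept on purpose here so the difference between the two results is easy to see.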
        

        【Discussion】:

          【Solution 4】:

          Another variation on the theme. If you are going to call this often, it is better to compile the regex once for speed:

          import re
          
          bannedWord = ['Good', 'Bad', 'Ugly']
          re_banned_words = re.compile(r"\b(" + "|".join(bannedWord) + r")\W", re.I)
          
          def RemoveBannedWords(toPrint):
              # Reading a module-level name needs no `global` declaration
              return re_banned_words.sub("", toPrint)
          
          toPrint = 'Hello Ugly Guy, Good To See You.'
          print(RemoveBannedWords(toPrint))
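
One limitation of the trailing `\W`: it needs an actual character after the word, so a banned word at the very end of the string is left alone (Solution 3 works around this by appending a space). Below is a hedged sketch of an alternative that uses `\b` on both sides and then collapses the leftover runs of spaces. The names `remove_banned_words`, `_BANNED_RE` and `_SPACES_RE`, and the whitespace cleanup itself, are assumptions for illustration and are not part of the answer.

```python
import re

banned_words = ["Good", "Bad", "Ugly"]

# Compiled once at import time; re.escape keeps list entries literal
_BANNED_RE = re.compile(r"\b(?:" + "|".join(map(re.escape, banned_words)) + r")\b", re.I)
_SPACES_RE = re.compile(r" {2,}")

def remove_banned_words(text):
    """Remove banned words as whole words, then tidy the spacing left behind."""
    stripped = _BANNED_RE.sub("", text)
    return _SPACES_RE.sub(" ", stripped).strip()

print(remove_banned_words("Hello Ugly Guy, Good To See You."))
print(remove_banned_words("That was Bad"))
```

Because the word itself is removed rather than the word plus one following character, punctuation that followed a banned word stays in the text.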
          

          【Discussion】:

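          • Best answer; strange that it has so few votes. If you need to match embedded words, add an asterisk "*" after \\W: re.compile(r"\b(" + "|".join(list_not_for_search) + ")\\W*", re.I). For example, in "Hello Uglyyy Guy, Good To See You." this removes "Ugly" and leaves "yy" behind. By the way, re.I stands for re.IGNORECASE.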
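
A quick sketch comparing the two patterns from the comment above; the variable names are placeholders. Note that the leading `\b` is still in place, so "embedded" here means a banned word at the start of a longer word.

```python
import re

banned_words = ["Good", "Bad", "Ugly"]
alternation = "|".join(banned_words)

# Requires one non-word character after the banned word
whole_word = re.compile(r"\b(" + alternation + r")\W", re.I)
# Trailing non-word characters are optional, so a banned prefix also matches
with_prefix = re.compile(r"\b(" + alternation + r")\W*", re.I)

text = "Hello Uglyyy Guy, Good To See You."

print(whole_word.sub("", text))
print(with_prefix.sub("", text))
```
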
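          【Solution 5】:

          Regex is the better choice when you need to check for a word boundary at the start and a non-word character at the end. A plain in-memory array/list also works: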

          bannedWord = ['Good', 'Bad', 'Ugly']
          
          toPrint = 'Hello Uglyyy Guy, Good To See You.'
          
          for word in bannedWord:
              toPrint = toPrint.replace(word, "")
          
          print(toPrint) 
          
          Hello yy Guy,  To See You.
          
          

          【Discussion】:
