【Question title】:Python - remove stopwords from a string
【Posted】:2013-12-16 13:46:18
【Question description】:

I am having trouble writing code that removes stop words from a string input. Currently, this is my code:

stopWords = [ "a", "i", "it", "am", "at", "on", "in", "to", "too", "very", \
                 "of", "from", "here", "even", "the", "but", "and", "is", "my", \
                 "them", "then", "this", "that", "than", "though", "so", "are" ]
stemEndings = [ "-s", "-es", "-ed", "-er", "-ly", "-ing", "-'s", "-s'" ]
punctuation = [ ".", ",", ":", ";", "!", "?" ]
line = raw_input ("Type in lines, finish with a . at start of line only:")
while line != ".":
    def remove_punctuation(input): #removes punctuation from input
        output = ""
        text= 0
        while text<=(len(input)-1) :
            if input[text] not in punctuation:
               output=output + input[text]
            text+=1
        return output
    newline= remove_punctuation(line)
    newline= newline.lower()

What code can I add to remove the stop words from the string, based on the stopWords list above? Thank you in advance.

【Question comments】:

    Tags: python string


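    【Solution 1】:

    As greg suggested, you should use a for loop rather than a while loop, since it is more Pythonic and makes the code easier to follow. Also, define your function before the while loop that reads the input, so that the Python interpreter does not redefine it on every iteration!

    In addition, if you like, you can make punctuation a string instead of a list (for readability and ease of use):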

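    # Words to drop from the input.
    stopWords = [ "a", "i", "it", "am", "at", "on", "in", "to", "too", "very",
                  "of", "from", "here", "even", "the", "but", "and", "is", "my",
                  "them", "then", "this", "that", "than", "though", "so", "are" ]
    # A comma was missing between "-ly" and "-ing", which silently merged them into "-ly-ing".
    stemEndings = [ "-s", "-es", "-ed", "-er", "-ly", "-ing", "-'s", "-s'" ]
    punctuation = ".,:;!?"
    
    def remove_punctuation(input_string):
        # Strip each punctuation character in turn.
        for item in punctuation:
            input_string = input_string.replace(item, '')
        return input_string
    
    # raw_input is Python 2; on Python 3 use input() instead.
    line = raw_input ("Type in lines, finish with a . at start of line only:")
    
    while line != ".":
        newline = remove_punctuation(line)
        newline = newline.lower()
        # Read the next line; without this the loop never terminates.
        line = raw_input()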
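
The code above cleans up punctuation and case but does not yet drop the stop words the question asks about. One way to do that, as a minimal sketch and not necessarily what this answer's author had in mind, is to split the cleaned line on whitespace and keep only the words that are not in the list. The names `remove_stop_words` and `clean_line` are made up for illustration, and the sketch is written for Python 3.

```python
# Stop-word list and punctuation string as given in the thread.
STOP_WORDS = ["a", "i", "it", "am", "at", "on", "in", "to", "too", "very",
              "of", "from", "here", "even", "the", "but", "and", "is", "my",
              "them", "then", "this", "that", "than", "though", "so", "are"]
PUNCTUATION = ".,:;!?"


def remove_punctuation(input_string):
    # Same approach as the answer: strip each punctuation character in turn.
    for item in PUNCTUATION:
        input_string = input_string.replace(item, "")
    return input_string


def remove_stop_words(input_string, stop_words=STOP_WORDS):
    # Hypothetical helper: split() breaks on any run of whitespace,
    # then only the words that are not stop words are kept.
    kept = [word for word in input_string.split() if word not in stop_words]
    return " ".join(kept)


def clean_line(line):
    # Hypothetical helper: punctuation first, then lowercase, then stop words.
    return remove_stop_words(remove_punctuation(line).lower())


print(clean_line("This is a test, and it is only a test!"))  # test only test
```

Lowercasing before the stop-word check matters here, because the list only contains lowercase entries.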
    

    【Comments】:

      【Solution 2】:

      I don't understand your question. You want to remove punctuation from the input string? My variant of the remove_punctuation function:

      def remove_punctuation(input_string):
          for item in punctuation:
              input_string = input_string.replace(item, '')
          return input_string
      

      【Comments】:

      • You should not name a variable `input`.
      • Agreed, fixed. Thanks!
      【Solution 3】:

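      I found something interesting in another post that can greatly improve the performance of your code. Try using a set, as mentioned in the post linked below: Faster way to remove stop words in Python

      Thanks to alko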
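
The linked post is not quoted in this thread, so the following is only a sketch of the general idea of using a set, not a reproduction of that post. A membership test against a list scans the list, while a test against a set is a hash lookup. `STOP_WORD_SET` and `remove_stop_words` are illustrative names, and the sketch is written for Python 3.

```python
# Stop words from the question, stored as a frozenset so that each
# "word not in STOP_WORD_SET" check is a hash lookup, not a list scan.
STOP_WORD_SET = frozenset([
    "a", "i", "it", "am", "at", "on", "in", "to", "too", "very",
    "of", "from", "here", "even", "the", "but", "and", "is", "my",
    "them", "then", "this", "that", "than", "though", "so", "are",
])


def remove_stop_words(text, stop_words=STOP_WORD_SET):
    """Return text with every whitespace-separated stop word dropped."""
    return " ".join(word for word in text.split() if word not in stop_words)


print(remove_stop_words("i am here and the cat is on the mat"))  # cat mat
```

With a list this short the difference is likely to be small. It becomes more noticeable with longer stop-word lists or large amounts of text.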

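      【Comments】:

      • Welcome to Stack Overflow! Please quote the most relevant part of the link, in case the target site is unreachable or goes permanently offline. See How do I write a good answer.
      【Solution 4】:

      You can use the NLTK library instead of defining the stop words yourself.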

      pip install nltk
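
As a rough sketch of how the installed library might then be used (the answer itself gives no code): NLTK ships an English stop-word corpus, available through `nltk.corpus.stopwords` once it has been downloaded with `nltk.download("stopwords")`. The helper names `load_stop_words` and `remove_stop_words`, and the fallback to the question's hand-written list, are assumptions added so that the example still runs when NLTK or its corpus is not available.

```python
# Fallback taken from the question's own list (an assumption of this sketch).
FALLBACK_STOP_WORDS = frozenset([
    "a", "i", "it", "am", "at", "on", "in", "to", "too", "very",
    "of", "from", "here", "even", "the", "but", "and", "is", "my",
    "them", "then", "this", "that", "than", "though", "so", "are",
])


def load_stop_words():
    """Return NLTK's English stop words, or the fallback set if unavailable."""
    try:
        from nltk.corpus import stopwords
        # Raises LookupError until the corpus has been fetched once with
        # nltk.download("stopwords").
        return frozenset(stopwords.words("english"))
    except (ImportError, LookupError):
        return FALLBACK_STOP_WORDS


def remove_stop_words(text, stop_words):
    # Lowercase first, since the stop-word entries are lowercase.
    return " ".join(w for w in text.lower().split() if w not in stop_words)


stop_words = load_stop_words()
print(remove_stop_words("The cat is on the mat", stop_words))
```

NLTK's English list is considerably longer than the one in the question, so it will remove more words from typical text.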
      

      【Comments】:

      • Some code as an example would help.