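【Posted】: 2015-05-30 00:34:45
【Question】:
I'm trying to remove certain phrases and words from user input before processing it further, but I keep hitting an "index out of range" error and am completely stuck. How can I fix this?
I take the input phrase as a string converted to a list, so that each word can be compared, and the stop words as a predefined list.
Example inputs: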
["well", "you", "know", "the", "weather", "is", "bad"]
["you", "know", "what", "i", "mean", "so", "just", "turn", "the", "lights", "on"]
# Takes the user input as a list of words, removes the selected stop words, and returns the filtered phrase.
def stop_word_remover(phrase_list):
    stop_words_lst = ["yo", "so", "well", "um", "a", "the", "you know", "i mean"]
    # initialize the clean phrase string
    clean_input_phrase = ""
    # copy phrase_list into a new variable for stop word removal
    Copy_phrase_list = list(phrase_list)
    # cleanup loop: look for two-word stop phrases such as "you know"
    for i in range(1, len(phrase_list)):
        has_stop_words = False
        for x in range(len(stop_words_lst)):
            has_stop_words = False
            # if the current pair of adjacent words matches a stop phrase, the flag is raised
            if (phrase_list[i-1] + " " + phrase_list[i]) == stop_words_lst[x].strip():
                has_stop_words = True
            # if the flag is raised, remove both words of the pair from the copy
            if has_stop_words == True:
                Copy_phrase_list.remove(Copy_phrase_list[i-1])
                Copy_phrase_list.remove(Copy_phrase_list[i-1])
    # the outer loop goes through the remaining phrase one word at a time
    for i in range(len(Copy_phrase_list)):
        # flag for marking stop words
        has_stop_words = False
        # the inner loop compares the current word against every stop word
        for x in range(len(stop_words_lst)):
            # if one of the stop words matches the current word, the flag is raised
            if Copy_phrase_list[i] == stop_words_lst[x].strip():
                has_stop_words = True
        # add the word to the result only if the flag is not raised, so all stop words are filtered out
        if has_stop_words == False:
            clean_input_phrase += str(Copy_phrase_list[i]) + " "
    return clean_input_phrase
【Comments】:
-
Your indentation is wrong. Could you fix it and provide example inputs to the function and the expected output?
-
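@Marcin The input to the function can be any kind of phrase or command. The function is only meant to strip these stop words from the input before further analysis. I did fix the indentation and added a few example phrases.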
-
I tried your code and didn't get any error. It seems to work for me.
-
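It runs but returns the wrong output. Instead of taking "you know what i mean so just turn the lights on", removing "you know", "i mean", "so" and "the", and returning "what just turn lights on", it returns "what i mean turn lights on". @Marcin Also, it seems to work for some inputs and not for others. Something like ["you","know","lock","my","computer","yo","man","you","know"] doesn't seem to run.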
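Both symptoms described above look consistent with the first loop indexing Copy_phrase_list using positions taken from phrase_list: once a pair has been removed, the copy is shorter and its indices no longer line up, so a later match can delete the wrong words or go past the end of the list. A minimal sketch of one way to avoid this is to walk the list once and build a new list instead of removing items in place. The names remove_stop_words, stop_phrases, patterns and kept are made up for illustration, and the sketch assumes the words are already lowercase.

```python
def remove_stop_words(phrase_list, stop_phrases):
    """Return phrase_list joined into a string, with every stop phrase left out."""
    # Split each stop phrase into a tuple of words and try longer phrases
    # first, so "you know" is checked before any single-word entry.
    # Empty entries are dropped because they would match everywhere.
    patterns = sorted((tuple(p.split()) for p in stop_phrases if p.split()),
                      key=len, reverse=True)
    kept = []
    i = 0
    while i < len(phrase_list):
        for pattern in patterns:
            # Compare the words starting at position i with the stop phrase.
            if tuple(phrase_list[i:i + len(pattern)]) == pattern:
                i += len(pattern)  # skip the whole matched phrase
                break
        else:
            # No stop phrase starts at position i, so keep this word.
            kept.append(phrase_list[i])
            i += 1
    return " ".join(kept)


stop_words_lst = ["yo", "so", "well", "um", "a", "the", "you know", "i mean"]
words = ["you", "know", "what", "i", "mean", "so", "just", "turn", "the", "lights", "on"]
result = remove_stop_words(words, stop_words_lst)  # "what just turn lights on"
```

Because nothing is ever removed from the list being indexed, the slice phrase_list[i:i + len(pattern)] simply comes back short near the end of the input and fails to match, so the second example from the comment above yields "lock my computer man" instead of raising an error.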
Tags: python python-2.7 nlp stop-words