Extract words from a string into a list | Python answers

【Question title】: Extract words, from string, into list | Python
【Posted】: 2021-05-09 16:57:44
【Question】:

I have read a lot of posts on this, but had no luck.

So far I have tried .split() and regex.

Note: I am running this code on repl.it/.

import math

documents = [
  ["It is going to rain today"],
  ["Today I am not going outside"],
  ["I am going to watch the season premiere"]
]
docs = 1000
words_per_doc = 100  # length of doc

dp = 4

# -- Setup --
all_words = []  # all instances
for doc in documents:
  for s in doc:
     words = s.split()
     print(words)
  all_words.append(words)
all_words = sorted(all_words)  # alphabeticalise
all_words = list(dict.fromkeys(all_words))  # remove duplicates

print('All Words')
print(all_words)
print()


print('Binary Scoring')
for doc in documents:
  scoring = []
  for word in all_words:
    if word in doc:
      scoring.append(1)
    else:
      scoring.append(0)
  print("\"" + doc + "\" = " + scoring)
print()

Error:

['It', 'is', 'going', 'to', 'rain', 'today']
['Today', 'I', 'am', 'not', 'going', 'outside']
['I', 'am', 'going', 'to', 'watch', 'the', 'season', 'premiere']
Traceback (most recent call last):
  File "main.py", line 6, in <module>
    import BagofWords
  File "/home/runner/DeepLearning/BagofWords.py", line 21, in <module>
    all_words = list(dict.fromkeys(all_words))  # remove duplicates
TypeError: unhashable type: 'list'

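【Question comments】:

You don't have a list of strings, you have a list of lists. Is there a reason for that? Could a sublist contain more than one string?
Please do not edit solution announcements into the question. Accept (i.e. click the "tick" next to) one of the existing answers, if there are any. If the existing answers do not already cover your solution, you can also post your own answer and even accept it.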
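
The first comment points at the cause of the TypeError: all_words.append(words) stores whole lists, and lists cannot be used as dict keys. A minimal sketch reproducing this in isolation is below; the helper name try_dedupe and the sample data are made up for illustration and are not part of the original post.

```python
# What append(words) builds in the question: a list of lists.
nested = [["It", "is"], ["Today", "I"]]
# What the rest of the code expects: a flat list of strings.
flat = ["It", "is", "Today", "I"]


def try_dedupe(items):
    """Return the de-duplicated items, or the error text if an item is unhashable."""
    try:
        # dict keys must be hashable; strings are, lists are not.
        return list(dict.fromkeys(items))
    except TypeError as exc:
        return str(exc)


nested_result = try_dedupe(nested)  # an "unhashable type" error message
flat_result = try_dedupe(flat)      # the strings, duplicates removed

print(nested_result)
print(flat_result)
```

The exact wording of the error message may differ between Python versions, but it mentions the unhashable list in each case.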

Tags: python regex string list split

【Solution 1】:

The split itself seems to work just fine:

for doc in documents:
    words = doc[0].split(' ')
    print(words)

The problem is in the rest of your code.

Here is a corrected version:

import math

documents = [
  ["It is going to rain today"],
  ["Today I am not going outside"],
  ["I am going to watch the season premiere"]
]
docs = 1000
words_per_doc = 100  # length of doc

dp = 4

# -- Setup --
all_words = []  # one list of words per document
for doc in documents:
  words = doc[0].split(' ')
  print(words)
  all_words.append(words)

print('All Words')
print(all_words)
print()


print('Binary Scoring')
for doc in documents:
  scoring = 0  # number of words from the first document found in this one
  for word in all_words[0]:
    # note: this is a substring test against the raw sentence
    if word in doc[0]:
      scoring = scoring + 1
  print("\"" + doc[0] + "\" = " + str(scoring))

【Comments】:

Check the updated code. It works on the first item of the list; if you need the other items as well, just loop over them.

【Solution 2】:

Complete working code:

import math
import itertools

documents = [
  ["It is going to rain today"],
  ["Today I am not going outside"],
  ["I am going to watch the season premiere"]
]
docs = 1000
words_per_doc = 100  # length of doc

dp = 4

# -- Setup --
all_words = []  # all instances
for doc in documents:
  for s in doc:
     words = s.split()
     print(words)
     all_words.append(words)
all_words = list(itertools.chain.from_iterable(all_words))
all_words = sorted(all_words)  # alphabeticalise
all_words = list(dict.fromkeys(all_words))

print(all_words, "\n")

print('Binary Scoring')
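for doc in documents:
  doc_words = doc[0].split()  # compare whole words, not substrings
  scoring = []
  for word in all_words:
    if word in doc_words:
      scoring.append(1)
    else:
      scoring.append(0)
  # doc is a list and scoring is a list, so convert both before concatenating
  print("\"" + doc[0] + "\" = " + str(scoring))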

See my other answer for the explanation.
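
The same scoring can also be written as a small function. This is only a sketch under a few assumptions: the name binary_scores is hypothetical, matching is exact and case-sensitive (so "Today" and "today" count as different words), and every string in an inner list is treated as part of the same document.

```python
def binary_scores(documents):
    """Return (vocabulary, scores) for a list of documents.

    Each document is a list of strings. scores[i][j] is 1 if
    vocabulary[j] occurs as a whole word in document i, else 0.
    """
    # Split every document into its words.
    tokenised = [" ".join(doc).split() for doc in documents]
    # Sorted, de-duplicated vocabulary across all documents.
    vocabulary = sorted({word for words in tokenised for word in words})
    scores = []
    for words in tokenised:
        present = set(words)  # set membership tests are cheap
        scores.append([1 if word in present else 0 for word in vocabulary])
    return vocabulary, scores


documents = [
    ["It is going to rain today"],
    ["Today I am not going outside"],
    ["I am going to watch the season premiere"],
]

vocabulary, scores = binary_scores(documents)
print(vocabulary)
for doc, row in zip(documents, scores):
    print('"' + doc[0] + '" = ' + str(row))
```

Lower-casing the words before building the vocabulary would merge "Today" and "today"; whether that is wanted depends on the use case.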

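【Comments】:

I am still getting an error :(. I have pasted the new error output above. Could it be related to my environment?
@iNeedHelp It should work now, try again.
Sorry, this doesn't work either. Make sure you also run it in a completely fresh environment, like I did.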

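【Solution 3】:

You have a list of lists of strings, so you have to iterate over each inner list to get at the strings (I am assuming the inner lists can be of any length).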

for doc in documents:
  for s in doc:
     words = s.split()
     print(words)

This extracts the words from each document and prints them.

Output:

['It', 'is', 'going', 'to', 'rain', 'today']
['Today', 'I', 'am', 'not', 'going', 'outside']
['I', 'am', 'going', 'to', 'watch', 'the', 'season', 'premiere']
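
Building on the loop above, here is a minimal sketch of one way to collect the words into a single flat list, so that the later sorted() and dict.fromkeys() calls from the question receive strings rather than lists. The variable name unique_words is introduced here for illustration only.

```python
documents = [
    ["It is going to rain today"],
    ["Today I am not going outside"],
    ["I am going to watch the season premiere"],
]

all_words = []  # flat list of every word, duplicates included
for doc in documents:
    for s in doc:
        # extend() adds the words one by one;
        # append() would add the whole list as a single element.
        all_words.extend(s.split())

# Sort, then drop duplicates while keeping the sorted order.
unique_words = list(dict.fromkeys(sorted(all_words)))

print(all_words)
print(unique_words)
```

Note that the default string sort places capitalised words before lower-case ones.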

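【Comments】:

This still throws an error. I have appended my entire code above.
I am using nested lists so that I can easily copy and paste sentences into it in one go.
@iNeedHelp Remove any use of re, since you don't need it. I tested the code above and it works.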