【Question title】: Extract words, from string, into list | Python
【Posted】: 2021-05-09 16:57:44
【Question】:

I have read a lot of posts, but had no luck.

So far I have tried .split() and regex.

Note: I am running this code on repl.it.

import math

documents = [
  ["It is going to rain today"],
  ["Today I am not going outside"],
  ["I am going to watch the season premiere"]
]
docs = 1000
words_per_doc = 100  # length of doc

dp = 4

# -- Setup --
all_words = []  # all instances
for doc in documents:
  for s in doc:
     words = s.split()
     print(words)
  all_words.append(words)
all_words = sorted(all_words)  # alphabeticalise
all_words = list(dict.fromkeys(all_words))  # remove duplicates

print('All Words')
print(all_words)
print()


print('Binary Scoring')
for doc in documents:
  scoring = []
  for word in all_words:
    if word in doc:
      scoring.append(1)
    else:
      scoring.append(0)
  print("\"" + doc + "\" = " + scoring)
print()

Error:

['It', 'is', 'going', 'to', 'rain', 'today']
['Today', 'I', 'am', 'not', 'going', 'outside']
['I', 'am', 'going', 'to', 'watch', 'the', 'season', 'premiere']
Traceback (most recent call last):
  File "main.py", line 6, in <module>
    import BagofWords
  File "/home/runner/DeepLearning/BagofWords.py", line 21, in <module>
    all_words = list(dict.fromkeys(all_words))  # remove duplicates
TypeError: unhashable type: 'list'

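【Comments】:

  • You don't have a list of strings, you have a list of lists. Is there a reason for that? Can a sublist contain more than one string?
  • Please do not edit solution announcements into the question. Accept (i.e. click the "tick" next to) one of the existing answers, if there are any. If your solution is not yet covered by an existing answer, you can also post your own answer and even accept it.

Tags: python regex string list split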


【Answer 1】:

The split itself seems to work just fine:

   for doc in documents:
       words=doc[0].split(' ')
       print(words)
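
As a side note, `split(' ')` and `split()` only behave the same when words are separated by exactly one space. A minimal sketch of the difference, using a made-up sentence that is not from the question:

```python
# Hypothetical input with a double space and a trailing space.
sentence = "It is  going to rain "

# split(' ') splits on every single space, so empty strings appear.
by_space = sentence.split(' ')

# split() with no argument splits on runs of whitespace and drops empties.
by_whitespace = sentence.split()

print(by_space)
print(by_whitespace)
```

For the sentences in the question both calls give the same result, so either one is fine there.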

It is the rest of the code that is wrong.

Here is the corrected code:

import re
import math

documents = [
  ["It is going to rain today"],
  ["Today I am not going outside"],
  ["I am going to watch the season premiere"]
]
docs = 1000
words_per_doc = 100  # length of doc

dp = 4

# -- Setup --
all_words = []  # all instances
for doc in documents: 
  words=doc[0].split(' ')
  print(words)
  all_words.append(words)

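print('All Words')
print(all_words)
print()


print('Binary Scoring')
for doc in documents:
    # Count how many words of the first document occur in this document.
    # Note: `word in doc[0]` is a substring test on the sentence,
    # not a whole-word match.
    scoring = 0
    for word in all_words[0]:
        if word in doc[0]:
            scoring += 1
    print("\"" + doc[0] + "\" = " + str(scoring))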

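【Comments】:

  • Check the updated code. It works on the first item in the list; if you need the other items as well, just loop over them.
【Answer 2】:

Complete working code: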

import math
import itertools

documents = [
  ["It is going to rain today"],
  ["Today I am not going outside"],
  ["I am going to watch the season premiere"]
]
docs = 1000
words_per_doc = 100  # length of doc

dp = 4

# -- Setup --
all_words = []  # all instances
for doc in documents:
  for s in doc:
     words = s.split()
     print(words)
     all_words.append(words)
all_words = list(itertools.chain.from_iterable(all_words))
all_words = sorted(all_words)  # alphabeticalise
all_words = list(dict.fromkeys(all_words))

print(all_words, "\n")

print('Binary Scoring')
for doc in documents:
  scoring = []
  for word in all_words:
    if word in doc[0].split():  # whole-word test; `word in doc[0]` would also match substrings
      scoring.append(1)
    else:
      scoring.append(0)
  print("\"" + doc[0] + "\" = " + str(scoring))  # doc and scoring are lists, so index and convert before concatenating

See my other answer for an explanation.
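
One caveat when scoring: `word in sentence` on a string is a substring test, so "I" is found inside "It". A minimal sketch of whole-word binary scoring, assuming one sentence per inner list as in the question; `binary_scores` and `vocabulary` are illustrative names that do not appear in the original code:

```python
documents = [
    ["It is going to rain today"],
    ["Today I am not going outside"],
    ["I am going to watch the season premiere"],
]

def binary_scores(sentence, vocabulary):
    """Return 1 or 0 per vocabulary word, matching whole words only."""
    words = set(sentence.split())
    return [1 if word in words else 0 for word in vocabulary]

# Sorted, de-duplicated vocabulary over every sentence (case-sensitive).
vocabulary = sorted({word for doc in documents for s in doc for word in s.split()})

for doc in documents:
    for s in doc:
        print(f'"{s}" = {binary_scores(s, vocabulary)}')
```

The matching here is case-sensitive, so "Today" and "today" count as different words, just as in the question's code.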

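【Comments】:

  • I am still getting an error :(. I have pasted the new error output above. Could it be something to do with my environment?
  • @iNeedHelp It should work now, try again.
  • Sorry, that doesn't work either. Make sure you also run it in a fresh environment, like I did.
【Answer 3】:

You have a list of lists of strings, so you have to iterate over each inner list to get at the strings (I assume the inner lists can be of arbitrary length).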

for doc in documents:
  for s in doc:
     words = s.split()
     print(words)

This extracts the words from each document and prints them.

Output:

['It', 'is', 'going', 'to', 'rain', 'today']
['Today', 'I', 'am', 'not', 'going', 'outside']
['I', 'am', 'going', 'to', 'watch', 'the', 'season', 'premiere']
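
The `TypeError: unhashable type: 'list'` in the question comes from `dict.fromkeys` being handed a list whose elements are themselves lists. A minimal sketch of one way to avoid it, assuming the goal is a single flat, de-duplicated word list; `vocabulary` is an illustrative name that is not in the original code:

```python
documents = [
  ["It is going to rain today"],
  ["Today I am not going outside"],
  ["I am going to watch the season premiere"]
]

all_words = []
for doc in documents:
  for s in doc:
    # extend() adds each word individually; append() would add the whole
    # list as one unhashable element, which dict.fromkeys() rejects.
    all_words.extend(s.split())

# sorted() orders the words; dict.fromkeys() then drops duplicates.
vocabulary = list(dict.fromkeys(sorted(all_words)))
print(vocabulary)
```

Because the comparison is case-sensitive, the capitalized words sort ahead of the lowercase ones.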

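【Comments】:

  • This still raises an error. I have attached my whole code above.
  • I use nested lists so that I can easily copy and paste sentences in one at a time.
  • @iNeedHelp Remove any `re` usage because you don't need it. I tested the code above and it works.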