Getting rid of proper nouns in a nested list in Python: answers

【Question title】: getting rid of proper nouns in a nested list python
【Posted】: 2014-04-18 05:05:50
【Question】:

I'm trying to write a program that takes a nested list and returns a new list with the proper nouns taken out.

Here is an example:

L = [['The', 'name', 'is', 'James'], ['Where', 'is', 'the', 'treasure'], ['Bond', 'cackled', 'insanely']]

I want to return:

['the', 'name', 'is', 'is', 'the', 'treasure', 'cackled', 'insanely']

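Note that "where" has been removed. That is fine, because it does not appear anywhere else in the nested list. Each nested list is a sentence. My approach is to append the first element of each nested list to newList. Then I check whether the elements of newList appear in the nested lists, lowercasing each element of newList for the comparison. I'm about halfway done with this program, but I run into an error when I try to remove an element from newList at the end. Once I have the updated newList, I want to remove the items that are in newList from the nested lists. Finally, I append all the items in the nested lists to newerList and lowercase them. That should do it.

If anyone has a more efficient approach, I'd be glad to hear it.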

def lowerCaseFirst(L):
    newList = []
    for nestedList in L:
        newList.append(nestedList[0])
    print newList

    for firstWord in newList:
        sum = 0
        firstWord = firstWord.lower()
        for nestedList in L:
            for word in nestedList[1:]:
                if firstWord == word:
                    print "yes"

                    sum = sum + 1
            print newList
        if sum >= 1:
            firstWord = firstWord.upper()
            newList.remove(firstWord)
    return newList

Note that this code is unfinished because of the error on the second-to-last line.

Here it is with an updated list (updatedNewList):

def lowerCaseFirst(L):
    newList = []
    for nestedList in L:
        newList.append(nestedList[0])
    print newList
    updatedNewList = newList
    for firstWord in newList:
        sum = 0
        firstWord = firstWord.lower()
        for nestedList in L:
            for word in nestedList[1:]:
                if firstWord == word:
                    print "yes"

                    sum = sum + 1
            print newList
        if sum >= 1:
            firstWord = firstWord.upper()
            updatedNewList.remove(firstWord)
    return updatedNewList

Error message:

Traceback (most recent call last):
  File "/Applications/WingIDE.app/Contents/MacOS/src/debug/tserver/_sandbox.py", line 1, in <module>
    # Used internally for debug sandbox under external interpreter
  File "/Applications/WingIDE.app/Contents/MacOS/src/debug/tserver/_sandbox.py", line 80, in lowerCaseFirst
ValueError: list.remove(x): x not in list

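【Question comments】:

You didn't mention what the error actually is, but you can't change a list while iterating over it. Why not add the items you want to a new list, instead of trying to remove the items you don't want from the old one? If you want more general feedback, try codereview.stackexchange.com
Is the first "The" meant to be lowercased?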
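
The first comment's advice can be illustrated with a short sketch. Python does not forbid removing from a list during a for loop, but the loop then skips elements, so the result is easy to get wrong. The sample list and variable names below are made up for illustration, and the code uses Python 3 syntax rather than the question's Python 2.

```python
words = ['The', 'Where', 'Bond', 'Bond']

# Removing while iterating: after a removal the remaining items shift left,
# but the loop position still advances, so the second 'Bond' is never visited.
mutated = list(words)
for word in mutated:
    if word == 'Bond':
        mutated.remove(word)
print(mutated)  # ['The', 'Where', 'Bond']

# Building a new list instead: the source list is never modified in the loop.
kept = [word for word in words if word != 'Bond']
print(kept)  # ['The', 'Where']
```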

Tags: python

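【Solution 1】:

The error in your first function occurs because you try to remove an uppercased version of firstWord from newList, which contains no fully uppercased words (you can see that from the printout): firstWord.upper() produces a string such as 'THE', while the list holds 'The'. Remember that you store the uppercased or lowercased version of a word in a new variable; that does not change the contents of the original list.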
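
A small reproduction may make the mismatch easier to see. This is only a sketch in Python 3 syntax (the question's code is Python 2); the variable names are illustrative, and using str.capitalize() to recover the original spelling is an assumption that holds only for words written as one capital letter followed by lowercase letters.

```python
# First words of each sentence, as collected by the question's first loop.
new_list = ['The', 'Where', 'Bond']

first_word = new_list[0].lower()   # 'the': a new string, new_list is unchanged
shouted = first_word.upper()       # 'THE': all caps, not the original 'The'

print(shouted in new_list)         # False, so remove(shouted) must fail

try:
    new_list.remove(shouted)
except ValueError as exc:
    print("ValueError:", exc)

# Assumption: the original word was capitalized in the usual way, so
# capitalize() gives back exactly the string stored in the list.
restored = first_word.capitalize()  # 'The'
new_list.remove(restored)
print(new_list)                     # ['Where', 'Bond']
```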

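I still don't understand your approach, though. As you describe your task, you want to do two things: 1) flatten a list of lists into a list of elements (always a fun programming exercise) and 2) remove the proper nouns from that list. That means you have to decide what counts as a proper noun. You can do that in a rudimentary way (every capitalized word that is not sentence-initial, or an exhaustive list), or you can use a POS tagger (see: Finding Proper Nouns using NLTK WordNet). Unless I have completely misunderstood your task, you don't need to worry about the casing here.

The first task can be solved in many ways. Here is one that nicely illustrates what actually happens in the simple case where L is a list of lists (rather than a list that can be nested arbitrarily deep):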

def flatten(L):
    # Collect every element of every sublist into one flat list.
    newList = []
    for sublist in L:
        for elm in sublist:
            newList.append(elm)
    return newList
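
As an example of the "many ways" mentioned above, here are two common one-level flattening idioms in Python 3. The function names flatten_comprehension and flatten_chain are invented for this sketch; like the explicit loops, both assume L is a list of lists and do not handle deeper nesting.

```python
from itertools import chain


def flatten_comprehension(nested):
    """Flatten one level of nesting with a nested list comprehension."""
    return [elm for sublist in nested for elm in sublist]


def flatten_chain(nested):
    """Flatten one level of nesting with itertools.chain.from_iterable."""
    return list(chain.from_iterable(nested))


L = [['The', 'name', 'is', 'James'],
     ['Where', 'is', 'the', 'treasure'],
     ['Bond', 'cackled', 'insanely']]

print(flatten_comprehension(L))
print(flatten_chain(L) == flatten_comprehension(L))  # True
```

The comprehension reads in the same order as the nested for loops, which makes it a direct shorthand for the explicit version.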

You can turn flatten into flattenAndFilter(L) by checking each element, like this:

PN = ['James', 'Bond']

def flattenAndFilter(L):
    newList = []
    for sublist in L:
        for elm in sublist:
            # Keep only the elements that are not listed as proper nouns.
            if elm not in PN:
                newList.append(elm)
    return newList

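You may not have such a nice PN list, though. In that case you will have to extend the check, for example by parsing the sentence and checking the POS tags.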
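
Short of a POS tagger, the rudimentary capitalization heuristic mentioned earlier can be combined with the asker's rule for sentence-initial words. The following is a rough Python 3 sketch under those assumptions, not a reliable proper-noun detector: the function name remove_proper_nouns is made up, each inner list is assumed to be one sentence, and a sentence-initial proper noun whose lowercase form also occurs as an ordinary word would be kept by mistake.

```python
def remove_proper_nouns(sentences):
    """Heuristic filter: flatten the sentences and drop likely proper nouns."""
    # Lowercase words seen anywhere except at the start of a sentence.
    seen_lower = {word
                  for sentence in sentences
                  for word in sentence[1:]
                  if word.islower()}

    result = []
    for sentence in sentences:
        for position, word in enumerate(sentence):
            if position == 0:
                # A sentence-initial capital is ambiguous: keep the word only
                # if its lowercase form occurs elsewhere in the text.
                if word.lower() in seen_lower:
                    result.append(word.lower())
            elif not word[:1].isupper():
                # Non-initial words are kept unless they are capitalized.
                result.append(word)
    return result


L = [['The', 'name', 'is', 'James'],
     ['Where', 'is', 'the', 'treasure'],
     ['Bond', 'cackled', 'insanely']]

print(remove_proper_nouns(L))
# ['the', 'name', 'is', 'is', 'the', 'treasure', 'cackled', 'insanely']
```

For the question's example, 'Where' and 'Bond' are dropped because their lowercase forms never occur elsewhere, and 'James' is dropped because it is capitalized in the middle of a sentence.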

【Comments】: