【Question Title】: Two lists, one of words, one of phrases
【Posted】: 2012-07-08 05:45:52
【Question】:

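OK, I have two lists. One is of words, like this:

["happy", "sad", "angry", "jumpy"]

The other is a list of phrases, like this:

["I'm so happy with myself lately!", "Johnny, im so sad, so very sad, call me", "i feel like crap. SO ANGRY!!!!"]

I want to use the first list of words to find occurrences of those words in the list of phrases. I don't care whether I pull out the actual words, separated by spaces, or just the number of times they occur.

From what I've looked into, the re module and filter seem to be the way to go?

Also, if my explanation of what I need is unclear, please let me know.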

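【Comments】:

  • This is a bit clearer than your other question.
  • You don't need re or filter. The built-in operator 'in' and str.count will do the job efficiently (in that order). The solutions by katrielalex and poke below demonstrate these two approaches. Of course, re could do the job for you too, but that would be using a cannon where a knife would do :-)
  • @GeneralBecos: He might need a regex to split each phrase into words. Otherwise "an" in "I might be American" returns True, even though the word "an" is not in that phrase.
  • Yes, I am working out whether I need a regex. And yes, this is a better explanation. I realize my earlier question wasn't clear enough. Whenever I try to simplify a problem, I find it leads to misunderstandings, so I figured I should just say outright what I need.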
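
The substring pitfall raised in the comments is easy to check directly. A minimal sketch, assuming a hypothetical helper named `contains_word` (it does not appear in the thread) that wraps the word in `\b` word boundaries:

```python
import re

def contains_word(word, phrase):
    """Return True if `word` occurs in `phrase` as a whole word, ignoring case."""
    # \b matches at a word boundary; re.escape protects any regex
    # metacharacters that might appear in `word`.
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, phrase, flags=re.IGNORECASE) is not None

phrase = "I might be American"
substring_hit = "an" in phrase                 # plain substring test
whole_word_hit = contains_word("an", phrase)   # whole-word test
print(substring_hit, whole_word_hit)           # prints: True False
```

The plain `in` test matches the "an" at the end of "American", while the word-boundary search does not.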

Tags: python regex filter


【Solution 1】:
>>> import re
>>> phrases = ["I'm so happy with myself lately!", "Johnny, im so sad, so very sad, call me", "i feel like crap. SO ANGRY!!!!"]
>>> words = ["happy", "sad", "angry", "jumpy"]
>>> words_in_phrases = [re.findall(r"\b[\w']+\b", phrase.lower()) for phrase in phrases]
>>> words_in_phrases
[["i'm", 'so', 'happy', 'with', 'myself', 'lately'], ['johnny', 'im', 'so', 'sad', 'so', 'very', 'sad', 'call', 'me'], ['i', 'feel', 'like', 'crap', 'so', 'angry']]
>>> word_counts = [{word: phrase.count(word) for word in words} for phrase in words_in_phrases]
>>> word_counts
[{'jumpy': 0, 'angry': 0, 'sad': 0, 'happy': 1}, {'jumpy': 0, 'angry': 0, 'sad': 2, 'happy': 0}, {'jumpy': 0, 'angry': 1, 'sad': 0, 'happy': 0}]
>>> 

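The line word_counts = [{word: phrase.count(word) for word in words} for... requires Python 2.7+, because it uses a dict comprehension. If for some reason you are using an older version, replace that line with the following: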

>>> word_counts = [dict((word, phrase.count(word)) for word in words) for phrase in words_in_phrases]
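
As a variation on the approach above (not part of the original answer), the per-word `list.count` calls can be replaced by a single tally per phrase with `collections.Counter`. This is a sketch under two assumptions: the function name `count_words` is made up for illustration, and the entries in `words` are already lowercase.

```python
import re
from collections import Counter

def count_words(words, phrases):
    """Return one {word: count} dict per phrase, matching whole words case-insensitively."""
    results = []
    for phrase in phrases:
        # Tokenize once per phrase, then tally every token in a single pass.
        tokens = Counter(re.findall(r"[\w']+", phrase.lower()))
        # A Counter returns 0 for missing keys, so absent words need no special case.
        results.append({word: tokens[word] for word in words})
    return results

phrases = ["I'm so happy with myself lately!",
           "Johnny, im so sad, so very sad, call me",
           "i feel like crap. SO ANGRY!!!!"]
words = ["happy", "sad", "angry", "jumpy"]

counts = count_words(words, phrases)
for phrase, count in zip(phrases, counts):
    print(phrase, count)
```

With only four words the difference is negligible. The tally mainly pays off when the word list is long, since each phrase is scanned once rather than once per word.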

【Comments】:

    【Solution 2】:
    >>> phrases = ["I'm so happy with myself lately!", "Johnny, im so sad, so very sad, call me", "i feel like crap. SO ANGRY!!!!"]
    >>> words = ["happy", "sad", "angry", "jumpy"]
    >>> 
    >>> for phrase in phrases:
    ...     print phrase
    ...     print {word: phrase.count(word) for word in words}
    ... 
    I'm so happy with myself lately!
    {'jumpy': 0, 'angry': 0, 'sad': 0, 'happy': 1}
    Johnny, im so sad, so very sad, call me
    {'jumpy': 0, 'angry': 0, 'sad': 2, 'happy': 0}
    i feel like crap. SO ANGRY!!!!
    {'jumpy': 0, 'angry': 0, 'sad': 0, 'happy': 0}
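
Note that the last dictionary in the output above reports 'angry': 0. That is because `str.count` is case-sensitive and the phrase contains "ANGRY". A minimal sketch of one possible adjustment, lowercasing each phrase before counting (it assumes the entries in `words` are lowercase, and it still counts substrings rather than whole words):

```python
phrases = ["I'm so happy with myself lately!",
           "Johnny, im so sad, so very sad, call me",
           "i feel like crap. SO ANGRY!!!!"]
words = ["happy", "sad", "angry", "jumpy"]

# Lowercase each phrase so that "ANGRY" matches "angry".
# str.count still counts substrings, so "sad" would also match inside "sadly".
counts = [{word: phrase.lower().count(word) for word in words}
          for phrase in phrases]

for phrase, count in zip(phrases, counts):
    print(phrase)
    print(count)
```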
    

    【Comments】:

      【Solution 3】:

      A very simple and straightforward solution:

      >>> phrases = ["I'm so happy with myself lately!", "Johnny, im so sad, so very sad, call me", "i feel like crap. SO ANGRY!!!!"]
      >>> words = ["happy", "sad", "angry", "jumpy"]
      >>> for phrase in phrases:
              for word in words:
                  if word in phrase:
                      print('"{0}" is in the phrase "{1}".'.format(word, phrase))
      
      "happy" is in the phrase "I'm so happy with myself lately!".
      "sad" is in the phrase "Johnny, im so sad, so very sad, call me".
      

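      【Comments】:

      • Yes, but I want to count the instances. So for the Johnny phrase, I need to record more than one. Also, I could plug in a regex for this as well, right?
      • You can easily change the print to whatever you want to do with the matches. According to your question, you "don't care" how they are handled, so next time you should be more specific.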
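
One way the nested loop could be adapted to count instances with a regex, as asked in the comments. This is a sketch only, not the answerer's code; the `matches` list and the message wording are made up for illustration.

```python
import re

phrases = ["I'm so happy with myself lately!",
           "Johnny, im so sad, so very sad, call me",
           "i feel like crap. SO ANGRY!!!!"]
words = ["happy", "sad", "angry", "jumpy"]

matches = []  # (word, count, phrase) for every word found at least once
for phrase in phrases:
    for word in words:
        # findall returns every non-overlapping whole-word match, ignoring case.
        hits = re.findall(r"\b" + re.escape(word) + r"\b", phrase, flags=re.IGNORECASE)
        if hits:
            matches.append((word, len(hits), phrase))

for word, count, phrase in matches:
    print('"{0}" occurs {1} time(s) in the phrase "{2}".'.format(word, count, phrase))
```

Unlike the plain `in` test, this version records "sad" twice for the Johnny phrase and also picks up the uppercase "ANGRY".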