Generate a set of nouns and verbs from n different descriptions, and list the descriptions that match a noun and a verb

【Question title】: Generate set of Nouns and verbs from n different descriptions, list out descriptions that match a noun and verb
【Posted】: 2018-10-16 18:18:36
【Question description】:

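I am new to NLP. I have a list of application names and their descriptions. The data looks like this:
app1, description1 (some information about app1 and how it works)
app2, description2
.
.
app(n), description(n)
From these descriptions I need to generate a finite set of nouns and verbs. In the final application, when we pair a noun and a verb from this list, the output should be the list of apps that satisfy that noun + verb. I don't know where to begin; could you please guide me on where to start? Thank you.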

【Question discussion】:

Tags: tensorflow machine-learning nlp deep-learning nltk

【Solution 1】:

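The task of finding the morphosyntactic category of each word in a sentence is called part-of-speech (PoS) tagging. In your case, you will probably also need to tokenize your text first.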
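To make the tokenization step concrete, here is a minimal sketch that splits a description into word and punctuation tokens with a regular expression. The function name `simple_tokenize` is hypothetical and the pattern is an assumption chosen for illustration; it is far cruder than the tokenizers shipped with NLTK or spaCy and only shows the kind of output a tagger expects as input.

```python
import re

# Hypothetical helper: a word is a run of letters (optionally followed by an
# apostrophe suffix such as 't or 's), a number is a run of digits, and any
# other non-space character becomes its own token.
TOKEN_PATTERN = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?|\d+|[^\w\s]")

def simple_tokenize(text):
    """Split text into a list of word, number, and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)

tokens = simple_tokenize("This description describes apps with words.")
print(tokens)
# ['This', 'description', 'describes', 'apps', 'with', 'words', '.']
```

In practice a library tokenizer is the better choice, since it handles abbreviations, hyphenation, and other edge cases that this pattern ignores.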

To do this, you can use nltk, spacy, or the Stanford NLP tagger (among other tools).

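Note that, depending on the model you use, there can be several different tags for nouns (singular noun, plural noun, proper noun) and for verbs (depending on tense and person).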
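As an illustration of that point, the Penn Treebank tagset (the one NLTK's default English tagger emits) has four noun tags and six verb tags. The sketch below collapses them into two coarse categories by hand. The helper name `coarse_category` is hypothetical, and leaving out the modal tag `MD` is an assumption made for this example, not a rule.

```python
# Penn Treebank noun tags: singular, plural, proper singular, proper plural.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
# Penn Treebank verb tags: base form, past tense, gerund, past participle,
# non-3rd-person present, 3rd-person singular present.
VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}

def coarse_category(tag):
    """Return 'NOUN', 'VERB', or None for any other Penn Treebank tag."""
    if tag in NOUN_TAGS:
        return "NOUN"
    if tag in VERB_TAGS:
        return "VERB"
    return None

print(coarse_category("NNS"))  # NOUN
print(coarse_category("VBZ"))  # VERB
print(coarse_category("DT"))   # None
```

Collapsing the tags this way means a query for a noun does not have to care whether the description used the singular or plural tag.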

NLTK example:

import nltk
from nltk.tag import map_tag

# Requires the NLTK tokenizer, tagger, and universal tagset data (install them with nltk.download()).
description = "This description describes apps with words."
tokenized_description = nltk.word_tokenize(description)
tagged_description = nltk.pos_tag(tokenized_description)
# tagged_description (Penn Treebank tags; note that 'apps' is mistagged as a particle here):
# [('This', 'DT'), ('description', 'NN'), ('describes', 'VBZ'), ('apps', 'RP'), ('with', 'IN'), ('words', 'NNS'), ('.', '.')]

# Map the fine-grained tags to the smaller universal tagset
universal_tags_description = [(word, map_tag("wsj", "universal", tag)) for word, tag in tagged_description]
# universal_tags_description:
# [('This', 'DET'), ('description', 'NOUN'), ('describes', 'VERB'), ('apps', 'PRT'), ('with', 'ADP'), ('words', 'NOUN'), ('.', '.')]

# Keep only the nouns and verbs
filtered = [(word, tag) for word, tag in universal_tags_description if tag in {'NOUN', 'VERB'}]
# filtered:
# [('description', 'NOUN'), ('describes', 'VERB'), ('words', 'NOUN')]
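The question also asks how to go from a noun + verb pair back to the matching apps. One possible approach, sketched here under assumptions, is to build two inverted indexes (word to set of apps) from the filtered tags of every description and intersect them at query time. The sample data, `build_indexes`, and `apps_matching` are all hypothetical names introduced for illustration; the input is assumed to be the per-description output of a filtering step like the one above.

```python
from collections import defaultdict

# Hypothetical input: app name -> (word, universal tag) pairs, i.e. the
# filtered noun/verb list computed once per app description.
tagged_by_app = {
    "app1": [("description", "NOUN"), ("describes", "VERB"), ("words", "NOUN")],
    "app2": [("editor", "NOUN"), ("edits", "VERB"), ("words", "NOUN")],
    "app3": [("player", "NOUN"), ("plays", "VERB"), ("music", "NOUN")],
}

def build_indexes(tagged_by_app):
    """Return (noun_index, verb_index); each maps a lowercased word to the set of apps using it."""
    noun_index = defaultdict(set)
    verb_index = defaultdict(set)
    for app, pairs in tagged_by_app.items():
        for word, tag in pairs:
            if tag == "NOUN":
                noun_index[word.lower()].add(app)
            elif tag == "VERB":
                verb_index[word.lower()].add(app)
    return noun_index, verb_index

def apps_matching(noun, verb, noun_index, verb_index):
    """Return the sorted list of apps whose description contains both the noun and the verb."""
    with_noun = noun_index.get(noun.lower(), set())
    with_verb = verb_index.get(verb.lower(), set())
    return sorted(with_noun & with_verb)

noun_index, verb_index = build_indexes(tagged_by_app)
print(sorted(noun_index))  # the finite noun vocabulary
print(sorted(verb_index))  # the finite verb vocabulary
print(apps_matching("words", "edits", noun_index, verb_index))  # ['app2']
```

The keys of the two indexes form the finite noun and verb lists the question asks for. Words are only lowercased in this sketch, so "describes" and "describe" remain separate keys; reducing words to a base form before indexing would merge them.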

【Discussion】: