在句子中找到词的形态句法类别的任务称为词性(或 PoS)标记。
在您的情况下,您可能还需要先标记您的文本。
为此,您可以使用nltk、spacy 或Stanford NLP tagger(以及其他工具)。
请注意,根据您使用的模型,名词(单数名词、复数名词、专有名词)和动词(取决于时态和人称)可以有多个标签。
NLTK 示例:
import nltk
description = "This description describes apps with words."
tokenized_description = nltk.word_tokenize(description)
tagged_description = nltk.pos_tag(tokenized_description)
#tagged_description:
# [('This', 'DT'), ('description', 'NN'), ('describes', 'VBZ'), ('apps', 'RP'), ('with', 'IN'), ('words', 'NNS'), ('.', '.')]
# map the tags to a smaller set of tags
universal_tags_description = [(word, nltk.map_tag("wsj", "universal", tag)) for word, tag in tagged_description]
# universal_tags_description:
# [('This', 'DET'), ('description', 'NOUN'), ('describes', 'VERB'), ('apps', 'PRT'), ('with', 'ADP'), ('words', 'NOUN'), ('.', '.')]
filtered = [(word, tag) for word, tag in universal_tags_description if tag in {'NOUN', 'VERB'}]
# filtered:
# [('description', 'NOUN'), ('describes', 'VERB'), ('words', 'NOUN')]