How to get the tokens for noun phrases in spaCy?

【Question title】: How to get tokens for noun phrases in spaCy?
【Posted】: 2019-09-25 08:41:47
【Question】:

I use the following to get the tokens and noun phrases:

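import spacy

# Assumption: the question does not say which model is loaded; any English
# pipeline with a dependency parser (required by noun_chunks) works, e.g.:
nlp = spacy.load("en_core_web_sm")

text = "This is commonly referred to as global warming or climate change."
doc = nlp(text)

for token in doc:
    print(token.i, token.text)

print("Noun phrases:", [chunk.text for chunk in doc.noun_chunks])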

The result is:

0 This
1 is
2 commonly
3 referred
4 to
5 as
6 global
7 warming
8 or
9 climate
10 change
11 .
Noun phrases: ['global warming', 'climate change']

Is it possible to get the token indices of the noun phrases instead of the words? For example:

Noun phrases: ['6,7', '9,10']

【Question discussion】:

Tags: python spacy

【Answer 1】:

You can use the start and end attributes of Span:

start   int     The index of the first token of the span.
end     int     The index of the first token after the span.
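Because end points one past the last token, a span is a half-open range [start, end). The following is a minimal plain-Python sketch of that arithmetic, independent of spaCy; the helper names to_inclusive and token_indices are hypothetical, and the (start, end) pairs are assumed values mirroring "global warming" and "climate change" from the question.

```python
def to_inclusive(start, end):
    """Convert a half-open [start, end) token range to an inclusive (first, last) pair."""
    return start, end - 1


def token_indices(start, end):
    """List every token index covered by a half-open [start, end) range."""
    return list(range(start, end))


# Assumed (start, end) pairs, as Span.start / Span.end would report them
# for "global warming" and "climate change" in the question's sentence.
spans = [(6, 8), (9, 11)]

print([to_inclusive(s, e) for s, e in spans])   # [(6, 7), (9, 10)]
print([token_indices(s, e) for s, e in spans])  # [[6, 7], [9, 10]]
print([e - s for s, e in spans])                # [2, 2] tokens per span
```

With a half-open range, end - start is the span length directly, which is why spaCy's end is exclusive and why the inclusive form needs the "- 1".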

Since end is exclusive, subtract 1 to get the index of the last token in each chunk:

print("Noun phrases:", [(chunk.start,chunk.end-1) for chunk in doc.noun_chunks])
# => Noun phrases: [(6, 7), (9, 10)]

Or, if you need the items as comma-separated strings:

print("Noun phrases:", ["{},{}".format(chunk.start, chunk.end - 1) for chunk in doc.noun_chunks])
# => Noun phrases: ['6,7', '9,10']
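The "start,end-1" form gives only the first and last index, so for a chunk of three or more tokens the middle indices are not listed. Iterating a Span yields its Tokens, and each token.i is its index in the Doc, so every index can be collected that way. The sketch below assumes spaCy is installed; to avoid needing a trained model, it uses spacy.blank("en") (tokenizer only) and rebuilds the two noun phrases by slicing the Doc with the indices from the question, since doc.noun_chunks itself requires a pipeline with a parser.

```python
import spacy

# Assumption: a blank English pipeline (tokenizer only), so no model download
# is needed. It cannot produce doc.noun_chunks, so the two noun phrases are
# rebuilt by slicing with the token indices shown in the question.
nlp = spacy.blank("en")
doc = nlp("This is commonly referred to as global warming or climate change.")

# Slicing a Doc returns a Span, the same type doc.noun_chunks yields.
chunks = [doc[6:8], doc[9:11]]

# Iterating a Span yields Tokens; token.i is each token's index in the Doc.
all_indices = [[token.i for token in chunk] for chunk in chunks]

for chunk, indices in zip(chunks, all_indices):
    print(chunk.text, chunk.start, chunk.end, indices)
```

With a trained pipeline loaded as in the question, replace chunks with doc.noun_chunks to get the full index list of each real noun phrase.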

【Discussion】: