【Question title】: How to get token indices for noun phrases in spaCy?
【Posted】: 2019-09-25 08:41:47
【Question】:

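I get the tokens and noun phrases with

import spacy

# Assumption: the small English model; any pipeline with a parser provides noun_chunks.
nlp = spacy.load("en_core_web_sm")
text = ("This is commonly referred to as global warming or climate change.")
doc = nlp(text)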

for token in doc:
    print(token.i, token.text)

print("Noun phrases:", [chunk.text for chunk in doc.noun_chunks])

The result is

0 This
1 is
2 commonly
3 referred
4 to
5 as
6 global
7 warming
8 or
9 climate
10 change
11 .
Noun phrases: ['global warming', 'climate change']

Is it possible to get the token indices of the noun phrases instead of the words themselves? For example:

Noun phrases: ['6,7', '9,10']

【Comments】:

    Tags: python spacy


    【Solution 1】:

    You can use the Span's start and end attributes:

    start   int     The index of the first token of the span.
    end     int     The index of the first token after the span.
    

    So, since end points one past the last token, subtract 1 to get the index of the last token in the span:

    print("Noun phrases:", [(chunk.start,chunk.end-1) for chunk in doc.noun_chunks])
    # => Noun phrases: [(6, 7), (9, 10)]
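The `end - 1` above follows from `Span.end` being exclusive, the same convention as a Python slice. A minimal sketch of that convention, using a plain list of the question's tokens in place of a real `Doc` (the boundaries 6 and 8 for "global warming" are inferred from the output above, not read from spaCy here):

```python
# Tokens copied from the question's output (positions 0..11).
tokens = ["This", "is", "commonly", "referred", "to", "as",
          "global", "warming", "or", "climate", "change", "."]

# Boundaries for "global warming": start is inclusive, end is exclusive.
start, end = 6, 8

phrase = tokens[start:end]   # half-open slice, analogous to doc[start:end]
last_index = end - 1         # index of the last token inside the span
length = end - start         # number of tokens in the span

print(phrase, (start, last_index), length)
```

Keeping the exclusive `end` is often more convenient when the indices are later used for slicing; the inclusive form is mainly useful for display.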
    

    Or, if you need comma-separated string items,

    print("Noun phrases:", ["{},{}".format(chunk.start, chunk.end-1) for chunk in doc.noun_chunks])
    # => Noun phrases: ['6,7', '9,10']
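The snippets above report only the first and last index, which matches the full index list only for two-token phrases. If every token index in a longer noun phrase is wanted, one possible sketch is below. `chunk_index_string` and `FakeSpan` are hypothetical names introduced for illustration; `FakeSpan` is a stand-in with the same `start`/`end` attributes as a spaCy `Span`, so the example runs without a loaded model.

```python
from collections import namedtuple


def chunk_index_string(span):
    """Return all token indices covered by a span as a comma-separated string."""
    # span.start is inclusive and span.end is exclusive, so range() fits directly.
    return ",".join(str(i) for i in range(span.start, span.end))


# Hypothetical stand-in for spaCy Span objects (only .start and .end are used).
FakeSpan = namedtuple("FakeSpan", ["start", "end"])
chunks = [FakeSpan(6, 8), FakeSpan(9, 11), FakeSpan(2, 5)]

print([chunk_index_string(c) for c in chunks])
```

With a real pipeline, the same helper should work unchanged as `[chunk_index_string(chunk) for chunk in doc.noun_chunks]`, since it relies only on the `start` and `end` attributes described above.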
    
