【Posted】: 2019-10-23 06:33:16
【Question】:
I am using spaCy's sentencizer to split text into sentences.
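from spacy.lang.en import English

nlp = English()
# spaCy 2.x API; in spaCy 3.x this would be nlp.add_pipe('sentencizer')
sbd = nlp.create_pipe('sentencizer')
nlp.add_pipe(sbd)

text = "Please read the analysis. (You'll be amazed.)"
doc = nlp(text)

# Collect the text of each detected sentence
sents_list = []
for sent in doc.sents:
    sents_list.append(sent.text)
print(sents_list)

# Show the individual tokens for comparison
print([token.text for token in doc])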
Output:
['Please read the analysis. (',
"You'll be amazed.)"]
['Please', 'read', 'the', 'analysis', '.', '(', 'You', "'ll", 'be',
'amazed', '.', ')']
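The tokenization is correct, but I don't understand why the sentencizer detaches the ( from the second sentence and treats it as the end of the first sentence.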
【Discussion】:
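One plausible reading of the output above is that a rule-based splitter only starts a new sentence at the first token after a period that is not itself punctuation, so the opening parenthesis stays glued to the previous sentence. The sketch below is plain Python that mimics such a rule on the token list from the question. It is not spaCy's actual code, and the names split_tokens, openers_start_sentence, SENTENCE_END, PUNCT and OPENING are hypothetical, chosen only for illustration.

```python
# Simplified, hypothetical sentence splitter; NOT spaCy's implementation.
SENTENCE_END = {".", "!", "?"}
PUNCT = SENTENCE_END | {"(", ")", "[", "]", "{", "}", '"', "'", ",", ";", ":"}
OPENING = {"(", "[", "{"}


def split_tokens(tokens, openers_start_sentence=False):
    """Group tokens into sentences.

    A new sentence starts at the first non-punctuation token that follows
    sentence-final punctuation. If openers_start_sentence is True, an
    opening bracket in that position also starts a new sentence.
    """
    sentences = []
    current = []
    seen_end = False
    for tok in tokens:
        starts_new = seen_end and (
            tok not in PUNCT or (openers_start_sentence and tok in OPENING)
        )
        if starts_new:
            sentences.append(current)
            current = []
            seen_end = False
        current.append(tok)
        if tok in SENTENCE_END:
            seen_end = True
    if current:
        sentences.append(current)
    return sentences


# Token list copied from the question's output.
tokens = ['Please', 'read', 'the', 'analysis', '.', '(', 'You', "'ll",
          'be', 'amazed', '.', ')']

default = split_tokens(tokens)
adjusted = split_tokens(tokens, openers_start_sentence=True)
print(default)
print(adjusted)
```

With the default rule the ( ends up at the end of the first group, which matches the behaviour in the question. With openers_start_sentence=True the ( moves to the start of the second group. In spaCy itself a similar adjustment would presumably need a custom pipeline component that sets sentence starts before any parser runs; that part is an assumption and is not shown here.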
Tags: python-3.x nlp spacy