SpaCy - Split the neuralcoref results into sentences

【Question title】: SpaCy - Split the neuralcoref results into sentences
【Posted】: 2021-02-04 05:36:57
【Question】:

I am using the neuralcoref library for coreference resolution. It works on top of the spaCy library. I got it working by following the documentation.

import spacy
import neuralcoref

nlp = spacy.load('en')
neuralcoref.add_to_pipe(nlp)
doc1 = nlp('My sister has a dog. She loves him.')
print(doc1._.coref_resolved) # My sister has a dog. My sister loves a dog.

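What I want to do is split the coref_resolved attribute into sentences and remove the punctuation. I don't know how to do that without running it through a second NLP annotation pass, like this: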

doc1 = nlp('My sister has a dog. She loves him.')
doc2 = nlp(doc1._.coref_resolved)
print(list(doc2.sents)) # [My sister has a dog., My sister loves a dog.]

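Next, I also need to remove the punctuation. How can I do this without running a single sentence through two different NLP pipelines? I would like my result to look like: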

["My sister has a dog", "My sister loves a dog"]

Thanks!

【Question comments】:

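Do you know how to install neuralcoref? I tried on Win10, but it didn't work.
I got it working by running the following commands: git clone https://github.com/huggingface/neuralcoref.git; cd neuralcoref; pip install -r requirements.txt; pip install -e .; pip install spacy==2.3.2; python -m spacy download en
I was only able to install neuralcoref on Linux. So: 1) doc1._.coref_resolved is a Unicode string, and 2) it really is easier to run sentence splitting with a second call to nlp than to dig into the token/span structure of the mentions. Removing punctuation is a known problem; see Removing punctuation using spaCy; AttribueError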

Tags: python nlp spacy

【Solution 1】:

doc1._.coref_resolved is of type str, so you can process it with ordinary string operations to get the output you want:

import string

sents = doc1._.coref_resolved.split(". ")
sents_wo_punct = []
for sent in sents:
    sents_wo_punct.append(sent.translate(str.maketrans("", "", string.punctuation)))
print(sents_wo_punct)

['My sister has a dog', 'My sister loves a dog']
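
The snippet above splits only on ". ", so sentences ending in "?" or "!" would stay joined. As a minimal sketch of one way to loosen that, the helper below splits after any of ".", "!" or "?" followed by whitespace and then strips ASCII punctuation. The name split_and_strip is hypothetical (it is not part of spaCy or neuralcoref), and the resolved variable is a stand-in for doc1._.coref_resolved so the example runs without either library installed.

```python
import re
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def split_and_strip(resolved_text):
    """Hypothetical helper: split plain text into sentences, then drop punctuation.

    Sentences are assumed to end in '.', '!' or '?' followed by whitespace.
    """
    # The lookbehind keeps the terminator attached to the sentence it ends;
    # it is removed afterwards together with all other punctuation.
    parts = re.split(r"(?<=[.!?])\s+", resolved_text.strip())
    cleaned = (part.translate(_PUNCT_TABLE).strip() for part in parts)
    # Skip fragments that are empty once punctuation is gone.
    return [sentence for sentence in cleaned if sentence]


# Stand-in for doc1._.coref_resolved from the question (assumption).
resolved = "My sister has a dog. My sister loves a dog."
print(split_and_strip(resolved))  # ['My sister has a dog', 'My sister loves a dog']
```

This is still a heuristic: abbreviations such as "Dr. Smith" would be split in the wrong place, and string.punctuation also removes apostrophes and hyphens, so "don't" becomes "dont". For text where that matters, the second nlp call suggested in the comments is probably the safer route.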

【Comments】: