Extract Named Entities using SpaCy and python lambda: Answers

【Question title】: Extract Named Entities using SpaCy and python lambda
【Posted】: 2021-01-01 05:21:55
【Question】:

I am using the following code to extract named entities with a lambda.

df['Place'] = df['Text'].apply(lambda x: [entity.text for entity in nlp(x).ents if entity.label_ == 'GPE'])

and

df['Text'].apply(lambda x: ([entity.text for entity in nlp(x).ents if entity.label_ == 'GPE'] or [''])[0])

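For a few hundred records it returns results fine, but with thousands of records it takes a very long time. Can someone help me optimize this line of code?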

【Question comments】:

Tags: python nlp spacy named-entity-extraction

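【Solution 1】:

You can speed this up by:

Calling nlp.pipe once on the whole list of documents, instead of calling nlp on each row.
Disabling pipeline components you do not need.

Try: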

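import pandas as pd
import spacy

# The tagger and parser are not needed for NER here, so they are disabled
nlp = spacy.load("en_core_web_md", disable=["tagger", "parser"])

df = pd.DataFrame({"Text": ["this is a text about Germany", "this is another about Trump"]})

texts = df["Text"].to_list()
ents = []
# nlp.pipe processes the texts in batches rather than one call per row
for doc in nlp.pipe(texts):
    for ent in doc.ents:
        if ent.label_ == "GPE":
            ents.append(ent)

print(ents)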

[Germany]
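
The loop above collects every GPE entity into one flat list, whereas the question assigns a result to each row of the DataFrame. A minimal sketch of a per-row variant follows. The helper names `gpe_per_text` and `first_or_empty` and the `batch_size` default are illustrative assumptions, not part of the original answer; `nlp` is assumed to be a loaded spaCy pipeline like the one above.

```python
from typing import Iterable, List


def gpe_per_text(nlp, texts: Iterable[str], label: str = "GPE",
                 batch_size: int = 256) -> List[List[str]]:
    """Return one list of matching entity strings per input text.

    `nlp` is assumed to be a loaded spaCy pipeline; nlp.pipe yields
    one Doc per text, in the same order as the input.
    """
    results = []
    for doc in nlp.pipe(texts, batch_size=batch_size):
        # Keep only entities with the requested label (GPE by default)
        results.append([ent.text for ent in doc.ents if ent.label_ == label])
    return results


def first_or_empty(entities: List[str]) -> str:
    """Mirror the question's `(... or [''])[0]` idiom: first hit or ''."""
    return entities[0] if entities else ""
```

With the DataFrame from the answer, this could be used as `df["Place"] = gpe_per_text(nlp, df["Text"].to_list())`, and the single-value variant from the question as `df["Place"].apply(first_or_empty)`. Because the output order matches the input order, the list can be assigned straight back to the column.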

【Comments】: