【Question title】:Using NLP or spaCy, how can we extract contextual data from a text, given an entity as the input?
【Posted】:2019-08-13 06:53:09
【Question description】:

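For example, we are given a text (in the form of a document) along with a person's name, "John". We need to extract from the text every sentence that mentions John, whether by name or in some other way.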

【Comments】:

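  • What have you tried so far? This seems like a fairly simple task.
  • Thanks for the reply, David. I'm trying entity extraction, with relation extraction on top of it. But unlike simple extraction from unstructured data (such as the cities in a state), I'm looking for a way to extract the whole sentences or paragraphs in which the entity appears and is mentioned directly or indirectly.
  • OK, what have you coded so far?
  • It seems that besides entity extraction, you also need to perform dependency parsing to capture instances that refer to your entity without mentioning it explicitly. To @DavidBatista's point, show us what you have already coded and the input text so that we can help.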

Tags: nlp nltk stanford-nlp spacy named-entity-recognition


【Solution 1】:

Are you using NLTK to extract the entities? I did something similar, shown below:

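import nltk
import re
from nltk.sem import extract_rels, rtuple
from nltk.chunk import tree2conlltags

# First run only: the tokenizer, POS tagger and NE chunker models must be fetched
# with nltk.download(...), e.g. 'punkt', 'averaged_perceptron_tagger',
# 'maxent_ne_chunker' and 'words' (resource names can differ between NLTK versions).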

sample = """Michael Joseph Jackson was born in Gary, Indiana, near Chicago, on August 29, 1958.
He was the eighth of ten children in the Jackson family, a working-class African-American family living in a two-bedroom house on Jackson Street.
His mother, Katherine Esther Jackson (née Scruse), left the Baptist tradition in 1963 to become a devout Jehovah's Witness. She played clarinet and piano and had aspired to be a country-and-western performer; she worked part-time at Sears to support the family.
His father, Joseph Walter 'Joe' Jackson, a former boxer, was a steelworker at U.S. Steel.
Joe played guitar with a local rhythm and blues band, the Falcons, to supplement the family's income.
Despite being a convinced Lutheran, Joe followed his wife's faith, as did all their children.
His father's great-grandfather, July 'Jack' Gale, was a Native American medicine man and US Army scout.
Michael grew up with three sisters (Rebbie, La Toya, and Janet) and five brothers (Jackie, Tito, Jermaine, Marlon, and Randy).
A sixth brother, Marlon's twin Brandon, died shortly after birth."""

sentences = nltk.sent_tokenize(sample)
tokenized_sentences = [nltk.word_tokenize(sentence) for sentence in sentences]
tagged_sentences = [nltk.pos_tag(sentence) for sentence in tokenized_sentences]


# Run the named-entity chunker on each POS-tagged sentence and print its tree.
for sent in tagged_sentences:
    print(nltk.ne_chunk(sent))

This prints the following:
(S (PERSON Michael/NNP Joseph/NNP Jackson/NNP) was/VBD born/VBN in/IN (GPE Gary/NNP) ,/, (GPE Indiana/NNP) ,/, near/IN (GPE Chicago/NNP) ,/, on/IN August/NNP 29/CD ,/, 1958/CD
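To get from these chunk trees back to the question (keeping only the sentences that mention a given person), one possible next step is to collect the PERSON subtrees of each tree and compare them with the target name. The sketch below is only an illustration, not part of the original answer: the helper names `person_names` and `mentions_person` are made up for this example, and it assumes each tree comes from `nltk.ne_chunk`, where entity subtrees expose `label()` and `leaves()` and plain tokens are `(word, tag)` tuples. It only catches direct mentions; sentences that refer to the person through "He" or "His" would still need coreference resolution.

```python
def person_names(chunked_sentence):
    """Return the PERSON entity strings found in one ne_chunk-style tree.

    Hypothetical helper: entity subtrees are assumed to expose label() and
    leaves(); plain tokens are (word, tag) tuples and are skipped.
    """
    names = []
    for node in chunked_sentence:
        if hasattr(node, "label") and node.label() == "PERSON":
            names.append(" ".join(word for word, _tag in node.leaves()))
    return names


def mentions_person(chunked_sentence, target):
    """True if every word of `target` appears as a whole word in some PERSON entity."""
    wanted = set(target.lower().split())
    if not wanted:
        return False
    return any(
        wanted <= set(name.lower().split())
        for name in person_names(chunked_sentence)
    )


# Possible usage with the variables from the code above (requires the NLTK models):
# matching = [
#     sentence
#     for sentence, tagged in zip(sentences, tagged_sentences)
#     if mentions_person(nltk.ne_chunk(tagged), "Michael")
# ]
```

Matching on whole words rather than substrings is a deliberate choice here, so that a target such as "Joe" does not accidentally match "Joseph".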

【Comments】:
