【Question title】: How to do named entity recognition (NER) using quanteda?
【Posted】: 2019-07-31 11:25:09
【Question description】:

I have a data frame with text:

df = data.frame(id=c(1,2), text = c("My best friend John works and Google", "However he would like to work at Amazon as he likes to use python and stay at Canada"))

Without any preprocessing,

how can I extract the named entities, as in this linked article?

Example result:

dfresults = data.frame(id=c(1,2), ner_words = c("John, Google", "Amazon, python, Canada"))

【Comments】:

    Tags: r quanteda


    【Solution 1】:

    You can do this without quanteda, using the spacyr package (a wrapper around the spaCy library mentioned in your linked article).

    Here, I have edited your input data.frame slightly.

    df <- data.frame(id = c(1, 2), 
                     text = c("My best friend John works at Google.", 
                              "However he would like to work at Amazon as he likes to use Python and stay in Canada."),
                     stringsAsFactors = FALSE)
    

    Then:

    library("spacyr")
    library("dplyr")
    
    # -- need to do these before the next function will work:
    # spacy_install()
    # spacy_download_langmodel(model = "en_core_web_lg")
    
    spacy_initialize(model = "en_core_web_lg")
    #> Found 'spacy_condaenv'. spacyr will use this environment
    #> successfully initialized (spaCy Version: 2.0.10, language model: en_core_web_lg)
    #> (python options: type = "condaenv", value = "spacy_condaenv")
    
    txt <- df$text
    names(txt) <- df$id
    
    spacy_parse(txt, lemma = FALSE, entity = TRUE) %>%
        entity_extract() %>%
        group_by(doc_id) %>%
        summarize(ner_words = paste(entity, collapse = ", "))
    #> # A tibble: 2 x 2
    #>   doc_id ner_words             
    #>   <chr>  <chr>                 
    #> 1 1      John, Google          
    #> 2 2      Amazon, Python, Canada
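
The question asked for a result keyed by the original numeric `id`, while the pipeline above returns a character `doc_id`. As a rough sketch of one way to bridge that gap in base R: the `entities` table below is a hand-written stand-in for what `entity_extract()` returns (only the `doc_id` and `entity` columns are kept), not real spaCy output, and the `ner` name is invented here for illustration.

```r
# Hypothetical stand-in for the output of entity_extract(); these rows are
# written by hand for illustration and were not produced by spaCy.
entities <- data.frame(
  doc_id = c("1", "1", "2", "2", "2"),
  entity = c("John", "Google", "Amazon", "Python", "Canada"),
  stringsAsFactors = FALSE
)

# The input data.frame from the answer above.
df <- data.frame(
  id = c(1, 2),
  text = c("My best friend John works at Google.",
           "However he would like to work at Amazon as he likes to use Python and stay in Canada."),
  stringsAsFactors = FALSE
)

# Collapse the entities of each document into one comma-separated string.
ner <- aggregate(entity ~ doc_id, data = entities,
                 FUN = function(x) paste(x, collapse = ", "))
names(ner) <- c("id", "ner_words")

# doc_id is character while the original id is numeric: convert before merging.
ner$id <- as.numeric(ner$id)

# all.x = TRUE keeps documents in which no entity was found (ner_words is NA).
dfresults <- merge(df["id"], ner, by = "id", all.x = TRUE)
print(dfresults)
```

The left join matters because documents without any recognized entity produce no rows in the entity table, so they would otherwise disappear from the result.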
    

    【Comments】:

    • What if I get an error like this: Finding a python executable with spaCy installed... Error in set_spacy_python_option(python_executable, virtualenv, condaenv, : spaCy or language model en is not installed in any of python executables. How can I fix it?
    • See the spacyr installation instructions at spacyr.quanteda.io. It looks like spacyr was not installed correctly.