[Question title]: ScispaCy in Google Colab
[Posted]: 2020-05-31 04:33:28
[Question]:

I'm trying to build an NER model for clinical data using ScispaCy in Colab. I installed the packages like this:

!pip install spacy
!pip install scispacy
!pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.2.4/en_core_sci_md-0.2.4.tar.gz       # pip install <Model URL>

Then I imported everything with:

import scispacy
import spacy
import en_core_sci_md

Then I used the following code to display the sentences and entities:

nlp = spacy.load("en_core_sci_md")
text ="""Myeloid derived suppressor cells (MDSC) are immature myeloid cells with immunosuppressive activity. They accumulate in tumor-bearing mice and humans with different types of cancer, including hepatocellular carcinoma (HCC)""" 
doc = nlp(text)
print(list(doc.sents))
print(doc.ents)

I get the following error:

OSError: [E050] Can't find model 'en_core_sci_md'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.

I don't know why this error occurs; I followed all the code from the official ScispaCy GitHub page. Any help would be appreciated. Thanks in advance.

[Comments]:

    Tags: python nlp spacy named-entity-recognition


    [Answer 1]:

    Hopefully I'm not too late... I believe you were very close to the right approach.

    I'll write my answer as steps, and you can choose where to stop.

    Step 1)

    # Install the en_core_sci_lg model (large) from the ScispaCy release archive; en_core_sci_md is the medium-sized alternative.
           
    !pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.2.4/en_core_sci_lg-0.2.4.tar.gz 
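
Before moving on, it can help to confirm that the install actually produced an importable package. The following is a minimal sketch using only the standard library; the helper name `is_importable` is hypothetical and not part of spaCy or ScispaCy.

```python
import importlib.util


def is_importable(package_name):
    """Return True if Python can locate a top-level package with this name."""
    return importlib.util.find_spec(package_name) is not None


# Run this in a new cell once the pip install above has finished.
for name in ("en_core_sci_lg", "en_core_sci_md"):
    print(name, "importable:", is_importable(name))
```

If a model shows up as not importable, the install step is the place to look rather than the loading code.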
    

    Step 2)

    # Import the large model package
    import en_core_sci_lg
    

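    Step 3)

    # Identify entities (`text` is the string defined in the question)
    from spacy import displacy
    nlp = en_core_sci_lg.load()
    doc = nlp(text)
    # With jupyter=True, displacy renders inline and returns None
    displacy.render(doc, style="ent", jupyter=True)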
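
The step above loads the model through the package's own `load()` function instead of `spacy.load("en_core_sci_md")`, which is the call that raised E050 in the question. A possible explanation, stated here as an assumption rather than something the source confirms, is that spaCy's lookup by name did not see a package installed after the Colab session had started, while a direct import does. The sketch below generalizes the import-based approach to any installed model package; `load_model_package` is a hypothetical helper name.

```python
import importlib


def load_model_package(name):
    """Import an installed spaCy model package by name and call its load()."""
    module = importlib.import_module(name)  # raises ImportError if not installed
    return module.load()


# Usage, assuming the model from Step 1 is installed:
# nlp = load_model_package("en_core_sci_lg")
# doc = nlp("Myeloid derived suppressor cells (MDSC) are immature myeloid cells.")
```

If the import itself fails, the model is not installed in the current environment; restarting the runtime after installing is worth trying in that case.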
    

    Step 4)

    # Print only the entities
    print(doc.ents)
    

    Step 5)

    # Save the result as (entity text, label) pairs
    save_res = [(ent.text, ent.label_) for ent in doc.ents]
    save_res
    

    Step 6)

    # Save the results to a dataframe
    import pandas as pd
    df_save_res = pd.DataFrame(save_res)
    df_save_res
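
If named columns are preferred in the dataframe, the entities can first be turned into plain dictionaries. This is a sketch, not part of the original answer: `entities_to_rows` is a hypothetical helper that only relies on the `text`, `label_`, `start_char`, and `end_char` attributes that spaCy entity spans provide.

```python
def entities_to_rows(ents):
    """Convert entity spans into a list of dicts, one per entity."""
    rows = []
    for ent in ents:
        rows.append(
            {
                "text": ent.text,
                "label": ent.label_,
                "start_char": ent.start_char,
                "end_char": ent.end_char,
            }
        )
    return rows


# Usage, assuming `doc` from Step 3 and pandas imported as pd:
# df_save_res = pd.DataFrame(entities_to_rows(doc.ents))
```

A list of dicts gives pandas the column names directly, so no `columns=` argument is needed.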
    

    Step 7)

    # In case you want to visualise the dependency parse
    displacy.render(doc, style="dep", jupyter=True)
    

    [Comments]:
