【Question Title】: How to get full list of POS, Tag, and Dep in spaCy?
【Posted】: 2020-02-01 13:54:28
【Question】:

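The POS, TAG, and DEP values used in spaCy are common in NLP, but I believe they differ somewhat depending on the corpus.

For example, Universal Dependencies Contributors lists 37 syntactic dependencies. Does spaCy use all 37 of them, and only those?

Does spaCy provide a command that outputs all POS, TAG, and DEP values?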

【Comments】:

    Tags: nlp spacy


    【Answer 1】:

    A specific pipeline component can show its labels:

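    import spacy

    # The 'en' shortcut link was removed in spaCy v3; load the model by its full name.
    nlp = spacy.load("en_core_web_sm")
    nlp.get_pipe("tagger").labels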
    # ('$', "''", ',', '-LRB-', '-RRB-', '.', ':', 'ADD', 'AFX', 'CC', 'CD', 'DT', 'EX', 'FW', 'HYPH', 'IN', 'JJ', 'JJR', 'JJS', 'LS', 'MD', 'NFP', 'NN', 'NNP', 'NNPS', 'NNS', 'PDT', 'POS', 'PRP', 'PRP$', 'RB', 'RBR', 'RBS', 'RP', 'SYM', 'TO', 'UH', 'VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ', 'WDT', 'WP', 'WP$', 'WRB', 'XX', '_SP', '``')
    

    spacy.explain(label) gives a short definition for most labels, for example:

    spacy.explain("NN")
    # 'noun, singular or mass'
    spacy.explain("PERSON")
    # 'People, including fictional'
    

    Detailed documentation is here: https://spacy.io/api/annotation
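
The same idea can be applied to every component at once. The following is a minimal sketch, not an official spaCy recipe: it walks `nlp.pipeline` (a list of `(name, component)` pairs) and reads each component's `labels` attribute when it has one. The helper name `collect_labels` is made up for this illustration, and `main()` assumes the `en_core_web_sm` model is installed.

```python
from typing import Dict, Tuple


def collect_labels(nlp) -> Dict[str, Tuple[str, ...]]:
    """Map each pipeline component name to its labels, skipping components without any."""
    found: Dict[str, Tuple[str, ...]] = {}
    for name, component in nlp.pipeline:
        # Not every component exposes labels, so fall back to None.
        labels = getattr(component, "labels", None)
        if labels:
            found[name] = tuple(labels)
    return found


def main() -> None:
    # Imported here so the helper above has no hard dependency on spaCy.
    import spacy

    nlp = spacy.load("en_core_web_sm")  # assumption: this model is installed
    for name, labels in collect_labels(nlp).items():
        print(f"{name}: {len(labels)} labels, e.g. {labels[:5]}")
```

Calling `main()` prints one line per component that reports labels. Which components appear depends on the loaded model and spaCy version.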

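    【Comments】:

      【Answer 2】:

      This is an old question, but maybe someone will find my answer helpful. I don't know whether all POS values can be printed, but they are easy to find here: Part-of-Speech tagging

      To get the list of DEP labels: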

      import spacy

      nlp = spacy.load("en_core_web_sm")
      for label in nlp.get_pipe("parser").labels:
          print(label, " -- ", spacy.explain(label))
      

      To get the list of TAG labels:

      nlp = spacy.load("en_core_web_sm")
      for label in nlp.get_pipe("tagger").labels:
          print(label, " -- ", spacy.explain(label))
      

      The pipelines provided by spaCy are listed here: Pipelines
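
For the coarse-grained POS values that this answer leaves open, one option is to start from the fixed tag set rather than from a pipeline component. The sketch below assumes that `token.pos_` follows the 17 universal POS tags of Universal Dependencies v2; spaCy models may emit a few extras such as `SPACE`, so treat the tuple as a starting point. `UPOS_TAGS` and `describe_labels` are names invented for this example, and `explain` can be any callable that behaves like `spacy.explain`.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# The 17 universal POS tags defined by Universal Dependencies v2.
UPOS_TAGS = (
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
)


def describe_labels(
    labels: Iterable[str],
    explain: Callable[[str], Optional[str]],
) -> List[Tuple[str, str]]:
    """Return sorted, de-duplicated (label, description) pairs."""
    rows = []
    for label in sorted(set(labels)):
        # explain may return None for labels it does not know.
        rows.append((label, explain(label) or "(no description)"))
    return rows


def main() -> None:
    import spacy  # assumption: spaCy is installed

    for label, text in describe_labels(UPOS_TAGS, spacy.explain):
        print(f"{label:6} {text}")
```

Passing `nlp.get_pipe("parser").labels` or `nlp.get_pipe("tagger").labels` instead of `UPOS_TAGS` gives the same table for DEP or TAG.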

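      【Comments】:

        【Answer 3】:

        【Comments】:

        • Links to a solution are welcome, but please make sure your answer is useful without them: add context around the link (meta.stackexchange.com/a/8259) so that other users have some idea what it is and why it is there.
        【Answer 4】: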
        TAG_LIST = [".",",","-LRB-","-RRB-","``","\"\"","''",",","$","#","AFX","CC","CD","DT","EX","FW","HYPH","IN","JJ","JJR","JJS","LS","MD","NIL","NN","NNP","NNPS","NNS","PDT","POS","PRP","PRP$","RB","RBR","RBS","RP","SP","SYM","TO","UH","VB","VBD","VBG","VBN","VBP","VBZ","WDT","WP","WP$","WRB","ADD","NFP","GW","XX","BES","HVS","_SP"]
        POS_LIST = ["ADJ", "ADP", "ADV", "AUX", "CONJ", "CCONJ", "DET", "INTJ", "NOUN", "NUM", "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X", "SPACE"]
        DEP_LIST = ["acl", "acomp", "advcl", "advmod", "agent", "amod", "appos", "attr", "aux", "auxpass", "case", "cc", "ccomp", "compound", "conj", "cop", "csubj", "csubjpass", "dative", "dep", "det", "dobj", "expl", "intj", "mark", "meta", "neg", "nn", "npmod", "nsubj", "nsubjpass", "oprd", "obj", "obl", "pcomp", "pobj", "poss", "preconj", "prep", "prt", "punct",  "quantmod", "relcl", "root", "xcomp"]
        NER_LIST = ["PERSON", "NORP", "FAC", "ORG", "GPE", "LOC", "PRODUCT", "EVENT", "WORK_OF_ART", "LAW", "LANGUAGE", "DATE", "TIME", "PERCENT", "MONEY", "QUANTITY", "ORDINAL", "CARDINAL"]
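
Hard-coded lists like these can drift from what a given model actually uses, so it may be worth checking them against the loaded pipeline. This is a rough sketch under that assumption: `compare_labels` is a hypothetical helper, the short `dep_list` in `main()` is illustrative rather than the full list above, and the model name is assumed to be installed.

```python
from typing import Iterable, List, Tuple


def compare_labels(
    hardcoded: Iterable[str],
    live: Iterable[str],
) -> Tuple[List[str], List[str]]:
    """Return (labels only in the hard-coded list, labels only in the live pipeline)."""
    hard, actual = set(hardcoded), set(live)
    return sorted(hard - actual), sorted(actual - hard)


def main() -> None:
    import spacy  # assumption: spaCy and the model below are installed

    nlp = spacy.load("en_core_web_sm")
    dep_list = ["acl", "nsubj", "obj", "root"]  # shortened, illustrative list
    only_hardcoded, only_live = compare_labels(dep_list, nlp.get_pipe("parser").labels)
    print("Only in the hard-coded list:", only_hardcoded)
    print("Only in the loaded parser:", only_live)
```

The comparison is case-sensitive, so a label spelled `root` in one place and `ROOT` in the other is reported on both sides.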
        

        【Comments】:

        • Your code seems to show only the result; it does not answer the "how" part of the question.