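【Question title】: How to keep punctuation in Stanford dependency parser
【Posted】: 2016-05-10 06:24:50
【Question】:

I am using Stanford CoreNLP (the 01.2016 release) and I want to keep punctuation in the dependencies. I found some ways to do this when running it from the command line, but I could not find any Java code for extracting the dependencies with punctuation.

Here is my current code. It works, but the output contains no punctuation: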

    Annotation document = new Annotation(text);
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, pos, lemma, parse");
    props.setProperty("ssplit.newlineIsSentenceBreak", "always");
    props.setProperty("ssplit.eolonly", "true");
    props.setProperty("pos.model", modelPath1);
    props.put("parse.model", modelPath );
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    pipeline.annotate(document);

    LexicalizedParser lp = LexicalizedParser.loadModel(modelPath + lexparserNameEn,
            "-maxLength", "200", "-retainTmpSubcategories");
    TreebankLanguagePack tlp = new PennTreebankLanguagePack();
    GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();

    List<CoreMap> sentences = document.get(SentencesAnnotation.class);
    for (CoreMap sentence : sentences) {
        List<CoreLabel> words = sentence.get(CoreAnnotations.TokensAnnotation.class);
        Tree parse = lp.apply(words);
        GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
        Collection<TypedDependency> td = gs.typedDependencies();
        parsedText += td.toString() + "\n";
    }

Any kind of dependencies is fine for me: basic, typed, collapsed, etc. I just want the punctuation to be included.

Thanks in advance,

【Comments】:

    Tags: java nlp stanford-nlp dependency-parsing


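    【Answer 1】:

    You are doing a lot of extra work here: you run the parser once through CoreNLP and then run it a second time by calling lp.apply(words).

    The easiest way to get a dependency tree/graph that includes punctuation is to set the CoreNLP option parse.keepPunct, as shown below.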

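    // Imports needed by this snippet
    import java.util.List;
    import java.util.Properties;
    import edu.stanford.nlp.ling.CoreAnnotations;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import edu.stanford.nlp.semgraph.SemanticGraph;
    import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations;
    import edu.stanford.nlp.util.CoreMap;

    Annotation document = new Annotation(text);
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, pos, lemma, parse");
    props.setProperty("ssplit.newlineIsSentenceBreak", "always");
    props.setProperty("ssplit.eolonly", "true");
    props.setProperty("pos.model", modelPath1);
    props.setProperty("parse.model", modelPath);
    // Keep punctuation tokens in the dependency output
    props.setProperty("parse.keepPunct", "true");

    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    pipeline.annotate(document);

    // The pipeline has already parsed every sentence, so just read the results
    List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
    for (CoreMap sentence : sentences) {
       // Pick whichever representation you want
       SemanticGraph basicDeps = sentence.get(SemanticGraphCoreAnnotations.BasicDependenciesAnnotation.class);
       SemanticGraph collapsed = sentence.get(SemanticGraphCoreAnnotations.CollapsedDependenciesAnnotation.class);
       SemanticGraph ccProcessed = sentence.get(SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation.class);
    }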
    

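    The sentence annotation objects store the dependency trees/graphs as SemanticGraph objects. If you want the TypedDependency objects instead, call the typedDependencies() method, which returns them as a collection. For example:

    Collection<TypedDependency> dependencies = basicDeps.typedDependencies();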
    

    【Comments】:

    • The last true must be "true", because setProperty only accepts (String, String).
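
The comment about setProperty can be illustrated with plain java.util.Properties, without CoreNLP on the classpath. The following is a minimal sketch, not code from the thread: the class name KeepPunctProperty and its two helper methods are hypothetical, and only the key parse.keepPunct comes from the answer. It shows that a value stored as a String is returned by getProperty, while a Boolean stored through the inherited put method is not, so any code that reads the setting with getProperty would not see it.

```java
import java.util.Properties;

// Hypothetical demo class; only the key "parse.keepPunct" comes from the answer.
public class KeepPunctProperty {

    // Stores the option the way the answer does: key and value are both Strings.
    static Properties withStringValue() {
        Properties props = new Properties();
        props.setProperty("parse.keepPunct", "true");
        return props;
    }

    // Assumed mistake for illustration: a Boolean stored via Hashtable.put.
    // This compiles, but getProperty only returns values that are Strings.
    static Properties withBooleanValue() {
        Properties props = new Properties();
        props.put("parse.keepPunct", Boolean.TRUE);
        return props;
    }

    public static void main(String[] args) {
        // The String value is visible through getProperty.
        System.out.println(withStringValue().getProperty("parse.keepPunct"));
        // The Boolean value is not: getProperty yields null here.
        System.out.println(withBooleanValue().getProperty("parse.keepPunct"));
    }
}
```

Sticking to setProperty for every option, as the answer does, avoids this class of mistake because the compiler rejects non-String values.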