【Posted】: 2016-05-10 06:24:50
【Question】:
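I'm using Stanford CoreNLP (the 01.2016 release) and I want to keep punctuation in the dependencies. I found a few ways to do this when running it from the command line, but I couldn't find any Java code that extracts the dependencies with punctuation included.
This is my current code. It works, but the output doesn't include punctuation: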
// Run the CoreNLP pipeline (tokenize, ssplit, pos, lemma, parse) over the input text
Annotation document = new Annotation(text);
Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, parse");
props.setProperty("ssplit.newlineIsSentenceBreak", "always");
props.setProperty("ssplit.eolonly", "true");
props.setProperty("pos.model", modelPath1);
props.setProperty("parse.model", modelPath);
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
pipeline.annotate(document);
// Load a separate LexicalizedParser and re-parse each sentence's tokens with it
LexicalizedParser lp = LexicalizedParser.loadModel(modelPath + lexparserNameEn,
        "-maxLength", "200", "-retainTmpSubcategories");
TreebankLanguagePack tlp = new PennTreebankLanguagePack();
// Factory that converts a parse tree into a GrammaticalStructure (typed dependencies)
GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
String parsedText = "";
List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
for (CoreMap sentence : sentences) {
    List<CoreLabel> words = sentence.get(CoreAnnotations.TokensAnnotation.class);
    Tree parse = lp.apply(words);
    GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
    Collection<TypedDependency> td = gs.typedDependencies();
    parsedText += td.toString() + "\n";
}
Any type of dependencies is fine with me: basic, typed, collapsed, etc. I just want punctuation to be included.
Thanks in advance,
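One direction that may be worth checking (an assumption based on the CoreNLP Java API of that period, not verified against this exact release): the factory returned by `tlp.grammaticalStructureFactory()` applies the language pack's default punctuation filter, and `TreebankLanguagePack` also appears to offer an overload `grammaticalStructureFactory(Predicate<String> puncFilter)`, so passing an accept-all predicate (for example `Filters.acceptFilter()` from `edu.stanford.nlp.util`) instead of the default might keep punctuation in the typed dependencies. The self-contained sketch below only models that filtering idea with plain JDK types; `PunctFilterDemo`, `Dep`, and `filterDeps` are hypothetical names, not CoreNLP classes, and the regex-based filter is a rough stand-in for the real punctuation list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical toy model (not CoreNLP code) of a punctuation filter on dependencies. */
public class PunctFilterDemo {

    /** Minimal stand-in for a typed dependency: reln(gov-govIdx, dep-depIdx). */
    static final class Dep {
        final String reln;
        final String gov;
        final int govIdx;
        final String dep;
        final int depIdx;

        Dep(String reln, String gov, int govIdx, String dep, int depIdx) {
            this.reln = reln;
            this.gov = gov;
            this.govIdx = govIdx;
            this.dep = dep;
            this.depIdx = depIdx;
        }

        @Override
        public String toString() {
            return reln + "(" + gov + "-" + govIdx + ", " + dep + "-" + depIdx + ")";
        }
    }

    /** Rough approximation of a punctuation-rejecting filter: drops all-punctuation tokens. */
    static final Predicate<String> REJECT_PUNCT = w -> !w.matches("\\p{Punct}+");

    /** Accept-all filter: every token passes, so punctuation is kept. */
    static final Predicate<String> ACCEPT_ALL = w -> true;

    /** Keeps only the dependencies whose dependent word passes the given filter. */
    static List<Dep> filterDeps(List<Dep> deps, Predicate<String> wordFilter) {
        List<Dep> out = new ArrayList<>();
        for (Dep d : deps) {
            if (wordFilter.test(d.dep)) {
                out.add(d);
            }
        }
        return out;
    }

    /** Hand-written dependencies for the sentence "John runs ." */
    static List<Dep> example() {
        List<Dep> deps = new ArrayList<>();
        deps.add(new Dep("nsubj", "runs", 2, "John", 1));
        deps.add(new Dep("root", "ROOT", 0, "runs", 2));
        deps.add(new Dep("punct", "runs", 2, ".", 3));
        return deps;
    }

    public static void main(String[] args) {
        System.out.println("default:    " + filterDeps(example(), REJECT_PUNCT));
        System.out.println("keep punct: " + filterDeps(example(), ACCEPT_ALL));
    }
}
```

Running `main` prints the same dependency list twice, once without the `punct(runs-2, .-3)` entry and once with it. The sketch filters an already finished list purely for illustration; it makes no claim about where inside `GrammaticalStructure` CoreNLP actually applies its filter.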
【Comments】:
Tags: java nlp stanford-nlp dependency-parsing