【Posted】:2014-06-25 02:26:43
【Question】:
If I take the example from the homepage:
The strongest rain ever recorded in India shut down
the financial hub of Mumbai, snapped communication
lines, closed airports and forced thousands of people
to sleep in their offices or walk home during the night,
officials said today.
The Stanford parser:
LexicalizedParser lexicalizedParser = LexicalizedParser.loadModel("edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz");
Tree parse = lexicalizedParser.parse(text);
TreePrint treePrint = new TreePrint("penn, typedDependencies");
treePrint.printTree(parse);
produces the following tree:
(ROOT
  (S
    (S
      (NP
        (NP (DT The) (JJS strongest) (NN rain))
        (VP
          (ADVP (RB ever))
          (VBN recorded)
          (PP (IN in)
            (NP (NNP India)))))
      (VP
        (VP (VBD shut)
          (PRT (RP down))
          (NP
            (NP (DT the) (JJ financial) (NN hub))
            (PP (IN of)
              (NP (NNP Mumbai)))))
        (, ,)
        (VP (VBD snapped)
          (NP (NN communication) (NNS lines)))
        (, ,)
        (VP (VBD closed)
          (NP (NNS airports)))
        (CC and)
        (VP (VBD forced)
          (NP
            (NP (NNS thousands))
            (PP (IN of)
              (NP (NNS people))))
          (S
            (VP (TO to)
              (VP
                (VP (VB sleep)
                  (PP (IN in)
                    (NP (PRP$ their) (NNS offices))))
                (CC or)
                (VP (VB walk)
                  (NP (NN home))
                  (PP (IN during)
                    (NP (DT the) (NN night))))))))))
    (, ,)
    (NP (NNS officials))
    (VP (VBD said)
      (NP-TMP (NN today)))
    (. .)))
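I would now like to split the tree according to its structure to get the clauses. So in this example, I would like to split the tree to get the following parts:
- The strongest rain ever recorded in India
- The strongest rain shut down the financial hub of Mumbai
- The strongest rain snapped communication lines
- The strongest rain closed airports
- The strongest rain forced thousands of people to sleep in their offices
- The strongest rain forced thousands of people to walk home during the night
How can I do that?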
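One possible direction, given only as a rough sketch: treat every coordinated VP (a VP whose children include several VPs and a CC) as a set of alternatives, and combine each alternative with the rest of the clause. The code below is an illustration under assumptions, not the parser's own API. `Node`, `parse` and `expand` are hypothetical helpers written for this example so that it runs without the parser jar; with the Stanford parser the same walk would be done over `Tree` objects through `children()`, `isLeaf()` and `value()`. The sketch does not split off the reduced relative clause ("ever recorded in India") and does not shorten the subject to its head noun phrase.

```java
import java.util.ArrayList;
import java.util.List;

public class ClauseSplitSketch {

    /** Hypothetical minimal node type standing in for the parser's Tree class. */
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        Node(String label) { this.label = label; }
        boolean isLeaf() { return children.isEmpty(); }
    }

    /** Reads one Penn-style bracketed tree such as "(NP (DT the) (NN hub))". */
    static Node parse(String s) {
        String[] tokens = s.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
        return read(tokens, new int[] {0});
    }

    private static Node read(String[] t, int[] pos) {
        if (!t[pos[0]].equals("(")) return new Node(t[pos[0]++]); // a word (leaf)
        pos[0]++;                                   // consume "("
        Node node = new Node(t[pos[0]++]);          // category label
        while (!t[pos[0]].equals(")")) node.children.add(read(t, pos));
        pos[0]++;                                   // consume ")"
        return node;
    }

    /** All readings of a subtree, with each coordinated VP split into alternatives. */
    static List<String> expand(Node n) {
        List<String> out = new ArrayList<>();
        if (n.isLeaf()) { out.add(n.label); return out; }
        if (isVpCoordination(n)) {
            // Each conjunct VP becomes its own alternative; CC and commas are dropped.
            for (Node c : n.children) if (c.label.equals("VP")) out.addAll(expand(c));
            return out;
        }
        out.add("");
        for (Node c : n.children) {
            if (isPunctuation(c)) continue;
            // Cartesian product of what we have so far with the child's readings.
            List<String> next = new ArrayList<>();
            for (String prefix : out)
                for (String part : expand(c))
                    next.add(prefix.isEmpty() ? part : prefix + " " + part);
            out = next;
        }
        return out;
    }

    static boolean isVpCoordination(Node n) {
        if (!n.label.equals("VP")) return false;
        int vps = 0;
        boolean cc = false;
        for (Node c : n.children) {
            if (c.label.equals("VP")) vps++;
            if (c.label.equals("CC")) cc = true;
        }
        return cc && vps >= 2;
    }

    static boolean isPunctuation(Node n) {
        return n.label.equals(",") || n.label.equals(".");
    }

    public static void main(String[] args) {
        // Shortened version of the tree from the question.
        String penn = "(S (NP (DT The) (JJS strongest) (NN rain))"
                + " (VP (VP (VBD closed) (NP (NNS airports)))"
                + " (CC and)"
                + " (VP (VBD forced) (NP (NNS people))"
                + " (S (VP (TO to)"
                + " (VP (VP (VB sleep)) (CC or) (VP (VB walk) (NP (NN home)))))))))";
        for (String clause : expand(parse(penn))) System.out.println(clause);
        // The strongest rain closed airports
        // The strongest rain forced people to sleep
        // The strongest rain forced people to walk home
    }
}
```

To use this on the full parse, `expand` would be called on the embedded S node rather than on ROOT, since otherwise every alternative would also carry "officials said today".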
So the first answer was to use a recursive algorithm to print all root-to-leaf paths.
Here is the code I tried:
public static void main(String[] args) throws IOException {
    LexicalizedParser lexicalizedParser = LexicalizedParser.loadModel("edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz");
    Tree tree = lexicalizedParser.parse("In a ceremony that was conspicuously short on pomp and circumstance at a time of austerity, Felipe, 46, took over from his father, King Juan Carlos, 76.");
    printAllRootToLeafPaths(tree, new ArrayList<String>());
}

private static void printAllRootToLeafPaths(Tree tree, List<String> path) {
    if(tree != null) {
        if(tree.isLeaf()) {
            path.add(tree.nodeString());
        }
        if(tree.children().length == 0) {
            System.out.println(path);
        } else {
            for(Tree child : tree.children()) {
                printAllRootToLeafPaths(child, path);
            }
        }
        path.remove(tree.nodeString());
    }
}
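Of course, this code is totally illogical: only leaves are ever added to the path, and leaves have no children, so the path never holds more than one node when it is printed. The problem here is that all the real words are leaves, so this algorithm just prints out the single words, which are the leaves: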
[The]
[strongest]
[rain]
[ever]
[recorded]
[in]
[India]
[shut]
[down]
[the]
[financial]
[hub]
[of]
[Mumbai]
[,]
[snapped]
[communication]
[lines]
[,]
[closed]
[airports]
[and]
[forced]
[thousands]
[of]
[people]
[to]
[sleep]
[in]
[their]
[offices]
[or]
[walk]
[home]
[during]
[the]
[night]
[,]
[officials]
[said]
[today]
[.]
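For comparison, a root-to-leaf traversal normally pushes every node onto the path on the way down and pops it again on the way back up, so that each printed path contains the phrase labels above the word. The sketch below shows that pattern on a hypothetical minimal `Node` class (an assumption made for this example so that it runs without the parser jar; with the Stanford parser the same calls would be `children()`, `isLeaf()` and `value()` on `Tree`). Note that it removes the last element by index; removing by value, as `path.remove(tree.nodeString())` does, deletes the first matching label, which is wrong when the same label occurs twice on a path.

```java
import java.util.ArrayList;
import java.util.List;

public class RootToLeafPaths {

    /** Hypothetical minimal tree node standing in for the parser's Tree class. */
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        Node(String label, Node... kids) {
            this.label = label;
            for (Node k : kids) children.add(k);
        }
        boolean isLeaf() { return children.isEmpty(); }
    }

    /** Returns one list of labels per leaf, from the root down to that leaf. */
    static List<List<String>> rootToLeafPaths(Node root) {
        List<List<String>> result = new ArrayList<>();
        collect(root, new ArrayList<String>(), result);
        return result;
    }

    private static void collect(Node node, List<String> path, List<List<String>> out) {
        if (node == null) return;
        path.add(node.label);                 // push every node, not only leaves
        if (node.isLeaf()) {
            out.add(new ArrayList<>(path));   // copy, because path keeps changing
        } else {
            for (Node child : node.children) collect(child, path, out);
        }
        path.remove(path.size() - 1);         // pop by index, not by value
    }

    public static void main(String[] args) {
        Node tree = new Node("NP",
                new Node("DT", new Node("The")),
                new Node("JJS", new Node("strongest")),
                new Node("NN", new Node("rain")));
        for (List<String> path : rootToLeafPaths(tree)) System.out.println(path);
        // [NP, DT, The]
        // [NP, JJS, strongest]
        // [NP, NN, rain]
    }
}
```

Root-to-leaf paths on their own still give one path per word, not clauses, so this only repairs the traversal; it does not answer the splitting question.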
【Discussion】:
Tags: java stanford-nlp