【发布时间】:2017-06-22 20:24:17
【问题描述】:
我有一个 PENN 语法树,我想递归地获取该树包含的所有规则。
(ROOT
(S
(NP (NN Carnac) (DT the) (NN Magnificent))
(VP (VBD gave) (NP ((DT a) (NN talk))))
)
)
我的目标是获得如下语法规则:
ROOT --> S
S --> NP VP
NP --> NN
...
正如我所说,我需要递归地执行此操作,并且不使用 NLTK 包或任何其他模块或正则表达式。这是我到目前为止所拥有的。参数tree 是在每个空间上拆分的 Penn-Tree。
def extract_rules(tree):
tree = tree[1:-1]
print("\n\n")
if len(tree) == 0:
return
root_node = tree[0]
print("Current Root: "+root_node)
remaining_tree = tree[1:]
right_side = []
temp_tree = list(remaining_tree)
print("remaining_tree: ", remaining_tree)
symbol = remaining_tree.pop(0)
print("Symbol: "+symbol)
if symbol not in ["(", ")"]:
print("CASE: No Brackets")
print("Rule: "+root_node+" --> "+str(symbol))
right_side.append(symbol)
elif symbol == "(":
print("CASE: Opening Bracket")
print("Temp Tree: ", temp_tree)
cursubtree_end = bracket_depth(temp_tree)
print("Subtree ends at position "+str(cursubtree_end)+" and Element is "+temp_tree[cursubtree_end])
cursubtree_start = temp_tree.index(symbol)
cursubtree = temp_tree[cursubtree_start:cursubtree_end+1]
print("Subtree: ", cursubtree)
rnode = extract_rules(cursubtree)
if rnode:
right_side.append(rnode)
print("Rule: "+root_node+" --> "+str(rnode))
print(right_side)
return root_node
def bracket_depth(tree):
counter = 0
position = 0
subtree = []
for i, char in enumerate(tree):
if char == "(":
counter = counter + 1
if char == ")":
counter = counter - 1
if counter == 0 and i != 0:
counter = i
position = i
break
subtree = tree[0:position+1]
return position
目前它适用于S 的第一个子树,但所有其他子树都不会被递归解析。很高兴有任何帮助..
【问题讨论】:
标签: python python-3.x recursion nlp