First, to navigate a tree, see "How to iterate through all nodes of a tree?" and "How to navigate a nltk.tree.Tree?":
>>> from nltk.tree import Tree
>>> bracket_parse = "(S (VP (VB get) (NP (PRP me)) (ADVP (RB now))))"
>>> ptree = Tree.fromstring(bracket_parse)
>>> ptree
Tree('S', [Tree('VP', [Tree('VB', ['get']), Tree('NP', [Tree('PRP', ['me'])]), Tree('ADVP', [Tree('RB', ['now'])])])])
>>> for subtree in ptree.subtrees():
...     print(subtree)
...
(S (VP (VB get) (NP (PRP me)) (ADVP (RB now))))
(VP (VB get) (NP (PRP me)) (ADVP (RB now)))
(VB get)
(NP (PRP me))
(PRP me)
(ADVP (RB now))
(RB now)
What you're looking for is Tree.productions(), see https://github.com/nltk/nltk/blob/develop/nltk/tree.py#L341:
>>> ptree.productions()
[S -> VP, VP -> VB NP ADVP, VB -> 'get', NP -> PRP, PRP -> 'me', ADVP -> RB, RB -> 'now']
Note that Tree.productions() returns a list of Production objects, not strings; see https://github.com/nltk/nltk/blob/develop/nltk/tree.py#L22 and https://github.com/nltk/nltk/blob/develop/nltk/grammar.py#L236.
If you want the grammar rules as strings, you can do this:
>>> for rule in ptree.productions():
...     print(rule)
...
S -> VP
VP -> VB NP ADVP
VB -> 'get'
NP -> PRP
PRP -> 'me'
ADVP -> RB
RB -> 'now'
Or:
>>> rules = [str(p) for p in ptree.productions()]
>>> rules
['S -> VP', 'VP -> VB NP ADVP', "VB -> 'get'", 'NP -> PRP', "PRP -> 'me'", 'ADVP -> RB', "RB -> 'now'"]
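If it helps to see where those rules come from, here is a rough sketch of the same pre-order walk without NLTK. It is not NLTK's implementation: the nested `(label, [children])` tuple representation and the `productions` helper are hypothetical stand-ins I made up for illustration, with leaves stored as plain strings.

```python
# Hypothetical stand-in for an nltk Tree: (label, [children]); leaves are plain strings.
tree = ("S", [("VP", [("VB", ["get"]),
                      ("NP", [("PRP", ["me"])]),
                      ("ADVP", [("RB", ["now"])])])])


def productions(node):
    """Return 'LHS -> RHS' strings for every non-leaf node, in pre-order."""
    label, children = node
    rhs = []
    for child in children:
        if isinstance(child, str):
            rhs.append(repr(child))  # terminals are quoted, e.g. 'get'
        else:
            rhs.append(child[0])     # nonterminals appear by their label
    rules = ["{} -> {}".format(label, " ".join(rhs))]
    # Recurse into subtrees only; leaves produce no rule of their own.
    for child in children:
        if not isinstance(child, str):
            rules.extend(productions(child))
    return rules


rules = productions(tree)
for rule in rules:
    print(rule)
```

Each node contributes one rule (its label on the left, its children's labels or quoted words on the right), and the recursion visits the root first and then each subtree left to right, which is why the rules come out in the same order as in the list above.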