Python: Optimizing, or at least getting fresh ideas for a tree generator

【Question title】: Python: Optimizing, or at least getting fresh ideas for a tree generator
【Posted】: 2010-01-25 17:07:29
【Question】:

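I have written a program that generates random expressions and then uses genetic techniques to select for fitness.

The following part of the program generates a random expression and stores it in a tree structure.

Since this may be called billions of times during a run, I think it should be optimized for time.

I am new to programming and I work (play) by myself, so, as much as I search the internet for ideas, I would like some input, because I feel I am doing this in isolation.

The bottlenecks seem to be Node.__init__() (22% of total time) and random.choice() (14% of total time).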

import random

def printTreeIndented(data, level=0):
    '''utility to view the tree
    '''
    if data == None:
         return
    printTreeIndented(data.right, level+1)
    print '  '*level + '  '+ str(data.cargo)#+ '  '+ str(data.seq)+ '  '+ str(data.branch)
    printTreeIndented(data.left, level+1)

#These are the global constants used in the Tree.build_nodes() method.
Depth = 5
Ratio = .6 #probability of terminating the current branch.
Atoms = ['1.0','2.0','3.0','4.0','5.0','6.0','7.0','8.0','9.0','x','x','x','x']
#dict of operators. the structure is: operator: number of arguments
Operators = {'+': 2, '-': 2, '*': 2, '/': 2, '**': 2}


class KeySeq:
    '''Iterator to produce sequential 
    integers for keys in Tree.thedict
    '''
    def __init__(self, data = 0):
        self.data = data
    def __iter__(self):
        return self
    def next(self):
        self.data = self.data + 1
        return self.data
KS = KeySeq()

class Node(object):
    '''
    '''
    def __init__(self, cargo, left=None, right=None):
        object.__init__(self)
        self.isRoot = False
        self.cargo = cargo
        self.left  = left
        self.right = right
        self.parent = None
        self.branch = None
        self.seq = 0


class Tree(object):
    def __init__(self):
        self.thedict = {}     #provides access to the nodes for further mutation and
        # crossbreeding.
        #When the Tree is instantiated, it comes filled with data.
        self.data = self.build_nodes()

# Uncomment the following lines to see the data and a crude graphic of the tree.
#        print 'data: '
#        for v in  self.thedict.itervalues():
#            print v.cargo,
#        print
#        print
#        printTreeIndented(self.data)

    def build_nodes (self,  depth = Depth, entry = 1,  pparent = None,
     bbranch = None):
        '''
        '''
        r = float()
        r = random.random()

        #If r > Ratio, it forces a terminal node regardless of
        #the value of depth.
        #If entry = 1, then it's the root node and we don't want
        # a tree with just a value in the root node.

        if (depth <= 0) or ((r > Ratio) and (not (entry))):
            '''
            Add a terminal node.
            '''
            this_atom = (random.choice(Atoms))
            this_atom = str(this_atom)
            this_node = Node(this_atom)
            this_node.parent = pparent
            this_node.branch = bbranch
            this_node.seq = KS.next()
            self.thedict[this_node.seq] = this_node
            return this_node

        else:
            '''
            Add a node that has branches.
            '''
            this_operator = (random.choice(Operators.keys()))

            this_node = Node(this_operator)
            if entry:
                this_node.isRoot = True
            this_node.parent = pparent
            this_node.branch = bbranch
            this_node.seq = KS.next()
            self.thedict[this_node.seq] = this_node
            #branch as many times as 'number of arguments'
            # it's only set up for 2 arguments now.
            for i in range(Operators[this_operator]):
                depth =(depth - 1)
                if i == 0:
                    this_node.left = (self.build_nodes(entry = 0, depth =(depth),
                     pparent = this_node, bbranch = 'left'))
                else:
                    this_node.right = (self.build_nodes(entry = 0, depth =(depth),
                     pparent = this_node, bbranch = 'right'))
            return this_node


def Main():
    for i in range(100000):
        t = Tree()
    return t

if __name__ == '__main__':
    rresult = Main()

【Comments】:

Tags: python optimization expression-trees

【Solution 1】:

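Below, I have summarized some of the more obvious optimization efforts, without really touching the algorithm. All timings were done with Python 2.6.4 on a Linux x86-64 system.

Initial time: 8.3s

Low-hanging fruit

jellybean already pointed out a few. Just fixing those improves the runtime a bit. Replacing the repeated calls to Operators.keys() with a single list that is reused every time also saves some time.

Time: 6.6s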
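A minimal sketch of the key-caching step, assuming the cached list is called OpKeys (the name the later snippets in this answer use). The helper pick_operator is hypothetical and only there for illustration. It is written so that it also runs on Python 3, where keys() returns a view that random.choice cannot index, hence the explicit list:

```python
import random

Operators = {'+': 2, '-': 2, '*': 2, '/': 2, '**': 2}

# Build the list of operator symbols once at module level instead of
# calling Operators.keys() on every build_nodes() call.
OpKeys = list(Operators)

def pick_operator():
    # Hypothetical helper: random.choice needs an indexable sequence,
    # and OpKeys is reused as-is on every call.
    return random.choice(OpKeys)
```

Inside build_nodes, the call then becomes random.choice(OpKeys), as in the namedtuple version of this answer.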

Using itertools.count

As Dave Kirby also pointed out, simply using itertools.count saves you some time as well:

from itertools import count
KS = count()

Time: 6.2s
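One detail worth checking, shown here as a small sketch: the KeySeq class from the question hands out 1 as its first key, while count() starts at 0. If anything relies on keys starting at 1, count(1) reproduces the old sequence. The __next__ alias is my addition so the snippet runs on Python 3 as well; the built-in next() is available from Python 2.6 on.

```python
from itertools import count

class KeySeq(object):
    """The question's hand-written counter, plus a Python 3 alias."""
    def __init__(self, data=0):
        self.data = data
    def __iter__(self):
        return self
    def next(self):
        self.data = self.data + 1
        return self.data
    __next__ = next  # lets the built-in next() work on Python 3 too

old = KeySeq()
new = count(1)  # count() would start at 0; count(1) matches KeySeq

first_old = [next(old) for _ in range(3)]
first_new = [next(new) for _ in range(3)]
```

Both lists hold the same three keys, so the swap does not change how thedict is indexed.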

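Improving the constructor

Since you are not setting all of Node's attributes in the ctor, you can simply move the attribute declarations into the class body: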

class Node(object):
    isRoot = False
    left  = None
    right = None
    parent = None
    branch = None
    seq = 0

    def __init__(self, cargo):
        self.cargo = cargo

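In your case this does not change the semantics of the class, since all values used in the class body are immutable (False, None, 0). If you need other values, read this answer on class attributes first.

Time: 5.2s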
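To illustrate why immutable defaults are safe and mutable ones are not, here is a small sketch. Node is trimmed down to a few attributes, and BadNode is a hypothetical counter-example that is not part of the original code:

```python
class Node(object):
    # Defaults live on the class; instances only store what they override.
    isRoot = False
    parent = None
    seq = 0

    def __init__(self, cargo):
        self.cargo = cargo

a = Node('x')
b = Node('+')
b.seq = 7            # creates an instance attribute on b only

class BadNode(object):
    children = []    # one list object shared by every instance

c = BadNode()
d = BadNode()
c.children.append('oops')   # mutates the shared class-level list
```

Assigning b.seq leaves a.seq at the class default, whereas appending through c also shows up in d.children, because both names refer to the same list.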

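Using namedtuple

In your code you no longer change the expression tree once it is built, so you might as well use an immutable object. Node also has no behavior of its own, so a namedtuple is a good option. This does have one implication, though: the parent member has to be dropped for now. Given that you may introduce operators with more than two arguments, you would have to replace the left/right nodes with a list of children anyway, which is mutable again and would allow the parent node to be created before all of its children.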

from collections import namedtuple
Node = namedtuple("Node", ["cargo", "left", "right", "branch", "seq", "isRoot"])
# ...
    def build_nodes (self,  depth = Depth, entry = 1,  pparent = None,
         bbranch = None):
        r = random.random()

        if (depth <= 0) or ((r > Ratio) and (not (entry))):
            this_node = Node(
                random.choice(Atoms), None, None, bbranch, KS.next(), False)
            self.thedict[this_node.seq] = this_node
            return this_node

        else:
            this_operator = random.choice(OpKeys)

            this_node = Node(
              this_operator,
              self.build_nodes(entry = 0, depth = depth - 1,
                               pparent = None, bbranch = 'left'),
              self.build_nodes(entry = 0, depth = depth - 2,
                               pparent = None, bbranch = 'right'),
              bbranch, 
              KS.next(), 
              bool(entry))

            self.thedict[this_node.seq] = this_node    
            return this_node

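I have kept the original behavior of the operand loop, which decrements depth on every iteration. I am not sure this is the intended behavior, but changing it increases the runtime and would make the timings impossible to compare.

Final time: 4.1s

Where to go from here

If you want to support operators with more than two arguments and/or keep the parent attribute, use something along the lines of the following code: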

from collections import namedtuple
Node = namedtuple("Node", ["cargo", "args", "parent", "branch", "seq", "isRoot"])

    def build_nodes (self,  depth = Depth, entry = 1,  pparent = None,
         bbranch = None):
        r = random.random()

        if (depth <= 0) or ((r > Ratio) and (not (entry))):
            this_node = Node(
                random.choice(Atoms), None, pparent, bbranch, KS.next(), False)
            self.thedict[this_node.seq] = this_node
            return this_node

        else:
            this_operator = random.choice(OpKeys)

            this_node = Node(
              this_operator, [], pparent, bbranch,
              KS.next(), bool(entry))
            this_node.args.extend(
              self.build_nodes(entry = 0, depth = depth - (i + 1),
                               pparent = this_node, bbranch = i)
              for i in range(Operators[this_operator]))

            self.thedict[this_node.seq] = this_node    
            return this_node

This code also decreases the depth with the position of the operand.
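As for walking a tree that uses args instead of left/right, here is a rough sketch. The helper to_infix and the hand-built tree are hypothetical. The sketch assumes terminals carry args=None, as in the code above, and creates each parent before its children, the same way build_nodes does:

```python
from collections import namedtuple

Node = namedtuple("Node", ["cargo", "args", "parent", "branch", "seq", "isRoot"])

def to_infix(node):
    """Render a subtree as a parenthesized infix string (hypothetical helper)."""
    if not node.args:          # terminals carry args=None in the code above
        return node.cargo
    rendered = [to_infix(child) for child in node.args]
    return '(' + (' ' + node.cargo + ' ').join(rendered) + ')'

# Hand-built tree for (x + (2.0 * 3.0)); each parent is created first with
# an empty list, then its children are appended.
root = Node('+', [], None, None, 0, True)
root.args.append(Node('x', None, root, 0, 1, False))
mul = Node('*', [], root, 1, 2, False)
root.args.append(mul)
mul.args.append(Node('2.0', None, mul, 0, 3, False))
mul.args.append(Node('3.0', None, mul, 1, 4, False))

expr = to_infix(root)
```

The recursion only loops over node.args, so it does not care whether an operator has two children or more.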

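【Comments】:

What a great lesson for me! I am still learning all of this, so it took me a while to respond. When individuals (trees) are selected for fitness, a subset is chosen and crossbred: two are picked at random and randomly chosen sub-branches are swapped. This involves changing the parent and branch attributes. I am still reading up on namedtuples and on the link about class attributes that you included.
I will implement your second suggestion. I am not sure how to walk the tree when it uses 'args[]' instead of left and right. There is a class 'Chromosome' in which the tree is an attribute. I will implement that as a namedtuple as well. Thanks
I assume that when I get to the crossbreeding routine, I will be able to use 'somenamedtuple._replace(kwargs)' to switch subtrees.
About walking the tree: when building the terminal nodes (ignoring the else clause), I replaced the 'None' value assigned to 'args' with [None, None].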
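On the _replace idea, a small sketch of what it does, using the left/right Node layout from the first namedtuple version. The sample nodes are made up. _replace takes field names as keyword arguments and returns a new tuple instead of changing the existing one:

```python
from collections import namedtuple

Node = namedtuple("Node", ["cargo", "left", "right", "branch", "seq", "isRoot"])

leaf_x = Node('x', None, None, 'left', 1, False)
leaf_2 = Node('2.0', None, None, 'right', 2, False)
root = Node('+', leaf_x, leaf_2, None, 0, True)

# _replace returns a NEW tuple; the original root is left untouched.
leaf_9 = Node('9.0', None, None, 'right', 3, False)
new_root = root._replace(right=leaf_9)
```

Because the old node is not modified, swapping a subtree deep in the tree would presumably mean rebuilding every ancestor up to the root and updating the matching entries in thedict. With the list-based args layout, the list itself can be changed in place instead.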

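【Solution 2】:

You can omit a lot of parentheses in your code; that is one of Python's upsides. For example, instead of wrapping conditions in parentheses like this: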

if (depth <= 0) or ((r > Ratio) and (not (entry))):

just write

if depth <= 0 or (r > Ratio and not entry):

I think there are a few redundant calls, e.g.

this_atom = str(this_atom)

(this_atom is already a string, so the call only adds overhead; just omit this line)

or the call to the object constructor

object.__init__(self)

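which is not necessary either.

As for the Node.__init__ method being the "bottleneck": I guess you spend most of your time there because, when building a tree like this, there is not much else to do besides creating new nodes.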
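To check where the time actually goes, the standard library profiler can be used. This is a rough sketch on a toy workload: build_many and the stripped-down Node are stand-ins, not the code from the question. It is written for Python 3, since io.StringIO expects text there.

```python
import cProfile
import io
import pstats

class Node(object):
    def __init__(self, cargo):
        self.cargo = cargo

def build_many(n):
    # Stand-in workload: does nothing but create nodes.
    return [Node(i) for i in range(n)]

profiler = cProfile.Profile()
profiler.enable()
nodes = build_many(10000)
profiler.disable()

# Collect the report into a string, sorted by cumulative time.
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats('cumulative').print_stats(5)
report = buf.getvalue()
```

Printing report lists the most expensive calls. In a workload that only creates nodes, __init__ is expected to show up near the top.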

【Comments】:

【Solution 3】:

You can replace the KeySeq generator with itertools.count, which does the same job but is implemented in C.

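I do not see any way of speeding up the Node constructor. You can optimize the call to random.choice by inlining the code: cut and paste it from the source of the random module. This eliminates a function call, which is relatively expensive in Python.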
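A sketch of what the inlining could look like. As far as I recall, Python 2's random.choice boils down to indexing the sequence with int(random() * len(seq)); Python 3 implements it differently, so treat the expression below as an approximation of the idea rather than a copy of the library source. The names _random, _n_atoms and make_atoms are made up, and Atoms is shortened:

```python
import random

Atoms = ['1.0', '2.0', '3.0', 'x']

# Bind the function and the length once, outside the hot loop.
_random = random.random
_n_atoms = len(Atoms)

def make_atoms(how_many):
    # Hypothetical helper showing the inlined expression in a loop.
    out = []
    for _ in range(how_many):
        # Inlined choice: scale a float in [0, 1) to a list index.
        out.append(Atoms[int(_random() * _n_atoms)])
    return out
```

Whether this pays off is best confirmed by timing it against random.choice(Atoms) on your own interpreter.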

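You could speed it up by running it under psyco, which is a kind of JIT optimizer. However, this only works on 32-bit Intel builds of Python. Alternatively, you could use cython, which converts python(ish) code into C that can be compiled into a Python C module. I say pythonish because some things cannot be converted, and because you can add C data type annotations to make the generated code more efficient.

【Comments】:

Using KS = itertools.count() gave a 6% improvement. Inlining random.choice() gave another 4%. I will read up on cython. Thanks!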