[Question Title]: Multidimensional iterator in TensorFlow
[Posted]: 2020-10-23 05:36:44
[Question Description]:

I want to sum the return values of a vectorized function evaluated on a hypergrid:

    l = tf.reshape(tf.linspace(-b, b, n), (n, 1))
    dims = [l] * d
    mesh = tf.meshgrid(*dims)
    y = f(mesh) 
    result = tf.reduce_sum(y)

Unfortunately, when tf.meshgrid is called with high-dimensional input, mesh becomes so large that it no longer fits in VRAM.
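To see the scale of the problem, here is a rough back-of-the-envelope estimate of the meshgrid footprint (a sketch assuming float32 values; `mesh_bytes` is a hypothetical helper, not part of any API):

```python
# Rough size of the meshgrid output: d tensors with n**d float32 entries each
# (mesh_bytes is a hypothetical helper for this estimate only)
def mesh_bytes(n, d, itemsize=4):
    return d * n ** d * itemsize

for d in (3, 5, 7):
    print(d, mesh_bytes(100, d) / 2 ** 30, "GiB")
# For n = 100: d=3 is only ~0.01 GiB, but d=5 is already ~186 GiB
```

The memory grows as n**d, so even modest dimensions overwhelm any GPU.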

I am therefore looking for a solution analogous to np.ndindex that would let me generate sub-grids in TensorFlow. I don't want to hard-code loops, because d varies. Also, I am not sure whether handling recursion is feasible in TensorFlow 1.15. Thanks in advance!
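For reference, this is the kind of np.ndindex-style chunking being asked about, sketched in NumPy with a toy f (sum of squares; both f and the chunking axis are illustrative assumptions). Fixing one coordinate per iteration means only n**(d-1) points are materialized at a time, and the partial sums add up to the full-mesh result:

```python
import numpy as np

# Toy stand-in for the vectorized f: sum of squares over the last axis
def f(points):
    return np.sum(points ** 2, axis=-1)

b, n, d = 1.0, 5, 3
l = np.linspace(-b, b, n)

# Full-mesh reduction (the version that runs out of memory for large d)
full = f(np.stack(np.meshgrid(*[l] * d), axis=-1)).sum()

# np.ndindex-style chunking: fix the first coordinate and only
# materialize n**(d-1) points at a time
chunked = 0.0
for (i,) in np.ndindex(n):
    sub = np.stack(np.meshgrid(l[i:i + 1], *[l] * (d - 1)), axis=-1)
    chunked += f(sub).sum()

print(np.allclose(full, chunked))  # True
```

The question is how to express this chunked iteration in TensorFlow 1.15 graphs, where Python-level loops over a variable d are awkward.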

[Question Discussion]:

    Tags: python tensorflow gpu gpgpu


    [Solution 1]:

    Here is a possible solution I worked out; I am not sure whether it really fits your case. The idea is to recursively subdivide the required grid according to given block sizes. Interestingly, this does not work in TensorFlow 2.x, because it tries to turn the recursive function into a graph, which would result in an infinite graph (I am not sure whether there is a workaround). The solution is obviously much slower than computing directly over the whole grid, but the result is technically the same. The catch is numerical error: if the grid is really large, reducing over the whole thing at once will most likely accumulate significant floating-point error. However, that happens in the first place without any subdivision, and if anything the subdivision actually seems to reduce the error (at least in some experiments I ran).

    In any case, the code is a bit long. It is not conceptually too complicated, but hopefully the comments make it clear.

    import tensorflow as tf
    
    def block_reduction_func(block):
        # This function computes the reduction of a block
        return tf.math.reduce_sum(block)
    
    def intermediate_reduction_func(values):
        # This function computes the reduction of an
        # array of intermediate reduction results
        # (in this example is the same)
        return block_reduction_func(values)
    
    def make_block(aa):
        # Makes an actual block from some space slices
        return tf.stack(tf.meshgrid(*aa), axis=-1)
    
    def get_block_slices(aa, i, size):
        # Selects the space slices corresponding to a particular block
        aa2 = []
        for dim, a in enumerate(aa):
            # Number of slices in this level for this dimension
            s = tf.size(a)
            n = s // size
            n += tf.dtypes.cast(s % size > 0, n.dtype)
            # Select dimension slice
            j = i % n
            aa2.append(aa[dim][j * size:(j + 1) * size])
            i //= n
        return aa2
    
    def by_blocks(aa, blocks):
        # Reduces a space by blocks
        if not blocks:
            # When there are no more subdivisions to do
            # just reduce the current block
            res = block_reduction_func(make_block(aa))
            with tf.control_dependencies([]): #([tf.print(res, aa)]):
                return res + 0
        else:
            # Get current division size
            size, *blocks = blocks
            # Get number of blocks in this recursion level
            num_blocks = 1
            for a in aa:
                s = tf.size(a)
                n = s // size
                n += tf.dtypes.cast(s % size > 0, n.dtype)
                num_blocks *= n
            # Array for intermediate results
            ta = tf.TensorArray(aa[0].dtype, num_blocks, element_shape=())
            # Loop through blocks
            _, ta = tf.while_loop(
                lambda i, ta: i < num_blocks,
                lambda i, ta: (i + 1,
                               ta.write(i, by_blocks(get_block_slices(aa, i, size), blocks))),
                [0, ta], parallel_iterations=1)
            # Reduce intermediate results
            values = ta.stack()
            return intermediate_reduction_func(values)
    
    # Test
    b = 1.0
    n = 100
    d = 3
    # Recursive divisions of n (can have arbitrary size)
    # Divide in blocks of 60, then blocks of 12
    blocks = [60, 12]
    with tf.Graph().as_default(), tf.Session():
        # Using positive values only in this example
        # so the errors do not overtake the result
        a = tf.linspace(0., b, n)
        aa = [a] * d
        r1 = block_reduction_func(make_block(aa))
        r2 = by_blocks(aa, blocks)
        # Check results (should be 1500000)
        print(r1.eval())
        # 1499943.6
        print(r2.eval())
        # 1499998.8
    
        # CPU timings
        %timeit r1.eval()
        # 99.3 µs ± 169 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
        %timeit r2.eval()
        # 96.8 ms ± 170 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
    
        # GPU timings
        %timeit r1.eval()
        # 195 µs ± 615 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
        %timeit r2.eval()
        # 316 ms ± 1.54 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
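
The block-index decoding in get_block_slices is the core trick: a flat block index i is decoded, mixed-radix style, into one slice per dimension. The NumPy re-implementation below (a sketch for checking the logic, not part of the answer's graph code) confirms that iterating over all block indices covers the mesh exactly once:

```python
import numpy as np

# NumPy version of the block-index logic from get_block_slices:
# decode flat block index i into one sub-slice per dimension (mixed radix)
def get_block_slices_np(aa, i, size):
    out = []
    for a in aa:
        n = -(-a.size // size)  # ceil division: number of slices in this dim
        j = i % n
        out.append(a[j * size:(j + 1) * size])
        i //= n
    return out

a = np.arange(7.0)
aa = [a] * 2
size = 3
num_blocks = (-(-a.size // size)) ** 2  # 3 slices per dim -> 9 blocks

# Summing every block of the mesh must equal summing the full mesh
total = 0.0
for i in range(num_blocks):
    sub = np.stack(np.meshgrid(*get_block_slices_np(aa, i, size)), axis=-1)
    total += sub.sum()
full = np.stack(np.meshgrid(*aa), axis=-1).sum()
print(np.isclose(total, full))  # True
```

Note that because i // n is folded in per dimension, the block count per dimension can differ (the last slice may be shorter than size), which is why the ceiling division appears both here and in by_blocks.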
    

    [Discussion]:

    • Thanks a lot, this looks promising, I'll give it a try over the weekend!