Algorithm for function evaluation by pairs (C++, STL): Answers

【Question title】: Algorithm for function evaluation by pairs (C++, STL)
【Posted】: 2016-05-20 00:14:06
【Question】:

I need to apply a custom func to the elements of an STL container by pairs, that is:

// if c => {a,b,c,d,e,f,g}; // a,b,c,.. are just aliases for some object
my_algorithm(c.begin(), c.end(), [](auto a, auto b){ return a + b; }); // C++14

should resolve into something like this:

temp1 = a + b;
temp2 = c + d;
temp3 = e + f;
temp4 = temp1 + temp2;
temp5 = temp3 + g;
result = temp4 + temp5;

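(I am sure this kind of algorithm has a proper name, but I have no idea what it might be.)

I have tried std::accumulate. I am not sure whether its implementation is defined by the standard, but in my case, with my compiler, it seems to resolve to this (I think this is called pairwise summation, right?):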

temp1 = a + b;
temp2 = temp1 + c;
temp3 = temp2 + d;
// etc

which is more or less the same as what I could get with:

auto temp = c[0];
std::for_each(c.begin()+1, c.end(), [&temp](auto a){ temp = temp + a; }); // C++14

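I had a look through the STL and Boost, but did not manage to find anything relevant. Is there any library that provides such an algorithm? If not, any ideas for a good STL-compatible implementation?

EDIT: Just to add that I am not really interested in adding the elements in the traditional sense, where the order does not matter. My function will do a more complex, weighted kind of summation, and it will give different results depending on the order in which it is carried out. My question is more generic nevertheless.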
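To make the difference in grouping visible, here is a minimal sketch (not part of the original question) that hands std::accumulate a function which records how its arguments were combined instead of computing anything. The names `show` and `left_fold_grouping` are made up for this illustration.

```cpp
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

// Illustrative stand-in for the custom function: it only records how its
// two arguments were grouped.
std::string show(const std::string& x, const std::string& y) {
  return "(" + x + "+" + y + ")";
}

// Hypothetical helper: folds all elements with std::accumulate, using the
// first element as the initial value. The container must not be empty.
std::string left_fold_grouping(const std::vector<std::string>& c) {
  assert(!c.empty());
  return std::accumulate(c.begin() + 1, c.end(), c.front(), show);
}
```

For c = {a,b,c,d,e,f,g} this returns ((((((a+b)+c)+d)+e)+f)+g), a left fold, whereas the evaluation order asked for above would read (((a+b)+(c+d))+((e+f)+g)).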

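【Comments】:

partial_sum()? It is a bit unclear what the result should be.
std::accumulate is definitely defined by the standard to be a left fold. It seems you want to build a balanced tree bottom-up, which is possible but not available in the standard library, AFAIK. (There are different algorithms to "build a balanced tree", depending on your definition of "balanced", which you only provide via one example; I think that is not precise enough for an implementation.)
Actually, I know this kind of accumulation from CUDA, where it is called "parallel reduction".
I don't see any benefit in adding them this way unless you are actually going to use multiple threads? Or, if you are worried about some floating-point issues, maybe... It is hard for me to see the general usefulness of this algorithm, and I would not expect to find it in the standard library.
I am not interested in adding them in the traditional sense. The function I pass in my case does a more complex, weighted kind of summation, and it will give different results if carried out this way.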

Tags: c++ algorithm c++11 stl c++14

【Solution 1】:

Here is my attempt at an STL-compatible solution, at the C++11 standard:

#include <cassert>
#include <cmath>
#include <cstddef>

#include <array>
#include <exception>
#include <iostream>
#include <iterator>

namespace detail {

  // Returns the largest power of two which is strictly less than n
  unsigned int pot_half(std::ptrdiff_t n) {
    assert(n > 1);
    return 1u << (static_cast<unsigned int>(std::ceil(std::log2(n))) - 1);
  }

} // end namespace detail

struct tree_fold_on_empty_range : std::exception {};

template <typename Iterator, typename F>
auto tree_fold(const Iterator & begin, const Iterator & end, F && func) -> decltype(func(*begin, *end)) {
  std::ptrdiff_t diff = end - begin;
  switch (diff) {
    case 0: throw tree_fold_on_empty_range{}; // or, return {}; ?
    case 1: return *begin;
    case 2: return func(*begin, *(begin + 1));
    default: {
      Iterator mid{begin};
      std::advance(mid, detail::pot_half(diff));
      return func(tree_fold(begin, mid, func), tree_fold(mid, end, func));
    }
  }
}

int main() {
  for (unsigned int n = 2; n < 20; ++n) {
    std::cout << n << " -> " << detail::pot_half(n) << std::endl;
  }
  std::cout << std::endl;

  std::array<int, 8> test{1, 2, 3, 4, 5, 6, 7, 8};
  std::cout << tree_fold(test.begin(), test.end(), [](int a, int b){ return a + b; }) << std::endl;
  std::cout << tree_fold(test.begin(), test.end(), [](int a, int b){ return a - b; }) << std::endl;
}

Also live on Coliru.

It gives this as the final output:

36
0

I believe this indicates that it is correct:

1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 = 36
((1 - 2) - (3 - 4)) - ((5 - 6) - (7 - 8)) =
((-1) - (-1)) - ((-1) - (-1)) =
0 - 0 = 0

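Note that the "correct" behavior on ranges whose length is not a power of two is a bit ambiguous. In my version, I always split a range of length n at the largest power of two strictly less than n. So if you give it a power of two, you always get a perfectly balanced binary tree. If you give it 6, you get something like this: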

        /\
    /\       /\
  /\  /\
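The split point used for this first shape can also be computed with integer arithmetic only, which avoids any floating-point rounding in log2. The following is a sketch only; `pot_below` is a hypothetical name and not part of the code above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical integer-only alternative to detail::pot_half:
// returns the largest power of two strictly less than n, for n > 1.
std::size_t pot_below(std::size_t n) {
  assert(n > 1);
  std::size_t p = 1;
  // "p < n - p" means "2 * p < n", written so that it cannot overflow.
  while (p < n - p) {
    p *= 2;
  }
  return p;
}
```

For a length of 6 this gives 4, so the left subtree covers four elements and the right subtree the remaining two, as in the diagram.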

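But nothing says that always dividing the range in two is not also correct, in which case you would get a tree structure like this: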

        /\
    /\       /\
  /\       /\

So IMO your question is a bit underspecified. Maybe it does not matter to you, as long as the depth is O(log n)?
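For comparison, the second tree shape (always splitting in the middle) can be sketched as follows. `tree_fold_mid` and `show` are names invented here, and giving the left half the extra element when the length is odd is just one possible choice.

```cpp
#include <cassert>
#include <iterator>
#include <string>

// Illustrative function that records the grouping instead of computing.
std::string show(const std::string& x, const std::string& y) {
  return "(" + x + "+" + y + ")";
}

// Hypothetical variant: folds [first, last) as a tree, splitting every
// range at its midpoint (the left half gets the extra element).
// Precondition: the range is not empty.
template <typename Iterator, typename F>
auto tree_fold_mid(Iterator first, Iterator last, F func)
    -> typename std::iterator_traits<Iterator>::value_type {
  const auto n = std::distance(first, last);
  assert(n > 0);
  if (n == 1) return *first;
  const Iterator mid = std::next(first, (n + 1) / 2);
  return func(tree_fold_mid(first, mid, func),
              tree_fold_mid(mid, last, func));
}
```

Called with `show` on the six elements a..f, this yields (((a+b)+c)+((d+e)+f)), which corresponds to the second diagram; splitting at the largest power of two would give (((a+b)+(c+d))+(e+f)) instead.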

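【Comments】:

Thanks for your example, this is really interesting. I was not necessarily looking for an algorithm involving trees, so I had not thought of these details. I think it depends a lot on the situation: since we are talking about cases where the order of operations does matter, whether the non-power-of-two behavior is acceptable depends on the operation and the situation. It will probably work for my case, but I will have to run some tests.

【Solution 2】:

Since November 2015 I have been working on a so-called VectorFuncRange container that solves this problem in STL style in C++14.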

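I wrote my own beta version. It imitates a std::vector container well, but adds a func_range() method that returns the evaluation of a function over a range in O(log n), evaluated as a tree. I should stress that, even though they are evaluated as a tree internally, they are still just vectors, with O(1) random access, push_back in amortized O(1) and worst-case O(log n), and so on. Some std::vector methods, such as emplace_back() and several constructors, are not implemented yet, but the main ones needed to use it as a vector are. For testing purposes I compared rang_func() with range_func_dumb(); the latter evaluates the function in linear order.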
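The linked code is not reproduced here, so the following is only a sketch of one well-known structure with the same O(log n) range-evaluation property, a segment tree. It is not the VectorFuncRange implementation: `RangeTree`, `make_range_tree` and `query` are names chosen for this illustration, the size is fixed, and push_back is not attempted. Note also that with a non-associative function the result of a range query depends on how the tree happens to split the range.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sketch of a fixed-size segment tree (hypothetical names throughout).
// T must be default-constructible and copyable; F is called as func(T, T).
template <typename T, typename F>
class RangeTree {
 public:
  RangeTree(std::vector<T> values, F func)
      : leaves_(std::move(values)),
        node_(4 * leaves_.size()),
        func_(std::move(func)) {
    assert(!leaves_.empty());
    build(1, 0, leaves_.size());
  }

  // Evaluates func over the index range [lo, hi); needs lo < hi <= size.
  T query(std::size_t lo, std::size_t hi) const {
    assert(lo < hi && hi <= leaves_.size());
    return query(1, 0, leaves_.size(), lo, hi);
  }

 private:
  // `node` covers [b, e); its children are 2*node and 2*node + 1.
  void build(std::size_t node, std::size_t b, std::size_t e) {
    if (e - b == 1) { node_[node] = leaves_[b]; return; }
    const std::size_t m = b + (e - b) / 2;
    build(2 * node, b, m);
    build(2 * node + 1, m, e);
    node_[node] = func_(node_[2 * node], node_[2 * node + 1]);
  }

  T query(std::size_t node, std::size_t b, std::size_t e,
          std::size_t lo, std::size_t hi) const {
    if (lo <= b && e <= hi) return node_[node];  // fully covered
    const std::size_t m = b + (e - b) / 2;
    if (hi <= m) return query(2 * node, b, m, lo, hi);      // left only
    if (lo >= m) return query(2 * node + 1, m, e, lo, hi);  // right only
    return func_(query(2 * node, b, m, lo, hi),
                 query(2 * node + 1, m, e, lo, hi));
  }

  std::vector<T> leaves_;  // original elements
  std::vector<T> node_;    // precomputed results per tree node
  F func_;
};

// Helper so that the lambda type can be deduced before C++17.
template <typename T, typename F>
RangeTree<T, F> make_range_tree(std::vector<T> values, F func) {
  return RangeTree<T, F>(std::move(values), std::move(func));
}
```

Building the tree costs O(n) calls to the function; each query then combines O(log n) stored partial results instead of re-evaluating the whole range.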

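My current version of VectorFuncRange.h: http://pastebin.com/dnwznUqg Test code that does this in 5 different ways, with integers, matrices and other types, and with many functions: http://pastebin.com/YdRfN0CQ

I have thought about putting it in a public Git repository, but I think I should organize my code better before that, and I do not know whether other people would be interested in contributing.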

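【Comments】:

【Solution 3】:

You should have a look at the second form of std::transform: http://www.cplusplus.com/reference/algorithm/transform/

In pseudo-code close to C++11, an STL-based implementation of your algorithm might look like this: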

c = {a,b,c,d,e,f,g} // container of elements of type 'my_obj'
tmp = {a,b,c,d,e,f,g} // copy of 'c' to not impact 'c' while executing algorithm
while (tmp.size() > 1)
{
    // partition 'tmp' into even index elements 'c1' and odd index elements 'c2'
    // first iteration would look like this :
    // c1 = {a,c,e,g}
    // c2 = {b,d,f,identity} where 'identity' is a new element (when 'tmp' size is odd) to match 'g' without impacting final result... identity = 0 for integer addition :)

    // overwrite first elements of 'tmp' with intermediate results
    std::transform(c1.cbegin(), c1.cend(), c2.cbegin(), tmp.begin(), std::plus<my_obj>()); // replace std::plus with any other binary operation including any proper lambda

    // cut the unused upper half of 'tmp'
    tmp.resize((tmp.size() + 1) / 2);
}
my_obj result = tmp[0];

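Copying 'c' at the start and partitioning 'tmp' into two halves at each iteration obviously has a cost. It is up to you how to optimize from here :)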
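One way to turn the pseudocode above into compilable C++ is sketched below. It is a variant under stated assumptions rather than the answerer's code: it replaces the std::transform call and the identity padding with a plain loop that combines neighbours in place and carries an unpaired last element to the next level unchanged, so no identity element is needed. The name `pairwise_reduce` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: reduces the elements level by level, combining
// neighbours (0,1), (2,3), ... on each pass. `tmp` is taken by value, so
// the caller's container is left untouched. It must not be empty.
template <typename T, typename F>
T pairwise_reduce(std::vector<T> tmp, F func) {
  assert(!tmp.empty());
  while (tmp.size() > 1) {
    std::size_t out = 0;  // next free slot for an intermediate result
    std::size_t i = 0;
    for (; i + 1 < tmp.size(); i += 2) {
      tmp[out++] = func(tmp[i], tmp[i + 1]);
    }
    if (i < tmp.size()) {
      tmp[out++] = std::move(tmp[i]);  // unpaired element, carried over
    }
    // Drop the now unused upper part of the vector.
    tmp.erase(tmp.begin() + static_cast<std::ptrdiff_t>(out), tmp.end());
  }
  return tmp.front();
}
```

With the non-associative function f(a, b) = 2*a + b on {1,2,3,4,5,6,7}, this evaluates f(f(f(1,2), f(3,4)), f(f(5,6), 7)) = 75, whereas a left fold over the same elements gives 247. The single copy of the input remains; only the per-iteration partitioning is avoided.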

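【Comments】:

Thanks, this is a good attempt, but in my case copying a,b,c,d,e,f is rather expensive. Maybe I can optimize it with some pointers; I might give it a try and report back.

【Solution 4】:

Taking into account some of the proposed solutions (especially Chris Beck's), I came up with this algorithm, which I am now trying to optimize further. I have moved it to a different thread, because I think the code raises a number of issues that are worth discussing.

【Comments】: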