Answers: performance issue in a C++ Codility challenge

【Question title】: performance issue in C++ codility challenge
【Posted】: 2014-11-27 10:37:13
【Question】:

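Edit: the problem is actually algorithmic (thanks to molbdnilo for the answer below). The failing case is O(N^2), i.e. quadratic. The people below are actually trying to find an algorithm with a true worst-case time complexity of O(N log N).

I took this month's Codility challenge. It took me about an hour to get a 100% correct algorithm that I believed ran in O(N log N) time.

But as you can see below, I only scored 75% on performance, because one of the tests took about 10 times too long to run. I don't see why! Can you point out my mistake?

Point 2 contains the full problem description and the complete report for my solution (test cases and timings).

Roughly speaking, I add the ropes one by one and, starting from the newly added node, walk the path up to the root (the ancestors), updating each ancestor with the new maximum weight that can still be added below it.

Here is the code: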

    // you can use includes, for example:
    #include <algorithm>
    #include <cstdlib>   // system()
    #include <vector>
    #include <map>
    #include <iostream>

    using namespace std;
    // you can write to stdout for debugging purposes, e.g.
    // cout << "this is a debug message" << endl;


    struct node
    {
        int max_to_add;
        int id;
        node* mummy;
    };

    std::map< int, node* > nodes;

    bool insertRope( int durability, int pos, int Id, int weight )
    {
        node* n = new node;
        n->id = Id;
        nodes[Id] = n;

        if( pos == -1 )
        {
            n->max_to_add = durability - weight;
            n->mummy = NULL;
            if( n->max_to_add < 0 ) return false;
        }
        else
        {
            std::map< int, node* >::iterator it = nodes.find(pos);
            if( it != nodes.end() )
            {
                node* parent = (*it).second;

                n->mummy = parent;
                n->max_to_add = std::min( ( parent->max_to_add - weight),  (durability - weight) ) ;
                if( n->max_to_add < 0 ) return false;

                node* current = n;
                while ( (current = current->mummy) != NULL )
                {
                    current->max_to_add = current->max_to_add - weight;
                    if( current->max_to_add < 0 ) 
                    {
                        return false;
                    }
                }
            }

        }
        return true;
    }

    int solution(vector<int> &A, vector<int> &B, vector<int> &C) {
        // write your code in C++11

        for(int i = 0; i < A.size() ; ++i) 
        {
            if( insertRope( A[i], C[i],i,B[i] ) == false ) {return i;} 
        }

        return A.size();
    }

    int main()
    {

        /*static const int arrA[] = {4, 3, 1};
        vector<int> vecA (arrA, arrA + sizeof(arrA) / sizeof(arrA[0]) );

        static const int arrB[] = {2, 2, 1};
        vector<int> vecB (arrB, arrB + sizeof(arrB) / sizeof(arrB[0]) );


        static const int arrC[] = {-1, 0, 1};
        vector<int> vecC (arrC, arrC + sizeof(arrC) / sizeof(arrC[0]) );
        */


        static const int arrA[] = {5, 3, 6, 3, 3};
        vector<int> vecA (arrA, arrA + sizeof(arrA) / sizeof(arrA[0]) );

        static const int arrB[] = {2, 3, 1, 1, 2};
        vector<int> vecB (arrB, arrB + sizeof(arrB) / sizeof(arrB[0]) );


        static const int arrC[] = {-1, 0, -1, 0, 3};
        vector<int> vecC (arrC, arrC + sizeof(arrC) / sizeof(arrC[0]) );

        int sol = solution(vecA,vecB,vecC);

        system("PAUSE");
        return 0;
    }

Edit 1: following Rafid's suggestion I used new[]. It works better, but I still have the performance problem: https://codility.com/cert/view/certRT5YDP-W65HGPF28B5RN5AY/details

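【Comments】:

What platform are you on? I suggest using callgrind to see where your bottleneck is. This also fails (performance-wise) on a fairly large input data set (100k), so try it yourself with a representative input data set.
I'm on Windows 7 64-bit with VS2010. I don't know what platform they test on. callgrind is not available on Windows, is it? Maybe Very Sleepy...
callgrind is not available on Windows, afaik. VerySleepy will give you information that may be useful.
I'll test it at home, unless I can get around the missing admin rights and install it here. :)
I think you time out when building a long line (no branching), because in order to adjust max_to_add you traverse all the nodes on every insertion (quadratic time, IIRC).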

Tags: c++ performance algorithm

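【Answer 1】:

One performance tip I can point out: avoid calling the "new" operator repeatedly, because it is expensive. You can allocate one large block of memory up front and hand out pieces of it as needed, so that you do not allocate on the heap over and over.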
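
As a minimal sketch of that idea, assuming the `node` struct from the question: the `NodePool` class and its `allocate()` method are hypothetical names introduced only for illustration, not something from the original code. One block is sized up front and nodes are handed out from it, so `insertRope()` would no longer call `new` per rope.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Same fields as the `node` struct in the question.
struct node {
    int max_to_add;
    int id;
    node* mummy;
};

// Hypothetical helper (not in the original code): hands out nodes from one
// block that is allocated up front instead of calling `new` per rope.
class NodePool {
public:
    explicit NodePool(std::size_t capacity) : storage_(capacity), used_(0) {}

    // Returns the next unused node, or NULL once the pool is exhausted.
    node* allocate() {
        if (used_ == storage_.size()) return NULL;
        return &storage_[used_++];
    }

private:
    std::vector<node> storage_;  // sized once and never resized, so pointers stay valid
    std::size_t used_;
};
```

In `solution()` one would presumably construct `NodePool pool(A.size())` once and replace `new node` with `pool.allocate()`. This only removes allocation overhead; it does not change how many ancestors each insertion visits.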

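【Comments】:

Yes, that's a good one. Is there some kind of "reserve" function on map? Something like map.reserve(vec.size())?
Hmm, as far as I know there is no such method, but that is not what I was referring to. I meant the first line of the insertRope() function.
Oh, I see: in solution() I should preallocate a node* block of sizeof(node) times A.size() and reuse those nodes? Is that what you mean?
You can see the new working solution using new[] here: codility.com/cert/view/certRT5YDP-W65HGPF28B5RN5AY/details Same perf problem!
std::map<int, node> nodes; should be enough.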

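【Answer 2】:

Disclaimer: I am not 100% sure this is what causes your problem.

Note that the case where you perform really badly is the following one: 100K items in a "line" configuration.

If you look at your while loop, you will see that your algorithm does not give O(N log N) worst-case complexity. The worst case is when all the ropes hang in a single line: every time you add a node you have to traverse the whole tree to change max_to_add.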
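
To make the worst case concrete, here is a small sketch that only counts how many ancestor updates the while loop would perform. The helper names `make_line_parents` and `count_ancestor_updates` are made up for this illustration, and the count assumes that no rope breaks, so the loop never exits early.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Builds the parent array C for a "line" of n ropes: rope 0 is a root
// (parent -1) and every rope i > 0 hangs from rope i - 1.
std::vector<int> make_line_parents(std::size_t n) {
    std::vector<int> parents(n);
    for (std::size_t i = 0; i < n; ++i) {
        parents[i] = static_cast<int>(i) - 1;
    }
    return parents;
}

// Counts the total number of ancestor updates when the ropes are inserted
// in order (weights and durabilities are ignored here).
long long count_ancestor_updates(const std::vector<int>& parents) {
    long long steps = 0;
    for (std::size_t i = 0; i < parents.size(); ++i) {
        for (int p = parents[i]; p != -1; p = parents[p]) {
            ++steps;
        }
    }
    return steps;
}
```

For a line of n ropes the count is 0 + 1 + ... + (n - 1) = n(n - 1)/2, so n = 100000 gives roughly 5 × 10^9 updates, which is quadratic rather than O(N log N).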

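This does not change the fact that you could (as suggested in some comments) just use a std::map without pointers. That would probably perform better, because you would not need a separate new allocation every time you create a node; the map stores the nodes by value. Maybe even use a std::unordered_map.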
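
A rough sketch of the by-value variant, assuming the `node` struct from the question; `add_node` is a hypothetical helper name used only here. The map still allocates internally for each element, so any gain would have to be measured, and the number of ancestor updates per insertion stays the same.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Same fields as the `node` struct in the question.
struct node {
    int max_to_add;
    int id;
    node* mummy;
};

// Hypothetical helper: stores the node inside the map itself instead of
// allocating it with `new` and storing a pointer.
node* add_node(std::map<int, node>& nodes, int id, int max_to_add, node* parent) {
    node n;
    n.max_to_add = max_to_add;
    n.id = id;
    n.mummy = parent;
    // std::map never relocates its elements, so the returned pointer (and
    // any `mummy` pointer into the map) stays valid after later insertions.
    return &(nodes[id] = n);
}
```

Since the ids in the question are simply 0 to A.size() - 1, a std::vector<node> sized up front and indexed by id would presumably work as well and avoid the tree lookup.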

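Edit:

OK, I found a way to improve the complexity. You do not need to update the load of all the ropes; you only need to keep the correct value on the leaf nodes of the tree. That value should be the minimum of max_to_add over the nodes on that branch.

I will post some code as soon as I can.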

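【Comments】:

Yes, as discussed with @molbdnilo in the comments above, the algorithm is indeed quadratic in the worst case. An unordered map would be an improvement, but it would still leave the algorithm quadratic. Also, I don't have a C++11 compiler available here, and I can't use boost in the code challenge.
About your edit: "You do not need to update the load of all the ropes". Yes you do! Additional ropes are not necessarily attached to a leaf! Example: suppose the current leaf has max_to_add = 1, and then you add weight=1 on each side of that leaf...
Try submitting the code: codility.com/programmers/challenges Take the Sulphur challenge!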

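【Answer 3】:

What if we skip adding elements to a map altogether and just compute the current capacity of each node in place? I can't easily check it right now, but here is code that shows the idea.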

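    #include <vector>

    // Charges the weight of rope `index` to the rope itself and to every
    // ancestor, reusing A as the remaining capacity of each rope.
    bool process_node(std::vector<int> &A, std::vector<int> &B, std::vector<int> &C, int index) {

        int next_parent = C[index];

        A[index] -= B[index];

        // Walk up the parent chain until we reach a root (parent -1).
        while(next_parent != -1) {

            A[next_parent] -= B[index];
            if (A[next_parent] < 0) {
                return false;
            }

            next_parent = C[next_parent];
        }

        return true;
    }

    int solution(std::vector<int> &A, std::vector<int> &B, std::vector<int> &C) {

        // Return the index of the first rope whose insertion overloads an ancestor.
        for (int i = 0; i < A.size(); ++i) {
            if (!process_node(A, B, C, i)) {
                return i;
            }
        }

        return A.size();
    }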

It looks like O(N log N) time, since we do N parent lookups in the tree. Maybe we avoid some extra traversals.

【Comments】:

Tested: codility.com/cert/view/certGPRFST-ACA7DE4S8KBKPSAS/details It does not work for all cases.
Added a check: bool process_node(std::vector<int> &A, std::vector<int> &B, std::vector<int> &C, int index) { int next_parent = C[index]; A[index] -= B[index]; if (A[index] < 0) { return false; } while(next_parent != -1) { A[next_parent] -= B[index]; if (A[next_parent] < 0) { return false; } next_parent = C[next_parent]; } return true; } int solution(std::vector<int> &A, std::vector<int> &B, std::vector<int> &C) { for (int i = 0; i < A.size(); ++i) { if (!process_node(A, B, C, i)) { return i; } } return A.size(); } But it may still fail on time. Could you check it?
You can submit code here: codility.com/programmers/challenges Choose the Sulphur challenge! I can do it for you if you like.
I submitted this on codility.com, and performance-wise it has the same problem as your first implementation (the case with 100k items in a line times out). On the other hand, it is much smaller, which probably makes the possible worst case easier to see.
I have tried completely different approaches, without any success... We'll see the solution when it gets published!