我如何在考虑性能的情况下重构此代码？答案

【问题标题】：How could I refactor this code with performance in mind?我如何在考虑性能的情况下重构此代码？
【发布时间】：2010-12-31 03:15:20
【问题描述】：

我有一个性能非常重要的方法（我知道过早的优化是万恶之源。我知道我应该并且我确实分析了我的代码。在这个应用程序中，我节省的每十分之一秒都是一个巨大的胜利。 ) 此方法使用不同的启发式方法来生成和返回元素。启发式顺序使用：使用第一个启发式直到它不能再返回元素，然后使用第二个启发式直到它不能再返回元素，依此类推，直到使用完所有启发式。在每次调用该方法时，我都会使用一个开关来移动到正确的启发式。这很丑陋，但效果很好。这是一些伪代码

class MyClass
{
private:
   unsigned int m_step;
public:
   MyClass() : m_step(0) {};

   Elem GetElem()
   {
      // This switch statement will be optimized as a jump table by the compiler.
      // Note that there is no break statments between the cases.
      switch (m_step)
      {
      case 0:
         if (UseHeuristic1())
         {
            m_step = 1; // Heuristic one is special it will never provide more than one element.
            return theElem;
         }

         m_step = 1;

      case 1:
         DoSomeOneTimeInitialisationForHeuristic2();
         m_step = 2;

      case 2:
         if (UseHeuristic2())
         {
            return theElem;
         }

         m_step = 3;

      case 3:
         if (UseHeuristic3())
         {
            return theElem;
         }
         m_step = 4; // But the method should not be called again
      }

      return someErrorCode;
   };
}

正如我所说，这很有效，因为在每次调用时，执行都会跳到应有的位置。如果启发式不能提供元素，则 m_step 会递增（因此下次我们不再尝试此启发式）并且因为没有 break 语句，所以会尝试下一个启发式。另请注意，某些步骤（如步骤 1）永远不会返回元素，而是为下一个启发式方法进行一次初始化。

初始化并非全部预先完成的原因是它们可能永远不需要。 GetElem 在它返回一个元素后不再被调用是有可能的（并且很常见），即使它仍然有可以返回的元素。

虽然这是一个有效的实现，但我觉得它真的很难看。案例陈述是一个黑客；不间断地使用它也很骇人听闻；即使每个启发式都封装在自己的方法中，该方法也会变得很长。

我应该如何重构这段代码，使其更具可读性和优雅，同时尽可能保持高效？

【问题讨论】：

重构通常不会让代码更高效——只是更易读和更易于维护。两者不应该同时进行。优化通常会降低代码的可读性和可维护性。
感觉就像你手动实现了“收益回报”。很酷！
@Larry Watanabe：你说得对，优化和可读性通常是相互竞争的目标。但是这段代码对我来说似乎很老套，我想知道是否有任何改进方法。
Mathieu：我知道你说过你分析了它，但很难相信很多 CPU 时间真的花在了这个函数上......除非调用者和启发式方法是微不足道的。
如果每秒调用大约 2000 万次，那么它的性能显然还不错。每次迭代只有 150 个周期。对于进行多个函数调用并具有多个分支的东西来说，这似乎还不错。我认为以这种方式使用 switch 语句已经足够普遍了，这并非不合理。

标签： c++ algorithm oop refactoring

【解决方案1】：

我不认为你的代码那么糟糕，但是如果你经常做这种事情，并且你想隐藏机制以便逻辑更清晰，你可以看看Simon Tatham's coroutine macros。它们适用于 C（使用静态变量）而不是 C++（使用成员变量），但更改它很简单。

结果应该是这样的：

Elem GetElem()
{
  crBegin;

  if (UseHeuristic1())
  {
     crReturn(theElem);
  }

  DoSomeOneTimeInitialisationForHeuristic2();

  while (UseHeuristic2())
  {
     crReturn(theElem);
  }

  while (UseHeuristic3())
  {
     crReturn(theElem);
  }

  crFinish;
  return someErrorCode;
}

【讨论】：

【解决方案2】：

如果您正在处理的元素代码可以转换为整数值，那么您可以根据该元素构造一个函数指针和索引表。该表将为每个“已处理”元素设置一个条目，并为每个已知但未处理的元素设置一个条目。对于未知元素，在索引函数指针表之前做一个快速检查。

调用元素处理函数很快。

这是工作示例代码：

#include <cstdlib>
#include <iostream>
using namespace std;

typedef void (*ElementHandlerFn)(void);

void ProcessElement0()
{
    cout << "Element 0" << endl;
}

void ProcessElement1()
{
    cout << "Element 1" << endl;
}
void ProcessElement2()
{
    cout << "Element 2" << endl;
}

void ProcessElement3()
{
    cout << "Element 3" << endl;
}

void ProcessElement7()
{
    cout << "Element 7" << endl;
}

void ProcessUnhandledElement()
{
    cout << "> Unhandled Element <" << endl;
}




int main()
{
    // construct a table of function pointers, one for each possible element (even unhandled elements)
    // note: i am assuming that there are 10 possible elements -- 0, 1, 2 ... 9 --
    // and that 5 of them (0, 1, 2, 3, 7) are 'handled'.

    static const size_t MaxElement = 9;
    ElementHandlerFn handlers[] = 
    {
        ProcessElement0,
        ProcessElement1,
        ProcessElement2,
        ProcessElement3,
        ProcessUnhandledElement,
        ProcessUnhandledElement,
        ProcessUnhandledElement,
        ProcessElement7,
        ProcessUnhandledElement,
        ProcessUnhandledElement
    };

    // mock up some elements to simulate input, including 'invalid' elements like 12
    int testElements [] = {0, 1, 2, 3, 7, 4, 9, 12, 3, 3, 2, 7, 8 };
    size_t numTestElements = sizeof(testElements)/sizeof(testElements[0]);

    // process each test element
    for( size_t ix = 0; ix < numTestElements; ++ix )
    {
        // for some robustness...
        if( testElements[ix] > MaxElement )
            cout << "Invalid Input!" << endl;
        // otherwise process normally
        else
            handlers[testElements[ix]]();

    }

    return 0;
}

【讨论】：

+1 恕我直言，这个（以及 Mark Ransoms 的回答）是这里唯一真正的重构示例（过程 -> OOP）。

【解决方案3】：

由于每个启发式都由具有相同签名的函数表示，因此您可以制作一个函数指针表并遍历它。

class MyClass 
{ 
private: 
   typedef bool heuristic_function();
   typedef heuristic_function * heuristic_function_ptr;
   static heuristic_function_ptr heuristic_table[4];
   unsigned int m_step; 
public: 
   MyClass() : m_step(0) {}; 

   Elem GetElem() 
   { 
      while (m_step < sizeof(heuristic_table)/sizeof(heuristic_table[0]))
      {
         if (heuristic_table[m_step]())
         {
            return theElem;
         }
         ++m_step;
      }

      return someErrorCode; 
   }; 
}; 

MyClass::heuristic_function_ptr MyClass::heuristic_table[4] = { UseHeuristic1, DoSomeOneTimeInitialisationForHeuristic2, UseHeuristic2, UseHeuristic3 };

【讨论】：

【解决方案4】：

您可以将控制流从里到外翻转。

template <class Callback>  // a callback that returns true when it's done
void Walk(Callback fn)
{
    if (UseHeuristic1()) {
        if (fn(theElem))
            return;
    }
    DoSomeOneTimeInitialisationForHeuristic2();
    while (UseHeuristic2()) {
        if (fn(theElem))
            return;
    }
    while (UseHeuristic3()) {
        if (fn(theElem))
            return;
    }
}

如果 switch 调度和 return 语句使 CPU 无法正常运行，并且接收方是可内联的，这可能会为您赢得几纳秒。

当然，如果启发式方法本身不平凡，这种优化是徒劳的。而且很大程度上取决于调用者的样子。

【讨论】：

请注意，一旦你决定走这条路，你可以对每个启发式做同样的事情，然后这变得更加简单：bool Walk(Callback fn) { return WalkHeuristic1(fn) || WalkHeuristic2(fn) || WalkHeuristic3(fn); }。

【解决方案5】：

您在这里真正可以做的是将条件替换为状态模式。

http://en.wikipedia.org/wiki/State_pattern

可能因为调用虚方法而性能较差，也可能因为状态维护代码较少而性能更好，但代码肯定会更加清晰和可维护，就像使用模式一样。

可以提高性能的是消除 DoSomeOneTimeInitialisationForHeuristic2(); 之间有单独的状态。 1 和 2。

【讨论】：

【解决方案6】：

如果没坏就不要修。

它看起来非常有效。看起来也不难理解。添加迭代器等可能会使它更难理解。

你可能会做得更好

性能分析。时间真的花在了这个过程上，还是大部分时间花在了它调用的函数上？我看不出有什么重要的时间花在这里。
更多单元测试，以防止有人在必须修改它时破坏它。
代码中的其他 cmets。

【讨论】：

我不知道你为什么被否决。虽然我不会接受这个答案，但它仍然提供了丰富的信息。感谢您花时间回答。

【解决方案7】：

这是微优化，但是当你不是从GetElem返回时，不需要设置m_elem值。请参阅下面的代码。

更大的优化肯定需要简化控制流（更少的跳转，更少的返回，更少的测试，更少的函数调用），因为一旦跳转完成，处理器缓存就会被清空（有些处理器有分支预测，但它不是灵丹妙药）。您可以尝试 Aaron 或 Jason 提出的解决方案，还有其他解决方案（例如，您可以实现几个 get_elem 函数并通过函数指针调用它们，但我很确定它会慢一些）。

如果问题允许，在启发式中一次计算多个元素并使用一些缓存，或者使其与一些线程计算元素真正并行，而这只是一个等待结果的客户......没有关于上下文的一些细节，没有办法说更多。

class MyClass
{
private:
   unsigned int m_step;
public:
   MyClass() : m_step(0) {};

   Elem GetElem()
   {
      // This switch statement will be optimized as a jump table by the compiler.
      // Note that there is no break statments between the cases.
      switch (m_step)
      {
      case 0:
         if (UseHeuristic1())
         {
            m_step = 1; // Heuristic one is special it will never provide more than one element.
            return theElem;
         }

      case 1:
         DoSomeOneTimeInitialisationForHeuristic2();
         m_step = 2;

      case 2:
         if (UseHeuristic2())
         {
            return theElem;
         }

      case 3:
         m_step = 4;

      case 4:
         if (UseHeuristic3())
         {
            return theElem;
         }
         m_step = 5; // But the method should not be called again
      }

      return someErrorCode;
   };
}

【讨论】：

【解决方案8】：

将每个启发式封装在一个迭代器中。在第一次调用hasNext() 时将其完全初始化。然后将所有迭代器收集到一个列表中，并使用超级迭代器遍历所有迭代器：

boolean hasNext () {
    if (list.isEmpty()) return false;

    if (list.get(0).hasNext()) return true;

    while (!list.isEmpty()) {
        list.remove (0);
        if (list.get(0).hasNext()) return true;
    }
    return false;
}
Object next () {
    return list.get (0).next ();
}

注意：在这种情况下，链表可能比 ArrayList 快一点点，但您仍应检查这一点。

[编辑] 将“turn each”更改为“wrap each”以使我的意图更加清晰。

【讨论】：

另外......我将创建几个实现并在实践中测试它们的性能。提问者使用什么语言？
我不明白你在说每个启发式必须是不同的对象实现相同的接口（或继承相同的抽象类）并放置它们进入迭代器以选择好的迭代器。这是一个很好的解决方案（多态性）
如果你走这条路，并决定制作真正的 C++ 迭代器，请参阅stackoverflow.com/questions/981186/chain-iterator-for-c 获取一些建议。

【解决方案9】：

在我看来，如果您不需要对这段代码进行太多修改，例如添加新的启发式方法，那么请做好记录，不要碰它。

但是，如果添加和删除了新的启发式方法，并且您认为这是一个容易出错的过程，那么您应该考虑对其进行重构。显而易见的选择是引入状态设计模式。这将用多态性替换您的 switch 语句，这可能会减慢速度，但您必须对两者进行分析才能确定。

【讨论】：

【解决方案10】：

看起来在这段代码中确实没有太多需要优化的地方——可能大部分优化都可以在 UseHeuristic 函数中完成。里面有什么？

【讨论】：

我认为目标不是优化这段代码，而是让它更具可读性而不会对性能产生负面影响。