General strategies for memory/speed problems: answers

【Question title】: General strategies for memory/speed problems
【Posted】: 2012-02-10 05:58:03
【Question】:

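I have some C++ code that runs through about 200 ASCII files, does some basic data processing, and outputs a single ASCII file containing (basically) all of the data.

The program runs very quickly at first, then slows down drastically partway through, perhaps slows a little more gradually, and then proceeds at a fairly slow pace through the rest. I.e. it gets through the first ~80 files in about 5 seconds, and all ~200 files in about 50 seconds. Every file is basically the same.

I'm looking for suggestions on how to track down the problem or memory leak.

Some more details: at first I would call fopen(FILE *outputFile, "w") at the start of my program and fclose() at the end. The first ~40 files would take ~4 seconds; then about 1.5 minutes for all ~200 files.

I thought the output file might be clogging the memory, so I changed the code to fopen(outputFile, "a") on every iteration (i.e. each time I open a new input file) and fclose() each time I close the input file... this improved the performance to ~50 seconds total, as mentioned above.

It seems strange that this "fix" would help so noticeably, but not entirely.

Also, I'm not dynamically allocating any memory (no calls to "new" or "delete" or "free" or anything like that)... so I'm not even sure how I could have a memory leak.

Any help would be appreciated! Thanks!

Code: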

vector<string> dirCon;
// Uses boost::filesystem to store every file in directory
bool retVal = FileSystem::getDirectoryContents(HOME_DIR+HISTORY_DIR, &dirCon, 2);

int counter = 0;
for(int i = 0; i < dirCon.size(); i++) { 
    // Create output file
    FILE *outFile;
    string outputFileName = HOME_DIR ... ;
    // open file as append "a"
    bool ifRet = initFile(outFile, outputFileName.c_str(), "a");
    if(!ifRet) {
        fprintf(stderr, "ERROR ... ");
        return false;
    }       

    // Get the topmost directory name
    size_t loc = dirCon.at(i).find_last_of("/");
    string dirName = dirCon.at(i).substr(loc+1, (dirCon.at(i).size()-(loc+1)));

    // Get the top directory content
    vector<string> subDirCon;
    bool subRetVal = FileSystem::getDirectoryContents(dirCon.at(i), &subDirCon);
    if(!subRetVal) { fprintf(stderr, "ERROR\n"); return false; }

    // Go through each file in directory, look for the one that matches
    for(int j = 0; j < subDirCon.size(); j++) {

        // Get filename
        loc = subDirCon.at(j).find_last_of("/");
        string fileName = subDirCon.at(j).substr(loc+1, (subDirCon.at(j).size()-(loc+1)));

        // If filename matches desired station, process and store
        if( fileName == string(dirName ...) ) {
            // Open File
            FILE *inFile;
            if(!initFile(inFile, subDirCon.at(j).c_str(), "r")) { 
                fprintf(stderr, "ERROR: ... !\n");
                break;
            }

            // Parse file line-by-line
            char str[TB_CHARLIMIT_LARGE];
            const char *delim = ",";
            while(true) {
                vector<string> splitString;
                fgets(str, TB_CHARLIMIT_LARGE, inFile);

                if(feof(inFile)) { break; }     // break at end of file
                removeEndLine(str);

                // If non-comment line, parse
                if(str[0] != COMCHAR){
                    string strString(str);
                    // remove end line char
                    strString.erase(std::remove(strString.begin(), strString.end(), '\n'), strString.end());
                    strcpy(str, strString.c_str());

                    char *temp = strtok(str,delim);
                    char *lastTemp;
                    while(temp != NULL) {
                        splitString.push_back(string(temp));
                        temp = strtok(NULL,delim);
                    }
                    if(splitString.size() > 0) { 
                        DateTime dtTemp(splitString.at(0));  
                        goodLines++;

                        /*  ... process splitString, use dtTemp ... */

                        // Output to file
                        fprintf(outFile, "%s\n", strFromStrVec(splitString).c_str());
                    }
                }
            } //while
            fclose(inFile); 
        }
    } //j
    cout << "GoodLines = " << goodLines << endl;

    fclose(outFile);
} // i

bool getDirectoryContents(const string dirName, vector<string> *conts) {
    path p(dirName);
    try {
        // Confirm Exists
        if(!exists(p)) {
            fprintf(stderr, "ERROR: '%s' does not exist!\n", dirName.c_str());
            return false;
        }

        // Confirm Directory
        if(!is_directory(p)) {
            return false;
        }

        conts->clear();

        // Store paths to sort later
        typedef vector<path> vec;
        vec v;

        copy(directory_iterator(p), directory_iterator(), back_inserter(v));

        sort(v.begin(), v.end()); 

        for(vec::const_iterator it(v.begin()), it_end(v.end()); it != it_end; ++it) {
            conts->push_back(it->string());
        }


    } catch(const filesystem_error& ex) {
        fprintf(stderr, "ERROR: '%s'!\n", ex.what());
        return false;
    }   

    return true;
}

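【Comments】:

You may just be seeing some odd buffering behavior... but we would need to see some code and diagnostics to help.
In the general case there are three things to optimize: processor time, memory usage, and I/O. Depending on how your program has to handle its data, any of the three may or may not be open to optimization in a given situation. What is your program supposed to accomplish?
Try Random Pausing.
Code added; hopefully that helps pin down the problem.
It's worth mentioning that you are using dynamic memory. Maybe not directly, but the overhead is still there. Both std::vector and std::string use dynamic memory to store their contents. Of course, if you use the STL sensibly a real memory leak is unlikely, but that doesn't mean you have no dynamic-memory overhead.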
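To make the last comment concrete: in the question's code, `vector<string> splitString` is constructed inside the `while` loop, so its heap storage is allocated and freed for every line. The sketch below shows one way to reuse that storage instead. The helper name `split_into` is hypothetical (it is not part of the asker's code), and unlike `strtok` it keeps empty fields between adjacent delimiters, so treat it as an illustration rather than a drop-in replacement.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits `line` on `delim` into `out`, reusing whatever
// storage `out` already owns from a previous call.
// Note: unlike strtok, this keeps empty fields between adjacent delimiters.
void split_into(const std::string& line, char delim, std::vector<std::string>& out) {
    out.clear();  // destroys the old elements but keeps the vector's capacity
    std::size_t start = 0;
    while (true) {
        std::size_t end = line.find(delim, start);
        if (end == std::string::npos) {
            out.push_back(line.substr(start));  // last (or only) field
            break;
        }
        out.push_back(line.substr(start, end - start));
        start = end + 1;
    }
    assert(!out.empty());  // even an empty line yields one (empty) field
}
```

Declaring the vector once, before the line loop, and passing it to a helper like this on every line avoids reallocating the vector's element array each time. The individual strings still allocate, so this reduces the overhead rather than removing it.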

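Tags: c++ c optimization memory-management memory-leaks

【Answer 1】:

Without more information, I'd guess that what you're dealing with is a Schlemiel the Painter's algorithm: (Original) (Wikipedia). They're incredibly easy to fall into when doing string processing. Let me give you an example.

I want to read every line in a file, process each line somehow, and run it through some intermediate processing. Then I want to gather up the results, and maybe write them back to disk. Here's one way to do that. I make a single huge error that is easy to miss: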

// proc.cpp
class Foo
{
  public:
  std::string chew_on(std::string const& line_to_chew_on) {...}
  ...
};

Foo processor;
std::string buffer;

// Read/process
FILE *input=fopen(..., "r");
char linebuffer[1000+1];
for (char *line=fgets(linebuffer, 1000, input); line; 
     line=fgets(linebuffer, 1000, input) ) 
{
    buffer=buffer+processor.chew_on(line);  //(1)
}
fclose(input);

// Write
FILE *output=fopen(...,"w");
fwrite(buffer.data(), 1, buffer.size(), output);
fclose(output);

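The problem here, which is easy to miss at first glance, is that each time line (1) runs, the entire contents of buffer get copied. If there are 1000 lines of 100 characters each, you end up copying 100+200+300+...+100,000 = 50,050,000 bytes to run this. Increase to 10,000 lines? 5,000,500,000. That paint can keeps getting further and further away.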
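The sum is easy to check in code. The sketch below simply adds up the size of the buffer after each line, on the assumption that every byte of the newly built buffer is written once per iteration; the function name `bytes_copied_by_rebuilding` is made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Total bytes written when the buffer is rebuilt from scratch after every line:
// line_len + 2*line_len + ... + lines*line_len = line_len * lines * (lines + 1) / 2.
std::uint64_t bytes_copied_by_rebuilding(std::uint64_t lines, std::uint64_t line_len) {
    assert(line_len > 0);
    std::uint64_t total = 0;
    std::uint64_t buffer_size = 0;
    for (std::uint64_t i = 0; i < lines; ++i) {
        buffer_size += line_len;  // size of the new buffer after this line
        total += buffer_size;     // each byte of the new buffer is written once
    }
    return total;
}
```

Calling it with (1000, 100) gives 50,050,000 and with (10000, 100) gives 5,000,500,000: ten times the input costs about a hundred times the copying, which is the signature of quadratic growth.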

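In this particular example, the fix is easy. Line (1) should read:

    buffer.append(processor.chew_on(line)); // (2)

or equivalently (thanks Matthieu M.):

    buffer += processor.chew_on(line);

This helps because (usually) std::string doesn't need to make a full copy of buffer to perform the append, whereas in (1) we insist that a copy be made.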
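A minimal side-by-side sketch of the two forms follows. The function names are hypothetical, and the `reserve` call is an optional extra that the answer does not mention; it only works here because the total size is known up front.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Quadratic: every iteration builds a brand-new string from the old buffer.
std::string join_by_copying(const std::vector<std::string>& lines) {
    std::string buffer;
    for (const std::string& line : lines) {
        buffer = buffer + line;  // same shape as line (1)
    }
    return buffer;
}

// Amortized linear: += grows the existing buffer in place.
std::string join_by_appending(const std::vector<std::string>& lines) {
    std::size_t total = 0;
    for (const std::string& line : lines) {
        total += line.size();
    }
    std::string buffer;
    buffer.reserve(total);  // optional: a single allocation up front
    for (const std::string& line : lines) {
        buffer += line;  // same shape as the corrected line
    }
    assert(buffer.size() == total);
    return buffer;
}
```

Both functions return the same string; only the amount of copying differs, and the gap widens as the number of lines grows.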

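More generally, suppose (a) the processing you're doing keeps state, (b) you reference all or most of that state often, and (c) the state grows over time. Then there's a fair to good chance that you've written a Θ(n²) time algorithm, which will exhibit exactly the type of behavior you're talking about.

Edit

Of course, the stock answer to "why is my code slow?" is "run a profile." There are a number of tools and techniques for doing this. Some options include:

callgrind/kcachegrind (as suggested by David Schwartz)

Random Pausing (as suggested by Mike Dunlavey)

The GNU profiler, gprof

The GNU test coverage profiler, gcov

oprofile

They all have their strengths. "Random Pausing" is probably the simplest to implement, though the results can be hard to interpret. gprof and gcov are basically useless on multi-threaded programs. Callgrind is thorough but slow, and can sometimes play strange tricks on multi-threaded programs. oprofile is fast and plays nicely with multi-threaded programs, but can be difficult to use and can miss things.

However, if you're trying to profile a single-threaded program and are developing with the GNU toolchain, gprof can be a wonderful option. Take my proc.cpp, above. For the purposes of demonstration, I'm going to profile an unoptimized run. First, I rebuild my program for profiling (adding -pg to both the compile and link steps):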

$ g++ -O0 -g -pg -o proc.o -c proc.cpp
$ g++ -pg -o proc proc.o

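I run the program once to create the profiling information:

$ ./proc

In addition to doing whatever it would normally do, this run creates a file called "gmon.out" in the current directory. Now I run gprof to interpret the result: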

$ gprof ./proc
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
100.50      0.01     0.01   234937     0.00     0.00  std::basic_string<...> std::operator+<...>(...)
  0.00      0.01     0.00   234937     0.00     0.00  Foo::chew_on(std::string const&)
  0.00      0.01     0.00        1     0.00    10.05  do_processing(std::string const&, std::string const&)
...

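Yes indeed, 100.5% of my program's time is spent in std::string operator+. Well, ok, up to some sampling error. (I'm running this in a VM... it seems the timing captured by gprof is off. My program took much longer than 0.01 cumulative seconds to run...)

For my very simple example, gcov is a little less instructive. But here's what it happens to show. First, compile and run for gcov: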

$ g++ -O0 -fprofile-arcs -ftest-coverage -o proc proc.cpp
$ ./proc
$ gcov ./proc
...

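This creates a bunch of files ending in .gcno, .gcda, and .gcov in the current directory. The .gcov files tell us how many times each line of code was executed during the run. So, in my example, my proc.cpp.gcov ends up looking like this: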

        -:    0:Source:proc.cpp
        -:    0:Graph:proc.gcno
        -:    0:Data:proc.gcda
        -:    0:Runs:1
        -:    0:Programs:1
        -:    1:#include
        -:    2:#include
        -:    4:class Foo
        -:    5:{
        -:    6:  public:
   234937:    7:  std::string chew_on(std::string const& line_to_chew_on) {return line_to_chew_on;}
        -:    8:};
        -:    9:
        -:   10:
        -:   11:
        1:   12:int do_processing(std::string const& infile, std::string const& outfile)
        -:   13:{
        -:   14:  Foo processor;
        2:   15:  std::string buffer;
        -:   16:
        -:   17:  // Read/process
        1:   18:  FILE *input=fopen(infile.c_str(), "r");
        -:   19:  char linebuffer[1000+1];
   234938:   20:  for (char *line=fgets(linebuffer, 1000, input); line;
        -:   21:       line=fgets(linebuffer, 1000, input) )
        -:   22:    {
   234937:   23:      buffer=buffer+processor.chew_on(line); //(1)
        -:   24:    }
        1:   25:  fclose(input);
        -:   26:
        -:   27:  // Write
        1:   28:  FILE *output=fopen(outfile.c_str(),"w");
        1:   29:  fwrite(buffer.data(), 1, buffer.size(), output);
        1:   30:  fclose(output);
        1:   31:}
        -:   32:
        1:   33:int main()
        -:   34:{
        1:   35:  do_processing("/usr/share/dict/words","outfile");
        -:   36:}

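So from this, I'm going to have to conclude that the std::string operator+ at line 23 (which is executed 234,937 times) is a potential cause of my program's slowness.

As an aside, callgrind/kcachegrind work with multi-threaded programs, and can provide much, much more information. For this program I run: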

g++ -O0 -o proc proc.cpp
valgrind --tool=callgrind ./proc  # this takes forever to run
kcachegrind callgrind.out.*

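In the resulting kcachegrind view I find that what's really eating up my cycles is lots and lots of memory copies (99.4% of execution time is spent in __memcpy_ssse3_back), all of which I can see happening somewhere below line 23 in my source.

【Comments】:

Of course std::string::operator+= (the compound version of +) would be easier to spell.
Thanks @Managu, this is something I had never considered; it's very helpful to know. I don't think there are any instances like that in my code (which I've added to the post above).
@Managu I tried profiling with gprof, and it shows the expected information about the number of calls; but the time usage for everything is 0.0%... any idea why it wouldn't give me timing information?
Possibly the time isn't being spent in the files you've profiled. Maybe compile the file that contains FileSystem::getDirectoryContents with profiling enabled (i.e. "-pg") and see how much it contributes? Or try callgrind.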

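【Answer 2】:

Profile your code with callgrind, part of the valgrind suite. You can browse the results graphically with kcachegrind. (Despite its name, it works on callgrind output too.) It's free and will give you awesome detail.

You can also turn data collection off and on externally. So start with it off, wait until your program gets slow, turn it on while the problem is happening, then turn it off. You'll see where the CPU was going. If necessary, do the same thing in reverse, collecting only while the program is fast, and compare.

Usually, the problem will stick out like a sore thumb.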
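Before toggling collection, it helps to know at which file the slowdown actually begins. The sketch below is not callgrind; it is a cruder, complementary technique using `std::chrono` to time each unit of work. The name `time_each` and the callback signature are assumptions made for this illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Runs work(i) for i in [0, count) and returns the wall-clock milliseconds
// each call took, so a slowdown shows up as a jump in the series.
std::vector<double> time_each(std::size_t count,
                              const std::function<void(std::size_t)>& work) {
    std::vector<double> elapsed_ms;
    elapsed_ms.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work(i);
        const auto stop = std::chrono::steady_clock::now();
        elapsed_ms.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }
    assert(elapsed_ms.size() == count);
    return elapsed_ms;
}
```

In the question's program, the body of the outer `for` loop over `dirCon` would play the role of `work(i)`. Printing the returned series shows whether the per-file cost jumps at one point or creeps up steadily, which tells you when to switch profiling on.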

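【Comments】:

Thanks for the tip; it looks like this program can't look at general memory usage. Is that true? If so, do you recommend another similar program?
memcheck (aka valgrind --tool=memcheck) is great for finding memory leaks.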

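【Answer 3】:

Can you share your program?

1. One thing to look for is whether you are using data structures that don't scale as the number of elements grows.

E.g. using a list to hold a million elements would be extremely slow to traverse/search (O(n) per lookup), as opposed to, say, a balanced binary search tree (O(log n)) or a hash table (O(1) on average).

2. You should check whether you're holding on to data at the end of each cycle (burn/run). Ideally you should release all resources at the end of each run.

3. Sounds like there may be a handle leak?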
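The first point can be sketched with the standard containers. The three function names below are hypothetical; the complexities in the comments are the usual ones for `std::list`, `std::set`, and `std::unordered_set`, not measurements.

```cpp
#include <algorithm>
#include <cassert>  // for the assert-based checks in the usage note
#include <list>
#include <set>
#include <string>
#include <unordered_set>

// O(n) per lookup: walks the list element by element.
bool in_list(const std::list<std::string>& items, const std::string& key) {
    return std::find(items.begin(), items.end(), key) != items.end();
}

// O(log n) per lookup: std::set is an ordered, tree-based container.
bool in_tree(const std::set<std::string>& items, const std::string& key) {
    return items.find(key) != items.end();
}

// O(1) per lookup on average: std::unordered_set is a hash table.
bool in_hash(const std::unordered_set<std::string>& items, const std::string& key) {
    return items.count(key) != 0;
}
```

All three agree on the result, e.g. `assert(in_tree(s, "b") == in_list(l, "b"))` for containers holding the same elements; they differ only in how the cost of each lookup grows with the number of stored elements. If a program performs one lookup per input line, the list version becomes quadratic overall.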

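【Comments】:

I've attached the code. I think the variables are scoped so that they are released as soon as I'm done with them.

【Answer 4】:

This is a total shot in the dark. You've got: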

bool getDirectoryContents(const string dirName, vector<string> *conts) {
    ...
    copy(directory_iterator(p), directory_iterator(), back_inserter(v));

How does performance change if you instead do:

bool getDirectoryContents(const string dirName, vector<string> *conts) {
    ...
    // note: preincrementing the iterator
    for (directory_iterator it((p)); it!=directory_iterator(); ++it) {
       v.push_back(*it);
    }

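My thinking is that std::copy is specified to use postincrement, and boost::filesystem::directory_iterator is an InputIterator: it shouldn't really support postincrement. boost::filesystem::directory_iterator may not be happy being postincremented.

【Comments】: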