How to read a file into a vector elegantly and efficiently? (Answers)

【Question title】: How to read a file into a vector elegantly and efficiently?
【Posted】: 2017-04-29 14:46:44
【Question description】:

#include <fstream>
#include <vector>
#include <algorithm>
#include <iterator>

using namespace std;

vector<char> f1()
{
    ifstream fin{ "input.txt", ios::binary };
    return
    {
        istreambuf_iterator<char>(fin),
        istreambuf_iterator<char>()
    };
}

vector<char> f2()
{
    vector<char> coll;
    ifstream fin{ "input.txt", ios::binary };
    char buf[1024];
    while (fin.read(buf, sizeof(buf)))
    {
        copy(begin(buf), end(buf),
            back_inserter(coll));
    }

    copy(begin(buf), begin(buf) + fin.gcount(),
        back_inserter(coll));

    return coll;
}

int main()
{
    f1();
    f2();
}

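Obviously, f1() is more concise than f2(), so I prefer f1() to f2(). However, I worry that f1() is less efficient than f2().

So, my question is:

Will mainstream C++ compilers optimize f1() so that it is as fast as f2()?

Update:

I tested in release mode with a 130 MB file (Visual Studio 2015 with Clang 3.8):

f1() takes 1614 ms, while f2() takes 616 ms.

f2() is faster than f1().

What a sad result!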

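【Comments】:

Which is faster? That should be measured. One thing that comes to mind is to reserve the memory the vector needs, to avoid reallocations.
Also, it may be worth considering a rope. This is not directly related to the choice of input library, but anyway: stackoverflow.com/questions/2826431/…

Tags: c++ performance io compiler-optimization idioms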

【Solution 1】:

I checked your code with MinGW 4.8.2. Out of curiosity, I added one more function, f3, implemented as follows:

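inline vector<char> f3()
{
    // filepath is assumed to be defined elsewhere (e.g. "input.txt").
    ifstream fin{ filepath, ios::binary };
    fin.seekg(0, fin.end);          // jump to the end to learn the file size
    size_t len = fin.tellg();
    fin.seekg(0, fin.beg);          // rewind before reading

    vector<char> coll(len);         // allocate (and size) the whole buffer once
    fin.read(coll.data(), len);     // read the file in a single call
    return coll;
}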

I tested with a file about 90 MB long. On my platform, the results differ somewhat from yours.

f1() ~850ms
f2() ~600ms
f3() ~70ms

The results are the average over 10 consecutive reads of the file.
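The averaging described above could be reproduced with a small harness like the following. This is a minimal sketch and not the code the answerer used: `average_ms` is a hypothetical helper name, and `std::chrono::steady_clock` is an assumed choice of clock.

```cpp
#include <chrono>

// Hypothetical helper: calls `fn` `runs` times (runs must be > 0) and
// returns the mean wall-clock duration of one call in milliseconds.
template <typename Fn>
double average_ms(Fn fn, int runs = 10)
{
    using clock = std::chrono::steady_clock;
    double total_ms = 0.0;
    for (int i = 0; i < runs; ++i)
    {
        const auto start = clock::now();
        // The result is kept alive until after the second timestamp, so
        // destroying the returned buffer is not part of the measurement.
        auto result = fn();
        const auto stop = clock::now();
        (void)result;
        total_ms += std::chrono::duration<double, std::milli>(stop - start).count();
    }
    return total_ms / runs;
}
```

For example, `average_ms([] { return f3(); })` would time ten consecutive calls of f3. Repeated reads of the same file are usually served from the operating system's file cache, so numbers obtained this way mostly reflect copying and allocation overhead rather than disk speed.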

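The f3 function takes the least time because vector<char> coll(len); allocates all the required memory up front, so no further reallocation is needed. As for back_inserter, it requires the type to have a push_back member function, which for a vector reallocates whenever the capacity is exceeded. As stated in the documentation: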

push_back

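This effectively increases the container size by one, which causes an automatic reallocation of the allocated storage space if, and only if, the new vector size surpasses the current vector capacity.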
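To make the quoted behavior concrete, the following sketch counts how often a vector reallocates while bytes are appended one at a time, with and without a prior reserve(). `count_reallocations` is a hypothetical helper, and the exact count without reserve() depends on the growth factor of the standard library in use.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: appends `n` bytes with push_back and counts how many
// times capacity() changes, i.e. how many reallocations took place.
// If `reserve_first` is true, the full capacity is requested up front.
std::size_t count_reallocations(std::size_t n, bool reserve_first)
{
    std::vector<char> v;
    if (reserve_first)
        v.reserve(n);

    std::size_t reallocations = 0;
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i)
    {
        v.push_back('x');
        if (v.capacity() != last_capacity)
        {
            ++reallocations;
            last_capacity = v.capacity();
        }
    }
    return reallocations;
}
```

With `reserve_first` set, no reallocation happens during the loop. Without it, each reallocation also copies every byte stored so far into the new block.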

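Of the f1 and f2 implementations, the latter is somewhat faster, even though both grow the vector incrementally (f2 through back_inserter, f1 through the vector's range constructor over single-pass input iterators). f2 is probably faster because it reads the file in chunks, which allows some buffering.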
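The f3 above assumes that the file opens successfully and that a single read() delivers every byte. A more defensive variant might look like the sketch below. `read_whole_file` is a hypothetical name, the path is passed as a parameter instead of the `filepath` variable, and the code relies on the common (but not formally guaranteed) behavior that tellg() at the end of a binary stream equals the file size in bytes.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical variant of f3: returns an empty vector if the file
// cannot be opened or its size cannot be determined.
std::vector<char> read_whole_file(const std::string& path)
{
    // ios::ate opens the stream positioned at the end of the file.
    std::ifstream fin{ path, std::ios::binary | std::ios::ate };
    if (!fin)
        return {};

    const std::streamoff len = fin.tellg();
    if (len <= 0)
        return {};  // tellg() reports -1 on failure; 0 means an empty file

    fin.seekg(0, std::ios::beg);

    // Allocate everything once, then read straight into the buffer.
    std::vector<char> coll(static_cast<std::size_t>(len));
    fin.read(coll.data(), static_cast<std::streamsize>(coll.size()));

    // If fewer bytes arrived than expected, keep only what was read.
    coll.resize(static_cast<std::size_t>(fin.gcount()));
    return coll;
}
```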

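【Comments】:

My observation with this approach is that, yes, the memory of coll is updated, but the vector container is not aware of any change. If you ask the vector for its size, it reports zero.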
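The behavior described in this comment would be consistent with a buffer prepared by reserve() instead of being sized: reserve() only raises capacity(), and bytes written through data() never change size(). The f3 shown in the answer sizes the vector with `vector<char> coll(len)`, which creates len real elements. The sketch below, with the hypothetical function name `read_prefix`, shows the sizing that keeps the container consistent.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads at most `max_bytes` from the start of `path`.
std::vector<char> read_prefix(const std::string& path, std::size_t max_bytes)
{
    std::vector<char> buf;

    // reserve() would only raise capacity(); size() would stay 0, and
    // writing through data() into that storage is undefined behavior.
    // resize() creates real elements that read() may overwrite.
    buf.resize(max_bytes);

    std::ifstream fin{ path, std::ios::binary };
    fin.read(buf.data(), static_cast<std::streamsize>(buf.size()));

    // gcount() is the number of bytes actually read; trim the rest.
    buf.resize(static_cast<std::size_t>(fin.gcount()));
    return buf;
}
```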

【Solution 2】:

If the file is smaller than a few GB, you can read everything in one go:

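#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
        ....

char* buf;
FILE* fin;
const char* filename = "myfile.cgt";
#ifdef _WIN32
    struct _stat st;    // Windows CRT spelling
    if (_stat(filename, &st) == -1) return 0;
#else
    struct stat st;     // POSIX spelling
    if (stat(filename, &st) == -1) return 0;
#endif
    fin = fopen(filename, "rb");
    if (!fin) return 0;
    buf = (char*)malloc(st.st_size);
    if (!buf) { fclose(fin); return 0; }
    // Read the whole file as one item; fread returns 1 on success.
    if (fread(buf, st.st_size, 1, fin) != 1) { free(buf); fclose(fin); return 0; }
    fclose(fin);
    // buf now holds the file contents and must be released with free().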

Needless to say, in C++ you should use "new" rather than malloc().
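As a sketch of what that remark could mean in practice, the following version keeps the C stdio calls but lets a std::vector own the buffer, so nothing has to be freed by hand. `read_with_stdio` and `FileCloser` are hypothetical names. The size is obtained with fseek/ftell instead of stat to avoid the platform-specific branches, which is a substitution and not part of the original answer. Because ftell returns long, this only works for files whose size fits in a long.

```cpp
#include <cstddef>
#include <cstdio>
#include <memory>
#include <vector>

// Hypothetical deleter so that the FILE* is closed on every exit path.
struct FileCloser
{
    void operator()(std::FILE* f) const { std::fclose(f); }
};

// Hypothetical helper: reads a whole file with C stdio into a vector.
// Returns an empty vector on any error.
std::vector<char> read_with_stdio(const char* filename)
{
    std::unique_ptr<std::FILE, FileCloser> fin{ std::fopen(filename, "rb") };
    if (!fin)
        return {};

    // Determine the size by seeking to the end of the file.
    if (std::fseek(fin.get(), 0, SEEK_END) != 0)
        return {};
    const long size = std::ftell(fin.get());
    if (size <= 0 || std::fseek(fin.get(), 0, SEEK_SET) != 0)
        return {};

    std::vector<char> buf(static_cast<std::size_t>(size));
    // With an item size of 1, fread() returns the number of bytes read.
    const std::size_t got = std::fread(buf.data(), 1, buf.size(), fin.get());
    buf.resize(got);
    return buf;
}
```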

【Comments】: