Why is istream/ostream slow

【Question title】: Why is istream/ostream slow
【Posted】: 2013-09-12 09:11:39
【Question】:

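At 50:40 of http://channel9.msdn.com/Events/GoingNative/2013/Writing-Quick-Code-in-Cpp-Quickly Andrei Alexandrescu jokes about how inefficient/slow istream is.

I had an issue in the past where ostream was slow and fwrite was significantly faster (it saved many seconds on a single run of the main loop), but I never understood why, nor did I look into it.

What makes istream and ostream slow in C++? Or at least slow compared to other things (like fread/fget, fwrite) that would satisfy the same needs.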

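【Question comments】:

IIRC the C++ streams have to sync with the C I/O "constructs", if you will (for compatibility reasons). I believe you can make them faster by turning that synchronization off (granted, you'll have to refrain from doing things like printf afterwards).
@Borgleader: What C "constructs" would ostream sync to (it was a file output stream, not std::cout), and why is it slower than C fwrite?
Take a look at this answer: stackoverflow.com/a/9371717/583833
@Borgleader: That definitely answers the cin question. +1
Related: stackoverflow.com/questions/4340396/…

Tags: c++ performance ostream istream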

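【Answer 1】:

Actually, IOStreams don't have to be slow! It is a matter of implementing them in a reasonable way to make them fast, though. Most standard C++ libraries don't seem to pay much attention to the implementation of IOStreams. A long time ago, when my CXXRT was still maintained, it was about as fast as stdio, when used correctly!

Note, however, that IOStreams lay a few performance traps for users. The following guidelines apply to all IOStream implementations, but especially to those tailored to be fast: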

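When using std::cin, std::cout, etc., you need to call std::ios_base::sync_with_stdio(false)! Without this call, any use of the standard stream objects is required to synchronize with C's standard streams. Of course, when using std::ios_base::sync_with_stdio(false) it is assumed that you don't mix std::cin with stdin, std::cout with stdout, etc.
Do not use std::endl, as it mandates many unnecessary flushes of any buffer. Likewise, don't set std::ios_base::unitbuf or use std::flush unnecessarily.
When creating your own stream buffers (OK, few users do), make sure they do use an internal buffer! Processing individual characters jumps through multiple conditions and a virtual function, which makes it hideously slow.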

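【Comments】:

@Borgleader: Fixed! Thanks!
+1 for pointing out that this is mostly a problem of the implementation, not of the library itself. Efficient iostreams implementation is also one of the main concerns in the C++ Performance Report published by the ISO committee in 2006.
@ComicSansMS: As it happens, much of the material on the performance of IOStreams is based on my contributions :-) (The contributions aren't attributed to their respective authors; the contributors are listed on page 6, however.)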

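【Answer 2】:

[i]ostreams are slow by design for several reasons:

Shared formatting state: every formatted output operation has to check all formatting state that might have been previously mutated by I/O manipulators. For this reason iostreams are inherently slower than printf-like APIs, where all formatting information is local (especially those with format string compilation, as in Rust or {fmt}, which avoids parsing overhead).
Uncontrolled use of locales: all formatting goes through an inefficient locale layer even if you don't want this, for example when writing a JSON file. See N4412: Shortcomings of iostreams.
Inefficient codegen: formatting a message with iostreams normally consists of multiple function calls, because arguments and I/O manipulators are interleaved with parts of the message. For example, there are three function calls (godbolt) in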
```
std::cout << "The answer is " << answer << ".\n";
```
compared to just one (godbolt) in the equivalent printf call:
```
printf("The answer is %d.\n", answer);
```
Extra buffering and synchronization. This can be disabled with sync_with_stdio(false), at the cost of poor interoperability with other I/O facilities.
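
The first two points can be illustrated with a small sketch. The function names `sticky_state_demo` and `json_number` are hypothetical and exist only for this example. The second function pins the stream to the classic "C" locale so that the output does not depend on the global locale; it does not remove the locale layer itself, which is the cost the answer describes.

```cpp
#include <ios>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical demo: formatting flags set by a manipulator stay on the stream.
std::string sticky_state_demo() {
    std::ostringstream out;
    out << std::hex << 255 << ' ';  // std::hex mutates the stream's state
    out << 255 << ' ';              // still printed in hex: the state is shared
    out << std::dec << 255;         // has to be reset explicitly
    return out.str();               // "ff ff 255"
}

// Hypothetical demo: format a number independently of the global locale.
std::string json_number(double value) {
    std::ostringstream out;
    out.imbue(std::locale::classic());  // always use the "C" locale
    out << value;
    return out.str();
}
```

Because each `operator<<` has to consult this shared state, the stream cannot know ahead of time how the next value will be formatted, unlike a printf-style call where the format string carries everything.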

【Comments】:

【Answer 3】:

Perhaps this can give some idea of what you're dealing with:

```
#include <stdio.h>
#include <time.h>
#include <algorithm>
#include <fstream>
#include <iomanip>
#include <iostream>
#include <iterator>
#include <string>

// Character-at-a-time read through the C FILE* interface.
unsigned count1(FILE *infile, char c) {
    int ch;
    unsigned count = 0;

    while (EOF != (ch = getc(infile)))
        if (ch == c)
            ++count;
    return count;
}

// Block read with fread into a fixed-size buffer.
unsigned int count2(FILE *infile, char c) {
    static char buffer[8192];
    size_t size;  // fread returns size_t, not int
    unsigned int count = 0;

    while (0 < (size = fread(buffer, 1, sizeof(buffer), infile)))
        for (size_t i = 0; i < size; i++)
            if (buffer[i] == c)
                ++count;
    return count;
}

// Unformatted read through the stream buffer.
unsigned count3(std::istream &infile, char c) {
    return static_cast<unsigned>(
        std::count(std::istreambuf_iterator<char>(infile),
                   std::istreambuf_iterator<char>(), c));
}

// Formatted read via operator>>, one char at a time (skips whitespace).
unsigned count4(std::istream &infile, char c) {
    return static_cast<unsigned>(
        std::count(std::istream_iterator<char>(infile),
                   std::istream_iterator<char>(), c));
}

// Block read with istream::read into a fixed-size buffer.
unsigned int count5(std::istream &infile, char c) {
    static char buffer[8192];
    unsigned int count = 0;

    while (infile.read(buffer, sizeof(buffer)))
        count += std::count(buffer, buffer + infile.gcount(), c);
    // The last read fails on a partial chunk; gcount() still reports its size.
    count += std::count(buffer, buffer + infile.gcount(), c);
    return count;
}

// Formatted read with operator>> in an explicit loop (skips whitespace).
unsigned count6(std::istream &infile, char c) {
    unsigned int count = 0;
    char ch;

    while (infile >> ch)
        if (ch == c)
            ++count;
    return count;
}

// Runs f(t, 'N') once and prints the count and the CPU time it took.
template <class F, class T>
void timer(F f, T &t, std::string const &title) {
    unsigned count;
    clock_t start = clock();
    count = f(t, 'N');
    clock_t stop = clock();
    std::cout << std::left << std::setw(30) << title << "\tCount: " << count;
    std::cout << "\tTime: " << double(stop - start) / CLOCKS_PER_SEC << "\n";
}

int main() {
    char const *name = "equivs2.txt";

    FILE *infile = fopen(name, "r");
    if (infile == NULL) {
        perror(name);
        return 1;
    }

    // The first pass over each handle is labeled "ignore" in the output.
    timer(count1, infile, "ignore");

    rewind(infile);
    timer(count1, infile, "using getc");

    rewind(infile);
    timer(count2, infile, "using fread");

    fclose(infile);

    std::ifstream in2(name);
    if (!in2) {
        std::cerr << "cannot open " << name << "\n";
        return 1;
    }
    timer(count3, in2, "ignore");

    // clear() resets eofbit/failbit so that seekg(0) can rewind the stream.
    in2.clear();
    in2.seekg(0);
    timer(count3, in2, "using streambuf iterators");

    in2.clear();
    in2.seekg(0);
    timer(count4, in2, "using stream iterators");

    in2.clear();
    in2.seekg(0);
    timer(count5, in2, "using istream::read");

    in2.clear();
    in2.seekg(0);
    timer(count6, in2, "using operator>>");

    return 0;
}
```

Running this, I get results like this (with MS VC++):

```
ignore                          Count: 1300     Time: 0.309
using getc                      Count: 1300     Time: 0.308
using fread                     Count: 1300     Time: 0.028
ignore                          Count: 1300     Time: 0.091
using streambuf iterators       Count: 1300     Time: 0.091
using stream iterators          Count: 1300     Time: 0.613
using istream::read             Count: 1300     Time: 0.028
using operator>>                Count: 1300     Time: 0.619
```

And this (with MinGW):

```
ignore                          Count: 1300     Time: 0.052
using getc                      Count: 1300     Time: 0.044
using fread                     Count: 1300     Time: 0.036
ignore                          Count: 1300     Time: 0.068
using streambuf iterators       Count: 1300     Time: 0.068
using stream iterators          Count: 1300     Time: 0.131
using istream::read             Count: 1300     Time: 0.037
using operator>>                Count: 1300     Time: 0.121
```

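As we can see in the results, it's not really a matter of iostreams being categorically slow. Rather, a great deal depends on exactly how you use iostreams (and, to a lesser extent, FILE * as well). There's also fairly substantial variation between these two implementations.

Nonetheless, the fastest versions with each (fread and istream::read) are essentially tied. With VC++, getc is quite a bit slower than either istream::read or istreambuf_iterator.

Bottom line: getting good performance from iostreams requires a little more care than with FILE *, but it's certainly possible. They also give you more options: convenience when you don't care all that much about speed, and, with a little extra work, performance directly competitive with the best you can get from C-style I/O.

【Comments】: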

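Since my edit got rejected: your istream::read version has a bug. The last chunk of characters is not checked, see here.
Handy. Also, if you copy count6 to a new count7 with "while (infile.get(ch))", you'll see that it's twice as fast as operator>> but still takes twice as long as getc.
@NickWestgate: Yes. No matter how many I add, there are at least three more that could be added. If (for example) another method were faster than anything else, I'd probably add it, but another one that's more or less in the middle of the pack doesn't seem worth bothering with...
This is useful for those (like me) who are comparing the current state of some code against the other options. I'm pretty disappointed that istream::get spends a lot of time entering and exiting critical sections in some single-threaded code I maintain. ;-) Anyhow, thanks for the handy test suite.
File I/O is inherently noisy on Windows and Linux because of caching.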

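【Answer 4】:

Even though this question is quite old, I'm amazed nobody has mentioned iostream object construction.

That is, whenever you create an STL iostream (or another stream variant) and step into the code, you see that the constructor calls an internal Init function. In there, operator new is called to create a new locale object. Likewise, that object is destroyed when the stream is destroyed.

This is hideous, IMHO. It certainly contributes to slow object construction and destruction, because at some point memory is allocated and deallocated under a system lock.

Further, some of the STL streams allow you to specify an allocator, so why is the locale not created with the specified allocator?

When using streams in a multithreaded environment, you can also imagine the bottleneck imposed by calling operator new every time a new stream object is constructed.

A hideous mess if you ask me, as I am finding out for myself right now!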
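
If stream construction does turn out to be costly on a given implementation, one possible workaround is to construct a stream once and reuse it. The sketch below is only an illustration of that idea, not something the answer proposes; `format_all` is a made-up name, and whether it saves anything depends on the standard library in use.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: formats every value with one reused stream instead of
// constructing a new std::ostringstream per value.
std::vector<std::string> format_all(const std::vector<int> &values) {
    std::vector<std::string> result;
    std::ostringstream out;      // constructed once, outside the loop
    for (int v : values) {
        out.str(std::string());  // discard the previous contents
        out.clear();             // reset any error flags
        out << "value=" << v;
        result.push_back(out.str());
    }
    return result;
}
```

Formatting flags set on the reused stream persist between iterations, so any manipulators applied inside the loop would need to be reset as well.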

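【Comments】:

Karl Knechtel says here: "(...) This task is almost certainly I/O bound, and there is way too much FUD going around about the cost of creating std::string objects in C++ or using <iostream> in and of itself."
Somebody else has exactly the same reasoning...

【Answer 5】:

On a similar topic, STL says: "You can call setvbuf() to enable buffering on stdout."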

https://web.archive.org/web/20170329163751/https://connect.microsoft.com/VisualStudio/feedback/details/642876/std-wcout-is-ten-times-slower-than-wprintf-performance-bug-in-c-library
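
A minimal sketch of that suggestion, assuming full buffering is wanted and the library should allocate the buffer itself. The wrapper name `enable_full_buffering` and the buffer size used in the usage note are arbitrary choices for this example.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical wrapper: switches a C stream to full buffering.
// Passing a null buffer lets the C library allocate one of the given size.
// Per the C standard, setvbuf should be called before any other operation
// on the stream.
bool enable_full_buffering(std::FILE *stream, std::size_t size) {
    return std::setvbuf(stream, nullptr, _IOFBF, size) == 0;
}
```

It would be called early in `main`, for example as `enable_full_buffering(stdout, 1 << 16)`, before anything is written to stdout.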

【Comments】: