Proper way to read a binary file in C++? Answers

【Question title】: Proper way to read binary file in C++?
【Posted】: 2013-11-03 22:14:41
【Question】:

I've been searching the internet for a way to read binary files in C++, and I found two snippets that kind of work:

Number 1:

#include <iostream>
#include <fstream>

int main(int argc, const char *argv[])
{
   if (argc < 2) {
      ::std::cerr << "Usage: " << argv[0] << "<filename>\n";
      return 1;
   }
   ::std::ifstream in(argv[1], ::std::ios::binary);
   while (in) {
      char c;
      in.get(c);
      if (in) {
         // ::std::cout << "Read a " << int(c) << "\n";
         printf("%X ", c);
      }
   }
   return 0;
}

Result:

6C 1B 1 FFFFFFDC F FFFFFFE7 F 6B 1

Number 2:

#include <stdio.h>
#include <iostream>

using namespace std;

// An unsigned char can store 1 Bytes (8bits) of data (0-255)
typedef unsigned char BYTE;

// Get the size of a file
long getFileSize(FILE *file)
{
    long lCurPos, lEndPos;
    lCurPos = ftell(file);
    fseek(file, 0, 2);
    lEndPos = ftell(file);
    fseek(file, lCurPos, 0);
    return lEndPos;
}

int main()
{
    const char *filePath = "/tmp/test.bed";
    BYTE *fileBuf;          // Pointer to our buffered data
    FILE *file = NULL;      // File pointer

    // Open the file in binary mode using the "rb" format string
    // This also checks if the file exists and/or can be opened for reading correctly
    if ((file = fopen(filePath, "rb")) == NULL)
        cout << "Could not open specified file" << endl;
    else
        cout << "File opened successfully" << endl;

    // Get the size of the file in bytes
    long fileSize = getFileSize(file);

    // Allocate space in the buffer for the whole file
    fileBuf = new BYTE[fileSize];

    // Read the file in to the buffer
    fread(fileBuf, fileSize, 1, file);

    // Now that we have the entire file buffered, we can take a look at some binary infomation
    // Lets take a look in hexadecimal
    for (int i = 0; i < 100; i++)
        printf("%X ", fileBuf[i]);

    cin.get();
    delete[]fileBuf;
        fclose(file);   // Almost forgot this
    return 0;
}

Result:

6C 1B 1 DC F E7 F 6B 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 A1 D 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Result of xxd /tmp/test.bed:

0000000: 6c1b 01dc 0fe7 0f6b 01                   l......k.

Result of ls -l /tmp/test.bed:

-rw-rw-r-- 1 user user 9 Nov  3 16:37 test.bed

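The second method gives the right hex codes at the beginning but seems to get the file size wrong; the first method mangles the bytes.

These methods look very different. Perhaps there are many ways to do the same thing in C++? Is there an idiom that professionals use?

【Discussion】:

There is more explanation of this issue here: stackoverflow.com/questions/22054759/…

Tags: c++ io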

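【Solution 1】:

You certainly want to convert the char objects to unsigned char before processing them as integer values! The problem is that char may be signed, in which case negative values turn into negative ints when you convert them. Negative ints shown as hex have more than two hex digits, and the leading ones will probably all be "f".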
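The effect can be reproduced in isolation. The following is only an illustrative sketch: the helper names hex_sign_extended and hex_as_byte are made up for this example, and the eight-digit result assumes a 32-bit unsigned int. It formats the same byte once after widening a signed char directly and once after converting it to unsigned char first.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: widen a *signed* char straight to int, which is what
// happens implicitly when plain char is signed and is handed to printf("%X").
std::string hex_sign_extended(signed char c) {
    char buf[16];
    // A negative value keeps its sign, so the upper bits are all ones.
    std::snprintf(buf, sizeof buf, "%X",
                  static_cast<unsigned int>(static_cast<int>(c)));
    return buf;
}

// Hypothetical helper: convert to unsigned char first, so the value stays
// in the range 0..255 and prints as at most two hex digits.
std::string hex_as_byte(char c) {
    char buf[16];
    std::snprintf(buf, sizeof buf, "%X",
                  static_cast<unsigned int>(static_cast<unsigned char>(c)));
    return buf;
}
```

For the byte 0xDC from the question, hex_as_byte gives DC, while hex_sign_extended gives FFFFFFDC on a platform with 32-bit int, which matches the output of the first snippet.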

I don't immediately spot why the second approach gets the size wrong. However, the C++ way to read a binary file is simple:

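#include <fstream>
#include <iomanip>
#include <iostream>
#include <iterator>
#include <vector>

int main(int argc, char* argv[])
{
    if (argc < 2) {
        std::cerr << "Usage: " << argv[0] << " <filename>\n";
        return 1;
    }
    std::vector<unsigned char> bytes;
    {
        // The inner scope closes the file as soon as the bytes are in memory.
        std::ifstream in(argv[1], std::ios_base::binary);
        // noskipws is harmless here: istreambuf_iterator reads unformatted bytes either way.
        bytes.assign(std::istreambuf_iterator<char>(in >> std::noskipws),
                     std::istreambuf_iterator<char>());
    }
    // hex and the fill character are sticky; setw applies to the next insertion only.
    std::cout << std::hex << std::setfill('0');
    for (int v : bytes) {
        std::cout << std::setw(2) << v << ' ';
    }
    std::cout << '\n';
}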

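【Discussion】:

It's a bit confusing that C++ uses the bit-shift operators to specify options.
I find that code quite verbose. Also, it permanently modifies the output format of std::cout. What would the code look like if it restored the format?
It looks like ios_base is a subclass of ios. What is the difference between ios::binary and ios_base::binary? Or maybe I have it backwards, and ios is a subclass of ios_base and therefore inherits binary?
@RolandIllig: Given that outputting an entire binary file is a fairly substantial operation, I think something like std::ostream fmt(0); fmt.copyfmt(std::cout); ...; std::cout.copyfmt(fmt); could be reasonable. Since formatting flags are a local facility, I don't think there is any need to restore them (stdio doesn't even have the concept of sticky formatting flags).
@RolandIllig: As stated: yes, the formatting flags that were set stay as they are. However, if you care about how things are formatted, you had better set the formatting flags locally to suit the specific need: the flags already set are some random combination of whatever was useful wherever they were last set. The fact that formatting flags are sticky is a by-product of how they are implemented in IOStreams; it is not meant for setting flags globally and relying on them! You set formatting flags according to local needs.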
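For readers wondering what the copyfmt idea from the comments might look like when spelled out, here is a minimal sketch. The function name dump_hex_restoring is made up for this example, and it assumes the stream uses the default exception mask; it is one possible way to do it, not the commenter's own code.

```cpp
#include <iomanip>
#include <ios>
#include <iostream>
#include <vector>

// Hypothetical helper: hex-dump bytes to `out`, then put the stream's
// formatting state back the way the caller left it.
void dump_hex_restoring(std::ostream& out,
                        const std::vector<unsigned char>& bytes) {
    std::ios saved(nullptr);  // state holder with no stream buffer attached
    saved.copyfmt(out);       // remember flags, fill character, precision, ...

    out << std::hex << std::setfill('0');
    for (unsigned char b : bytes) {
        // setw applies to the next insertion only, so it is repeated per byte.
        out << std::setw(2) << static_cast<int>(b) << ' ';
    }
    out << '\n';

    out.copyfmt(saved);       // restore the caller's formatting
}
```

After a call such as dump_hex_restoring(std::cout, bytes), a later std::cout << 255 prints in decimal again, because the base and fill character were copied back.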

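【Solution 2】:

Both of your methods are some odd mix of C and C++ (well, actually the second one is just plain C); still, the first method is mostly right, but you have to use an unsigned char for c, otherwise any byte above 0x7f is read as a negative number, which produces the wrong output.¹

To do things correctly and in "the C++ way", you should do this: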

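std::cout << std::hex << std::setfill('0');

...

   if (in)
      // in.get() only fills a plain char, so convert to unsigned char when printing
      std::cout << std::setw(2) << int(static_cast<unsigned char>(c)) << "\n";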

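The second one gets the "signedness" right, but it's mostly just C. A quick fix is to correct the 100 in the for loop, replacing it with fileSize. In general, though, loading the whole file into memory just to dump its contents in hex is a poor idea. What you usually do is read the file a chunk at a time into a fixed-size buffer and convert it as you go.

¹ get returns an int; if it is greater than 0x7f, it overflows the char on assignment, typically resulting in some negative value. Then, when it is passed to printf, it is sign-extended (because any signed integer argument narrower than int that is passed to a variadic function is widened to int) but is interpreted as an unsigned int because of the %X specifier. (All of this assumes two's complement arithmetic, non-signalling integer overflow, and a signed char.)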
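The chunk-at-a-time approach described above might be sketched as follows. This is an illustration under assumptions, not the answerer's code: the names hex_dump_chunked and hex_dump_string are made up, and the 4096-byte chunk size is an arbitrary choice.

```cpp
#include <array>
#include <cstddef>
#include <cstdio>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: hex-dump everything readable from `in` to `out`,
// holding at most one fixed-size chunk in memory at a time.
void hex_dump_chunked(std::istream& in, std::ostream& out) {
    std::array<char, 4096> chunk;  // chunk size is an arbitrary choice
    for (;;) {
        in.read(chunk.data(), static_cast<std::streamsize>(chunk.size()));
        // gcount() reports how many bytes the last read actually delivered.
        const std::size_t got = static_cast<std::size_t>(in.gcount());
        if (got == 0) break;       // end of input or error
        for (std::size_t i = 0; i < got; ++i) {
            char hex[4];           // two digits, a space, and the terminator
            // Cast through unsigned char so bytes above 0x7f are not sign-extended.
            std::snprintf(hex, sizeof hex, "%02x ",
                static_cast<unsigned int>(static_cast<unsigned char>(chunk[i])));
            out << hex;
        }
    }
}

// Convenience wrapper for in-memory data, handy for trying the function out.
std::string hex_dump_string(const std::string& data) {
    std::istringstream in(data);
    std::ostringstream out;
    hex_dump_chunked(in, out);
    return out.str();
}
```

For a file, the same function can be called with a std::ifstream opened with std::ios::binary as the input and std::cout as the output. Memory use stays bounded by the chunk size regardless of how large the file is.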

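【Discussion】:

While you can certainly use setf() to set the various flags for fields, I think it is much easier to use the corresponding manipulators, e.g. std::cout << std::hex;. Even using the manipulators with function-call notation, std::cout.operator<< (std::hex), is almost easier than using setf().
@DietmarKühl: I tend to avoid stream manipulators. I can never remember which of them are "sticky", and I've been bitten by this several times. Now that I've looked it up, in the end it seems that all of them are sticky, but width gets reset at random, go figure. That's why I try to stay away from the standard streams altogether when possible; their design is full of flaws, especially of the "ambitious idea, poorly executed" kind.
(Sin #1: hoping that the << syntax with manipulators could be a good replacement for printf-style "format string" formatting; hint: it isn't.)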

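【Solution 3】:

In the first case you are printing a char (which is signed), while in the second case you are doing the same with an unsigned char. When it is passed to printf for %X, the char is promoted to an int, and that causes the difference.

【Discussion】:

【Solution 4】:

While searching for why @Roland Illig's answer (now deleted) did not work, I found the following solution. I'm not sure whether it meets professional standards, but so far it gives the right results and lets you inspect the first n bytes of a file: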

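#include <cstdio>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

int main(int argc, const char *argv[])
{
    if (argc < 3) {
        std::cerr << "usage: " << argv[0] << " <filename> <nbytes>\n";
        return 1;
    }

    // Number of leading bytes to inspect; std::stoi throws on non-numeric input.
    int nbytes = std::stoi(argv[2]);
    if (nbytes <= 0) {
        std::cerr << "nbytes must be positive\n";
        return 1;
    }
    // std::vector replaces the variable-length array, which is not standard C++.
    std::vector<char> buffer(nbytes);

    std::ifstream readingFile(argv[1], std::ios::binary);
    readingFile.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
    // gcount() reports how many bytes were actually read (possibly fewer than requested).
    std::streamsize bytesread = readingFile.gcount();
    if (bytesread > 0) {
        for (std::streamsize i = 0; i < bytesread; i++) {
            // Convert to unsigned char first so bytes above 0x7f are not sign-extended.
            unsigned char rawchar = static_cast<unsigned char>(buffer[i]);
            std::printf("%02x ", static_cast<unsigned int>(rawchar));
        }
        std::printf("\n");
    }

    return 0;
}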

Another answer, which I got from wibit.com:

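#include <cstdio>
#include <fstream>
#include <iostream>
using namespace std;

int main(int argc, const char* argv[])
{
  if (argc < 2) {
    cerr << "usage: " << argv[0] << " <filename>\n";
    return 1;
  }
  ifstream inBinaryFile;
  inBinaryFile.open(argv[1], ios_base::binary);
  // get() returns an int: a byte value in 0..255, or EOF at end of file or on error.
  int currentByte = inBinaryFile.get();
  while (currentByte != EOF)
  {
    printf("%02x ", currentByte);
    currentByte = inBinaryFile.get();
  }
  printf("\n");
  inBinaryFile.close();
  return 0;
}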

【Discussion】: