Read bytes methods in C / C++: answers

【Question title】: Read bytes methods in C / C++
【Posted】: 2011-09-16 12:10:57
【Question】:

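I am new to C and was wondering whether there are standard library methods to read bytes/int/long, such as getChar(), getInt(), getLong().

For instance, if I call getInt(), it would return 4 bytes as a string and move the char pointer address by 4. Where can I find those methods?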

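【Comments】:

It's quite unclear what you're trying to do. Where does getInt() read from, and what does it read it as? A getInt() that returns a char[4] doesn't make much sense to me.

Tags: c++ c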

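【Answer 1】:

No, binary (de)serialization is not directly and systematically supported by the library. The read() function will move the stream pointer along, but I don't think you can get around a platform-dependent piece of code for interpreting the byte stream: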

#include <cstdint>
#include <fstream>

std::ifstream thefile("data.bin", std::ios::binary);

float f;
double d;
uint32_t i;

// the following is OK and doesn't constitute type punning
char * const pf = reinterpret_cast<char*>(&f);
char * const pd = reinterpret_cast<char*>(&d);
char * const pi = reinterpret_cast<char*>(&i);

// the following may or may not give you what you expect
// Caveat emptor, and add your own platform-specific code here.
thefile.read(pf, sizeof(float));
thefile.read(pd, sizeof(double));
thefile.read(pi, sizeof(uint32_t));

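If you only read unsigned integral values, you can perform an algebraic extraction, which is type-safe in a sense and only requires you to know the endianness of the serialized data format: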

unsigned char buf[sizeof(uint32_t)];
thefile.read(reinterpret_cast<char*>(buf), sizeof(uint32_t));

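// little-endian; widen each byte before shifting so that buf[3] << 24 cannot overflow a signed int
uint32_t n = static_cast<uint32_t>(buf[0])
           | (static_cast<uint32_t>(buf[1]) << 8)
           | (static_cast<uint32_t>(buf[2]) << 16)
           | (static_cast<uint32_t>(buf[3]) << 24);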

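Reading binary floating-point data is particularly irksome, because you have to know a lot of extra information about your data stream: Does it use IEEE754? (Does your platform?) What is the endianness (float endianness is independent of integer endianness)? Or is it represented as something else entirely? Good documentation of the file format is crucial.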
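As a minimal sketch of what that involves, the following C function decodes one 32-bit float under two explicit assumptions that do not come from the answer: the stream stores IEEE 754 binary32 values in big-endian byte order, and the host's float uses the same representation with the same byte order as its 32-bit integers. The name decode_f32_be is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Decode a float stored as 4 big-endian bytes.
 * ASSUMPTION: the stream holds IEEE 754 binary32 values and the host's
 * float uses the same representation with the same byte order as its
 * 32-bit integers. Verify both against your file format and platform. */
float decode_f32_be(const unsigned char *p)
{
    /* Assemble the bit pattern arithmetically, independent of host endianness. */
    uint32_t bits = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                    ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    float f;

    /* memcpy copies the bit pattern without violating aliasing rules. */
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

If either assumption fails for your format, the bit pattern has to be converted by hand (sign, exponent, mantissa) rather than copied.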

In C, you would use fread() and C-style casts, char * const pf = (char*)(&f).
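A rough C counterpart of the algebraic extraction might look like the sketch below. The function name read_u32_le is made up for illustration, and the little-endian layout is an assumption about the file, just as in the snippet above.

```c
#include <stdint.h>
#include <stdio.h>

/* Read one unsigned 32-bit little-endian value from fp into *out.
 * Returns 0 on success, -1 if fewer than 4 bytes could be read. */
int read_u32_le(FILE *fp, uint32_t *out)
{
    unsigned char buf[4];

    /* fread advances the file position by the number of bytes it reads. */
    if (fread(buf, 1, sizeof buf, fp) != sizeof buf)
        return -1;

    /* Casting each byte before shifting keeps the arithmetic unsigned. */
    *out = (uint32_t)buf[0] | ((uint32_t)buf[1] << 8) |
           ((uint32_t)buf[2] << 16) | ((uint32_t)buf[3] << 24);
    return 0;
}
```

Open the file in binary mode, for example fopen("data.bin", "rb"), and call read_u32_le(fp, &value) repeatedly until it returns -1.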

【Comments】:

【Answer 2】:

Since pointer arithmetic is in the very nature of C, there are no Java-like functions there.

To get an int out of some memory buffer, you would do:

/* assuming that buf is of type void * */
int x = *((int *) buf);
/* advance to the position after the end of the int */
((int *) buf)++;

Or, more succinctly:

int x = *((int *) buf)++;

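【Comments】:

*((int *) buf) is not UB - it just casts the void pointer to a typed pointer in order to dereference it. Of course, you need to make sure that buf is large enough.
Casting the pointer to int* isn't undefined behavior, but accessing memory through the resulting pointer is. This response is just plain wrong and won't work, except in very special cases.
-1 Dereferencing the pointer after the cast is undefined behavior, and using ++ on the result of a cast shouldn't even compile.
@Blagovest It's not that "some compilers won't consider (int*)buf a valid lvalue": both the C and the C++ standards say that it is not an lvalue, and applying ++ to it requires a diagnostic. Similarly, if an object wasn't declared and initialized as an int, accessing it through an lvalue expression of type int is undefined behavior: in this particular case, for example, the pointer might not be sufficiently aligned, or the bytes might contain a trapping value when treated as an int.
It should be int x = *(*(int **)&buf)++;, or, with the help of LVALUE_CAST, int x = *LVALUE_CAST(int *,buf)++;, see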
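
One way to sidestep the alignment and aliasing objections raised in these comments is to copy the bytes out with memcpy instead of dereferencing a cast pointer. The sketch below is not the answer author's code; read_int_native is a hypothetical helper, and it assumes the buffer holds an int in the host's own size and byte order.

```c
#include <string.h>

/* Copy one int out of a byte buffer and advance the cursor past it.
 * ASSUMPTION: the bytes at *cursor were written as an int on a machine with
 * the same int size and byte order; the caller guarantees that at least
 * sizeof(int) bytes remain. */
int read_int_native(const unsigned char **cursor)
{
    int x;

    /* memcpy has no alignment requirement and does not break aliasing rules. */
    memcpy(&x, *cursor, sizeof x);
    *cursor += sizeof x;
    return x;
}
```

Passing a pointer to the cursor gives the "read and advance" behavior the question asks for without applying ++ to a cast expression.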

【Answer 3】:

I believe you are referring to Java's ByteBuffer methods.

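Note that if you are operating on the same data processed by these functions, Java is always BIG endian, regardless of the host's native byte order. Your C code is probably being compiled to run on a LITTLE endian machine, unless you know for sure that it is not. Some rough guidance if you are not sure: x86 (most PCs) is LE. ARM can be either, but is usually LE. PowerPC and Itanium are BE.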
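
If you would rather check than guess, a small runtime test can report the host's byte order. This is only a sketch; host_is_little_endian is a made-up name, and it distinguishes little-endian from everything else without identifying mixed-endian layouts.

```c
#include <stdint.h>
#include <string.h>

/* Return 1 if the host stores the least significant byte first, else 0. */
int host_is_little_endian(void)
{
    const uint32_t probe = 1;
    unsigned char first;

    /* Inspect the byte at the lowest address of the 32-bit value. */
    memcpy(&first, &probe, 1);
    return first == 1;
}
```

Code that assembles values with shifts does not depend on this result; the check matters only when raw bytes are copied straight into typed variables.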

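Also, never dereference a char * or void * that has been cast to any type bigger than 1 byte unless you know that it is properly aligned. If it is not, you will get a bus fault or a similar error.

So this would be my getInt() impl, assuming a buffer in BE/network byte order (e.g. generated by Java). My apologies for the verbosity.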

#include <stdio.h>  /* printf, used by test() below */

typedef struct ByteBuffer {
    const char * buffer;   /* Buffer base pointer */
    int          nextByte; /* Next byte to parse */
    int          size;     /* Size of buffer */
} ByteBuffer_t;

/* Get int from byte buffer, store results in 'i'. Return 0 on success, -1 on error */
int getInt(ByteBuffer_t * bb, int * i) {
   const unsigned char * b;
   if( (bb->nextByte + 3) < bb->size ) {
      /* View the bytes as unsigned so values >= 0x80 are not sign-extended */
      b = (const unsigned char *) &(bb->buffer[bb->nextByte]);
      /* Read as big-endian value */
      *i = (int) (((unsigned int) b[0] << 24) | ((unsigned int) b[1] << 16) |
                  ((unsigned int) b[2] << 8)  |  (unsigned int) b[3]);
      bb->nextByte += 4;
      return 0;
   } else {
      return -1;
   }
}


void test(const char * buf, int bufSize) {
   ByteBuffer_t bb;
   int ival;

   bb.buffer = buf;
   bb.size   = bufSize;
   bb.nextByte = 0;

   while(1) {
      if( 0 == getInt(&bb, &ival) )
          printf("%d\n", ival);
      else
          break;     
   }
}

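EDIT: Removed the ntohl() call.... it didn't belong if your source data really was big endian. If it worked with that call in there, you will probably need to swap the byte order in the shift-pack, which means it would then be parsing a little-endian byte stream instead.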
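
For reference, swapping the byte order in the shift-pack could look like the sketch below. It is a standalone illustration, not part of the answer's code: get_u32_le and its parameters are hypothetical, and it uses an explicit position argument instead of the ByteBuffer struct.

```c
#include <stddef.h>
#include <stdint.h>

/* Parse a 32-bit little-endian value at buf[*pos] and advance *pos by 4.
 * Returns 0 on success, -1 if fewer than 4 bytes remain. */
int get_u32_le(const unsigned char *buf, size_t size, size_t *pos, uint32_t *out)
{
    const unsigned char *b;

    if (*pos > size || size - *pos < 4)
        return -1;

    b = buf + *pos;
    /* Same shift-pack as the big-endian version, with the byte order swapped. */
    *out = (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
    *pos += 4;
    return 0;
}
```

On failure the position is left unchanged, so the caller can tell how many trailing bytes were not consumed.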

【Comments】:

Great! Exactly what I needed.

【Answer 4】:

There is a getchar() function.

The standard way to read input in C is to use

scanf("<format specifier string>", input param1, param2, ...)

Have a look at http://www.cplusplus.com/reference/clibrary/cstdio/scanf/

【Comments】:

This is not what the OP wants.