Why isn't binary.Read() reading integers correctly? Answers

【Question title】: Why isn't binary.Read() reading integers correctly?
【Posted】: 2015-01-02 10:20:47
【Question】:

I'm trying to read a binary file in Go.

Basically I have a struct like this:

type foo struct {
    A int16
    B int32
    C [32]byte
    // and so on...
}

I'm reading that struct from a file like this:

fi, err := os.Open(fname)
// error checking, defer close, etc.
var bar foo
binary.Read(fi, binary.LittleEndian, &bar)

Now, this should work, but I'm getting some strange results. For example, when I read into the struct, I should get this:

A: 7
B: 8105
C: // some string

But what I get is:

A: 7
B: 531169280
C: // some correct string

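This happens because binary.Read(), after reading []byte{7, 0} as int16(7) (the correct value for A), comes across the slice []byte{0, 0, 169, 31} and tries to convert it to an int32. binary.Read() does that conversion like this:

uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 | uint32(b[3])<<24, where b is the byte slice.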

But what really confuses me is that doing exactly the same thing in C works perfectly.

If I write this in C:

#include <fcntl.h>   // open, O_RDONLY
#include <stdint.h>  // int32_t
#include <unistd.h>  // read

int main()
{
    int fd;
    struct cool_struct {
        short int A;
        int32_t B;
        char C[32];
        // you get the picture...
    } foo;
    int sz = sizeof(struct cool_struct);
    const char* file_name = "/path/to/my/file";

    fd = open(file_name, O_RDONLY);
    // more code
    read(fd, &foo, sz);
    // print values
}

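I get the correct results. Why is my C code correct when my Go code isn't?

【Comments】:

I seem to remember that some versions of C could write 32-bit integers in middle-endian form. Can you post a hex dump of the start of the file?
@Jasen Using xxd: 0000000: 0700 0000 a91f 0000 7074 732f 3300 0000 ........pts/3...
That changes everything, your int16 is 4 bytes long :)
@Jasen But in Go an int16 is 2 bytes long. For example: play.golang.org/p/u1_bBJSeSo
@eric_lagergren It is in C too. What's actually happening is that the C compiler adds some padding after the int16_t so that the int32_t can be 4-byte aligned.

Tags: c struct go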

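【Solution 1】:

Assuming the first two characters of the string aren't '\000'

You have an alignment problem: your C compiler is adding two extra bytes of padding after the int16, and Go isn't.

The easiest fix is probably just to add a dummy (padding) int16 after "A"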

type foo struct {
    A int16
    A_pad int16
    B int32
    C [32]byte
}
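
As a rough check of the padded layout, here is a minimal sketch that feeds the bytes from the xxd dump in the question comments through binary.Read. The helper name readFoo is made up for illustration, and since only the first 16 bytes of the file are known, the rest of C is zero-filled here as an assumption.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// foo is the padded struct from the answer above.
type foo struct {
	A     int16
	A_pad int16 // absorbs the two padding bytes the C compiler inserted
	B     int32
	C     [32]byte
}

// readFoo is a hypothetical helper that decodes one little-endian record.
func readFoo(data []byte) (foo, error) {
	var f foo
	err := binary.Read(bytes.NewReader(data), binary.LittleEndian, &f)
	return f, err
}

func main() {
	// The first 16 bytes come from the xxd dump; the remaining bytes of the
	// 40-byte record are zero-filled here (an assumption, the real file
	// contents are not shown).
	data := make([]byte, 40)
	copy(data, []byte{
		0x07, 0x00, 0x00, 0x00, 0xa9, 0x1f, 0x00, 0x00,
		0x70, 0x74, 0x73, 0x2f, 0x33, 0x00, 0x00, 0x00,
	})

	f, err := readFoo(data)
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	fmt.Println(f.A, f.B)                               // 7 8105
	fmt.Printf("%q\n", bytes.TrimRight(f.C[:], "\x00")) // "pts/3"
}
```

The encoding/binary documentation also says that fields with blank (_) names are skipped when reading into a struct, so declaring the padding as `_ int16` should work the same way without leaving an unused named field.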

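Or there may be a way to tell Go that the int32 needs to be "4-byte aligned"

If you know of one, please edit this answer or post a comment

【Comments】:

Yep, the padding works. Go lets you use _ for padding... somehow that didn't register when I first read the docs. Too bad they don't align like C does. Thanks for the help.
@eric_lagergren Actually, the alignment is there (use unsafe.Offsetof to verify). It's just that binary.Read() requires you to spell out the alignment explicitly so that your code is portable.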
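
The last comment can be checked with a short sketch like the one below. It compares the in-memory layout of the original, unpadded struct with the size encoding/binary uses for it. The helper name layout is made up, and the in-memory numbers are chosen by the compiler, so treat them as platform dependent (on common 64-bit targets I would expect B at offset 4 and a total size of 40).

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unsafe"
)

// foo is the original struct from the question, without explicit padding.
type foo struct {
	A int16
	B int32
	C [32]byte
}

// layout is a hypothetical helper that returns the in-memory offset of B,
// the in-memory size of foo, and the number of bytes encoding/binary
// reads or writes for a foo value.
func layout() (offsetB, memSize uintptr, wireSize int) {
	var f foo
	return unsafe.Offsetof(f.B), unsafe.Sizeof(f), binary.Size(f)
}

func main() {
	offsetB, memSize, wireSize := layout()
	fmt.Println("offset of B in memory:", offsetB) // compiler and platform dependent
	fmt.Println("size in memory:", memSize)        // compiler and platform dependent
	fmt.Println("size for encoding/binary:", wireSize) // 38: 2 + 4 + 32, no padding
}
```

The gap between the in-memory size and the 38 bytes that encoding/binary consumes is the padding that has to be declared by hand when the file was written from a C struct.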

【Solution 2】:

given:

0000000: 0700 0000 a91f 0000 7074 732f 3300 0000 ........pts/3...

the fields, per the struct, are:
0700h that will be the short int field, little endian format =  7

0000a91fh that will be the  int field, little endian format = 531169280 (the big number)
...

your struct needs a second short field to absorb the 0000h
then 
0700h = 7
0000h = 0 in new field
a91f0000 = 8105
....

which indicates (amongst other things) that the struct is missing 
the expected 2 byte padding between the short and the int fields
does the C code have #pragma pack?

【Comments】: