Why is read() returning an incorrect byte count? [duplicate]

【Question title】: Why is the read() returning an incorrect byte count? [duplicate]
【Posted】: 2018-07-12 15:43:46
【Question】:

Suppose the application read-data has the following code, which simply reads data from stdin into a heap-allocated buffer buf:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

const size_t BUF_SIZE=1048576*256; // Just for testing, don't do this in prod code
const size_t MAX_READ_SIZE=1048576;

int main(int argc, char *argv[])
{
    // Allocate buffer space on the heap
    char *buf=(char *) malloc(BUF_SIZE);

    // Check for malloc failure
    if (buf==NULL)
    {
        fprintf(stderr,"Unable to allocate %zu bytes\n",BUF_SIZE);
        return 1;
    }

    size_t curOffset=0;

    // Read MAX_READ_SIZE (or smaller) blocks until EOF
    // WARNING: Don't do this in actual "live" code, since it can result
    // in a buffer overflow with an input whose size
    // exceeds that of the pre-allocated buffer
    while (ssize_t numRead=read(STDIN_FILENO,buf+curOffset,MAX_READ_SIZE)>0)
    {
        fprintf(stderr,"Bytes read: %zd\n",numRead);
        curOffset+=numRead;
    }

    // Free buffer space
    free(buf);
    fprintf(stderr,"Total bytes read: %zu\n",curOffset);
}

Test:

$ cat | ./read-data
a
Bytes read: 1
b
Bytes read: 1
c
Bytes read: 1
d
Bytes read: 1
Total bytes read: 4

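Where did all the newlines and their bytes go? Each input line should be read as two bytes, for a total of 8 bytes.

For comparison, the same test using basic Unix tools: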

$ cat | printf 'Total bytes read: %u\n' "$(wc --bytes)"
a
b
c
d
Total bytes read: 8

Even stranger, given a file four-lines.txt, the behavior gets crazier still:

$ cat four-lines.txt
a
b
c
d
$ wc --bytes four-lines.txt
8 four-lines.txt
$ <four-lines.txt ./read-data
Bytes read: 1
Total bytes read: 1

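The error must be obvious, but all I can say is: WTF?

Update: As Andrew pointed out, the bug comes from a wrong assumption about operator precedence in this line: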

    while (ssize_t numRead=read(STDIN_FILENO,buf+curOffset,MAX_READ_SIZE)>0)

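Is there a way to change the line so that the definition can stay inside the while condition, or is it necessary to define numRead before the while?

Update 2: The fix is obvious. Thanks to WhozCraig for the answer, which limits the scope of the variable definition to the loop: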

for (ssize_t numRead=0;
     (numRead=read(STDIN_FILENO,buf+curOffset,MAX_READ_SIZE))>0;
    )
...

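【Comments】:

This question demonstrates why doing assignments inside a conditional clause is a really bad idea...
@Andrew, let me guess: precedence?
Hint: numRead will only ever be 0 or 1 in the code as currently posted.
for (ssize_t numRead; (numRead = read(STDIN_FILENO, buf + curOffset, MAX_READ_SIZE))>0;), and as a bonus, it will compile with a C compiler
The "error" is that C has no such syntax as while (ssize_t numRead=read(...)) at all. Yet you tagged your question [C]. Why is it tagged [C], and how did you compile it?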

Tags: c linux gcc

【Solution 1】:

You aren't scoping the assignment tightly enough:

while (ssize_t numRead=read(STDIN_FILENO,buf+curOffset,MAX_READ_SIZE)>0)

This assigns the result of the comparison, i.e. 0 or 1, to numRead, because > binds more tightly than =.
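This also accounts for both outputs in the question. A minimal sketch, assuming the terminal delivers one 2-byte line per read() call while the 8-byte file arrives in a single call (typical, but not guaranteed): the hypothetical helpers buggy_total and fixed_total below replay a list of values read() might have returned and add them up the way each version of the loop would.

```c
#include <stddef.h>

/* Hypothetical helper: replays the buggy loop over values that read()
   might have returned, with 0 marking EOF. */
size_t buggy_total(const long *returns, size_t count)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        /* Parsed as n = (returns[i] > 0), so n is only ever 0 or 1. */
        long n = returns[i] > 0;
        if (!n)
            break;
        total += (size_t)n;
    }
    return total;
}

/* Same replay with the assignment parenthesized before the comparison. */
size_t fixed_total(const long *returns, size_t count)
{
    size_t total = 0;
    long n = 0;
    for (size_t i = 0; i < count && (n = returns[i]) > 0; i++)
        total += (size_t)n;
    return total;
}
```

Under those assumptions, the terminal session corresponds to the sequence {2, 2, 2, 2, 0}, for which buggy_total gives 4 and fixed_total gives 8. The file case corresponds to {8, 0}, for which buggy_total gives 1 and fixed_total gives 8.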

What you want to assign is the result of read:

ssize_t numRead;
while ((numRead=read(STDIN_FILENO,buf+curOffset,MAX_READ_SIZE)) > 0)
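If you would rather avoid an assignment inside the condition altogether, the loop can be written with the read() call in the body. The following is only a sketch, not part of the original answer: read_all and its parameters are hypothetical names. It also caps each request at the space left in the buffer, which addresses the overflow warning in the question's code, and retries when read() is interrupted by a signal.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: reads from fd into buf until EOF, an error, or a
   full buffer. Returns the total byte count, or -1 on a read() error. */
ssize_t read_all(int fd, char *buf, size_t capacity, size_t chunk)
{
    size_t offset = 0;

    while (offset < capacity) {
        /* Never ask for more than the space that is left. */
        size_t want = capacity - offset;
        if (want > chunk)
            want = chunk;

        ssize_t n = read(fd, buf + offset, want);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: retry */
            return -1;      /* real error; errno is left set */
        }
        if (n == 0)
            break;          /* EOF */
        offset += (size_t)n;
    }
    return (ssize_t)offset;
}
```

With the question's constants, the call would look like read_all(STDIN_FILENO, buf, BUF_SIZE, MAX_READ_SIZE). Input larger than the buffer is simply left unread rather than overflowing it.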

【Comments】: