【Question title】: std::getline partially reads first and last line and sets eof-bit
【Posted】: 2015-07-18 01:09:22
【Question】:

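I need to read CSV files in C++: the first line of each file holds the column headers, and the remaining lines hold floating-point data (examples below; the files have been shortened).

Some files cause problems. I am using the following code: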

#include <iostream>
#include <fstream>
#include <string>

// Compiled and tested with Clang++ on Ubuntu 14.04
int main(int argc, char** argv) {
    std::ifstream in;
    in.open(argv[1]);

    if(!in.is_open()) {
        std::cerr << "Cannot open file: " << argv[1] << "\n";
        return 1;
    }

    std::string buff;
    std::getline(in, buff);
    while(!in.eof()) {
        std::cout << buff << "\n";
        getline(in, buff);
    }

    in.close();
    return 0;
}

For most files this works fine, reading one line per iteration. An example of a "good" file:

Time,Smile,AU04,AU02,AU15,Trackerfail,AU18,AU09,negAU12,AU10,Expressive,Unilateral_LAU12,Unilateral_RAU12,AU14,Unilateral_LAU14,Unilateral_RAU14,AU05,AU17,AU26,Forward,Backward
0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,33.333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,20.0
0.3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,33.333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,33.333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,16.667,0.0
58.3,50.0,0.0,0.0,0.0,33.333,0.0,0.0,0.0,0.0,100.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
62.4,33.333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,100.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,20.0

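Some files go haywire and have the eof bit set right after the first getline. After that first read, buff contains part of the first line and part of the last line. An example of a "bad" file: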

Time,Smile,AU04,AU02,AU15,Trackerfail,AU18,AU09,negAU12,AU10,Occlusion,Expressive,Unilateral_LAU12,Unilateral_RAU12,AU14,Unilateral_LAU14,Unilateral_RAU14,AU05,Au17,AU57,AU58
0,0,0,0,0,16.667,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0.3,0,0,0,0,33.333,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1.3,0,0,0,0,16.667,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
57.9,66.667,0,0,0,66.667,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
60.3,33.333,0,0,0,66.667,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

Contents of buff after one call to getline:

Time,Smile,AU04,AU02,AU15,Trackerfail,AU18,AU09,negAU12,AU10,Occlusion,Expressive,Unilateral_LAU12,Unilateral_RAU12,AU14,Unilateral_LAU14,Unilateral_RA60.3,33.333,0,0,0,66.667,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

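As you can see, the first line is mixed with the last line. I have no idea what is going wrong. Every line ends with \n, and the file ends with an empty \n.

I guess my question is: why does getline jump to the end of the file and mix the first and last lines together for some files, while other files work fine?

Edit: I need to convert a large data set to a new, more consistent format. The current format is full of inconsistencies (mixing 0 and 0.0, AU17 and Au17). Still, these formatting issues should not affect simply reading the file, right?

Edit 2: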

cat -v -e -t on a good file:

Time,Smile,AU04,AU02,AU15,Trackerfail,AU18,AU09,negAU12,AU10,Expressive,Unilateral_LAU12,Unilateral_RAU12,AU14,Unilateral_LAU14,AU05,AU17,AU26,Forward,Backward^M$
0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,66.667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0^M$
0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,33.333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0^M$
etc...

cat -v -e -t on a bad file:

Time,Smile,AU04,AU02,AU15,Trackerfail,AU18,AU09,negAU12,AU10,Occlusion,Expressive,Unilateral_LAU12,Unilateral_RAU12,AU14,Unilateral_LAU14,Unilateral_RAU14,AU05,Au17,AU57,AU58^M0,0,0,0,0,16.667,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0^M0.3,0,0,0,0,33.333,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0^M1.3,0,0,0,0,16.667,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0^M1.4,0,0,0,0,33.333,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0^M1.8,0,0,0,0,50,0,0,0,0,0,0,0,0,0,0,0,0,0,25,0^M2.8,0,0,0,0,50,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0^M3,0,0,0,0,33.333,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0^M31,0,0,0,0,33.333,0,0,0,0,25,0,0,0,0,0,0,0,0,0,0^M31.1,0,0,0,0,50,0,0,0,0,50,0,0,0,0,0,0,0,0,0,0^M31.2,0,0,0,0,66.667,0,0,0,0,50,0,0,0,0,0,0,0,0,0,0^M31.4,0,0,33.333,0,66.667,0,0,0,0,50,0,0,0,0,0,0,0,0,0,0^M31.5,0,0,33.333,0,66.667,0,0,0,0,50,25,0,0,0,0,0,0,0,0,0^M32,0,0,33.333,0,66.667,0,0,0,0,50,25,0,0,0,0,0,0,0,0,25^M32.1,0,0,33.333,0,83.333,0,0,0,0,50,25,0,0,0,0,0,0,0,0,25^M32.2,0,0,33.333,0,83.333,0,0,0,0,25,25,0,0,0,0,0,0,0,0,25^M32.4,0,0,33.333,0,83.333,0,0,0,0,25,0,0,0,0,0,0,0,0,0,25^M32.7,0,0,33.333,0,83.333,0,0,0,0,0,0,0,0,0,0,0,0,0,0,25^M33,0,0,33.333,0,83.333,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0^M33.5,0,0,0,0,83.333,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0^M33.9,0,0,0,0,66.667,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0^M55,33.333,0,0,0,66.667,0,0,0,0,0,25,0,0,0,0,0,0,0,0,0^M55.2,66.667,0,0,0,66.667,0,0,0,0,0,25,0,0,0,0,0,0,0,0,0^M55.8,100,0,0,0,66.667,0,0,0,0,0,25,0,0,0,0,0,0,0,0,0^M56.8,100,0,0,0,66.667,0,0,0,0,0,25,0,0,0,0,0,0,0,0,25^M57.4,66.667,0,0,0,66.667,0,0,0,0,0,25,0,0,0,0,0,0,0,0,25^M57.8,66.667,0,0,0,66.667,0,0,0,0,0,25,0,0,0,0,0,0,0,0,0^M57.9,66.667,0,0,0,66.667,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0^M60.3,33.333,0,0,0,66.667,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

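That looks quite different. How can I fix it?

【Comments】:

  • Unrelated to your problem, but instead of two std::getline calls, why not simply rely on std::getline (like almost every other stream function) returning the actual stream reference, and do e.g. while (std::getline(...))?
  • As for your problem, are you sure nothing else is wrong with the files? You only check for end-of-file in the loop, not for other errors. Are there no hidden (non-printable) characters in the bad files? Is it always the same files that cause problems?
  • @JoachimPileborg The reason it is written that way in this code snippet is that I wanted to quickly check the different bits: eof, fail, bad. In my application code I do it the way you suggest. Edit: it never reaches the inside of the loop, because the eof bit is set immediately; checking the fail bit instead results in a single iteration. And yes, it is always the same files, which is very curious! I checked for \r\c characters, but there are none in the file.
  • What is the output of cat -v -e -t <your_file>? I suspect @Joachim Pileborg is right.
  • Are you on Windows? If the text contains a CTRL-Z (ASCII 0x1a), that will also act as end-of-file. Have you inspected the file in a hex editor?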

Tags: c++ csv eof getline


【Solution 1】:

The file seems to lack newlines and contains only carriage returns (that is what ^M, i.e. CTRL-M, stands for).
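
To check this from C++ rather than with cat, one option is to count the kinds of line terminators in the raw bytes. The sketch below assumes the file content has already been read into a std::string in binary mode; `LineEndingCounts` and `count_line_endings` are illustrative names, not anything from the post. It also hints at why buff looked scrambled: when a string full of bare carriage returns is printed to a terminal, each '\r' moves the cursor back to the start of the current row, so later records are drawn over earlier ones, which is most likely what produced the apparent mix of header and last row.

```cpp
#include <cstddef>
#include <string>

// Tallies of the three common line-terminator styles.
struct LineEndingCounts {
    std::size_t crlf = 0;     // "\r\n" (Windows)
    std::size_t lone_lf = 0;  // "\n"   (Unix)
    std::size_t lone_cr = 0;  // "\r"   (classic Mac OS)
};

// Scans raw file content and classifies every line terminator it finds.
LineEndingCounts count_line_endings(const std::string& data) {
    LineEndingCounts counts;
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (data[i] == '\r') {
            if (i + 1 < data.size() && data[i + 1] == '\n') {
                ++counts.crlf;
                ++i;  // skip the '\n' that belongs to this pair
            } else {
                ++counts.lone_cr;
            }
        } else if (data[i] == '\n') {
            ++counts.lone_lf;
        }
    }
    return counts;
}
```

A file for which only `lone_cr` is non-zero matches the "bad" cat output above, while the "good" files (lines ending in ^M$) would show up under `crlf`.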

You can fix it by running cat on the file and piping the output through tr to convert the carriage returns into newlines:

$ cat your-file | tr '\r' '\n' > your-file-fixed
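
One caveat: tr maps every '\r' to '\n', so running it on a file that already uses "\r\n" (like the "good" files above) would turn each line break into two newlines. If the conversion is done inside the C++ program instead, a sketch along these lines treats "\r\n" as a single break; `normalize_line_endings` is a hypothetical helper name and it assumes the whole file fits in memory.

```cpp
#include <cstddef>
#include <string>

// Returns a copy of `data` in which "\r\n" and bare "\r" both become "\n".
std::string normalize_line_endings(const std::string& data) {
    std::string out;
    out.reserve(data.size());
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (data[i] == '\r') {
            out.push_back('\n');
            // Collapse a "\r\n" pair into one newline.
            if (i + 1 < data.size() && data[i + 1] == '\n') {
                ++i;
            }
        } else {
            out.push_back(data[i]);
        }
    }
    return out;
}
```

The normalized string can then be wrapped in a std::istringstream and read with the usual std::getline loop.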

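After seeing your comment that the files come from Mac OS, I think they are from an old pre-OS X version, back when the line break on Mac OS was just a carriage return.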

【Comments】:

  • According to Sublime Text, they were made on Mac OS 9. Thanks again for your help!