【Question title】: Add double quotes to the first line of a csv file via command line
【Posted】: 2019-10-05 11:16:17
【Question】:

I have a csv file, and I noticed that the opening quote was not added during export. On Ubuntu, if I type:

head -n 1 file.csv

I get this output:

801","40116","Hazelnut MT -L","Thursday Promo","Large","","5.9000","","801","1.0000","","3.6500","2.2500",".0000","default","","","","","Chatime","02/06/2014","09125a9cfffd4143a00e73e3b62f15f2","CB01","",".0000","5.9000","6.9000",".0000",".0000",".0000",".0000",".0000",".0000","0","","0","0","0","","","","","","","","","Modern Milk Tea","","","0","","","1","0","","","","","","","","0","Hau Chan","","","","","","","","","","0","","","","","","","-1","","","","","","","","","","","","0","00000000420714AA","2014-06-02","1900-01-01","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","",""

Is there a command that can add the missing opening quote?

【Comments】:

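  • Yes, but you could also open the file in a text editor and add the quote manually.
  • It is a 70 GB csv file. I don't think I can open it.
  • Oh, OK... that changes everything.
  • Just to be sure: is the quote missing only in the first line, or in every line? Like oguz ismail, I assume it is missing only in the first line, since you only showed that line.
  • @Socowi Only in the first line. The following lines are quoted correctly.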

Tags: regex bash shell command-line


【Solution 1】:

This should work in every POSIX shell:

printf \" | cat - file.csv > repaired-file.csv

If you are satisfied with the result, you can overwrite the original:

mv repaired-file.csv file.csv
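
With a file this large it may be worth checking the repaired copy before the mv. The following is only a sketch: the function name prepend_quote and the .repaired suffix are made up for illustration, and head -c is assumed to be available (it is in GNU and BSD head, but it is not required by POSIX).

```shell
# Hypothetical helper: prepend a double quote to a file, verify, then replace.
prepend_quote() {
  src=$1
  tmp=$1.repaired   # hypothetical name for the temporary copy

  # Same technique as above: emit one quote, then the whole file.
  printf '"' | cat - "$src" > "$tmp" || return 1

  # The repaired copy must be exactly one byte longer and start with a quote.
  old_size=$(wc -c < "$src")
  new_size=$(wc -c < "$tmp")
  first=$(head -c 1 "$tmp")

  if [ "$new_size" -eq $((old_size + 1)) ] && [ "$first" = '"' ]; then
    mv "$tmp" "$src"
  else
    echo "verification failed; keeping $src, see $tmp" >&2
    return 1
  fi
}
```

Calling prepend_quote file.csv then replaces file.csv only if both checks pass; otherwise the original stays untouched.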

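Since your file is 70 GB, you may want to avoid creating a second file, but that is harder than it looks. There are tools such as sed's in-place option (-i) and the sponge utility from moreutils, but they do not work in place the way you might expect. Both sed -i and sponge either use a temporary file or hold the whole file in memory (which is no longer feasible at 70 GB). An excellent study of truly in-place editing can be found in this blog post. Its conclusion: no standard tool supports truly in-place editing. However, the following perl script should work (already adapted to your needs).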

perl <<'EOF'
  use Tie::File;
  my @a;
  tie @a, 'Tie::File', 'path/to/your/file' or die 'Cannot tie file';
  $a[0] = '"' . $a[0];
EOF
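
The point about sed -i not being truly in place can be checked on a small throwaway file. This is a sketch that assumes GNU sed (the -i syntax used in this answer): it compares the file's inode number before and after the edit. A changed inode means sed wrote a new file and renamed it over the old one.

```shell
# Throwaway test file; mktemp picks the path.
f=$(mktemp)
printf 'a\nb\n' > "$f"

# Inode number before the edit.
before=$(ls -i "$f" | awk '{print $1}')

# GNU sed: "in-place" insertion of a quote at the start of line 1.
sed -i '1s/^/"/' "$f"

# Inode number after the edit.
after=$(ls -i "$f" | awk '{print $1}')

if [ "$before" != "$after" ]; then
  echo "inode changed: sed wrote a new file and renamed it"
else
  echo "inode unchanged"
fi
```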

Benchmark

Out of interest, I ran the commands discussed here and measured their run times.

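The 9.3 GiB input file f was generated with seq 1000000000 > f. Before timing each command, I always regenerated f and cleared the system caches with sync && echo 3 | sudo tee /proc/sys/vm/drop_caches. My system has enough memory to hold the whole file, but I monitored memory usage manually: every command used only a few KB of memory.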

  • printf \" | cat - f > f2; mv f2 f   1m 05s
  • perl … # script from above         1m 32s
  • sed -i '1s/^/"/' f                  25m 57s (also used 100% CPU the whole time)

To my own surprise, the cat command is faster than the perl script. It makes sense, though: the perl script does a lot of seeking (visible with strace), whereas cat just copies.

Summary: if you have enough disk space, use the cat command. If the file is larger than the free disk space left on the system, use the perl script.
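
To decide between the two, the file's size can be compared with the free space on its filesystem. This is a rough sketch: the function name suggest_method is made up for illustration, du -k reports the space the file occupies on disk (which can differ from its apparent size for sparse files), and the parsing of df -P assumes the filesystem name contains no spaces.

```shell
# Hypothetical helper: print "cat" if a second copy fits on disk, else "perl".
suggest_method() {
  file=$1

  # Space the file occupies on disk, in KiB.
  need_kb=$(du -k "$file" | awk '{print $1}')

  # Free space on the filesystem holding the file, in KiB.
  # df -P gives the POSIX output format; column 4 is "Available".
  free_kb=$(df -Pk "$(dirname "$file")" | awk 'NR==2 {print $4}')

  if [ "$free_kb" -gt "$need_kb" ]; then
    echo cat
  else
    echo perl
  fi
}
```

For example, suggest_method file.csv prints cat when the repaired copy would fit next to the original.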

