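【Question title】:Unix: Find and replace consecutive commas with consecutive pipes
【Posted】:2021-04-14 05:14:10
【Question】:

I am converting a double-quoted CSV file to a pipe-delimited txt file on Unix. I used the following sed command to replace "," with | and then strip the remaining leading and trailing double quotes.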

sed -e 's/","/|/g' -e 's/"//g' filenm.csv > filenm.txt

However, the file also contains runs of consecutive commas with no double quotes around them, and those are not replaced:

Col1|col2|col3|col4|col5|col6|col7|col8
Val1|val2|val3,,,,val7|val8

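Now I want to convert each run of consecutive commas into the same number of pipes, because they represent empty fields.

Other fields contain commas inside the field value, and those must not be changed.

I tried the following, but it does not work.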

sed -e 's/,{1,\}/|{1,\}/g' filenm.csv > filenm.txt

Sample csv file, as opened in Notepad:

"ID","Name","DOB","Age","Address","City","State","Country","Phone number"
"123","ABC","12/20/2020","15","No.38,3rd st, RRR NNN, TRT",,,,"9999999999"
"456","DEF","12/20/2020",,,,,"test-country","9999999999"
"465","XYZ",,,"No.38,3rd st, RRR NNN, TRT",,,,"9999999999"

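I hope this helps to reproduce and solve the problem.

Thanks in advance.

【Comments】:

  • Could you post the original file?
  • Sure, I have added the sample file. Hope this helps! Thanks
  • @WiktorStribiżew The perl command works perfectly, but there is one small issue: when a field contains 0, that field is skipped and the remaining contents shift by one field. That is, when the file looks like this: "ID","Name","DOB","Age","Address","City","State","Country","Phone number" "123","ABC","12/20/2020","0","No.38,3rd st, RRR NNN, TRT",,,,"9999999999" the perl command you provided gives this result: ID|Name|DOB|Age|Address|City|State|Country|Phone number 123|ABC|12/20/2020|||No.38,3rd st, RRR NNN, TRT||||9999999999 Please import this sample into Excel for a good reference.
  • Yes, I was just experimenting with it and analyzing the problem...
  • No, what I posted is the Unix output.

Tags: regex linux unix awk sed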


【Solution 1】:

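This might work for you (GNU sed):

sed -E ':a;s/^(("[^",]*",+)*"[^",]*),/\1\n/;ta;y/,\n/|,/' file

Iteratively replace each , that lies between a pair of " with a newline, then translate every remaining , to | and every newline back to ,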
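As a rough illustration of the loop-then-translate idea, the sketch below runs the same sed script on three rows from the question. The trailing `s/"//g` is an addition made here, not part of the answer: the answer's command as written leaves the double quotes in place, and the question asks for them to be removed. The variable names `input` and `result` are only for this example, and GNU sed is assumed.

```shell
#!/bin/sh
# Sample rows copied from the question.
input='"ID","Name","DOB","Age","Address","City","State","Country","Phone number"
"123","ABC","12/20/2020","15","No.38,3rd st, RRR NNN, TRT",,,,"9999999999"
"456","DEF","12/20/2020",,,,,"test-country","9999999999"'

# :a ... ta    loop: each pass hides one in-quote comma as a newline
# y/,\n/|,/    remaining commas become pipes, hidden newlines become commas again
# s/"//g       (added for this sketch) drop the double quotes
result=$(printf '%s\n' "$input" |
  sed -E ':a;s/^(("[^",]*",+)*"[^",]*),/\1\n/;ta;y/,\n/|,/;s/"//g')

printf '%s\n' "$result"
```

The newline works as a temporary placeholder because sed reads one line at a time, so the pattern space cannot already contain one.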

    【Solution 2】:

    You can use perl:

    perl -pe 's/"([^"]*)"|,/defined($1) ? $1 : "|"/ge' filenm.csv > filenm.txt
    

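    Details

    • "([^"]*)"|, - a regex that matches ", then captures zero or more characters other than " into group 1, then matches "; in every other context it matches just a ,
    • defined($1) ? $1 : "|" - the RHS (the replacement): the match is replaced with the group 1 value if group 1 matched, or with | if a , was matched
    • ge - g stands for global (replace all occurrences), e makes Perl evaluate the RHS as a Perl expression.

    See the online test: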

    #!/bin/bash
    s='"ID","Name","DOB","Age","Address","City","State","Country","Phone number"
    "123","ABC","12/20/2020","0","No.38,3rd st, RRR NNN, TRT",,,,"9999999999"'
    perl -pe 's/"([^"]*)"|,/defined($1) ? $1 : "|"/ge' <<< "$s"
    

    Output:

    ID|Name|DOB|Age|Address|City|State|Country|Phone number
    123|ABC|12/20/2020|0|No.38,3rd st, RRR NNN, TRT||||9999999999
    

      【Solution 3】:

      Using awk:

      awk -F \" '{ for(i=1;i<=NF;i++) { if ($i ~ /^[,]{2,}$/) { $i="," } } OFS="\"";gsub("\",\"","\"|\"",$0)}1' sample.csv
      

      Explanation:

      awk -F \" '{  # Set the field delimiter to double quote
                   for(i=1;i<=NF;i++) { 
                     if ($i ~ /^[,]{2,}$/) { 
                        $i="," # Loop through each field and, if it consists only of 2 or more commas, set that field to one comma
                     } 
                   } 
                   OFS="\"";
                   gsub("\",\"","\"|\"",$0) # Substitute "," for "|"
                 }1' sample.csv
      

        【Solution 4】:

        I would use GNU AWK for this in the following way. Let the content of file.txt be

        "ID","Name","DOB","Age","Address","City","State","Country","Phone number"
        "123","ABC","12/20/2020","15","No.38,3rd st, RRR NNN, TRT",,,,"9999999999"
        "456","DEF","12/20/2020",,,,,"test-country","9999999999"
        "465","XYZ",,,"No.38,3rd st, RRR NNN, TRT",,,,"9999999999"
        

        then

        awk 'BEGIN{FS="\"";OFS=""}{for(i=1;i<=NF;i+=2){$i=gensub(/,/,"|","g",$i)};print $0}' file.txt
        

        output

        ID|Name|DOB|Age|Address|City|State|Country|Phone number
        123|ABC|12/20/2020|15|No.38,3rd st, RRR NNN, TRT||||9999999999
        456|DEF|12/20/2020|||||test-country|9999999999
        465|XYZ|||No.38,3rd st, RRR NNN, TRT||||9999999999
        

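        I assume the first and last columns are never empty. I use " as the field separator; then in every odd-numbered field (these fields contain only ,) I change each , to |. Finally, I print the whole modified line.

        (tested in GNU Awk 5.0.1)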
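gensub is a GNU Awk extension. A minimal sketch of the same odd-field idea using only gsub, which POSIX awk provides, might look like this. It is a variant written for illustration, not the answer's own command. The `$1 = $1` assignment is an extra precaution added here so that the record is rebuilt with the empty OFS even on a line where no comma is replaced; `input` and `result` are illustrative names.

```shell
#!/bin/sh
# Sample rows copied from the answer.
input='"ID","Name","DOB","Age","Address","City","State","Country","Phone number"
"123","ABC","12/20/2020","15","No.38,3rd st, RRR NNN, TRT",,,,"9999999999"
"456","DEF","12/20/2020",,,,,"test-country","9999999999"
"465","XYZ",,,"No.38,3rd st, RRR NNN, TRT",,,,"9999999999"'

result=$(printf '%s\n' "$input" |
  awk 'BEGIN { FS = "\""; OFS = "" }
       {
         $1 = $1                        # force $0 to be rebuilt with the empty OFS
         for (i = 1; i <= NF; i += 2)   # odd fields are the text outside the quotes
           gsub(/,/, "|", $i)           # commas outside quotes become pipes
         print
       }')

printf '%s\n' "$result"
```

Splitting on " puts the quoted values in the even-numbered fields and the separators in the odd-numbered ones, so commas inside a value are never touched.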
