【Question title】: How can I convert this format string to CSV?
【Posted】: 2020-03-26 17:45:33
【Question】:

My string is:

AA:: aaaaaaaaaaaaaaaaa
BB:: bbbbbbbbbbbbbbbb
C: ccccccccccccccccc
DD:: DDDDDDDDDDDD
E: EEEEEEEEEEEEE

AA: aaaaaaaaaaaaaaaaa2
BB:: bbbbbbbbbbbbbbbb2
C:: ccccccccccccccccc2
DD: DDDDDDDDDDDD2
E: EEEEEEEEEEEEE

....

I need to get the following format using standard Linux commands (such as awk or ...) or a Perl function:

AA,BB,C,DD,E
aaaaaaaa,bbbbbb,ccccc,dddddd,eeeee
aaaaaaaa2,bbbbbb2,ccccc2,dddddd2,eeeee2

For example:
OUTPUT_STRING | awk ....

perlFunction(OUTPUT_STRING){
.....
return formatted string;
}

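I have searched Google and tried many suggestions from various sites, but none of them worked, so please don't just send me links.

Some fields have a single : and some have a double :: (this is random).

I tried the following, but none of them worked for me:

sed -r 's/\\,|,|CN=|OU*//g' |awk -F "|=|:" '{printf $2"|"}'

or

sed -n '1h; 2,$H;${g;s/\n/,/g;p}' | sed 's/,,/\n/g'

or

awk -F ":" '{printf $2} {if (NF==0) {printf "\n"}}' | sed "s/ //" | sed "s/ /;/g"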

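【Comments】:

  • Welcome to Stack Overflow. What have you tried to solve this?
  • @nibkuz, please put this attempt in your question. Comments are not the place for it; edit your question and add it there.

Tags: linux perl awk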


【Solution 1】:

One of many ways to achieve the desired result:

use strict;
use warnings;

my $file = do { local $/; <DATA> };         # read whole file
my @blocks = split /\n\n/, $file;           # split file into blocks

my $print_header = 1;                       # flag to print header

foreach my $block (@blocks) {               # process each block
    $block =~ s/:+/:/g;                     # clean up the block :: -> :

    my @lines = split /\n/, $block;         # split the block into lines
    my(@header,@data);                      # arrays to store header and data

    foreach my $line (@lines) {             # process each line
        my($h,$d) = split /:\s*/, $line, 2; # split at the first colon only: header, data
        push @header, $h;                   # add header names into array
        push @data, $d;                     # add data into array
    }

    if( $print_header ){                    # if header not printed yet
        print join(',', @header) . "\n";    # print header array
        $print_header = 0;                  # flag the header is printed 
    }

    print join(',', @data)   . "\n";        # print data array
}

__DATA__
AA:: aaaaaaaaaaaaaaaaa
BB:: bbbbbbbbbbbbbbbb
C: ccccccccccccccccc
DD:: DDDDDDDDDDDD
E: EEEEEEEEEEEEE

AA: aaaaaaaaaaaaaaaaa2
BB:: bbbbbbbbbbbbbbbb2
C:: ccccccccccccccccc2
DD: DDDDDDDDDDDD2
E: EEEEEEEEEEEEE2

Output:

AA,BB,C,DD,E
aaaaaaaaaaaaaaaaa,bbbbbbbbbbbbbbbb,ccccccccccccccccc,DDDDDDDDDDDD,EEEEEEEEEEEEE
aaaaaaaaaaaaaaaaa2,bbbbbbbbbbbbbbbb2,ccccccccccccccccc2,DDDDDDDDDDDD2,EEEEEEEEEEEEE2

    【Solution 2】:

    This GNU awk command should do it:

    awk -v RS='' -F':* ?|\n' 'NR==1{print $1","$3","$5","$7","$9} {print $2","$4","$6","$8","$10}' t
    AA,BB,C,DD,E
    aaaaaaaaaaaaaaaaa,bbbbbbbbbbbbbbbb,ccccccccccccccccc,DDDDDDDDDDDD,EEEEEEEEEEEEE
    aaaaaaaaaaaaaaaaa2,bbbbbbbbbbbbbbbb2,ccccccccccccccccc2,DDDDDDDDDDDD2,EEEEEEEEEEEEE
    
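    • RS='' sets the record separator to the empty string, so awk works in paragraph mode: each block separated by blank lines is one record.
    • -F':* ?|\n' sets the field separator to a run of colons with an optional trailing space (covering both : and ::), or a newline.
    • NR==1{print $1","$3","$5","$7","$9} prints the header from the first record (the odd-numbered fields).
    • {print $2","$4","$6","$8","$10} prints the data fields (the even-numbered fields) of every record.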
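
To see why the header ends up in the odd-numbered fields and the data in the even-numbered ones, the splitting can be inspected directly. This is a minimal sketch, not part of the original answer: `show_fields` is a hypothetical helper name, the sample input is made up, and the separator is written as ':+ *|\n' (at least one colon), an assumed stricter variant of the regex used above.

```shell
# show_fields: hypothetical helper that prints every field of every record
# as "record.field=value", using awk paragraph mode (RS='').
show_fields() {
  awk -v RS='' -F':+ *|\n' '{
    for (i = 1; i <= NF; i++)
      printf "%d.%d=%s\n", NR, i, $i
  }'
}

# Two blank-line-separated blocks with mixed ":" and "::" separators.
printf 'AA:: a\nBB: b\n\nAA: a2\nBB:: b2\n' | show_fields
# prints:
# 1.1=AA
# 1.2=a
# 1.3=BB
# 1.4=b
# 2.1=AA
# 2.2=a2
# 2.3=BB
# 2.4=b2
```

Each record alternates key, value, key, value, which is what the fixed field numbers in the one-liner rely on.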

    A more generic solution that should work with more fields:

    awk -v RS='' -F':* ?|\n' 'NR==1{for(i=1;i<=NF-2;i+=2) printf "%s,",$i;print $i} {for(i=2;i<=NF-2;i+=2) printf "%s,",$i;print $i}' file
    AA,BB,C,DD,E
    aaaaaaaaaaaaaaaaa,bbbbbbbbbbbbbbbb,ccccccccccccccccc,DDDDDDDDDDDD,EEEEEEEEEEEEE
    aaaaaaaaaaaaaaaaa2,bbbbbbbbbbbbbbbb2,ccccccccccccccccc2,DDDDDDDDDDDD2,EEEEEEEEEEEEE
    

    PS: If not all records contain all of the IDs, it is a different story.
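
For that case, one possible approach is to stop relying on field positions and key each value by its ID instead. The following is a rough sketch under stated assumptions, not the answerer's method: `to_csv` is a hypothetical helper name, the columns are emitted in the order in which each ID first appears, missing values become empty cells, and values are assumed not to contain commas or quotes (no CSV quoting is done).

```shell
# to_csv: hypothetical helper. Reads blank-line-separated "KEY: value" or
# "KEY:: value" blocks on stdin and writes CSV with one row per block.
to_csv() {
  awk -v RS='' -F'\n' '
    {
      for (i = 1; i <= NF; i++) {
        # locate the first run of colons plus any following spaces
        pos = match($i, /:+ */)
        if (pos == 0) continue              # skip lines without a colon
        key = substr($i, 1, pos - 1)
        val = substr($i, pos + RLENGTH)
        if (!(key in seen)) {               # remember first-seen column order
          seen[key] = 1
          order[++n] = key
        }
        cell[NR, key] = val
      }
    }
    END {
      for (k = 1; k <= n; k++)
        printf "%s%s", order[k], (k < n ? "," : "\n")
      for (r = 1; r <= NR; r++)
        for (k = 1; k <= n; k++)
          printf "%s%s", cell[r, order[k]], (k < n ? "," : "\n")
    }'
}

# The second block lacks BB and the first lacks C.
printf 'AA:: a1\nBB: b1\n\nAA: a2\nC:: c2\n' | to_csv
# prints:
# AA,BB,C
# a1,b1,
# a2,,c2
```

Because the header is only known once all blocks have been read, everything is buffered and printed in the END block, so this suits inputs that fit in memory.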

      【Solution 3】:

      Use Text::CSV to handle edge cases:

      use strict;
      use warnings;
      use Text::CSV 'csv';
      
      my $input = do { local $/; readline }; # input from STDIN or filename argument
      
      my @aoh;
      my %headers;
      foreach my $block (split /\n\n+/, $input) {
        my %row;
        foreach my $line (split /^/, $block) {
          if ($line =~ m/^([^:]+):+\s*(.*)$/) {
            $row{$1} = $2;
            $headers{$1} = 1;
          }
        }
        push @aoh, \%row;
      }
      
      csv(in => \@aoh, out => *STDOUT, headers => [sort keys %headers],
        encoding => 'UTF-8', auto_diag => 2);
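
One edge case a CSV module takes care of is a value that itself contains a comma or a double quote: it has to be wrapped in double quotes, with embedded quotes doubled. As a rough illustration of that rule only (this is not how Text::CSV is implemented, and `csv_quote` is a hypothetical helper name), the quoting of a single value per line could be sketched in awk as:

```shell
# csv_quote: hypothetical helper. Reads one value per line and prints it
# as a CSV field, quoting only when the value contains a comma or a quote.
# Values with embedded newlines are out of scope for this line-based sketch.
csv_quote() {
  awk '{
    s = $0
    if (s ~ /[",]/) {
      gsub(/"/, "\"\"", s)      # double every embedded quote
      s = "\"" s "\""           # wrap the whole value in quotes
    }
    print s
  }'
}

printf 'plain\na,b\nsay "hi"\n' | csv_quote
# prints:
# plain
# "a,b"
# "say ""hi"""
```

None of the awk one-liners in this thread apply this quoting, so for data that may contain commas or quotes a CSV-aware module is the safer choice.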
      
