【Question title】: awk on a text file to csv
【Posted】: 2018-04-09 06:16:42
【Question description】:

I have a large text file that I need to convert to a CSV file so that I can import it into a MySQL database.

The text file looks like this:

Original text file

VL;1;1001;Productname 1;Description 1;2;MTR;METER;217883;10000;20180402;1;010206;&10;PRODUCER1;;N;10000;;
VA;2;4044773815245;V;
VA;3;0036453;V;
VL;1;1002;Productname 2;This is product decrtiption for 2 product;2;MTR;METER;140365;10000;20180402;1;010206;&10;PRODUCER1;;N;10000;;
VX;WEIGHT;7500
VX;VOLUME;3249
VX;DIMENSJON;57x57x1000
VA;2;4044773452884;V;
VA;3;0036479;V;
VL;1;1003;Productname 3;Description......;2;MTR;METER;1575;10000;20171006;1;010606;&10;PRODUCER1;;N;10000;;
VX;PDF;1003.pdf
VX;IMAGE;1003.png
VX;BASEINFO;http://127.0.0.1/1003/
VX;WEIGHT;20
VX;DIMENSJON;0x7x0
VX;UNSPSC;26121616
VA;2;7070613017149;V;
VA;3;1000116;V;

Desired result

I need to convert it into a CSV file that looks like this:

type;   Productnumber;  Productname;    Description;        measurement_unit;   price_unit; price_unit_txt; price;  crowd;  price_date; status; block_number;   discount_group; manufac;    type;   stocked;    sales_package;  discount;   price_type; PDF;        IMAGE;      baseinfo;               WEIGHT; VOLUME; dimensjon;  UNSPSC;     va_2;           va_3;
1;      1001;           Productname 1;  Description 1;      2;                  MTR;        METER;          217883; 10000;  20180402;   1;      010206;         &10;            PRODUCER1;  ;       N;          10000;          ;           ;           ;           ;           ;                       ;       ;       ;           ;           4044773815245;  0036453;
1;      1002;           Productname 2;  Description 2;      2;                  MTR;        METER;          140365; 10000;  20180402;   1;      010206;         &10;            PRODUCER2;  ;       N;          10000;          ;           ;           ;           ;           ;                       7500;   3249;   57x57x1000; ;           4044773452884;  0036479;
1;      1003;           Productname 3;  Description ABC 3;  2;                  MTR;        METER;          1575;   10000;  20171006;   1;      010606;         &10;            PRODUCER3;  ;       N;          10000;          ;           ;           1003.pdf;   1003.png;   http://127.0.0.1/1003/; 20;     ;       0x7x0;      26121616;   7070613017149;  1000116;        

Description of the original file

The first line of each product always starts with VL, followed by these fields in order:

type;Productnumber;Productname;Description;measurement_unit;price_unit;price_unit_txt;price;crowd;price_date;status;block_number;discount_group;manufac;type;stocked;sales_package;discount;price_type;

PDF         is always on a new line starting with VX;PDF;
IMAGE       is always on a new line starting with VX;IMAGE;
baseinfo    is always on a new line starting with VX;BASEINFO;
WEIGHT      is always on a new line starting with VX;WEIGHT;
VOLUME      is always on a new line starting with VX;VOLUME;
dimensjon   is always on a new line starting with VX;DIMENSJON;
UNSPSC      is always on a new line starting with VX;UNSPSC;
va_2        is always on a new line starting with VA;2;
va_3        is always on a new line starting with VA;3;

I hope someone can help me with this :)

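【Question discussion】:

  • Unclear; please provide more details in your post.
  • Based on your sample, could you show the expected result?
  • Hi, the expected result is given under "I need to convert it into a CSV file that looks like this:".
  • Please show us what you have tried so far and explain your problem. StackOverflow is not a "write my code for free" service.
  • You need to create a "state machine" in awk that consumes lines and only prints when it encounters the next VL line. That is the only way to turn irregular data into a rectangular format.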

Tags: bash csv awk sed


【Solution 1】:

One possible approach (not the only solution)

#!/bin/bash

awk -F';' '
    # Print the block collected so far, then reset every column to an empty field.
    function init() {
            # assemble the output line from the collected columns
        line = vl pdf image baseinfo weight volume dimensjon unspsc va_2 va_3
            # strip carriage returns (^M, \r) left by Windows line endings
        gsub( /\r/, "", line )
            # print the block
        print line
            # reset the variables
        vl = pdf = image = baseinfo = weight = volume = dimensjon = unspsc = va_2 = va_3 = ";"
    }
        # header row; "%12s" pads the first column to a width of 12 characters.
        # No newline is printed here: the first init() call prints an empty line, which ends the header.
    BEGIN { printf ( "%12s; %s; %s; %s; %s; %s; %s; %s; %s; %s;","vl","pdf","image","baseinfo ","weight","volume","dimensjon","unspsc","va_2","va_3" ) }
    /^VL/ { init(); vl = sprintf( "%12s; %s; %s; %s; ", $3, $4, $5, $6 ) }
    /^VX;WEIGHT;/ { weight = sprintf( "%s; ", $3 )}
    # ... add the remaining conditions (other VX lines and the VA lines) in the same way
    END { init() }
' file.dat  # > outputfile.csv

For testing:

cat << end > file.dat
VL;1;1001;Productname 1;Description 1;2;MTR;METER;217883;10000;20180402;1;010206;&10;PRODUCER1;;N;10000;;
VA;2;4044773815245;V;
VA;3;0036453;V;
VL;1;1002;Productname 2;This is product decrtiption for 2 product;2;MTR;METER;140365;10000;20180402;1;010206;&10;PRODUCER1;;N;10000;;
VX;WEIGHT;7500
VX;VOLUME;3249
VX;DIMENSJON;57x57x1000
VA;2;4044773452884;V;
VA;3;0036479;V;
VL;1;1003;Productname 3;Description......;2;MTR;METER;1575;10000;20171006;1;010606;&10;PRODUCER1;;N;10000;;
VX;PDF;1003.pdf
VX;IMAGE;1003.png
VX;BASEINFO;http://127.0.0.1/1003/
VX;WEIGHT;20
VX;DIMENSJON;0x7x0
VX;UNSPSC;26121616
VA;2;7070613017149;V;
VA;3;1000116;V;
end

Output

      vl; pdf; image; baseinfo ; weight; volume; dimensjon; unspsc; va_2; va_3;
    1001; Productname 1; Description 1; 2; ;;;;;;;;;
    1002; Productname 2; This is product decrtiption for 2 product; 2; ;;;7500; ;;;;;
    1003; Productname 3; Description......; 2; ;;;20; ;;;;;
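
The answer only fills in the WEIGHT condition and four of the VL fields. As a rough sketch of how the remaining columns might be handled, the script below buffers one product at a time and writes a row whenever the next VL line (or the end of the file) is reached. It assumes that the 19 VL columns are fields 2 through 20, that every VX value sits in the third field and contains no semicolon, and that the header names can be taken from the question as-is. The names `convert` and `flush` are made up for this illustration.

```shell
#!/bin/bash
# Sketch only: convert and flush are illustrative names, not part of the original answer.
convert() {
    awk -F';' -v OFS=';' '
        # Write the buffered product as one row, then clear the buffers.
        function flush(    i, row) {
            if (!have) return                 # no VL line buffered yet
            row = vl
            for (i = 1; i <= n; i++) row = row OFS vx[keys[i]]
            print row OFS va[2] OFS va[3]
            split("", vx); split("", va)      # portable way to empty an array
            have = 0
        }
        BEGIN {
            # VX keys in the column order requested in the question
            n = split("PDF;IMAGE;BASEINFO;WEIGHT;VOLUME;DIMENSJON;UNSPSC", keys, ";")
            print "type;Productnumber;Productname;Description;measurement_unit;price_unit;price_unit_txt;price;crowd;price_date;status;block_number;discount_group;manufac;type;stocked;sales_package;discount;price_type;PDF;IMAGE;baseinfo;WEIGHT;VOLUME;dimensjon;UNSPSC;va_2;va_3"
        }
        { sub(/\r$/, "") }                    # drop a trailing carriage return, if any
        $1 == "VL" {
            flush()                           # a new product starts: emit the previous one
            vl = $2
            for (i = 3; i <= 20; i++) vl = vl OFS $i
            have = 1
            next
        }
        $1 == "VX" { vx[$2] = $3; next }      # e.g. VX;WEIGHT;7500
        $1 == "VA" { va[$2] = $3; next }      # e.g. VA;2;4044773815245;V;
        END { flush() }                       # emit the last product
    ' "$@"
}
```

With the test file above this could be run as `convert file.dat > outputfile.csv`. Unlike the answer's output, the rows carry no padding spaces, which is usually easier to import. The header repeats the column name `type` exactly as the question lists it, so one of the two would probably need renaming before a MySQL import.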

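【Discussion】:

  • Thank you kyodev, this was very helpful :)
  • You're welcome. Perhaps you could close this question?
  • Yes, I'll close it :) One more question: I get a "^M" after the PDF file name. Do you know what causes this? Example: /eib_bus.pdf^M;
  • Thanks, it just needed the gsub changed to: gsub( /\r/,"",line )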
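
The "^M" in the last two comments is a carriage return, which usually means the input file has Windows-style CRLF line endings. awk's `gsub` takes its arguments separated by commas (regex, replacement, target), which is why the comma form works. A small sketch of an alternative, assuming the carriage return only ever appears at the end of a line, is to strip it from each record before any field is read. The function name `strip_cr` is made up for this example.

```shell
#!/bin/bash
# Sketch only: strip_cr is an illustrative name.
# Removes one trailing carriage return from every input line.
strip_cr() {
    awk '{ sub(/\r$/, ""); print }' "$@"
}
```

It could be placed in front of the conversion, for example `strip_cr file.dat > file.unix.dat`, so that the last field of each line no longer ends in `\r`.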