【问题标题】:I want to read a file and store some variables with AWK我想读取文件并使用 AWK 存储一些变量
【发布时间】:2020-12-30 22:42:00
【问题描述】:

我有一个包含以下内容的文件。它是在设备中查询的结果,因此预计在数据库中找不到某些输入。以下示例是成功和不成功查询的结果。我的意思是第二个示例没有我想要捕获到变量中的所有信息,所以我想忽略这个结果并将变量设置为 null/空值。

<INTLPO:ISV=PORTAB NTL="6130290095" VEM=NAO;
VECTURA - SS            BSA002             2020-09-12            09-32
INTLPO:ISV=PORTAB NTL="6130290095" VEM=NAO;
INTERROGACAO DE NUMERO TELEFONICO PARA PORTABILIDADE NUMERICA                   

  TIPO DE ENCAMINHAMENTO POR ASSINANTE
  NTL = 6130290095           OPC = S_INF    RNP = 551      CSP = 25
  EIP = S_INF
  CDO = 00961
  CNL = 61000                NUF = S_INF                   TPB = PREST
  CPT = NAO                  CRE = 125      NUE = S_INF
  DAT = 2014-04-16           HOR = 10:30:20.798609
  TBR = 25
  RST              MAN      RST              MAN      RST              MAN
  2%               934      3%               934      4%               934
  5%               934      6%               934      7%               934
  8%               934      9%               934      9090%            934
  0??%             934      90??%            934      0?0%             934


  TOTAL DE NUMEROS ASSOCIADOS AO SERVICO: 1
<INTLPO:ISV=PORTAB NTL="6160150178" VEM=NAO;
VECTURA - SS            BSA002             2020-09-12            09-32
INTLPO:ISV=PORTAB NTL="6160150178" VEM=NAO;
INTERROGACAO DE NUMERO TELEFONICO PARA PORTABILIDADE NUMERICA                   

  ME:  NENHUM NUMERO CADASTRADO ATENDE AS ESPECIFICACOES

我有以下部分可以的代码。结果有点混乱(行重复,甚至值错误)。

awk -F ' ' 'BEGIN { OFS="," }
            /^VECTURA/ { equipment = $4; data = $5 }
            /^INTLPO/ { numero = $2}
            /^\s*NTL/ { ntl = $3 ; opc = $6; rnp = $9; csp = $12}
            /^\s*EIP/ { eip = $3}
            /^\s*CDO/ { cdo = $3}
            /^\s*CNL/ { cnl = $3; nuf = $6; tpb = $9}
            /^\s*CPT/ { cpt = $3; cre = $6; nue = $9}
            /^\s*DAT/ { dat = $3; hor = $6}
            /^\s*TBR/ { tbr = $3}
            /^\s*RST/ { man = $2; next}
            { print data, equipment, numero, ntl, opc, rnp, csp, eip, cdo, cnl, nuf, tpb, cpt, cre, nue, dat, hor, tbr, man}' input.tx >> output.txt

结果

2020-09-12,BSA002,6160150536,,,,,,,,,,,,,,,,
2020-09-12,BSA002,6160150536,,,,,,,,,,,,,,,,
2020-09-12,BSA002,6130290095,,,,,,,,,,,,,,,,
2020-09-12,BSA002,6130290095,,,,,,,,,,,,,,,,
2020-09-12,BSA002,6130290095,,,,,,,,,,,,,,,,
2020-09-12,BSA002,6130290095,,,,,,,,,,,,,,,,
2020-09-12,BSA002,6130290095,6130290095,S_INF,551,25,,,,,,,,,,,,
2020-09-12,BSA002,6130290095,6130290095,S_INF,551,25,S_INF,,,,,,,,,,,
2020-09-12,BSA002,6130290095,6130290095,S_INF,551,25,S_INF,00961,,,,,,,,,,
2020-09-12,BSA002,6130290095,6130290095,S_INF,551,25,S_INF,00961,61000,S_INF,PREST,,,,,,,
2020-09-12,BSA002,6130290095,6130290095,S_INF,551,25,S_INF,00961,61000,S_INF,PREST,NAO,125,S_INF,,,,
2020-09-12,BSA002,6130290095,6130290095,S_INF,551,25,S_INF,00961,61000,S_INF,PREST,NAO,125,S_INF,2014-04-16,10:30:20.798609,,
2020-09-12,BSA002,6130290095,6130290095,S_INF,551,25,S_INF,00961,61000,S_INF,PREST,NAO,125,S_INF,2014-04-16,10:30:20.798609,25,
2020-09-12,BSA002,6130290095,6130290095,S_INF,551,25,S_INF,00961,61000,S_INF,PREST,NAO,125,S_INF,2014-04-16,10:30:20.798609,25,MAN
2020-09-12,BSA002,6130290095,6130290095,S_INF,551,25,S_INF,00961,61000,S_INF,PREST,NAO,125,S_INF,2014-04-16,10:30:20.798609,25,MAN
2020-09-12,BSA002,6130290095,6130290095,S_INF,551,25,S_INF,00961,61000,S_INF,PREST,NAO,125,S_INF,2014-04-16,10:30:20.798609,25,MAN
2020-09-12,BSA002,6130290095,6130290095,S_INF,551,25,S_INF,00961,61000,S_INF,PREST,NAO,125,S_INF,2014-04-16,10:30:20.798609,25,MAN
2020-09-12,BSA002,6130290095,6130290095,S_INF,551,25,S_INF,00961,61000,S_INF,PREST,NAO,125,S_INF,2014-04-16,10:30:20.798609,25,MAN
2020-09-12,BSA002,6130290095,6130290095,S_INF,551,25,S_INF,00961,61000,S_INF,PREST,NAO,125,S_INF,2014-04-16,10:30:20.798609,25,MAN
2020-09-12,BSA002,6130290095,6130290095,S_INF,551,25,S_INF,00961,61000,S_INF,PREST,NAO,125,S_INF,2014-04-16,10:30:20.798609,25,MAN
2020-09-12,BSA002,6130290095,6130290095,S_INF,551,25,S_INF,00961,61000,S_INF,PREST,NAO,125,S_INF,2014-04-16,10:30:20.798609,25,MAN
2020-09-12,BSA002,6130290095,6130290095,S_INF,551,25,S_INF,00961,61000,S_INF,PREST,NAO,125,S_INF,2014-04-16,10:30:20.798609,25,MAN
2020-09-12,BSA002,6130290095,6130290095,S_INF,551,25,S_INF,00961,61000,S_INF,PREST,NAO,125,S_INF,2014-04-16,10:30:20.798609,25,MAN
2020-09-12,BSA002,6130290095,6130290095,S_INF,551,25,S_INF,00961,61000,S_INF,PREST,NAO,125,S_INF,2014-04-16,10:30:20.798609,25,MAN
2020-09-12,BSA002,6130290095,6130290095,S_INF,551,25,S_INF,00961,61000,S_INF,PREST,NAO,125,S_INF,2014-04-16,10:30:20.798609,25,MAN
2020-09-12,BSA002,6160150178,6130290095,S_INF,551,25,S_INF,00961,61000,S_INF,PREST,NAO,125,S_INF,2014-04-16,10:30:20.798609,25,MAN
2020-09-12,BSA002,6160150178,6130290095,S_INF,551,25,S_INF,00961,61000,S_INF,PREST,NAO,125,S_INF,2014-04-16,10:30:20.798609,25,MAN
2020-09-12,BSA002,6160150178,6130290095,S_INF,551,25,S_INF,00961,61000,S_INF,PREST,NAO,125,S_INF,2014-04-16,10:30:20.798609,25,MAN
2020-09-12,BSA002,6160150178,6130290095,S_INF,551,25,S_INF,00961,61000,S_INF,PREST,NAO,125,S_INF,2014-04-16,10:30:20.798609,25,MAN
2020-09-12,BSA002,6160150178,6130290095,S_INF,551,25,S_INF,00961,61000,S_INF,PREST,NAO,125,S_INF,2014-04-16,10:30:20.798609,25,MAN
2020-09-12,BSA002,6160150178,6130290095,S_INF,551,25,S_INF,00961,61000,S_INF,PREST,NAO,125,S_INF,2014-04-16,10:30:20.798609,25,MAN
2020-09-12,BSA002,6160150178,6130290095,S_INF,551,25,S_INF,00961,61000,S_INF,PREST,NAO,125,S_INF,2014-04-16,10:30:20.798609,25,MAN

请注意,记录 6130290095(变量 NTL)与“数字”记录(上例的最后几行)错误地关联。

我该如何克服呢?我尝试了一些 AWK 条件语句,但也没有成功。 作为输出,我只想逐行记录,因为输出示例的某些行可以举例说明。 非常感谢。

【问题讨论】:

  • 如果您只想打印一次结果,请使用END { print ...}
  • 请包含您的示例输入文件的预期输出。它是一个文件吗?它的外观如何? (或者一个 bash 数组?)
  • 我不想只打印一次。我只想每条记录打印一次。例如:我有一个包含 50 个查询的列表,其中一些包含我需要的所有信息,而另一些则没有。输出是 50 条“数字”记录的列表,对于那些拥有我想打印的所有信息的人,我想用逗号分隔它们(就像前面提供的示例中的最后一个),而那些没有的人,被打印为"6130300000,BSA2,,,,,,,,,,,,,,,,,,,,,,,......
  • 输入是脚本中显示的文件(input.txt >> output.txt)。再次感谢。
  • 从 RST MAN 行中提取什么?

标签: regex shell awk readfile text-processing


【解决方案1】:

如果您只想更改numero 的值而不设置它,请添加类似numero || 的测试。
阅读您的评论后,我改变了我的解决方案。据我所知,您不希望一条记录将所有块的结果组合在一起,但您希望每个处理的块都有一条结果行。每个新块都以&lt;INTLPO开头。
此解决方案将使新块开始时的所有值都为空(第一个块不需要,但不会造成伤害)。
当找到一个新块以及我们在文件末尾时,会显示一个块的结果。

awk 'function newrecord() {
        recordnumber++;
        data=equipment=numero=ntl=opc=rnp=csp=eip=cdo="";
        cnl=nuf=tpb=cpt=cre=nue=dat=hor=tbr=man="";
     }
     function printrecord() {
         print data, equipment, numero, ntl, opc, rnp, csp, eip,
               cdo, cnl, nuf, tpb, cpt, cre, nue, dat, hor, tbr, man;
     }

     BEGIN { OFS="," }
            /^<INTLPO/ { if (recordnumber) printrecord(); newrecord(); }
            /^VECTURA/ { equipment = $4; data = $5 }
            /^INTLPO/ { numero = $2}
            /^\s*NTL/ { ntl = $3 ; opc = $6; rnp = $9; csp = $12}
            /^\s*EIP/ { eip = $3}
            /^\s*CDO/ { cdo = $3}
            /^\s*CNL/ { cnl = $3; nuf = $6; tpb = $9}
            /^\s*CPT/ { cpt = $3; cre = $6; nue = $9}
            /^\s*DAT/ { dat = $3; hor = $6}
            /^\s*TBR/ { tbr = $3}
            /^\s*RST/ { man = $2; next}
            END { printrecord(); }
      ' input.tx

【讨论】:

  • 您好。谢谢你的回答。我已经尝试过了,但没有奏效。我有一个包含数百个结果的文件,例如问题中的示例。其中一部分没有结果信息(因为这些记录在设备的数据库中不存在)。但我想用所有记录写一个输出。每个输入记录一行包含所有可用信息。当信息块没有所有信息时,我想用他们拥有的最少信息打印它们(例如 2020-09-12,BSA2,6130300020,,,,,,,,,,,,).. .
  • 非常感谢,明白了。您的解决方案优雅而简单,效果很好,无需多言。
【解决方案2】:

解决您有tag = value 对的任何问题的最佳方法是首先填充该映射的数组(下面的tag2val[]),然后您可以通过标签(名称)以任何顺序访问您喜欢的任何值你喜欢。

$ cat tst.awk
BEGIN {
    OFS = ","
    numTags = split("\
                EQUIPMENT \
                DATE \
                NUMERO \
                NTL \
                OPC \
                RNP \
                CSP \
                EIP \
                CDO \
                CNL \
                NUF \
                TPB \
                CPT \
                CRE \
                NUE \
                DAT \
                HOR \
                TBR \
                MAN \
                ",tags)
    for (tagNr=1; tagNr<=numTags; tagNr++) {
        tag = tags[tagNr]
        printf "%s%s", tag, (tagNr<numTags ? OFS : ORS)
    }
}
/^</ && (NR > 1) {
    prt()
    delete tag2val
}
$1 == "VECTURA" {
    tag2val["EQUIPMENT"] = $4
    tag2val["DATE"] = $5
}
/^INTLPO/ {
    gsub(/^[^"]+"|"$/,"",$2)
    tag2val["NUMERO"] = $2
}
/^([[:space:]]*[^[:space:]]+ = [^[:space:]]+)+$/ {
    for (i=1; i<NF; i+=3) {
        tag2val[$i] = $(i+2)
    }
}
nextLineTag != "" {
    tag2val[nextLineTag] = $2
    nextLineTag = ""
}
/^[[:space:]]*RST[[:space:]]+MAN/ {
    nextLineTag = "MAN"
}
END { prt() }

function prt(   tagNr, tag, val) {
    for (tagNr=1; tagNr<=numTags; tagNr++) {
        tag = tags[tagNr]
        val = tag2val[tag]
        printf "%s%s", val, (tagNr<numTags ? OFS : ORS)
    }
}

.

$ awk -f tst.awk file
EQUIPMENT,DATE,NUMERO,NTL,OPC,RNP,CSP,EIP,CDO,CNL,NUF,TPB,CPT,CRE,NUE,DAT,HOR,TBR,MAN
BSA002,2020-09-12,6130290095,6130290095,S_INF,551,25,S_INF,00961,61000,S_INF,PREST,NAO,125,S_INF,2014-04-16,10:30:20.798609,25,934
BSA002,2020-09-12,6160150178,,,,,,,,,,,,,,,,

【讨论】:

    猜你喜欢
    • 2020-12-28
    • 2018-12-06
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 2015-06-03
    相关资源
    最近更新 更多