【Question title】: Bash regex overwrite line if multiple match
【Posted】: 2022-10-17 11:26:59
【Question】:

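I have a bash script containing 3 regular expressions. Using an if condition, I want to find the lines in a file that match the first pattern.

If there is a match, look for matches of the second pattern, but only among the lines that matched the first pattern.

Finally, check the third pattern only against the lines that matched the second pattern (which are also lines that already matched the first pattern).

I have the following code, but I don't know how to express that, when there is a match, the value of "line" should be overwritten, so that the total set of lines is reduced to only the matching ones.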

 #!/bin/bash
    pattern1= egrep '^([^,]*,){31}[1-9][0-9].*'
    pattern2= egrep '^([^,]*,){16}[0-1].[3-9].*'
    pattern3= egrep '^([^,]*,){32}[2-9][0-9].*'

while read line
    do
        if [[$line == $pattern1]];then
        newline == $pattern1
        if [[$newline == $pattern2 ]];then
        newline2 == $pattern2
        if [[$newline2 == $pattern3 ]]; then
        echo $pattern3

        fi
    done < mj1.csv  #this is the input file
    

I invoke this script as ./b1.sh <filename>

Some input data:

EndYear,Rk,G,Date,Years,Days,Age,Tm,Home,Opp,Win,Diff,GS,MP,FG,FGA,FG_PCT,3P,3PA,3P_PCT,FT,FTA,FT_PCT,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,GmSc
1985,1,1,10/26/1984,21,252,21.6899384,CHI,1,WSB,1,16,1,40,5,16,0.313,0,0,,6,7,0.857,1,5,6,7,2,4,5,2,16,12.5
1985,2,2,10/27/1984,21,253,21.69267625,CHI,0,MIL,0,-2,1,34,8,13,0.615,0,0,,5,5,1,3,2,5,5,2,1,3,4,21,19.4
1985,3,3,10/29/1984,21,255,21.69815195,CHI,1,MIL,1,6,1,34,13,24,0.542,0,0,,11,13,0.846,2,2,4,5,6,2,3,4,37,32.9
1985,4,4,10/30/1984,21,256,21.7008898,CHI,0,KCK,1,5,1,36,8,21,0.381,0,0,,9,9,1,2,2,4,5,3,1,6,5,25,14.7
1985,5,5,11/1/1984,21,258,21.7063655,CHI,0,DEN,0,-16,1,33,7,15,0.467,0,0,,3,4,0.75,3,2,5,5,1,1,2,4,17,13.2
1985,6,6,11/7/1984,21,264,21.72279261,CHI,0,DET,1,4,1,27,9,19,0.474,0,0,,7,9,0.778,1,3,4,3,3,1,5,5,25,14.9
1985,7,7,11/8/1984,21,265,21.72553046,CHI,0,NYK,1,15,1,33,15,22,0.682,0,0,,3,4,0.75,4,4,8,5,3,2,5,2,33,29.3
1985,8,8,11/10/1984,21,267,21.73100616,CHI,0,IND,1,2,1,42,9,22,0.409,0,0,,9,12,0.75,2,7,9,4,2,5,3,4,27,21.2
1985,9,9,11/13/1984,21,270,21.73921971,CHI,1,SAS,1,3,1,43,18,27,0.667,1,1,1,8,11,0.727,2,8,10,4,3,2,4,4,45,37.5
1985,10,10,11/15/1984,21,272,21.74469541,CHI,1,BOS,0,-20,1,33,12,24,0.5,0,1,0,3,3,1,0,2,2,2,2,1,1,4,27,17.1
1985,11,11,11/17/1984,21,274,21.75017112,CHI,1,PHI,0,-9,1,44,4,17,0.235,0,0,,8,8,1,0,5,5,7,5,2,4,5,16,12.5
1985,12,12,11/19/1984,21,276,21.75564682,CHI,1,IND,0,-17,1,39,11,26,0.423,0,3,0,12,16,0.75,2,3,5,2,2,1,3,3,34,20.8
1985,13,13,11/21/1984,21,278,21.76112252,CHI,0,MIL,0,-10,1,42,11,22,0.5,0,0,,13,14,0.929,4,9,13,2,2,2,6,3,35,26.7
1985,14,14,11/23/1984,21,280,21.76659822,CHI,0,SEA,1,19,1,30,9,13,0.692,0,0,,5,6,0.833,0,4,4,3,4,1,4,4,23,19.5
1985,15,15,11/24/1984,21,281,21.76933607,CHI,0,POR,0,-10,1,41,10,24,0.417,0,1,0,10,10,1,3,3,6,8,3,1,4,4,30,23.9
1985,16,16,11/27/1984,21,284,21.77754962,CHI,0,GSW,0,-6,1,24,6,10,0.6,0,0,,1,1,1,0,2,2,3,3,2,4,1,13,11.1
1985,17,17,11/29/1984,21,286,21.78302533,CHI,0,PHO,0,-5,1,30,9,17,0.529,1,1,1,3,4,0.75,1,2,3,2,2,0,2,5,22,14
1985,18,18,11/30/1984,21,287,21.78576318,CHI,0,LAC,1,4,1,37,9,15,0.6,0,0,,2,4,0.5,2,3,5,5,3,0,4,4,20,15.5
1985,19,19,12/2/1984,21,289,21.79123888,CHI,0,LAL,1,1,1,42,7,13,0.538,0,0,,6,8,0.75,2,0,2,3,1,1,4,3,20,12.9
1985,20,20,12/4/1984,21,291,21.79671458,CHI,1,NJN,1,15,1,35,7,13,0.538,0,0,,6,6,1,1,2,3,6,1,0,3,3,20,16
1985,21,21,12/7/1984,21,294,21.80492813,CHI,1,NYK,1,2,1,43,8,16,0.5,0,1,0,5,7,0.714,1,1,2,3,2,0,6,5,21,9.3
1985,22,22,12/8/1984,21,295,21.80766598,CHI,1,DAL,1,2,1,35,10,23,0.435,0,0,,0,0,,4,3,7,2,0,2,2,3,20,11.2
1985,23,23,12/11/1984,21,298,21.81587953,CHI,1,DET,0,-7,1,37,13,28,0.464,0,1,0,1,3,0.333,1,7,8,6,2,0,3,4,27,16.2
1985,24,24,12/12/1984,21,299,21.81861739,CHI,0,DET,0,-7,1,30,6,17,0.353,0,2,0,9,10,0.9,0,1,1,2,2,1,1,5,21,12.5
1985,25,25,12/14/1984,21,301,21.82409309,CHI,0,NJN,0,-2,1,44,12,25,0.48,0,0,,10,10,1,2,6,8,8,1,0,0,4,34,29.5
1985,26,26,12/15/1984,21,302,21.82683094,CHI,1,PHI,0,-12,1,27,7,16,0.438,0,0,,0,0,,1,1,2,2,1,0,1,2,14,7.2
1985,27,27,12/18/1984,21,305,21.83504449,CHI,1,HOU,0,-8,1,45,8,20,0.4,0,1,0,2,4,0.5,1,2,3,8,3,0,1,2,18,14.5
1985,28,28,12/20/1984,21,307,21.84052019,CHI,0,ATL,1,3,1,41,12,22,0.545,0,0,,10,16,0.625,4,4,8,7,5,1,7,5,34,26.6

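For convenience: pattern 1 matches all rows where the PTS column is above 10, pattern 2 matches rows where the FG_PCT column is above 0.3, and pattern 3 matches all rows where the GmSc column is above 19.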

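【Comments】:

  • Could you add some sample data?
  • What do you want to do with the lines? It looks a lot like what you really want is awk '$32 ~ /[1-9][0-9]/ && $17 ~ /[0-1].[3-9]/ && $33 ~ /[2-9][0-9]/' or something similar.
  • Please paste your script into shellcheck.net and try to implement the recommendations made there.
  • Hi @WilliamPursell, I would like to keep it as 3 different regexes rather than merging them. Merging them would probably be better, no argument there, but I want to understand whether it is possible.
  • @HatLess Sure, check the updated question :)

Tags: regex bash grep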


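【Solution 1】:

While an awk solution would be a bit faster ... we'll focus on a bash solution, per OP's request.

The first issue is that regex matching uses the =~ operator, not the == operator.

The second issue is that keeping a line only if all 3 regexes match means we want to AND (&&) the results of all 3 regex matches.

The third issue is a handful of basic syntax problems in OP's current code (e.g., the missing spaces after [[ and before ]]; the regex patterns being assigned to the pattern* variables incorrectly).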
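To make the first and third points concrete, here is a minimal sketch, assuming bash and using one row copied from the question's sample data. The variable names glob_result and regex_result are made up for this illustration. It shows a pattern stored as a plain string and the difference between == (glob comparison) and =~ (extended regex match) inside [[ ]]:

```shell
#!/bin/bash
# Store the regex as a plain string: no spaces around '=' and no egrep call.
pattern1='^([^,]*,){31}[1-9][0-9].*'

# One row copied from the question's sample input (PTS = 37).
line='1985,3,3,10/29/1984,21,255,21.69815195,CHI,1,MIL,1,6,1,34,13,24,0.542,0,0,,11,13,0.846,2,2,4,5,6,2,3,4,37,32.9'

# '==' treats the right-hand side as a glob, so the regex syntax is not
# interpreted as a regex and this row does not match.
if [[ $line == $pattern1 ]]; then glob_result='match'; else glob_result='no match'; fi

# '=~' performs an extended regex match; the right-hand side stays unquoted
# and there is a space after '[[' and before ']]'.
if [[ $line =~ $pattern1 ]]; then regex_result='match'; else regex_result='no match'; fi

echo "== : $glob_result"
echo "=~ : $regex_result"
```

Quoting the right-hand side of =~ would make bash treat it as a literal string, which is why the pattern variable is left unquoted.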

One bash idea:

pattern1='^([^,]*,){31}[1-9][0-9].*'
pattern2='^([^,]*,){16}[0-1].[3-9].*'
pattern3='^([^,]*,){32}[2-9][0-9].*'

head -1 mj1.csv > mj1.new.csv

while read -r line
do
    if [[ "${line}" =~ $pattern1 && "${line}" =~ $pattern2 && "${line}" =~ $pattern3 ]]
    then
        # do whatever with $line, eg:
        echo "${line}"
    fi
done < mj1.csv >> mj1.new.csv

This generates:

$ cat mj1.new.csv
EndYear,Rk,G,Date,Years,Days,Age,Tm,Home,Opp,Win,Diff,GS,MP,FG,FGA,FG_PCT,3P,3PA,3P_PCT,FT,FTA,FT_PCT,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,GmSc
1985,3,3,10/29/1984,21,255,21.69815195,CHI,1,MIL,1,6,1,34,13,24,0.542,0,0,,11,13,0.846,2,2,4,5,6,2,3,4,37,32.9
1985,7,7,11/8/1984,21,265,21.72553046,CHI,0,NYK,1,15,1,33,15,22,0.682,0,0,,3,4,0.75,4,4,8,5,3,2,5,2,33,29.3
1985,8,8,11/10/1984,21,267,21.73100616,CHI,0,IND,1,2,1,42,9,22,0.409,0,0,,9,12,0.75,2,7,9,4,2,5,3,4,27,21.2
1985,9,9,11/13/1984,21,270,21.73921971,CHI,1,SAS,1,3,1,43,18,27,0.667,1,1,1,8,11,0.727,2,8,10,4,3,2,4,4,45,37.5
1985,12,12,11/19/1984,21,276,21.75564682,CHI,1,IND,0,-17,1,39,11,26,0.423,0,3,0,12,16,0.75,2,3,5,2,2,1,3,3,34,20.8
1985,13,13,11/21/1984,21,278,21.76112252,CHI,0,MIL,0,-10,1,42,11,22,0.5,0,0,,13,14,0.929,4,9,13,2,2,2,6,3,35,26.7
1985,15,15,11/24/1984,21,281,21.76933607,CHI,0,POR,0,-10,1,41,10,24,0.417,0,1,0,10,10,1,3,3,6,8,3,1,4,4,30,23.9
1985,25,25,12/14/1984,21,301,21.82409309,CHI,0,NJN,0,-2,1,44,12,25,0.48,0,0,,10,10,1,2,6,8,8,1,0,0,4,34,29.5
1985,28,28,12/20/1984,21,307,21.84052019,CHI,0,ATL,1,3,1,41,12,22,0.545,0,0,,10,16,0.625,4,4,8,7,5,1,7,5,34,26.6

Note: OP has not (yet) provided the expected output, so at this point I have to assume OP's regexes are correct.
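If the three patterns should stay visibly separate, with each stage only seeing rows that survived the previous one (as the question describes), the same result can be sketched with one test per stage. This is only an illustration under assumptions: filter_stages is a hypothetical helper name, and the rows array holds the header plus three rows copied from the question's sample data.

```shell
#!/bin/bash
pattern1='^([^,]*,){31}[1-9][0-9].*'
pattern2='^([^,]*,){16}[0-1].[3-9].*'
pattern3='^([^,]*,){32}[2-9][0-9].*'

# Hypothetical helper: reads rows on stdin and prints only those that
# pass stage 1, then stage 2, then stage 3.
filter_stages() {
    local line
    while IFS= read -r line; do
        [[ $line =~ $pattern1 ]] || continue   # stage 1: drop rows failing pattern1
        [[ $line =~ $pattern2 ]] || continue   # stage 2: only rows that passed stage 1
        [[ $line =~ $pattern3 ]] || continue   # stage 3: only rows that passed stage 2
        printf '%s\n' "$line"
    done
}

# Header and three rows copied from the question's sample data.
rows=(
    'EndYear,Rk,G,Date,Years,Days,Age,Tm,Home,Opp,Win,Diff,GS,MP,FG,FGA,FG_PCT,3P,3PA,3P_PCT,FT,FTA,FT_PCT,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,GmSc'
    '1985,1,1,10/26/1984,21,252,21.6899384,CHI,1,WSB,1,16,1,40,5,16,0.313,0,0,,6,7,0.857,1,5,6,7,2,4,5,2,16,12.5'
    '1985,3,3,10/29/1984,21,255,21.69815195,CHI,1,MIL,1,6,1,34,13,24,0.542,0,0,,11,13,0.846,2,2,4,5,6,2,3,4,37,32.9'
    '1985,11,11,11/17/1984,21,274,21.75017112,CHI,1,PHI,0,-9,1,44,4,17,0.235,0,0,,8,8,1,0,5,5,7,5,2,4,5,16,12.5'
)

result=$(printf '%s\n' "${rows[@]}" | filter_stages)
printf '%s\n' "$result"
```

Here the header fails stage 1, the 11/17/1984 row fails stage 2, the 10/26/1984 row fails stage 3, and only the 10/29/1984 row is printed. With the real file the call would be filter_stages < mj1.csv. Since the question is also tagged grep, a pipeline such as grep -E "$pattern1" mj1.csv | grep -E "$pattern2" | grep -E "$pattern3" expresses the same narrowing; in both variants the final set of lines is the same as with the && form above.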

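【Comments】:

  • Hi, as I understand it, you are taking the header from the original file but writing it to the new file. Is it the same with the output? You are reading the original file and generating a new file with the output.
  • Correct: pull the header (first line) and put it in the new file, then let the while loop append any matching lines to the end of this new file; at this point I'm just guessing what the final result should be, since we don't (yet) know the expected result of this exercise (e.g., if you don't want the header data then you can remove the head -1 line)
  • We could use echo "${line}" >> mj1.new.csv, but that requires the script to open/close the file descriptor once for every line that is added; by moving >> mj1.new.csv to the end of the while loop we open/close the file descriptor only once; either approach "works", but the proposed answer will be more efficient
  • Yes, sorry for not posting the expected output. It is a file with over 1000 lines. The patterns are correct, though, so if the syntax of your answer is correct, it should be fine.
  • We don't need to see the expected output for 1000 lines; what we'd like to see is the expected output for the sample dataset you provided ... in my case the 29 lines of input generate 10 lines of output; the question now becomes ... do those 10 lines of output match your expected output (for the 29 lines of input)?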
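
The redirection point raised in the comments above can be sketched as follows. This is a minimal illustration, assuming bash and a temporary directory from mktemp; the rows array and the file names per_line.csv and once.csv are made up for the example. Both loops write the same content; they differ only in how often the output file is opened.

```shell
#!/bin/bash
tmpdir=$(mktemp -d)
per_line="$tmpdir/per_line.csv"
once="$tmpdir/once.csv"

# Hypothetical rows standing in for the matching lines.
rows=('a,1' 'b,2' 'c,3')

# Variant 1: '>>' inside the loop, so the output file is opened and
# closed again on every iteration.
for row in "${rows[@]}"; do
    echo "$row" >> "$per_line"
done

# Variant 2: '>>' after 'done', so the output file is opened once for
# the whole loop.
for row in "${rows[@]}"; do
    echo "$row"
done >> "$once"

# Both files end up with identical content.
if cmp -s "$per_line" "$once"; then same=yes; else same=no; fi
echo "identical output: $same"

rm -rf "$tmpdir"
```

For a file of about a thousand lines the difference is small, but the second form is also slightly shorter to write.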