【Question title】: Pattern matching between two files
【Posted】: 2015-08-10 21:55:33
【Question】:

I have two files: file1 and file2.

file1:

1,0,0
2,1,2

file2:

abc gdksjhkjhfkdjfljdkjldk jkm kl;lll (sds; dks; id:1;)
zxc erefdjhfkdjfljdkjldk  erewr jkm kl;lll (sds; dks; id:2;)

Expected output:

#abc gdksjhkjhfkdjfljdkjldk jkm kl;lll (sds; dks; id:1;)
zxc erefdjhfkdjfljdkjldk  erewr jkm kl;lll (sds; dks; id:2;)

If the number after id in file2 matches the first column of file1,

then: if the third column in file1 is 0, print $1 of file2 = abc, else $1 of file2 = zxc
      if the second column in file1 is 0, insert # at the beginning of the line

Another case. file1:

1,0,0
3,1,2

file2:

abc gdksjhkjhfkdjfljdkjldk jkm kl;lll (sds; dks; id:1;)
zxc erefdjhfkdjfljdkjldk  erewr jkm kl;lll (ders; dks; id:2;)
sdsd sdsdsdsddddsdjldk  vbvewqr dsm wwl;awww (cvv; fgs; id:3;)

Sometimes the files will contain different numbers of lines.
In that case, if column one in file1 does not match the id in file2, it has to continue checking with the next line in file2.

How can I do the matching and modification without merging the two files in a shell script?

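【Comments on the question】:

  • Is the last "if" nested inside the first one, or is it separate?
  • It is nested inside the first one.
  • Can you post your desired output?
  • OK, I thought you meant $1 of file1 as the first column, but with "print $1 in file2=a else $1 in file=b" I am now completely lost.
  • It is $1 of file1, sorry.

Tags: shell awk sed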


【Solution 1】:

GNU awk 4

Use this awk script:

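# First file (file1): store its three columns under the line number.
FNR==NR{
    arr[FNR][1] = $1
    arr[FNR][2] = $2
    arr[FNR][3] = $3
}
# Second file (file2): compare with the same-numbered line of file1.
FNR!=NR{
    # extract the number that follows "id:"
    val = gensub(/.*id:([0-9]+)[^0-9]*.*/, "\\1", "g", $0)
    if (arr[FNR][1] == val) {
        # second column 0: prefix the line with "#"
        if (arr[FNR][2] == 0)
            printf "#"
        # third column 0: first word becomes "a", otherwise "b"
        if (arr[FNR][3] == 0)
            $1 = "a"
        else
            $1 = "b"
    }
    print $0
}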

Invoke it with: awk -F '[, ]' -f script.awk file1 file2
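
The arr[FNR][1] syntax in the script above is an array of arrays, which gawk supports from version 4.0 onward; older gawk releases and other awk implementations typically reject it with a syntax error. A small sketch for checking whether the local awk accepts it (the variable name aoa_support is only illustrative):

```shell
# Try to run a one-line program that uses an array of arrays.
# The program has only a BEGIN rule, so awk reads no input.
# awks without this feature fail to parse it and exit non-zero.
if awk 'BEGIN { a[1][1] = 1; exit 0 }' 2>/dev/null; then
    aoa_support=yes
else
    aoa_support=no
fi
echo "arrays of arrays supported: $aoa_support"
```

If this reports "no", the version below, which keeps whole lines in a one-dimensional array, is the one to try.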

GNU awk 3

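An attempt to make the script work with earlier versions of GNU awk (it still relies on gensub, which is a gawk extension):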

# This awk script will perform these checks for EVERY single line:

# when FNR == NR we are in the first file
# FNR is the line number within the current file
# NR is the total number of lines read so far
FNR==NR{
    # save the line of file1 in an array, indexed by its line number
    arr[FNR] = $0
}
# we are now in file2: FNR restarts at 1 there, while NR keeps
# counting past the lines of file1
FNR!=NR{
    # create an array by splitting the corresponding line of file1
    # we split using a comma: 0,1,2 => [0, 1, 2]
    split(arr[FNR], vals, ",")
    # use regex to extract the id number, we drop everything from the
    # line besides the number after "id:"
    val = gensub(/.*id:([0-9]+)[^0-9]*.*/, "\\1", "g", $0)
    # by default the line is printed unchanged
    line = $0
    # if first value of line in file1 is same as ID
    if (vals[1] == val) {
        # if second value of line in file1 is 0
        if (vals[2] == 0)
            # print # at beginning of line without adding a newline
            printf "#"
        # if third value of line in file1 is 0
        if (vals[3] == 0)
            # save "a" to var, else
            var = "a"
        else
            # save "b" to var
            var = "b"
        # now sub the first word of the line [^ \t]* by var
        # and keep everything that follows (...) = \\1
        line = gensub(/^[^ \t]*([ \t].*)/, var "\\1", "g", $0)
    }
    # print the (possibly modified) line, now with a newline
    print line
}

Simply run it as:

awk -f script.awk file1 file2
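
Both scripts above pair line N of file1 with line N of file2, so they do not cover the case from the question where the files have different numbers of lines. A minimal sketch of an alternative, assuming the ids in the first column of file1 are unique: index file1 by id and look up each id found in file2. It uses only POSIX awk features (split, match, substr, sub), and the replacement words "a" and "b" follow the scripts above. The temporary directory and the array names flag and kind are illustrative; the sample data is the second case from the question.

```shell
# Recreate the second sample case from the question in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' '1,0,0' '3,1,2' > "$tmp/file1"
printf '%s\n' \
  'abc gdksjhkjhfkdjfljdkjldk jkm kl;lll (sds; dks; id:1;)' \
  'zxc erefdjhfkdjfljdkjldk  erewr jkm kl;lll (ders; dks; id:2;)' \
  'sdsd sdsdsdsddddsdjldk  vbvewqr dsm wwl;awww (cvv; fgs; id:3;)' > "$tmp/file2"

result=$(awk '
# First file: remember columns 2 and 3 under the id from column 1.
FNR == NR {
    split($0, f, ",")
    flag[f[1]] = f[2]
    kind[f[1]] = f[3]
    next
}
# Second file: find "id:<digits>" and look the digits up.
{
    line = $0
    if (match($0, /id:[0-9]+/)) {
        # skip the 3 characters of "id:" to keep only the digits
        id = substr($0, RSTART + 3, RLENGTH - 3)
        if (id in flag) {
            # third column 0: first word becomes "a", otherwise "b"
            sub(/^[^ \t]+/, (kind[id] + 0 == 0 ? "a" : "b"), line)
            # second column 0: prefix the line with "#"
            if (flag[id] + 0 == 0)
                line = "#" line
        }
    }
    # lines whose id is not in file1 are printed unchanged
    print line
}
' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$result"
rm -rf "$tmp"
```

With this data, the id:1 line comes out starting with "#a", the id:2 line is left unchanged because file1 has no entry for it, and the id:3 line starts with "b".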

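【Comments】:

  • Sorry, could you look at the output pasted above? All modifications are made in file2.
  • Can you verify the output?
  • arr[FNR][1] = $1 ^ syntax error. Why does this happen?
  • Try again with -F '[, ]', although this script should not produce that error, since it does not here...
  • Please run awk --version | head -n1 and paste the output.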