Compare two files and print all values horizontally if ids are matching? (Answers)

【Question title】: Compare two files and print all values horizontally if ids are matching?
【Posted】: 2020-02-12 07:19:07
【Question】:

Hello, I have two files as follows. File1.txt:

10   A   B  C
10   A   D  K
11   X   Y  Z
10   A   K  B 
11   Y   X  A

File2.txt:

10
11
12

The expected output is:

10 A  B  C  A  D  K  A  K  B
11 X  Y  Z  Y  X  A

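I tried the command grep -f File1.txt file2.txt

but it does not return all the values that share the same ID.

【Question comments】:

Are you using Python or Bash? Or are you looking for a solution that uses Python or Bash?
What is the relevance of File2.txt? Your expected output can be computed from File1.txt alone.
@Edric I am looking for a solution.

Tags: python excel bash compare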

【Solution 1】:

Could you please try the following.

awk '
FNR==NR{
  val=$1
  $1=""
  sub(/^ +/,"")
  a[val]=(a[val]?a[val] OFS:"")$0
  next
}
($1 in a){
  print $1,a[$1]
}
' File1.txt File2.txt

The output will be as follows:

10 A B C A D K A K B
11 X Y Z Y X A

Explanation: the same code with a detailed comment on each line.

awk '                                ## Start the awk program.
FNR==NR{                             ## True only while reading the first file (File1.txt).
  val=$1                             ## Save the id from the first field.
  $1=""                              ## Empty the first field; awk rebuilds $0 using OFS.
  sub(/^ +/,"")                      ## Remove the leading space left behind in $0.
  a[val]=(a[val]?a[val] OFS:"")$0    ## Append the remaining fields to the entry for this id.
  next                               ## Skip the rest of the program for this line.
}
($1 in a){                           ## For File2.txt: run only if the id was seen in File1.txt.
  print $1,a[$1]                     ## Print the id followed by all of its collected values.
}
' File1.txt File2.txt                ## Input files: File1.txt first, then File2.txt.

【Comments】:

【Solution 2】:

If the key is in the second file, you can add that line's values to a dictionary.
Example:

from collections import defaultdict

d = defaultdict(list)

with open("f1.txt") as f1, open("f2.txt") as f2:
    keys = set(f2.read().splitlines())
    for line in f1:
        k, *rest = line.split()
        if k in keys:
            d[k]+=rest


>>> print(*d.items(),sep="\n")
('10', ['A', 'B', 'C', 'A', 'D', 'K', 'A', 'K', 'B'])
('11', ['X', 'Y', 'Z', 'Y', 'X', 'A'])

【Comments】:

This works...!!! But I also need to keep empty columns if there are any in between, e.g. 'A', '', 'B'.
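
One way to keep empty columns, as a minimal sketch: split on an explicit separator instead of on arbitrary whitespace. This assumes the input is tab-delimited with exactly one tab between columns, which the question does not state. The function name merge_rows and the sample data are made up for illustration.

```python
from collections import defaultdict


def merge_rows(data_lines, key_lines, sep="\t"):
    """Group the values of data_lines by their first column, keeping empty fields.

    Assumes columns are separated by exactly one `sep` (a tab by default).
    That is an assumption about the input, not something the question states.
    """
    keys = {line.strip() for line in key_lines if line.strip()}
    merged = defaultdict(list)
    for line in data_lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        # split() with an explicit separator keeps empty strings, unlike the
        # no-argument form, which collapses runs of whitespace.
        key, *rest = line.split(sep)
        if key in keys:
            merged[key] += rest
    return dict(merged)


if __name__ == "__main__":
    # Hypothetical tab-delimited rows containing empty columns.
    data = ["10\tA\t\tB\n", "11\tX\tY\tZ\n", "10\tC\tD\t\n"]
    ids = ["10\n", "11\n", "12\n"]
    for key, values in merge_rows(data, ids).items():
        print(key, values)
```

If the real files are aligned with a variable number of spaces, as in the question, an empty column cannot be told apart from padding, so this approach only helps when the delimiter is unambiguous.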

【Solution 3】:

I don't see the relevance of File2.txt here. So, if the spacing of the output doesn't matter, you can use this one-line command:

sort File1.txt | awk '$1!=key {if (sum) print key sum; key=$1; sum=""} {$1=""; sum=sum $0} END {print key sum}'
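
For comparison, the same sort-then-group idea can be sketched in Python with itertools.groupby. This is not a translation of the command above: it sorts on the id only, and because Python's sort is stable, the values of each id stay in file order, whereas a plain sort orders whole lines. The function name group_sorted is made up for illustration.

```python
from itertools import groupby


def group_sorted(lines):
    """Yield (id, values) pairs by sorting rows on the id and grouping neighbours."""
    rows = [line.split() for line in lines if line.strip()]
    # sorted() is stable, so rows with the same id keep their original order.
    rows = sorted(rows, key=lambda row: row[0])
    for key, group in groupby(rows, key=lambda row: row[0]):
        # Flatten the remaining columns of every row in this group.
        values = [value for row in group for value in row[1:]]
        yield key, values


if __name__ == "__main__":
    sample = [
        "10   A   B  C\n",
        "10   A   D  K\n",
        "11   X   Y  Z\n",
        "10   A   K  B \n",
        "11   Y   X  A\n",
    ]
    for key, values in group_sorted(sample):
        print(key, *values)
```

Like the one-liner, this ignores File2.txt entirely, and the ids are compared as strings, so "9" would sort after "10".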

【Comments】:

【Solution 4】:

You can try this:

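with open("File1.txt") as f1, open("File2.txt") as f2:
    dictf1 = {}
    for line in f1:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        # Append this row's values to the list collected for its id.
        dictf1.setdefault(fields[0], []).extend(fields[1:])
    for line in f2:
        # strip() also handles a last line that has no trailing newline,
        # where slicing with [:-1] would cut off the final digit of the id.
        key = line.strip()
        if key in dictf1:
            print(key, " ".join(dictf1[key]))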

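【Comments】:

This works...!!! But I also need to keep empty columns if there are any in between, e.g. 'A', '', 'B'.
@Snijesh Do you want the output as a string or as a list?
It would be better if the output were something like a data frame or tab-separated text.
@Snijesh The current output is 10 A B C A D K A K B and 11 X Y Z Y X A.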
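
For the tab-separated variant requested above, a minimal sketch using the standard csv module could look like this. It assumes the values have already been collected into a dictionary of id to list, as in the Python answers; the function name to_tsv is made up for illustration.

```python
import csv
import io


def to_tsv(merged):
    """Render {id: [values]} as tab-separated text, one id per line."""
    buffer = io.StringIO()
    # lineterminator="\n" avoids the csv module's default "\r\n" line endings.
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for key, values in merged.items():
        writer.writerow([key, *values])
    return buffer.getvalue()


if __name__ == "__main__":
    merged = {
        "10": ["A", "B", "C", "A", "D", "K", "A", "K", "B"],
        "11": ["X", "Y", "Z", "Y", "X", "A"],
    }
    print(to_tsv(merged), end="")
```

Empty strings in a value list come out as empty fields between two tabs, so empty columns survive the round trip. The rows can have different lengths, which a data frame library would need to pad when loading the file.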