grep missing lines when comparing two files (answers)

【Question title】: grep missing line when comparing two files
【Posted】: 2021-02-05 21:59:13
【Question】:

I have two large files.

data.txt (about 1324 lines, each a substring of an email address)

test
test1
test3
test4
test6
test7
test9
test10

values.txt (about 2221 lines of email addresses, including those for the 1324 entries above)

test@gmail.com
test1@gmail.com
test3@gmail.com
test4@gmail.com
test6@gmail.com
test7@gmail.com
test9@gmail.com
test10@gmail.com
test74@gmail.com
test14@gmail.com
test34@gmail.com
test44@gmail.com
test64@gmail.com
test74@gmail.com

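The command runs, but the result should be a file with 897 lines of emails (2221 - 1324), and what I get has only 874.

So 23 lines are missing and I don't know how to find them. Maybe something is wrong with my command?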

grep -v -f data.txt values.txt > result.txt

Is there a way to do this with grep?

Expected result.txt

test74@gmail.com
test14@gmail.com
test34@gmail.com
test44@gmail.com
test64@gmail.com
test74@gmail.com

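【Question comments】:

`comm -1 only_in_data` may help you. Use -2 and change the output to only_in_values. Good luck.
Do you have substring matches, where data.txt contains test1 and it matches test10@gmail.com? If so, try adding -w to the grep options to require whole-word matches.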
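
The substring effect described in the last comment can be reproduced with the sample data from the question. The following is only a sketch: the scratch directory and the file names result_plain.txt and result_word.txt are made up for the illustration, and it assumes a grep that supports -w (GNU and BSD grep both do).

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample data copied from the question.
printf '%s\n' test test1 test3 test4 test6 test7 test9 test10 > data.txt
printf '%s@gmail.com\n' test test1 test3 test4 test6 test7 test9 test10 \
    test74 test14 test34 test44 test64 test74 > values.txt

# Original command: each pattern is an unanchored regex, so "test" alone
# matches every address and -v then discards all of them.
# grep exits with status 1 when it selects no lines, hence "|| true".
grep -v -f data.txt values.txt > result_plain.txt || true

# With -w a pattern must match a whole word, so "test1" no longer matches
# inside "test14@gmail.com".
grep -v -w -f data.txt values.txt > result_word.txt

wc -l < result_plain.txt   # number of lines kept by the original command
cat result_word.txt        # lines kept when -w is added
```

On this sample the original command keeps nothing, while the -w variant keeps the six addresses listed in the expected result. One caveat: -w treats only letters, digits and underscore as word characters, so a hypothetical entry such as john would still match john.smith@gmail.com. Comparing the exact text before the @ avoids that case.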

Tags: awk grep

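【Solution 1】:

Could you please try the following. Written and tested with the shown samples in GNU awk. It prints the lines of values.txt whose part before the @ is not listed in data.txt.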

awk '
FNR == NR{
  arr[$0]
  next
}
!($1 in arr)
' data.txt  FS="@" values.txt

The output is as follows:

test74@gmail.com
test14@gmail.com
test34@gmail.com
test44@gmail.com
test64@gmail.com
test74@gmail.com

Explanation: a detailed, line-by-line explanation of the above.

awk '                            ## Start of the awk program.
FNR == NR{                       ## TRUE only while the first file, data.txt, is being read.
  arr[$0]                        ## Store the whole current line as a key of arr.
  next                           ## Skip the remaining rules for this line.
}
!($1 in arr)                     ## For values.txt: print the line if the text before "@" is NOT a key of arr.
' data.txt  FS="@" values.txt    ## Input files; FS="@" takes effect just before values.txt is read.
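
To see which lines the original grep command dropped by accident, one option is to compare its output with the output of the awk program. This is a rough sketch on a small hypothetical sample (not the question's real files); the names by_grep.txt and by_awk.txt are made up for the illustration.

```shell
# Work in a scratch directory; use byte-order sorting so sort and comm agree.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
export LC_ALL=C

# Hypothetical sample, much smaller than the files in the question.
printf '%s\n' test1 test3 > data.txt
printf '%s@gmail.com\n' test1 test3 test10 test14 test34 test74 > values.txt

# Substring-based result (the command from the question), sorted for comm.
grep -v -f data.txt values.txt | sort > by_grep.txt

# Exact-key result (the awk program from this answer), sorted for comm.
awk 'FNR == NR { arr[$0]; next } !($1 in arr)' data.txt FS="@" values.txt |
    sort > by_awk.txt

# Lines present only in by_awk.txt: addresses that grep discarded
# because an entry of data.txt matched inside them.
comm -23 by_awk.txt by_grep.txt
```

Here the entries test1 and test3 also match inside test10, test14 and test34, so those three addresses are the ones reported as missing from the grep result. Run against the real files, the same comparison should list the lines that account for the gap between 874 and 897.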

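【Comments】:

I have updated my question with the desired result.
@Taieb, I have successfully tested this solution with the samples you showed. Please see my answer; I have updated the sample output in it. Let me know, thanks.
Now it works as expected. Thank you very much for the explanation. Have a nice day.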