Delete lines from multiple files using gawk / awk / sed

[Question title]: Delete lines from multiple files using gawk / awk / sed
[Posted]: 2012-05-13 10:41:47
[Question]:

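I have two sets of text files. The first set is in the folder AA and the second set is in the folder BB. The file ff.txt from the first set (folder AA) looks like this: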

Name        number     marks
john            1         60
maria           2         54
samuel          3         62
ben             4         63

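From this file I want to print the second column (number) wherever marks > 60. Here the output would be 3 and 4. Next, I want to read ff.txt in the BB folder and delete the lines that contain the numbers 3 and 4.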

The files in the BB folder look like this. The second column is the number.

 marks       1      11.824  24.015  41.220  1.00 13.65
 marks       1      13.058  24.521  40.718  1.00 11.82
 marks       3      12.120  13.472  46.317  1.00 10.62
 marks       4      10.343  24.731  47.771  1.00  8.18

I used the following code. It works for a single file.

gawk 'BEGIN {getline} $3>60{print $2}' AA/ff.txt | while read number; do gawk -v number=$number '$2 != number' BB/ff.txt > /tmp/ff.txt; mv /tmp/ff.txt BB/ff.txt; done

But when I run this code with multiple files, I get an error.

gawk 'BEGIN {getline} $3>60{print $2}' AA/*.txt | while read number; do gawk -v number=$number '$2 != number' BB/*.txt > /tmp/*.txt; mv /tmp/*.txt BB/*.txt; done

error:-
mv: target `BB/kk.txt' is not a directory

I asked this question two days ago. Please help me solve this error.

[Comments]:

Tags: sed awk gawk

[Solution 1]:

This builds a single index from all files in folder AA and checks every file in folder BB against it:

cat AA/*.txt | awk 'FNR==NR { if ($3 > 60) array[$2]; next } !($2 in array)' - BB/*.txt
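
The FNR==NR test is what makes this work: NR counts lines across all inputs while FNR restarts for each input, so the two are equal only while awk reads its first input. A minimal sketch of the idiom on the sample data from the question, assuming a scratch directory made with mktemp -d (the demo and kept variables and the shortened BB rows are illustrative). Unlike the command above it also skips the header line with FNR > 1; without that, the header passes the test as a string comparison and "number" becomes a harmless extra key.

```shell
#!/bin/sh
# Scratch copy of the sample data (paths and names are illustrative).
demo=$(mktemp -d)
mkdir "$demo/AA" "$demo/BB"
printf '%s\n' 'Name number marks' 'john 1 60' 'maria 2 54' \
              'samuel 3 62' 'ben 4 63' > "$demo/AA/ff.txt"
printf '%s\n' 'marks 1 11.824' 'marks 1 13.058' \
              'marks 3 12.120' 'marks 4 10.343' > "$demo/BB/ff.txt"

# First file (NR == FNR): remember the numbers whose marks exceed 60.
# Second file: print only lines whose second column was not remembered.
kept=$(awk 'NR == FNR { if (FNR > 1 && $3 > 60) drop[$2]; next }
            !($2 in drop)' "$demo/AA/ff.txt" "$demo/BB/ff.txt")
printf '%s\n' "$kept"
rm -rf "$demo"
```

On this data only the two lines with number 1 are printed.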

This compares the files pairwise, assuming they have the same names in folders AA and BB:

ls AA/*.txt | sed "s%AA/\(.*\)%awk 'FNR==NR { if (\$3 > 60) array[\$2]; next } !(\$2 in array)' & BB/\1 %" | sh

HTH

EDIT

This should help :-)

ls AA/*.txt | sed "s%AA/\(.*\)%awk 'FNR==NR { if (\$3 > 60) array[\$2]; next } !(\$2 in array)' & BB/\1 > \1_tmp \&\& mv \1_tmp BB/\1 %" | sh

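[Comments]:

Thanks for your answer. I want to delete these lines from the files in the BB folder. The files have the same names in AA and BB. How do I change your code?
Unfortunately awk does not support in-place editing. My edit writes a temporary file and then replaces the original with it.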
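
The temporary-file approach can also be written as a plain shell loop instead of generating commands with sed. The following is only a sketch under stated assumptions: the function name filter_pairs, its two directory arguments, and the shortened sample rows are illustrative, and mktemp is assumed to be available. Newer gawk releases (4.1.0 and later) ship an inplace extension, used as gawk -i inplace; the sketch sticks to a temporary file so that it also runs with other awk implementations.

```shell
#!/bin/sh
# filter_pairs SRC DST (illustrative name): for each SRC/*.txt that has a
# same-named file in DST, remove the DST lines whose second column is a
# number with marks above 60 in the SRC file.
filter_pairs() {
    src=$1
    dst=$2
    for f in "$src"/*.txt; do
        name=${f##*/}                     # file name without the directory
        [ -f "$dst/$name" ] || continue   # no counterpart in DST: skip
        tmp=$(mktemp) || return 1
        # Write the filtered lines to a temporary file first, then copy
        # them back only if awk succeeded.
        if awk 'NR == FNR { if (FNR > 1 && $3 > 60) drop[$2]; next }
                 !($2 in drop)' "$f" "$dst/$name" > "$tmp"; then
            cat "$tmp" > "$dst/$name"
        fi
        rm -f "$tmp"
    done
}

# Demo on a scratch copy of the data used in this thread.
demo=$(mktemp -d)
mkdir "$demo/AA" "$demo/BB"
printf '%s\n' 'Name number marks' 'john 1 60' 'maria 2 54' \
              'samuel 3 62' 'ben 4 63' > "$demo/AA/ff.txt"
printf '%s\n' 'Name number marks' 'john 1 70' 'maria 2 54' \
              'samuel 3 42' 'ben 4 33' > "$demo/AA/gg.txt"
printf '%s\n' 'marks 1 11.824' 'marks 1 13.058' \
              'marks 3 12.120' 'marks 4 10.343' > "$demo/BB/ff.txt"
printf '%s\n' 'marks 1 11.824' 'marks 2 13.058' \
              'marks 3 12.120' 'marks 4 10.343' > "$demo/BB/gg.txt"

filter_pairs "$demo/AA" "$demo/BB"
ff_after=$(cat "$demo/BB/ff.txt")
gg_after=$(cat "$demo/BB/gg.txt")
printf 'ff.txt:\n%s\ngg.txt:\n%s\n' "$ff_after" "$gg_after"
rm -rf "$demo"
```

Copying the result back with cat rather than mv keeps the permissions of the original file, because mktemp creates its file readable by the owner only.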

[Solution 2]:

> /tmp/*.txt and mv /tmp/*.txt BB/*.txt are wrong.
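
The reason is that the shell expands each pattern before mv starts, so mv receives every matching path as a separate argument. When mv gets more than two arguments it treats the last one as the destination directory, which explains the "is not a directory" message in the question. A small sketch that makes the expansion visible, assuming a scratch directory with two files on each side (all paths and variable names here are illustrative):

```shell
#!/bin/sh
demo=$(mktemp -d)
mkdir "$demo/tmp" "$demo/BB"
: > "$demo/tmp/ff.txt"; : > "$demo/tmp/kk.txt"
: > "$demo/BB/ff.txt";  : > "$demo/BB/kk.txt"

# Load the same word list that "mv tmp/*.txt BB/*.txt" would receive
# into the positional parameters and inspect it.
set -- "$demo"/tmp/*.txt "$demo"/BB/*.txt
nargs=$#
for last in "$@"; do :; done      # after the loop, last is the final path
last=${last##*/}
echo "mv would receive $nargs paths; the final one is $last"
rm -rf "$demo"
```

With two files per folder, mv would see four paths, and the final one (a regular file) would be taken as the target directory.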

For a single file:

awk 'NR>1 && $3>60{print $2}' AA/ff.txt > idx.txt

awk 'NR==FNR{a[$0]; next}; !($2 in a)' idx.txt BB/ff.txt

For multiple files:

awk 'FNR>1 && $3>60{print $2}' AA/*.txt >idx.txt

cat BB/*.txt | awk 'NR==FNR{a[$0]; next}; !($2 in a)' idx.txt -

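[Comments]:

Thanks for your answer. Your code only prints the numbers instead of deleting the lines.
idx.txt is a temporary file. You should run the two commands one after the other. The second command prints what you want.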
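
To make the two steps concrete, here is a sketch that runs both commands on the sample data and captures each result. The scratch directory, the file names aa.txt and bb.txt, the shortened BB rows, and the variables idx and remaining are illustrative assumptions.

```shell
#!/bin/sh
demo=$(mktemp -d)
printf '%s\n' 'Name number marks' 'john 1 60' 'maria 2 54' \
              'samuel 3 62' 'ben 4 63' > "$demo/aa.txt"
printf '%s\n' 'marks 1 11.824' 'marks 1 13.058' \
              'marks 3 12.120' 'marks 4 10.343' > "$demo/bb.txt"

# Step 1: write the numbers with marks above 60 to the index file.
awk 'NR > 1 && $3 > 60 { print $2 }' "$demo/aa.txt" > "$demo/idx.txt"
idx=$(cat "$demo/idx.txt")

# Step 2: load the index, then print the lines of the second file
# whose second column is not in it.
remaining=$(awk 'NR == FNR { a[$0]; next } !($2 in a)' \
                "$demo/idx.txt" "$demo/bb.txt")

printf 'index:\n%s\nremaining lines:\n%s\n' "$idx" "$remaining"
rm -rf "$demo"
```

One caveat: if idx.txt ends up empty, NR == FNR is also true while awk reads the second file, so nothing is printed. Checking that the index is non-empty before step 2 avoids emptying the output by accident.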

[Solution 3]:

A Perl solution:

use warnings;
use strict;
use File::Spec;

## Hash to save data to delete from files of BB folder.
## key -> file name.
## value -> string with numbers of second column. They will be
## joined separated with '-...-', like: -2--3--1-. And it will be easier to
## search for them using a regexp.
my %delete;

## Check arguments:
## 1.- They are two.
## 2.- Both are directories.
## 3.- Both have same number of regular files and with identical names.
die qq[Usage: perl $0 <dir_AA> <dir_BB>\n] if
        @ARGV != 2 ||
        grep { ! -d } @ARGV;

{
        my %h;
        for ( glob join q[ ], map { qq[$_/*] } @ARGV ) {
                next unless -f;
                my $file = ( File::Spec->splitpath( $_ ) )[2] or next;
                $h{ $file }++;
        }

        for ( values %h ) {
                if ( $_ != 2 ) {
                        die qq[Different files in both directories\n];
                }
        }
}

## Get files from dir 'AA'. Process them, print to output lines which 
## matches condition and save the information in the %delete hash.
for my $file ( glob( shift . qq[/*] ) ) {
        open my $fh, q[<], $file or do { warn qq[Couldn't open file $file\n]; next };
        $file = ( File::Spec->splitpath( $file ) )[2] or do { 
                warn qq[Couldn't get file name from path\n]; next };
        while ( <$fh> ) {
                next if $. == 1;
                chomp;
                my @f = split;
                next unless @f >= 3;
                if ( $f[ $#f ] > 60 ) {
                        $delete{ $file } .= qq/-$f[1]-/;
                        printf qq[%s\n], $_;
                }
        }
}

## Process files found in dir 'BB'. For each line, print it if not found in
## file from dir 'AA'.
{
        @ARGV  = glob( shift . qq[/*] );
        $^I = q[.bak];
        while ( <> ) {

                ## Sanity check. Shouldn't occur.
                my $filename = ( File::Spec->splitpath( $ARGV ) )[2];
                if ( ! exists $delete{ $filename } ) {
                        close ARGV;
                        next;
                }

                chomp;
                my @f = split;
                if ( $delete{ $filename } =~ m/-$f[1]-/ ) {
                        next;
                }

                printf qq[%s\n], $_;
        }
}

exit 0;

A test:

Assume the following file tree. Command:

ls -R1

Output:

.:
AA
BB
script.pl

./AA:
ff.txt
gg.txt

./BB:
ff.txt
gg.txt

And the following file contents. Command:

head AA/*

Output:

==> AA/ff.txt <==
Name        number     marks
john            1         60
maria           2         54
samuel          3         62
ben             4         63
==> AA/gg.txt <==
Name        number     marks
john            1         70
maria           2         54
samuel          3         42
ben             4         33

Command:

head BB/*

Output:

==> BB/ff.txt <==
 marks       1      11.824  24.015  41.220  1.00 13.65
 marks       1      13.058  24.521  40.718  1.00 11.82
 marks       3      12.120  13.472  46.317  1.00 10.62
 marks       4      10.343  24.731  47.771  1.00  8.18
==> BB/gg.txt <==
 marks       1      11.824  24.015  41.220  1.00 13.65
 marks       2      13.058  24.521  40.718  1.00 11.82
 marks       3      12.120  13.472  46.317  1.00 10.62
 marks       4      10.343  24.731  47.771  1.00  8.18

Run the script like this:

perl script.pl AA/ BB

It prints the following to the screen:

samuel          3         62
ben             4         63
john            1         70

The files in the BB directory are modified as follows:

head BB/*

Output:

==> BB/ff.txt <==
 marks       1      11.824  24.015  41.220  1.00 13.65
 marks       1      13.058  24.521  40.718  1.00 11.82

==> BB/gg.txt <==
 marks       2      13.058  24.521  40.718  1.00 11.82
 marks       3      12.120  13.472  46.317  1.00 10.62
 marks       4      10.343  24.731  47.771  1.00  8.18

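So the lines with numbers 3 and 4 were deleted from ff.txt, and the line with number 1 was deleted from gg.txt, because each of those numbers has a value greater than 60 in the last column of the matching AA file. I think this is what you wanted to achieve. I hope it helps, although it is not awk.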

[Comments]: