Perl: find similar elements in two arrays (answer)

[Question title]: Perl find similar elements from two arrays
[Posted]: 2016-12-21 18:48:31
[Question]:

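I want to retrieve the elements of the @amplicon_exon array that contain an element "like" one in the @failedamplicons array. Each element in @failedamplicons is unique and can match only a single element in @amplicon_exon. I tried two for loops, but I got duplicate values. Is there a better way to find and retrieve similar values from two arrays?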

@failedamplicons: example:
OCP1_FGFR3_8.87
OCP1_AR_14.89

@amplicon_exon: example:
TEST_Focus_ERBB2_2:22:ERBB2:GENE_ID=ERBB2;PURPOSE=CNV,Hotspot;CNV_ID=ERBB2;CNV_HS=1
OCP1_FGFR3_8:intron:FGFR3:GENE_ID=FGFR3;PURPOSE=CNV;CNV_ID=FGFR3;CNV_HS=1
OCP1_CDK6_14:intron:CDK6:GENE_ID=CDK6;PURPOSE=CNV;CNV_ID=CDK6;CNV_HS=1
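To make the record layout in these examples explicit, here is a small sketch that splits one sample from each array. It assumes, as the question's code does, that the text after the `.` in a failed amplicon is a depth and that the first `:`-separated field of an @amplicon_exon line is the amplicon ID; the variable names are illustrative only.

```perl
use strict;
use warnings;

my $failed = 'OCP1_FGFR3_8.87';
my $exon   = 'OCP1_FGFR3_8:intron:FGFR3:GENE_ID=FGFR3;PURPOSE=CNV;CNV_ID=FGFR3;CNV_HS=1';

# Failed amplicon: "<amplicon ID>.<number>"; the question treats the number as depth.
my ( $amp_id, $depth ) = split /\./, $failed, 2;

# Exon record: ':'-separated fields; field 0 is the amplicon ID.
my ( $exon_id, $region, $gene, $attrs ) = split /:/, $exon;

print "$amp_id | $depth\n";    # OCP1_FGFR3_8 | 87
print "$exon_id | $region | $gene | $attrs\n";
print $amp_id eq $exon_id ? "same amplicon\n" : "different amplicon\n";
```

Seen this way, "similar" boils down to comparing the part before `.` with field 0 of the exon line.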

Here is the code with the two for loops:

my $i = 0;
my $j = 0;

for ( $i = 0; $i < @amplicon_exon; $i++ ) {

    for ( $j = 0; $j < @failedamplicons; $j++ ) {

        my $fail_amp = ( split /\./, $failedamplicons[$j] )[0];

        #print "the failed amp before match is $fail_amp\n";

        if ( index( $amplicon_exon[$i], $fail_amp ) != -1 ) {

            #print "the amplicon exon that matches $amplicon_exon[$i] and sample is $sample_id\n";
            print "the failed amp that matches $fail_amp and sample is $sample_id\n";

            my @parts = split /:/, $amplicon_exon[$i];
            my $exon_amp = $parts[1];

            next unless $parts[3] =~ /Hotspot/;    #includes only Hotspot amplicons
            my $gene_res   = $parts[2];
            my $depth      = ( split /\./, $failedamplicons[$j] )[1];
            my @total_amps = (
                $run_name, $sample_id, $gene_res, $depth, $fail_amp, $run_date, $matrix_status
            );

            my $lines = join "\t", @total_amps;

            push( @finallines, $lines );
        }
    }
}
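One plausible source of extra matches (an assumption, since the full data is not shown): `index` is a plain substring test, so if one amplicon ID can be a prefix of another, a single failed amplicon matches several lines. The sketch below uses an invented ID, `OCP1_AR_1`, next to `OCP1_AR_14` to show the difference between `index` and a regex anchored at the start of the line and terminated by `:`.

```perl
use strict;
use warnings;

# Hypothetical data: OCP1_AR_1 is invented to show a prefix collision.
my @amplicon_exon = (
    'OCP1_AR_1:intron:AR:GENE_ID=AR;PURPOSE=CNV;CNV_ID=AR;CNV_HS=1',
    'OCP1_AR_14:intron:AR:GENE_ID=AR;PURPOSE=CNV;CNV_ID=AR;CNV_HS=1',
);
my $fail_amp = 'OCP1_AR_1';

# Substring test, as in the question: also hits OCP1_AR_14.
my @by_index = grep { index( $_, $fail_amp ) != -1 } @amplicon_exon;

# Anchored test: the ID must be the whole first ':'-separated field.
my @by_regex = grep { /^\Q$fail_amp\E:/ } @amplicon_exon;

print scalar(@by_index), " match(es) with index\n";            # 2
print scalar(@by_regex), " match(es) with anchored regex\n";   # 1
```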

[Discussion]:

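Could you give a precise criterion for what counts as "similar"?
The amplicon_exon element must contain the complete string of the failedamplicons element up to the ".". For example, OCP1_FGFR3_8:intron:FGFR3:GENE_ID=FGFR3;PURPOSE=CNV;CNV_ID=FGFR3;CNV_HS=1 contains OCP1_FGFR3_8. Thanks
@user3781528: I have tidied your Perl code so that I can read it. Please post readable code in future.
@user3781528 FYI: a quick way to tidy code is to install Perl::Tidy. It comes with a program, perltidy, that you can use to tidy your Perl scripts. metacpan.org/pod/distribution/Perl-Tidy/lib/Perl/Tidy.pod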

Tags: arrays perl

[Solution 1]:

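split and grep are your friends, as is the idiomatic way of iterating over a list. Just loop over the first array and extract only the part you want to match (use split to break the element on the . character and keep only the first entry), then grep the second array with a regex that matches that string from the start of the element up to the `:`: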

for my $elem (@failedamplicons){
    my $to_match = (split /\./, $elem)[0];
    if (my ($matched) = grep {$_ =~ /^\Q$to_match:/} @amplicon_exon){
        print "$matched\n";
    }        
}
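A different technique, not the one used in this answer: build a hash keyed by the amplicon ID once, then do one exact lookup per failed amplicon. This sketch assumes the ID is always the first `:`-separated field and that IDs are unique in @amplicon_exon (with duplicates, the last line wins). It uses the sample data from the question; the `@matched` structure is illustrative.

```perl
use strict;
use warnings;

# Sample data taken from the question.
my @failedamplicons = ( 'OCP1_FGFR3_8.87', 'OCP1_AR_14.89' );
my @amplicon_exon   = (
    'TEST_Focus_ERBB2_2:22:ERBB2:GENE_ID=ERBB2;PURPOSE=CNV,Hotspot;CNV_ID=ERBB2;CNV_HS=1',
    'OCP1_FGFR3_8:intron:FGFR3:GENE_ID=FGFR3;PURPOSE=CNV;CNV_ID=FGFR3;CNV_HS=1',
    'OCP1_CDK6_14:intron:CDK6:GENE_ID=CDK6;PURPOSE=CNV;CNV_ID=CDK6;CNV_HS=1',
);

# Index @amplicon_exon once by its first ':'-separated field (the amplicon ID).
my %exon_by_id;
for my $line (@amplicon_exon) {
    my ($id) = split /:/, $line, 2;
    $exon_by_id{$id} = $line;
}

# One exact hash lookup per failed amplicon, so each matches at most one line.
my @matched;
for my $failed (@failedamplicons) {
    my ( $id, $depth ) = split /\./, $failed, 2;
    next unless exists $exon_by_id{$id};
    # The question's extra Hotspot filter on field 3 would go here.
    push @matched, [ $id, $depth, $exon_by_id{$id} ];
}

print "$_->[0]\t$_->[1]\t$_->[2]\n" for @matched;
```

With the sample data only OCP1_FGFR3_8 is found, since OCP1_AR_14 has no line in @amplicon_exon. The design trades a little memory for linear time instead of the nested loops' quadratic scan, and exact key equality rules out the prefix matches that a substring test allows.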

[Discussion]:

I changed the regex so that any metacharacters in $to_match are escaped. For more details see perldoc.perl.org/perlre.html#Quoting-metacharacters
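To illustrate why the escaping matters, here is a small sketch with a hypothetical ID containing `+` (the question's IDs do not, so this is purely an assumption for demonstration). Without `\Q`, the `+` becomes a quantifier. The wrong line then matches and the right one does not.

```perl
use strict;
use warnings;

# A hypothetical ID containing a regex metacharacter ('+').
my $to_match = 'AMP+1';
my @lines    = ( 'AMP+1:intron:X', 'AMPP1:intron:X' );

# Unquoted: /^AMP+1:/ means "AM, one or more P, then 1:".
my @unquoted = grep { /^$to_match:/ } @lines;

# Quoted with \Q...\E (the same as quotemeta): '+' is matched literally.
my @quoted = grep { /^\Q$to_match\E:/ } @lines;

print "unquoted: @unquoted\n";    # unquoted: AMPP1:intron:X
print "quoted:   @quoted\n";      # quoted:   AMP+1:intron:X
```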