Read multiple files in a directory and compare with another file: answers

【Question title】: Read multiple files in a directory and compare with another file
【Posted】: 2012-08-31 23:11:55
【Question description】:

I have two files.

File 1 in the directory being read has the following format:

Read 1 A T
Read 3 T C
Read 5 G T
Read 7 A G
Read 10 A G
Read 12 C G

File 2 in the directory contains:

    Read 5 A G
    Read 6 T C
    Read 7 G A
    Read 8 G A
    Read 20 A T

File2 contains

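I need to first read the positions in File2 and then print the corresponding values from each file opened in the directory, laid out horizontally. If a position has no match, print "-". The output for the above should be: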

     1 2 3 4 5 6 7
Read T - C - T - G
Read - - - - G C A

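I need to do this for all the files, printing each one on its own line in the format above. So there will be a single output file, with as many rows as there are input files. Can I do this easily in Perl?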

【Comments on the question】:

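This looks like homework. Please show what you have already tried; people will be more likely to help you.
This is not homework at all. I am new to bioinformatics and trying to learn the language at my workplace. I am also not looking for a complete solution; I just need a direction. I read the second file with the positions, then open the directory, open the files one by one, and push their contents onto an array. After that I don't know how to compare the positions with the opened file. Also, how do I print them horizontally?
You have three input files, two of which are called "File2". It isn't obvious how the input maps to the desired output. For example, why does column 1 contain "-T" (or am I not even reading it correctly)? Does the (first) File2 always override the input from File1, as it appears to in column 5 of the output? Please edit your question.
(Oops, I meant the column 5 observation the other way round.)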

Tags: linux perl unix operating-system

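【Solution 1】:

As far as I can tell, you only use the second data column. Here is a simple Perl program; feel free to ask if anything is unclear. I added a third input file, and any number of files can be used. I changed the format file to include 42 at the end.

Code: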

#!/usr/bin/env perl

use strict;
use warnings;
use autodie;

# try to open format file
my $ffn = shift @ARGV or die "you didn't provide a format file name!\n";
open my $ffh, '<', $ffn;

# read format file
my @format = <$ffh>;
close $ffh;
chomp for @format; # get rid of newlines

# prepare output
print '     ' . join(' ' => @format) . "\n";

# iterate over all .txt files in the data directory
foreach my $data_fn (<data/*.txt>) {

    # iterate over all lines of the data file
    open my $data_fh, '<', $data_fn;
    my %data = ();
    foreach my $line (<$data_fh>) {

        # parse input lines (only)
        next unless $line =~ /Read (\d+) ([ACGT]) ([ACGT])/;
        my ($pos, $first, $second) = ($1, $2, $3);

        # store data
        $data{$pos} = $second;
    }

    # print summary
    print 'Read ' . join(' ' => map {$data{$_} // '-'} @format) . "\n";
}

Output:

$ perl bio.pl format.txt
     1 2 3 4 5 6 7 42
Read T - C - T - G -
Read - - - - G C A -
Read - C - T - - - A
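
The lookup step can also be looked at in isolation. The following is only a minimal sketch: the sub name `row_for` and the inline sample data are illustrative assumptions and not part of the program above. It builds a hash from position to second base, then walks the wanted positions in order and prints "-" wherever the hash has no entry.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not in the program above): build one output row
# from the lines of a data file and the list of wanted positions.
sub row_for {
    my ($lines, $positions) = @_;
    my %data;
    for my $line (@$lines) {
        # keep only lines of the form "Read <pos> <base> <base>"
        next unless $line =~ /Read\s+(\d+)\s+([ACGT])\s+([ACGT])/;
        $data{$1} = $3;    # position => second base
    }
    # exists() is used instead of //, so this also runs on perls older than 5.10
    return join ' ', map { exists $data{$_} ? $data{$_} : '-' } @$positions;
}

# Sample data copied from File 1 in the question
my @file1 = ('Read 1 A T', 'Read 3 T C', 'Read 5 G T',
             'Read 7 A G', 'Read 10 A G', 'Read 12 C G');
my @positions = (1 .. 7);

print 'Read ', row_for(\@file1, \@positions), "\n";
```

Because the row is built by mapping over the position list rather than over the data, the columns always line up with the header, whatever order the data file is in.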

HTH! :)

【Comments】:

【Solution 2】:

If the files are small, you can read them into memory:

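#!/usr/bin/env perl
use strict;
use warnings;

# read input files
my $file1_data = {};
open(my $file1_fh, "<", "/path/file1.data") or die $!;
# read file1
while (my $line = <$file1_fh>) {
  chomp($line);
  # split ' ' ignores leading whitespace, unlike split(/ /, ...)
  my ($read, $pos, $col1, $col2) = split(' ', $line);
  next unless defined $col2;  # skip blank or malformed lines
  $file1_data->{$pos} = [$col1, $col2];
}
close($file1_fh);
# read file2
my $file2_data = {};
open(my $file2_fh, "<", "/path/file2.data") or die $!;
while (my $line = <$file2_fh>) {
  chomp($line);
  my ($read, $pos, $col1, $col2) = split(' ', $line);
  next unless defined $col2;
  $file2_data->{$pos} = [$col1, $col2];
}
close($file2_fh);
# read pos file; "/path/positions.data" is a placeholder path
my @positions;
open(my $posfile_fh, "<", "/path/positions.data") or die $!;
while (my $line = <$posfile_fh>) {
  chomp($line);
  # accepts one position per line or several separated by whitespace
  push(@positions, split(' ', $line));
}
close($posfile_fh);
# header row: the positions
print join("\t", @positions), "\n";
# one row per data file; index 1 is the second base column,
# which is the column the expected output in the question uses
foreach my $data ($file1_data, $file2_data) {
  print join("\t", map { exists $data->{$_} ? $data->{$_}->[1] : "-" } @positions), "\n";
}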

【Comments】: