perl - fetch column names from file (answers)

【Question title】: perl - fetch column names from file
【Posted】: 2017-08-07 08:55:58
【Question】:

I have the following commands in my Perl script:

my @files = `find $basedir/ -type f -iname '$sampleid*.summary.csv'`; #there are multiple summary.csv files in my basedir. I store them in an array
my $summary = `tail -n 1 $files[0]`; #Each summary.csv contains a header line and a line with data. I fetch here the last line.
chomp($summary);
my @sp = split(/,/,$summary); # I split based on ','
my $gender = $sp[11]; # the value at index 11 (zero-based, i.e. the 12th column) is stored in $gender
my $qc = $sp[2]; # the value at index 2 (the 3rd column) is stored in $qc

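Now I have run into the situation where my *summary.csv files have different numbers of columns. They all have 2 lines, the first of which is the header.

Instead of storing the value from column 11 in $gender, I now want to store the value from the "Gender" column in $gender.

How can I do this?

My first attempt at a solution: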

my %hash = ();
my $header = `head -n 1 $files[0]`; #reading the header
chomp ($header);
my @colnames = split (/,/,$header);
my $keyfield = $colnames[#here should be the column with the name 'Gender']
push @{ $hash{$keyfield} };
my $gender = $sp[$keyfield]

【Question comments】:

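You have to fetch the header line and the line you want. Then split both, build a hash with the headers as keys, and read the value for the key you need. Your code looks more like a shell script written in Perl than a Perl program. Why not open and read the file directly in Perl, or just use Text::CSV?
@simbabque: I am adapting an existing script that I did not write from scratch myself. I have added a first attempt at a solution to the question. However, I am stuck on "build a hash with the headers as keys".
You are not using a hash.
See my updated answer.
@simbabque: Yes, I did. I accepted your answer.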

Tags: perl

【Solution 1】:

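You have to read the header line as well as the data line to know which column holds which information. This is easiest if you write actual Perl code instead of calling out to various command-line utilities. See below for that solution.

Fixing your own solution also requires a hash. You need to read the header line first, store the header fields in an array (as you have already done), and then read the data line. The data has to go into a hash, not an array. A hash is a mapping of keys to values.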

# read the header and create a list of header fields
my $header = `head -n 1 $files[0]`;
chomp ($header);
my @colnames = split (/,/,$header);

# read the data line
my $summary = `tail -n 1 $files[0]`;
chomp($summary);

my %sp; # use a hash for the data, not an array

# use a hash slice to fill in the columns
@sp{@colnames} = split(/,/,$summary);

my $gender = $sp{Gender};

The tricky part here is this line:

@sp{@colnames} = split(/,/,$summary);

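We have declared %sp as a hash, but now we access it with the @ sigil. That is because we are using a hash slice, as indicated by the curly braces {}. The slice we take is all the elements whose keys are the values in @colnames. Since there is more than one value, the result is no longer a scalar (with $) but a list, so the sigil becomes @. We then use that list on the left-hand side of the assignment (an lvalue) and assign the result of split to it.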
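To see the slice in isolation, here is a small self-contained sketch. The header and data strings are hypothetical stand-ins for what `head -n 1` and `tail -n 1` would return. It also shows that the same slice syntax can build a name-to-index map, which is one way to fill in the `$colnames[...]` placeholder from the question's first attempt while keeping the `@sp` array.

```perl
use strict;
use warnings;

# Hypothetical sample lines standing in for `head -n 1` and `tail -n 1`
my $header  = "SampleID,QC,Gender,Reads";
my $summary = "S1,PASS,female,123456";

my @colnames = split /,/, $header;

# Hash slice: each split field is stored under the header name at the same position
my %sp;
@sp{@colnames} = split /,/, $summary;
print "Gender via hash slice: $sp{Gender}\n";

# The same slice syntax builds a column-name => index map
my %index;
@index{@colnames} = 0 .. $#colnames;
my @fields = split /,/, $summary;
print "Gender via index map:  $fields[$index{Gender}]\n";
```

Both lookups go by column name, so they keep working when the column order or count changes between files.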

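Doing it with modern Perl

The following program uses File::Find::Rule to replace your find command and Text::CSV to read the CSV files. It collects all matching files and then opens them one at a time. The header line is read first and handed to the Text::CSV object as the column names; after that, getline_hr returns a hash reference that you can use to access each field by name.

I have written it so that it reads only one data line per file, since you said each file has only two lines. You can easily turn that into a loop.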

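use strict;
use warnings;
use File::Find::Rule;
use Text::CSV;

# base directory and sample ID; taken from the command line here
my ($basedir, $sampleid) = @ARGV;
die "Usage: $0 BASEDIR SAMPLEID\n"
    unless defined $basedir && defined $sampleid;

my $csv = Text::CSV->new(
    {
        binary   => 1,
        sep_char => ',',
    }
) or die "Cannot use CSV: " . Text::CSV->error_diag;

# case-insensitive match on the file name, like find's -iname
my @files = File::Find::Rule->file()
    ->name(qr/^\Q$sampleid\E.*\.summary\.csv$/i)
    ->in($basedir);

foreach my $file (@files) {
    open my $fh, '<', $file or die "Can't open $file: $!";

    # get the headers and use them as the column names
    my $cols = $csv->getline($fh) or die "No header line in $file\n";
    $csv->column_names(@$cols);

    # read the first data line as a hash reference keyed by column name
    my $row = $csv->getline_hr($fh);
    close $fh;
    next unless $row;

    # do whatever you want with the row; keys match the header text exactly
    print "$file: ", $row->{Gender} // '', "\n";
}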

Note that I have not tested this program.
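One concrete reason to prefer a real CSV parser over `split /,/` is a quoted field that contains a comma: the naive split produces an extra field and every later column shifts. The sketch below shows this with hypothetical data, using only core Perl. The quote-aware variant uses `parse_line` from the core module Text::ParseWords as a stand-in; it is not what the answer uses, and Text::CSV, as in the program above, is the more complete tool for real CSV files.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);   # core module, used here only for illustration

# Hypothetical summary data: the Name field contains a quoted comma
my @colnames = split /,/, "SampleID,Name,Gender";
my $summary  = 'S1,"Smith, J",M';

# Naive split: the quoted comma creates an extra field, so Gender gets the wrong value
my %naive;
@naive{@colnames} = split /,/, $summary;
print "split:      Gender = $naive{Gender}\n";

# Quote-aware split: parse_line(DELIMITER, KEEP, LINE); KEEP = 0 strips the quotes
my %parsed;
@parsed{@colnames} = parse_line(',', 0, $summary);
print "parse_line: Gender = $parsed{Gender}\n";
```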

【Comments】: