【Question Title】: perl read file from specified line through the end
【Posted】: 2017-09-12 15:33:09
【Question】:

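I'm new to Perl. I'm trying to read a large comma-separated file, split it, and grab only some of the columns. With some help from the internet I managed to put this together, but I'm struggling to change the code so that it reads from a specific line through the end of the file. What I need is to open the file, start reading at line 12, split each line on ',', take columns 0, 2, 10 and 11, and join those columns with '\t'.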

Here is my code:

#!/usr/bin/perl
use strict;
use warnings;

my $filename = 'file_to_read.csv';
open(my $in,  '<', $filename)       or die "Could not read $filename: $!";
open(my $out, '>', "$filename.txt") or die "Couldn't create $filename.txt: $!";
while (<$in>) {
  chomp;
  my @fields = split(',', $_);
  print $out "$fields[0]\t$fields[3]\t$fields[10]\t$fields[11]\n";
}
close $in;
close $out;

Here is a sample of the file:

[Header]
GSGT Version: X
Processing Date:12/01/2010 7:20 PM
Content:
Num SNPs:
Total SNPs:
Num Samples:
Total Samples:
Sample:
[Data]

SNP Name,Chromosome,Pos,GC Score,Theta,R,X,Y,X Raw,Y Raw,B Allele Freq,Log R Ratio,Allele1 - TOP,Allele2 - TOP
1:10001102-G-T,1,10001102,0.4159,0.007,0.477,0.472,0.005,6281,126,0.0000,-0.2581,A,A
1:100011159-T-G,1,100011159,0.4259,0.972,0.859,0.036,0.822,807,3648,0.9942,-0.0304,C,C
1:10002775-GA,1,10002775,0.4234,0.977,1.271,0.043,1.228,809,5140,0.9892,0.0111,G,G

【Discussion】:

Tags: perl


【Answer 1】:

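Rather than jumping to a specific line number (which may vary from file to file), it is better to keep track of the current section of the file, as marked by [Header], [Data] and so on.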

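This solution keeps a state variable $section, which is updated to the current section name each time a [Section] tag is encountered in the file. Every line in the Data section is split and the selected fields are printed.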

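Something similar could be done with the column headers, using names instead of numbers to select the fields to output, but I chose to keep the complexity down.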

use strict;
use warnings 'all';
use feature 'say';

my $filename = 'file_to_read.csv';

open my $fh, '<', $filename or die qq{Unable to open "$filename" for input: $!};

my $section = "";

while ( <$fh> ) {

    next unless /\S/;            # Skip empty lines

    if ( $section eq 'Data' ) {  # Only process lines inside the [Data] section
        chomp;
        my @fields = split /,/;
        say join ',', @fields[0,3,10,11];
    }
    elsif ( /\[(\w+)\]/ ) {
        $section = $1;
    }
}

Output

SNP Name,GC Score,B Allele Freq,Log R Ratio
1:10001102-G-T,0.4159,0.0000,-0.2581
1:100011159-T-G,0.4259,0.9942,-0.0304
1:10002775-GA,0.4234,0.9892,0.0111
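The answer mentions that fields could be selected by column name rather than by index. A minimal sketch of that idea follows, under stated assumptions: the helper name extract_columns_by_name is hypothetical, the first non-blank line after [Data] is assumed to be the column header row, and an in-memory filehandle built from the sample data stands in for the real file.

```perl
use strict;
use warnings;

# Hypothetical helper: reads a file section by section (like the answer above),
# treats the first non-blank line in [Data] as the column header row (assumption),
# and returns the requested columns, looked up by name, for every data row.
sub extract_columns_by_name {
    my ($fh, @wanted) = @_;

    my $section = '';
    my %index_of;    # column name => column index, filled from the header row
    my @rows;

    while ( my $line = <$fh> ) {
        $line =~ s/\r?\n\z//;               # strip LF or CRLF line ending
        next unless $line =~ /\S/;          # skip blank lines

        if ( $section eq 'Data' ) {
            my @fields = split /,/, $line;
            if ( !%index_of ) {             # header row not seen yet: this is it
                @index_of{@fields} = 0 .. $#fields;
                for my $name (@wanted) {
                    die qq{Column "$name" not found\n} unless exists $index_of{$name};
                }
                next;
            }
            # Hash slice gives the indices, array slice picks the fields
            push @rows, [ @fields[ @index_of{@wanted} ] ];
        }
        elsif ( $line =~ /^\[(\w+)\]/ ) {
            $section = $1;                  # entering a new [Section]
        }
    }
    return @rows;
}

# Demonstration on part of the sample file, read from an in-memory filehandle
my $sample = <<'END';
[Header]
GSGT Version: X
[Data]

SNP Name,Chromosome,Pos,GC Score,Theta,R,X,Y,X Raw,Y Raw,B Allele Freq,Log R Ratio,Allele1 - TOP,Allele2 - TOP
1:10001102-G-T,1,10001102,0.4159,0.007,0.477,0.472,0.005,6281,126,0.0000,-0.2581,A,A
1:100011159-T-G,1,100011159,0.4259,0.972,0.859,0.036,0.822,807,3648,0.9942,-0.0304,C,C
END

open my $in, '<', \$sample or die "Cannot open in-memory file: $!";
my @rows = extract_columns_by_name( $in, 'SNP Name', 'GC Score', 'B Allele Freq', 'Log R Ratio' );
print join( "\t", @$_ ), "\n" for @rows;    # tab-separated, as the question asked
```

Looking columns up by name means the code keeps working if the column order changes between files, at the cost of failing loudly when a requested column is missing.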

【Comments】:

【Answer 2】:

Declare a variable to count the lines processed, e.g. my $line_count = 0;

Increment it at the start of the while loop: $line_count++;

Skip the line while the count is below 12, i.e. next if $line_count < 12;
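Put together, the counting steps above might look like the following sketch. The sub name extract_from_line is hypothetical; the start line 12 and the column list 0, 3, 10, 11 are taken from the question's code, and the tab join is what the question asked for. An in-memory filehandle holding the question's sample stands in for the real file.

```perl
use strict;
use warnings;

# Hypothetical helper: skip every line before $start_line, then split each
# remaining line on commas and join the chosen columns with tabs.
sub extract_from_line {
    my ($fh, $start_line, @columns) = @_;

    my $line_count = 0;
    my @out;
    while ( my $line = <$fh> ) {
        $line_count++;                        # 1-based number of the line just read
        next if $line_count < $start_line;    # skip lines before the start line
        chomp $line;
        my @fields = split /,/, $line;
        push @out, join "\t", @fields[@columns];
    }
    return @out;
}

# The question's sample: line 12 is the column header row
my $sample = <<'END';
[Header]
GSGT Version: X
Processing Date:12/01/2010 7:20 PM
Content:
Num SNPs:
Total SNPs:
Num Samples:
Total Samples:
Sample:
[Data]

SNP Name,Chromosome,Pos,GC Score,Theta,R,X,Y,X Raw,Y Raw,B Allele Freq,Log R Ratio,Allele1 - TOP,Allele2 - TOP
1:10001102-G-T,1,10001102,0.4159,0.007,0.477,0.472,0.005,6281,126,0.0000,-0.2581,A,A
END

open my $in, '<', \$sample or die "Cannot open in-memory file: $!";
my @lines = extract_from_line( $in, 12, 0, 3, 10, 11 );
print "$_\n" for @lines;
```

Perl also keeps the current input line number in the built-in variable $., so the manual counter could be dropped in favour of next if $. < $start_line; inside the loop. Either way, the hard-coded line number breaks if the header block changes length, which is the weakness Answer 1 avoids.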

【Comments】:
