【Question title】: Perl file reading has an issue when splitting on spaces
【Posted】: 2019-11-26 08:06:08
【Question description】:

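I am reading data from an input file. Whenever a line contains a date, I need to take the whole line and process it. In my example, Mem-Id is a unique value, and I want to create a hash keyed by Mem-Id. Based on the data shown here, these are the values that correspond to each field: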

Id -> 1
Mem-Id -> 1
Date & Time (+00:00) -> 2018-07-30T07:40:23
Priority -> LOW
Main Affected objects -> val/s1 val/s0;
Text -> Temperature exceded the limit

Here is my code:

#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;

my @data = <DATA> ;

foreach my $data_line ( @data ){
    chomp $data_line;
    if( $data_line =~ m/[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}/){
        my ($id, $mem_id, $date_time, $priority, $affected_obj, $text) = split(/\s+/, $data_line);
        print "$id, $mem_id, $date_time, $priority, $affected_obj, $text\n";
    }
}

__DATA__
.............
.............
.............
========================================================
Id Mem-Id Date & Time (+00:00) Priority Main Affected objects Text
========================================================
1 1 2018-07-30T07:40:23 LOW val/s1 val/s0; Temperature exceded the limit
======================================================== 
............
............

When I run the script above, I get the following incorrect output:

1, 1, 2018-07-30T07:40:23, LOW, val/s1, val/s0;

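Because the value of Main Affected objects contains a space, its two halves are assigned as separate values to the $affected_obj and $text variables.

How can I assign the correct values to $affected_obj and $text when splitting the data line on whitespace? The values I expect are: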

Main Affected objects = val/s1 val/s0;
Text = Temperature exceded the limit

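【Question discussion】:

  • @zdim Main Affected objects and Text always contain spaces in their field values.
  • Then how are they separated?
  • @choroba By a space or a semicolon.
  • If they are separated by spaces and also contain spaces, how do you tell where the boundary is?
  • @choroba I know that Main Affected objects and Text will be separated by a semicolon (;).

Tags: perl file-io split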


【Solution 1】:

I am a big fan of keeping things as simple as possible. I think you can do this with two calls to split().

#!/usr/bin/perl

use strict;
use warnings;
use feature 'say';

while (<DATA>) {
  chomp;  # drop the trailing newline so $text does not carry it
  # Split 1: split the text column off by looking for the semi-colon
  my ($rest, $text) = split /;\s*/;
  # Split 2: split the rest of the data on whitespace. But use a split 
  # limit (5) to stop the affected objects from being split apart.
  my ($id, $mem_id, $datetime, $priority, $affected) = split /\s+/, $rest, 5;

  say join ' | ', $id, $mem_id, $datetime, $priority, $affected, $text;
}

__DATA__
1 1 2018-07-30T07:40:23 LOW val/s1 val/s0; Temperature exceded the limit

Output:

1 | 1 | 2018-07-30T07:40:23 | LOW | val/s1 val/s0 | Temperature exceded the limit
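
The question also asks for a hash keyed by Mem-Id, which the code above does not build. The following is a minimal sketch of one way to get there with the same two-split idea. It assumes, as stated in the question's comments, that a semicolon always ends the Main Affected objects field. The helper name parse_record, the hash name %by_mem_id, and the hash keys are made up for this illustration. The input lines are held in an array instead of __DATA__ only to keep the sketch self-contained.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# parse_record() is a hypothetical helper, not part of the answer above.
# It returns a hash reference for a record line and undef for any other line.
sub parse_record {
    my ($line) = @_;
    chomp $line;

    # Only lines carrying a timestamp are records (same test as in the question).
    return unless $line =~ /[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}/;

    # Split 1: the semicolon ends the affected-objects field (assumption).
    my ($rest, $text) = split /;\s*/, $line, 2;

    # Split 2: a limit of 5 keeps "val/s1 val/s0" together in the last field.
    my ($id, $mem_id, $date_time, $priority, $affected) = split /\s+/, $rest, 5;

    return {
        id           => $id,
        mem_id       => $mem_id,
        date_time    => $date_time,
        priority     => $priority,
        affected_obj => $affected,
        text         => $text,
    };
}

# Sample lines copied from the question.
my @lines = (
    '========================================================',
    'Id Mem-Id Date & Time (+00:00) Priority Main Affected objects Text',
    '========================================================',
    '1 1 2018-07-30T07:40:23 LOW val/s1 val/s0; Temperature exceded the limit',
);

my %by_mem_id;
for my $line (@lines) {
    my $record = parse_record($line) or next;   # skip non-record lines
    $by_mem_id{ $record->{mem_id} } = $record;  # Mem-Id is unique, so it is the key
}

for my $mem_id (sort keys %by_mem_id) {
    print "$mem_id => $by_mem_id{$mem_id}{affected_obj} | $by_mem_id{$mem_id}{text}\n";
}
```

For the sample data this prints `1 => val/s1 val/s0 | Temperature exceded the limit`. If two records ever shared a Mem-Id, the later one would overwrite the earlier one in this sketch.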

【Discussion】:

  • Thanks @Dave. This is the simplest logic. Much appreciated.
【Solution 2】:

When in doubt, use a regular expression.
-- Benedict IX

#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;   # provides Dumper(), used below

my @data = <DATA> ;

my $matcher = qr/
    ^ (?<id>(?&token_id))            (?&splitter) 
      (?<mem_id>(?&token_id))        (?&splitter)
      (?<date>(?&token_date))        (?&splitter)     
      (?<priority>(?&token_prio))    (?&splitter)     
      (?<affected>(?&token_objects)) (?&splitter)     
      (?<text>(?&token_rest_of_line))

    (?(DEFINE)
        (?<splitter>   \x20        )   # blank
        (?<token_id>   \d++        )         
        (?<token_date> [0-9]{4} - [0-9]{2} - [0-9]{2} T [0-9]{2} : [0-9]{2} : [0-9]{2} )
        (?<token_prio> HI|LOW )
        (?<token_objects> [^;]++ ; ) # you can get more complex here if needed
        (?<token_rest_of_line> .+ $ )
    )
/x;

foreach my $data_line ( @data ){
    chomp $data_line;
    if( $data_line =~ $matcher ) {
        print Dumper( \%+ );
        # $VAR1 = {
        #   'affected' => 'val/s1 val/s0;',
        #   'priority' => 'LOW',
        #   'mem_id' => '1',
        #   'id' => '1',
        #   'date' => '2018-07-30T07:40:23',
        #   'text' => 'Temperature exceded the limit'
        # };
    }
}

__DATA__
.............
.............
.............
========================================================
Id Mem-Id Date & Time (+00:00) Priority Main Affected objects Text
========================================================
1 1 2018-07-30T07:40:23 LOW val/s1 val/s0; Temperature exceded the limit
======================================================== 
............
............

Edit:

For more information, see perlretut, especially the sections on named captures and named patterns.

If you are interested: Damian the Great explains why everything you knew about regular expressions is wrong.

【Discussion】:

  • This answer is very helpful. Could you elaborate on what the expression inside qr/.../x; means, and what (\%+) means when printing with Dumper?
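
As a partial answer to the comment above, here is a cut-down sketch of the same constructs. The /x flag lets the pattern contain whitespace and comments that are not part of the match, (?<name>...) is a named capture, (?&name) calls a sub-pattern by name, and %+ is the built-in hash holding the named captures of the last successful match, so \%+ is simply a reference to that hash. The pattern names number and prio and the helper match_fields are hypothetical and exist only for this illustration.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# With /x, whitespace and comments inside the pattern are ignored,
# so a literal blank has to be written explicitly, here as \x20.
my $matcher = qr/
    ^ (?<id>       (?&number) ) \x20   # named capture "id"
      (?<priority> (?&prio)   ) $      # named capture "priority"

    # Groups inside (?(DEFINE)...) are never matched in place;
    # they only define sub-patterns that (?&name) can call.
    (?(DEFINE)
        (?<number> \d+      )
        (?<prio>   HI | LOW )
    )
/x;

# match_fields() is a hypothetical helper for this sketch.
sub match_fields {
    my ($line) = @_;
    return unless $line =~ $matcher;

    # %+ is overwritten by the next successful match, so copy it.
    my %fields = %+;
    return \%fields;
}

my $fields = match_fields('1 LOW');
print "$_ = $fields->{$_}\n" for sort keys %$fields;
```

This prints `id = 1` and `priority = LOW`. The sub-patterns from the DEFINE block do not show up in %+, which is why the Dumper output in the answer lists only the six field names.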