How to search for lines in a file between two timestamps using Perl? Answers

【Question title】: How to search for lines in a file between two timestamps using Perl?
【Posted】: 2011-03-09 06:31:46
【Question】:

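In Perl, I am trying to read a log file and print only the lines whose timestamp falls between two specific times. The time format is hh:mm:ss, and it is always the third value on each log line. For example, I would search for lines between 12:52:33 and 12:59:33.

I am new to Perl and don't know which route to take to even start programming this. I'm fairly sure it will involve some kind of regular expression, but for the life of me I can't begin to work out what that would be. Can someone help me with this?

Also, to make this harder, I have to stick to core Perl modules, because my company won't let me use any others until they have been tested and verified not to adversely affect any system the script may interact with.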

【Comments】:

What do you mean by the third value? The third field?
Can you post an example from the log? That would help in solving the problem.

Tags: regex perl timestamp

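【Solution 1】:

In pseudocode, you would do something like this:

Read the file line by line:
- Parse the timestamp for this line.
- If it is less than the start time, skip to the next line.
- If it is greater than the end time, skip to the next line!
- Otherwise: this is a line you want, so print it out.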

This may be too advanced for your needs, but the flip-flop operator .. immediately comes to mind as being useful here.
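To make the flip-flop operator less mysterious, here is a minimal sketch of how it behaves on an in-memory list. The sample lines and the helper name lines_between are made up for illustration, since the question does not show the real log format. In scalar context, `A .. B` turns true on the line where A first matches and stays true up to and including the line where B matches.

```perl
use strict;
use warnings;

# Hypothetical sample lines; the real log format is not given in the question.
my @lines = (
    "a b 12:50:00 before",
    "a b 12:52:33 start",
    "a b 12:55:00 middle",
    "a b 12:59:33 end",
    "a b 13:00:00 after",
);

# Return the lines from the first match of $from through the first
# later match of $to, inclusive, using the flip-flop operator.
# Note: each .. keeps its own state, even across calls to this sub,
# so a range left open by one call carries over into the next call.
sub lines_between {
    my ($from, $to, @input) = @_;
    my @out;
    for (@input) {
        push @out, $_ if /\Q$from\E/ .. /\Q$to\E/;
    }
    return @out;
}

print "$_\n" for lines_between('12:52:33', '12:59:33', @lines);
```

This prints the "start", "middle" and "end" lines. It only works when both exact timestamps occur in the input.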

To read the file from standard input (or from files named on the command line), this is the usual pattern:

while (my $line = <>)
{
     # do stuff...
}

Lines are easily parsed into fields with split (see perldoc -f split). Depending on the format, you may need to split the line on tabs or on spaces.
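As a small illustration of split, the sketch below pulls out the third whitespace-separated field. The sample lines and the name third_field are assumptions made for this example; adjust the delimiter if the real log uses something else.

```perl
use strict;
use warnings;

# Return the third whitespace-separated field of a line, or undef if
# the line has fewer than three fields. (third_field is a name made up
# for this sketch.)
sub third_field {
    my ($line) = @_;
    # split ' ' (a single-space string) skips leading whitespace and
    # splits on runs of spaces or tabs, the way awk does.
    my @fields = split ' ', $line;
    return $fields[2];
}

# Hypothetical log lines, one space-separated and one tab-separated.
for my $line ("host1 app 12:52:43 something happened\n",
              "host1\tapp\t12:52:43\tsomething happened\n") {
    my $ts = third_field($line);
    print "$ts\n";
}
```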

Once you have the specific field (the one containing the timestamp), you can check it with a custom regular expression. Read about those in perldoc perlre.
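One way such a check might look is sketched below. It assumes the field is a strictly two-digit hh:mm:ss value; parse_hms is a hypothetical helper name, not something from the question.

```perl
use strict;
use warnings;

# Return (hour, minute, second) if $field looks like a valid hh:mm:ss
# time, or an empty list otherwise. (parse_hms is a made-up name.)
sub parse_hms {
    my ($field) = @_;
    return unless defined $field;
    # \A and \z anchor the pattern to the whole field.
    my ($h, $m, $s) = $field =~ /\A(\d{2}):(\d{2}):(\d{2})\z/
        or return;
    return if $h > 23 || $m > 59 || $s > 59;
    return ($h, $m, $s);
}

for my $field ('12:52:33', '12:61:00', 'ERROR') {
    my @hms = parse_hms($field);
    print "$field => ", (@hms ? "ok" : "not a time"), "\n";
}
```

Call it in list context, as above, so that an empty result can be told apart from a valid time.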

Here is something that may get you closer:

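use strict;
use warnings;

use POSIX 'mktime';

# POSIX::mktime needs at least sec, min, hour, mday, mon, year, so pin
# every time to the same arbitrary date (1 January 2000) for comparison.
my @date = (1, 0, 100);
my $starttime = mktime(33, 52, 12, @date);
my $endtime = mktime(33, 59, 12, @date);

while (my $line = <>)
{
    # split into fields using whitespace as the delimiter
    my @fields = split(/\s+/, $line);

    # the timestamp is the 3rd field; skip lines that do not have one
    my $timestamp = $fields[2];
    next unless defined $timestamp && $timestamp =~ /^\d+:\d+:\d+$/;

    my ($hour, $min, $sec) = split(':', $timestamp);
    my $time = mktime($sec, $min, $hour, @date);

    # skip lines outside the window (both bounds are inclusive)
    next if $time < $starttime || $time > $endtime;
    print $line;
}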

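【Discussion】:

If you want O(log N) instead of O(N), you can use a binary search rather than reading every line (assuming the log file is sorted by timestamp).
A task like this is a great fit for the flip-flop operator.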
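The binary search idea from the comment above can be sketched on an in-memory array of lines. This is only an illustration under several assumptions: the lines are already sorted by time, the third whitespace-separated field is a zero-padded hh:mm:ss (so string comparison matches time order), and the helper names are made up. Loading the file into an array is itself O(N), so a real O(log N) version would have to seek within the file instead.

```perl
use strict;
use warnings;

# Third whitespace-separated field, assumed to be a zero-padded hh:mm:ss.
sub time_of {
    my @fields = split ' ', $_[0];
    return $fields[2];
}

# Index of the first line whose timestamp is >= $target, or
# scalar(@$lines) if there is none. Requires @$lines sorted by time.
sub lower_bound {
    my ($lines, $target) = @_;
    my ($lo, $hi) = (0, scalar @$lines);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if (time_of($lines->[$mid]) lt $target) { $lo = $mid + 1 }
        else                                    { $hi = $mid }
    }
    return $lo;
}

# All lines with $start <= timestamp <= $end.
sub lines_in_range {
    my ($lines, $start, $end) = @_;
    my @out;
    for my $i (lower_bound($lines, $start) .. $#$lines) {
        last if time_of($lines->[$i]) gt $end;   # past the window: stop
        push @out, $lines->[$i];
    }
    return @out;
}

# Hypothetical sorted log lines.
my @log = (
    "h a 12:50:00 x",
    "h a 12:52:40 y",
    "h a 12:59:33 z",
    "h a 13:00:00 w",
);
print "$_\n" for lines_in_range(\@log, '12:52:33', '12:59:33');
```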

【Solution 2】:

If the start and end times are known, then what you need is a Perl one-liner with the flip-flop operator:

perl -ne 'print if /12:52:33/../12:59:33/' logFile

If some basic logic is needed to determine the start and end times, "unroll" the one-liner into a proper script:

use strict;
use warnings;

open my $log, '<', 'logFile' or die "Cannot open logFile: $!";

my $startTime = get_start_time();  # Sets $startTime in hh:mm:ss format
my $endTime = get_end_time();      # Sets $endTime in hh:mm:ss format

while ( <$log> ) {

    print if /$startTime/../$endTime/;
}

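As Ether's comment points out, this fails if the exact times are not present in the log. If that is a possibility, the following logic can be implemented instead: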

use strict;
use warnings;

open my $log, '<', 'logFile' or die "Cannot open logFile: $!";

my $startTime = get_start_time();  # Sets $startTime in hh:mm:ss format
my $endTime = get_end_time();      # Sets $endTime in hh:mm:ss format

while ( <$log> ) {

    my $time = (split /,/, $_)[2];      # Assuming fields are comma-separated
                                        # and timelog is 3rd field

    last  if $time gt $endTime;         # Stop when stop time reached
    print if $time ge $startTime;
}
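The string comparisons with ge and gt above agree with chronological order only when every timestamp is zero-padded to a fixed width. If the log might contain values such as 9:05:00, a normalization step along these lines could be applied first. This is a sketch, and normalize_hms is a hypothetical helper name.

```perl
use strict;
use warnings;

# Rewrite h:m:s as zero-padded hh:mm:ss so that string comparison
# agrees with chronological order. (normalize_hms is a made-up name.)
sub normalize_hms {
    my ($time) = @_;
    my ($h, $m, $s) = split /:/, $time;
    return sprintf '%02d:%02d:%02d', $h, $m, $s;
}

# Compare the raw and the normalized form against the same start time.
for my $t ('9:05:00', normalize_hms('9:05:00')) {
    my $verdict = $t lt '12:52:33' ? 'before' : 'not before';
    print "$t is $verdict 12:52:33\n";
}
```

The unpadded value compares as "not before" because the character 9 sorts after 1, while the padded value compares as expected.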

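【Discussion】:

If no line has a timestamp that exactly matches the start or end time, the condition will fail.
@Ether: Agreed. That is what happens when the OP doesn't specify enough information about the problem.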

【Solution 3】:

If every line in the file has a timestamp, then in 'sed' you could write:

sed -n '/12:52:33/,/12:59:33/p' logfile

This will echo the relevant lines.

There is a Perl program, s2p, that converts 'sed' scripts to Perl.

The basic Perl structure looks like this:

my $atfirst = 0;
my $atend = 0;
while (<>)
{
    last if $atend;
    $atfirst = 1 if m/12:52:33/;
    $atend = 1 if m/12:59:33/;
    if ($atfirst)
    {
        print;    # process the line as required
    }
}

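Note that, as written, the code processes the first line that matches the end marker. If you don't want that, move the 'last' to after the test for the end marker.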
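A sketch of that exclusive variant, written as a function over an in-memory list so it is easy to try out. The name between_exclusive and the sample lines are assumptions for illustration. Unlike the loop above, this version only looks for the end marker once the start marker has been seen.

```perl
use strict;
use warnings;

# Collect lines from the first one matching $first up to, but not
# including, the first later line matching $end.
sub between_exclusive {
    my ($first, $end, @input) = @_;
    my $atfirst = 0;
    my @out;
    for (@input) {
        $atfirst = 1 if /\Q$first\E/;
        last if $atfirst && /\Q$end\E/;   # stop before processing the end line
        push @out, $_ if $atfirst;
    }
    return @out;
}

# Hypothetical sample lines.
my @sample = (
    "a 12:50:00",
    "b 12:52:33",
    "c 12:55:00",
    "d 12:59:33",
    "e 13:00:00",
);
print "$_\n" for between_exclusive('12:52:33', '12:59:33', @sample);
```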

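【Discussion】:

【Solution 4】:

If your log files are separated by day, you can convert the timestamps to seconds and compare them. (If not, use the technique from my answer to a question you asked earlier.)

Say your log is

12:52:32 outside
12:52:43 strictly inside
12:59:33 the end
12:59:34 outside

Then with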

#! /usr/bin/perl

use warnings;
use strict;

my $LOGPATH = "/tmp/foo.log";

sub usage { "Usage: $0 start-time end-time\n" }

sub to_seconds {
  my($h,$m,$s) = split /:/, $_[0];
  $h * 60 * 60 +
       $m * 60 +
            $s;
}

die usage unless @ARGV == 2;
my($start,$end) = map to_seconds($_), @ARGV;

open my $log, "<", $LOGPATH or die "$0: open $LOGPATH: $!";
while (<$log>) {
  if (/^(\d+:\d+:\d+)\s+/) {
    my $time = to_seconds $1;
    print if $time >= $start && $time <= $end;
  }
  else {
    warn "$0: $LOGPATH:$.: no timestamp!\n";
  }
}

you get the following output:

$ ./between 12:52:33 12:59:33
12:52:43 strictly inside
12:59:33 the end

【Discussion】: