Open the latest log file and print lines later than a certain timestamp

【Question title】: Open the latest log file and print lines later than a certain timestamp
【Posted】: 2012-07-13 15:25:18
【Question】:

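I am writing a Perl script that needs to capture some lines from a garbage collection log and write them to a file.

The log lives on a remote host, and I connect to it with the Net::OpenSSH module.

I need to read the latest available log file.

In the shell, I can find the latest log with the following commands: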

cd builds/5.7.1/5.7.1.126WRF_B/jboss-4.2.3/bin
ls -lat | grep '.log$' | tail -1

This returns the latest log:

-rw-r--r--   1 load     other    2406173 Jul 11 11:53 18156.stdout.log

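So in Perl, I would like to write something that locates that log and opens it for reading.

Once I have the log file, I want to print every line whose timestamp is later than a given time. That time is the timestamp of the latest log message minus the $Runtime variable.

These are the last messages in the garbage collection log: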

                                      ...

73868.629: [GC [PSYoungGen: 941984K->14720K(985216K)] 2118109K->1191269K(3065984K), 0.2593295 secs] [Times: user=0.62 sys=0.00, real=0.26 secs]
73873.053: [GC [PSYoungGen: 945582K->12162K(989248K)] 2122231K->1189934K(3070016K), 0.2329005 secs] [Times: user=0.60 sys=0.01, real=0.23 secs]

So if $Runtime is 120 seconds, I need to print every line from timestamp (73873.053 - 120) seconds onward.

In the end my script would look something like this...

open GARB, ">", "./report/archive/test-$now/GC.txt" or die "Unable to create file: $!";

my $ssh2 = Net::OpenSSH->new(
  $pathHost,
  user => $pathUser,
  password => $pathPassword
);
$ssh2->error and die "Couldn't establish SSH connection: ". $ssh2->error; 

# Something to find and open the log file.
print GARB #Something to return certain lines.
close GARB;

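I realize this is somewhat similar to another question, but I can't work out how to adapt it to what I'm looking for. Any help is much appreciated!

【Question discussion】:

Tags: perl logging ssh garbage-collection

【Solution 1】:

Find the latest file and feed it to perl: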

 LOGFILE=$(ls -t1 "$DIR" | grep '\.log$' | head -1)
 if [ -z "$LOGFILE" ]; then
   echo "$0: No log file found - exiting"
   exit 1
 fi

 perl myscript.pl "$DIR/$LOGFILE"

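The pipeline on the first line lists the files in the directory, names only, one per line, newest first; it then filters for log files and returns just the first one.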
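For a directory that the script can read directly, the same selection can be sketched in Perl without shelling out. The sub name `newest_log` is hypothetical, and the sketch assumes a local directory; for the remote host in the question the `ls` pipeline would still have to run over SSH.

```perl
use strict;
use warnings;

# Return the name of the most recently modified *.log file in $dir,
# or undef if the directory holds no log files.
sub newest_log {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Cannot open directory $dir: $!";
    my @logs = grep { /\.log$/ && -f "$dir/$_" } readdir $dh;
    closedir $dh;

    # (stat ...)[9] is the modification time in epoch seconds.
    my %mtime;
    $mtime{$_} = (stat "$dir/$_")[9] for @logs;

    # Sort newest first and keep the first name.
    my ($newest) = sort { $mtime{$b} <=> $mtime{$a} } @logs;
    return $newest;
}
```

Unlike `grep '.log$'`, the pattern here escapes the dot, so a name such as `catalog` is not picked up by accident.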

I don't know how to translate your timestamps into something I can understand, do arithmetic on, and compare. But in general:

$threshold_ts = $time_specified - $offset;
while (<>) {
  my ($line_ts) = split(/\s/, $_, 2);
  print if compare_time_stamps($line_ts, $threshold_ts);
}

Writing the threshold arithmetic and the comparison is left as an exercise for the reader.
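One possible way to fill in that exercise, given as a sketch rather than the answerer's own code: the sample lines in the question start with a decimal number of seconds followed by a colon, so the comparison reduces to a numeric test. `parse_time_stamp` is a hypothetical helper, and treating "at or after the threshold" as a match is an assumption about what the asker wants.

```perl
use strict;
use warnings;

# Turn a leading token such as "73868.629:" into a number.
# Returns undef when the token is not a timestamp (e.g. a header line).
sub parse_time_stamp {
    my ($token) = @_;
    return unless defined $token && $token =~ /^(\d+(?:\.\d+)?):?$/;
    return $1 + 0;
}

# True when the line's timestamp is at or after the threshold.
sub compare_time_stamps {
    my ($line_ts, $threshold_ts) = @_;
    my $seconds = parse_time_stamp($line_ts);
    return defined $seconds && $seconds >= $threshold_ts;
}
```

With the question's numbers, the threshold would be 73873.053 - 120, and `compare_time_stamps("73868.629:", 73873.053 - 120)` is true.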

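【Discussion】:

I think you've overlooked the SSH part. The log file is on a remote machine.
Not at all. I like your answer; it is well written and explains more than mine does, so +1. :)

【Solution 2】:

I think the Net::OpenSSH documentation page provides a good baseline for this: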

my ($rout, $pid) = $ssh->pipe_out("cat /tmp/foo") or
  die "pipe_out method failed: " . $ssh->error;

while (<$rout>) { print }
close $rout;

But rather than printing everything, you want to discard the lines outside your window:

my ($rout, $pid) = $ssh->pipe_out("cat /tmp/foo") or
  die "pipe_out method failed: " . $ssh->error;

while ( my $line = <$rout> ) {
    # The timestamp is everything before the first colon.
    my $ts = substr( $line, 0, index( $line, ':' ) );
    next if $ts < $start;               # discard lines before the window
    last if $ts > $start + $duration;   # stop once we are past the window
    print $line;
}
close $rout;

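【Discussion】:

【Solution 3】:

Here is an untested approach. I haven't used Net::OpenSSH, so there may be a better way to do this. I'm not even sure it works. The part that does work is the parsing, which I tested.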

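use strict; use warnings;
use Net::OpenSSH;

# $pathHost, $pathUser and $pathPassword are assumed to be set earlier in the script.
my $Runtime = 120;
my $now = time;
my $dir = 'builds/5.7.1/5.7.1.126WRF_B/jboss-4.2.3/bin';

open my $garb, '>',
  "./report/archive/test-$now/GC.txt" or die "Unable to create file: $!";
my $ssh2 = Net::OpenSSH->new(
  $pathHost,
  user => $pathUser,
  password => $pathPassword
);
$ssh2->error and die "Couldn't establish SSH connection: ". $ssh2->error;

# Find the newest log file: ls -t sorts newest first, so take the first match.
my $filename = $ssh2->capture("ls -t $dir | " . q~grep '\.log$' | head -1~);
defined $filename or die "Listing $dir failed: " . $ssh2->error;
chomp $filename;
length $filename or die "No log file found in $dir";

# Find the time of the last log line
my $latestTimeCapture = $ssh2->capture("tail -n 1 $dir/$filename");
my ($lastTime) = $latestTimeCapture =~ m/^([\d.]+):/
  or die "No timestamp on the last line of $filename";
my $logTime = $lastTime - $Runtime;

# open2 returns ($in, $out, $pid): $in feeds the remote stdin, $out reads its stdout.
my ($in, $out, $pid) = $ssh2->open2("cat $dir/$filename")
  or die "open2 failed: " . $ssh2->error;
close $in; # cat needs no input
while (<$out>) {
  # Keep only the lines stamped later than the cutoff.
  if (m/^([\d.]+):/ && $1 > $logTime) {
    print $garb $_; # The \n is still in there
  }
}
close $out;

waitpid($pid, 0); # reap the ssh child process
close $garb;

It runs the ls command line through the capture method to find the file. It then opens a pipe through the SSH tunnel to read that file. $out is the filehandle for the pipe we read from; $in would feed the remote command's standard input, which cat does not need, so it is closed right away.

Since we process the file line by line from the top, we first need to grab the last line to get the final timestamp. That is done with tail and the capture method.

Once we have that, we read from the pipe line by line. Now it's a simple regular expression (the same one used above): grab the timestamp and compare it with the time we set earlier (the last timestamp minus 120 seconds). If it's higher, print the line to the output filehandle.

The docs say we have to call waitpid on the $pid returned from $ssh2->open2 so that the child process is reaped, so we do that before closing the output file.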
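The filtering step can be tried without an SSH connection. The sketch below is an illustration, not part of the answer: `recent_lines` is a hypothetical helper that takes any readable filehandle, holds every timestamped line in memory, and uses the last timestamp it saw to compute the cutoff, which removes the need for the separate `tail` call at the cost of memory.

```perl
use strict;
use warnings;

# Return the lines whose leading "seconds:" stamp is later than
# (last timestamp - $runtime). Reads the whole handle into memory.
sub recent_lines {
    my ($fh, $runtime) = @_;
    my @stamped;
    while (my $line = <$fh>) {
        # Lines without a leading timestamp are skipped.
        push @stamped, [ $1, $line ] if $line =~ /^(\d+(?:\.\d+)?):/;
    }
    return unless @stamped;

    # The cutoff is measured back from the last timestamped line.
    my $cutoff = $stamped[-1][0] - $runtime;
    return map { $_->[1] } grep { $_->[0] > $cutoff } @stamped;
}
```

Given a handle on the log, `print $garb recent_lines($handle, $Runtime);` would write the selected lines in one go. For a multi-megabyte log the two-command approach above keeps memory use flat, so which one fits depends on the log size.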

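【Discussion】:

I ended up going with most of your answer. The regular expressions needed some tweaking, and I also decided to add a variant of Axeman's answer. Thanks!

【Solution 4】:

You either need to keep an accumulator holding all the lines (more memory) or make several passes over the log (more time).

Using an accumulator: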

my @accumulated_lines;
while (<$log_fh>) {
    push @accumulated_lines, $_;

    # Your processing to get $Runtime goes here...

    if ($Runtime > $TOO_BIG) {
        my ($current_timestamp) = /^(\d+(?:\.\d*)?)/;
        my $start_timestamp = $current_timestamp - $Runtime;

        for my $previous_line (@accumulated_lines) {
            # Match against the stored line, not the current $_
            my ($previous_timestamp) = $previous_line =~ /^(\d+(?:\.\d*)?)/;
            next unless defined $previous_timestamp;
            next unless $previous_timestamp <= $current_timestamp;
            next unless $previous_timestamp >= $start_timestamp;
            print $previous_line;
        }
    }
}

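Or you can go through the log twice, which is similar to the above but without the nested loop. I'm assuming your log may contain more than one of these spans.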

my @report_spans;
while (<$log_fh>) {
    # Your processing to get $Runtime goes here...

    if ($Runtime > $TOO_BIG) {
        my ($current_timestamp) = /^(\d+(?:\.\d*)?)/;
        my $start_timestamp = $current_timestamp - $Runtime;

        push @report_spans, [ $start_timestamp, $current_timestamp ];
    }
}

# Don't bother continuing if there's nothing to report
exit 0 unless @report_spans;

# Start over
seek $log_fh, 0, 0;

while (<$log_fh>) {
    my ($previous_timestamp) = /^(\d+(?:\.\d*)?)/;
    next unless defined $previous_timestamp; # skip lines without a timestamp
    SPAN: for my $span (@report_spans) {
        my ($start_timestamp, $current_timestamp) = @$span;

        next unless $previous_timestamp <= $current_timestamp;
        next unless $previous_timestamp >= $start_timestamp;
        print; # same as print $_;

        last SPAN; # don't print out the line more than once, if that's even possible
    }
}

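If your spans might overlap, the latter has the advantage of not showing the same log line twice. If you have no overlapping spans, you can optimize the first version by resetting the accumulator each time you produce output:

@accumulated_lines = ();

That saves memory.

【Discussion】:

【Solution 5】:

Use SFTP to access the remote file system. You can use Net::SFTP::Foreign (on its own or through Net::OpenSSH).

It lets you list the contents of the remote file system, pick the file you want to process, open it, and work with it as if it were a local file.

The only tricky thing you need to do is read lines backwards, e.g. read chunks of the file starting from the end and split them into lines.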
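The backwards read can be sketched on an ordinary local file as follows. This is an illustration under stated assumptions: `tail_lines_since` is a hypothetical name, the block size is arbitrary, and the handle comes from a plain `open`; adapting the same seek-and-read pattern to a handle obtained over SFTP is not shown here.

```perl
use strict;
use warnings;

# Collect the trailing lines of $path whose "seconds:" stamp is later than
# (last timestamp - $runtime), reading fixed-size blocks from the end.
sub tail_lines_since {
    my ($path, $runtime, $block_size) = @_;
    $block_size ||= 8192;

    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $pos    = -s $fh;    # start at the end of the file
    my $buffer = '';
    my ($cutoff, @lines);

    while ($pos > 0) {
        my $want = $pos < $block_size ? $pos : $block_size;
        $pos -= $want;
        seek $fh, $pos, 0 or die "seek failed: $!";
        (read($fh, my $chunk, $want) // 0) == $want
            or die "short read on $path";
        $buffer = $chunk . $buffer;

        # Unless we reached the start of the file, the first piece may be
        # a partial line, so keep it in the buffer for the next round.
        my @complete = split /^/m, $buffer;
        $buffer = $pos > 0 ? shift @complete : '';

        for my $line (reverse @complete) {
            my ($ts) = $line =~ /^(\d+(?:\.\d+)?):/ or next;
            $cutoff = $ts - $runtime unless defined $cutoff;
            return @lines if $ts <= $cutoff;   # everything earlier is too old
            unshift @lines, $line;
        }
    }
    return @lines;
}
```

Because the loop stops at the first line that is too old, only the tail of a large log is ever read, which is the point of reading backwards. Lines without a leading timestamp are skipped, matching the filter used in Solution 3.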

【Discussion】: