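【Question title】: Formatting parsed output perl
【Posted】: 2019-12-15 18:24:02
【Question】:

I have this file, and I only need its last five lines or so. I know I should not parse HTML without an HTML module, but this is not a rigorous program; all I really need is the last five lines or so. Also, I can't download any modules. I do have access to a proxy server that lets me curl files from the command line, so maybe there is a way to use CPAN through the proxy, but that is another matter. The problem at hand is that when I parse out the last lines of the file, I don't get "Names IN MY-DEPT that are restricted", which I want. It is being skipped.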

new_guy@casper0170foo:~/hey/hit_BANK_restricted.$ cat restricted.html.bak
To:DL-BANK@big_business.com
From:dl-dept?g-gsd-stm@big_business.com
Subject: Restricted List for 25-Nov-2014
Content-Type: text/html;
Content-Transfer-Encoding: quoted-print HTMLFILEable>
<HTML>
  <HEAD>
     <STYLE type="text/css">
     body    { font-family: verdana; font-size: 10pt }
     td      { font-size: 8pt; vertical-align: top }
     td.cat  { color: 6699FF ; background: 666699; text-align: right; vertical-align: bottom; height: 30 }
     td.ind  { width: 20pt }
     td.link { }
     td.desc { color: a0a0a0 }
     a:visited { color: 800080; text-decoration: none }
    </STYLE>
 <TITLE>TRADES</TITLE>
 </HEAD><BODY><TABLE width="80%" border="0" cellpadding="0" cellspacing="0">
      <tr>
         <td colspan="3" align="center">Names IN MY-DEPT that are restricted</td>
      </tr>
      <tr>
        <td><b>Restriction Code</b></td>
        <td><b>Company</b></td>
        <td><b>Ticker</b></td>
     </tr><tr><td>RL5</td><td>First Trust Global Risk Managed Inc</td><td>ETP</td></tr><font color="red"><tr><td>RLMT</td><td>GT Advanced Technologies Inc</td><td nowrap>GTATQ (position only, not in MY-DEPT)</td></tr></font></TABLE></BODY</HTML>new_guy@casper0170foo:~/hey/hit_BANK_restricted.$
new_guy@casper0170foo:~/hey/hit_BANK_restricted.$ cat parse_restrict2
#!/usr/bin/perl
use strict;
use warnings ;
my @restrict_codes = qw(RL3 RL5 RL5H RL6 REGM RAF RLMT RTCA RTCAH RTCB RTCBH RTCI RTCIH RLSI RLHK RLJP RPROP RLCB RLCS RLBZ RLBZH RLSUS);
my $rest_dir = "/home/new_guy/hey/hit_BANK_restricted./";
my $restrict_file = "restricted.html.bak" ;
open my $fh_rest_codes, '<', "$rest_dir$restrict_file" or die "cannot load $! " ;
my @lines;
while (<$fh_rest_codes>) {
    next unless $_ =~ m/Names/;
    @lines = <$fh_rest_codes> ;
    }
foreach(@lines) {
    s/td/ /g ;
    s/<[^>]*>/ /g ;
    foreach my $restrict (@restrict_codes) {
        s/$restrict/\n$restrict/g;
        }
    print $_ ;
    sleep 1  ;
    }

print "\n" ;

These are the results I get. They are fine, but I would like to format them and I don't know how.

new_guy@casper0170foo:~/hey/hit_BANK_restricted.$ ./parse_restrict2


          Restriction Code
          Company
          Ticker

RL5  First Trust Global Risk Managed Inc  ETP
RLMT  GT Advanced Technologies Inc  GTATQ (position only, not in MY-DEPT)

Is there any way to get the lines in this format?

Names IN MY-DEPT that are restricted

Restriction Code Company                              Ticker
RL5              First Trust Global Risk Managed Inc  ETP
RLMT             GT Advanced Technologies Inc         GTATQ (position only, not in MY-DEPT)

【Comments】:

  • Getting CPAN to work with a proxy isn't hard. Take a look at o conf http_proxy. It uses the same mechanism as LWP (because it relies heavily on LWP).

Tags: perl format


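【Solution 1】:

Good question. The title line is skipped because the while loop has already read it into $_ by the time the rest of the file is slurped into @lines, so it is never stored. If you like, you can try this workaround, which pushes $_ first: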

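my @lines;
while (<$fh_rest_codes>) {
    next unless /Names/;             # skip everything before the title row
    push @lines, $_;                 # keep the title line itself
    push @lines, <$fh_rest_codes>;   # then slurp the rest of the file
}
my $str = join '', @lines;

# The first <td> holds the table title.
$str =~ m|<td.*?>(.*?)</td>|;
print "$1\n\n";

# Header row: the first <tr>...</tr>. With /g in scalar context the match
# position is remembered, so the loop below resumes at the next row.
$str =~ m|<tr>(.*?)</tr>|msg;
my $fmt = "%-24s%-40s%-40s\n";
printf $fmt, $1 =~ m{<td><b>(.*?)</b></td>}msg;

# Data rows: one printf per remaining <tr>, one column per <td>.
while ($str =~ m|<tr>(.*?)</tr>|msg) {
    printf $fmt, $1 =~ m{<td.*?>(.*?)</td>}msg;
}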

Output:

Names IN MY-DEPT that are restricted

Restriction Code        Company                                 Ticker                                  
RL5                     First Trust Global Risk Managed Inc     ETP                                     
RLMT                    GT Advanced Technologies Inc            GTATQ (position only, not in MY-DEPT)   
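
The fixed widths in the format string above (24, 40, 40) were chosen by hand. As a variation, the following is a minimal sketch that sizes each column from the data instead. It is not part of the original answer: the helper name format_table is hypothetical, the sample input is a shortened copy of the table from the question held in a string rather than read from the file, and it assumes that a first row with a single cell is the title. It uses only core Perl, so no modules need to be installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample input: a shortened copy of the table from the question.
my $html = <<'HTML';
      <tr>
         <td colspan="3" align="center">Names IN MY-DEPT that are restricted</td>
      </tr>
      <tr>
        <td><b>Restriction Code</b></td>
        <td><b>Company</b></td>
        <td><b>Ticker</b></td>
     </tr><tr><td>RL5</td><td>First Trust Global Risk Managed Inc</td><td>ETP</td></tr><font color="red"><tr><td>RLMT</td><td>GT Advanced Technologies Inc</td><td nowrap>GTATQ (position only, not in MY-DEPT)</td></tr></font></TABLE>
HTML

# format_table is a hypothetical helper: it returns the table in $html
# as plain text with left-aligned, data-sized columns.
sub format_table {
    my ($html) = @_;

    # Collect the cell text of every <tr>...</tr> row.
    my @rows;
    while ($html =~ m{<tr>(.*?)</tr>}sg) {
        my $row   = $1;
        my @cells = $row =~ m{<td[^>]*>(.*?)</td>}sg;
        s/<[^>]*>//g for @cells;    # drop inner tags such as <b>
        push @rows, \@cells;
    }

    # Assumption: a single-cell first row is the table title.
    my $title = (@rows && @{ $rows[0] } == 1) ? shift(@rows)->[0] : '';

    # Each column is as wide as its longest cell.
    my @width;
    for my $row (@rows) {
        for my $i (0 .. $#$row) {
            my $len = length $row->[$i];
            $width[$i] = $len if !defined $width[$i] || $len > $width[$i];
        }
    }

    # Pad every cell to its column width and join with two spaces.
    my $out = $title ne '' ? "$title\n\n" : '';
    for my $row (@rows) {
        my @padded = map { sprintf '%-*s', $width[$_], $row->[$_] } 0 .. $#$row;
        (my $line = join '  ', @padded) =~ s/\s+$//;    # trim trailing padding
        $out .= "$line\n";
    }
    return $out;
}

print format_table($html);
```

To run it against the real file, read the file into a string and pass that to format_table in place of $html. Like the answer above, this is regex-based scraping and only holds up while the table keeps this simple shape.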
