【Posted】: 2021-07-01 21:59:17
【Question】:
I am trying to extract the following from the EnviroCanada weather page.
For each hour, I would like to get a row in this form:
Time | THigh | TLow | Humidity
7:00 | 23 | 22.9 | 30
HTML extracted from the page:
<tr>
<td headers="header1" class="text-center vertical-center"> 7:00 </td>
<td headers="header2" class="media vertical-center"><span class="pull-left"><img class="media-object" height="35" width="35" src="/weathericons/small/02.png" /></span><div class="visible-xs visible-sm">
<br />
<br />
</div>
<div class="media-body">
<p>Partly Cloudy</p>
</div>
</td>
<td headers="header3m" class=" metricData text-center vertical-center">23
(22.9)
</td>
<td headers="header3i" class=" imperialData hidden text-center vertical-center">73
(73.2)
</td>
<td headers="header4m" class="metricData text-center vertical-center">
<abbr title="West-Northwest">WNW</abbr> 8</td>
<td headers="header4i" class="imperialData hidden text-center vertical-center">
<abbr title="West-Northwest">WNW</abbr> 5</td>
<td headers="header6" class="metricData text-center vertical-center">30</td>
<td headers="header6" class="imperialData hidden text-center vertical-center">87</td>
<td headers="header7" class="text-center vertical-center">83</td>
<td headers="header8" class="metricData text-center vertical-center">20</td>
<td headers="header8" class="imperialData hidden text-center vertical-center">68</td>
<td headers="header9m" class="metricData text-center vertical-center">100.7</td>
<td headers="header9i" class="imperialData hidden text-center vertical-center">29.7</td>
<td headers="header10" class="metricData text-center vertical-center">24</td>
<td headers="header10" class="imperialData hidden text-center vertical-center">15</td>
</tr>
Code so far:
use strict;
use warnings;
use LWP::Simple;
use HTML::TokeParser;
my $url = "http://weather.gc.ca/past_conditions/index_e.html?station=yyz";
my $page = get($url) ||
die "Could not load URL\n";
my $parser = HTML::TokeParser->new(\$page) ||
die "Parse error\n";
$parser->get_tag("td") foreach ();
$parser->get_tag("");
my $time = $parser->get_text();
??
my $thigh = $parser->get_text();
???
my $tlow = $parser->get_text();
???
my $humid = $parser->get_text();
I am completely lost.
【Comments】:
- I like Mojo::DOM for extracting content from HTML pages; it is very easy to use.
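To make the Mojo::DOM suggestion concrete, here is a minimal sketch of one way it might be applied to the row shown in the question. Several things here are assumptions rather than facts from the thread: the helper names `trim` and `extract_rows` are hypothetical, the embedded `$html` is a trimmed copy of the sample row rather than the live page, the two numbers in the `header3m` cell are mapped to THigh and TLow only because the desired output does so, and the "Humidity" value is read from the metric `header6` cell because that is where the 30 in the desired output appears.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Trimmed copy of the sample row from the question (stand-in for the live page).
my $html = <<'HTML';
<table>
<tr>
<td headers="header1" class="text-center vertical-center"> 7:00 </td>
<td headers="header3m" class=" metricData text-center vertical-center">23
(22.9)
</td>
<td headers="header6" class="metricData text-center vertical-center">30</td>
<td headers="header6" class="imperialData hidden text-center vertical-center">87</td>
</tr>
</table>
HTML

# Hypothetical helper: strip leading and trailing whitespace.
sub trim {
    my $s = shift // '';
    $s =~ s/^\s+|\s+$//g;
    return $s;
}

# Hypothetical helper: return one [time, thigh, tlow, humidity] array ref
# per table row that contains all three expected cells.
sub extract_rows {
    my ($markup) = @_;
    my @rows;
    for my $tr (Mojo::DOM->new($markup)->find('tr')->each) {
        # Select cells by their "headers" attribute; skip rows lacking them
        # (for example, header rows built from <th> elements).
        my $time_td  = $tr->at('td[headers="header1"]')            or next;
        my $temp_td  = $tr->at('td[headers="header3m"]')           or next;
        my $humid_td = $tr->at('td.metricData[headers="header6"]') or next;

        # The temperature cell holds two numbers: "23" and "(22.9)".
        my ($thigh, $tlow) = $temp_td->all_text
            =~ /(-?\d+(?:\.\d+)?)[^(]*\(\s*(-?\d+(?:\.\d+)?)\s*\)/
            or next;

        push @rows, [ trim($time_td->all_text), $thigh, $tlow,
                      trim($humid_td->all_text) ];
    }
    return @rows;
}

print join(' | ', qw(Time THigh TLow Humidity)), "\n";
print join(' | ', @$_), "\n" for extract_rows($html);
```

For the sample row this should print `7:00 | 23 | 22.9 | 30` under the header line. To try it against the real page, the `$page` string fetched with `get($url)` in the question could be passed to `extract_rows` instead of `$html`, with the caveat that the live markup may differ from the sample and the selectors may need adjusting.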