Get meta tags from a URL with DOM crawler

[Question title]: Get meta tags from url with DOM crawler
[Posted]: 2023-03-10 23:48:01
[Question]:

I have installed symfony/dom-crawler in my project. As a test, I am trying to get some meta tags from the URL of a random site.

$url = 'https://www.lala.rs/fun/this-news';

$crawler = new Crawler($url);

$data = $crawler->filterXpath("//meta[@name='description']")->extract(array('content'));

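It always returns [] as the result.

I tried it with the basic meta description, but maybe I am misunderstanding something. I checked the Symfony documentation but could not find the right way to do this.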

[Comments]:

Tags: php symfony dom meta-tags domcrawler

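[Solution 1]:

You need to pass the HTML content to new Crawler($html), not the URL. The constructor treats a string argument as markup to parse; it does not download anything.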
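
To make the difference concrete, here is a minimal sketch that needs no network access. The inline $html string and the example.com URL are made up for illustration, and the require line assumes the package was installed with Composer.

```php
<?php
require 'vendor/autoload.php'; // assumes symfony/dom-crawler was installed via Composer

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical inline markup standing in for a downloaded page.
$html = '<html><head><meta name="description" content="Example description"></head><body></body></html>';

// A URL string is parsed as if it were markup, so no <meta> tag can match.
$fromUrl = (new Crawler('https://example.com/'))
    ->filterXPath("//meta[@name='description']")
    ->extract(['content']);

// Real HTML gives the XPath expression something to match.
$fromHtml = (new Crawler($html))
    ->filterXPath("//meta[@name='description']")
    ->extract(['content']);

var_dump($fromUrl);  // empty array
var_dump($fromHtml); // array containing "Example description"
```

The first call reproduces the [] from the question; the second shows the same query succeeding once the crawler is given markup.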

The page fetched below has no description meta tag, so this example queries viewport instead, which works fine on that page:

<meta name="viewport" content="width=device-width, height=device-height, initial-scale=1.0, minimum-scale=1.0">

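use Symfony\Component\DomCrawler\Crawler;

$url = 'https://stackoverflow.com/questions/66494027/get-meta-tags-from-url-with-dom-crawler';
// Download the page first; Crawler parses markup, it does not fetch URLs.
$html = file_get_contents($url);
$crawler = new Crawler($html);

// Collect the content attribute of every matching <meta> tag.
$data = $crawler->filterXPath("//meta[@name='viewport']")->extract(['content']);
print_r($data);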

This gives:

Array
(
    [0] => width=device-width, height=device-height, initial-scale=1.0, minimum-scale=1.0
)
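
Since file_get_contents() returns false when the download fails, and a given meta tag may simply be absent, a slightly more defensive version might look like the sketch below. The helpers metaContent() and fetchHtml() are hypothetical names introduced here, not part of symfony/dom-crawler, and the require line again assumes a Composer install.

```php
<?php
require 'vendor/autoload.php'; // assumes symfony/dom-crawler was installed via Composer

use Symfony\Component\DomCrawler\Crawler;

/**
 * Hypothetical helper: content of the first <meta name="..."> tag,
 * or null when the page has no such tag.
 */
function metaContent(Crawler $crawler, string $name): ?string
{
    // $name is interpolated into the XPath as-is, so it must not contain quotes.
    $values = $crawler->filterXPath("//meta[@name='{$name}']")->extract(['content']);

    return $values[0] ?? null;
}

/** Hypothetical helper: download a page or throw if that fails. */
function fetchHtml(string $url): string
{
    $html = @file_get_contents($url); // needs allow_url_fopen=On for http(s) URLs
    if ($html === false) {
        throw new RuntimeException("Could not download {$url}");
    }

    return $html;
}

$url = 'https://stackoverflow.com/questions/66494027/get-meta-tags-from-url-with-dom-crawler';

try {
    $crawler = new Crawler(fetchHtml($url));
    var_dump(metaContent($crawler, 'viewport'));    // a string if the tag exists
    var_dump(metaContent($crawler, 'description')); // null if the tag is missing
} catch (RuntimeException $e) {
    echo $e->getMessage(), PHP_EOL;
}
```

Returning null for a missing tag makes it possible to tell "tag absent" apart from "download failed", which the bare [] in the question could not do.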

[Comments]: