I want to create a crawler using a PHP script (answers)

【Question title】: I want to create a crawler using a PHP script
【Posted】: 2019-04-09 05:56:36
【Question description】:

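I want to create a PHP script for a website. I only want to extract links from a given page. For example, given the link http://example.com, my crawler should open that link in the background and find all links matching http://example.com/[any name]/reviews. I tried a regular expression, but it does not work. Can anyone help me?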

<?php
$url="https://clutch.co/it-services";
$contents =file_get_contents($url);
$pattern = "https://clutch.co/profile/".'/^[a-zA-Z ]*$/'."#review";
$pattern = preg_quote($pattern, '/');
if(preg_match_all($pattern, $contents, $matches)){
   echo "Found matches:\n";
   foreach ($matches[0] as $urls) {
    echo $urls;
  }
}
else{
   echo "No matches found";
}
?>

【Comments on the question】:

What exactly have you tried? Show your attempt.
What happens if you use /[any name]/reviews instead?

Tags: php web-crawler

【Solution 1】:

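The regular expression pattern has some syntax problems:

The delimiters (/) need to be outside the pattern, and any delimiter or special character (.) inside the pattern ("https://") needs to be escaped ("https:\/\/").

So the pattern should be: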

/https:\/\/clutch\.co\/profile\/[a-zA-Z ]*#review/
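
To show the corrected pattern in use, here is a minimal sketch. The function name find_review_links and the sample markup are made up for illustration; a real run would pass in the page contents from file_get_contents, as in the question. The character class is kept exactly as written above, so profile names containing digits or hyphens would not match unless the class is widened.

```php
<?php
// Hypothetical helper (not from the original post): returns every substring
// of $html that matches the corrected pattern from the answer.
function find_review_links(string $html): array
{
    // Delimiters (/) sit outside the pattern; slashes and dots inside are escaped.
    $pattern = '/https:\/\/clutch\.co\/profile\/[a-zA-Z ]*#review/';

    // preg_match_all returns the number of matches (0 if none) or false on error.
    if (preg_match_all($pattern, $html, $matches) === false) {
        return [];
    }

    // $matches[0] holds the full-pattern matches; drop duplicates.
    return array_values(array_unique($matches[0]));
}

// Made-up sample markup standing in for the downloaded page.
$sample = '<a href="https://clutch.co/profile/acme#review">Acme</a>'
        . '<a href="https://clutch.co/profile/acme">Acme profile</a>'
        . '<a href="https://clutch.co/profile/globex#review">Globex</a>';

foreach (find_review_links($sample) as $link) {
    echo $link, "\n";
}
```

Note that the pattern is passed to preg_match_all directly. Running preg_quote over the whole pattern, as the question's code does, would escape the brackets and the asterisk as well, so the character class would no longer act as a character class.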

A regex fiddle: https://regex101.com/r/OEUQOU/1
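
As a side note on the escaping rule, the literal parts of the pattern can also be escaped with preg_quote rather than by hand, as long as it is applied only to the literal pieces and not to the whole pattern. The sketch below is one possible way to do that; the test strings are made up and are not taken from the real site.

```php
<?php
// Escape only the literal pieces; the character class must stay unescaped
// so that it keeps its regex meaning.
$prefix = preg_quote('https://clutch.co/profile/', '/');
$suffix = preg_quote('#review', '/');

// Add the delimiters last, outside the escaped pieces.
$pattern = '/' . $prefix . '[a-zA-Z ]*' . $suffix . '/';

// Made-up test strings: the first should match, the second should not.
var_dump(preg_match($pattern, 'https://clutch.co/profile/acme#review'));
var_dump(preg_match($pattern, 'https://clutch.co/it-services'));
```

The second argument to preg_quote names the delimiter, so the slashes in the URL are escaped along with the dots.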

【Comments】: