【Question title】: Compare arrays of URLs and remove URLs from one array depending on the second array
【Posted】: 2017-09-10 17:15:21
【Question description】:

I need some help thinking through what I should do. For example, I have two arrays of URLs:

$urls = ['https://test.com/', 'http://example.com/', 'https://google.com/'];

$urlsFromOtherSource = ['https://test.com/', 'https://example.com/', 'https://facebook.com/'];

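I need to build three arrays of URLs from them. The first holds the URLs common to both arrays. The other two are the original arrays without the common URLs, except that if both initial arrays contain the same URL and the only difference is http vs. https, that URL should be kept in only one of the arrays.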

So, from my example, I need to end up with the following arrays:

$commonUrls = ['https://test.com/']; // because this is the only URL present in both arrays

$urls = ['http://example.com/', 'https://google.com/']; // http://example.com/ stays in this array and is removed from the second one, because the second array has the same URL and only https differs

$urlsFromOtherSource = ['https://facebook.com/']; // https://example.com/ is removed from this array, because the first array has the same URL and only http differs

I tried to work out how to compare these arrays and detect the http/https difference, but it is not easy for me. My code looks like this:

$urls = ['https://test.com/', 'http://example.com/', 'https://google.com/'];

$urlsFromOtherSource = ['https://test.com/', 'https://example.com/', 'https://facebook.com/'];

$commonUrls = array_intersect($urls, $urlsFromOtherSource); // URLs present in both arrays
$urls = array_diff($urls, $commonUrls); // remove the common URLs from this array
$urlsFromOtherSource = array_diff($urlsFromOtherSource, $commonUrls); // remove the common URLs from this array

// Landing pages of the second array, with the scheme and "www." stripped
$landingPageArray = [];
foreach ($urlsFromOtherSource as $url) {
    $landingPageArray[] = preg_replace(["#^http(s)?://#", "#^www\.#"], ["", ""], $url);
}

// URLs of the first array whose landing page also appears in the second array
$httpDifference = [];
foreach ($urls as $url) {
    $landingPage = preg_replace(["#^http(s)?://#", "#^www\.#"], ["", ""], $url);
    if (in_array($landingPage, $landingPageArray)) {
        $httpDifference[] = $url;
    }
}

// I have no idea how to remove from $urlsFromOtherSource the URLs that are in $urls and differ only by http/https
$urlsFromOtherSource = array_diff($urlsFromOtherSource, $httpDifference);

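So I just need to compare the arrays and remove from the second array the URLs that I already have in the first array, where the only difference between the URLs is http vs. https. Maybe someone can help me find an algorithm.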

Update: if I have a URL like this in commonUrls, I also need to remove it from urlsFromOtherSource:

commonUrls: array(1) {
  [0]=>
  string(20) "http://www.test.com/"
}

urlsFromOtherSource: array(1) {
  [2]=>
  string(16) "http://test.com/"
}

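So I need to remove this URL from urlsFromOtherSource, and have the code compare only the landing page automatically: whether a URL starts with http://www, www, or just http:// should not matter when comparing my arrays.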

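【Discussion】:

  • Hi, I know this function, but it doesn't help me. I need an algorithm that helps me compare similar URLs, or remove similar URLs from one array. I think you didn't read my whole post: these URLs are not identical, and I need to check the http/https state.
  • Hope this helps: $commonUrls = array_intersect($urls, $urlsFromOtherSource); $urls = array_diff($urls, $commonUrls); $urlsFromOtherSource = array_diff($urlsFromOtherSource, $commonUrls); $urlsFromOtherSource = array_diff($urlsFromOtherSource, $urls);
  • ...if your problem is that 'http' and 'https' count as the same for you, then you have to use preg_replace on every URL as before, for example: preg_replace('/http:/i', 'https:', $url);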
  • array_diff doesn't work for that, because I have http://example.com and https://example.com

Tags: php arrays url compare


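【Solution 1】:

You can supply your own comparison function by using the "u" variants of the array functions, such as array_udiff and array_uintersect. When comparing URLs, use preg_replace to strip the parts that should be ignored (http/https and a leading www.).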

$commonUrls = array_intersect($urls, $urlsFromOtherSource); // URLs present in both arrays (exact match)

$urls = array_diff($urls, $commonUrls);

$urlsFromOtherSource = array_udiff(array_diff($urlsFromOtherSource, $commonUrls), $urls, function ($a, $b) {
  return strcmp(preg_replace('|^https?://(www\\.)?|', '', $a), preg_replace('|^https?://(www\\.)?|', '', $b));
});

This produces:

commonUrls: array(1) {
  [0]=>
  string(17) "https://test.com/"
}

urls: array(2) {
  [1]=>
  string(19) "http://example.com/"
  [2]=>
  string(19) "https://google.com/"
}

urlsFromOtherSource: array(1) {
  [2]=>
  string(21) "https://facebook.com/"
}
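
The question's update also asks that a URL already in commonUrls be removed from urlsFromOtherSource when it differs only by www. The following is a minimal sketch of one way to cover that case, not part of the original answer: `normalizeUrl()` and `compareUrls()` are hypothetical helper names, and the input arrays are made-up sample data modeled on the update. It relies on array_udiff accepting several arrays before the callback.

```php
<?php
// Hypothetical helper (an assumption, not from the original answer):
// strip the scheme and an optional leading "www." so only the landing page remains.
function normalizeUrl(string $url): string
{
    return preg_replace('#^https?://(www\.)?#i', '', $url);
}

// Comparator for the array_u* functions; returns <0, 0 or >0 like strcmp.
function compareUrls(string $a, string $b): int
{
    return strcmp(normalizeUrl($a), normalizeUrl($b));
}

// Sample data modeled on the question's update (assumed for illustration).
$urls = ['http://www.test.com/', 'http://example.com/', 'https://google.com/'];
$urlsFromOtherSource = [
    'http://www.test.com/',
    'http://test.com/',
    'https://example.com/',
    'https://facebook.com/',
];

// Exact matches only, as in the answer above.
$commonUrls = array_intersect($urls, $urlsFromOtherSource);

// Drop the exact matches from the first array.
$urls = array_diff($urls, $commonUrls);

// Drop from the second array everything whose landing page already
// appears in $commonUrls or in $urls, ignoring http/https and "www.".
$urlsFromOtherSource = array_udiff($urlsFromOtherSource, $commonUrls, $urls, 'compareUrls');

var_dump($commonUrls, $urls, $urlsFromOtherSource);
```

Keeping the normalization in one function means the same rule is used everywhere; if the comparison should also ignore, say, a trailing slash, only `normalizeUrl()` would need to change.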

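【Discussion】:

  • But please help me with my code after the update in my first post. I need to ignore "www" as well; I tried to combine that with your code but failed.
  • Could you help me with this one last time? I always need to check both http and www, and no URL should be repeated in any array.
  • I updated my answer, so it now ignores http/https and www. when comparing.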
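
For the follow-up request in the comments (no URL repeated within an array once http/https and www. are ignored), one possible approach is to key each URL on its normalized form and keep the first one seen. This is only a sketch under that assumption; `landingPageKey()` and `uniqueByLandingPage()` are hypothetical names that do not appear in the thread.

```php
<?php
// Hypothetical helper: the part of the URL that is actually compared
// (scheme and optional leading "www." removed).
function landingPageKey(string $url): string
{
    return preg_replace('#^https?://(www\.)?#i', '', $url);
}

// Keep the first URL seen for each landing page and drop later variants.
function uniqueByLandingPage(array $urls): array
{
    $seen = [];
    foreach ($urls as $url) {
        $key = landingPageKey($url);
        if (!isset($seen[$key])) {
            $seen[$key] = $url;
        }
    }
    return array_values($seen);
}

$result = uniqueByLandingPage([
    'http://www.test.com/',
    'https://test.com/',
    'http://example.com/',
]);
var_dump($result);
```

Which variant survives depends only on input order, so sorting the array first (for example, https before http) would be one way to control which form is kept.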