【Question title】: Using a regex function in a while loop
【Posted】: 2016-01-22 11:23:24
【Question】:

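I have a function that fetches a specific link from a particular website, and it works. The trouble starts when I call this function inside a while loop: for some reason, the links start growing in length.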

function getLinks($link) {

$link1 = $link;
$content = file_get_contents($link1);

$content = str_replace("<", "", $content);
$content = str_replace(">", "", $content);

preg_match("~previous page.+?next page~i", $content, $match);
preg_match("~\"(/.+?)\"~i", $match[0], $match);
$link2 = "https://en.wiktionary.org".$match[1];

echo $link1."<br>";
echo $link2."<br>";

return $link2;

}


$firstLink = getLinks("https://en.wiktionary.org/w/index.php?title=Category:English_verbs&pagefrom=AUTOPILOT%0Aautopilot#mw-pages");

Result of $firstLink = getLinks():

https://en.wiktionary.org/w/index.php?title=Category:English_verbs&pagefrom=AUTOPILOT%0Aautopilot#mw-pages
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&pagefrom=BAGSIE%0Abagsie#mw-pages

^--- See how it works correctly? But then when I put it in a while loop:

$count = 0; 
while ($count < 5) {

$count++;
$firstLink = getLinks($firstLink);

}

The result is a complete mess; the links start stacking on top of each other, like this:

https://en.wiktionary.org/w/index.php?title=Category:English_verbs&pagefrom=AUTOPILOT%0Aautopilot#mw-pages
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&pagefrom=BAGSIE%0Abagsie#mw-pages
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&pagefrom=BAGSIE%0Abagsie#mw-pages
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&amp%3Bpagefrom=BAGSIE%0Abagsie&pagefrom=ACETIFY%0Aacetify#mw-pages
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&amp%3Bpagefrom=BAGSIE%0Abagsie&pagefrom=ACETIFY%0Aacetify#mw-pages
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&amp%3Bamp%3Bpagefrom=BAGSIE%0Abagsie&amp%3Bpagefrom=ACETIFY%0Aacetify&pagefrom=ACETIFY%0Aacetify#mw-pages
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&amp%3Bamp%3Bpagefrom=BAGSIE%0Abagsie&amp%3Bpagefrom=ACETIFY%0Aacetify&pagefrom=ACETIFY%0Aacetify#mw-pages
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&amp%3Bamp%3Bamp%3Bpagefrom=BAGSIE%0Abagsie&amp%3Bamp%3Bpagefrom=ACETIFY%0Aacetify&amp%3Bpagefrom=ACETIFY%0Aacetify&pagefrom=ACETIFY%0Aacetify#mw-pages
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&amp%3Bamp%3Bamp%3Bpagefrom=BAGSIE%0Abagsie&amp%3Bamp%3Bpagefrom=ACETIFY%0Aacetify&amp%3Bpagefrom=ACETIFY%0Aacetify&pagefrom=ACETIFY%0Aacetify#mw-pages
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&amp%3Bamp%3Bamp%3Bamp%3Bpagefrom=BAGSIE%0Abagsie&amp%3Bamp%3Bamp%3Bpagefrom=ACETIFY%0Aacetify&amp%3Bamp%3Bpagefrom=ACETIFY%0Aacetify&amp%3Bpagefrom=ACETIFY%0Aacetify&pagefrom=ACETIFY%0Aacetify#mw-pages
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&amp%3Bamp%3Bamp%3Bamp%3Bpagefrom=BAGSIE%0Abagsie&amp%3Bamp%3Bamp%3Bpagefrom=ACETIFY%0Aacetify&amp%3Bamp%3Bpagefrom=ACETIFY%0Aacetify&amp%3Bpagefrom=ACETIFY%0Aacetify&pagefrom=ACETIFY%0Aacetify#mw-pages
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&amp%3Bamp%3Bamp%3Bamp%3Bamp%3Bpagefrom=BAGSIE%0Abagsie&amp%3Bamp%3Bamp%3Bamp%3Bpagefrom=ACETIFY%0Aacetify&amp%3Bamp%3Bamp%3Bpagefrom=ACETIFY%0Aacetify&amp%3Bamp%3Bpagefrom=ACETIFY%0Aacetify&amp%3Bpagefrom=ACETIFY%0Aacetify&pagefrom=ACETIFY%0Aacetify#mw-pages

This is driving me crazy, so if anyone knows what I'm doing wrong, please let me know. Thanks.

A regular function in a while loop:

function addOne($num) {

echo $num."<br>";   
$num++;
return $num;    

}

$num = 0;
$count = 0;
while ($count < 5) {

$count++;
$num = addOne($num);    

}

^--- works just fine

【Comments on the question】:

    Tags: php regex hyperlink while-loop preg-match


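    【Solution 1】:

    Your problem is with HTML entities: the link scraped from the page source contains &amp; rather than &, so each pass feeds an undecoded URL back to the server and the parameters pile up. I rewrote the function to fix that, to skip URLs it has already visited, and to make it more efficient. You call it with a depth parameter, which in your case is the maximum number of iterations.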

    function getLinks($linkd, $depth, $checked = array()) {
        // Accept either a single URL or an array of URLs.
        if (!is_array($linkd)) $linkd = array($linkd);

        foreach ($linkd as $link) {
            // Skip URLs that have already been fetched.
            if (isset($checked[$link])) continue;
            $link1 = $link;
            $content = file_get_contents($link1);

            $content = str_replace("<", "", $content);
            $content = str_replace(">", "", $content);

            // Isolate the text between "previous page" and "next page",
            // then take the first quoted path in it that starts with "/".
            preg_match("~previous page.+?next page~i", $content, $match);
            preg_match("~\"(/.+?)\"~i", $match[0], $match);
            $link2 = "https://en.wiktionary.org".$match[1];

            echo $link1."<br>";
            echo $link2."<br>";

            $checked[$link] = true;

            if ($depth > 0) {
                $depth--;
                // Decode &amp; back to & before following the link.
                return getLinks(html_entity_decode($link2), $depth, $checked);
            } else {
                return $link2;
            }
        }
    }
    
    
    $firstLink = "https://en.wiktionary.org/w/index.php?title=Category:English_verbs&pagefrom=AUTOPILOT%0Aautopilot#mw-pages";
    
    $firstLink = getLinks($firstLink, 5);
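
To see why the entity matters, here is a minimal offline sketch. The `$html` string is an assumed sample of what the page source might contain (the real markup may differ), and `queryKeys` is a hypothetical helper written only for this illustration. It compares the query parameter names in the scraped link before and after `html_entity_decode`.

```php
<?php
// Assumed sample of the page source; the real markup may differ.
$html = '(<a href="/w/index.php?title=Category:English_verbs&amp;pagefrom=BAGSIE%0Abagsie#mw-pages">next page</a>)';

// Hypothetical helper: list the parameter names in a URL's query string.
function queryKeys(string $url): array {
    // Keep only the part between "?" and "#".
    $query = explode('#', explode('?', $url, 2)[1] ?? '', 2)[0];
    $keys = [];
    foreach (explode('&', $query) as $pair) {
        $keys[] = urldecode(explode('=', $pair, 2)[0]);
    }
    return $keys;
}

// Pull out the raw attribute value, entities and all.
preg_match('~href="(/.+?)"~i', $html, $m);
$raw = $m[1];

// Decoding turns "&amp;" back into "&".
$decoded = html_entity_decode($raw);

print_r(queryKeys($raw));     // second name is "amp;pagefrom"
print_r(queryKeys($decoded)); // second name is "pagefrom"
```

In the undecoded link the second parameter is named `amp;pagefrom`, not `pagefrom`, which matches the `amp%3Bpagefrom` fragments that accumulate in the question's output.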
    

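    【Comments】:

    • The site has a next-page link. Basically, my function goes to a page and returns that page's next-page link to the variable $firstLink. It then uses $firstLink to get the next-page link on that page, and repeats this 5 times.
    • So it gets the link to page1, goes to page1, gets the link to page2, goes to page2, gets the link to page3, goes to page3, and so on.
    • I don't think you're right. A regular function in a while loop works fine. Look at the "A regular function in a while loop:" section I just added above.
    • It doesn't work. Have you tested it? I have, and it doesn't.
    • Edited the code again. It works now. Thanks for the downvote and the rudeness, though, you rock ;)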
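
The page-by-page walk described in these comments can also be written as a plain loop rather than recursion. The following is only a sketch under assumptions: `nextPageLink` and `followPages` are hypothetical names, the `$fetch` callable stands in for `file_get_contents` so the example runs offline, and the fake pages and their `/p?x=1` paths are made up for the demonstration. The regex also assumes the link text sits directly inside an `<a>` tag with a double-quoted `href`.

```php
<?php
// Hypothetical helper: $fetch is any callable returning a page's HTML for a URL.
function nextPageLink(string $url, callable $fetch): ?string {
    $content = $fetch($url);
    // Find the anchor whose text is exactly "next page" and capture its href.
    if (!preg_match('~<a\s[^>]*href="([^"]+)"[^>]*>next page</a>~i', $content, $m)) {
        return null; // no further page, or the markup is not what we assumed
    }
    // Decode entities so "&amp;" goes back to the server as "&".
    return "https://en.wiktionary.org" . html_entity_decode($m[1]);
}

// Follow next-page links, visiting at most $maxPages pages and never the same URL twice.
function followPages(string $start, int $maxPages, callable $fetch): array {
    $visited = [];
    $url = $start;
    while ($url !== null && count($visited) < $maxPages && !in_array($url, $visited, true)) {
        $visited[] = $url;
        $url = nextPageLink($url, $fetch);
    }
    return $visited;
}

// Made-up pages so the sketch runs without network access.
$pages = [
    'https://en.wiktionary.org/p?x=1&from=A' => '<a href="/p?x=1&amp;from=B">next page</a>',
    'https://en.wiktionary.org/p?x=1&from=B' => '<a href="/p?x=1&amp;from=A">previous page</a> <a href="/p?x=1&amp;from=C">next page</a>',
    'https://en.wiktionary.org/p?x=1&from=C' => 'no more pages',
];
$fetch = fn(string $url): string => $pages[$url] ?? '';

print_r(followPages('https://en.wiktionary.org/p?x=1&from=A', 5, $fetch));
```

Passing `'file_get_contents'` as `$fetch` would make the same loop request real pages, though whether the regex matches depends on the site's current markup.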