Can't differentiate the two expressions supposed to work in the same way: answers

【Question title】: Can't differentiate the two expressions supposed to work in the same way
【Posted】: 2019-10-24 03:24:00
【Question】:

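A few days ago I created this post asking for a way to make my script loop so that, for each of a few links, it checks whether the title I defined (which should be extracted from each link) is still empty after four attempts. If the title is still empty, the script should break the loop and move on to the next link to repeat the same process.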

This is how I got it working: I changed fetch_data(link) to return fetch_data(link), and I put counter=0 outside the while loop but inside the if statement.

The fixed script:

import time
import requests
from bs4 import BeautifulSoup

links = [
    "https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=2",
    "https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=3",
    "https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=4"
]
counter = 0

def fetch_data(link):
    global counter
    res = requests.get(link)
    soup = BeautifulSoup(res.text,"lxml")
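    try:
        # "p.tcode" is deliberately wrong, so select_one() returns None
        # and .text raises AttributeError
        title = soup.select_one("p.tcode").text
    except AttributeError:
        title = ""

    if not title:
        while counter <= 3:
            time.sleep(1)
            print("trying {} times".format(counter))
            counter += 1
            return fetch_data(link)  # First fix: this frame ends here
        counter = 0  # Second fix: reached only once counter exceeds 3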

    print("tried with this link:",link)

if __name__ == '__main__':
    for link in links:
        fetch_data(link)

This is the output the script above produces (as intended):

trying 0 times
trying 1 times
trying 2 times
trying 3 times
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=2
trying 0 times
trying 1 times
trying 2 times
trying 3 times
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=3
trying 0 times
trying 1 times
trying 2 times
trying 3 times
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=4

I deliberately used a wrong selector in my script so that it meets the condition I defined above (the title is always empty).

Why should I use return fetch_data(link) instead of fetch_data(link), when most of the time the two expressions work the same way?
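The difference can be reproduced without any scraping. In this minimal sketch, with_return and without_return are hypothetical functions (not part of the original script), and the events list simply records which lines run in each call frame. The point is what happens to the code after the recursive call, which is where the original script's counter=0 and print live.

```python
def with_return(n, events):
    events.append("enter {}".format(n))
    if n > 0:
        return with_return(n - 1, events)  # this frame ends here
    events.append("after if {}".format(n))


def without_return(n, events):
    events.append("enter {}".format(n))
    if n > 0:
        without_return(n - 1, events)  # result discarded; this frame resumes below
    events.append("after if {}".format(n))


a, b = [], []
with_return(2, a)
without_return(2, b)
print(a)  # ['enter 2', 'enter 1', 'enter 0', 'after if 0']
print(b)  # ['enter 2', 'enter 1', 'enter 0', 'after if 0', 'after if 1', 'after if 2']
```

With return, only the innermost frame reaches the last line; with a bare call, every frame reaches it once the call beneath it returns.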

【Discussion】:

Side note: your explicit return here only applies to the failure case.
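Following up on the side note, here is a hedged sketch of a non-recursive variant that returns the title on both paths and needs no global counter. It is not the asker's code: fetch_title is a hypothetical stub that always returns "" (mimicking the wrong "p.tcode" selector) so the example runs offline, and the retries, delay and fetch parameters are illustrative assumptions. It fetches once and then retries up to four times, printing the same "trying N times" lines as the fixed script.

```python
import time

fetch_log = []  # records every simulated request, for inspection


def fetch_title(link):
    # Hypothetical stand-in for requests.get + BeautifulSoup select_one();
    # always returns "", like the deliberately wrong "p.tcode" selector.
    fetch_log.append(link)
    return ""


def fetch_data(link, retries=4, delay=0.0, fetch=fetch_title):
    """Fetch once, then retry up to `retries` times while the title is empty.

    Returns the title on success, or "" once the retries are used up,
    so both paths return explicitly and no global counter is needed.
    """
    title = fetch(link)
    for attempt in range(retries):
        if title:
            break
        print("trying {} times".format(attempt))
        time.sleep(delay)  # the original script sleeps 1 second here
        title = fetch(link)
    print("tried with this link:", link)
    return title


links = ["page=2", "page=3", "page=4"]
results = [fetch_data(link) for link in links]
```

Each link gets five fetches (the initial one plus four retries). Passing delay=1 matches the original time.sleep(1), and passing a real function as fetch (one that calls requests.get and BeautifulSoup's select_one) would restore the network behavior.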

Tags: python python-3.x web-scraping conditional-statements

【Solution 1】:

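If the while loop inside your function fails to get the title, it starts a recursive call. It works when you use return fetch_data(link) because, whenever the counter is less than or equal to 3 (while counter<=3), the function exits right there inside the while loop, so it never falls through to the next line, counter=0, which resets the counter to 0. Since counter is a global variable and each recursion level increases it by only 1, you get at most 4 recursive calls beneath the initial one: once counter is greater than 3, the function no longer enters the while loop that would call fetch_data(link) again.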

fetch_data (counter=0)
  --> fetch_data (counter=1)
    --> fetch_data (counter=2)
      --> fetch_data (counter=3)
        --> fetch_data (counter=4) 
        - does not enter the while loop; resets counter, prints the URL
        - returns to the calling function
      - returns to the calling function
    - returns to the calling function
  - returns to the calling function
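To check the trace above without hitting the network, this sketch keeps the fixed script's control flow but swaps the request for a hypothetical fetch_title stub that always fails, drops the time.sleep(1), and records the counter value on entry to each call in an entries list. The stub and the entries list are illustrative additions, not part of the original script.

```python
counter = 0
entries = []  # counter value seen on entry to each call (added instrumentation)


def fetch_title(link):
    # Hypothetical stand-in for the request plus the deliberately wrong selector.
    return ""


def fetch_data(link):
    global counter
    entries.append(counter)
    title = fetch_title(link)
    if not title:
        while counter <= 3:
            print("trying {} times".format(counter))
            counter += 1
            return fetch_data(link)  # this frame ends here, so counter = 0 is skipped
        counter = 0  # reached only by the call that finds counter == 4
    print("tried with this link:", link)


for link in ["page=2", "page=3"]:
    fetch_data(link)
```

Each link produces five calls (the initial one plus four recursive ones) with counter values 0 through 4, and the counter is back at 0 when the next link starts.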

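If you use fetch_data(link), the function still starts a recursive call inside the while loop. However, instead of exiting immediately, it goes on to reset the counter to 0. This is dangerous: after the counter reaches 4, the innermost call resets it to 0 and returns to the previous call, which is still inside its while loop. That while loop does not break; because counter is now 0, it keeps issuing additional recursive calls, i.e.: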

fetch_data (counter=0)
  --> fetch_data (counter=1)
    --> fetch_data (counter=2)
      --> fetch_data (counter=3)
        --> fetch_data (counter=4) 
        - does not enter the while loop; !!!resets counter!!!, prints the URL
        - returns to the calling function
      - does not return to its caller
      - since counter = 0, continues the while loop
        --> fetch_data (counter=1)
          --> fetch_data (counter=2)
            --> fetch_data (counter=3)
...
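The runaway behavior can be observed the same way. This sketch drops the return in front of the recursive call. Because the recursion would otherwise keep going until Python raises RecursionError, it adds a hypothetical MAX_CALLS cap and TooManyCalls exception, plus a depth parameter for tracking how deep the calls go; none of these are part of the original script, and the request is replaced by an always-empty title.

```python
class TooManyCalls(Exception):
    """Raised by the demo's safety cap (not part of the original script)."""


MAX_CALLS = 30  # hypothetical cap so the demo stops instead of recursing indefinitely
counter = 0
calls = 0
max_depth = 0


def fetch_data(link, depth=0):
    global counter, calls, max_depth
    calls += 1
    max_depth = max(max_depth, depth)
    if calls > MAX_CALLS:
        raise TooManyCalls("stopped after {} calls at depth {}".format(MAX_CALLS, depth))
    title = ""  # stand-in for the deliberately wrong selector
    if not title:
        while counter <= 3:
            counter += 1
            fetch_data(link, depth + 1)  # no return: this frame resumes its loop afterwards
        counter = 0  # once a frame's loop exits, its caller's loop sees 0 and keeps going
    print("tried with this link:", link)


try:
    fetch_data("page=2")
    finished = True
except TooManyCalls as exc:
    finished = False
    print(exc)
```

The call never finishes on its own: the depth keeps growing past the 4 levels the return version never exceeds, and "tried with this link" is printed repeatedly for the same link.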

【Discussion】:

That makes sense now, @VietHTran. Thanks for the clear explanation.