【Question title】: Python's requests library timing out but getting the response from the browser
【Posted】: 2018-04-02 10:17:55
【Question description】:

I am trying to build a web scraper for NBA data. When I run the following code:

import requests

response = requests.get('https://stats.nba.com/stats/leaguedashplayerstats?College=&Conference=&Country=&DateFrom=10%2F20%2F2017&DateTo=10%2F20%2F2017&Division=&DraftPick=&DraftYear=&GameScope=&GameSegment=&Height=&LastNGames=0&LeagueID=00&Location=&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=Totals&Period=0&PlayerExperience=&PlayerPosition=&PlusMinus=N&Rank=N&Season=2017-18&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StarterBench=&TeamID=0&VsConference=&VsDivision=&Weight=')

the request times out with this error:

File "C:\ProgramData\Anaconda3\lib\site-packages\requests\api.py", line 70, in get
    return request('get', url, params=params, **kwargs)

File "C:\ProgramData\Anaconda3\lib\site-packages\requests\api.py", line 56, in request
    return session.request(method=method, url=url, **kwargs)

File "C:\ProgramData\Anaconda3\lib\site-packages\requests\sessions.py", line 488, in request
    resp = self.send(prep, **send_kwargs)

File "C:\ProgramData\Anaconda3\lib\site-packages\requests\sessions.py", line 609, in send
    r = adapter.send(request, **kwargs)

File "C:\ProgramData\Anaconda3\lib\site-packages\requests\adapters.py", line 473, in send
    raise ConnectionError(err, request=request)

ConnectionError: ('Connection aborted.', OSError("(10060, 'WSAETIMEDOUT')",))

However, when I open the same URL in a browser, I get a response.

【Comments on the question】:

    Tags: python web-scraping python-requests user-agent


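    【Solution 1】:

    It looks like the site you mention checks the "User-Agent" request header. You can spoof the "User-Agent" in your request so that it appears to come from a real browser, and you will then get a response.

    For example: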

    import requests
    url = "https://stats.nba.com/stats/leaguedashplayerstats?College=&Conference=&Country=&DateFrom=10%2F20%2F2017&DateTo=10%2F20%2F2017&Division=&DraftPick=&DraftYear=&GameScope=&GameSegment=&Height=&LastNGames=0&LeagueID=00&Location=&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=Totals&Period=0&PlayerExperience=&PlayerPosition=&PlusMinus=N&Rank=N&Season=2017-18&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StarterBench=&TeamID=0&VsConference=&VsDivision=&Weight="
    
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}
    # it's the user-agent of my browser ^ 
    
    response = requests.get(url, headers=headers)
    response.status_code    # will return: 200
    
    response.text      # will return the website content
    

    You can find your browser's User-Agent here.
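
If you make more than one call, it may be more convenient to set the header once on a `requests.Session` and to pass an explicit timeout so a blocked request fails quickly instead of hanging. The following is only a minimal sketch: `make_session` and `fetch` are hypothetical helper names, the timeout values are arbitrary, and the User-Agent string is the one from the example above. It also prints the User-Agent that requests sends by default, which is what lets a server tell the script apart from a browser.

```python
import requests

# Browser User-Agent string copied from the example in this answer.
BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/61.0.3163.100 Safari/537.36"
)


def make_session(user_agent=BROWSER_UA):
    """Return a Session that sends `user_agent` on every request (hypothetical helper)."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


def fetch(session, url, timeout=(5, 30)):
    """GET `url`; return the Response, or None on a network-level failure."""
    try:
        # timeout=(connect, read) in seconds; these values are arbitrary.
        return session.get(url, timeout=timeout)
    except requests.exceptions.RequestException as exc:
        print(f"request failed: {exc!r}")
        return None


# What requests identifies itself as when no header is supplied.
print(requests.utils.default_headers()["User-Agent"])
# What the session built here sends instead.
print(make_session().headers["User-Agent"])
```

Calling `fetch(make_session(), url)` with the URL from the question would then send the browser-like header. Whether the site accepts it is up to the server.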

    【Discussion】:

    • Boom! You can find your own User-Agent here
    【Solution 2】:

    If it still doesn't work, use these headers:

    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36','Accept-Encoding': 'gzip, deflate, br','Accept-Language': 'en-US,en;q=0.9,hi;q=0.8'}
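
One caveat with the header above: it advertises `br` (Brotli). As far as I know, requests (through urllib3) only decodes Brotli responses when a Brotli package is installed, so advertising it without one can leave you with an unreadable body. A hedged sketch that adds `br` only when such a package seems to be available follows. `supported_accept_encoding` is a hypothetical helper, and the module names `brotli` and `brotlicffi` are my assumption about what urllib3 looks for.

```python
import importlib.util


def supported_accept_encoding():
    """Build an Accept-Encoding value, adding "br" only if a Brotli
    decoder package appears to be installed (hypothetical helper)."""
    encodings = ["gzip", "deflate"]
    # Assumption: these are the module names urllib3 checks for.
    if any(importlib.util.find_spec(name) for name in ("brotli", "brotlicffi")):
        encodings.append("br")
    return ", ".join(encodings)


headers = {
    # Same User-Agent and Accept-Language as in the answer above.
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/61.0.3163.100 Safari/537.36"
    ),
    "Accept-Encoding": supported_accept_encoding(),
    "Accept-Language": "en-US,en;q=0.9,hi;q=0.8",
}
print(headers["Accept-Encoding"])
```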
    

    【Discussion】:

      【Solution 3】:

      If the other headers don't work, try this set of headers; it worked well for me.

      headers = {"User-Agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.1.1 Safari/605.1.15","Accept-Language": "en-gb","Accept-Encoding":"br, gzip, deflate","Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8","Referer":"http://www.google.com/"}
      

      These headers were collected from this link.

      【Discussion】:
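
Since the answers here amount to "try this header set, and if that fails try the next one", the attempts can be put in a loop. This is only a sketch under my own assumptions: `first_accepted` is a hypothetical helper, "accepted" is taken to mean HTTP 200, and the default timeout is arbitrary. The `get` argument is any callable with the signature of `requests.get`, which keeps the helper easy to try out with a stand-in.

```python
def first_accepted(get, url, header_sets, timeout=10):
    """Try each header dict in order and return (index, response) for the
    first attempt that yields HTTP 200, or (None, None) if none does.

    `get` is any callable with the signature of requests.get.
    """
    for index, headers in enumerate(header_sets):
        try:
            response = get(url, headers=headers, timeout=timeout)
        except OSError:
            # requests' exceptions derive from IOError (an alias of OSError),
            # so timeouts and aborted connections land here.
            continue
        if response.status_code == 200:
            return index, response
    return None, None
```

With requests imported, a call would look like `first_accepted(requests.get, url, [headers_1, headers_2, headers_3])`, where the three dicts are the header sets from the answers on this page.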
