【Question Title】: Method Not Allowed in Python 3 using urllib2
【Posted】: 2018-11-05 13:45:17
【Question Description】:
    from bs4 import BeautifulSoup
    import urllib.request as urllib2

    url="http://www.scmp.com/news/world"
    page = urllib2.urlopen(url)

    soup = BeautifulSoup(page, "html.parser")

    item = soup.find_all("h3", class_="node-title lvl_24-title")

    print(item)

This code raises a Method Not Allowed error only for this URL; the same code works for most of the other URLs I have tried. The full error message is below.

    Traceback (most recent call last):
      File "E:/Scrappers/test11.py", line 6, in <module>
        page = urllib2.urlopen(url)
      File "C:\Program Files (x86)\Python36-32\lib\urllib\request.py", line 223, in urlopen
        return opener.open(url, data, timeout)
      File "C:\Program Files (x86)\Python36-32\lib\urllib\request.py", line 532, in open
        response = meth(req, response)
      File "C:\Program Files (x86)\Python36-32\lib\urllib\request.py", line 642, in http_response
        'http', request, response, code, msg, hdrs)
      File "C:\Program Files (x86)\Python36-32\lib\urllib\request.py", line 570, in error
        return self._call_chain(*args)
      File "C:\Program Files (x86)\Python36-32\lib\urllib\request.py", line 504, in _call_chain
        result = func(*args)
      File "C:\Program Files (x86)\Python36-32\lib\urllib\request.py", line 650, in http_error_default
        raise HTTPError(req.full_url, code, msg, hdrs, fp)
    urllib.error.HTTPError: HTTP Error 405: Method Not Allowed

【Comments】:

    Tags: python-3.x beautifulsoup urllib2


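    【Solution 1】:

    This question may be a duplicate of HTTPError.

    Because your urlopen request does not specify a User-Agent, the server has detected you as a bot. May I recommend the less painful `requests` library?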

    import requests
    from bs4 import BeautifulSoup
    # Send a browser-like User-Agent. By default urlopen identifies itself as "Python-urllib/3.x", which makes the request look like a bot.
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36'}
    url = 'http://www.scmp.com/news/world'
    page = requests.get(url, headers=headers)
    # The 'lxml' parser needs the lxml package installed; 'html.parser' from the question works too.
    soup = BeautifulSoup(page.content, 'lxml')
    

    Voila! You've got yourself some soup to play with.
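
If you would rather stay with the standard library, the same idea can probably be applied to `urllib.request` directly by wrapping the URL in a `Request` that carries a User-Agent header. The following is only a minimal sketch: the helper names `default_user_agent`, `build_request`, and `fetch` and the 10-second timeout are made up for illustration, and whether the site accepts the request still depends on its bot detection.

```python
import urllib.request

# Assumption: any common browser User-Agent string should do;
# this one is copied from the requests example in this answer.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36"
)


def default_user_agent():
    """Return the User-Agent that a plain urlopen() call sends by default."""
    return dict(urllib.request.build_opener().addheaders)["User-agent"]


def build_request(url, user_agent=BROWSER_UA):
    """Build a GET request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, timeout=10):
    """Download the page body as bytes (this performs a network request)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.read()
```

The bytes returned by `fetch(url)` can be handed to `BeautifulSoup(..., "html.parser")` just as in the question, and `default_user_agent()` shows the `Python-urllib/...` value that gives the original request away.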

    【Comments】:
