【Question title】: python urllib2 can not fetch a specific url [duplicate]
【Posted】: 2014-07-20 08:50:12
【Question】:

I am using urllib2 to request URLs and read their contents, but unfortunately it fails for some URLs. Consider these two calls:

import urllib2

# No problem with this URL
urllib2.urlopen('http://www.huffingtonpost.com/2014/07/19/todd-akin-slavery_n_5602083.html')
# This one raises an error
urllib2.urlopen('http://www.foxnews.com/us/2014/07/19/cartels-suspected-as-high-caliber-gunfire-sends-border-patrol-scrambling-on-rio/')

The second URL produces the following error:

Traceback (most recent call last):
  File "D:/Developer Center/Republishan/republishan2/republishan2/test.py", line 306, in <module>
    urllib2.urlopen('http://www.foxnews.com/us/2014/07/19/cartels-suspected-as-high-caliber-gunfire-sends-border-patrol-scrambling-on-rio/')
  File "C:\Python27\lib\urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 410, in open
    response = meth(req, response)
  File "C:\Python27\lib\urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\urllib2.py", line 448, in error
    return self._call_chain(*args)
  File "C:\Python27\lib\urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 531, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: Not Found

What is wrong here?

【Comments】:

Tags: python httprequest urllib2


【Solution 1】:

I think the site is checking the User-Agent and/or other headers that urllib2 does not set by default.
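To see what the answer is referring to, the following minimal sketch prints the headers an opener attaches to every request when nothing is customized. It assumes that `build_opener()` reflects what `urlopen()` sends by default; the `try`/`except` import is only there so the same snippet runs on Python 3 (`urllib.request`) as well as on the Python 2 `urllib2` used in the question.

```python
try:
    from urllib.request import build_opener  # Python 3
except ImportError:
    from urllib2 import build_opener  # Python 2, as in the question

# build_opener() creates an opener with the standard handlers installed.
opener = build_opener()

# addheaders is a list of (name, value) pairs added to every request.
default_headers = dict(opener.addheaders)
print(default_headers)
```

The only entry is a `User-agent` value beginning with `Python-urllib/`, which a site can easily recognize as a script rather than a browser. Whether this particular site rejects it is the answer's guess, not something the traceback proves.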

You can set the User-Agent manually.
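A minimal sketch of setting the header by hand might look like the following. The `BROWSER_UA` string and the `build_request` and `fetch` helpers are hypothetical names introduced for illustration, and there is no guarantee that this particular value is accepted by the site; the import fallback lets the snippet run on both Python 2 (`urllib2`) and Python 3 (`urllib.request`).

```python
try:
    from urllib.request import Request, urlopen  # Python 3
except ImportError:
    from urllib2 import Request, urlopen  # Python 2, as in the question

# Hypothetical browser-like value; substitute whatever the site accepts.
BROWSER_UA = "Mozilla/5.0 (Windows NT 6.1; rv:31.0) Gecko/20100101 Firefox/31.0"


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request that carries an explicit User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch(url):
    """Open the URL with the custom header and return the raw response body."""
    response = urlopen(build_request(url))
    try:
        return response.read()
    finally:
        response.close()


# Building the request does not touch the network; only fetch() does.
req = build_request(
    "http://www.foxnews.com/us/2014/07/19/"
    "cartels-suspected-as-high-caliber-gunfire-sends-border-patrol-scrambling-on-rio/"
)
# Request normalizes header names with str.capitalize(), hence "User-agent".
print(req.get_header("User-agent"))
```

Calling `fetch(url)` then sends the request with the overridden header. If the server still answers with an error, `urlopen` raises `HTTPError` exactly as in the traceback above, which would suggest the User-Agent was not the only thing being checked.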

The Requests library sets a User-Agent automatically.

Keep in mind, though, that some sites may block the Requests User-Agent as well.

Try this. It worked for me. You need to install the requests module first:

pip install requests

Then:

import requests

r = requests.get("http://www.foxnews.com/us/2014/07/19/cartels-suspected-as-high-caliber-gunfire-sends-border-patrol-scrambling-on-rio/")

print(r.text)

urllib2 is harder to use and requires more code. Requests is simpler and closer to the Python philosophy that code should be beautiful.

【Discussion】:
