Curl works but urllib doesn't

【Question title】: Curl works but urllib doesn't [closed]
【Posted】: 2014-04-28 06:15:38
【Question description】:

Whenever I curl this, I get the entire web page. However, when I use urllib or even the mechanize library in Python, I get a 403 error. Is there a reason for this?

【Question comments】:

It's hard to say without seeing your code.
I can GET that URL with urllib, urllib2, and urllib3.

【Solution 1】:

Try this:

# Python 2 (urllib2 and BeautifulSoup 3)
import urllib2
from BeautifulSoup import BeautifulSoup

site = "http://www.economist.com/blogs/schumpeter/2014/04/alstom-block"
# Send a browser-like User-Agent instead of urllib2's default one
header = {'User-Agent': 'Mozilla/5.0'}
req = urllib2.Request(site, headers=header)
page = urllib2.urlopen(req)
soup = BeautifulSoup(page)
print soup

Output:

    <!DOCTYPE html>
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en" dir="ltr" xmlns:og="http://ogp.me/ns#" xmlns:fb="https://www.facebook.com/2008/fbml">
    <head>
....
...
..
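
In Python 3, urllib2 was merged into urllib.request, so the same idea might look like the sketch below. It only builds the request and compares the User-Agent that urllib announces by default with the overridden one. The actual fetch is left commented out because it needs network access, and whether the site still responds the way it did in 2014 is an assumption. The names SITE and default_agent are illustrative.

```python
import urllib.request

SITE = "http://www.economist.com/blogs/schumpeter/2014/04/alstom-block"

# What urllib announces itself as when no User-Agent is supplied.
default_agent = dict(urllib.request.build_opener().addheaders)["User-agent"]

# Same request as in the answer above, with a browser-like User-Agent.
req = urllib.request.Request(SITE, headers={"User-Agent": "Mozilla/5.0"})

print(default_agent)                 # something like Python-urllib/3.x
print(req.get_header("User-agent"))  # Request stores the key as "User-agent"

# Sending the request needs network access, so it is not executed here:
# with urllib.request.urlopen(req) as page:
#     html = page.read().decode("utf-8", errors="replace")
```

If the server filters on the User-Agent header, the default value printed first is what it would have seen from the failing script.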

【Comments】:

Yes, that works. Thanks a lot!
Do you mean the User-Agent was the problem?
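
One way to see how a User-Agent header alone can turn a 200 into a 403 is to reproduce the effect locally. The sketch below (Python 3) starts a throwaway server on the loopback interface that rejects any client identifying itself as Python-urllib. This is only an assumption about what the real site might be doing, not its confirmed behavior. PickyHandler and fetch_status are hypothetical names made up for this demo.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class PickyHandler(BaseHTTPRequestHandler):
    """Hypothetical server that refuses clients announcing Python-urllib."""

    def do_GET(self):
        agent = self.headers.get("User-Agent", "")
        if agent.startswith("Python-urllib"):
            self.send_error(403)
            return
        body = b"<html>ok</html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


# Ignore any system proxy so the loopback request reaches the demo server.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch_status(url, headers=None):
    """Return the HTTP status code for both successful and failed requests."""
    req = urllib.request.Request(url, headers=headers or {})
    try:
        with opener.open(req) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


server = HTTPServer(("127.0.0.1", 0), PickyHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_port

default_status = fetch_status(url)                                  # default UA
browser_status = fetch_status(url, {"User-Agent": "Mozilla/5.0"})  # custom UA
print(default_status, browser_status)

server.shutdown()
server.server_close()
```

Against this demo server the first request is refused and the second succeeds, which matches the symptom described in the question.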

【Solution 2】:

You can use the requests library:

import requests
print requests.get('http://www.economist.com/blogs/schumpeter/2014/04/alstom-block').text

【Comments】: