【Posted】: 2012-10-14 20:19:29
【Question】:
I wrote a Python script for personal use, but it doesn't work for Wikipedia...
This works:
import urllib2, sys
from bs4 import BeautifulSoup
site = "http://youtube.com"
page = urllib2.urlopen(site)
soup = BeautifulSoup(page)
print soup
This does not work:
import urllib2, sys
from bs4 import BeautifulSoup
site= "http://en.wikipedia.org/wiki/StackOverflow"
page = urllib2.urlopen(site)
soup = BeautifulSoup(page)
print soup
This is the error:
Traceback (most recent call last):
  File "C:\Python27\wiki.py", line 5, in <module>
    page = urllib2.urlopen(site)
  File "C:\Python27\lib\urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 406, in open
    response = meth(req, response)
  File "C:\Python27\lib\urllib2.py", line 519, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\urllib2.py", line 444, in error
    return self._call_chain(*args)
  File "C:\Python27\lib\urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 527, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 403: Forbidden
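A 403 on a request like this one often means the server is rejecting the default urllib2 User-Agent, which Wikipedia is known to do. The question itself does not confirm that as the cause, so treat the following as a minimal sketch under that assumption. The `USER_AGENT` string and the helper names `build_request` and `fetch` are hypothetical, chosen only for illustration.

```python
try:
    from urllib2 import Request, urlopen          # Python 2.7, as in the question
except ImportError:
    from urllib.request import Request, urlopen   # Python 3 equivalent

# Hypothetical identifier (assumption): use your own script name and contact.
USER_AGENT = "wiki-script/0.1 (personal use; contact: you@example.com)"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request that carries an explicit User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch(url):
    """Download the page body as bytes (this performs a network call)."""
    response = urlopen(build_request(url))
    try:
        return response.read()
    finally:
        response.close()
```

If the assumption holds, passing the result of `fetch(site)` to `BeautifulSoup` in place of `page` should let the rest of the script run unchanged.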
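【Comments】:
-
Don't try to scrape Wikipedia pages. They provide a very good API, and you should use it.
-
Can you give me a link? I just read that they allow you to scrape.
-
@Loclip The API page is self-explanatory: en.wikipedia.org/w/api.php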
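For anyone wondering what a call to that endpoint might look like, here is a rough sketch using the MediaWiki `action=parse` module to get a page's rendered HTML as JSON. The `https://` scheme, the `USER_AGENT` string, the helper names, and the assumed response shape (`parse` → `text` → `*`, as returned by the default JSON format) are my assumptions and not something stated in the comments, so check them against the API page.

```python
import json

try:
    from urllib import urlencode                  # Python 2.7
    from urllib2 import Request, urlopen
except ImportError:
    from urllib.parse import urlencode            # Python 3
    from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"    # scheme is an assumption
# Hypothetical identifier (assumption): use your own script name and contact.
USER_AGENT = "wiki-script/0.1 (personal use; contact: you@example.com)"


def build_api_url(title):
    """Build an action=parse query URL for one page title."""
    params = [
        ("action", "parse"),   # ask MediaWiki to render the page
        ("page", title),
        ("prop", "text"),      # only the rendered HTML is needed
        ("format", "json"),
    ]
    return API_URL + "?" + urlencode(params)


def extract_html(payload):
    """Pull the HTML out of a decoded response (assumed default JSON shape)."""
    return payload["parse"]["text"]["*"]


def fetch_page_html(title):
    """Call the API and return the page HTML (this performs a network call)."""
    request = Request(build_api_url(title), headers={"User-Agent": USER_AGENT})
    response = urlopen(request)
    try:
        payload = json.loads(response.read().decode("utf-8"))
    finally:
        response.close()
    return extract_html(payload)
```

The string returned by `fetch_page_html("StackOverflow")` could then be handed to `BeautifulSoup` the same way `page` is in the question.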
Tags: python python-2.7 beautifulsoup