[Posted]: 2015-01-07 10:24:23
[Question]:
Code:
import requests
import urllib.request  # urlopen lives in the urllib.request submodule
from bs4 import BeautifulSoup
page1 = urllib.request.urlopen("http://en.wikipedia.org/wiki/List_of_human_stampedes")
soup = BeautifulSoup(page1)
print(soup.get_text())
print(soup.prettify())
Error:
Traceback (most recent call last):
  File "C:\Users\sony\Desktop\Trash\Crawler Try\try2.py", line 9, in <module>
    print(soup.get_text())
  File "C:\Python34\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u014d' in position 10487: character maps to <undefined>
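I think the problem lies mainly in the urllib package. I am using the urllib3 package here. The urlopen syntax changed from version 2 to version 3, which may be the cause of the error. That said, I have used only the latest syntax. Python version: 3.4.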
[Discussion]:
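The traceback ends inside encodings\cp1252.py and is raised by print(), which suggests the download and parsing succeeded and that the failure happens when the text is encoded for the console. The sketch below tries to reproduce that without any network access. The sample string and the helper names can_encode and safe_print are made up for illustration; only the character '\u014d' and the cp1252 codec come from the traceback.

```python
import sys

# Hypothetical sample text containing U+014D, the character named in the traceback.
text = "T\u014dky\u014d"


def can_encode(s, encoding):
    """Return True if every character of s can be represented in encoding."""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def safe_print(s, stream=None):
    """Write s to stream, replacing characters its encoding cannot represent."""
    stream = stream or sys.stdout
    # Fall back to UTF-8 if the stream does not report an encoding.
    encoding = getattr(stream, "encoding", None) or "utf-8"
    # errors="replace" substitutes '?' for unencodable characters instead of raising.
    printable = s.encode(encoding, errors="replace").decode(encoding)
    stream.write(printable + "\n")


if __name__ == "__main__":
    print(can_encode(text, "utf-8"))
    print(can_encode(text, "cp1252"))
    safe_print(text)
```

If this is indeed the cause, calling safe_print(soup.get_text()) instead of print(soup.get_text()) should avoid the exception, at the cost of showing '?' for characters the console cannot display. Other options that keep the characters intact are setting the PYTHONIOENCODING environment variable to utf-8 before running the script, or writing the text to a file opened with encoding="utf-8".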
Tags: python python-3.x web-scraping beautifulsoup web-crawler