【Posted】:2013-02-15 03:25:44
【Question】:
Running this code:
from bs4 import BeautifulSoup
soup = BeautifulSoup(open("my.html"))
print(soup.prettify())
produces this error:
Traceback (most recent call last):
  File "soup.py", line 5, in <module>
    print(soup.prettify())
  File "C:\Python33\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u25ba' in position 9001: character maps to <undefined>
Then I tried:
print(soup.encode('UTF-8').prettify())
But that fails because soup.encode('UTF-8') returns a bytes object, which has no prettify method:
Traceback (most recent call last):
  File "soup.py", line 11, in <module>
    print(soup.encode('UTF-8').prettify())
AttributeError: 'bytes' object has no attribute 'prettify'
I'm not sure how to fix this. Any input would be appreciated.
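For context, the first traceback can probably be reproduced without Beautiful Soup at all: print() encodes its argument with the console's codec (cp437 here), and Python's cp437 codec has no mapping for U+25BA. The snippet below is a minimal sketch using only the standard library; `can_encode` is a hypothetical helper name, not part of any library.

```python
def can_encode(text, encoding):
    """Return True if every character in text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


arrow = "\u25ba"  # the character named in the traceback

# cp437 is the console codec shown in the traceback; UTF-8 covers all code points.
print(can_encode(arrow, "cp437"))
print(can_encode(arrow, "utf-8"))

# Lossy fallbacks that never raise; they substitute the unencodable character.
print(arrow.encode("cp437", errors="replace"))
print(arrow.encode("cp437", errors="backslashreplace"))
```

The error handlers shown at the end trade fidelity for robustness, so they only help when losing the original character is acceptable.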
【Comments】:
-
Try decoding the string from bytes first: bytes.decode(my.html)
-
I couldn't get this to work with Beautiful Soup (AttributeError: 'str' object has no attribute...)
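Both AttributeErrors in this thread seem to come from the order and direction of the conversion: encode() turns a str into bytes, decode() turns bytes into a str, and prettify() is a method of the soup object, so it has to be called before any encoding. One possible workaround, sketched here under the assumption that writing UTF-8 bytes (to a file or to a binary stream) is acceptable, is to encode the prettified text explicitly instead of letting print() use the console codec. `write_text` is a hypothetical helper, and the `html` string is a stand-in for the result of soup.prettify().

```python
import io
import sys


def write_text(text, stream=None, encoding="utf-8"):
    """Encode text and write the resulting bytes to a binary stream.

    text is expected to be a str, for example the value of soup.prettify().
    Returns the number of bytes written.
    """
    if stream is None:
        # sys.stdout.buffer is the binary layer underneath sys.stdout.
        stream = sys.stdout.buffer
    data = text.encode(encoding)  # str -> bytes, done after prettifying
    stream.write(data)
    return len(data)


# Stand-in for soup.prettify(); it contains the character from the traceback.
html = "<p>\n \u25ba item\n</p>\n"

# An in-memory buffer is used here; a file opened in "wb" mode works the same way.
buf = io.BytesIO()
written = write_text(html, buf)
```

With a real soup object the call would look like write_text(soup.prettify(), some_binary_stream). Whether the bytes display correctly in a console still depends on that console's code page, so redirecting to a file is the safer way to check the output.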
Tags: python parsing encoding python-3.x beautifulsoup