Python: Output of beautifulsoup has wrong encoding (answers)

【Question title】: Python: Output of beautifulsoup has wrong encoding
【Posted】: 2013-05-26 10:30:33
【Question description】:

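I have an encoding problem when I put a response into BeautifulSoup. The readable output of the response is formatted correctly, e.g. Artikelstandort: Österreich, but after running it through BeautifulSoup it turns into Artikelstandort: Ã–sterreich. Here is the relevant code: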

from bs4 import BeautifulSoup

def formTest(browser, formUrl, cardName, edition):
    browser.open(formUrl)

    # Raw bytes of the response body, exactly as the site sent them
    data = browser.response().read()
    with open('analyze.txt', 'wb') as textFile:
        print('writing file')
        textFile.write(data)

    # BS4 -> need from_encoding
    soup = BeautifulSoup(data, from_encoding='latin-1')
    # Note: after this line soup is a string, no longer a BeautifulSoup object
    soup = soup.encode('latin-1').decode('utf-8')
    table = soup.find('table', {"class": "MKMTable specimenTable"})

data contains the correct text, but soup has the wrong encoding. I have tried all sorts of encode/decode combinations on soup, but none of them worked.
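
The Ã– pattern is the typical signature of UTF-8 bytes being decoded with a single-byte codec. A minimal reproduction, assuming the page body is UTF-8 (the string is copied from the question; nothing is fetched). Windows-1252 (cp1252) fills most of Latin-1's 0x80-0x9F control range with printable characters such as the en dash, which is why the second byte of "Ö" shows up as "–":

```python
original = "Artikelstandort: Österreich"
raw = original.encode("utf-8")          # "Ö" becomes the two bytes C3 96

# Decoding UTF-8 bytes with a single-byte codec turns each byte into its own character.
as_cp1252 = raw.decode("cp1252")        # 0xC3 -> "Ã", 0x96 -> "–" (en dash)
as_latin1 = raw.decode("latin-1")       # 0xC3 -> "Ã", 0x96 -> invisible control char U+0096

print(as_cp1252)                        # Artikelstandort: Ã–sterreich
print(raw.decode("utf-8") == original)  # True: decoding with the right codec is lossless
```

So passing from_encoding='latin-1' for a UTF-8 page produces exactly this kind of damage before any encode/decode on soup happens.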

The page I am extracting data from is: https://www.magickartenmarkt.de/Mutilate_Magic_2013.c1p256992.prod

EDIT: As suggested, I changed the encoding using prettify, but now I get the following error:

TypeError: slice indices must be integers or None or have an __index__ method

What does prettify change? I printed the new output, and the table is still in "soup" (<table class="MKMTable specimenTable">).
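
A likely explanation for this TypeError: once soup has been replaced by the return value of prettify(...) (or of encode(...).decode(...)), it is a plain string, so soup.find(...) is str.find(sub, start, end), and the attribute dict ends up in the integer "start" slot. A minimal sketch with a hypothetical snippet of markup standing in for that string:

```python
# Hypothetical stand-in for the value soup holds after prettify() or
# encode(...).decode(...): an ordinary string, not a BeautifulSoup object.
markup = '<table class="MKMTable specimenTable"><tr><td>x</td></tr></table>'

try:
    # For a str this is str.find(sub, start): the dict lands in the
    # "start" slot, which must be an integer index.
    markup.find("table", {"class": "MKMTable specimenTable"})
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# A string's find only takes a substring and optional integer bounds.
print(markup.find("table"))  # 1
```

The table text is still visible when printing, because it is still in the string; it is just no longer searchable with BeautifulSoup's find.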

EDIT 2:

The new error is:

On the line: soup.encode('latin-1').decode('utf-8')

Error: UnicodeDecodeError: 'utf8' codec can't decode byte 0xfc in position 518: invalid start byte

If I try other encode/decode combinations, the error just moves to other bytes.
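
This UnicodeDecodeError is what the encode('latin-1').decode('utf-8') round trip does to text that was already decoded correctly: 0xfc is "ü" in Latin-1, and a lone 0xfc byte is never valid UTF-8. The round trip only undoes a genuine "UTF-8 read as Latin-1" mistake. A small sketch; the word Stück is a hypothetical example, not taken from the page:

```python
# Hypothetical word; any correctly decoded non-ASCII text behaves the same way.
text = "Stück"

# Case 1: text that was mis-decoded (UTF-8 bytes read as Latin-1) round-trips cleanly.
mis_decoded = text.encode("utf-8").decode("latin-1")      # 'StÃ¼ck'
repaired = mis_decoded.encode("latin-1").decode("utf-8")  # 'Stück'

# Case 2: text that was already correct does not survive the same round trip.
latin1_bytes = text.encode("latin-1")                     # b'St\xfcck' (ü is the single byte 0xFC)
try:
    latin1_bytes.decode("utf-8")
    failure = None
except UnicodeDecodeError as exc:
    failure = exc

print(repaired)  # Stück
print(failure)   # 'utf-8' codec can't decode byte 0xfc in position 2: invalid start byte
```

If a document mixes both cases, the round trip fixes some characters and fails on others, which matches the error moving around between bytes.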

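【Question comments】:

Try test = soup.prettify('latin-1')
Also, this answer might help: stackoverflow.com/questions/7219361/…
Updated the main post. I had already tried exactly what you suggested, and it produced the error described above.
Are you getting the error from the prettify line or from the soup.find line? soup starts out as a BeautifulSoup object, but prettify returns a string (a Unicode string, or a byte string if you pass it an encoding).
From the linked answer: "Finally figured it out, all that was needed was: soup = BeautifulSoup(content, fromEncoding='latin-1'), and then, when parsing the links, i_title = item.contents[0].encode('latin-1').decode('utf-8') seems to do the trick. Thanks for the help :)"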
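
The per-string trick from the linked answer can be wrapped in a guard so that it leaves correctly decoded text alone. The helper name undo_mojibake is hypothetical, and the guard is only a heuristic: a string that happens to form valid UTF-8 after re-encoding would be "repaired" even if it was fine to begin with.

```python
def undo_mojibake(text, wrong_codec="latin-1"):
    """Hypothetical helper: reverse a 'UTF-8 bytes decoded as wrong_codec' mistake.

    Returns text unchanged when it cannot be the result of that mistake.
    """
    try:
        return text.encode(wrong_codec).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either a character does not exist in wrong_codec, or the bytes are
        # not valid UTF-8: in both cases leave the string alone.
        return text


garbled = "Artikelstandort: Österreich".encode("utf-8").decode("latin-1")
print(undo_mojibake(garbled))                                   # Artikelstandort: Österreich
print(undo_mojibake("Stück"))                                   # Stück (already correct, left as-is)
print(undo_mojibake("Artikelstandort: Ã–sterreich", "cp1252"))  # Artikelstandort: Österreich
```

Note that the wrong codec has to match the one that caused the damage: the en dash in "Ã–" does not exist in Latin-1, so only the cp1252 call repairs that string. Fixing the decoding at the source avoids needing this at all.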

Tags: python encoding beautifulsoup mechanize decoding

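【Solution 1】:

You probably don't need the solution anymore, but for anyone who ends up here, this is what you should do:
You should probably apply the encoding step to data rather than to soup.
What I usually do is fetch the raw response with the requests library, force the encoding with response.encoding = 'utf-8', and then get the text content with response.text (requests decodes .text using whatever response.encoding is set to when you access it).
Finally, I pass response.text to BeautifulSoup().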
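
A sketch of this approach, assuming the page is served as UTF-8 (which the Ã– output suggests) and that the table's class attribute is exactly the string used in the question. The helper names and TABLE_ATTRS are hypothetical; fetch_html needs the third-party requests package and network access, while the two parsing helpers only need bs4. In both paths the bytes are decoded exactly once and soup stays a BeautifulSoup object:

```python
from bs4 import BeautifulSoup

# Class string copied from the question's soup.find call
TABLE_ATTRS = {"class": "MKMTable specimenTable"}


def fetch_html(url):
    """Download url and return its body decoded as UTF-8 (needs requests)."""
    import requests  # imported lazily so the parsing helpers work without it

    response = requests.get(url)
    # response.text decodes with whatever response.encoding is at access time.
    response.encoding = "utf-8"
    return response.text


def find_table_in_text(html_text):
    """Parse already-decoded HTML; soup stays a BeautifulSoup object."""
    soup = BeautifulSoup(html_text, "html.parser")
    return soup.find("table", TABLE_ATTRS)


def find_table_in_bytes(raw_bytes, encoding="utf-8"):
    """Parse raw bytes (e.g. browser.response().read()) with the real encoding."""
    soup = BeautifulSoup(raw_bytes, "html.parser", from_encoding=encoding)
    return soup.find("table", TABLE_ATTRS)


if __name__ == "__main__":
    sample = ('<table class="MKMTable specimenTable">'
              '<tr><td>Artikelstandort: Österreich</td></tr></table>')
    print(find_table_in_text(sample).get_text())                   # Artikelstandort: Österreich
    print(find_table_in_bytes(sample.encode("utf-8")).get_text())  # Artikelstandort: Österreich
```

With mechanize, the bytes path would replace from_encoding='latin-1' with the page's real encoding and drop the encode/decode line entirely; with requests, fetch_html(url) feeds straight into find_table_in_text.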

【Discussion】: