'str' object has no attribute 'find_all' Beautiful Soup [duplicate]: answers

【Question title】: 'str' object has no attribute 'find_all' Beautiful Soup [duplicate]
【Posted】: 2019-04-06 04:31:47
【Question】:

Here is my code; it is very simple. For some reason I get the error above. Even if I remove text = str(html) and replace soup = BeautifulSoup(text, 'html.parser') with soup = BeautifulSoup(html, 'html.parser'), I get the same error. What is going on?

with urllib.request.urlopen('https://jalopnik.com/search?q=mazda&u=&zo=-07:00') as response:
   html = response.read()  
text = str(html)  
soup = BeautifulSoup(text, 'html.parser')
print(type(soup))
soup = soup.prettify()
print(soup.find_all('div'))

【Comments】:

soup = soup.prettify() turns soup into a string. Don't replace your soup with the prettified string.

Tags: python beautifulsoup

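【Solution 1】:

soup.prettify() returns a string, and because you assign that result back to soup, soup is a plain str by the time you call soup.find_all(). A str has no find_all method, hence the error.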

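From the pretty printing section of the BeautifulSoup documentation:

The prettify() method will turn a Beautiful Soup parse tree into a nicely formatted Unicode string.

Don't replace your soup with the prettified string. BeautifulSoup does not need prettifying; you only need prettify() when you want to turn the soup back into a string, for example to save it to a file or to inspect it while debugging.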

soup = BeautifulSoup(text, 'html.parser')
print(soup.find_all('div'))

works just fine.
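As a minimal sketch of the point above, the snippet below keeps the parse tree and the prettified text in separate variables. The inline HTML string and the name pretty are made up for illustration; they are not from the original question.

```python
from bs4 import BeautifulSoup

# Hypothetical inline markup standing in for the downloaded page.
html = "<div><p>Mazda</p></div><div>Miata</div>"

soup = BeautifulSoup(html, "html.parser")

# Store the prettified text under a different name so soup stays a parse tree.
pretty = soup.prettify()

print(type(soup).__name__)    # BeautifulSoup
print(type(pretty).__name__)  # str

# find_all() is still available because soup was never overwritten.
divs = soup.find_all("div")
print(len(divs))  # 2
```

Here pretty is only useful for printing or writing to a file; all searching still goes through soup.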

You also really don't want to use str(html) to decode a bytes object. Normally you would use html.decode('utf8') or similar; str(html) gives you a value that starts with b' and ends with '.
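The difference is easy to see on a small bytes value. This is only an illustrative sketch; the sample bytes and the variable names raw, as_repr and as_text are assumptions, not part of the original code.

```python
# Hypothetical UTF-8 encoded markup, as a network read would return it.
raw = b"<p>caf\xc3\xa9</p>"

# str() on bytes (without an encoding) returns the repr, not decoded text.
as_repr = str(raw)

# decode() interprets the bytes as UTF-8 and returns real text.
as_text = raw.decode("utf-8")

print(as_repr)  # b'<p>caf\xc3\xa9</p>'
print(as_text)  # <p>café</p>
```

Feeding as_repr to a parser would include the leading b' and the escaped byte sequences as literal text in the document.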

However, BeautifulSoup is perfectly capable of decoding bytes values itself. It can also read directly from the response:

with urllib.request.urlopen('https://jalopnik.com/search?q=mazda&u=&zo=-07:00') as response:
    soup = BeautifulSoup(response, 'html.parser')
print(soup.find_all('div'))
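The same behaviour can be tried without a network connection. In this sketch an io.BytesIO object stands in for the HTTP response, since both expose a read() method that returns bytes; the markup and the name fake_response are made up for illustration.

```python
import io

from bs4 import BeautifulSoup

# Hypothetical stand-in for the urlopen() response: a file-like object of bytes.
fake_response = io.BytesIO(
    b"<html><body><div>one</div><div>two</div></body></html>"
)

# BeautifulSoup reads from the file-like object and decodes the bytes itself.
soup = BeautifulSoup(fake_response, "html.parser")

texts = [div.get_text() for div in soup.find_all("div")]
print(texts)  # ['one', 'two']
```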
