BeautifulSoup - TypeError: 'NoneType' object is not callable

【Question title】: BeautifulSoup - TypeError: 'NoneType' object is not callable
【Posted】: 2016-10-10 12:13:21
【Question description】:

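I need to make my code backward compatible with Python 2.6 and BeautifulSoup 3. The code was written with Python 2.7 and, in this case, BS4. But when I try to run it on a squeezy server (which has Python 2.6 and BS3), I get this error: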

try:
    from bs4 import BeautifulSoup
except ImportError:
    from BeautifulSoup import BeautifulSoup

gmp = open(fname, 'r')
soup = BeautifulSoup(gmp)
p = soup.body.div.find_all('p')

p = soup.body.div.find_all('p')
TypeError: 'NoneType' object is not callable

If I change it to:

   p = soup.body.div.findAll('p')

then I get this error:

p = soup.body.div.findAll('p')
TypeError: 'NoneType' object is not callable

Update: the error that is raised

  File "/home/user/openerp/7.0/addons/my_module/models/gec.py", line 401, in parse_html_data
    p = soup.body.div.findAll('p') #used findAll instead of find_all for backwards compatability to bs3 version
TypeError: 'NoneType' object is not callable

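Either way, both methods work on my Ubuntu machine with Python 2.7 and bs4, but not on squeezy. Are there other differences between those versions that I don't see or know about and that cause this error?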

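【Question comments】:

Falling back to from BeautifulSoup import BeautifulSoup (version 3) makes no sense when you use only version 4 syntax.
You should have seen that I wrote that I tried the backward-compatible syntax and still got the same error.

Tags: python beautifulsoup backwards-compatibility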

【Solution 1】:

You are using BeautifulSoup 3, but with BeautifulSoup 4 syntax.

Your fallback is at fault here:

try:
    from bs4 import BeautifulSoup
except ImportError:
    from BeautifulSoup import BeautifulSoup

If you want to support either version 3 or version 4, stick to version 3 syntax:

p = soup.body.div.findAll('p')

Because find_all is not a valid method in BeautifulSoup 3, it is interpreted as a tag search. There is no find_all tag in your HTML, so None is returned, and you then try to call it.
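The mechanism described above can be reproduced without BeautifulSoup at all. The following is a minimal sketch, assuming a hypothetical Tag class that only imitates the attribute fallback; it is a simplified stand-in, not the library's real code (for instance, it looks only at direct children).

```python
class Tag(object):
    """Simplified stand-in for a BeautifulSoup 3 tag (hypothetical, not library code)."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def find(self, name):
        # Return the first direct child with a matching tag name, else None.
        for child in self.children:
            if child.name == name:
                return child
        return None

    def findAll(self, name):
        # Return every direct child with a matching tag name.
        return [child for child in self.children if child.name == name]

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so any unknown
        # attribute (including "find_all") is treated as a tag name.
        return self.find(name)


div = Tag("div", [Tag("p"), Tag("p")])

print(len(div.findAll("p")))  # findAll is a real method on this class
print(div.find_all)           # no such method and no such child tag

try:
    div.find_all("p")         # calls the None returned by the tag search
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

The final call fails with the same kind of TypeError as in the question, because the attribute lookup silently produced None instead of raising AttributeError.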

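Next, the parser used by BeautifulSoup 3 responds differently to broken or incomplete HTML. If you have lxml installed on Ubuntu, it is used as the default parser, and it inserts a missing <body> tag for you. BeautifulSoup 3 may leave that tag out.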
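Since a chain such as soup.body.div can hit None at any step when the parser does not supply a missing tag, one option is to check each step before going further. This is only a sketch: the helper name descend is hypothetical, and SimpleNamespace objects stand in for parsed documents so the example runs without BeautifulSoup.

```python
from types import SimpleNamespace


def descend(node, *names):
    """Follow attribute names one by one; return None as soon as a step is missing."""
    for name in names:
        if node is None:
            return None
        node = getattr(node, name, None)
    return node


# Stand-in for a document where the parser produced <body><div>.
with_body = SimpleNamespace(body=SimpleNamespace(div="the div"))
# Stand-in for a document where no <body> was produced.
without_body = SimpleNamespace(body=None)

print(descend(with_body, "body", "div"))
print(descend(without_body, "body", "div"))
```

With a real soup object, a None result from descend(soup, "body", "div") would indicate that the expected structure is absent before findAll is ever called.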

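I strongly recommend that you remove the fallback instead and stick with BeautifulSoup version 4 only. Version 3 was discontinued years ago and contains unfixed bugs. BeautifulSoup 4 also offers additional features you may want to use.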

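BeautifulSoup is pure Python and is easy to install into a virtual environment on any platform that Python supports. You are not tied to the system-supplied package here.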

For example, on Debian Squeezy you would be stuck with BeautifulSoup 3.1.0, and even the BeautifulSoup developers do not want you to use it! Your problem with findAll almost certainly stems from using that release.

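【Comments】:

My intention is to use bs4. But the problem is that squeezy only has bs3, and I need it to work there too. So why does findAll not work when the import falls back to bs3, that is, when I am using bs3?
@Andrius: I was about to post this on your question: what is the full traceback of the exception raised by findAll()? Are you sure you copied the correct exception message there (it is identical to the one for find_all)?
I just copy/pasted the exact error I got when using findAll.
@Andrius: Interesting, because I cannot reproduce it with BeautifulSoup 3.2.1. I do have a Squeezy system, and since 3.1.0 is no longer available on PyPI, I'll try installing it there.
@Andrius: This looks like an issue specific to 3.1.0. That release was quickly superseded by the 3.2 series, and in 2009 the developers told everyone not to use 3.1. Don't use that version and you won't have the problem.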

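【Solution 2】:

I know this post is 6 years old, but I'm posting this in case anyone runs into a similar problem.

Line 9 looks like it should be a formatted string; after adding the f prefix it seems to work perfectly fine.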

import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup

product_all_pages = []

for i in range(1,15):
    response = requests.get(f"https://www.bol.com/nl/s/?page={i}&searchtext=hand+sanitizer&view=list")
    content = response.content
    parser = BeautifulSoup(content, 'html.parser')
    body = parser.body
    producten = body.find_all(class_="product-item--row js_item_root")
    product_all_pages.extend(producten)
len(product_all_pages)

price = float(product_all_pages[1].meta.get('content'))
productname = product_all_pages[1].find(class_="product-title--inline").a.getText()
print(price)
print(productname)

productlijst = []

for item in product_all_pages:
    if item.find(class_="product-prices").getText() == '\nNiet leverbaar\n':
        price = None
    else:
        price = float(item.meta['content'])
    product = item.find(class_="product-title--inline").a.getText()
    productlijst.append([product, price])
    
print(productlijst[:3])

df = pd.DataFrame(productlijst, columns=["Product", "price"])
print(df.shape)
df["price"].describe()
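The effect of the f prefix mentioned in this answer can be seen in isolation. The snippet below is a small sketch that reuses the URL from the code above and makes no network request; the variable names plain and formatted are chosen only for illustration.

```python
i = 3

# Without the f prefix, the braces stay in the string literally.
plain = "https://www.bol.com/nl/s/?page={i}&searchtext=hand+sanitizer&view=list"

# With the f prefix, {i} is replaced by the current value of i.
formatted = f"https://www.bol.com/nl/s/?page={i}&searchtext=hand+sanitizer&view=list"

print(plain)
print(formatted)
```

Without the prefix, every loop iteration would request the same literal URL containing {i} rather than pages 1 through 14.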

【Comments】: