BeautifulSoup's find method returns a Tag, but accessing .string raises a 'NoneType' error

【Question Title】: BeautifulSoup's find method returns Tag item, but call string get 'NoneType' Error
【Posted】: 2017-05-27 09:01:18
【Question Description】:

I just started learning Python and bs4. I'm trying to parse an HTML page that looks like this:

....
<p class="result-info">
    <span class="result-meta">
        <span class="result-price">$1895</span>
        <span class="result-hood"> address1 </span>
    </span>
    ....

My Python code is as follows:

from bs4 import BeautifulSoup

soup = BeautifulSoup(allResponse.content, "html.parser")
resultInfoTags = soup.find_all("p", class_="result-info")
infoList = []
for infoTag in resultInfoTags:
    infoDS = {}
    infoDS['detail_link'] = infoTag.find("a")['href']
    for metaData in infoTag.find_all("span", class_="result-meta"):
        firstSpan = metaData.find("span")
        infoDS['price'] = firstSpan.string
        lala = metaData.find("span", class_="result-hood")
        infoDS['area'] = lala.string  # the AttributeError is raised here
    infoList.append(infoDS)

The error occurs on the line infoDS['area'] = lala.string. It complains:

AttributeError: 'NoneType' object has no attribute 'string'

But when I print type(lala), it shows that lala is <class 'bs4.element.Tag'>. When I print lala itself, it shows the whole tag:

<span class="result-hood"> (address1)</span>

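I'm confused, because I used the same approach to get firstSpan, and it outputs the correct string $1895 without any problem. But it doesn't work for lala... I've spent hours desperately investigating and searching online, but haven't found anything useful...

Any suggestions or hints would be greatly appreciated!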
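A likely explanation, though only the full HTML would confirm it: the loop runs once per listing, and find() returns None for any listing that has no result-hood span. Printing lala inside the loop shows a Tag for every listing that does have one, so the None only shows up on the listing that crashes. The minimal reproduction below assumes a made-up two-listing page (hypothetical, not the asker's real HTML) whose second listing omits that span; parse is a hypothetical name for the asker's loop.

```python
from bs4 import BeautifulSoup

# Hypothetical page: the second listing has no "result-hood" span.
HTML = """
<p class="result-info">
  <a href="/listing/1">One</a>
  <span class="result-meta">
    <span class="result-price">$1895</span>
    <span class="result-hood"> address1 </span>
  </span>
</p>
<p class="result-info">
  <a href="/listing/2">Two</a>
  <span class="result-meta">
    <span class="result-price">$2100</span>
  </span>
</p>
"""


def parse(html):
    """The question's loop, with renamed variables and an explicit parser."""
    soup = BeautifulSoup(html, "html.parser")
    info_list = []
    for info_tag in soup.find_all("p", class_="result-info"):
        info = {"detail_link": info_tag.find("a")["href"]}
        for meta in info_tag.find_all("span", class_="result-meta"):
            info["price"] = meta.find("span").string
            hood = meta.find("span", class_="result-hood")
            print(type(hood))  # Tag for listing 1, NoneType for listing 2
            info["area"] = hood.string  # AttributeError on listing 2
        info_list.append(info)
    return info_list


try:
    parse(HTML)
except AttributeError as exc:
    print("Failed:", exc)
```

The first type() printout is the Tag the asker saw; the crash comes from a later iteration.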

【Question Discussion】:

Please post all of the HTML code.

Tags: beautifulsoup web-crawler html-parsing

【Solution 1】:

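I'm not 100% sure this is the problem with your code, but I think the way you're actually getting data out of the class is wrong. Usually, when I use Beautiful Soup to find a specific class, I use a different approach from yours. For example, I would use variable = soup.find(attrs={"class": class_name_here}) followed by variable = variable.getText().

So, in your case, try the following: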
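For comparison, a minimal sketch (the sample markup is hypothetical, modeled on the question's HTML) showing that the class_ keyword and the attrs dictionary select the same element, that getText() is an alias of get_text(), and how get_text() differs from .string. Neither spelling changes what happens when nothing matches: find() still returns None, and calling getText() on it raises the same kind of AttributeError.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup modeled on the question's HTML.
HTML = (
    '<span class="result-meta">'
    '<span class="result-price">$1895</span>'
    '<span class="result-hood"> address1 </span>'
    '</span>'
)
soup = BeautifulSoup(HTML, "html.parser")

# The class_ keyword and the attrs dict return the very same Tag object.
by_keyword = soup.find("span", class_="result-hood")
by_attrs = soup.find(attrs={"class": "result-hood"})
print(by_keyword is by_attrs)  # True

meta = soup.find("span", class_="result-meta")
print(meta.string)            # None: the tag has two child tags
print(repr(meta.get_text()))  # '$1895 address1 '
print(meta.getText() == meta.get_text())  # True: getText is an alias

# When nothing matches, find() returns None under either spelling.
print(soup.find(attrs={"class": "no-such-class"}))  # None
```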

from bs4 import BeautifulSoup

soup = BeautifulSoup(allResponse.content, "html.parser")
resultInfoTags = soup.find_all(attrs={"class":"result-info"})
infoList = []
for infoTag in resultInfoTags:
    infoDS = {}
    # Store the href value itself, not the whole <a> tag
    infoDS['detail_link'] = infoTag.find('a', href=True)['href']
    for metaData in infoTag.find_all(attrs={"class":"result-meta"}):
        firstSpan = metaData.find(attrs={"class":"result-price"})
        infoDS['price'] = firstSpan.getText()
        lala = metaData.find(attrs={"class":"result-hood"})
        infoDS['area'] = lala.getText()
    infoList.append(infoDS)

Again, I'm not sure whether this is what's happening in your program; if not, let me know.

【Discussion】:

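It doesn't work. I replaced metaData.find("span", class_="result-hood") with your format, find(attrs={"class":"result-hood"}), and it still complains AttributeError: 'NoneType' object has no attribute 'getText'... Our two implementations should work the same way. I learned the class_ format from the official bs4 website.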

【Solution 2】:

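Luckily, I figured this out today after a lot of experimenting. If you run into the same problem, add the None check if lala is not None before the line infoDS['area'] = lala.getText() to fix it. Although a None check does make sense there, I still don't understand why having the None check or not would affect the code if lala has an actual value. If you happen to know why, please leave a solution/explanation here. Thanks a lot!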
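The None check most likely does not change anything for listings where lala has a value; it only skips the listings where find() returned None. Assuming some listings on the real page simply have no result-hood span (plausible, but unconfirmed without the full HTML), the check is what lets the loop get past them. A minimal sketch of the same idea, using a hypothetical helper named text_or_default and made-up sample markup:

```python
from bs4 import BeautifulSoup


def text_or_default(parent, default=None, **find_kwargs):
    """Hypothetical helper: stripped text of the first matching tag,
    or `default` when find() returns None because nothing matched."""
    tag = parent.find(**find_kwargs)
    return tag.get_text(strip=True) if tag is not None else default


def parse_listings(html):
    soup = BeautifulSoup(html, "html.parser")
    listings = []
    for info_tag in soup.find_all("p", class_="result-info"):
        link = info_tag.find("a", href=True)
        info = {"detail_link": link["href"] if link is not None else None}
        for meta in info_tag.find_all("span", class_="result-meta"):
            info["price"] = text_or_default(meta, name="span", class_="result-price")
            # A listing without a neighborhood gets None instead of crashing.
            info["area"] = text_or_default(meta, name="span", class_="result-hood")
        listings.append(info)
    return listings


# Hypothetical sample: the second listing has no "result-hood" span.
HTML = """
<p class="result-info"><a href="/listing/1">One</a>
  <span class="result-meta">
    <span class="result-price">$1895</span>
    <span class="result-hood"> address1 </span>
  </span>
</p>
<p class="result-info"><a href="/listing/2">Two</a>
  <span class="result-meta"><span class="result-price">$2100</span></span>
</p>
"""

print(parse_listings(HTML))
```

Returning a default for a missing field keeps one incomplete listing from aborting the whole crawl; whether to drop such listings or keep them with an empty area is a design choice.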

【Discussion】: