BeautifulSoup: NoneType error when parsing an HTML file

【Question title】: BeautifulSoup: NoneType error when parsing an HTML file
【Posted】: 2017-08-14 02:33:37
【Question】:

My script is shown below. The soup.get_text() call raises an error. Code:

from BeautifulSoup import *
soup=BeautifulSoup(open("F:\\HTML\\Registrationform.html"))
print soup.get_text('+')

Error:

  File "C:/Python27/beautifulsoup4-4.6.0.tar/scrapingbasic.py", line 3, in <module>
    print soup.get_text('+')
TypeError: 'NoneType' object is not callable

【Comments】:

Update to beautifulsoup4.

Tags: python web-scraping beautifulsoup

【Solution 1】:

The BeautifulSoup class expects HTML/XML content in its constructor, so adding .read() to your open() call should work. The code looks like this:

from BeautifulSoup import *
soup=BeautifulSoup(open("F:\\HTML\\Registrationform.html").read())

print soup.get_text('+')

Also, I suggest upgrading to BeautifulSoup4.
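On why the upgrade matters: the traceback is consistent with the old BeautifulSoup 3 package (the one imported by `from BeautifulSoup import *`) not having a get_text method. As far as I can tell, an unknown attribute name there is treated as a tag lookup, so soup.get_text searches for a <get_text> tag, finds nothing, and returns None; calling None then raises the TypeError. The sketch below reproduces the same mechanism with bs4 under Python 3, which still resolves unknown attribute names this way. The inline HTML string and the name `no_such_method` are made up for illustration, and it assumes bs4 is installed.

```python
from bs4 import BeautifulSoup

# Small inline document (hypothetical stand-in for Registrationform.html).
html = "<html><body><h1>Registration</h1><p>Name</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# An attribute name that is not a real method is looked up as a tag name.
# There is no <no_such_method> tag in the document, so the result is None.
missing = soup.no_such_method
print(missing)

try:
    # Same shape as soup.get_text('+') when get_text does not exist.
    missing("+")
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# In bs4, get_text is a real method and accepts a separator string.
text = soup.get_text("+")
print(text)
```

If this reading is right, the fix is the import (`from bs4 import BeautifulSoup`) rather than how the file is passed in.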

Hope this helps.

【Comments】:

【Solution 2】:

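BeautifulSoup expects an HTML/XML document. As a sanity check, can you confirm that Python 2.x can actually read your HTML file? Another possible issue on Windows is the lxml library: make sure it installed successfully. You can also recheck the documentation at https://www.crummy.com/software/BeautifulSoup/bs4/doc/

in particular this part: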

from bs4 import BeautifulSoup
with open("index.html") as fp:
    soup = BeautifulSoup(fp)
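
Putting this together with the original script, a minimal Python 3 sketch might look like the following. The helper name `extract_text` is hypothetical, the UTF-8 encoding is an assumption about the file, and the built-in "html.parser" is chosen so that lxml is not required.

```python
from bs4 import BeautifulSoup


def extract_text(path, separator="+"):
    """Return the text of the HTML file at `path`, joined by `separator`."""
    # bs4 accepts an open file handle directly; no .read() is needed.
    # encoding="utf-8" is an assumption; change it to match your file.
    with open(path, encoding="utf-8") as fp:
        soup = BeautifulSoup(fp, "html.parser")
    return soup.get_text(separator)
```

Calling `print(extract_text("F:\\HTML\\Registrationform.html"))` would then print the page text with the pieces joined by '+'.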

【Comments】:

print soup.get_text('+') now runs successfully. Thanks, everyone.