[Posted]: 2016-12-21 07:19:32
[Question]:
My code...
foo = fromstring(my_html)
It raises this warning...
UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.
To get rid of this warning, change this:
BeautifulSoup([your markup])
to this:
BeautifulSoup([your markup], "html.parser")
markup_type=markup_type))
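I tried passing the string 'html.parser' to it, but that didn't work: it gave me an error saying the string is not callable. So I tried html.parser instead, and then I looked through the lxml module to see whether I could find another parser, but couldn't. I looked at the Python standard library and saw that 2.7 has one called HTMLParser, so I imported it and typed beautifulsoup=HTMLParser, but that didn't work either.
Where is the callable that I am supposed to pass to fromstring?
EDIT: added the solutions I tried: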
from lxml.html.soupparser import fromstring
wiktionary_page = fromstring(wiktionary_page.read(), features="html.parser" )
And this:
from lxml.html.soupparser import BeautifulSoup
wiktionary_page = fromstring(wiktionary_page.read(), beautifulsoup=lambda s: BeautifulSoup(s, "html.parser"))
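To illustrate the "string is not callable" error described above, here is a minimal sketch that uses only the standard library. Both `parse_with` and `fake_soup` are hypothetical stand-ins, not lxml or BeautifulSoup code; they only assume that the `beautifulsoup` argument is called on the markup and that the parser class accepts a `features` name.

```python
from functools import partial

# Hypothetical stand-in for a function that, like the `beautifulsoup`
# argument discussed above, expects a callable and invokes it on the markup.
def parse_with(markup, factory):
    return factory(markup)

# Hypothetical stand-in for a parser class that takes a parser name.
def fake_soup(markup, features=None):
    return {"markup": markup, "features": features}

# Passing a plain string where a callable is expected raises TypeError.
try:
    parse_with("<p>hi</p>", "html.parser")
    error = None
except TypeError as exc:
    error = str(exc)

# Binding the parser name up front yields a one-argument callable instead.
factory = partial(fake_soup, features="html.parser")
result = parse_with("<p>hi</p>", factory)
```

The lambda in the second attempt above has the same shape as the `partial` object here: a callable that takes the markup and supplies the parser name itself.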
[Discussion]:
Tags: python beautifulsoup lxml