[Posted]: 2018-05-07 00:14:24
[Question]:
I want to extract the text (the paragraph) inside HTML tags in Python:
<p style="text-align: justify;"><span style="font-size: small; font-family: lato, arial, helvetica, sans-serif;">
Irrespective of the kind of small business you own, using traditional sales and marketing tactics can prove to be expensive.
</span></p>
My code is:
from HTMLParser import HTMLParser
from bs4 import BeautifulSoup
x = """<p style="text-align: justify;"><span style=&quot;font-size: small; font-family: lato, arial, helvetica, sans-serif;"> Irrespective of the kind of small business you own, using traditional sales and marketing tactics can prove to be expensive. </span></p>"""
p1 = HTMLParser()
p1.unescape(x)
bdy_soup = BeautifulSoup(p1.unescape(x)).get_text(separator=";")
print(bdy_soup)
This code does not return anything. Please help me get this working; any help would be greatly appreciated.
[Comments]:
-
Are you reading from an HTML page or from a text file?
-
@prakash-palnati --- I am reading from a SQL table.
-
@s.s You can use BeautifulSoup to extract exactly the data you need. First run import html, then html.unescape(x).
-
@manoj jadhav Can you explain the code?
-
@s.s See my post.
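The suggestion in the comments (decode the entities with html.unescape, then pull out the text) can be sketched roughly as below. This is only a minimal sketch, not the commenter's actual post. It assumes the value read from the SQL table is entity-escaped markup, which the question does not confirm. The names raw, TextCollector and extract_text are hypothetical. To stay self-contained it uses the standard-library html.parser in place of BeautifulSoup. In Python 3 the old HTMLParser module is named html.parser, and html.unescape is the standard-library function for decoding entities.

```python
import html
from html.parser import HTMLParser

# Hypothetical sample: escaped markup as it might come back from the SQL table.
raw = (
    "&lt;p style=&quot;text-align: justify;&quot;&gt;"
    "&lt;span style=&quot;font-size: small;&quot;&gt; "
    "Irrespective of the kind of small business you own, using traditional "
    "sales and marketing tactics can prove to be expensive. "
    "&lt;/span&gt;&lt;/p&gt;"
)


class TextCollector(HTMLParser):
    """Hypothetical helper: gathers the text found between tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for every run of text between tags; skip whitespace-only runs.
        text = data.strip()
        if text:
            self.chunks.append(text)


def extract_text(escaped_markup, separator=" "):
    """Decode HTML entities, then return the text content of the markup."""
    markup = html.unescape(escaped_markup)  # "&lt;p&gt;" becomes "<p>"
    collector = TextCollector()
    collector.feed(markup)
    collector.close()  # flush any text still buffered by the parser
    return separator.join(collector.chunks)


print(extract_text(raw))
```

If bs4 is installed, the body of extract_text could instead be BeautifulSoup(html.unescape(escaped_markup), "html.parser").get_text(separator=" ", strip=True), which is closer to the code in the question.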
Tags: python html python-3.x