[Posted]: 2015-08-14 10:26:08
[Question]:
This is what I have so far:
from bs4 import BeautifulSoup

def cleanme(html):
    # Build a bs4 object from the HTML; the parser is named explicitly to avoid bs4's "no parser specified" warning
    soup = BeautifulSoup(html, "html.parser")
    # Remove every <script> element (tag and contents) from the tree
    for script in soup(["script"]):
        script.extract()
    text = soup.get_text()
    return text

testhtml = "<!DOCTYPE HTML>\n<head>\n<title>THIS IS AN EXAMPLE </title><style>.call {font-family:Arial;}</style><script>getit</script><body>I need this text captured<h1>And this</h1></body>"
cleaned = cleanme(testhtml)
print(cleaned)
This works for removing the script.
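For illustration only, here is a minimal sketch of how the same approach could be extended to drop style elements as well. It assumes the goal is the page's text without script or style contents; the expected output is not stated in the question, so that is a guess. The name clean_html is hypothetical, and "html.parser" is chosen here only because it ships with Python. Note that the title text is still returned, since it is an ordinary text node.

```python
from bs4 import BeautifulSoup


def clean_html(html):
    """Return the text of `html` without <script> or <style> contents.

    Hypothetical helper: assumes the desired output is plain page text.
    """
    # Name the parser explicitly so the result does not depend on which
    # optional parsers (lxml, html5lib) happen to be installed.
    soup = BeautifulSoup(html, "html.parser")

    # Calling soup([...]) is shorthand for find_all; decompose() removes
    # each matching tag together with everything inside it.
    for tag in soup(["script", "style"]):
        tag.decompose()

    # stripped_strings yields each remaining text fragment with leading and
    # trailing whitespace removed, skipping whitespace-only fragments.
    return " ".join(soup.stripped_strings)


testhtml = (
    "<!DOCTYPE HTML>\n<head>\n<title>THIS IS AN EXAMPLE </title>"
    "<style>.call {font-family:Arial;}</style><script>getit</script>"
    "<body>I need this text captured<h1>And this</h1></body>"
)
result = clean_html(testhtml)
print(result)
```

If the title should be excluded too, "title" could be added to the list of tag names, again assuming that matches the intended output.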
[Comments]:
- What is your expected output?
Tags: python html beautifulsoup