【Posted】: 2019-05-09 12:15:08
【Question】:
This is my first time using bs4. If I run this basic code:
from bs4 import BeautifulSoup

# Python 2.7: parse test.txt without naming a parser
with open('test.txt', 'r') as f:
    soup = BeautifulSoup(f)
    print f
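the output in the terminal is very clean and contains no HTML tags. When I try to write it to a txt file, I am prompted to add a parser, so I added 'html.parser'. The result is not the same: the file is full of the tags I am trying to get rid of. How can I get the same result in my txt file?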
from bs4 import BeautifulSoup

with open('test.txt', 'r') as f:
    soup = BeautifulSoup(f, 'html.parser')

# str(soup) serializes the parsed tree, markup included
with open('test2.txt', 'w') as x:
    x.write(str(soup))
*EDIT: Below is a sample of what ends up in test2.txt when I run this code:
each\u00a0row you want to accept.\n <li>At the top of the list,
under the <b>Batch Actions</b> drop-down arrow,
choose\u00a0<b>Accept Selected</b>.</li>\n <li>All the selected
transactions\u00a0move from the <b>For Review
But in the terminal I get:
each\u00a0row you want to accept.\n At the top of the list, under
the Batch Actions drop-down arrow, choose\u00a0Accept Selected.\n
All the selected transactions\u00a0move from the For Review
tab\u00a0to the In QuickBooks
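For readers comparing the two outputs above, here is a minimal sketch of the difference between serializing the soup and extracting only its text. It is not taken from the original post: the `html` string is a made-up sample loosely modelled on the excerpt, and the variable names are illustrative. `str(soup)` re-serializes the parsed tree, so the tags come back, while `soup.get_text()` returns only the text nodes.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, loosely modelled on the excerpt above.
html = "<ol><li>Choose <b>Accept Selected</b>.</li></ol>"

soup = BeautifulSoup(html, "html.parser")

serialized = str(soup)       # the full markup again, tags included
text_only = soup.get_text()  # only the text between the tags

print(serialized)
print(text_only)
```

Under that assumption, writing `soup.get_text()` instead of `str(soup)` to test2.txt should give tag-free output. On Python 2.7 `get_text()` returns a `unicode` string, so if the text contains non-ASCII characters it may need `.encode('utf-8')`, or the file can be opened with `io.open('test2.txt', 'w', encoding='utf-8')`.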
【Discussion】:
- Could you share some of the contents of test.txt?
Tags: python-2.7 beautifulsoup html-parsing