How to avoid replace_with escaping my '<' and '>'?

【Question title】: How to avoid replace_with escaping my '<' and '>'?
【Posted】: 2015-02-17 14:11:56
【Question description】:

I am using BeautifulSoup to process HTML. When I call replace_with(), I get the result below: it escapes my '<' and '>'.

>>> from bs4 import BeautifulSoup as bs
>>> tt = bs('<p><a></a></p>')
>>> bb = tt.p
>>> tt
<html><body><p><a></a></p></body></html>
>>> bb
<p><a></a></p>
>>> bb.replace_with('<p>aaaaaaa<aaaaa></p>')
<p><a></a></p>
>>> tt
<html><body>&lt;p&gt;aaaaaaa&lt;aaaaa&gt;&lt;/p&gt;</body></html>

I want tt to print this instead:

>>> tt
<html><body><p>aaaaaaa<aaaaa></p></body></html>

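What should I do? Thanks.
------Update--------------
I am writing a Python program that converts an HTML blog into Markdown. Its code is here. My main approach is:
1. Use urllib2 to fetch the page source
2. Use BeautifulSoup to parse the DOM tree
3. Use BeautifulSoup to modify the existing DOM tree (this is where I use replace_with)
4. Save the modified DOM tree to a Markdown file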
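
For readers who want to reproduce the setup, the four steps could be sketched roughly as follows. This is only an illustration under assumptions, not the asker's actual program: the sample markup, the `<strong>` to `**...**` rule, and the file name `post.md` are all hypothetical, the download step is replaced by an inline string (on Python 3, `urllib.request` takes the place of `urllib2`), and `html.parser` is picked only so that no extra parser package is needed.

```python
from bs4 import BeautifulSoup

# Step 1 stand-in: hypothetical page source instead of a real download.
page_source = "<div><p>service tool->SQL Server <strong>Reporting</strong> Services</p></div>"

# Step 2: parse the DOM tree.
soup = BeautifulSoup(page_source, "html.parser")

# Step 3: rewrite HTML elements as Markdown text, here <strong> to **...**.
# find_all() returns a list, so replacing nodes inside the loop is safe.
for strong in soup.find_all("strong"):
    strong.replace_with("**" + strong.get_text() + "**")

# Step 4: collect the text. get_text() returns the stored strings as they are,
# without turning ">" into "&gt;".
markdown = soup.get_text()
print(markdown)  # service tool->SQL Server **Reporting** Services

# Writing the file is left commented out so the sketch has no side effects.
# with open("post.md", "w", encoding="utf-8") as fh:  # hypothetical file name
#     fh.write(markdown)
```

Reading the result with `get_text()` rather than converting the tree to a string is one way to keep entity escaping out of the Markdown output.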

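The problem is that BeautifulSoup automatically escapes '<' and '>' when I modify the DOM tree, so the modified tree does not come out the way I expected. The HTML is: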

 service tool->SQL Server Reporting Services

The Markdown is:

 service tool-&gt;SQL Server Reporting Services

【Comments on the question】:

Have you looked at this post?

Tags: python beautifulsoup

【Solution 1】:

from bs4 import BeautifulSoup
tt = BeautifulSoup('<p><a></a></p>')

# Parse the replacement markup first, so that a Tag (not a plain string) is inserted
new = BeautifulSoup('<p>aaaaaaa<aaaaa></p>')
tt.p.replace_with(new.p)
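
To spell out why this works, here is a small sketch of the two cases side by side. It assumes `html.parser` (so no `<html><body>` wrapper is added, unlike the output in the question) and uses a simplified fragment, `<p>aaaaaaa</p>`, plus a wrapping `<div>` that is not in the original code.

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# Case 1: a plain string is stored as a text node, so "<" and ">" are
# escaped when the tree is turned back into markup.
soup = BeautifulSoup("<div><p><a></a></p></div>", "html.parser")
soup.p.replace_with("<p>aaaaaaa</p>")
text_node = soup.div.contents[0]
print(isinstance(text_node, NavigableString))  # True
print(soup)  # <div>&lt;p&gt;aaaaaaa&lt;/p&gt;</div>

# Case 2: parse the fragment first and insert the resulting Tag, which
# becomes a real element in the tree.
soup = BeautifulSoup("<div><p><a></a></p></div>", "html.parser")
fragment = BeautifulSoup("<p>aaaaaaa</p>", "html.parser")
soup.p.replace_with(fragment.p)
print(isinstance(soup.div.contents[0], Tag))  # True
print(soup)  # <div><p>aaaaaaa</p></div>
```

In the first case the tree really does contain text, not a `<p>` element, so later calls such as `find_all("p")` will not see it.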

With your own code, you can use an output formatter to see the output you want:

from bs4 import BeautifulSoup
tt = BeautifulSoup('<p><a></a></p>')
tt.p.replace_with('<p>aaaaaaa<aaaaa></p>')
print(tt.prettify(formatter=None))
<html>
 <body>
  <p>aaaaaaa<aaaaa></p>
 </body>
</html>
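
The same `formatter=None` argument is also accepted by `decode()` and `encode()`, which is handy when the indentation added by `prettify()` is not wanted. A minimal sketch, again assuming `html.parser`, a simplified fragment, and a wrapping `<div>` for illustration:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<div><p><a></a></p></div>", "html.parser")
soup.p.replace_with("<p>aaaaaaa</p>")  # inserted as plain text

escaped = soup.decode()                           # default formatter escapes text
raw_text = soup.decode(formatter=None)            # text is written unchanged
raw_bytes = soup.encode("utf-8", formatter=None)  # same, but returns bytes

print(escaped)    # <div>&lt;p&gt;aaaaaaa&lt;/p&gt;</div>
print(raw_text)   # <div><p>aaaaaaa</p></div>
print(raw_bytes)  # b'<div><p>aaaaaaa</p></div>'
```

Note that `formatter=None` only changes how the tree is written out. Any text that legitimately contains `<` or `&` is written unescaped as well, so the result may not be valid HTML.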

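You can also replace the string inside the tag. I am not entirely sure what you are trying to achieve, but the documentation is very clear and easy to follow.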
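
Replacing only the string inside a tag might look like the sketch below. The replacement text is made up for illustration and `html.parser` is an assumption; the point is that the stored string keeps its `>` and escaping happens only when the tag is serialized with the default formatter.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>service tool->SQL Server Reporting Services</p>", "html.parser")

# Replace only the text inside <p>; the tag itself stays in the tree.
soup.p.string.replace_with("service tool->Reporting Services")

print(soup.p.string)  # service tool->Reporting Services
print(soup.p)         # <p>service tool-&gt;Reporting Services</p>
```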

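【Comments】:

Could you add some explanatory text?
Thanks for your reply, it gave me the idea. I used encode() with an output formatter, the same way your example uses prettify(). Thanks.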