Replace HTML links with text: answers

【Question title】: Replace HTML links with text
【Posted】: 2014-08-01 04:17:08
【Question】:

How can I replace the links in an HTML string with their anchor text, using Python?

For example, given this input:

 <p> Hello <a href="http://example.com">link text1</a> and <a href="http://example.com">link text2</a> ! </p>

I want a result that keeps the p tag and removes only the a tags:

<p>
Hello link text1 and link text2 ! 
</p>

【Comments】:

I don't know the answer, but I'd guess it involves BeautifulSoup :-)
@mgilson, wouldn't a simple regular expression handle the case of non-nested anchors?
stackoverflow.com/questions/2584885/strip-tags-python

Tags: python html parsing text-parsing

【Solution 1】:

This looks like a perfect case for BeautifulSoup's unwrap() method:

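from bs4 import BeautifulSoup

data = '''<p> Hello <a href="http://example.com">link text1</a> and <a href="http://example.com">link text2</a> ! </p>'''
# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(data, 'html.parser')
p_tag = soup.find('p')
# unwrap() replaces each <a> tag with its own contents
for a_tag in p_tag.find_all('a'):
    a_tag.unwrap()
print(p_tag)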

This prints:

<p> Hello link text1 and link text2 ! </p>

【Discussion】:

【Solution 2】:

You can do this with a simple regular expression and the re.sub function:

import re

text = '<p> Hello <a href="http://example.com">link text1</a> and <a href="http://example.com">link text2</a> ! </p>'
# Match <a ...> and </a>; \b keeps tags such as <abbr> from matching
pattern = r'</?a\b[^>]*>'

result = re.sub(pattern, "", text)

print(result)
# <p> Hello link text1 and link text2 ! </p>

This code replaces every occurrence of the <a ...> and </a> tags with an empty string.
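
A regex for this has to be careful about tag names that merely begin with "a". The sketch below compares a loose pattern with a tighter one on a made-up sample string (the names LOOSE, TIGHT and sample are just for illustration). It is not a general HTML parser: a ">" inside an attribute value, comments, or script content can still trip it up.

```python
import re

# Loose pattern: any tag whose name merely starts with "a" (or "/a") matches.
LOOSE = re.compile(r'<(a|/a).*?>')
# Tighter pattern: optional slash, the name "a", a word boundary, then
# everything up to the closing ">". IGNORECASE also catches <A HREF=...>.
TIGHT = re.compile(r'</?a\b[^>]*>', re.IGNORECASE)

sample = '<p>See <abbr title="HyperText">HTML</abbr> and <A HREF="http://example.com">a link</A>.</p>'

loose_result = LOOSE.sub('', sample)
tight_result = TIGHT.sub('', sample)

print(loose_result)  # the <abbr> tags are gone, the uppercase link is still there
print(tight_result)  # <p>See <abbr title="HyperText">HTML</abbr> and a link.</p>
```

For input you do not control, a real parser is probably the safer choice.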

【Discussion】:

【Solution 3】:

You can use a parser library to handle this, for example BeautifulSoup, among others. I'm not certain, but you might find something useful here
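
As one example of the "other" parsers, here is a minimal sketch that uses only the standard library's html.parser module. The class LinkStripper and the function strip_links are hypothetical names made up for this example, not part of any library. The idea is to re-emit everything the parser sees except the a start and end tags.

```python
from html.parser import HTMLParser


class LinkStripper(HTMLParser):
    """Re-emit HTML as parsed, except that <a> start and end tags are dropped."""

    def __init__(self):
        # convert_charrefs=False routes entities such as &amp; to
        # handle_entityref/handle_charref so they can be written back out.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':  # tag names arrive lowercased
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are written back as they appeared.
        if tag != 'a':
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag != 'a':
            self.parts.append('</%s>' % tag)

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append('&%s;' % name)

    def handle_charref(self, name):
        self.parts.append('&#%s;' % name)

    def handle_comment(self, data):
        self.parts.append('<!--%s-->' % data)

    def handle_decl(self, decl):
        self.parts.append('<!%s>' % decl)


def strip_links(markup):
    """Return markup with every <a ...> and </a> tag removed."""
    parser = LinkStripper()
    parser.feed(markup)
    parser.close()
    return ''.join(parser.parts)


sample = '<p> Hello <a href="http://example.com">link text1</a> and <a href="http://example.com">link text2</a> ! </p>'
print(strip_links(sample))  # <p> Hello link text1 and link text2 ! </p>
```

This is a sketch rather than a faithful serializer: end tags are rebuilt in lowercase and malformed markup may not round-trip exactly, so BeautifulSoup is likely the more convenient option when it is available.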

【Discussion】: