Posted: 2026-02-18 05:20:08
Question:
I have this string:
In December 2011, Norway's largest online sex shop hemmelig.com was <a href="http://www.dazzlepod.com/hemmelig/?page=93" target="_blank" rel="noopener">hacked by a collective calling themselves "Team Appunity"</a>. The attack exposed over 28,000 usernames and email addresses along with nicknames, gender, year of birth and unsalted MD5 password hashes.
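(Don't ask.)
Inside the string is an HREF link pointing to the site itself. What I need to do is extract the text between the <a href=""></a> tags, so the end result should look like this: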
In December 2011, Norway's largest online sex shop hemmelig.com was hacked by a collective calling themselves "Team Appunity". The attack exposed over 28,000 usernames and email addresses along with nicknames, gender, year of birth and unsalted MD5 password hashes.
So far, what I've managed is to match the whole tag with a regular expression and replace it with nothing:
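    import re
    def get_unlinked_description(descrip):
        # "<.+>" matches "<", one or more characters, then ">"; ".+" is greedy,
        # so the match runs to the LAST ">" on the line.
        html_tag_regex = re.compile(r"<.+>", re.I)
        return html_tag_regex.sub("", descrip)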
However, as you might expect, this removes not just the tags but everything between them, link text included:
In December 2011, Norway's largest online sex shop hemmelig.com was . The attack exposed over 28,000 usernames and email addresses along with nicknames, gender, year of birth and unsalted MD5 password hashes.
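The cause is the greedy quantifier in "<.+>": ".+" first runs to the end of the line and then backtracks only as far as the last ">", which in the sample is the ">" of "</a>". A quick check on an excerpt of the sample string (trimmed for brevity) compares it with the lazy form "<.+?>":

```python
import re

# Excerpt of the sample string from the question, trimmed for brevity.
text = ('hemmelig.com was <a href="http://www.dazzlepod.com/hemmelig/?page=93" '
        'target="_blank" rel="noopener">hacked by a collective calling '
        'themselves "Team Appunity"</a>. The attack exposed over 28,000 usernames')

greedy = re.findall(r"<.+>", text)  # ".+" backtracks only to the LAST ">"
lazy = re.findall(r"<.+?>", text)   # ".+?" stops at the FIRST ">" after each "<"

print(greedy)  # one match: opening tag + link text + "</a>"
print(lazy)    # two matches: the opening <a ...> tag and "</a>"
```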
How can I remove the tags while keeping the text between them, instead of deleting the whole link?
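One common approach is to make the pattern stop at the first ">", for example with the negated character class "<[^>]+>" (the lazy form "<.+?>" behaves the same on this input). The sketch below assumes simple, well-formed markup like the sample, with no ">" inside attribute values or comments; a regular expression is not a general HTML parser. It runs on both Python 2.7 and 3.

```python
import re

# "[^>]+" cannot cross a ">", so each match is exactly one tag:
# the opening <a ...> tag and "</a>" are removed separately,
# and the link text between them is left in place.
TAG_RE = re.compile(r"<[^>]+>")

def get_unlinked_description(descrip):
    """Return descrip with every HTML tag stripped, keeping the inner text."""
    return TAG_RE.sub("", descrip)

description = (
    "In December 2011, Norway's largest online sex shop hemmelig.com was "
    '<a href="http://www.dazzlepod.com/hemmelig/?page=93" target="_blank" '
    'rel="noopener">hacked by a collective calling themselves "Team Appunity"</a>. '
    "The attack exposed over 28,000 usernames and email addresses along with "
    "nicknames, gender, year of birth and unsalted MD5 password hashes."
)

print(get_unlinked_description(description))
```

For arbitrary HTML, the standard library's parser (the HTMLParser module on Python 2, html.parser on Python 3) is the safer choice.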
Tags: python python-2.7