Regular expression for HTML tags: answers

【Question title】: Regular expression for HTML tags
【Posted】: 2014-08-23 04:15:01
【Question】:

Hey, this is the exact code I'm working with. I need to capture this content:

On Air: Live at the BBC, Vol. 2
The Beatles
2013
Pop/Rock

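I've been trying to write a regular expression for this, but I can't get it quite right. I think the problem may be that the div tag and the a href tag are not on the same line. Maybe; I'm not sure. Please help... I need a regular expression. Thanks.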

<div class="title">
            <a href="http://www.allmusic.com/album/on-air-live-at-the-bbc-vol-2-mw0002581064" data-tooltip="{&quot;id&quot;:&quot;MW0002581064&quot;,&quot;thumbnail&quot;:true}">On Air: Live at the BBC, Vol. 2</a>            </div>

                <div class="artist">
                <a href="http://www.allmusic.com/artist/the-beatles-mn0000754032">The Beatles</a>            </div>

                <div class="year">
            2013            </div>

                <div class="genres">
            Pop/Rock            </div>
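Since the question asks specifically for a regex, here is a minimal sketch of one, assuming the markup always looks exactly like the snippet above: one `<div class="...">` per field, with at most one `<a ...>` wrapping the text. The line-break problem goes away because `\s*` already matches newlines and `re.DOTALL` lets `.` match them too. The pattern and the `fields` dict are illustrative only; this approach breaks as soon as the markup changes, which is why the replies recommend a real HTML parser.

```python
import re

html = '''
<div class="title">
            <a href="http://www.allmusic.com/album/on-air-live-at-the-bbc-vol-2-mw0002581064" data-tooltip="{&quot;id&quot;:&quot;MW0002581064&quot;,&quot;thumbnail&quot;:true}">On Air: Live at the BBC, Vol. 2</a>            </div>

                <div class="artist">
                <a href="http://www.allmusic.com/artist/the-beatles-mn0000754032">The Beatles</a>            </div>

                <div class="year">
            2013            </div>

                <div class="genres">
            Pop/Rock            </div>
'''

# Assumption: each field is a <div> with a single known class and no nested divs.
pattern = re.compile(
    r'<div class="(title|artist|year|genres)">'  # opening div; capture the class
    r'\s*(?:<a\b[^>]*>)?'                        # whitespace (incl. newlines), optional <a ...>
    r'(.*?)'                                     # the text we want, matched lazily
    r'(?:</a>)?\s*</div>',                       # optional </a>, whitespace, closing div
    re.DOTALL,                                   # let "." match newlines as well
)

# findall returns (class, text) tuples because the pattern has two groups.
fields = {cls: text.strip() for cls, text in pattern.findall(html)}

for cls in ("title", "artist", "year", "genres"):
    print(fields[cls])
```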

【Comments】:

You should not parse HTML with regex. There are more reliable ways. What language are you using?
@LucasTrzesniewski Python
Can't you use the HTML parser in Python? docs.python.org/3/library/html.parser.html
I don't know Python, but in other languages I'd go for a jQuery-like library; the Python equivalent is this
Yes, @Jontatas answered with the HTML parser in Python. Thank you both :)
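For reference, the standard-library `html.parser` module suggested in the comments can do this without a regex. The sketch below is one possible approach, not the asker's actual code: the class name `FieldExtractor` and its `fields` attribute are hypothetical, and it assumes each wanted `<div>` carries exactly one class name. It subclasses `HTMLParser` and collects the text seen while inside a wanted `<div>`, ignoring the `<a>` tags and the line breaks between them.

```python
from html.parser import HTMLParser


class FieldExtractor(HTMLParser):
    """Collects the text inside <div class="title|artist|year|genres">."""

    WANTED = {"title", "artist", "year", "genres"}

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None  # class of the wanted div we are inside, if any
        self._depth = 0       # how many extra <div>s are open inside it

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return  # <a> and other tags are ignored; only their text matters
        if self._current is not None:
            self._depth += 1  # a nested div inside a wanted one
            return
        cls = dict(attrs).get("class")  # assumes a single class name
        if cls in self.WANTED:
            self._current = cls
            self._depth = 0
            self.fields[cls] = ""

    def handle_endtag(self, tag):
        if tag != "div" or self._current is None:
            return
        if self._depth:
            self._depth -= 1
        else:
            # Closing the wanted div: trim the surrounding whitespace/newlines.
            self.fields[self._current] = self.fields[self._current].strip()
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.fields[self._current] += data


html = '''
<div class="title">
            <a href="http://www.allmusic.com/album/on-air-live-at-the-bbc-vol-2-mw0002581064" data-tooltip="{&quot;id&quot;:&quot;MW0002581064&quot;,&quot;thumbnail&quot;:true}">On Air: Live at the BBC, Vol. 2</a>            </div>
                <div class="artist">
                <a href="http://www.allmusic.com/artist/the-beatles-mn0000754032">The Beatles</a>            </div>
                <div class="year">
            2013            </div>
                <div class="genres">
            Pop/Rock            </div>
'''

parser = FieldExtractor()
parser.feed(html)
parser.close()

for name in ("title", "artist", "year", "genres"):
    print(parser.fields[name])
```

Unlike a regex, this keeps working if attributes are reordered or the text moves onto different lines, since the parser tokenizes tags itself.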

Tags: regex

【Solution 1】:

You could probably use BeautifulSoup:

from bs4 import BeautifulSoup
html = '''
    <div class="title">
        <a href="http://www.allmusic.com/album/on-air-live-at-the-bbc-vol-2-mw0002581064" data-tooltip="{&quot;id&quot;:&quot;MW0002581064&quot;,&quot;thumbnail&quot;:true}">On Air: Live at the BBC, Vol. 2</a>
    </div>
    <div class="artist">
        <a href="http://www.allmusic.com/artist/the-beatles-mn0000754032">The Beatles</a>
    </div>
    <div class="year">
        2013
    </div>
    <div class="genres">
        Pop/Rock
    </div>
    '''

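# Name the parser explicitly; otherwise bs4 guesses one and emits a warning.
soup = BeautifulSoup(html, "html.parser")

# A list as the second argument matches any <div> whose class is one of these.
for s in soup.find_all("div", ["title", "artist", "year", "genres"]):
    print(s.text.strip())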

Output:

On Air: Live at the BBC, Vol. 2
The Beatles
2013
Pop/Rock

【Comments】:

I didn't even know you could pass a list as the second argument to find_all. Awesome!
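On that last point: in BeautifulSoup, a non-dict value passed as the second (`attrs`) argument of `find_all` is treated as a filter on the CSS `class` attribute, so the answer's call should be equivalent to the more explicit `class_=` keyword form. The sketch below compares the two, plus a CSS-selector version via `soup.select`, on a trimmed copy of the question's markup. It assumes `bs4` is installed and that `select` returns matches in document order.

```python
from bs4 import BeautifulSoup

html = '''
<div class="title">
    <a href="http://www.allmusic.com/album/on-air-live-at-the-bbc-vol-2-mw0002581064">On Air: Live at the BBC, Vol. 2</a>
</div>
<div class="artist">
    <a href="http://www.allmusic.com/artist/the-beatles-mn0000754032">The Beatles</a>
</div>
<div class="year">
    2013
</div>
<div class="genres">
    Pop/Rock
</div>
'''

soup = BeautifulSoup(html, "html.parser")
wanted = ["title", "artist", "year", "genres"]

# Positional list: interpreted as a class filter (the form used in the answer).
by_position = soup.find_all("div", wanted)
# Explicit keyword form of the same filter.
by_keyword = soup.find_all("div", class_=wanted)
# CSS selector alternative.
by_selector = soup.select("div.title, div.artist, div.year, div.genres")

# get_text(strip=True) strips each text fragment, so newlines and indentation vanish.
texts = [div.get_text(strip=True) for div in by_keyword]
print(texts)
```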